
KNN

toyml.classification.knn.KNN dataclass

KNN(k: int, std_transform: bool = True, dataset_: Optional[list[list[float]]] = None, labels_: Optional[list[Any]] = None, standardizationer_: Optional[Standardizationer] = None)

K-Nearest Neighbors classification algorithm implementation.

This class implements the K-Nearest Neighbors algorithm for classification tasks. It supports optional standardization of the input data.

Examples:

>>> dataset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
>>> labels = ['A', 'A', 'B', 'B']
>>> knn = KNN(k=3, std_transform=True).fit(dataset, labels)
>>> knn.predict([2.5, 3.5])
'A'

Attributes:
    k (int): The number of nearest neighbors to consider for classification.
    std_transform (bool): Whether to standardize the input data (default: True).
    dataset_ (Optional[list[list[float]]]): The fitted dataset (standardized if std_transform is True).
    labels_ (Optional[list[Any]]): The labels corresponding to the fitted dataset.
    standardizationer_ (Optional[Standardizationer]): The Standardizationer instance if std_transform is True.


fit

fit(dataset: list[list[float]], labels: list[Any]) -> KNN

Fit the KNN model to the given dataset and labels.

Parameters:
    dataset (list[list[float]]): The input dataset to fit the model to.
    labels (list[Any]): The labels corresponding to the input dataset.

Returns:
    KNN: The fitted KNN instance.

Source code in toyml/classification/knn.py
def fit(self, dataset: list[list[float]], labels: list[Any]) -> KNN:
    """
    Fit the KNN model to the given dataset and labels.

    Args:
        dataset: The input dataset to fit the model to.
        labels: The labels corresponding to the input dataset.

    Returns:
        The fitted KNN instance.
    """
    self.dataset_ = dataset
    self.labels_ = labels
    if self.std_transform:
        self.standardizationer_ = Standardizationer()
        self.dataset_ = self.standardizationer_.fit_transform(self.dataset_)
    return self

predict

predict(x: list[float]) -> Any

Predict the label of the input data.

Parameters:
    x (list[float]): The input data to predict.

Returns:
    Any: The predicted label.

Raises:
    ValueError: If the model is not fitted yet.

Source code in toyml/classification/knn.py
def predict(self, x: list[float]) -> Any:
    """
    Predict the label of the input data.

    Args:
        x: The input data to predict.

    Returns:
        The predicted label.

    Raises:
        ValueError: If the model is not fitted yet.
    """
    if self.dataset_ is None or self.labels_ is None:
        raise ValueError("The model is not fitted yet!")

    if self.std_transform:
        if self.standardizationer_ is None:
            raise ValueError("Cannot find the standardization!")
        x = self.standardizationer_.transform([x])[0]
    distances = [self._calculate_distance(x, point) for point in self.dataset_]
    # keep the labels of the k nearest neighbors (sorted by ascending distance)
    k_nearest_labels = [label for _, label in sorted(zip(distances, self.labels_), key=lambda pair: pair[0])][: self.k]
    label = Counter(k_nearest_labels).most_common(1)[0][0]
    return label
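
The prediction steps above (measure distances, sort, keep the k closest labels, take a majority vote) can be sketched as a standalone function. This is a minimal illustration, not the library's code: `knn_predict` is a hypothetical name, standardization is left out, and Euclidean distance via `math.dist` is an assumption, since `_calculate_distance` is not shown on this page.

```python
import math
from collections import Counter
from typing import Any


def knn_predict(dataset: list[list[float]], labels: list[Any], x: list[float], k: int) -> Any:
    """Return the majority label among the k points closest to x."""
    # Assumption: Euclidean distance; the library's _calculate_distance is not shown here.
    distances = [math.dist(x, point) for point in dataset]
    # Pair each distance with its label and sort by distance only (the sort is stable).
    ranked = sorted(zip(distances, labels), key=lambda pair: pair[0])
    k_nearest_labels = [label for _, label in ranked[:k]]
    # most_common(1) yields the most frequent label; ties go to the label counted first.
    return Counter(k_nearest_labels).most_common(1)[0][0]


dataset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
labels = ["A", "A", "B", "B"]
prediction = knn_predict(dataset, labels, [1.5, 2.5], k=3)
print(prediction)  # A
```

With k=3 the three closest points to [1.5, 2.5] carry the labels A, A, B, so the vote returns A.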

toyml.classification.knn.Standardizationer dataclass

Standardizationer(_means: list[float] = list(), _stds: list[float] = list(), _dimension: Optional[int] = None)

A class for standardizing numerical datasets.

Provides methods to fit a standardization model to a dataset, transform datasets using the fitted model, and perform both operations in a single step.

fit

fit(dataset: list[list[float]]) -> Standardizationer

Fit the standardization model to the given dataset.

Parameters:
    dataset (list[list[float]]): The input dataset to fit the model to.

Returns:
    Standardizationer: The fitted Standardizationer instance.

Raises:
    ValueError: If the dataset has inconsistent dimensions.

Source code in toyml/classification/knn.py
def fit(self, dataset: list[list[float]]) -> Standardizationer:
    """
    Fit the standardization model to the given dataset.

    Args:
        dataset: The input dataset to fit the model to.

    Returns:
        The fitted Standardizationer instance.

    Raises:
        ValueError: If the dataset has inconsistent dimensions.
    """
    self._dimension = self._get_dataset_dimension(dataset)
    self._means = self._dataset_column_means(dataset)
    self._stds = self._dataset_column_stds(dataset)
    return self
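
The helper `_get_dataset_dimension` is not shown on this page. As a rough sketch of the kind of check that could back the documented `ValueError` for inconsistent dimensions, the hypothetical function below collects the row lengths and rejects a dataset whose rows differ. The name, the error message, and the treatment of an empty dataset are all assumptions.

```python
def get_dataset_dimension(dataset: list[list[float]]) -> int:
    """Return the shared row length, or raise ValueError if rows differ."""
    # Hypothetical stand-in; the library's own helper may behave differently.
    dimensions = {len(row) for row in dataset}
    if len(dimensions) != 1:
        # Also triggers for an empty dataset, which has no row length at all.
        raise ValueError(f"Expected one row length, found: {sorted(dimensions)}")
    return dimensions.pop()


dim = get_dataset_dimension([[1.0, 2.0], [3.0, 4.0]])
print(dim)  # 2

try:
    get_dataset_dimension([[1.0, 2.0], [3.0]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
print(ragged_rejected)  # True
```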

transform

transform(dataset: list[list[float]]) -> list[list[float]]

Transform the given dataset using the fitted standardization model.

Parameters:
    dataset (list[list[float]]): The input dataset to transform.

Returns:
    list[list[float]]: The standardized dataset.

Raises:
    ValueError: If the model has not been fitted yet.

Source code in toyml/classification/knn.py
def transform(self, dataset: list[list[float]]) -> list[list[float]]:
    """
    Transform the given dataset using the fitted standardization model.

    Args:
        dataset: The input dataset to transform.

    Returns:
        The standardized dataset.

    Raises:
        ValueError: If the model has not been fitted yet.
    """
    if self._dimension is None:
        raise ValueError("The model is not fitted yet!")
    return self.standardization(dataset)

fit_transform

fit_transform(dataset: list[list[float]]) -> list[list[float]]

Fit the standardization model to the dataset and transform it in one step.

Parameters:
    dataset (list[list[float]]): The input dataset to fit and transform.

Returns:
    list[list[float]]: The standardized dataset.

Source code in toyml/classification/knn.py
def fit_transform(self, dataset: list[list[float]]) -> list[list[float]]:
    """
    Fit the standardization model to the dataset and transform it in one step.

    Args:
        dataset: The input dataset to fit and transform.

    Returns:
        The standardized dataset.
    """
    self.fit(dataset)
    return self.transform(dataset)

standardization

standardization(dataset: list[list[float]]) -> list[list[float]]

Standardize the given numerical dataset. Each feature (column) is standardized by subtracting its mean and dividing by its standard deviation. A standard deviation of 0 means every value in the column is identical; in that case std is set to 1, so every value in the column becomes 0 and division by zero is avoided.
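
To make the rule concrete, here is a small worked sketch on a two-row dataset whose second column is constant. `standardize_columns` is a hypothetical helper, not part of the library, and it assumes the population standard deviation (dividing by n); the library's `_dataset_column_stds` is not shown here and may use a different convention.

```python
import math


def standardize_columns(dataset: list[list[float]]) -> list[list[float]]:
    """Return a standardized copy of dataset, computed column by column."""
    n = len(dataset)
    columns = list(zip(*dataset))
    means = [sum(col) / n for col in columns]
    # Assumption: population standard deviation (divide by n).
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) for col, m in zip(columns, means)]
    # A constant column has std 0; use 1 instead so its values map to 0.
    stds = [1.0 if math.isclose(s, 0, abs_tol=1e-9) else s for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in dataset]


data = [[1.0, 5.0], [3.0, 5.0]]
result = standardize_columns(data)
print(result)  # [[-1.0, 0.0], [1.0, 0.0]]
```

The first column has mean 2 and standard deviation 1, giving -1 and 1. The second column is constant, so its standard deviation is replaced by 1 and both values become 0.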

Parameters:
    dataset (list[list[float]]): The input dataset to standardize.

Returns:
    list[list[float]]: The standardized dataset.

Raises:
    ValueError: If the model has not been fitted yet.

Source code in toyml/classification/knn.py
def standardization(self, dataset: list[list[float]]) -> list[list[float]]:
    """
    Standardize the given numerical dataset.
    Each feature (column) is standardized by subtracting its mean and dividing by its standard deviation.
    A standard deviation of 0 means every value in the column is identical;
    in that case std is set to 1, so every value becomes 0 and division by zero is avoided.

    Args:
        dataset: The input dataset to standardize.

    Returns:
        The standardized dataset.

    Raises:
        ValueError: If the model has not been fitted yet.
    """
    if self._dimension is None:
        raise ValueError("The model is not fitted yet!")
    for j, column in enumerate(zip(*dataset)):
        mean, std = self._means[j], self._stds[j]
        # ref: https://github.com/scikit-learn/scikit-learn/blob/7389dbac82d362f296dc2746f10e43ffa1615660/sklearn/preprocessing/data.py#L70
        if math.isclose(std, 0, abs_tol=1e-9):
            std = 1
        for i, value in enumerate(column):
            dataset[i][j] = (value - mean) / std
    return dataset
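
The loop above assigns into `dataset[i][j]`, so the list passed in is overwritten and the same object is returned. Judging by the source shown on this page, `KNN.fit` hands the caller's list to `fit_transform`, so the same would apply there. The sketch below uses a hypothetical stand-in, `standardize_in_place`, with hand-picked means and stds to show the effect and one way to keep the original values (copying first).

```python
import copy


def standardize_in_place(dataset, means, stds):
    """Stand-in mirroring the loop above: overwrites dataset and returns it."""
    for j, column in enumerate(zip(*dataset)):
        for i, value in enumerate(column):
            dataset[i][j] = (value - means[j]) / stds[j]
    return dataset


raw = [[1.0, 10.0], [3.0, 30.0]]
safe_copy = copy.deepcopy(raw)  # taken before the call, so it keeps the raw values

out = standardize_in_place(raw, means=[2.0, 20.0], stds=[1.0, 10.0])
same_object = out is raw
print(same_object)  # True
print(raw)          # [[-1.0, -1.0], [1.0, 1.0]]
print(safe_copy)    # [[1.0, 10.0], [3.0, 30.0]]
```

Passing `copy.deepcopy(dataset)` instead of `dataset` is one option when the raw values are still needed afterwards.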