KNN
toyml.classification.knn.KNN
dataclass
KNN(k: int, std_transform: bool = True, dataset_: list[list[float]] | None = None, labels_: list[Any] | None = None, standardizationer_: Standardizationer | None = None)
K-Nearest Neighbors classification algorithm implementation.
This class implements the K-Nearest Neighbors algorithm for classification tasks. It supports optional standardization of the input data.
Examples:
>>> dataset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
>>> labels = ["A", "A", "B", "B"]
>>> knn = KNN(k=3, std_transform=True).fit(dataset, labels)
>>> knn.predict([2.5, 3.5])
'A'
ATTRIBUTE | DESCRIPTION |
---|---|
k | The number of nearest neighbors to consider for classification. TYPE: int |
std_transform | Whether to standardize the input data (default: True). TYPE: bool |
dataset_ | The fitted dataset (standardized if std_transform is True). |
labels_ | The labels corresponding to the fitted dataset. |
standardizationer_ | The Standardizationer instance if std_transform is True. TYPE: Standardizationer or None |
References
- Li Hang
- Tan
- Zhou Zhihua
- Murphy
- Harrington
fit
Fit the KNN model to the given dataset and labels.
PARAMETER | DESCRIPTION |
---|---|
dataset | The input dataset to fit the model to. |
labels | The labels corresponding to the input dataset. |
RETURNS | DESCRIPTION |
---|---|
KNN | The fitted KNN instance. |
Source code in toyml/classification/knn.py
predict
Predict the label of the input data.
PARAMETER | DESCRIPTION |
---|---|
x | The input data to predict. |
RETURNS | DESCRIPTION |
---|---|
Any | The predicted label. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the model is not fitted yet. |
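The following is a minimal sketch of what a brute-force k-nearest-neighbors prediction typically involves: measure the distance from the query to every fitted point, keep the k closest, and take a majority vote. The helper name `knn_predict_sketch` and its parameters are hypothetical and are not part of toyml; the tie-breaking behavior shown here is an assumption, not a statement about the library.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Any


def knn_predict_sketch(
    dataset: list[list[float]] | None,
    labels: list[Any] | None,
    x: list[float],
    k: int,
) -> Any:
    """Hypothetical helper: brute-force k-nearest-neighbors majority vote."""
    if dataset is None or labels is None:
        # Mirrors the documented behavior: predicting before fitting is an error.
        raise ValueError("The model is not fitted yet!")
    # Pair each training point's distance to x with that point's label.
    distances = [(math.dist(row, x), label) for row, label in zip(dataset, labels)]
    # sorted() is stable, so equidistant points keep their dataset order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of the k nearest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
point_labels = ["A", "A", "B", "B"]
print(knn_predict_sketch(points, point_labels, [0.2, 0.3], k=3))  # A
```

With k=3 the query's three nearest points carry the labels A, A and B, so the vote returns A. An odd k reduces, but does not eliminate, the chance of a tied vote when there are more than two classes.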
Source code in toyml/classification/knn.py
_calculate_distance
staticmethod
Calculate the Euclidean distance between two points using a numerically stable method.
This implementation avoids overflow by using a two-pass algorithm.
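One common way to realize a two-pass Euclidean distance is to find the largest absolute coordinate difference first and then sum the squares of the rescaled differences. The sketch below illustrates that idea only; whether toyml uses exactly this scaling is an assumption, and `stable_euclidean_distance` is a hypothetical name.

```python
import math


def stable_euclidean_distance(x: list[float], y: list[float]) -> float:
    """Hypothetical helper: Euclidean distance computed in two passes."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    # First pass: find the largest absolute coordinate difference.
    max_diff = max((abs(d) for d in diffs), default=0.0)
    if max_diff == 0.0:
        return 0.0  # identical points (or empty vectors)
    # Second pass: every rescaled difference lies in [-1, 1],
    # so squaring it cannot overflow.
    scaled_sum = sum((d / max_diff) ** 2 for d in diffs)
    return max_diff * math.sqrt(scaled_sum)


print(stable_euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Squaring a difference such as 2e200 directly would exceed the range of a double, whereas the rescaled version stays finite and only multiplies by the large factor once at the end.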
Source code in toyml/classification/knn.py
toyml.classification.knn.Standardizationer
dataclass
Standardizationer(_means: list[float] = list(), _stds: list[float] = list(), _dimension: int | None = None)
A class for standardizing numerical datasets.
Provides methods to fit a standardization model to a dataset, transform datasets using the fitted model, and perform both operations in a single step.
fit
fit(dataset: list[list[float]]) -> Standardizationer
Fit the standardization model to the given dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset | The input dataset to fit the model to. |
RETURNS | DESCRIPTION |
---|---|
Standardizationer | The fitted Standardizationer instance. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the dataset has inconsistent dimensions. |
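The dimension check behind the documented ValueError can be sketched as follows. This is an illustration only: `check_dimension` is a hypothetical helper, and rejecting an empty dataset is an assumption rather than documented behavior.

```python
from __future__ import annotations


def check_dimension(dataset: list[list[float]]) -> int:
    """Hypothetical helper: return the shared row length or raise ValueError."""
    if not dataset:
        # Assumption: an empty dataset is treated as an error in this sketch.
        raise ValueError("Dataset is empty")
    dimension = len(dataset[0])
    for index, row in enumerate(dataset):
        if len(row) != dimension:
            raise ValueError(
                f"Row {index} has {len(row)} values, expected {dimension}"
            )
    return dimension


print(check_dimension([[1.0, 2.0], [3.0, 4.0]]))  # 2
```

Validating the row lengths before computing any statistics keeps a ragged dataset from silently producing per-column means over mismatched features.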
Source code in toyml/classification/knn.py
transform
Transform the given dataset using the fitted standardization model.
PARAMETER | DESCRIPTION |
---|---|
dataset | The input dataset to transform. |
RETURNS | DESCRIPTION |
---|---|
list[list[float]] | The standardized dataset. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the model has not been fitted yet. |
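Transforming with a fitted model means reusing the statistics learned from the training data rather than recomputing them on the new rows. The sketch below shows that idea under stated assumptions: `transform_with_stats` is a hypothetical helper, and representing "not fitted" with None is a simplification that may differ from how the class tracks its state.

```python
from __future__ import annotations


def transform_with_stats(
    dataset: list[list[float]],
    means: list[float] | None,
    stds: list[float] | None,
) -> list[list[float]]:
    """Hypothetical helper: standardize rows with previously fitted statistics."""
    if means is None or stds is None:
        raise ValueError("The model has not been fitted yet!")
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in dataset
    ]


# Statistics computed on the training data only (values chosen for illustration).
train_means, train_stds = [2.0, 20.0], [1.0, 10.0]
print(transform_with_stats([[3.0, 10.0]], train_means, train_stds))  # [[1.0, -1.0]]
```

Applying the training statistics to a query point keeps it on the same scale as the fitted dataset, which is what a distance-based classifier needs at prediction time.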
Source code in toyml/classification/knn.py
fit_transform
Fit the standardization model to the dataset and transform it in one step.
PARAMETER | DESCRIPTION |
---|---|
dataset | The input dataset to fit and transform. |
RETURNS | DESCRIPTION |
---|---|
list[list[float]] | The standardized dataset. |
Source code in toyml/classification/knn.py
standardization
Standardize the given numerical dataset.
Each feature is standardized by subtracting its mean and dividing by its standard deviation. A standard deviation of 0 means that every value in the column is identical; in that case the standard deviation is set to 1, so every value in the column becomes 0 and division by zero is avoided.
PARAMETER | DESCRIPTION |
---|---|
dataset | The input dataset to standardize. |
RETURNS | DESCRIPTION |
---|---|
list[list[float]] | The standardized dataset. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the model has not been fitted yet. |
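The zero-standard-deviation rule described above can be illustrated with a small self-contained sketch. The function `standardize_sketch` is hypothetical, and it uses the population standard deviation (`statistics.pstdev`) purely to keep the arithmetic simple; the estimator used by the library may differ.

```python
from __future__ import annotations

import statistics


def standardize_sketch(dataset: list[list[float]]) -> list[list[float]]:
    """Hypothetical helper: column-wise standardization with a zero-std guard."""
    columns = list(zip(*dataset))
    means = [statistics.mean(column) for column in columns]
    # Assumption: population standard deviation, chosen for easy arithmetic.
    stds = [statistics.pstdev(column) for column in columns]
    # A constant column has std 0; use 1 instead so its values all map to 0.
    safe_stds = [std if std != 0 else 1.0 for std in stds]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, safe_stds)]
        for row in dataset
    ]


# The second column is constant, so its standard deviation is 0.
print(standardize_sketch([[1.0, 7.0], [3.0, 7.0]]))  # [[-1.0, 0.0], [1.0, 0.0]]
```

Because the mean of a constant column equals its single value, subtracting the mean already yields 0, and dividing by 1 leaves that result unchanged.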
Source code in toyml/classification/knn.py
_dataset_column_means
staticmethod
Calculate the mean of each column (feature) of the dataset.
Source code in toyml/classification/knn.py
_dataset_column_stds
staticmethod
Calculate the standard deviation of each column (feature) of the dataset.
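Per-column statistics on a list-of-rows dataset are usually computed by transposing it with `zip(*dataset)`. The sketch below shows this for both means and standard deviations; the function names and the `sample` flag are hypothetical, and which estimator (sample or population) the library applies is not stated in this page.

```python
from __future__ import annotations

import statistics


def column_means(dataset: list[list[float]]) -> list[float]:
    """Hypothetical helper: mean of every column."""
    return [statistics.mean(column) for column in zip(*dataset)]


def column_stds(dataset: list[list[float]], sample: bool = True) -> list[float]:
    """Hypothetical helper: standard deviation of every column.

    sample=True divides by n - 1 (statistics.stdev);
    sample=False divides by n (statistics.pstdev).
    """
    spread = statistics.stdev if sample else statistics.pstdev
    return [spread(column) for column in zip(*dataset)]


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(column_means(rows))  # [2.0, 20.0]
print(column_stds(rows))   # [1.0, 10.0]
```

The two estimators give noticeably different results on small datasets, so the choice matters when reproducing standardized values by hand.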
Source code in toyml/classification/knn.py