KNN¶

toyml.classification.knn.KNN `dataclass` ¶

KNN(k: int, std_transform: bool = True, dataset_: list[list[float]] | None = None, labels_: list[Any] | None = None, standardizationer_: Standardizationer | None = None)

K-Nearest Neighbors classification algorithm implementation.

A query point is assigned the most common label among its `k` nearest training points, measured by Euclidean distance. The input features can optionally be standardized before distances are computed.

Examples:

>>> dataset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
>>> labels = ["A", "A", "B", "B"]
>>> knn = KNN(k=3, std_transform=True).fit(dataset, labels)
>>> knn.predict([2.5, 3.5])
'A'

ATTRIBUTE	DESCRIPTION
`k`	The number of nearest neighbors to consider for classification. TYPE: `int`
`std_transform`	Whether to standardize the input data (default: True). TYPE: `bool`
`dataset_`	The fitted dataset (standardized if std_transform is True). TYPE: `list[list[float]] \| None`
`labels_`	The labels corresponding to the fitted dataset. TYPE: `list[Any] \| None`
`standardizationer_`	The Standardizationer instance if std_transform is True. TYPE: `Standardizationer \| None`

fit ¶

fit(dataset: list[list[float]], labels: list[Any]) -> KNN

Fit the KNN model to the given dataset and labels.

PARAMETER	DESCRIPTION
`dataset`	The input dataset to fit the model to. TYPE: `list[list[float]]`
`labels`	The labels corresponding to the input dataset. TYPE: `list[Any]`

RETURNS	DESCRIPTION
`KNN`	The fitted KNN instance.

Source code in toyml/classification/knn.py

def fit(self, dataset: list[list[float]], labels: list[Any]) -> KNN:
    """Fit the KNN model to the given dataset and labels.

    Args:
        dataset: The input dataset to fit the model to.
        labels: The labels corresponding to the input dataset.

    Returns:
        The fitted KNN instance.
    """
    self.dataset_ = dataset
    self.labels_ = labels
    if self.std_transform:
        self.standardizationer_ = Standardizationer()
        self.dataset_ = self.standardizationer_.fit_transform(self.dataset_)
    return self

predict ¶

predict(x: list[float]) -> Any

Predict the label of the input data.

PARAMETER	DESCRIPTION
`x`	The input data to predict. TYPE: `list[float]`

RETURNS	DESCRIPTION
`Any`	The predicted label.

RAISES	DESCRIPTION
`ValueError`	If the model is not fitted yet.

Source code in toyml/classification/knn.py

def predict(self, x: list[float]) -> Any:  # noqa: ANN401
    """Predict the label of the input data.

    Args:
        x: The input data to predict.

    Returns:
        The predicted label.

    Raises:
        ValueError: If the model is not fitted yet.
    """
    if self.dataset_ is None or self.labels_ is None:
        msg = "The model is not fitted yet!"
        raise ValueError(msg)

    if self.std_transform:
        if self.standardizationer_ is None:
            msg = "Cannot find the standardization!"
            raise ValueError(msg)
        x = self.standardizationer_.transform([x])[0]
    distances = [self._calculate_distance(x, point) for point in self.dataset_]
    # Sort the labels by distance and keep those of the k nearest neighbors
    k_nearest_labels = [
        label for _, label in sorted(zip(distances, self.labels_, strict=False), key=lambda pair: pair[0])
    ][: self.k]
    # Majority vote; on a tie, the label encountered first wins
    label = Counter(k_nearest_labels).most_common(1)[0][0]
    return label
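
The voting step can be illustrated in isolation. The following is a minimal standalone sketch, not the library's code: the helper name `majority_vote` and the sample distances are hypothetical, and it assumes the distances have already been computed.

```python
from collections import Counter
from typing import Any


def majority_vote(distances: list[float], labels: list[Any], k: int) -> Any:
    """Return the most common label among the k smallest distances."""
    # Pair each distance with its label and sort by distance only;
    # sorted() is stable, so equidistant points keep their dataset order.
    ranked = sorted(zip(distances, labels), key=lambda pair: pair[0])
    # Keep the labels of the first k pairs (the k nearest neighbors).
    k_nearest_labels = [label for _, label in ranked[:k]]
    # most_common(1) returns a one-element list of (label, count).
    return Counter(k_nearest_labels).most_common(1)[0][0]


# The three nearest points carry labels B, B, A, so the vote is "B".
print(majority_vote([0.9, 0.1, 0.4, 2.0], ["A", "B", "B", "A"], k=3))
```

With an even `k`, a tie is possible; `Counter.most_common` then returns the label that was counted first, which here is the label of the nearer point.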

_calculate_distance `staticmethod` ¶

_calculate_distance(x: list[float], y: list[float]) -> float

Calculate the Euclidean distance between two points using a numerically stable method.

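This implementation avoids overflow by using a two-pass algorithm: the first pass finds the largest absolute coordinate difference, and the second pass divides every difference by that value before squaring, so no squared term exceeds 1.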

Source code in toyml/classification/knn.py

@staticmethod
def _calculate_distance(x: list[float], y: list[float]) -> float:
    """Calculate the Euclidean distance between two points using a numerically stable method.

    This implementation avoids overflow by using the two-pass algorithm.
    """
    assert len(x) == len(y), f"{x} and {y} have different length!"

    # First pass: find the maximum absolute difference
    max_diff = max(abs(xi - yi) for xi, yi in zip(x, y, strict=False))

    if math.isclose(max_diff, 0, abs_tol=1e-9):
        return 0.0  # The points coincide (every coordinate differs by at most 1e-9)

    # Second pass: calculate the normalized sum of squares
    sum_squares = sum(((xi - yi) / max_diff) ** 2 for xi, yi in zip(x, y, strict=False))

    return max_diff * math.sqrt(sum_squares)
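
To see why the scaling matters, the sketch below compares the textbook formula with the scaled one on very large coordinates. The function names `naive_distance` and `scaled_distance` are hypothetical and exist only for this illustration; the exact-zero check is a simplification of the tolerance check used above.

```python
import math


def naive_distance(x: list[float], y: list[float]) -> float:
    """Textbook formula: squares each difference directly."""
    return math.sqrt(sum((xi - yi) * (xi - yi) for xi, yi in zip(x, y)))


def scaled_distance(x: list[float], y: list[float]) -> float:
    """Divide by the largest absolute difference before squaring."""
    max_diff = max(abs(xi - yi) for xi, yi in zip(x, y))
    if max_diff == 0.0:
        return 0.0
    # Every scaled term lies in [-1, 1], so its square cannot overflow.
    sum_squares = sum(((xi - yi) / max_diff) ** 2 for xi, yi in zip(x, y))
    return max_diff * math.sqrt(sum_squares)


a, b = [1e200, 1e200], [0.0, 0.0]
print(naive_distance(a, b))   # inf: 1e200 * 1e200 exceeds the float range
print(scaled_distance(a, b))  # finite: 1e200 * sqrt(2)
```

For everyday inputs the two functions agree; the difference only shows when a squared difference would leave the representable float range.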

toyml.classification.knn.Standardizationer `dataclass` ¶

Standardizationer(_means: list[float] = list(), _stds: list[float] = list(), _dimension: int | None = None)

A class for standardizing numerical datasets.

Provides methods to fit a standardization model to a dataset, transform datasets using the fitted model, and perform both operations in a single step.

fit ¶

fit(dataset: list[list[float]]) -> Standardizationer

Fit the standardization model to the given dataset.

PARAMETER	DESCRIPTION
`dataset`	The input dataset to fit the model to. TYPE: `list[list[float]]`

RETURNS	DESCRIPTION
`Standardizationer`	The fitted Standardizationer instance.

RAISES	DESCRIPTION
`ValueError`	If the dataset has inconsistent dimensions.

Source code in toyml/classification/knn.py

def fit(self, dataset: list[list[float]]) -> Standardizationer:
    """Fit the standardization model to the given dataset.

    Args:
        dataset: The input dataset to fit the model to.

    Returns:
        The fitted Standardizationer instance.

    Raises:
        ValueError: If the dataset has inconsistent dimensions.
    """
    self._dimension = self._get_dataset_dimension(dataset)
    self._means = self._dataset_column_means(dataset)
    self._stds = self._dataset_column_stds(dataset)
    return self

transform ¶

transform(dataset: list[list[float]]) -> list[list[float]]

Transform the given dataset using the fitted standardization model.

PARAMETER	DESCRIPTION
`dataset`	The input dataset to transform. TYPE: `list[list[float]]`

RETURNS	DESCRIPTION
`list[list[float]]`	The standardized dataset.

RAISES	DESCRIPTION
`ValueError`	If the model has not been fitted yet.

Source code in toyml/classification/knn.py

def transform(self, dataset: list[list[float]]) -> list[list[float]]:
    """Transform the given dataset using the fitted standardization model.

    Args:
        dataset: The input dataset to transform.

    Returns:
        The standardized dataset.

    Raises:
        ValueError: If the model has not been fitted yet.
    """
    if self._dimension is None:
        msg = "The model is not fitted yet!"
        raise ValueError(msg)
    return self.standardization(dataset)

fit_transform ¶

fit_transform(dataset: list[list[float]]) -> list[list[float]]

Fit the standardization model to the dataset and transform it in one step.

PARAMETER	DESCRIPTION
`dataset`	The input dataset to fit and transform. TYPE: `list[list[float]]`

RETURNS	DESCRIPTION
`list[list[float]]`	The standardized dataset.

Source code in toyml/classification/knn.py

def fit_transform(self, dataset: list[list[float]]) -> list[list[float]]:
    """Fit the standardization model to the dataset and transform it in one step.

    Args:
        dataset: The input dataset to fit and transform.

    Returns:
        The standardized dataset.
    """
    self.fit(dataset)
    return self.transform(dataset)

standardization ¶

standardization(dataset: list[list[float]]) -> list[list[float]]

Standardize the given numerical dataset.

The standardization is performed by subtracting the mean and dividing by the standard deviation for each feature. A standard deviation of 0 means that all the values in the column are the same; in that case std is set to 1, so every value in the column becomes 0 and division by zero is avoided. Note that the input dataset is modified in place and the same list is returned.

PARAMETER	DESCRIPTION
`dataset`	The input dataset to standardize. TYPE: `list[list[float]]`

RETURNS	DESCRIPTION
`list[list[float]]`	The standardized dataset.

RAISES	DESCRIPTION
`ValueError`	If the model has not been fitted yet.

Source code in toyml/classification/knn.py

def standardization(self, dataset: list[list[float]]) -> list[list[float]]:
    """Standardize the given numerical dataset.

    The standardization is performed by subtracting the mean and dividing
    by the standard deviation for each feature.
    When the standard deviation is 0, all the values in the column are the same;
    in that case std is set to 1, so every value in the column becomes 0
    and division by zero is avoided.

    Args:
        dataset: The input dataset to standardize.

    Returns:
        The standardized dataset.

    Raises:
        ValueError: If the model has not been fitted yet.
    """
    if self._dimension is None:
        msg = "The model is not fitted yet!"
        raise ValueError(msg)
    for j, column in enumerate(zip(*dataset, strict=False)):
        mean, std = self._means[j], self._stds[j]
        # ref: https://github.com/scikit-learn/scikit-learn/blob/7389dbac82d362f296dc2746f10e43ffa1615660/sklearn/preprocessing/data.py#L70
        if math.isclose(std, 0, abs_tol=1e-9):
            std = 1
        for i, value in enumerate(column):
            dataset[i][j] = (value - mean) / std
    return dataset
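
The effect of the zero-deviation rule is easiest to see on a tiny dataset with one constant column. The following is a standalone sketch under stated assumptions: the function name `standardize` is hypothetical, it fits and transforms in one call, and unlike the method above it returns a new list rather than writing into its input.

```python
import math
import statistics


def standardize(dataset: list[list[float]]) -> list[list[float]]:
    """Return a standardized copy of dataset; the input is left untouched."""
    columns = list(zip(*dataset))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    # A constant column has std 0; use 1 instead so its values map to 0.
    stds = [1.0 if math.isclose(s, 0, abs_tol=1e-9) else s for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in dataset]


data = [[1.0, 5.0], [3.0, 5.0]]
# First column becomes -1/sqrt(2) and +1/sqrt(2); the constant column becomes 0.0.
print(standardize(data))
```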

_dataset_column_means `staticmethod` ¶

_dataset_column_means(dataset: list[list[float]]) -> list[float]

Calculate the mean of each column (feature) of the dataset.

Source code in toyml/classification/knn.py

@staticmethod
def _dataset_column_means(dataset: list[list[float]]) -> list[float]:
    """Calculate the mean of each column (feature) of the dataset."""
    return [statistics.mean(column) for column in zip(*dataset, strict=False)]
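
The `zip(*dataset)` idiom transposes the row-oriented dataset into columns. A small illustration with made-up values:

```python
import statistics

# Three samples (rows) with two features (columns) each.
dataset = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]

# Unpacking the rows into zip() yields one tuple per column.
columns = list(zip(*dataset))
column_means = [statistics.mean(column) for column in columns]

print(columns)       # [(1.0, 3.0, 5.0), (10.0, 30.0, 50.0)]
print(column_means)  # [3.0, 30.0]
```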

_dataset_column_stds `staticmethod` ¶

_dataset_column_stds(dataset: list[list[float]]) -> list[float]

Calculate the sample standard deviation of each column (feature) of the dataset.

Source code in toyml/classification/knn.py

@staticmethod
def _dataset_column_stds(dataset: list[list[float]]) -> list[float]:
    """Calculate the sample standard deviation of each column (feature) of the dataset."""
    return [statistics.stdev(column) for column in zip(*dataset, strict=False)]
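
`statistics.stdev` computes the sample standard deviation (divisor `n - 1`), which differs from the population version `statistics.pstdev` (divisor `n`). The sketch below, using made-up values, shows the difference and also that `stdev` needs at least two data points, so fitting on a single-row dataset would presumably fail at this step.

```python
import statistics

column = (1.0, 2.0, 3.0, 4.0)
sample_std = statistics.stdev(column)       # sqrt(5 / 3), divisor n - 1
population_std = statistics.pstdev(column)  # sqrt(5 / 4), divisor n
print(sample_std, population_std)

# A single value has no sample standard deviation.
raised = False
try:
    statistics.stdev((1.0,))
except statistics.StatisticsError:
    raised = True
print(raised)  # True
```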
