Kmeans
toyml.clustering.kmeans.Kmeans
dataclass
Kmeans(k: int, max_iter: int = 500, tol: float = 1e-05, centroids_init_method: Literal['random', 'kmeans++'] = 'random', random_seed: Optional[int] = None, distance_metric: Literal['euclidean'] = 'euclidean', iter_: int = 0, clusters_: dict[int, list[int]] = dict(), centroids_: dict[int, list[float]] = dict(), labels_: list[int] = list())
K-means clustering algorithm, with optional k-means++ centroid initialization.
Examples:
>>> from toyml.clustering import Kmeans
>>> dataset = [[1.0, 2.0], [1.0, 4.0], [1.0, 0.0], [10.0, 2.0], [10.0, 4.0], [11.0, 0.0]]
>>> kmeans = Kmeans(k=2, random_seed=42).fit(dataset)
>>> kmeans.clusters_
{0: [3, 4, 5], 1: [0, 1, 2]}
>>> kmeans.centroids_
{0: [10.333333333333334, 2.0], 1: [1.0, 2.0]}
>>> kmeans.labels_
[1, 1, 1, 0, 0, 0]
>>> kmeans.predict([0, 1])
1
>>> kmeans.iter_
2
The fit_predict method fits the model and returns the cluster labels of the dataset in a single call.
Examples:
>>> from toyml.clustering import Kmeans
>>> dataset = [[1, 0], [1, 1], [1, 2], [10, 0], [10, 1], [10, 2]]
>>> Kmeans(k=2, random_seed=42).fit_predict(dataset)
[1, 1, 1, 0, 0, 0]
References
- Zhou Zhihua
- Murphy
Note
Only the naive K-means algorithm is implemented here.
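For orientation, the naive algorithm alternates an assignment step and a centroid update step until the centroids stop moving. The following is a minimal sketch, not toyml's actual implementation: the function names, the `init` parameter, the handling of empty clusters, and the way `tol` is applied to centroid movement are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def naive_kmeans(dataset, k, max_iter=500, tol=1e-5, seed=None, init=None):
    """Illustrative naive K-means; not toyml's actual implementation.

    `init` is a hypothetical parameter for passing explicit starting centroids.
    """
    rng = random.Random(seed)
    # Assumption: random initialization picks k distinct samples as centroids.
    start = init if init is not None else rng.sample(dataset, k)
    centroids = [list(p) for p in start]
    labels = []
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(point, centroids[j]))
            for point in dataset
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(dataset, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence check (assumed): stop once no centroid moved more than tol.
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels, n_iter


data = [[1.0, 2.0], [1.0, 4.0], [1.0, 0.0], [10.0, 2.0], [10.0, 4.0], [11.0, 0.0]]
centroids, labels, n_iter = naive_kmeans(data, k=2, init=[[1.0, 2.0], [10.0, 2.0]])
```

Because the result depends on the starting centroids, a poor random start can converge to a worse partition than the one shown in the examples above.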
See Also
- Bisecting K-means algorithm: toyml.clustering.bisect_kmeans
max_iter
class-attribute instance-attribute
max_iter: int = 500
The maximum number of iterations to run if the algorithm does not converge earlier.
centroids_init_method
class-attribute instance-attribute
centroids_init_method: Literal['random', 'kmeans++'] = 'random'
The method used to initialize the centroids: 'random' or 'kmeans++'.
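To illustrate what the 'kmeans++' option refers to, here is a minimal sketch of k-means++ style seeding: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The helper names are hypothetical and this is not necessarily how toyml implements it.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(dataset, k, seed=None):
    """Pick k starting centroids with k-means++ style seeding (illustrative)."""
    rng = random.Random(seed)
    # First centroid: one sample chosen uniformly at random.
    centroids = [list(rng.choice(dataset))]
    while len(centroids) < k:
        # Weight each sample by its squared distance to the nearest chosen centroid.
        weights = [min(squared_distance(p, c) for c in centroids) for p in dataset]
        # Far-away samples are more likely to be drawn; samples that coincide
        # with a chosen centroid have weight 0 and are never drawn.
        # Note: random.choices raises ValueError if every weight is 0.
        centroids.append(list(rng.choices(dataset, weights=weights, k=1)[0]))
    return centroids


data = [[1.0, 2.0], [1.0, 4.0], [1.0, 0.0], [10.0, 2.0], [10.0, 4.0], [11.0, 0.0]]
initial = kmeans_pp_init(data, k=2, seed=0)
```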
random_seed
class-attribute instance-attribute
random_seed: Optional[int] = None
The random seed used to initialize the centroids.
distance_metric
class-attribute instance-attribute
distance_metric: Literal['euclidean'] = 'euclidean'
The distance metric to use. Only 'euclidean' is currently supported.
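As a reminder of what the Euclidean metric computes, the sketch below evaluates it for two points given as plain lists. The function name is hypothetical and is not part of toyml's API.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance between the query point [0, 1] and the centroid [1.0, 2.0].
d = euclidean_distance([0, 1], [1.0, 2.0])
```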
clusters_
class-attribute instance-attribute
The clusters of the dataset: a dict[int, list[int]] mapping each cluster label to the indices of its member samples.
centroids_
class-attribute instance-attribute
The centroids of the clusters: a dict[int, list[float]] mapping each cluster label to the coordinates of its centroid.
labels_
class-attribute instance-attribute
The cluster label of each sample in the dataset, in input order.
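Judging from the example output at the top of this page, clusters_ and labels_ describe the same assignment from two directions. The sketch below converts one form into the other; the helper is hypothetical and only illustrates that relationship.

```python
def clusters_to_labels(clusters, n_samples):
    """Turn {label: [sample indices]} into a per-sample label list."""
    labels = [-1] * n_samples  # -1 marks a sample not found in any cluster
    for label, indices in clusters.items():
        for i in indices:
            labels[i] = label
    return labels


# Values taken from the clusters_ example above.
clusters = {0: [3, 4, 5], 1: [0, 1, 2]}
labels = clusters_to_labels(clusters, n_samples=6)
```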
fit
Fit the K-means model to the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset | The set of data points for clustering. |
RETURNS | DESCRIPTION |
---|---|
Kmeans | The fitted instance (self). |
Source code in toyml/clustering/kmeans.py
fit_predict
Fit the model and predict the cluster label of each sample in the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset | The set of data points for clustering. |
RETURNS | DESCRIPTION |
---|---|
list[int] | Cluster labels of the dataset samples. |
Source code in toyml/clustering/kmeans.py
predict
Predict the label of the point.
PARAMETER | DESCRIPTION |
---|---|
point | The data point to predict. |
RETURNS | DESCRIPTION |
---|---|
int | The label of the point. |
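Conceptually, predicting a label for a new point amounts to finding the nearest fitted centroid. The sketch below reproduces the `kmeans.predict([0, 1])` example using the centroids_ values shown earlier; the function is a hypothetical stand-in, not toyml's own code.

```python
import math


def predict_label(point, centroids):
    """Return the label whose centroid is closest to `point` (illustrative)."""
    return min(
        centroids,  # iterating a dict yields its keys, i.e. the cluster labels
        key=lambda label: math.sqrt(
            sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        ),
    )


# Centroids taken from the centroids_ example above.
centroids = {0: [10.333333333333334, 2.0], 1: [1.0, 2.0]}
label = predict_label([0, 1], centroids)
```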
Source code in toyml/clustering/kmeans.py