DBSCAN
toyml.clustering.dbscan.DBSCAN
dataclass
DBSCAN(eps: float = 0.5, min_samples: int = 5, clusters_: list[list[int]] = list(), core_objects_: set[int] = set(), noises_: list[int] = list(), labels_: list[int] = list())
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm.
Examples:
>>> from toyml.clustering import DBSCAN
>>> dataset = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]
>>> dbscan = DBSCAN(eps=3, min_samples=2).fit(dataset)
>>> dbscan.clusters_
[[0, 1, 2], [3, 4]]
>>> dbscan.noises_
[5]
>>> dbscan.labels_
[0, 0, 0, 1, 1, -1]
References
- Zhou Zhihua
- Han
- Kassambara
eps
class-attribute
instance-attribute
eps: float = 0.5
The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. (Same definition as in scikit-learn.)
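The statement that `eps` does not bound distances within a cluster can be checked with a small sketch. The points and the `eps` value below are made up for illustration and do not come from toyml: each point lies within `eps` of the next one, so a density-based method with a small enough `min_samples` can chain them into one cluster even though the two ends are farther apart than `eps`.

```python
import math

# Hypothetical points on a line, each one unit from the next.
points = [[0, 0], [1, 0], [2, 0], [3, 0]]
eps = 1.0

# Consecutive points are within eps of each other ...
chain_linked = all(
    math.dist(points[k], points[k + 1]) <= eps for k in range(len(points) - 1)
)
# ... yet the two ends of the chain are farther apart than eps.
end_to_end = math.dist(points[0], points[-1])

print(chain_linked, end_to_end)  # True 3.0
```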
min_samples
class-attribute
instance-attribute
min_samples: int = 5
The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. If min_samples is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. (Same definition as in scikit-learn.)
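As a rough illustration of how `min_samples` tightens the core-point condition, the sketch below counts core points on the example dataset for several values. `count_core_points` is a hypothetical helper, not part of toyml, and it assumes a point counts as a neighbor when its Euclidean distance is less than or equal to `eps`.

```python
import math

def count_core_points(data, eps, min_samples):
    """Count points with at least min_samples points within eps (the point itself included)."""
    total = 0
    for p in data:
        in_neighborhood = sum(1 for q in data if math.dist(p, q) <= eps)
        if in_neighborhood >= min_samples:
            total += 1
    return total

dataset = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]

print(count_core_points(dataset, eps=3, min_samples=2))  # 5
print(count_core_points(dataset, eps=3, min_samples=3))  # 3
print(count_core_points(dataset, eps=3, min_samples=4))  # 0
```

Raising `min_samples` from 2 to 3 drops the pair `[8, 7]`, `[8, 8]` from the core set, because each of those points has only two points in its neighborhood.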
clusters_
class-attribute
instance-attribute
The clusters found by the DBSCAN algorithm, each given as a list of sample indices.
core_objects_
class-attribute
instance-attribute
The indices of the core objects found by the DBSCAN algorithm.
noises_
class-attribute
instance-attribute
The indices of the noise points found by the DBSCAN algorithm.
labels_
class-attribute
instance-attribute
The cluster label of each sample, as found by the DBSCAN algorithm. Noise points are labeled -1.
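The relationship between `clusters_`, `noises_` and `labels_` in the class example can be sketched as follows. `labels_from_clusters` is a hypothetical helper written for illustration; it is not taken from toyml and only assumes that every sample outside all clusters is labeled -1.

```python
def labels_from_clusters(n_samples, clusters):
    """Build a per-sample label list from clusters given as lists of sample indices."""
    labels = [-1] * n_samples  # -1 marks samples that belong to no cluster
    for cluster_id, members in enumerate(clusters):
        for index in members:
            labels[index] = cluster_id
    return labels

labels = labels_from_clusters(6, [[0, 1, 2], [3, 4]])
noises = [i for i, label in enumerate(labels) if label == -1]

print(labels)  # [0, 0, 0, 1, 1, -1]
print(noises)  # [5]
```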
fit
Fit the DBSCAN model.
PARAMETER | DESCRIPTION |
---|---|
data | The dataset. |

RETURNS | DESCRIPTION |
---|---|
self | The fitted DBSCAN model. |
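For orientation, here is a minimal, self-contained sketch of a textbook DBSCAN pass. It is not the toyml source: `dbscan_sketch` is a hypothetical name, neighborhoods are assumed to include the point itself and to use `<= eps`, and seeds are visited in index order, which may differ from the order toyml uses.

```python
import math
from collections import deque

def dbscan_sketch(data, eps, min_samples):
    """Return one label per sample; -1 marks noise."""
    n = len(data)
    # Neighborhoods include the point itself, matching the min_samples description.
    neighbors = [
        [j for j in range(n) if math.dist(data[i], data[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}

    labels = [-1] * n
    cluster_id = 0
    for seed in range(n):
        # Only unlabeled core points can start a new cluster.
        if seed not in core or labels[seed] != -1:
            continue
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            if current not in core:
                continue  # border points join the cluster but do not expand it
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels

dataset = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]
print(dbscan_sketch(dataset, eps=3, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

With these assumptions the sketch reproduces the `labels_` shown in the class example.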
Source code in toyml/clustering/dbscan.py
fit_predict
Fit the DBSCAN model and return the cluster labels.
PARAMETER | DESCRIPTION |
---|---|
data | The dataset. |

RETURNS | DESCRIPTION |
---|---|
list[int] | The cluster labels. |
Source code in toyml/clustering/dbscan.py
toyml.clustering.dbscan.Dataset
dataclass
Dataset for DBSCAN.
PARAMETER | DESCRIPTION |
---|---|
data | The dataset. |

ATTRIBUTE | DESCRIPTION |
---|---|
data | The dataset. |
n | The number of data points. |
distance_matrix_ | The distance matrix. |
distance_metric
class-attribute
instance-attribute
distance_metric: Literal['euclidean'] = 'euclidean'
The distance metric to use. Only 'euclidean' is currently supported.
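A pairwise Euclidean distance matrix of the kind `distance_matrix_` describes could be computed as in the sketch below. `euclidean_distance_matrix` is a hypothetical helper and the nested-list layout is an assumption; toyml's internal representation may differ.

```python
import math

def euclidean_distance_matrix(data):
    """Return a symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(data[i], data[j])
            # The metric is symmetric, so fill both halves at once.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

m = euclidean_distance_matrix([[1, 2], [2, 2], [2, 3]])
print(m[0][1], m[1][2])  # 1.0 1.0
```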
get_neighbors
Get the neighbors of the i-th data point.
PARAMETER | DESCRIPTION |
---|---|
i | The index of the data point. |
eps | The maximum distance between two samples for one to be considered as in the neighborhood of the other. |

RETURNS | DESCRIPTION |
---|---|
list[int] | The indices of the neighbors, excluding the point itself. |
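A neighbor lookup over a precomputed distance matrix might look like the sketch below. `get_neighbors_sketch` is a hypothetical standalone function, not the toyml method, and it assumes that a distance equal to `eps` still counts as a neighbor.

```python
import math

def get_neighbors_sketch(distance_matrix, i, eps):
    """Return indices j != i whose distance to point i is at most eps."""
    return [j for j, d in enumerate(distance_matrix[i]) if j != i and d <= eps]

data = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]
distance_matrix = [[math.dist(p, q) for q in data] for p in data]

print(get_neighbors_sketch(distance_matrix, 0, eps=3))  # [1, 2]
print(get_neighbors_sketch(distance_matrix, 5, eps=3))  # []
```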
Source code in toyml/clustering/dbscan.py
get_core_objects
Get the core objects and noise points of the dataset.
PARAMETER | DESCRIPTION |
---|---|
eps | The maximum distance between two samples for one to be considered as in the neighborhood of the other. |
min_samples | The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. |

RETURNS | DESCRIPTION |
---|---|
core_objects | The indices of the core objects. |
noises | The indices of the noise points. |
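One way to split a dataset into core objects and noise is sketched below. It is written under assumptions and is not the toyml implementation: `core_objects_and_noises` is a hypothetical name, the point itself counts toward `min_samples`, and a non-core point is treated as noise only if no core point lies within `eps` of it (otherwise it is a border point). toyml may classify non-core points differently.

```python
import math

def core_objects_and_noises(data, eps, min_samples):
    """Return (set of core indices, list of noise indices)."""
    n = len(data)
    # Neighbor lists exclude the point itself.
    neighbors = [
        [j for j in range(n) if j != i and math.dist(data[i], data[j]) <= eps]
        for i in range(n)
    ]
    # The +1 counts the point itself toward min_samples.
    core = {i for i in range(n) if len(neighbors[i]) + 1 >= min_samples}
    # Noise: not core, and not within eps of any core point.
    noises = [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]
    return core, noises

dataset = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]
core, noises = core_objects_and_noises(dataset, eps=3, min_samples=2)
print(sorted(core))  # [0, 1, 2, 3, 4]
print(noises)        # [5]
```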
Source code in toyml/clustering/dbscan.py