Isolation Forest
toyml.ensemble.iforest.IsolationForest
dataclass
IsolationForest(n_itree: int = 100, max_samples: None | int = None, score_threshold: float = 0.5, random_seed: int | None = None, itrees_: list[IsolationTree] = list())
Isolation Forest.
Examples:
>>> from toyml.ensemble.iforest import IsolationForest
>>> dataset = [[-1.1], [0.3], [0.5], [100.0]]
>>> IsolationForest(n_itree=100, max_samples=4).fit_predict(dataset)
[1, 1, 1, -1]
References
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation Forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.
n_itree
class-attribute
instance-attribute
n_itree: int = 100
The number of isolation trees in the ensemble.
max_samples
class-attribute
instance-attribute
max_samples: None | int = None
The number of samples to draw from the dataset to train each isolation tree.
score_threshold
class-attribute
instance-attribute
score_threshold: float = 0.5
The score threshold used to flag outliers: if a sample's anomaly score > score_threshold, the sample is classified as an outlier (predict returns -1); otherwise, it is classified as normal (predict returns 1).
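The thresholding rule above can be sketched as a small standalone function. This is an illustration of the documented rule only; the name label_from_score is hypothetical and is not part of toyml.

```python
def label_from_score(score: float, score_threshold: float = 0.5) -> int:
    """Map an anomaly score to a label: -1 for an outlier, 1 for a normal sample."""
    # The comparison is strict, so a score exactly equal to the threshold is normal.
    return -1 if score > score_threshold else 1


# Hypothetical scores chosen for illustration; they are not produced by toyml.
labels = [label_from_score(s) for s in (0.42, 0.5, 0.81)]
```

Raising score_threshold makes the detector more conservative, since fewer samples exceed it.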
random_seed
class-attribute
instance-attribute
random_seed: int | None = None
The random seed used to make the forest's random sampling and splitting reproducible.
itrees_
class-attribute
instance-attribute
itrees_: list[IsolationTree] = field(default_factory=list)
The isolation trees in the forest.
fit
fit(dataset: list[list[float]]) -> IsolationForest
Fit the isolation forest model.
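As a rough picture of what fitting involves, the sketch below follows Liu et al. (2008) rather than the toyml source: each tree is trained on its own subsample drawn without replacement, and tree growth is capped at ceil(log2(subsample size)). The helper names draw_subsamples and height_limit are hypothetical, and whether toyml uses exactly this sampling scheme and height limit is an assumption.

```python
import math
import random
from typing import List, Optional


def draw_subsamples(
    dataset: List[List[float]],
    n_itree: int,
    max_samples: int,
    random_seed: Optional[int] = None,
) -> List[List[List[float]]]:
    """Draw one subsample per tree, without replacement (assumed, per the paper)."""
    rng = random.Random(random_seed)
    size = min(max_samples, len(dataset))  # never ask for more rows than exist
    return [rng.sample(dataset, size) for _ in range(n_itree)]


def height_limit(sample_size: int) -> int:
    """Height cap from the paper: ceil(log2(sample_size))."""
    return math.ceil(math.log2(sample_size)) if sample_size > 1 else 0


data = [[float(i)] for i in range(10)]
subsamples = draw_subsamples(data, n_itree=3, max_samples=4, random_seed=42)
```

Capping the height keeps each tree shallow: anomalies are expected to be isolated near the root, so deeper splits add cost without much benefit.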
Source code in toyml/ensemble/iforest.py
score
Predict the sample's anomaly score.
PARAMETER | DESCRIPTION |
---|---|
sample | The data sample. |
RETURNS | DESCRIPTION |
---|---|
float | The anomaly score. |
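For orientation, the anomaly score defined in Liu et al. (2008) is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the sample's mean path length over all trees and c(n) is the average path length of an unsuccessful binary search tree lookup on n points. The sketch below implements that formula; it assumes toyml follows the paper, and the function names are hypothetical.

```python
import math
from typing import List

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used to approximate H(i)


def average_path_length(n: int) -> float:
    """c(n) = 2 * H(n - 1) - 2 * (n - 1) / n, with H(i) ~ ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common convention for the two-point case
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths: List[float], sample_size: int) -> float:
    """Score in (0, 1]; requires sample_size >= 2 so that c(n) is non-zero."""
    mean_length = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_length / average_path_length(sample_size))


# A sample whose mean path length equals c(n) scores exactly 0.5.
baseline = anomaly_score([average_path_length(256)], sample_size=256)
```

Shorter mean path lengths push the score toward 1 (more anomalous), and longer ones push it toward 0. A baseline of 0.5 for an average path length matches the default score_threshold.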
Source code in toyml/ensemble/iforest.py
predict
Predict whether the sample is an outlier or not.
PARAMETER | DESCRIPTION |
---|---|
sample | The data sample. |
RETURNS | DESCRIPTION |
---|---|
int | -1 if the sample is an outlier; 1 if it is normal. |
Source code in toyml/ensemble/iforest.py
toyml.ensemble.iforest.IsolationTree
dataclass
IsolationTree(max_height: int, random_seed: int | None = None, sample_size_: int | None = None, feature_num_: int | None = None, left_: IsolationTree | None = None, right_: IsolationTree | None = None, split_at_: int | None = None, split_value_: float | None = None)
The isolation tree.
Note
The isolation tree is a proper (full) binary tree, in which every node has either 0 or 2 children.
random_seed
class-attribute
instance-attribute
random_seed: int | None = None
The random seed used to make the tree's random splitting reproducible.
feature_num_
class-attribute
instance-attribute
feature_num_: int | None = None
The number of features in each sample.
left_
class-attribute
instance-attribute
left_: IsolationTree | None = None
The left child of the tree.
right_
class-attribute
instance-attribute
right_: IsolationTree | None = None
The right child of the tree.
split_at_
class-attribute
instance-attribute
split_at_: int | None = None
The index of the feature used to split the tree's samples between its children.
split_value_
class-attribute
instance-attribute
split_value_: float | None = None
The value of the split_at_ feature used to split the samples.
fit
fit(samples: list[list[float]]) -> IsolationTree
Fit the isolation tree.
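The recursive construction described in Liu et al. (2008) can be sketched as follows: pick a random feature, pick a random split value between that feature's minimum and maximum, partition the samples, and recurse until the height limit is reached or a node holds at most one sample. This is not the toyml implementation. Node, build_tree, is_proper, and leaf_sizes are hypothetical names, and sending values below the split to the left child is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for IsolationTree; not the toyml class."""
    size: int
    split_at: Optional[int] = None
    split_value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(samples: List[List[float]], max_height: int,
               rng: random.Random, height: int = 0) -> Node:
    node = Node(size=len(samples))
    if height >= max_height or len(samples) <= 1:
        return node  # external node: height limit reached or sample isolated
    split_at = rng.randrange(len(samples[0]))  # random feature index
    values = [s[split_at] for s in samples]
    split_value = rng.uniform(min(values), max(values))  # random cut point
    left = [s for s in samples if s[split_at] < split_value]
    right = [s for s in samples if s[split_at] >= split_value]
    if not left or not right:
        return node  # degenerate cut (e.g. constant feature): keep as a leaf
    node.split_at, node.split_value = split_at, split_value
    node.left = build_tree(left, max_height, rng, height + 1)
    node.right = build_tree(right, max_height, rng, height + 1)
    return node


def is_proper(node: Node) -> bool:
    """Check that every node has either 0 or 2 children."""
    if node.left is None and node.right is None:
        return True
    if node.left is None or node.right is None:
        return False
    return is_proper(node.left) and is_proper(node.right)


def leaf_sizes(node: Node) -> List[int]:
    """Collect the number of samples held by each external node."""
    if node.left is None or node.right is None:
        return [node.size]
    return leaf_sizes(node.left) + leaf_sizes(node.right)


tree = build_tree([[-1.1], [0.3], [0.5], [100.0]], max_height=2,
                  rng=random.Random(0))
```

Because children are only attached in pairs, the result is always a proper binary tree, and the leaf sizes sum to the number of training samples.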
Source code in toyml/ensemble/iforest.py
get_sample_path_length
Get the sample's path length to the external (leaf) node.
PARAMETER | DESCRIPTION |
---|---|
sample | The data sample. |
RETURNS | DESCRIPTION |
---|---|
float | The path length of the sample. |
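The return type is a float rather than an integer edge count, which is consistent with the path length defined in Liu et al. (2008): the number of edges traversed plus an adjustment c(size) for the samples left unsplit at the external node. The sketch below illustrates that definition on a hand-built tree. It assumes toyml follows the paper, and Node, average_path_length, and path_length are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for IsolationTree; not the toyml class."""
    size: int
    split_at: Optional[int] = None
    split_value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def average_path_length(n: int) -> float:
    """c(n) from the paper; 0 for n <= 1 and 1 for n == 2 by convention."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(node: Node, sample: List[float], depth: int = 0) -> float:
    if node.left is None or node.right is None:
        # External node: add the expected extra depth for its unsplit samples.
        return depth + average_path_length(node.size)
    # Assumed convention: values below the split go left.
    child = node.left if sample[node.split_at] < node.split_value else node.right
    return path_length(child, sample, depth + 1)


# Hand-built tree: the root splits feature 0 at 50.0, leaving three samples
# in the left leaf and one isolated sample in the right leaf.
tree = Node(size=4, split_at=0, split_value=50.0,
            left=Node(size=3), right=Node(size=1))
outlier_length = path_length(tree, [100.0])  # one edge, no adjustment
inlier_length = path_length(tree, [0.3])     # one edge plus c(3)
```

The isolated point ends up with the shorter path, which is what drives its higher anomaly score.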
Source code in toyml/ensemble/iforest.py
is_external_node
is_external_node() -> bool
Whether the tree node is an external (leaf) node.
Source code in toyml/ensemble/iforest.py