
Isolation Forest

toyml.ensemble.iforest.IsolationForest dataclass

IsolationForest(n_itree: int = 100, max_samples: None | int = None, score_threshold: float = 0.5, random_seed: int | None = None, itrees_: list[IsolationTree] = list())

Isolation Forest.

Examples:

>>> from toyml.ensemble.iforest import IsolationForest
>>> dataset = [[-1.1], [0.3], [0.5], [100.0]]
>>> IsolationForest(n_itree=100, max_samples=4).fit_predict(dataset)
[1, 1, 1, -1]

References

Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation Forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.

n_itree class-attribute instance-attribute

n_itree: int = 100

The number of isolation trees in the ensemble.

max_samples class-attribute instance-attribute

max_samples: None | int = None

The number of samples drawn from the dataset to train each isolation tree. If it is None or larger than the dataset size, fit sets it to the dataset size.

score_threshold class-attribute instance-attribute

score_threshold: float = 0.5

The anomaly-score threshold that defines an outlier: if a sample's anomaly score is greater than score_threshold, the sample is detected as an outlier (predict returns -1); otherwise, the sample is normal (predict returns 1).

random_seed class-attribute instance-attribute

random_seed: int | None = None

The random seed for the random choices made while fitting the forest.

itrees_ class-attribute instance-attribute

itrees_: list[IsolationTree] = field(default_factory=list)

The isolation trees in the forest.

fit

fit(dataset: list[list[float]]) -> IsolationForest

Fit the isolation forest model.

Source code in toyml/ensemble/iforest.py
def fit(self, dataset: list[list[float]]) -> IsolationForest:
    """
    Fit the isolation forest model.
    """
    if self.max_samples is None or self.max_samples > len(dataset):
        self.max_samples = len(dataset)

    self.itrees_ = self._fit_itrees(dataset)
    return self

score

score(sample: list[float]) -> float

Predict the sample's anomaly score.

PARAMETER   TYPE          DESCRIPTION
sample      list[float]   The data sample.

RETURNS     DESCRIPTION
float       The anomaly score.

Source code in toyml/ensemble/iforest.py
def score(self, sample: list[float]) -> float:
    """
    Predict the sample's anomaly score.

    Args:
        sample: The data sample.

    Returns:
        The anomaly score.
    """
    assert len(self.itrees_) == self.n_itree, "Please fit the model before scoring samples!"
    assert self.max_samples is not None, "Please fit the model before scoring samples!"
    itree_path_lengths = [itree.get_sample_path_length(sample) for itree in self.itrees_]
    expect_path_length = sum(itree_path_lengths) / len(itree_path_lengths)
    score = 2 ** (-expect_path_length / bst_expect_length(self.max_samples))
    return score
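score averages the path lengths over all trees and normalizes by bst_expect_length(self.max_samples), whose body is not shown. Assuming it computes the paper's c(n) = 2H(n-1) - 2(n-1)/n (the average path length of an unsuccessful search in a binary search tree), with H(i) approximated by ln(i) + 0.5772156649 and n <= 2 special-cased as common implementations do, the score can be sketched as follows. Both function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def expected_bst_path_length(n: int) -> float:
    """c(n) from the Isolation Forest paper (assumed equivalent to bst_expect_length)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths: list[float], max_samples: int) -> float:
    """s = 2 ** (-E[h(x)] / c(max_samples)), mirroring IsolationForest.score."""
    expected = sum(path_lengths) / len(path_lengths)
    return 2 ** (-expected / expected_bst_path_length(max_samples))


c = expected_bst_path_length(256)
print(anomaly_score([c, c], 256))  # 0.5: average path length equals c(n)
print(anomaly_score([1.0, 2.0], 256) > 0.5)  # True: isolated after very few splits
```

Short average paths push the score toward 1 (likely outlier); average paths near c(n) give about 0.5; longer paths push it toward 0.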

predict

predict(sample: list[float]) -> int

Predict whether the sample is an outlier or not.

PARAMETER   TYPE          DESCRIPTION
sample      list[float]   The data sample.

RETURNS     DESCRIPTION
int         -1 if the sample is an outlier; 1 if it is normal.

Source code in toyml/ensemble/iforest.py
def predict(self, sample: list[float]) -> int:
    """
    Predict whether the sample is an outlier or not.

    Args:
        sample: The data sample.

    Returns:
        Outlier: -1; Normal: 1.
    """
    score = self.score(sample)
    # outlier
    if score > self.score_threshold:
        return -1
    else:
        return 1
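predict compares the anomaly score with score_threshold using a strict greater-than, so a score exactly equal to the threshold is labeled normal. A minimal sketch of that decision rule in isolation; label_from_score is a hypothetical name, not part of toyml.

```python
def label_from_score(score: float, score_threshold: float = 0.5) -> int:
    """Mirror of IsolationForest.predict's rule: -1 for outliers, 1 for normal samples."""
    # Strict comparison: a score exactly at the threshold is treated as normal.
    return -1 if score > score_threshold else 1


scores = [0.42, 0.5, 0.73]
print([label_from_score(s) for s in scores])  # [1, 1, -1]
```

Raising score_threshold makes the detector more conservative, labeling fewer samples as outliers.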

toyml.ensemble.iforest.IsolationTree dataclass

IsolationTree(max_height: int, random_seed: int | None = None, sample_size_: int | None = None, feature_num_: int | None = None, left_: IsolationTree | None = None, right_: IsolationTree | None = None, split_at_: int | None = None, split_value_: float | None = None)

The isolation tree.

Note

The isolation tree is a proper (full) binary tree: every node has either 0 or 2 children.

max_height instance-attribute

max_height: int

The maximum height of the tree.

random_seed class-attribute instance-attribute

random_seed: int | None = None

The random seed for the tree's random split choices.

sample_size_ class-attribute instance-attribute

sample_size_: int | None = None

The number of samples used to fit this node.

feature_num_ class-attribute instance-attribute

feature_num_: int | None = None

The number of features in each sample.

left_ class-attribute instance-attribute

left_: IsolationTree | None = None

The left child of the tree.

right_ class-attribute instance-attribute

right_: IsolationTree | None = None

The right child of the tree.

split_at_ class-attribute instance-attribute

split_at_: int | None = None

The index of the feature used to split the node's samples into its two children.

split_value_ class-attribute instance-attribute

split_value_: float | None = None

The split value for feature split_at_: samples whose value is less than split_value_ go to the left child; all others go to the right child.

fit

fit(samples: list[list[float]]) -> IsolationTree

Fit the isolation tree.

Source code in toyml/ensemble/iforest.py
def fit(self, samples: list[list[float]]) -> IsolationTree:
    """
    Fit the isolation tree.
    """
    self.sample_size_ = len(samples)
    self.feature_num_ = len(samples[0])
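    # External (leaf) node: height limit reached or only one sample left.
    if self.max_height == 0 or self.sample_size_ == 1:
        return self
    # Internal node: split the samples and build the two child trees.
    left_itree, right_itree = self._get_left_right_child_itree(samples)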
    self.left_, self.right_ = left_itree, right_itree
    return self
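fit delegates the actual split to the private helper _get_left_right_child_itree, whose body is not shown. A minimal sketch of the recursive construction from the original paper, assuming a uniformly random feature and a split value drawn uniformly between that feature's minimum and maximum among the node's samples. Node and build_itree are hypothetical stand-ins, and stopping on a constant feature is a simplification of this sketch.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for IsolationTree's fitted state."""

    size: int
    split_at: int | None = None
    split_value: float | None = None
    left: Node | None = None
    right: Node | None = None


def build_itree(samples: list[list[float]], max_height: int, rng: random.Random) -> Node:
    node = Node(size=len(samples))
    # External node: height limit reached or a single sample left (same rule as fit above).
    if max_height == 0 or len(samples) == 1:
        return node
    feature = rng.randrange(len(samples[0]))
    values = [s[feature] for s in samples]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: the chosen feature is constant here, so stop splitting.
        return node
    split_value = rng.uniform(lo, hi)
    # Redraw until both children are non-empty, keeping the tree proper (full).
    while not (lo < split_value <= hi):
        split_value = rng.uniform(lo, hi)
    node.split_at, node.split_value = feature, split_value
    # Same routing rule as get_sample_path_length: "<" goes left, ">=" goes right.
    left = [s for s in samples if s[feature] < split_value]
    right = [s for s in samples if s[feature] >= split_value]
    node.left = build_itree(left, max_height - 1, rng)
    node.right = build_itree(right, max_height - 1, rng)
    return node


tree = build_itree([[-1.1], [0.3], [0.5], [100.0]], max_height=2, rng=random.Random(0))
print(tree.size)  # 4
```

Because every internal node receives exactly two non-empty children, the result matches the proper (full) binary tree described in the note above.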

get_sample_path_length

get_sample_path_length(sample: list[float]) -> float

Get the sample's path length from this node to the external (leaf) node that the sample falls into.

PARAMETER   TYPE          DESCRIPTION
sample      list[float]   The data sample.

RETURNS     DESCRIPTION
float       The path length of the sample.

Source code in toyml/ensemble/iforest.py
def get_sample_path_length(self, sample: list[float]) -> float:
    """
    Get the sample's path length to the external (leaf) node.

    Args:
        sample: The data sample.

    Returns:
        The path length of the sample.
    """
    if self.is_external_node():
        assert self.sample_size_ is not None
        if self.sample_size_ == 1:
            return 0
        else:
            return bst_expect_length(self.sample_size_)

    assert self.split_at_ is not None and self.split_value_ is not None
    if sample[self.split_at_] < self.split_value_:
        assert self.left_ is not None
        return 1 + self.left_.get_sample_path_length(sample)
    else:
        assert self.right_ is not None
        return 1 + self.right_.get_sample_path_length(sample)
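get_sample_path_length counts one edge per internal node on the way down and, at an external node holding more than one sample, adds bst_expect_length(sample_size_) to estimate the depth of the subtree that the height limit cut off. A minimal dict-based sketch of the same recursion, assuming bst_expect_length is the paper's c(n); path_length, expected_bst_path_length, and the hand-built tree are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def expected_bst_path_length(n: int) -> float:
    """c(n), assumed to match bst_expect_length."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(node: dict, sample: list[float]) -> float:
    """Hypothetical dict-based version of IsolationTree.get_sample_path_length."""
    if "split_at" not in node:
        # External node: c(size) stands in for the subtree below a height-limited leaf.
        return 0.0 if node["size"] == 1 else expected_bst_path_length(node["size"])
    # Same routing rule as above: strictly less than the split value goes left.
    child = node["left"] if sample[node["split_at"]] < node["split_value"] else node["right"]
    return 1 + path_length(child, sample)


# Hand-built tree: one split on feature 0 at 50.0; the left leaf still holds 3 samples.
tree = {"split_at": 0, "split_value": 50.0, "left": {"size": 3}, "right": {"size": 1}}
print(path_length(tree, [100.0]))  # 1.0
print(path_length(tree, [0.3]))  # 1 + c(3), roughly 2.21
```

The isolated point [100.0] gets a shorter path than [0.3], which is exactly what drives its higher anomaly score.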

is_external_node

is_external_node() -> bool

Return whether the tree node is an external (leaf) node.

Source code in toyml/ensemble/iforest.py
def is_external_node(self) -> bool:
    """
    Return whether the tree node is an external (leaf) node.
    """
    # A proper binary tree node is a leaf exactly when it has no children.
    return self.left_ is None and self.right_ is None