pylof-master
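Category: Windows programming
Development tool: Python
File size: 43 KB
Downloads: 16
Upload date: 2017-11-30 15:27:54
Uploader: loyal飞
Description: Implements the LOF algorithm in Python for outlier detection. It computes an outlier factor for each data point to measure how anomalous that point is.
File list (size in bytes, date):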
.travis.yml (188, 2016-05-06)
LICENSE (18050, 2016-05-06)
example1.png (19328, 2016-05-06)
example2.png (23514, 2016-05-06)
lof.py (8280, 2016-05-06)
test_lof.py (1214, 2016-05-06)
pylof
=====
[![Build Status](https://travis-ci.org/damjankuznar/pylof.png?branch=master)](https://travis-ci.org/damjankuznar/pylof)
Python implementation of the Local Outlier Factor (LOF) algorithm by [Markus M. Breunig et al.](http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf).
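For reference, these are the quantities LOF is built from, as defined in the paper linked above. Here k stands for the paper's MinPts parameter; in the examples below it appears to be the first argument (5) of `local_outlier_factor` and `outliers`. $N_k(p)$ is the k-distance neighbourhood of $p$: every point no farther from $p$ than its k-th nearest neighbour, so ties can make it larger than k, and $d$ is the distance function. An LOF close to 1 means $p$ is about as dense as its neighbours, while values well above 1 mark points lying in sparser regions than their neighbours.

```math
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\ d(p, o) \,\} \\
\text{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\text{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)}
\end{aligned}
```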
Examples
--------
### Example 1
The following example shows the basic use case: computing the LOF values of several query instances (`[0,0]`, `[5,5]`, `[10,10]` and `[-8,-8]`) with respect to the dataset held in the `instances` variable, which is passed to the `LOF` constructor.
```
instances = [
(-4.8447532242074978, -5.6869538132901658),
(1.72655771093***076, -2.5446963280374302),
(-1.***85***2441038819, 1.705719***3962865),
(-1.999050026772494, -4.0367551415711844),
(-2.05508601268***9***, -3.624740***9323***26),
(-1.4456945632547327, -3.7669258809535102),
(-4.6676062022635554, 1.4925324371089148),
(-3.652***20667796877, -3.5582661345085662),
(***551493172954029, -0.45434966683144573),
(-0.56730591589443669, -5.5859532963153349),
(-5.1400897823762239, -1.33592489940190***),
(5.2586932439960243, 0.032431285797532586),
(6.3610915734502838, -0.99059***8246991894),
(-0.31086913190231447, -2.8352818694180***4),
(1.2288582719783967, -1.1362795178325829),
(-0.17***6204466346614, -0.32813130288006365),
(2.2532002509929216, -0.5142311840491***9),
(-0.75397166138399296, 2.2465141276038754),
(1.9382517***8161239, -1.7276112460593251),
(1.6809250808549676, -2.3433636210337503),
(0.68466572523884783, 1.4374914487477481),
(2.00323***431791514, -2.9191062023123635),
(-1.7565895138024741, 0.96995712544043267),
(3.3809***42950***505, 6.7497121359292684),
(-4.27***152718650896, 5.6551328734397766),
(-3.6347215445083019, -0.8514***61***4875741),
(-5.6249411288060385, -3.9251965527768755),
(4.6033708001912093, 1.3375110154658127),
(-0.685421751407***3, -0.73115552***4211407),
(-2.3744241805625044, 1.3443896265777866)]
from lof import LOF

lof = LOF(instances)
for instance in [[0,0],[5,5],[10,10],[-8,-8]]:
    # first argument: neighbourhood size k (MinPts) used for the LOF computation
    value = lof.local_outlier_factor(5, instance)
    print(value, instance)
```
The output should be (values as printed by Python 2; Python 3 shows more decimal places):
```
0.901765248682 [0, 0]
1.36792777562 [5, 5]
2.28926007995 [10, 10]
1.91195816119 [-8, -8]
```
This example is also visualized in the following figure, where blue dots
represent the instances passed to the `LOF` constructor, green dots are query
instances that are not outliers (lof value <= 1) and red dots are query
instances that are outliers (lof value > 1). The size of the red dots reflects
the lof value: greater lof values result in larger dots.
![Plot](https://github.com/damjankuznar/pylof/raw/master/example1.png)
Code used to produce the plot above (requires matplotlib; note that it calls `local_outlier_factor` with 3 rather than 5 neighbours):
```
import matplotlib.pyplot as plt

x, y = zip(*instances)
plt.scatter(x, y, 20, color="#0000FF")  # dataset instances in blue
for instance in [[0,0],[5,5],[10,10],[-8,-8]]:
    value = lof.local_outlier_factor(3, instance)
    color = "#FF0000" if value > 1 else "#00FF00"  # red = outlier, green = not an outlier
    plt.scatter(instance[0], instance[1], color=color, s=(value-1)**2*10+20)
plt.show()
```
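The definitions behind `local_outlier_factor` can be sketched from scratch in plain Python (3.8+ for `math.dist`). This is a minimal sketch, not pylof's code: `k_neighbourhood`, `reach_dist`, `lrd` and `lof_score` are hypothetical names, distances are Euclidean, no attribute scaling is applied, and a query point is scored against a fixed reference dataset, so its numbers need not match pylof's output. The reference data must hold at least k points besides the one being scored.

```python
import math


def k_neighbourhood(k, p, data):
    """Return (k-distance of p, N_k(p)): p's k nearest points in data, ties included.

    p itself is excluded, so p may be a new query point or a member of data.
    """
    dists = sorted(((math.dist(p, o), o) for o in data if o != p), key=lambda t: t[0])
    k_dist = dists[k - 1][0]
    return k_dist, [o for d, o in dists if d <= k_dist]


def reach_dist(k, p, o, data):
    """reach-dist_k(p, o) = max(k-distance(o), d(p, o))."""
    return max(k_neighbourhood(k, o, data)[0], math.dist(p, o))


def lrd(k, p, data):
    """Local reachability density: inverse of the mean reach-dist from p to N_k(p)."""
    _, neighbours = k_neighbourhood(k, p, data)
    total = sum(reach_dist(k, p, o, data) for o in neighbours)
    return len(neighbours) / total if total > 0 else math.inf


def lof_score(k, p, data):
    """LOF_k(p): mean of lrd(o) / lrd(p) over the neighbours o of p."""
    _, neighbours = k_neighbourhood(k, p, data)
    p_lrd = lrd(k, p, data)
    return sum(lrd(k, o, data) / p_lrd for o in neighbours) / len(neighbours)


grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(lof_score(2, (0.5, 0.5), grid))        # 1.0
print(round(lof_score(2, (5, 6), grid), 2))  # 6.74
```

The point in the middle of the unit square is exactly as dense as its neighbours, hence an LOF of 1, while the distant point scores far above 1. Recomputing neighbourhoods inside `reach_dist` keeps the sketch close to the formulas but repeats many neighbour searches; a practical implementation would cache them or use a spatial index.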
### Example 2
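Pylof also provides a helper function, `outliers`, that identifies the outliers in a dataset of instances. It returns a list of dictionaries with `"lof"` and `"instance"` keys, sorted by decreasing lof value.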
```
instances = [
(-4.8447532242074978, -5.6869538132901658),
(1.72655771093***076, -2.5446963280374302),
(-1.***85***2441038819, 1.705719***3962865),
(-1.999050026772494, -4.0367551415711844),
(-2.05508601268***9***, -3.624740***9323***26),
(-1.4456945632547327, -3.7669258809535102),
(-4.6676062022635554, 1.4925324371089148),
(-3.652***20667796877, -3.5582661345085662),
(***551493172954029, -0.45434966683144573),
(-0.56730591589443669, -5.5859532963153349),
(-5.1400897823762239, -1.33592489940190***),
(5.2586932439960243, 0.032431285797532586),
(6.3610915734502838, -0.99059***8246991894),
(-0.31086913190231447, -2.8352818694180***4),
(1.2288582719783967, -1.1362795178325829),
(-0.17***6204466346614, -0.32813130288006365),
(2.2532002509929216, -0.5142311840491***9),
(-0.75397166138399296, 2.2465141276038754),
(1.9382517***8161239, -1.7276112460593251),
(1.6809250808549676, -2.3433636210337503),
(0.68466572523884783, 1.4374914487477481),
(2.00323***431791514, -2.9191062023123635),
(-1.7565895138024741, 0.96995712544043267),
(3.3809***42950***505, 6.7497121359292684),
(-4.27***152718650896, 5.6551328734397766),
(-3.6347215445083019, -0.8514***61***4875741),
(-5.6249411288060385, -3.9251965527768755),
(4.6033708001912093, 1.3375110154658127),
(-0.685421751407***3, -0.73115552***4211407),
(-2.3744241805625044, 1.3443896265777866)]
from lof import outliers

lof = outliers(5, instances)  # list of dicts with "lof" and "instance" keys
for outlier in lof:
    print(outlier["lof"], outlier["instance"])
```
The output should be (values as printed by Python 2; Python 3 shows more decimal places):
```
2.20484969217 (3.3809***42950***505, 6.749712135929268)
1.79484408482 (-4.27***1527186509, 5.6551328734397766)
1.50121865848 (***55149317295403, -0.45434966683144573)
1.47940253262 (6.361091573450284, -0.99059***824699189)
1.37216956549 (5.258693243996024, 0.032431285797532586)
1.29100195101 (4.603370800191209, 1.3375110154658127)
1.20274006333 (-4.8447532242074***, -5.686953813290166)
1.187180183*** (-5.6249411288060385, -3.9251965527768755)
1.108***567816 (0.6846657252388478, 1.4374914487477481)
1.05728304007 (-4.667606202263555, 1.4925324371089148)
1.04216295935 (-5.140089782376224, -1.33592489940190***)
1.02801167935 (-0.5673059158944367, -5.585953296315335)
```
This example is also visualized in the following figure, where blue dots
represent all instances in the dataset and red dots mark the outliers returned
by `outliers` (all of which have a lof value > 1). The size of the red dots
reflects the lof value: greater lof values result in larger dots.
![Plot](https://github.com/damjankuznar/pylof/raw/master/example2.png)
Code used to produce the plot above (requires matplotlib):
```
import matplotlib.pyplot as plt

x, y = zip(*instances)
plt.scatter(x, y, 20, color="#0000FF")  # dataset instances in blue
for outlier in lof:  # `lof` holds the list returned by outliers()
    value = outlier["lof"]
    instance = outlier["instance"]
    color = "#FF0000" if value > 1 else "#00FF00"
    plt.scatter(instance[0], instance[1], color=color, s=(value-1)**2*10+20)
plt.show()
```
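Judging from Example 2's output, `outliers` returns only instances with a lof value > 1, as dictionaries sorted by decreasing lof. A helper with that output shape can be sketched as follows. `find_outliers` is a hypothetical name, and scoring each instance against all the *other* instances (leave-one-out) is an assumption about how such a helper can work, not a description of pylof's internals. The block bundles its own compact LOF so it runs on its own (Python 3.8+).

```python
import math


def _neighbourhood(k, p, data):
    """k-distance of p and the points within it (ties kept); p itself is excluded."""
    dists = sorted(((math.dist(p, o), o) for o in data if o != p), key=lambda t: t[0])
    k_dist = dists[k - 1][0]
    return k_dist, [o for d, o in dists if d <= k_dist]


def _lrd(k, p, data):
    """Local reachability density of p: inverse mean reach-dist to its neighbours."""
    _, nbrs = _neighbourhood(k, p, data)
    total = sum(max(_neighbourhood(k, o, data)[0], math.dist(p, o)) for o in nbrs)
    return len(nbrs) / total if total > 0 else math.inf


def _lof(k, p, data):
    """Mean ratio lrd(o) / lrd(p) over the neighbours o of p."""
    _, nbrs = _neighbourhood(k, p, data)
    return sum(_lrd(k, o, data) for o in nbrs) / (len(nbrs) * _lrd(k, p, data))


def find_outliers(k, instances):
    """Hypothetical helper: LOF of every instance against the others, keep LOF > 1.

    Returns dicts with "lof" and "instance" keys, sorted by decreasing LOF,
    mirroring the shape of Example 2's output.
    """
    found = []
    for i, instance in enumerate(instances):
        others = instances[:i] + instances[i + 1:]  # leave the scored instance out
        value = _lof(k, instance, others)
        if value > 1:
            found.append({"lof": value, "instance": instance})
    found.sort(key=lambda o: o["lof"], reverse=True)
    return found


data = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 6)]
top = find_outliers(2, data)[0]
print(top["instance"], round(top["lof"], 2))  # (5, 6) 6.74
```

On this toy dataset the distant point ranks first with an LOF of about 6.74, but the four corners of the square also score about 1.17 when each is scored against the remaining three. With so few inliers, a cut-off of exactly 1 is permissive.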
TODO
-----
* Increase the unit test coverage