pylof-master

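Category: Windows programming
Development tool: Python
File size: 43KB
Downloads: 16
Upload date: 2017-11-30 15:27:54
Uploader: loyal飞
Description: A Python implementation of the LOF (Local Outlier Factor) algorithm for outlier detection. It computes an outlier factor for each data point to measure how anomalous that point is.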
(Implementation of lof in Python)

File list (size in bytes, date):
.travis.yml (188, 2016-05-06)
LICENSE (18050, 2016-05-06)
example1.png (19328, 2016-05-06)
example2.png (23514, 2016-05-06)
lof.py (8280, 2016-05-06)
test_lof.py (1214, 2016-05-06)

pylof
=====

[![Build Status](https://travis-ci.org/damjankuznar/pylof.png?branch=master)](https://travis-ci.org/damjankuznar/pylof)

Python implementation of Local Outlier Factor algorithm by [Markus M. Breunig](http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf).

Examples
--------

### Example 1

The following example illustrates the simple use case of computing LOF values of several instances (e.g. `[0,0]`, `[5,5]`, `[10,10]` and `[-8,-8]`) based on the `instances` variable that we pass to the `LOF` constructor.

```
instances = [
 (-4.8447532242074978, -5.6869538132901658),
 (1.72655771093***076, -2.5446963280374302),
 (-1.***85***2441038819, 1.705719***3962865),
 (-1.999050026772494, -4.0367551415711844),
 (-2.05508601268***9***, -3.624740***9323***26),
 (-1.4456945632547327, -3.7669258809535102),
 (-4.6676062022635554, 1.4925324371089148),
 (-3.652***20667796877, -3.5582661345085662),
 (***551493172954029, -0.45434966683144573),
 (-0.56730591589443669, -5.5859532963153349),
 (-5.1400897823762239, -1.33592489940190***),
 (5.2586932439960243, 0.032431285797532586),
 (6.3610915734502838, -0.99059***8246991894),
 (-0.31086913190231447, -2.8352818694180***4),
 (1.2288582719783967, -1.1362795178325829),
 (-0.17***6204466346614, -0.32813130288006365),
 (2.2532002509929216, -0.5142311840491***9),
 (-0.75397166138399296, 2.2465141276038754),
 (1.9382517***8161239, -1.7276112460593251),
 (1.6809250808549676, -2.3433636210337503),
 (0.68466572523884783, 1.4374914487477481),
 (2.00323***431791514, -2.9191062023123635),
 (-1.7565895138024741, 0.96995712544043267),
 (3.3809***42950***505, 6.7497121359292684),
 (-4.27***152718650896, 5.6551328734397766),
 (-3.6347215445083019, -0.8514***61***4875741),
 (-5.6249411288060385, -3.9251965527768755),
 (4.6033708001912093, 1.3375110154658127),
 (-0.685421751407***3, -0.73115552***4211407),
 (-2.3744241805625044, 1.3443896265777866)]

from lof import LOF
lof = LOF(instances)

for instance in [[0,0],[5,5],[10,10],[-8,-8]]:
    value = lof.local_outlier_factor(5, instance)
    print value, instance
```

The output should be:

```
0.901765248682 [0, 0]
1.36792777562 [5, 5]
2.28926007995 [10, 10]
1.91195816119 [-8, -8]
```

This example is also visualized in the following figure, where blue dots represent instances passed to the `LOF` constructor, green dots are instances that are not outliers (lof value <= 1) and red dots are instances that are outliers (lof value > 1). The size of the red dots represents the lof value, meaning that greater lof values result in larger dots.

![Plot](https://github.com/damjankuznar/pylof/raw/master/example1.png)

Code used for plotting the above plot (matplotlib is required):

```
from matplotlib import pyplot as p

x,y = zip(*instances)
p.scatter(x,y, 20, color="#0000FF")

for instance in [[0,0],[5,5],[10,10],[-8,-8]]:
    value = lof.local_outlier_factor(3, instance)
    color = "#FF0000" if value > 1 else "#00FF00"
    p.scatter(instance[0], instance[1], color=color, s=(value-1)**2*10+20)

p.show()
```

### Example 2

Pylof also has a helper function to identify outliers in a given dataset of instances.

```
instances = [
 (-4.8447532242074978, -5.6869538132901658),
 (1.72655771093***076, -2.5446963280374302),
 (-1.***85***2441038819, 1.705719***3962865),
 (-1.999050026772494, -4.0367551415711844),
 (-2.05508601268***9***, -3.624740***9323***26),
 (-1.4456945632547327, -3.7669258809535102),
 (-4.6676062022635554, 1.4925324371089148),
 (-3.652***20667796877, -3.5582661345085662),
 (***551493172954029, -0.45434966683144573),
 (-0.56730591589443669, -5.5859532963153349),
 (-5.1400897823762239, -1.33592489940190***),
 (5.2586932439960243, 0.032431285797532586),
 (6.3610915734502838, -0.99059***8246991894),
 (-0.31086913190231447, -2.8352818694180***4),
 (1.2288582719783967, -1.1362795178325829),
 (-0.17***6204466346614, -0.32813130288006365),
 (2.2532002509929216, -0.5142311840491***9),
 (-0.75397166138399296, 2.2465141276038754),
 (1.9382517***8161239, -1.7276112460593251),
 (1.6809250808549676, -2.3433636210337503),
 (0.68466572523884783, 1.4374914487477481),
 (2.00323***431791514, -2.9191062023123635),
 (-1.7565895138024741, 0.96995712544043267),
 (3.3809***42950***505, 6.7497121359292684),
 (-4.27***152718650896, 5.6551328734397766),
 (-3.6347215445083019, -0.8514***61***4875741),
 (-5.6249411288060385, -3.9251965527768755),
 (4.6033708001912093, 1.3375110154658127),
 (-0.685421751407***3, -0.73115552***4211407),
 (-2.3744241805625044, 1.3443896265777866)]

from lof import outliers
lof = outliers(5, instances)

for outlier in lof:
    print outlier["lof"],outlier["instance"]
```

The output should be:

```
2.20484969217 (3.3809***42950***505, 6.749712135929268)
1.79484408482 (-4.27***1527186509, 5.6551328734397766)
1.50121865848 (***55149317295403, -0.45434966683144573)
1.47940253262 (6.361091573450284, -0.99059***824699189)
1.37216956549 (5.258693243996024, 0.032431285797532586)
1.29100195101 (4.603370800191209, 1.3375110154658127)
1.20274006333 (-4.8447532242074***, -5.686953813290166)
1.187180183*** (-5.6249411288060385, -3.9251965527768755)
1.108***567816 (0.6846657252388478, 1.4374914487477481)
1.05728304007 (-4.667606202263555, 1.4925324371089148)
1.04216295935 (-5.140089782376224, -1.33592489940190***)
1.02801167935 (-0.5673059158944367, -5.585953296315335)
```

This example is also visualized in the following figure, where blue dots represent instances passed to the `outliers` function, green dots are instances that are not outliers (lof value <= 1) and red dots are instances that are outliers (lof value > 1). The size of the red dots represents the lof value, meaning that greater lof values result in larger dots.

![Plot](https://github.com/damjankuznar/pylof/raw/master/example2.png)

Code used for plotting the above plot (matplotlib is required):

```
from matplotlib import pyplot as p

x,y = zip(*instances)
p.scatter(x,y, 20, color="#0000FF")

for outlier in lof:
    value = outlier["lof"]
    instance = outlier["instance"]
    color = "#FF0000" if value > 1 else "#00FF00"
    p.scatter(instance[0], instance[1], color=color, s=(value-1)**2*10+20)

p.show()
```

TODO
-----

* Increase the unit test coverage
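
Sketch: how an LOF value is computed
-----

The examples above treat the LOF value as a black box. As a rough illustration of what such a value measures, here is a minimal sketch that follows the definitions in the Breunig et al. paper (k-distance, reachability distance, local reachability density). It is not pylof's own code: the function names (`k_neighbors`, `reach_dist`, `lrd`, `lof_value`) are hypothetical, it assumes Euclidean distance without any normalization, and it is not expected to reproduce the exact numbers shown in the examples.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as number sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_neighbors(k, point, data):
    """Return (k-distance, neighbors) of `point`, ignoring `point` itself.

    Every instance that ties with the k-th nearest one is included, so the
    neighborhood may contain more than k instances.
    """
    ranked = sorted((euclidean(point, o), o) for o in data if o != point)
    k_dist = ranked[k - 1][0]
    return k_dist, [o for d, o in ranked if d <= k_dist]


def reach_dist(k, p, o, data):
    """Reachability distance of p with respect to o."""
    k_dist_o, _ = k_neighbors(k, o, data)
    return max(k_dist_o, euclidean(p, o))


def lrd(k, p, data):
    """Local reachability density: inverse of the mean reachability distance.

    Assumes the neighbors are not all exact duplicates of each other,
    otherwise the sum below is zero.
    """
    _, neighbors = k_neighbors(k, p, data)
    total = sum(reach_dist(k, p, o, data) for o in neighbors)
    return len(neighbors) / total


def lof_value(k, p, data):
    """Mean ratio of the neighbors' density to the density of p."""
    _, neighbors = k_neighbors(k, p, data)
    lrd_p = lrd(k, p, data)
    return sum(lrd(k, o, data) for o in neighbors) / (len(neighbors) * lrd_p)


# A 3x3 grid as a dense cluster, queried with one inside and one distant point.
grid = [(x, y) for x in range(3) for y in range(3)]
for query in [(1, 1), (10, 10)]:
    print("%s -> %.3f" % (query, lof_value(3, query, grid)))
```

A point whose neighbors are about as dense as it is gets a value near or below 1, while a point far from the cluster gets a value well above 1, which matches the green/red split used in the plots. The sketch recomputes neighborhoods on every call for readability; a practical implementation would cache them.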
