bufalometro

Category: Automatic programming
Development tool: Jupyter Notebook
File size: 5232 KB
Downloads: 0
Upload date: 2017-06-26 22:42:33
Uploader: sh-1993
Description: Source code of bufalometro.it, a webapp for analysis of Italian text in the context of fake news detection.

File list:
CHANGELOG.txt (71, 2017-06-27)
LICENSE.txt (1060, 2017-06-27)
MANIFEST.in (199, 2017-06-27)
bufalometro (0, 2017-06-27)
bufalometro\__init__.py (0, 2017-06-27)
bufalometro\app.py (1322, 2017-06-27)
bufalometro\config.py (127, 2017-06-27)
bufalometro\models (0, 2017-06-27)
bufalometro\models\__init__.py (0, 2017-06-27)
bufalometro\models\cache.py (829, 2017-06-27)
bufalometro\models\data (0, 2017-06-27)
bufalometro\models\data\lerciosity.pkl (525393, 2017-06-27)
bufalometro\models\data\referencity.pkl (2640551, 2017-06-27)
bufalometro\models\data\reuters-sample.pkl (763209, 2017-06-27)
bufalometro\models\data\sentiment.pkl (505522, 2017-06-27)
bufalometro\models\markup.py (749, 2017-06-27)
bufalometro\models\nlp.py (2307, 2017-06-27)
bufalometro\models\web.py (1048, 2017-06-27)
bufalometro\static (0, 2017-06-27)
bufalometro\static\css (0, 2017-06-27)
bufalometro\static\css\style.css (3635, 2017-06-27)
bufalometro\static\images (0, 2017-06-27)
bufalometro\static\images\accroccathon.png (249294, 2017-06-27)
bufalometro\static\images\bufalogo-ok-sm.png (217744, 2017-06-27)
bufalometro\static\images\bufalogo-sm.png (229235, 2017-06-27)
bufalometro\static\images\bufalogo-small.png (829381, 2017-06-27)
bufalometro\static\images\devi.gif (2250983, 2017-06-27)
bufalometro\static\images\dial-back-reverse-sm.png (21782, 2017-06-27)
bufalometro\static\images\favicon-64.png (5016, 2017-06-27)
bufalometro\static\images\glass-sm.png (8347, 2017-06-27)
bufalometro\static\images\speedo-back-small-100.png (11416, 2017-06-27)
bufalometro\static\images\speedo-glass-small-100.png (4060, 2017-06-27)
bufalometro\static\js (0, 2017-06-27)
bufalometro\static\js\MeterWidget-1.0.js (10455, 2017-06-27)
bufalometro\static\js\canvas_utils.js (2089, 2017-06-27)
bufalometro\static\js\caret.js (997, 2017-06-27)
bufalometro\static\js\index.js (4583, 2017-06-27)
... ...

===========
Bufalometro
===========

*Il Bufalometro* is a half-serious webapp for analysis of Italian text in the context of fake news detection. It was created during the Global AI Hackathon on June 24-25 in Rome. It is currently (as of June 2017) hosted at http://bufalometro.it.

The app provides a web-based user interface for experimenting with two kinds of analysis on a given piece of text:

A. Text analysis for measuring the *style* of the message.

   During the hackathon, three basic NLP models were deployed in their most simplistic form:

   - A classical *sentiment analysis* model, trained to recognize literal positivity in the SENTIPOLC corpus.
   - A *"Lerciosity"* classifier, trained to discriminate between the texts of Italian Wikinews articles and the texts of the joke-news website http://lercio.it. The former dataset was collected by downloading the official dump, the latter by scraping the website.
   - A *"Referencity"* classifier, trained on the texts of the Italian Wikipedia to predict whether a given sentence would be followed by a reference footnote. Presumably, such sentences contain facts that may require checking.

   As is usual for a hackathon, not much time was spent on tuning the classifiers (just getting and cleaning the data is time-consuming enough). The most basic tokenization strategy, with a Naive Bayes classifier on top, produced results that were reasonable and fun enough for a prototype. The structure of the application makes it fairly easy to experiment with other models or to improve the existing ones. A Jupyter notebook is provided to illustrate a sample model training.

   Because the models are word-based, the effect of each word on the final classification can be highlighted visually. This visualization is shown when you click on one of the "meters" in the UI.

B. Internet search for measuring the credibility of the *content* of the message.

   Using a search engine to check whether the results for a given phrase include known credible sources or known fake sites seems to be a very simple yet hard-to-beat baseline for credibility analysis. This metric is referred to as the "Bufala Pro" analysis in the UI.

Usage
-----

The webapp is currently hosted at http://bufalometro.it. It is a Flask-based Python 3 webapp.

If you want to run a local instance or develop the code, clone the repository and set up a virtualenv with the required packages (optional, but highly recommended)::

    $ python3 -m venv venv
    $ . venv/bin/activate
    $ pip install --upgrade pip
    $ pip install -r requirements.txt

Then install the package itself::

    $ python setup.py develop

and launch the app using the built-in Flask development server::

    $ export FLASK_APP=bufalometro.app
    $ flask run

Alternatively, you may run it under a separate WSGI server, such as ``gunicorn``::

    $ pip install gunicorn
    $ gunicorn bufalometro.app:app

If you want to try changing the models, consider studying the ``notebooks/Model - Example.ipynb`` Jupyter notebook.

Creators
--------

- Konstantin Tretyakov
- Valerio Viperino
- Danilo Bruno
- Guglielmo Senese
- Giacomo Saliola
- Daniele Ciciani
- Marco Brilli

License
-------

- The source code of the project is subject to the MIT License.
- The texts of the webpage, the logo and the team images are not licensed for public redistribution.
- The project relies on several third-party open-source libraries, each of which may be subject to its own license.
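The training notebook is not reproduced on this page. The following is a hedged illustration of why a word-based Naive Bayes model makes per-word highlighting straightforward. It is a minimal pure-Python sketch. The tokenizer, the class and method names (`WordNaiveBayes`, `word_contributions`, `probability`) and the toy "news"/"joke" data are all hypothetical and are not taken from the project. For two classes, the log-odds of the posterior equals the prior log-ratio plus a sum of per-word log-likelihood ratios. Each word's term can therefore be displayed directly as that word's contribution.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of word characters (not necessarily the project's).
    return re.findall(r"\w+", text.lower())


class WordNaiveBayes:
    """Hypothetical two-class multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def log_likelihood(self, word, label):
        # Smoothed log P(word | label)
        numerator = self.word_counts[label][word] + self.alpha
        denominator = self.totals[label] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def word_contributions(self, text, positive, negative):
        # Per-word log-likelihood ratio; a value > 0 pushes toward `positive`.
        # Words never seen in training are skipped.
        return [(w, self.log_likelihood(w, positive) - self.log_likelihood(w, negative))
                for w in tokenize(text) if w in self.vocab]

    def probability(self, text, positive, negative):
        # Log-odds = prior log-ratio + sum of per-word contributions.
        log_odds = self.log_prior[positive] - self.log_prior[negative]
        log_odds += sum(s for _, s in self.word_contributions(text, positive, negative))
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


# Toy data, invented for illustration only.
texts = [
    "il governo approva la legge",
    "il ministro presenta il bilancio",
    "gatto parlante eletto ministro",
    "alieni approvano la legge sul gatto",
]
labels = ["news", "news", "joke", "joke"]
model = WordNaiveBayes().fit(texts, labels)

for word, score in model.word_contributions("Gatto eletto al governo", "joke", "news"):
    print(f"{word:>8} {score:+.2f}")
print(round(model.probability("Gatto eletto al governo", "joke", "news"), 3))
```

A UI could color each word by the sign and magnitude of its score. Skipping words that were unseen in training is only one of several reasonable choices.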

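The search-based "Bufala Pro" metric is described only in outline. The sketch below shows one possible scoring step, under several assumptions. It assumes the search results have already been fetched as a list of URLs, since the search backend is out of scope here. It also uses small hypothetical domain lists (`CREDIBLE`, `FAKE`); the project's actual lists are not shown on this page. The function names are likewise illustrative. The score counts how many results come from each list.

```python
from urllib.parse import urlparse

# Hypothetical domain lists for illustration; not the project's actual lists.
CREDIBLE = {"ansa.it", "it.wikipedia.org"}
FAKE = {"bufale.example"}


def domain_matches(host, domains):
    # Match a listed domain itself or any of its subdomains (www.ansa.it -> ansa.it).
    return any(host == d or host.endswith("." + d) for d in domains)


def credibility_score(result_urls, credible=CREDIBLE, fake=FAKE):
    """Return a score in [-1, 1]: +1 if only credible sources appear,
    -1 if only fake ones, 0 if neither appears or they balance out."""
    n_credible = n_fake = 0
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if domain_matches(host, credible):
            n_credible += 1
        elif domain_matches(host, fake):
            n_fake += 1
    total = n_credible + n_fake
    return 0.0 if total == 0 else (n_credible - n_fake) / total


# Example with made-up result URLs.
results = [
    "https://www.ansa.it/sito/notizie/esempio.html",
    "https://it.wikipedia.org/wiki/Roma",
    "http://bufale.example/articolo",
    "https://blog.example.org/post",
]
print(credibility_score(results))
```

Matching on the listed domain and its subdomains keeps `www.ansa.it` credible while rejecting look-alikes such as `notansa.it`. Unlisted hosts are simply ignored.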