demo-ard-text
Category: Virtual/Augmented Reality (VR/AR)
Development tool: Jupyter Notebook
File size: 2065KB
Downloads: 0
Upload date: 2022-01-21 20:33:20
Uploader: sh-1993
Description: Demonstration notebooks for processes to reduce the volume of news articles relative to some topic. Runnable in Binder.
File list:
data (0, 2022-01-22)
data\nyt_front_page_policy_agendas_codebook.pdf (530095, 2022-01-22)
data\nyt_ftpg_1996_2006_no_text.csv (4507712, 2022-01-22)
notebooks (0, 2022-01-22)
notebooks\1. Word Embeddings and Word2Vec.ipynb (7089, 2022-01-22)
notebooks\2. Cosine Similarity.ipynb (39628, 2022-01-22)
notebooks\3. Entity Extraction and Jaccard Similarity.ipynb (42930, 2022-01-22)
notebooks\4. TFIDF - Logistic Regression.ipynb (11770, 2022-01-22)
postBuild (28, 2022-01-22)
requirements.txt (1186, 2022-01-22)
# Demo for News Article Collection and Volume Reduction Pipeline
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/pmbaumgartner/demo-ard-text/master)
Brief notebooks that run through the following processes using a dataset of [NYT Front Page articles](http://www.amber-boydstun.com/supplementary-information-for-making-the-news.html):
1. Find efficient keywords with word embeddings (`gensim`)
2. Remove duplicate articles with cosine similarity on TF-IDF vectors (`scikit-learn`)
3. Remove duplicate articles with entity extraction and Jaccard similarity (`spacy`)
4. Classify relevant articles (`scikit-learn`)
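
The two de-duplication steps above (2 and 3) can be illustrated with a minimal, standard-library-only sketch. This is not the notebooks' code: the notebooks use `scikit-learn` and `spacy`, whereas this version substitutes a toy TF-IDF weighting and assumes the entity sets have already been extracted. The function names, thresholds, and sample data are all hypothetical, chosen only to show how the two similarity measures feed the same greedy filtering step.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(docs):
    """Toy TF-IDF: raw term counts weighted by a smoothed inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def drop_near_duplicates(items, similarity, threshold):
    """Greedy pass: keep an item only if it scores below the threshold
    against every item kept so far. Returns the indices of kept items."""
    kept = []
    for i, item in enumerate(items):
        if all(similarity(item, items[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical sample data, not taken from the NYT dataset.
headlines = [
    "senate passes budget bill after long debate",
    "senate passes budget bill after a long debate",
    "hurricane forces evacuation of coastal towns",
]
kept_by_cosine = drop_near_duplicates(tfidf_vectors(headlines), cosine, threshold=0.8)

# Entity sets assumed to come from a separate extraction step (e.g. an NER model).
entities = [
    {"Senate", "Washington"},
    {"Senate", "Washington"},
    {"Florida", "FEMA"},
]
kept_by_jaccard = drop_near_duplicates(entities, jaccard, threshold=0.5)
```

In both cases the second item is treated as a near-duplicate of the first and dropped, so indices 0 and 2 are kept. The thresholds are arbitrary; in practice they would need tuning against labelled examples, and the pairwise comparison is quadratic in the number of articles.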