demo-ard-text

Category: Virtual/Augmented Reality (VR/AR)
Development tool: Jupyter Notebook
File size: 2065 KB
Downloads: 0
Upload date: 2022-01-21 20:33:20
Uploader: sh-1993
Description: Demonstration notebooks for processes to reduce the volume of news articles relative to some topic. Runnable in Binder.

File list:
data (0, 2022-01-22)
data\nyt_front_page_policy_agendas_codebook.pdf (530095, 2022-01-22)
data\nyt_ftpg_1996_2006_no_text.csv (4507712, 2022-01-22)
notebooks (0, 2022-01-22)
notebooks\1. Word Embeddings and Word2Vec.ipynb (7089, 2022-01-22)
notebooks\2. Cosine Similarity.ipynb (39628, 2022-01-22)
notebooks\3. Entity Extraction and Jaccard Similarity.ipynb (42930, 2022-01-22)
notebooks\4. TFIDF - Logistic Regression.ipynb (11770, 2022-01-22)
postBuild (28, 2022-01-22)
requirements.txt (1186, 2022-01-22)

# Demo for News Article Collection and Volume Reduction Pipeline

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/pmbaumgartner/demo-ard-text/master)

Brief notebooks that run through the following processes using a dataset of [NYT Front Page articles](http://www.amber-boydstun.com/supplementary-information-for-making-the-news.html):

1. Find efficient keywords with word embeddings (`gensim`)
2. Remove duplicate articles with cosine similarity on TF-IDF vectors (`scikit-learn`)
3. Remove duplicate articles with entity extraction and Jaccard similarity (`spacy`)
4. Classify relevant articles (`scikit-learn`)
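
The Jaccard-based de-duplication in step 3 can be illustrated with a minimal, standard-library sketch. It is not taken from the notebooks: it assumes the entity sets have already been extracted (for example with `spacy`), and the function names `jaccard` and `drop_near_duplicates`, the sample entities, and the `0.7` threshold are all hypothetical choices made for illustration.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def drop_near_duplicates(entity_sets: List[Set[str]], threshold: float = 0.8) -> List[int]:
    """Greedy de-duplication (hypothetical helper, not the notebook's code).

    Walks the articles in order and keeps an article only if its entity set
    has Jaccard similarity below `threshold` with every article kept so far.
    Returns the indices of the kept articles.
    """
    kept: List[int] = []
    for i, entities in enumerate(entity_sets):
        if all(jaccard(entities, entity_sets[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Made-up entity sets standing in for the output of an entity extractor.
articles = [
    {"Bill Clinton", "Congress", "Washington"},
    {"Bill Clinton", "Congress", "Washington", "Senate"},  # shares 3 of 4 entities with the first
    {"NATO", "Kosovo", "Belgrade"},
]

kept = drop_near_duplicates(articles, threshold=0.7)
print(kept)  # indices of articles that survive de-duplication
```

The greedy pass compares each article only against those already kept, so the result depends on article order; a real pipeline might instead cluster articles or compare all pairs, and the threshold would need tuning on the actual data.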
