News-Classification

Category: Database systems
Development tool: Jupyter Notebook
File size: 6556KB
Downloads: 0
Upload date: 2019-04-12 20:55:07
Uploader: sh-1993
Description: A Python news classification project using the TagMyNews database

File list:
.DS_Store (8196, 2019-04-13)
.ipynb_checkpoints (0, 2019-04-13)
.ipynb_checkpoints\News Classification-checkpoint.ipynb (34143, 2019-04-13)
Averages of Three.png (57853, 2019-04-13)
Multinomial Naive Bayes.png (64894, 2019-04-13)
News Classification.ipynb (70463, 2019-04-13)
SVM.png (64964, 2019-04-13)
count_vector.pkl (423209, 2019-04-13)
data (0, 2019-04-13)
data\.DS_Store (8196, 2019-04-13)
data\business (0, 2019-04-13)
data\business\0.txt (360, 2019-04-13)
data\business\1.txt (269, 2019-04-13)
data\business\10.txt (303, 2019-04-13)
data\business\100.txt (268, 2019-04-13)
data\business\101.txt (385, 2019-04-13)
data\business\102.txt (343, 2019-04-13)
data\business\103.txt (362, 2019-04-13)
data\business\104.txt (318, 2019-04-13)
data\business\105.txt (219, 2019-04-13)
data\business\106.txt (292, 2019-04-13)
data\business\107.txt (348, 2019-04-13)
data\business\108.txt (366, 2019-04-13)
data\business\109.txt (269, 2019-04-13)
data\business\11.txt (393, 2019-04-13)
data\business\110.txt (322, 2019-04-13)
data\business\111.txt (298, 2019-04-13)
data\business\112.txt (290, 2019-04-13)
data\business\113.txt (293, 2019-04-13)
data\business\114.txt (373, 2019-04-13)
data\business\115.txt (361, 2019-04-13)
data\business\116.txt (410, 2019-04-13)
data\business\117.txt (354, 2019-04-13)
data\business\118.txt (344, 2019-04-13)
data\business\119.txt (293, 2019-04-13)
data\business\12.txt (387, 2019-04-13)
data\business\120.txt (286, 2019-04-13)
data\business\121.txt (388, 2019-04-13)
... ...
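The listing stores the data as one text file per article under `data/<category>/<n>.txt`. A minimal sketch for loading such a layout into a pandas DataFrame follows. It assumes, hypothetically, that each file holds one UTF-8 article and that the folder name is the label. The function name `load_news_folder` and the demo file contents are made up for illustration and are not taken from the project.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_news_folder(root):
    """Hypothetical loader: read root/<category>/<n>.txt into a DataFrame.

    Assumption: one UTF-8 article per file, and the parent folder name is the label.
    """
    rows = []
    # sorted() gives a stable, lexicographic order (business/0.txt, business/1.txt, ...)
    for path in sorted(Path(root).glob("*/*.txt")):
        rows.append({
            "text": path.read_text(encoding="utf-8").strip(),
            "category": path.parent.name,
        })
    return pd.DataFrame(rows, columns=["text", "category"])


# Demo on a throwaway directory that mimics the listed layout (file contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    samples = [
        ("business", 0, "Stocks rally on strong earnings"),
        ("business", 1, "Central bank raises interest rates"),
        ("sport", 0, "Club beats rivals in overtime thriller"),
    ]
    for category, idx, text in samples:
        folder = Path(tmp) / category
        folder.mkdir(exist_ok=True)
        (folder / f"{idx}.txt").write_text(text, encoding="utf-8")
    df = load_news_folder(tmp)

print(df)
```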

# Classification of News Headlines

##### [DEMO VIDEO](https://www.youtube.com/watch?v=HeKchZ1dauM&feature=youtu.be)

News headline classification with multiple machine learning models and a comparison of their results.

Models implemented:

* Multinomial Naive Bayes
* Support Vector Machines
* Neural Network with Softmax Layer

Metrics used to evaluate the performance of the models:

* Precision
* Recall
* F1 Score

We evaluate each classifier's ability to select the appropriate category given an article's title and a brief article description. A confusion matrix is built to explore the results and calculate the metrics.

###### Feature Extraction Techniques:

The collection of text documents is converted to a matrix of token counts using a count vectorizer, which produces a sparse representation of the counts. TF-IDF (term frequency–inverse document frequency) is a statistic intended to reflect how important a word is to a document in our corpus; it is used to extract the most meaningful words in the corpus.

###### Link to Dataset:

[News Article Dataset](http://acube.di.unipi.it/tmn-dataset/)

TagMyNews Datasets is a collection of datasets of short text fragments that we used to evaluate our topic-based text classifier. This one is a dataset of ~32K English news items extracted from the RSS feeds of popular newspaper websites (nyt.com, usatoday.com, reuters.com). Categories are: Sport, Business, U.S., Health, Sci&Tech, World and Entertainment.

Packages required:

* Pandas
* sklearn
* Numpy

![Multinomial Naive Bayes](https://i.imgur.com/2gaK9iO.png)

![Softmax](https://i.imgur.com/R2XHiuB.png)

![SVM](https://i.imgur.com/dvfwxY8.png)

![Average of three](https://i.imgur.com/1WtrPRv.png)
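The feature extraction and evaluation steps described in the README can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's actual code: it assumes `CountVectorizer` followed by `TfidfTransformer` as the feature pipeline, uses `LinearSVC` as the SVM, trains on a handful of made-up headlines (not TagMyNews data), and leaves out the softmax neural network. The helpers `build_model` and `metrics_from_confusion` are hypothetical names. With a confusion matrix `C` whose rows are true classes and columns are predicted classes, per-class precision is `C[k, k] / C[:, k].sum()`, recall is `C[k, k] / C[k, :].sum()`, and F1 is `2 * P * R / (P + R)`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy "title + description" strings; NOT taken from TagMyNews.
train_texts = [
    "stocks rally as markets rise on strong earnings",
    "central bank raises interest rates to curb inflation",
    "team wins championship after overtime thriller",
    "striker scores twice as club beats rivals",
    "new study links exercise to better heart health",
    "doctors warn of flu outbreak this winter",
]
train_labels = ["business", "business", "sport", "sport", "health", "health"]
test_texts = [
    "stocks rally on strong earnings",
    "club beats rivals in overtime thriller",
    "doctors warn of winter flu outbreak",
]
test_labels = ["business", "sport", "health"]
labels = ["business", "health", "sport"]  # fixes the row/column order of the matrix


def build_model(classifier):
    """Sparse token counts -> TF-IDF reweighting -> classifier."""
    return make_pipeline(CountVectorizer(), TfidfTransformer(), classifier)


def metrics_from_confusion(cm):
    """Per-class precision, recall and F1 computed directly from a confusion matrix.

    Uses scikit-learn's convention: rows are true classes, columns are predicted classes.
    Classes with a zero denominator get a score of 0.
    """
    tp = np.diag(cm).astype(float)
    pred_counts = cm.sum(axis=0)  # column sums: how often each class was predicted
    true_counts = cm.sum(axis=1)  # row sums: how often each class actually occurs
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, pred_counts, out=zeros.copy(), where=pred_counts > 0)
    recall = np.divide(tp, true_counts, out=zeros.copy(), where=true_counts > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return precision, recall, f1


results = {}
for name, clf in [("Multinomial Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    model = build_model(clf).fit(train_texts, train_labels)
    predictions = model.predict(test_texts)
    cm = confusion_matrix(test_labels, predictions, labels=labels)
    precision, recall, f1 = metrics_from_confusion(cm)
    # Cross-check the hand-rolled metrics against scikit-learn's own implementation.
    sk_p, sk_r, sk_f1, _ = precision_recall_fscore_support(
        test_labels, predictions, labels=labels, zero_division=0
    )
    assert np.allclose(precision, sk_p) and np.allclose(recall, sk_r)
    assert np.allclose(f1, sk_f1)
    results[name] = cm
    print(name)
    print(cm)
    print("precision", precision.round(2), "recall", recall.round(2), "F1", f1.round(2))
```

Chaining `CountVectorizer` and `TfidfTransformer` is equivalent to scikit-learn's single `TfidfVectorizer`; keeping them separate mirrors the two steps the README describes (raw counts first, then TF-IDF weighting).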
