News-Articles-Classification
Category: Feature extraction
Development tool: Python
File size: 15190KB
Downloads: 0
Upload date: 2018-08-13 08:04:32
Uploader: sh-1993
Description: This dataset contains around 125k news headlines from 2013 to 2018 obtained from HuffPost. A model trained on this dataset could be used to identify tags for untracked news articles or to identify the category of a news article.
File list:
News_Category_Dataset.json.zip (15551444, 2018-08-13)
news_headlines.py (5384, 2018-08-13)
# ML---News-Category
[This project is not yet complete. I have implemented it using machine learning algorithms, and the accuracy is saturating at ~3.5. I am currently implementing neural networks to get a better model.]
Goal: to categorize news articles based on their headlines and short descriptions.
Acknowledgements:
I took this HuffPost dataset from Kaggle for practice purposes only. If this is against the TOS, please let me know and I will take it down.
This dataset contains around 125k news headlines from 2013 to 2018 obtained from HuffPost. A model trained on this dataset could be used to identify tags for untracked news articles or to identify the category of a news article.
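As an illustration of the kind of model described above, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier in plain Python. It is not the method used in news_headlines.py, which this README does not describe. The field names `headline`, `short_description`, and `category` are assumptions about the JSON records, and the toy headlines below are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def make_text(record):
    """Join headline and short description (assumed field names)."""
    return f"{record.get('headline', '')} {record.get('short_description', '')}"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Words never seen in training are ignored.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy records, only to show the call pattern.
records = [
    {"headline": "Senate votes on new budget bill", "short_description": "", "category": "POLITICS"},
    {"headline": "President signs election reform law", "short_description": "", "category": "POLITICS"},
    {"headline": "Team wins championship game in overtime", "short_description": "", "category": "SPORTS"},
    {"headline": "Star player scores in final match", "short_description": "", "category": "SPORTS"},
]
model = NaiveBayes().fit([make_text(r) for r in records],
                         [r["category"] for r in records])
prediction = model.predict("Senate passes election bill")
```

On the full dataset, the same `fit` and `predict` calls would be applied to a held-out split in order to measure accuracy.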
Content
Each news headline has a corresponding category. Categories and corresponding article counts are as follows:
POLITICS: 32739
ENTERTAINMENT: 14257
HEALTHY LIVING: 6694
QUEER VOICES: 4995
BUSINESS: 4254
SPORTS: 4167
COMEDY: 3971
PARENTS: 3955
BLACK VOICES: 3858
THE WORLDPOST: 36***
WOMEN: 3490
CRIME: 2893
MEDIA: 2815
WEIRD NEWS: 2670
GREEN: 2622
IMPACT: 2602
WORLDPOST: 2579
RELIGION: 2556
STYLE: 2254
WORLD NEWS: 2177
TRAVEL: 2145
TASTE: 2096
ARTS: 1509
FIFTY: 1401
GOOD NEWS: 13***
SCIENCE: 1381
ARTS & CULTURE: 1339
TECH: 1231
COLLEGE: 1144
LATINO VOICES: 1129
EDUCATION: 1004
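
The counts above are heavily skewed toward POLITICS, so a useful first check is the accuracy of always predicting the most frequent category. The sketch below recomputes per-category counts and that majority baseline. It assumes the unzipped file holds one JSON object per line with a `category` field. The optional `aliases` mapping is a hypothetical way to merge labels that look like duplicates in the list (for example THE WORLDPOST and WORLDPOST). Whether they should be merged is a modelling choice, not something this README states.

```python
import json
from collections import Counter


def count_categories(lines, aliases=None):
    """Count records per category from an iterable of JSON-lines strings."""
    aliases = aliases or {}
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label = json.loads(line)["category"]
        counts[aliases.get(label, label)] += 1
    return counts


def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


# Hypothetical sample lines standing in for the real file.
sample_lines = [
    '{"category": "POLITICS", "headline": "h1"}',
    '{"category": "POLITICS", "headline": "h2"}',
    '{"category": "THE WORLDPOST", "headline": "h3"}',
    '{"category": "WORLDPOST", "headline": "h4"}',
    '{"category": "SPORTS", "headline": "h5"}',
]
counts = count_categories(sample_lines)
merged = count_categories(sample_lines, aliases={"THE WORLDPOST": "WORLDPOST"})
baseline_label, baseline_accuracy = majority_baseline(counts)
```

With the real data, an open file handle can be passed in place of `sample_lines`, since iterating over a text file yields one line at a time.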