Reuters-full-data-set

Category: MATLAB programming
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2016-08-17 06:01:57
Uploader: sh-1993
Description: Full dataset of Reuters composed of 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016).

File list:
data/ (0, 2016-08-16)
data/20070101.pkl (26289, 2016-08-16)
data/20070102.pkl (69196, 2016-08-16)
data/20070103.pkl (130044, 2016-08-16)
data/20070104.pkl (143682, 2016-08-16)
data/20070105.pkl (124043, 2016-08-16)
data/20070106.pkl (38716, 2016-08-16)
data/20070107.pkl (56943, 2016-08-16)
data/20070108.pkl (188309, 2016-08-16)
data/20070109.pkl (183605, 2016-08-16)
data/20070110.pkl (203423, 2016-08-16)
data/20070111.pkl (195912, 2016-08-16)
data/20070112.pkl (158258, 2016-08-16)
data/20070113.pkl (33568, 2016-08-16)
data/20070114.pkl (59872, 2016-08-16)
data/20070115.pkl (117096, 2016-08-16)
data/20070116.pkl (207482, 2016-08-16)
data/20070117.pkl (222016, 2016-08-16)
data/20070118.pkl (248647, 2016-08-16)
data/20070119.pkl (211374, 2016-08-16)
data/20070120.pkl (42910, 2016-08-16)
data/20070121.pkl (66701, 2016-08-16)
data/20070122.pkl (251929, 2016-08-16)
data/20070123.pkl (268736, 2016-08-16)
data/20070124.pkl (291943, 2016-08-16)
data/20070125.pkl (318249, 2016-08-16)
data/20070126.pkl (212796, 2016-08-16)
data/20070127.pkl (49783, 2016-08-16)
data/20070128.pkl (61117, 2016-08-16)
data/20070129.pkl (255438, 2016-08-16)
data/20070130.pkl (281687, 2016-08-16)
data/20070131.pkl (316315, 2016-08-16)
data/20070201.pkl (328075, 2016-08-16)
data/20070202.pkl (230648, 2016-08-16)
data/20070203.pkl (39161, 2016-08-16)
data/20070204.pkl (63757, 2016-08-16)
data/20070205.pkl (253776, 2016-08-16)
data/20070206.pkl (289504, 2016-08-16)
data/20070207.pkl (281199, 2016-08-16)
... ...

# Reuters-full-data-set

Full unofficial data set of Reuters composed of 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016).

```
git clone https://github.com/philipperemy/Reuters-full-data-set.git
python read.py
```

```
ts = 2007022811:46 AM EST, t = European stocks hit 7-week low amid new sell-off, h= http://www.reuters.com/article/companyNewsAndPR/idUSWEB277620070228
ts = 2007022811:46 AM EST, t = Schering-Plough announces Ismail Kola as VP and Chief Scientific Officer, h= http://www.reuters.com/article/inPlayBriefing/idUSIN20070228164651SGP20070228
ts = 2007022811:46 AM EST, t = O'Reilly Automotive forecasts 2007 earnings growth, h= http://www.reuters.com/article/marketsNews/idUSN2845320220070228
ts = 2007022811:42 AM EST, t = Market Wrap, h= http://www.reuters.com/article/inPlayBriefing/idUSIN20070228164235WRAPX20070228
ts = 2007022811:42 AM EST, t = Chile's CMPC net profit falls 13 pct in 2006, h= http://www.reuters.com/article/tnBasicIndustries-SP/idUSN2844077020070228
ts = 2007022811:42 AM EST, t = Toyota Venezuela to halt March ops on currency woes, h= http://www.reuters.com/article/tnBasicIndustries-SP/idUSN2827887820070228
```

Each pickle file in `data` represents a day (e.g. `20160102.pkl` is for Jan 2, 2016). A day consists of several news items, gathered in a `list`. Each news item is a `dict` of the form:

```
ts: timestamp of the form 2007022811:46 AM EST
title: title of the news item
href: link to the article to get the full content
```
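The per-day layout described above (a pickled `list` of `dict`s with `ts`, `title` and `href` keys) can be read with a few lines of Python. The following is a minimal sketch, not the repository's `read.py`, whose contents are not shown here. The helper names `load_day` and `parse_ts` are hypothetical, and the timestamp parsing assumes every `ts` value follows the `2007022811:46 AM EST` pattern: an eight-digit date, a 12-hour clock time, then a zone label.

```python
import pickle
from datetime import datetime


def load_day(path):
    """Load one day's pickle file (hypothetical helper).

    Assumes the file holds a list of dicts with 'ts', 'title' and 'href' keys,
    as the README describes. Only unpickle files from a source you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def parse_ts(ts):
    """Parse a 'ts' string such as '2007022811:46 AM EST' (hypothetical helper).

    Assumption: the string is YYYYMMDD, then HH:MM AM/PM, then a zone label.
    The zone label is dropped, so the result is a naive datetime.
    """
    stamp, _, _zone = ts.rpartition(" ")
    return datetime.strptime(stamp, "%Y%m%d%I:%M %p")
```

For example, `load_day("data/20070101.pkl")` would return that day's list, and `parse_ts(item["ts"])` converts each item's timestamp for sorting or filtering. If the files were written by Python 2, loading them under Python 3 may require passing `encoding="latin1"` to `pickle.load`; whether that applies to this data set is not stated in the source.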
