rssbot

所属分类:工具库
开发工具:Python
文件大小:0KB
下载次数:0
上传日期:2016-08-28 17:06:39
上 传 者sh-1993
说明:  Python机器人程序,从RSS提要列表中下载新闻,将其转换为docx格式,去除所有HTML混乱,并将其组织为fol...,
(Python bot that downloads news from a list of RSS feeds, converts them to docx format striping all HTML clutter and organizes them in folders according to publishing date.)

文件列表:
docxConverter.py (899, 2016-08-28)
htmlReader.py (1165, 2016-08-28)
license.txt (35141, 2016-08-28)
newsbot.py (1252, 2016-08-28)
rssNewsSpider.py (4269, 2016-08-28)

# rssbot Python bot that downloads news from any RSS feed, converts them to docx format after striping all HTML and ads clutter -sth. like the Safari Reader mode- and organizes them chronologically in folders according to the posts' publishing date. ##Screenshots ![overview](https://cloud.githubusercontent.com/assets/15065645/10857587/3c18add4-7f4f-11e5-8f20-89b1a3b52ae6.png) ![overview2](https://cloud.githubusercontent.com/assets/15065645/10857593/475027a4-7f4f-11e5-8863-1051ee427ed5.png) ##Use cases: - keep track of any hot topic in an organized way (like Bitcoin). - compare news coming from different media - create an archive of news diggests Maybe journalists will find it useful if there is no better (and free) tool. ##Current features: - scrape from a list of RSS feeds - ads and HTML clutter removal - docx conversion - chronological organization in folders ##Nice to have: If you found this project interesting and want to contribute, these are the main ideas that come to my mind: - add tests - pass RSS url and root location as arguments, do not touch code - package the app in a bundle, so no need to install dependencies - improve docx formatting (bold, italics, etc) - add the post's main image (if any) - PDF conversion - add tags feature to be able to organize them by sub-topic (call an AI web service like IBM's Watson) - add GUI to set up bot (feed's URL, refresh frequency, root location, ...) ##Installation: There are several dependencies that need to be installed: - beautifulSoup4 - docx - readability ##Usage: Just type on the command line: python newsbot.py You may want to change the feeds list in the code according to your interests and rename the root folder.

近期下载者

相关文件


收藏者