straits-times-summarizer

Category: Feature extraction
Development tool: Python
File size: 88KB
Downloads: 0
Upload date: 2018-11-12 14:45:09
Uploader: sh-1993
Description: Scrape and summarize news from The Straits Times

File list:
Sample News (0, 2018-11-12)
Sample News\ST_News-Headlines.txt (4961, 2018-11-12)
Sample News\o_ST_Asia.txt (23482, 2018-11-12)
Sample News\o_ST_Politics.txt (40565, 2018-11-12)
Sample News\o_ST_Scientific-Research.txt (28224, 2018-11-12)
Sample News\o_ST_Singapore.txt (13140, 2018-11-12)
Sample News\o_ST_World.txt (23786, 2018-11-12)
Sample News\s_ST_Asia.txt (9546, 2018-11-12)
Sample News\s_ST_Politics.txt (16607, 2018-11-12)
Sample News\s_ST_Scientific-Research.txt (11370, 2018-11-12)
Sample News\s_ST_Singapore.txt (5980, 2018-11-12)
Sample News\s_ST_World.txt (9102, 2018-11-12)
email_config.py (396, 2018-11-12)
news_config.py (1897, 2018-11-12)
requirements.txt (85, 2018-11-12)
st_email.py (7221, 2018-11-12)
st_news.py (11515, 2018-11-12)
st_summarizer.py (705, 2018-11-12)

# straits-times-summarizer

![PyPI](https://img.shields.io/pypi/pyversions/Django.svg)

A set of scripts that scrape news from The Straits Times and summarize the articles. An optional function e-mails the summaries and sends the original articles as an e-mail attachment.

## Features

Main script: `st_news.py`
- Summaries of news articles (TextRank summarization).
- Includes each article's tags, author, and the date/time it was published.
- Outputs summaries, original texts, and news headlines as .txt files.

![st_news](https://user-images.githubusercontent.com/32814703/34471824-89a95b10-ef8d-11e7-8b11-8c55aaf1276c.gif)

`st_summarizer.py`
- Handles text summarization.

`st_email.py`
- Option to e-mail articles through Gmail.
- Summarized text is in HTML format.
- Original news text is sent as an attachment.

![st1](https://user-images.githubusercontent.com/32814703/34471794-73e9bd16-ef8c-11e7-930e-292b96ffa6f7.png)
![st2](https://user-images.githubusercontent.com/32814703/34471793-73***6396-ef8c-11e7-89fc-7f61ae9f25ca.png)

## Configuration

Configure your settings in `news_config.py` and `email_config.py`.

#### News settings

`news_config.py`

1. Toggle to search for only today's news. Set to `True` to pull only news articles that were published today.
```
todays_news = False
```
2. Set the number of news articles (top headlines) you want to pull per category.
```
headline_count = 5
```
3. Set categories. Comment out a line to exclude that category; uncomment it to include it.
```
st_categories = [
    ("Singapore", "http://www.straitstimes.com/singapore", "ST_Singapore.txt"),
    ("Politics", "http://www.straitstimes.com/politics", "ST_Politics.txt"),
    ("Asia", "http://www.straitstimes.com/asia", "ST_Asia.txt"),
    ("World", "http://www.straitstimes.com/world", "ST_World.txt"),
    # ("Lifestyle", "http://www.straitstimes.com/lifestyle", "ST_Lifestyle.txt"),
    # ("Food", "http://www.straitstimes.com/lifestyle/food", "ST_Food.txt"),
    # ("Sport", "http://www.straitstimes.com/sport", "ST_Sport.txt"),
    # ("Tech", "http://www.straitstimes.com/tech", "ST_Tech.txt")
    ]
```

Set any other specific tags to pull news from. These tags are appended to the main categories. Again, comment out a line to exclude that tag; uncomment it to include it.
```
st_tags = [
    ("Scientific Research", "http://www.straitstimes.com/tags/scientific-research", "ST_Scientific-Research.txt")
    # ("Science", "http://www.straitstimes.com/tags/science", "ST_Science.txt"),
    # ("US Politics", "http://www.straitstimes.com/tags/us-politics", "ST_US-Politics.txt")
    ]
```

#### E-mail/Gmail settings

`email_config.py`

Gmail is used in `st_email.py`. Ensure that ["Allow less secure apps" is turned on](https://support.google.com/accounts/answer/6010255?hl=en) for the Gmail account.

1. Toggle the send e-mail function. Set to `True` to enable; `False` to disable.
```
email_news = False
```
2. User's Gmail details.
```
gmail_user = "user@gmail.com"  # user gmail address
gmail_pwd = "userpwd"  # user gmail password
```
3. E-mail addresses of recipients.
```
send_to = [
    "recipient1@gmail.com",
    "recipient2@gmail.com"
    ]
```

## Author

* [**ahthon**](https://github.com/ahthon)

Feel free to contact me if you run into issues or have suggestions for improvement.
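
To illustrate the e-mail behaviour described for `st_email.py` (HTML summary in the body, original news text as an attachment, settings taken from `email_config.py`), here is a minimal sketch using only the Python standard library. It is not the repository's code: the function names `build_news_email` and `send_via_gmail` are hypothetical, and the use of `smtp.gmail.com` on port 465 over SSL is an assumption about how one might reach Gmail, not something the README states.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_news_email(sender, recipients, subject, summary_html,
                     attachment_name, attachment_text):
    """Build a message with an HTML summary body and a .txt attachment.

    Hypothetical helper for illustration; not taken from st_email.py.
    """
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject

    # The summarized text goes in the body as HTML.
    msg.attach(MIMEText(summary_html, "html", "utf-8"))

    # The original article text is attached as a plain-text file.
    attachment = MIMEText(attachment_text, "plain", "utf-8")
    attachment.add_header("Content-Disposition", "attachment",
                          filename=attachment_name)
    msg.attach(attachment)
    return msg


def send_via_gmail(msg, gmail_user, gmail_pwd, recipients):
    """Send the message through Gmail over SSL (assumed host and port)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(gmail_user, gmail_pwd)
        server.sendmail(gmail_user, recipients, msg.as_string())
```

With the values from `email_config.py`, a call might look like `send_via_gmail(build_news_email(gmail_user, send_to, "ST Singapore", summary_html, "o_ST_Singapore.txt", original_text), gmail_user, gmail_pwd, send_to)`, where `summary_html` and `original_text` stand for strings produced elsewhere. Building the message separately from sending it keeps the formatting step easy to check without network access.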
