Minim

Category: Front-end development
Development tool: Python
File size: 13KB
Downloads: 0
Upload date: 2022-01-12 11:36:55
Uploader: sh-1993
Description: A Python crawler and front-end to minimize the news one site at a time.

File list:
Minim.py (3604, 2022-01-12)
build (0, 2022-01-12)
build\favicon.ico (318, 2022-01-12)
build\style.css (1656, 2022-01-12)
minimCrawler (0, 2022-01-12)
minimCrawler\__init__.py (72, 2022-01-12)
minimCrawler\crawl.py (4318, 2022-01-12)
minimCrawler\parser (0, 2022-01-12)
minimCrawler\parser\BBC.py (3225, 2022-01-12)
minimCrawler\parser\ParserBase.py (2353, 2022-01-12)
minimCrawler\parser\VICE.py (931, 2022-01-12)
minimCrawler\parser\VOX.py (951, 2022-01-12)
minimCrawler\parser\__init__.py (102, 2022-01-12)
minimCrawler\parser\sitesList.py (650, 2022-01-12)
minimCrawler\requirements.txt (27, 2022-01-12)
minimCrawler\schema.sql (638, 2022-01-12)
minimCrawler\words (0, 2022-01-12)
minimCrawler\words\conjunctions.csv (361, 2022-01-12)
minimCrawler\words\determiners.csv (78, 2022-01-12)
minimCrawler\words\otherborings.csv (383, 2022-01-12)
minimCrawler\words\prepositions.csv (657, 2022-01-12)
minimCrawler\words\pronouns.csv (483, 2022-01-12)
requirements.txt (32, 2022-01-12)
templates (0, 2022-01-12)
templates\home.html (2131, 2022-01-12)
templates\layout.html (975, 2022-01-12)

# MINIM

A web-crawling news aggregation engine using Python and [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/), and a Flask application that spits out a slightly too large static frontend from the crawled data.

Live and refreshed hourly at https://minim.jedd.pw

###### WHAT words are most common?
###### WHICH articles do they mainly appear in?!
###### HOW many of the articles are they in?!?!
###### WHY is the BBC mostly writing "Mr"?!?!?!

#### Pulls from:
* BBC NEWS UK (http://feeds.bbci.co.uk/news/rss.xml?edition=uk)
* BBC NEWS US (http://feeds.bbci.co.uk/news/rss.xml?edition=us)
* VICE (http://vice.com/rss)
* VOX (http://vox.com/rss/index.xml)
* YOUR-FAVE-ONLINE-TEXT-OUTLET (http://www.asourceyouwroteaniftycrawlingscriptforandsubmittedapullrequestwith.heckyes)

#### To crawl for wordz (this will vary wildly depending on your environment)
* $ git clone https://github.com/Jeddf/Minim.git
* $ cd Minim/minimCrawler
* $ pip3 install -r requirements.txt
* $ python3 crawl.py --host --user --password --db

#### ...and build index.html
* $ cd ..
* $ pip3 install -r requirements.txt
* $ python3 Minim.py --host --user --password --db
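
#### A rough sketch of the word counting

The WHAT / WHICH / HOW questions above come down to counting words per article while skipping the boring ones. The snippet below is only an illustrative sketch and is not taken from `crawl.py`: the `BORING_WORDS` set is a hypothetical stand-in for the lists in `minimCrawler/words/*.csv`, the function names `tokenize` and `rank_words` are made up for this example, and the articles are invented sample text rather than crawled data.

```python
import re
from collections import Counter

# Hypothetical stand-in for the word lists in minimCrawler/words/*.csv.
BORING_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def rank_words(articles, boring=BORING_WORDS):
    """Return (word, total_count, article_count, top_article) rows.

    Rows are sorted by total count, highest first. `articles` maps an
    article title to its body text.
    """
    totals = Counter()
    per_article = {}
    for title, body in articles.items():
        counts = Counter(w for w in tokenize(body) if w not in boring)
        per_article[title] = counts
        totals.update(counts)

    rows = []
    for word, total in totals.most_common():
        # HOW many articles contain the word, and WHICH one uses it most.
        containing = [t for t, c in per_article.items() if word in c]
        top = max(containing, key=lambda t: per_article[t][word])
        rows.append((word, total, len(containing), top))
    return rows


# Invented sample data, not real crawled articles.
articles = {
    "Budget row": "Mr Smith said the budget is late. Mr Jones said the budget is fine.",
    "Storm warning": "A storm is coming, Mr Smith said.",
}
rows = rank_words(articles)
for word, total, n_articles, top in rows[:3]:
    print(word, total, n_articles, top)
```

Each row answers one of the headline questions for a single word: its total count, the number of articles it appears in, and the article where it appears most. How the real crawler stores and ranks these values (see `schema.sql`) may well differ.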
