Minim
Category: Front-end development
Development tool: Python
File size: 13KB
Downloads: 0
Upload date: 2022-01-12 11:36:55
Uploader: sh-1993
Description: A Python crawler and front-end to minimize the news one site at a time.
File list:
Minim.py (3604, 2022-01-12)
build (0, 2022-01-12)
build\favicon.ico (318, 2022-01-12)
build\style.css (1656, 2022-01-12)
minimCrawler (0, 2022-01-12)
minimCrawler\__init__.py (72, 2022-01-12)
minimCrawler\crawl.py (4318, 2022-01-12)
minimCrawler\parser (0, 2022-01-12)
minimCrawler\parser\BBC.py (3225, 2022-01-12)
minimCrawler\parser\ParserBase.py (2353, 2022-01-12)
minimCrawler\parser\VICE.py (931, 2022-01-12)
minimCrawler\parser\VOX.py (951, 2022-01-12)
minimCrawler\parser\__init__.py (102, 2022-01-12)
minimCrawler\parser\sitesList.py (650, 2022-01-12)
minimCrawler\requirements.txt (27, 2022-01-12)
minimCrawler\schema.sql (638, 2022-01-12)
minimCrawler\words (0, 2022-01-12)
minimCrawler\words\conjunctions.csv (361, 2022-01-12)
minimCrawler\words\determiners.csv (78, 2022-01-12)
minimCrawler\words\otherborings.csv (383, 2022-01-12)
minimCrawler\words\prepositions.csv (657, 2022-01-12)
minimCrawler\words\pronouns.csv (483, 2022-01-12)
requirements.txt (32, 2022-01-12)
templates (0, 2022-01-12)
templates\home.html (2131, 2022-01-12)
templates\layout.html (975, 2022-01-12)
# MINIM
A web-crawling news aggregation engine using Python and [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/), and a Flask application that spits out a slightly too large static frontend from the crawled data.
Live and refreshed hourly at https://minim.jedd.pw
###### WHAT words are most common?
###### WHICH articles do they mainly appear in?!
###### HOW many of the articles are they in?!?!
###### WHY is the BBC mostly writing "Mr"?!?!?!
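The first three questions come down to counting words across articles while skipping the boring ones. Here is a minimal sketch of that idea, not the code in `crawl.py`: the `BORING_WORDS` set is a hypothetical stand-in for the lists under `minimCrawler/words`, and `tokenize`, `word_stats` and the sample articles are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the conjunction/determiner/pronoun CSV lists.
BORING_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "he", "she", "it", "is"}


def tokenize(text):
    """Split text into lowercase words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_stats(articles, boring=BORING_WORDS):
    """Return (total count per word, word -> set of article titles using it)."""
    totals = Counter()
    appears_in = defaultdict(set)
    for title, body in articles.items():
        for word in tokenize(body):
            if word in boring:
                continue  # skip words that say nothing about the news
            totals[word] += 1
            appears_in[word].add(title)
    return totals, appears_in


# Made-up sample articles, keyed by title.
articles = {
    "Budget row": "Mr Smith said the budget is late. Mr Jones disagreed.",
    "Storm warning": "The storm is late, said Mr Smith.",
}

totals, appears_in = word_stats(articles)
for word, count in totals.most_common(3):
    # WHAT word, HOW often, and in HOW many articles
    print(word, count, len(appears_in[word]))
```

Keeping a set of titles per word answers both WHICH and HOW many without a second pass over the text.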
#### Pulls from:
* BBC NEWS UK (http://feeds.bbci.co.uk/news/rss.xml?edition=uk)
* BBC NEWS US (http://feeds.bbci.co.uk/news/rss.xml?edition=us)
* VICE (http://vice.com/rss)
* VOX (http://vox.com/rss/index.xml)
* YOUR-FAVE-ONLINE-TEXT-OUTLET (http://www.asourceyouwroteaniftycrawlingscriptforandsubmittedapullrequestwith.heckyes)
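Each source above is a feed URL, so a crawl starts by turning a feed into a list of article links. The sketch below shows one way that step could look, assuming an RSS 2.0 layout (`<item>` elements with `<title>` and `<link>`); a feed in another format, such as Atom, would need different element names. It uses the standard library's `xml.etree.ElementTree` rather than Beautiful Soup to stay dependency-free, and `SAMPLE_FEED` and `parse_feed` are hypothetical names, not part of Minim.

```python
import xml.etree.ElementTree as ET

# Made-up feed text standing in for a downloaded RSS document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # an item without a link cannot be crawled
            stories.append((title, link))
    return stories


for title, link in parse_feed(SAMPLE_FEED):
    print(title, link)
```

A site-specific parser would then fetch each link and pull out the article text.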
#### To crawl for wordz (this will vary wildly depending on your environment)
* $ git clone https://github.com/Jeddf/Minim.git
* $ cd Minim/minimCrawler
* $ pip3 install -r requirements.txt
* $ python3 crawl.py --host <host> --user <user> --password <password> --db <database>
#### ...and build index.html
* $ cd ..
* $ pip3 install -r requirements.txt
* $ python3 Minim.py --host <host> --user <user> --password <password> --db <database>
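Both scripts take the same four connection flags, each of which needs a value. How `crawl.py` and `Minim.py` actually read them is not shown here; the sketch below is one plausible way to do it with `argparse`. The `build_parser` name, the help texts, the example values and the choice to make every flag required are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the four connection flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical connection-flag sketch.")
    # Assumption: every flag is required and takes exactly one value.
    parser.add_argument("--host", required=True, help="database server hostname")
    parser.add_argument("--user", required=True, help="database user name")
    parser.add_argument("--password", required=True, help="database password")
    parser.add_argument("--db", required=True, help="database name")
    return parser


# Made-up values, passed explicitly so the sketch runs without a command line.
example = ["--host", "localhost", "--user", "minim", "--password", "secret", "--db", "news"]
args = build_parser().parse_args(example)
print(f"Would connect to {args.db} on {args.host} as {args.user}")
```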