storm-kafka-based-jsoupCrawler

Category: Database systems
Development tool: Java
File size: 0KB
Downloads: 0
Upload date: 2017-07-18 02:56:26
Uploader: sh-1993
Description: Crawls a second-level section of a news website, such as NetEase Military (网易军事). Starting from the section URL, it collects the URLs of all news articles and sends them to Kafka. The article URLs are then read from Kafka and compared against the already-crawled URLs stored in the database to remove duplicates. The remaining articles are crawled and stored in the database.
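
The deduplication step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the class name `CrawlDedupSketch` and its methods are hypothetical, an in-memory `BlockingQueue` stands in for the Kafka topic, a `HashSet` stands in for the database table of crawled URLs, and the `example.com` URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: in-memory stand-ins replace Kafka and the database.
final class CrawlDedupSketch {

    // Stand-in for the Kafka topic that carries discovered news URLs.
    private final BlockingQueue<String> urlTopic = new LinkedBlockingQueue<>();

    // Stand-in for the database table of URLs that were already crawled.
    private final Set<String> crawledUrls = new HashSet<>();

    CrawlDedupSketch(List<String> alreadyCrawled) {
        crawledUrls.addAll(alreadyCrawled);
    }

    // Producer side: publish every URL found on the section page.
    void publish(List<String> discoveredUrls) {
        urlTopic.addAll(discoveredUrls);
    }

    // Consumer side: drain the queue, skip URLs seen before, and return
    // the ones that still need to be fetched, in arrival order.
    List<String> consumeNewUrls() {
        List<String> toCrawl = new ArrayList<>();
        String url;
        while ((url = urlTopic.poll()) != null) {
            // Set.add returns false when the URL is already recorded.
            if (crawledUrls.add(url)) {
                toCrawl.add(url);
            }
        }
        return toCrawl;
    }

    public static void main(String[] args) {
        CrawlDedupSketch sketch =
                new CrawlDedupSketch(Arrays.asList("http://example.com/news/1.html"));
        sketch.publish(Arrays.asList(
                "http://example.com/news/1.html",
                "http://example.com/news/2.html",
                "http://example.com/news/2.html"));
        // Only URLs that were not crawled before are printed.
        System.out.println(sketch.consumeNewUrls());
    }
}
```

In a real deployment the queue would presumably be a Kafka consumer inside a Storm spout or bolt, and the set lookup would be a database query; recording the URL at the moment it is accepted keeps repeated messages for the same article from being crawled twice.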

File list:
stormsuccesscrawler/ (0, 2017-07-17)
stormsuccesscrawler/logs/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846669/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846669/1024/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846669/1024/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846708/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846708/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499846708/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851337/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851337/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851337/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851444/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851444/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851444/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851519/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851519/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851519/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851666/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851666/1024/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499851666/1024/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499854371/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499854371/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499854371/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499855956/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499855956/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499855956/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499856682/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499856682/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499856682/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865850/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865850/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865850/1027/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865944/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865944/1024/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499865944/1024/worker.yaml (110, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499867133/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499867133/1027/ (0, 2017-07-17)
stormsuccesscrawler/logs/workers-artifacts/crawler-1-1499867133/1027/worker.yaml (110, 2017-07-17)
... ...

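# storm-kafka-based-jsoupCrawler

Crawls a second-level section of a news website, such as NetEase Military (网易军事). Starting from the section URL, it collects the URLs of all news articles and sends them to Kafka. The article URLs are then read from Kafka and compared against the already-crawled URLs stored in the database to remove duplicates. The remaining articles are crawled and stored in the database.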
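
The first stage, collecting article URLs from a section page, might look something like the sketch below. It is an assumption-laden illustration: the project name suggests jsoup does the HTML parsing, but to stay dependency-free this example swaps in a simplified `java.util.regex` pattern that only handles double-quoted `href` attributes. The class `LinkExtractSketch`, the method `extractLinks`, and the `requiredPrefix` filter are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex stands in for a real HTML parser such as jsoup.
final class LinkExtractSketch {

    // Simplified pattern: matches only double-quoted href values.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the distinct links in the page that start with requiredPrefix,
    // in the order they first appear.
    static List<String> extractLinks(String html, String requiredPrefix) {
        Set<String> links = new LinkedHashSet<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String url = matcher.group(1);
            if (url.startsWith(requiredPrefix)) {
                links.add(url);
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        // Placeholder HTML; a real run would fetch the section page first.
        String html = "<a href=\"http://example.com/news/1.html\">One</a>"
                + "<a href=\"http://example.com/about\">About</a>"
                + "<a href=\"http://example.com/news/2.html\">Two</a>";
        for (String url : extractLinks(html, "http://example.com/news/")) {
            System.out.println(url);
        }
    }
}
```

Each URL returned here is what would be sent to Kafka as one message. Filtering by prefix is only one plausible way to separate article links from navigation links; the actual project may select them differently.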
