NewsSpider
Category: Data collection / Web crawlers
Development tool: Go
File size: 32KB
Upload date: 2019-12-23 09:28:56
Uploader: sh-1993
Description: Development of a news crawler system based on Go
File list:
CrawlerDistributed (0, 2019-12-23)
CrawlerDistributed\config (0, 2019-12-23)
CrawlerDistributed\config\config.go (290, 2019-12-23)
CrawlerDistributed\main.go (617, 2019-12-23)
CrawlerDistributed\rpcSuppert (0, 2019-12-23)
CrawlerDistributed\rpcSuppert\rpc_server.go (594, 2019-12-23)
CrawlerDistributed\saveDistributed (0, 2019-12-23)
CrawlerDistributed\saveDistributed\client (0, 2019-12-23)
CrawlerDistributed\saveDistributed\client\itemsaver.go (670, 2019-12-23)
CrawlerDistributed\saveDistributed\rpc.go (358, 2019-12-23)
CrawlerDistributed\saveDistributed\server (0, 2019-12-23)
CrawlerDistributed\saveDistributed\server\client_test.go (1415, 2019-12-23)
CrawlerDistributed\saveDistributed\server\main.go (637, 2019-12-23)
CrawlerDistributed\worker (0, 2019-12-23)
CrawlerDistributed\worker\rpc.go (356, 2019-12-23)
CrawlerDistributed\worker\server (0, 2019-12-23)
CrawlerDistributed\worker\server\main.go (256, 2019-12-23)
CrawlerDistributed\worker\types.go (1746, 2019-12-23)
engine (0, 2019-12-23)
engine\concurrent.go (1664, 2019-12-23)
engine\simple.go (478, 2019-12-23)
engine\types.go (1027, 2019-12-23)
engine\worker.go (299, 2019-12-23)
fetcher (0, 2019-12-23)
fetcher\fetcher_news.go (934, 2019-12-23)
go.mod (225, 2019-12-23)
go.sum (13200, 2019-12-23)
main.go (511, 2019-12-23)
model (0, 2019-12-23)
model\newsfileds.go (341, 2019-12-23)
newsSave (0, 2019-12-23)
newsSave\itemsave.go (1005, 2019-12-23)
scheduler (0, 2019-12-23)
scheduler\queued.go (1232, 2019-12-23)
scheduler\simple.go (508, 2019-12-23)
stcn (0, 2019-12-23)
... ...
```markdown
Developing a news crawler system in Go:
--Simple crawler
--Concurrent crawler
--Distributed crawler
```
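The concurrent crawler stage listed above can be sketched as a single engine loop that hands requests to a pool of worker goroutines and collects their results. This is a minimal sketch under stated assumptions, not the repository's `engine/concurrent.go`: the `Request` and `ParseResult` types, `Run`, and `fakeParse` are hypothetical, and `fakeParse` walks a small hard-coded link graph instead of fetching real pages so the example runs offline.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical unit of work: a URL plus the parser to apply to it.
type Request struct {
	URL   string
	Parse func(url string) ParseResult
}

// ParseResult holds the items extracted from a page and follow-up requests.
type ParseResult struct {
	Items    []string
	Requests []Request
}

// fakeParse is a hypothetical stand-in for fetch+parse: it "finds" links
// from a fixed graph so the sketch needs no network access.
func fakeParse(url string) ParseResult {
	links := map[string][]string{
		"/index":  {"/news/1", "/news/2"},
		"/news/1": {"/news/3"},
	}
	res := ParseResult{Items: []string{"item from " + url}}
	for _, l := range links[url] {
		res.Requests = append(res.Requests, Request{URL: l, Parse: fakeParse})
	}
	return res
}

// Run starts workerCount worker goroutines and drives them from one loop
// that keeps its own queue of pending requests.
func Run(workerCount int, seeds ...Request) []string {
	in := make(chan Request)
	out := make(chan ParseResult)
	for i := 0; i < workerCount; i++ {
		go func() {
			for r := range in {
				out <- r.Parse(r.URL)
			}
		}()
	}

	queue := append([]Request(nil), seeds...)
	inFlight := 0 // requests handed to workers whose results are not back yet
	var items []string
	for len(queue) > 0 || inFlight > 0 {
		var send chan Request // nil channel: the send case is disabled
		var head Request
		if len(queue) > 0 {
			send, head = in, queue[0]
		}
		select {
		case send <- head:
			queue = queue[1:]
			inFlight++
		case res := <-out:
			inFlight--
			items = append(items, res.Items...)
			queue = append(queue, res.Requests...)
		}
	}
	close(in) // lets the workers exit
	sort.Strings(items)
	return items
}

func main() {
	for _, it := range Run(3, Request{URL: "/index", Parse: fakeParse}) {
		fmt.Println(it)
	}
}
```

Keeping the queue inside the engine loop (rather than sending to `in` directly from the result handler) avoids a deadlock where the engine blocks on `in` while every worker blocks on `out`.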
```markdown
Message-passing methods in a distributed system: REST, RPC, middleware
External interfaces: REST
Within a module: RPC
Between modules: middleware, REST
```
```markdown
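Distributed architecture vs. microservice architecture
Distributed architecture: governs how nodes communicate with each other
Microservice architecture: encourages splitting modules by business function
A microservice architecture is implemented on top of a distributed architecture
Multi-tier architecture vs. microservice architecture
A microservice architecture has more "services"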
```
```markdown
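1. Rate limiting
A single node can only handle a limited amount of traffic, so place workers on different nodes (different servers)
2. Deduplication
A single node can only hold a limited amount of deduplication data
Deduplication results from before a restart cannot be preserved
Use a key-value store (such as Redis) for distributed deduplication
3. Data storage
Split the storage module out into a standalone service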
```