NewsSpider

Category: Data collection / crawlers
Development tool: Go
File size: 32KB
Downloads: 0
Upload date: 2019-12-23 09:28:56
Uploader: sh-1993
Description: Development of a news crawler system based on Go

File list:
CrawlerDistributed (0, 2019-12-23)
CrawlerDistributed\config (0, 2019-12-23)
CrawlerDistributed\config\config.go (290, 2019-12-23)
CrawlerDistributed\main.go (617, 2019-12-23)
CrawlerDistributed\rpcSuppert (0, 2019-12-23)
CrawlerDistributed\rpcSuppert\rpc_server.go (594, 2019-12-23)
CrawlerDistributed\saveDistributed (0, 2019-12-23)
CrawlerDistributed\saveDistributed\client (0, 2019-12-23)
CrawlerDistributed\saveDistributed\client\itemsaver.go (670, 2019-12-23)
CrawlerDistributed\saveDistributed\rpc.go (358, 2019-12-23)
CrawlerDistributed\saveDistributed\server (0, 2019-12-23)
CrawlerDistributed\saveDistributed\server\client_test.go (1415, 2019-12-23)
CrawlerDistributed\saveDistributed\server\main.go (637, 2019-12-23)
CrawlerDistributed\worker (0, 2019-12-23)
CrawlerDistributed\worker\rpc.go (356, 2019-12-23)
CrawlerDistributed\worker\server (0, 2019-12-23)
CrawlerDistributed\worker\server\main.go (256, 2019-12-23)
CrawlerDistributed\worker\types.go (1746, 2019-12-23)
engine (0, 2019-12-23)
engine\concurrent.go (1664, 2019-12-23)
engine\simple.go (478, 2019-12-23)
engine\types.go (1027, 2019-12-23)
engine\worker.go (299, 2019-12-23)
fetcher (0, 2019-12-23)
fetcher\fetcher_news.go (934, 2019-12-23)
go.mod (225, 2019-12-23)
go.sum (13200, 2019-12-23)
main.go (511, 2019-12-23)
model (0, 2019-12-23)
model\newsfileds.go (341, 2019-12-23)
newsSave (0, 2019-12-23)
newsSave\itemsave.go (1005, 2019-12-23)
scheduler (0, 2019-12-23)
scheduler\queued.go (1232, 2019-12-23)
scheduler\simple.go (508, 2019-12-23)
stcn (0, 2019-12-23)
... ...

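```markdown
A news crawler system developed in Go:
--Simple crawler
--Concurrent crawler
--Distributed crawler
```

```markdown
Message-passing methods in distributed systems: REST, RPC, middleware
External interfaces: REST
Inside a module: RPC
Between modules: middleware, REST
```

```markdown
Distributed architecture vs. microservice architecture
Distributed architecture: guides how nodes communicate with each other
Microservice architecture: encourages splitting modules by business function
A microservice architecture is implemented on top of a distributed architecture

Multi-tier architecture vs. microservice architecture
A microservice architecture has more "services"
```

```markdown
1. Rate limiting
   A single node can handle only limited traffic, so workers are placed on different nodes (different servers)
2. Deduplication
   A single node can hold only a limited amount of deduplication data
   Deduplication results from before a restart cannot be preserved
   Use a key-value store (such as Redis) for distributed deduplication
3. Data storage
   The storage module is split out as an independent service
```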
