PACHONG

Category: Network programming
Development tool: C#
File size: 780KB
Downloads: 198
Upload date: 2010-05-01 22:22:07
Uploader: hametan
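Description: Source code for a web crawler written in C#.

Main features:
  Configurable: thread count, thread wait time, connection timeout, crawlable file types and their priorities, download directory, and so on.
  Status bar statistics: number of URLs queued, number of files downloaded, total bytes downloaded, CPU usage, and available memory.
  Preferential crawling: a different priority can be set for each type of resource crawled.
  Robustness: more than ten URL normalization strategies to eliminate redundant downloads, strategies for avoiding crawler traps, and several strategies for resolving relative paths.
  Good performance: regular-expression-based page parsing, moderate locking, persistent HTTP connections, and so on.

Features that may be added later, time permitting:
  Berkeley DB storage for crawled files: improves performance, because common operating systems handle large numbers of small files poorly.
  Priority queue based on URL ranking: a focused (topical) crawler in which a machine learning algorithm scores how relevant each link is to the topic, and links are crawled in the resulting priority order.
  Crawler etiquette: obey the robots exclusion protocol and avoid overusing server resources.
  Performance optimization: replace the prepackaged HttpWebRequest/Response with UDP; DNS caching; asynchronous DNS resolution; a disk cache or in-memory database to avoid frequent disk seeks.
  Distributed crawling: scale beyond the capacity of a single machine (CPU, memory, and disk access).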
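
To make the URL normalization, relative-path resolution, regex-based parsing, and per-type priority described above more concrete, here is a minimal sketch in Python. It is not taken from the NWebCrawler C# source: the function `normalize_url`, the `extract_links` regex, the `Frontier` class, and the `TYPE_PRIORITY` table are all hypothetical names and values chosen for illustration, and only a handful of normalization rules are shown rather than the full set the description mentions.

```python
import heapq
import itertools
import posixpath
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical priorities per file extension (lower number = crawled sooner).
TYPE_PRIORITY = {"": 0, ".html": 0, ".htm": 0, ".pdf": 1, ".jpg": 2}
DEFAULT_PRIORITY = 3

# Simplified pattern for quoted href attributes; real pages need more care.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the raw href values found in an HTML string."""
    return HREF_RE.findall(html)


def normalize_url(base, link):
    """Resolve `link` against `base` and apply a few normalization rules."""
    # urljoin resolves relative paths, including "." and ".." segments.
    parts = urlsplit(urljoin(base, link))
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostnames are case-insensitive
    # Keep the port only when it differs from the scheme's default.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"  # an empty path is equivalent to "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, host, path, parts.query, ""))


class Frontier:
    """URL queue that skips duplicates and pops by resource-type priority."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, base, link):
        """Queue a link; return False if its normalized form was seen before."""
        url = normalize_url(base, link)
        if url in self._seen:
            return False
        self._seen.add(url)
        ext = posixpath.splitext(urlsplit(url).path)[1].lower()
        priority = TYPE_PRIORITY.get(ext, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._counter), url))
        return True

    def pop(self):
        """Return the queued URL with the best (lowest) priority."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    page = ('<a href="../img/logo.JPG">logo</a> '
            '<a href="./next.html#top">next</a> '
            '<a HREF="next.html">next again</a>')
    frontier = Frontier()
    for link in extract_links(page):
        frontier.push("http://Example.com:80/docs/index.html", link)
    while frontier:
        print(frontier.pop())
```

In this sketch the second and third links normalize to the same URL, so only one of them is queued, and the HTML page is popped before the image because of its lower priority number. Keeping a set of normalized URLs is one simple way to avoid redundant downloads; a real crawler would also need limits such as a maximum path depth to guard against crawler traps.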

File list:
NWebCrawler\config.ini (109, 2009-12-30)
NWebCrawler\MainForm.cs (6852, 2010-01-05)
NWebCrawler\MainForm.Designer.cs (23228, 2009-12-30)
NWebCrawler\MainForm.resx (44070, 2009-12-30)
NWebCrawler\NWebCrawler.csproj (4349, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.csproj.FileListAbsolute.txt (989, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.csproj.GenerateResource.Cache (915, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.exe (62976, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.MainForm.resources (23679, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.pdb (50688, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.Properties.Resources.resources (180, 2010-01-05)
NWebCrawler\obj\Debug\NWebCrawler.SettingsForm.resources (180, 2010-01-05)
NWebCrawler\obj\Debug\ResolveAssemblyReference.cache (6147, 2010-01-05)
NWebCrawler\Program.cs (506, 2009-12-26)
NWebCrawler\Properties\AssemblyInfo.cs (1452, 2009-12-26)
NWebCrawler\Properties\Resources.Designer.cs (2851, 2009-12-26)
NWebCrawler\Properties\Resources.resx (5612, 2009-12-26)
NWebCrawler\Properties\Settings.Designer.cs (1096, 2009-12-26)
NWebCrawler\Properties\Settings.settings (249, 2009-12-26)
NWebCrawler\SettingsForm.cs (2852, 2010-01-05)
NWebCrawler\SettingsForm.Designer.cs (59153, 2010-01-02)
NWebCrawler\SettingsForm.resx (5814, 2010-01-02)
NWebCrawlerLib\Common\Logger.cs (2764, 2009-12-29)
NWebCrawlerLib\Common\PriorityQueue.cs (4949, 2009-12-27)
NWebCrawlerLib\CrawleHistroyEntry.cs (323, 2009-12-30)
NWebCrawlerLib\CrawlerThread.cs (11251, 2010-01-04)
NWebCrawlerLib\Downloader.cs (5124, 2010-01-04)
NWebCrawlerLib\NWebCrawlerLib.csproj (2938, 2010-01-05)
NWebCrawlerLib\obj\Debug\NWebCrawlerLib.csproj.FileListAbsolute.txt (487, 2010-01-05)
NWebCrawlerLib\obj\Debug\NWebCrawlerLib.exe (22528, 2010-01-05)
NWebCrawlerLib\obj\Debug\NWebCrawlerLib.pdb (60928, 2010-01-05)
NWebCrawlerLib\Parser.cs (2316, 2009-12-30)
NWebCrawlerLib\Program.cs (1308, 2009-12-28)
NWebCrawlerLib\Properties\AssemblyInfo.cs (1458, 2009-12-26)
NWebCrawlerLib\Settings.cs (3677, 2010-01-02)
NWebCrawlerLib\UrlFrontierQueueManager.cs (3135, 2010-01-02)
NWebCrawlerLib\Utility.cs (3642, 2010-01-02)
NWebCrawler.sln (1419, 2009-12-26)
NWebCrawler.suo (21504, 2010-01-05)
51aspx源码必读.txt (2642, 2010-01-05)
... ...
