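Detailed description: table of contents
Contents
Abstract
Chapter 1 Introduction
Chapter 2 Search Engine Architecture
2.1 System overview
2.2 Components of a search engine
2.2.1 Web robot
2.2.2 Indexing and search
2.2.3 Web server
2.3 Key search engine metrics and analysis
2.4 Summary
Chapter 3 Web Robots
3.1 What is a web robot
3.2 Structural analysis of the web robot
3.2.1 How to parse HTML
3.2.2 Spider program structure
3.2.3 How to build a Spider program
3.2.4 How to improve program performance
3.2.5 Code analysis of the web robot
3.3 Summary
Chapter 4 Indexing and Search with Lucene
4.1 What is Lucene full-text search
4.2 How Lucene works
4.2.1 How full-text search is implemented
4.2.2 Lucene indexing efficiency
4.2.3 Chinese word segmentation
4.3 Combining Lucene with the Spider
4.4 Summary
Chapter 5 Tomcat-Based Web Server
5.1 What is a Tomcat-based web server
5.2 User interface design
5.2.1 Client-side design
5.2.2 Server-side design
5.3 Deploying the project on Tomcat
5.4 Summary
Chapter 6 Search Engine Strategies
6.1 Introduction
6.2 Topic-oriented search strategies
6.2.1 Guide words
6.2.3 Authority pages and hub pages
6.3 Summary
References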
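[htmlParsersrc.zip] - An HTML parsing package that makes parsing HTML easy; pure Java implementation.
[source.rar] - A Java search engine; runs on Tomcat 5.5.
[SearcherEngine.rar] - A search engine written in Java, built with bot and Lucene; a very good program.
[clucene-core-0.9.16a.zip] - CLucene is a C++ full-text search engine, a complete port of Lucene, written using the STL.
[source.rar] - A full-text retrieval system for a given large-scale Chinese text corpus. It accepts arbitrary query strings and returns the set of all relevant texts (showing titles, matching lines, and so on).
[jspider-0.5.0-dev.zip] - Search engine code implemented in Java; it analyzes and collects web page content.
[cpphtp4_examples.zip] - "The C++ Programming Language" was written by Bjarne Stroustrup, the creator of C++; this archive contains the book's example source code.
[cmutithread.rar] - A good book on object-oriented multithreaded programming in C++; it explains C++ multithreading techniques and covers all the synchronization techniques.
[SearchEngine(Java).rar] - Research on and implementation of a simple search engine, written in Java.
[java_search_engineer_develop.rar] - A slide deck outlining a complete worked example of developing a Java search engine, helping beginners quickly set up the project framework.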
File list:
A very practical search engine, with its research and implementation (Java) (source code included)
..........................................................\bot.jar
..........................................................\News
..........................................................\....\bak
..........................................................\....\...\news
..........................................................\....\...\....\HTMLParse.java~11~
..........................................................\....\...\....\HTMLParse.java~12~
..........................................................\....\...\....\HTMLParse.java~13~
..........................................................\....\...\....\HTMLParse.java~14~
..........................................................\....\...\....\HTMLParse.java~15~
..........................................................\....\...\....\HTMLParse.java~16~
..........................................................\....\...\....\HTMLParse.java~17~
..........................................................\....\...\....\HTMLParse.java~18~
..........................................................\....\...\....\HTMLParse.java~19~
..........................................................\....\...\....\HTMLParse.java~20~
..........................................................\....\...\....\Index.java~32~
..........................................................\....\...\....\Index.java~33~
..........................................................\....\...\....\Index.java~34~
..........................................................\....\...\....\Index.java~35~
..........................................................\....\...\....\Index.java~36~
..........................................................\....\...\....\Index.java~37~
..........................................................\....\...\....\Index.java~38~
..........................................................\....\...\....\Index.java~39~
..........................................................\....\...\....\Index.java~40~
..........................................................\....\...\....\Index.java~41~
..........................................................\....\...\....\QueryNews.java~1~
..........................................................\....\...\....\QueryNews.java~2~
..........................................................\....\...\....\QueryNews.java~3~
..........................................................\....\...\....\QueryNews.java~4~
..........................................................\....\...\....\QueryNews.java~5~
..........................................................\....\...\....\QueryNews.java~6~
..........................................................\....\...\....\Searcher.java~1~
..........................................................\....\...\....\Searcher.java~2~
..........................................................\....\...\....\Searcher.java~3~
..........................................................\....\...\....\Searcher.java~4~
..........................................................\....\...\....\Searcher.java~5~
..........................................................\....\...\....\Searcher.java~6~
..........................................................\....\...\....\Searcher.java~7~
..........................................................\....\...\....\Searcher.java~8~
..........................................................\....\...\....\Searcher.java~9~
..........................................................\....\classes
..........................................................\....\.......\news
..........................................................\....\.......\....\HTMLParse.class
..........................................................\....\.......\....\Index.class
..........................................................\....\.......\....\Searcher.class
..........................................................\....\.......\package cache
..........................................................\....\.......\.............\news.dep2
..........................................................\....\News.jpx
..........................................................\....\News.jpx.local
..........................................................\....\News.jpx.local~
..........................................................\....\News.jpx~
..........................................................\....\src
..........................................................\....\...\news
..........................................................\....\...\....\HTMLParse.java
..........................................................\....\...\....\Index.java
..........................................................\....\...\....\Searcher.java
..........................................................\News.htm
..........................................................\NewsServer
..........................................................\..........\bak
..........................................................\..........\...\defaultroot
..........................................................\..........\...\...........\WEB-INF
..........................................................\..........\...\...........\.......\web.xml~69~
..........................................................\..........\...\...........\.......\web.xml~70~
..........................................................\..........\...\...........\.......\web.xml~71~
..........................................................\..........\...\...........\.......\web.xml~72~
..........................................................\..........\...\...........\.......\web.xml~73~
..........................................................\..........\...\...........\.......\web.xml~74~
..........................................................\..........\...\...........\.......\web.xml~75~
..........................................................\..........\...\...........\.......\web.xml~76~
..........................................................\..........\...\...........\.......\web.xml~77~
..........................................................\..........\...\...........\.......\web.xml~78~
..........................................................\..........\...\NewsSearcher.jsp~1~
..........................................................\..........\...\NewsSearcher.jsp~2~
..........................................................\..........\...\NewsSearcher.jsp~3~
..........................................................\..........\...\NewsSearcher.jsp~4~
..........................................................\..........\...\NewsSearcher.jsp~5~
..........................................................\..........\...\NewsSearcher.jsp~6~
..........................................................\..........\...\results.html~1~
..........................................................\..........\...\results.html~2~
..........................................................\..........\...\results.html~3~
..........................................................\..........\...\results.html~4~
..........................................................\..........\...\results.html~5~
..........................................................\..........\...\results.html~6~
..........................................................\..........\...\src
..........................................................\..........\...\...\newsserver
..........................................................\..........\...\...\..........\Results.java~27~
..........................................................\..........\...\...\..........\Results.java~28~
..........................................................\..........\...\...\..........\Results.java~29~
..........................................................\..........\...\...\..........\Results.java~30~
..........................................................\..........\...\...\..........\Results.java~31~
..........................................................\..........\...\...\..........\Results.java~32~
..........................................................\..........\...\...\..........\Results.java~33~
..........................................................\..........\...\...\..........\Results.java~34~
..........................................................\..........\...\...\..........\Results.java~35~
..........................................................\..........\...\...\..........\Results.java~36~
..........................................................\..........\...\WEB-INF
..........................................................\..........\...\.......\web.xml~47~
..........................................................\..........\...\.......\web.xml~48~
..........................................................\..........\...\.......\web.xml~49~
..........................................................\..........\...\.......\web.xml~50~