weblucene
Category: Java programming
Development tool: Java
File size: 2823KB
Downloads: 124
Upload date: 2005-09-21 10:04:14
Uploader:
蓝天白云碧野
Description: Lucene web interface that uses XML as a lightweight protocol. Developers can convert data sources (text, DB, MS Word, PDF, etc.) into XML format, index them with the Lucene engine, and get full-text search results via HTTP as XML output. Users can easily integrate it with a JSP, ASP, or PHP front end, or use XSLT on the server side to transform the output.
File list:
weblucene (0, 2005-05-17)
weblucene\.checkstyle (85, 2004-10-04)
weblucene\.cvsignore (34, 2003-06-03)
weblucene\.project (578, 2003-06-03)
weblucene\build.properties.default (441, 2003-11-30)
weblucene\BUILD.txt (8217, 2004-11-21)
weblucene\build.xml (6592, 2004-10-04)
weblucene\CHANGES.txt (5461, 2004-11-21)
weblucene\CVS (0, 2005-05-17)
weblucene\CVS\Entries (573, 2004-11-21)
weblucene\CVS\Repository (29, 2004-10-06)
weblucene\CVS\Root (53, 2004-10-06)
weblucene\dump (0, 2005-05-17)
weblucene\dump\blog.xml (69607, 2004-10-04)
weblucene\dump\blogchina.inc.sample (230, 2003-12-16)
weblucene\dump\blog_dump.php (3435, 2004-10-04)
weblucene\dump\check_freshness.php (4106, 2004-02-26)
weblucene\dump\comments.xml (24656, 2003-12-16)
weblucene\dump\comments_dump.php (3180, 2003-12-16)
weblucene\dump\CVS (0, 2005-05-17)
weblucene\dump\CVS\Entries (460, 2004-10-06)
weblucene\dump\CVS\Repository (34, 2004-10-06)
weblucene\dump\CVS\Root (53, 2004-10-06)
weblucene\dump\dump.sh (138, 2004-10-04)
weblucene\dump\include.php (2389, 2004-10-04)
weblucene\dump\index.sh (479, 2004-10-04)
weblucene\dump\task_dump.php (1533, 2004-02-26)
weblucene\INSTALL.txt (7649, 2004-11-21)
weblucene\jalopy.xml (13825, 2003-06-03)
weblucene\LICENSE.txt (2670, 2003-06-03)
weblucene\TODO.txt (189, 2003-06-03)
weblucene\webapp (0, 2005-05-17)
weblucene\webapp\CVS (0, 2005-05-17)
weblucene\webapp\CVS\Entries (305, 2004-11-21)
weblucene\webapp\CVS\Repository (36, 2004-10-06)
weblucene\webapp\CVS\Root (53, 2004-10-06)
weblucene\webapp\rss.png (282, 2003-12-04)
weblucene\webapp\search.html (1218, 2004-08-24)
weblucene\webapp\search_tab.html (1219, 2004-08-24)
... ...
$Id: README.txt,v 1.5 2004/10/30 11:36:18 lhelper Exp $
For Chinese documentation, see: http://www.chedong.com/tech/weblucene.html
WebLucene
=========
Lucene web interface that uses XML as a lightweight protocol. Developers can convert data sources (text, DB, MS Word, PDF, etc.) into XML format, index them with the Lucene engine, and get full-text search results via HTTP as XML output. Users can easily integrate it with a JSP, ASP, or PHP front end, or use XSLT on the server side to transform the output.
Indexing Process
================
MySQL  \                                         / JSP
Oracle - DB - ==> XML ==> (Lucene Index) ==> XML - ASP
MSSQL  /                                         - PHP
     MS Word /                                  \         / XHTML
         PDF /                                    =XSLT=> - TEXT
                                                          \ XML
                    \_______WebLucene_______/
i18n issue: because Java is Unicode-based, users can index data sources (XML) in different charsets into one Lucene index (in Unicode) and output results according to the languages supported by the client browser.
GBK        \                       / BIG5
BIG5       - UNICODE ====> Unicode - GB2312
SJIS       -  (XML)         (XML)  - SJIS
ISO-8859-1 /                       \ ISO-8859-1
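The charset flow above can be sketched with the JDK alone: decode the source bytes into a Unicode String, then encode that String into whatever charset the client asks for. The class and method names below are hypothetical and are not part of WebLucene. GBK, Big5, and Shift_JIS are optional charsets in the JDK, so the sketch only checks whether they are available and uses ISO-8859-1 and UTF-8 for the actual round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of WebLucene: shows the decode/encode round trip.
final class CharsetRoundTrip {

    /** Decode raw bytes from the source charset into a Unicode String. */
    static String toUnicode(byte[] raw, Charset source) {
        return new String(raw, source);
    }

    /** Encode a Unicode String into the charset the client requested. */
    static byte[] fromUnicode(String text, Charset target) {
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        // The bytes of "cafe" with an accented e, as stored in ISO-8859-1.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        String text = toUnicode(latin1, StandardCharsets.ISO_8859_1);
        byte[] utf8 = fromUnicode(text, StandardCharsets.UTF_8);
        System.out.println(text + " needs " + utf8.length + " bytes in UTF-8");

        // The East Asian charsets from the diagram are optional in the JDK,
        // so check for them before relying on them.
        for (String name : new String[] {"GBK", "Big5", "Shift_JIS"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```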
Searching Process
=================
Input/Output: "HTTP GET"/XML
Client Browser Input==(HTTP GET)==> WebLuceneServlet ==> XML Result Set==(XSLT)==> XHTML output ==> Output to Client Browser
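A minimal sketch of how a client might assemble the HTTP GET request for a search. The parameter names (dir, q, encoding, offset, size) are taken from the test entrance URL in the application file list of this README; the class name and the base URL in the usage note are hypothetical, and the encoding parameter is simply fixed to utf-8 here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical client-side helper, not part of WebLucene.
final class SearchUrlBuilder {

    /**
     * Build a search URL such as
     * base?dir=demo&q=keyword&encoding=utf-8&offset=10&size=10
     */
    static String build(String base, String dir, String query, int offset, int size) {
        try {
            return base
                + "?dir=" + URLEncoder.encode(dir, "UTF-8")
                + "&q=" + URLEncoder.encode(query, "UTF-8") // percent-encodes non-ASCII text
                + "&encoding=utf-8"
                + "&offset=" + offset
                + "&size=" + size;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

For example, SearchUrlBuilder.build("http://localhost:8080/weblucene/WebLucene", "demo", "keyword", 10, 10) yields a URL of the same shape as the test entrance, assuming the servlet is deployed under that (made-up) host and context path.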
XML format search result
========================
Lucene_result.dtd
Chinese_gbk.xml Simplified Chinese indexing source sample
Chinese_big5.xml Traditional Chinese indexing source sample
Japanese_sjis.xml Japanese indexing source sample
English_en.xml English indexing source sample
Every sample contains 5 articles.
indexing source: XML format
Lucene_index.dtd
WebLuceneSource
Document:
Field:
title author content pub_date lang meta_info(not stored)
Index:
all_idx: title + content + meta_info, for full text searching
author_idx: indexed without tokenizing, for author match
date_idx: indexed without tokenizing, for date range search
lang_idx: indexed without tokenizing, for language filter search
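The field-to-index mapping above can be sketched without Lucene itself: parse a source document with SAX and derive the four index values from its fields. This is only an illustration under stated assumptions. The element names WebLuceneSource and Document follow the outline above, but the exact XML layout is defined by the DTD, which is not reproduced here, so the sample markup and the class IndexSourceSketch are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the project's SAXIndexer.
final class IndexSourceSketch {

    /** Parse every Document element and return its stored fields plus derived index values. */
    static List<Map<String, String>> parse(String xml) {
        final List<Map<String, String>> records = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Map<String, String> current;              // fields of the open Document
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("Document")) {
                    current = new LinkedHashMap<>();
                }
                text.setLength(0);                             // start collecting fresh text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("Document")) {
                    records.add(withIndexes(current));
                    current = null;
                } else if (current != null) {
                    current.put(qName, text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse index source", e);
        }
        return records;
    }

    /** Apply the mapping from the outline: one full-text index, three untokenized ones. */
    static Map<String, String> withIndexes(Map<String, String> fields) {
        Map<String, String> doc = new LinkedHashMap<>(fields);
        doc.remove("meta_info");                               // indexed only, not stored
        doc.put("all_idx", String.join(" ",
            fields.getOrDefault("title", ""),
            fields.getOrDefault("content", ""),
            fields.getOrDefault("meta_info", "")));
        doc.put("author_idx", fields.getOrDefault("author", ""));
        doc.put("date_idx", fields.getOrDefault("pub_date", ""));
        doc.put("lang_idx", fields.getOrDefault("lang", ""));
        return doc;
    }
}
```

In a real indexer the all_idx value would go through the analyzer while the other three would be added as single untokenized terms; the map here only shows which source fields feed which index.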
searching result: XML format
WebLuceneResult: Simple search result
ResultSet:
Record: contents of the stored fields
Field name
Query
QueryString
OffSet
PageSize
OutputFormat
Filter field type(match/prefix/before/after)
SortType (score/doc/doc_desc)
result xslt transform:
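A minimal sketch of the server-side transform step using the JDK's javax.xml.transform API. The element names WebLuceneResult, ResultSet, and Record follow the outline above; the Title element, the inline stylesheet, and the class name are hypothetical stand-ins for the real result format and for html.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch, not the project's servlet code.
final class ResultTransformSketch {

    // Made-up search result; the real layout is defined by the results DTD.
    static final String RESULT_XML =
        "<WebLuceneResult><ResultSet>"
        + "<Record><Title>Hello Lucene</Title></Record>"
        + "<Record><Title>XML over HTTP</Title></Record>"
        + "</ResultSet></WebLuceneResult>";

    // Made-up stylesheet that renders each record title as a list item.
    static final String HTML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<ul><xsl:for-each select='WebLuceneResult/ResultSet/Record'>"
        + "<li><xsl:value-of select='Title'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template></xsl:stylesheet>";

    /** Apply the stylesheet text to the XML text and return the transformed output. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(RESULT_XML, HTML_XSL));
    }
}
```

Compiling a stylesheet is comparatively expensive, which is presumably why the project keeps an XsltCache; this sketch compiles on every call for brevity.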
source map:
1 lucene extension:
CJKTokenizer: a simple tokenizer supporting European and East Asian languages
org/apache/lucene/analysis/cjk/
IndexOrderSearcher: docID based result sorting
org/apache/lucene/search/
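This README does not describe how CJKTokenizer works internally. One common way to handle European and East Asian text in a single tokenizer is to emit lowercased words for alphabetic text and overlapping two-character tokens (bigrams) for CJK runs. The sketch below illustrates that idea with the JDK only; it is an assumption about the general technique, not the project's class, and the name BigramTokenizerSketch is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of word + CJK-bigram tokenizing; not the real CJKTokenizer.
final class BigramTokenizerSketch {

    static boolean isCjk(int cp) {
        Character.UnicodeScript script = Character.UnicodeScript.of(cp);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();   // current non-CJK word
        List<Integer> run = new ArrayList<>();      // current run of CJK code points
        int i = 0;
        while (i <= text.length()) {
            // Use -1 as an end-of-input marker so pending tokens get flushed.
            int cp = i < text.length() ? text.codePointAt(i) : -1;
            boolean cjk = cp >= 0 && isCjk(cp);
            boolean wordChar = cp >= 0 && !cjk && Character.isLetterOrDigit(cp);
            if (!wordChar && word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (!cjk && !run.isEmpty()) {
                flushRun(run, tokens);
            }
            if (wordChar) {
                word.appendCodePoint(Character.toLowerCase(cp));
            }
            if (cjk) {
                run.add(cp);
            }
            i += cp >= 0 ? Character.charCount(cp) : 1;
        }
        return tokens;
    }

    /** Emit overlapping bigrams for a CJK run; a lone character becomes its own token. */
    static void flushRun(List<Integer> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(new String(Character.toChars(run.get(0))));
        }
        for (int k = 0; k + 1 < run.size(); k++) {
            tokens.add(new StringBuilder()
                .appendCodePoint(run.get(k))
                .appendCodePoint(run.get(k + 1))
                .toString());
        }
        run.clear();
    }
}
```

Bigrams avoid the need for a dictionary-based word segmenter, at the cost of a larger index and some false matches across word boundaries.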
2 web application:
FileBasedPropertiesSupplier.java: properties supplier, supplies properties to SimplePropertiesFactory
PropertyFileFilenameFilter.java: properties file filter, filters files by filename
SimplePropertiesConsumer.java: properties consumer, acquires data from SimplePropertiesFactory
SimplePropertiesFactory.java: properties container
com/chedong/properties
ParamUtil.java: validates request parameters
RequestParser.java: a utility for parsing the HTTP request
com/chedong/util/
WebLuceneAdminServlet.java: global configuration viewer and re-loader
WebLucenePropertiesConsumer.java:
WebLucenePropertiesPreprocessor.java:
WebLuceneServlet.java: search entry point; constructs a WebLuceneQuery and chooses the correct XSLT to transform the XML output
com/chedong/weblucene/
SAXIndexer: SAX based lucene xml source indexer
com/chedong/weblucene/index/
DOMSearcher.java: invokes the Lucene IndexSearcher, highlights result hits, and converts the search result to XML
WebLuceneHighlighter.java: search result hit highlighter and abstractor
WebLuceneQuery.java: Search Query Bean
WebLuceneResultSet.java:
WebLuceneSearcherBase.java:
com/chedong/weblucene/search/
XsltCache: xslt transformer caching
com/chedong/xslt/
3 application file list:
BUILD.txt install document in Chinese
README.txt read me document
INSTALL.txt install document
CHANGES.txt change log
LICENSE.txt we use "The Apache Software License"
build.xml ant build file
webapp/
index.html test entrance ==> WebLucene?dir=demo&q=keyword&encoding=utf-8&offset=10&size=10
weblucene_results.dtd search results XML definition
weblucene_index.dtd indexing source XML definition
WEB-INF/
web.xml webapp configuration
src/ java source directory
test/ unit test directory
classes/ java classes directory
bin/ shell commands: java LuceneXMLIndexer input.xml output directory
conf/
weblucene.conf global config file
log4j.conf config file for log4j
blog.conf config file for demo, which can override the declaration in weblucene.conf
var/ lucene indices
blog/ demo directory
index/ lucene index library
html.xsl xslt template for html
rss.xsl xslt template for rss
lib/ include jar files
java-getopt.jar java command line get options
lucene.jar Lucene: core full text index engine
xerces.jar XML parser
xalan.jar XSLT
log4j.jar logger
README.txt download locations for the jar files