weblucene

Category: Java programming
Development tool: Java
File size: 2823KB
Downloads: 124
Upload date: 2005-09-21 10:04:14
Uploader: 蓝天白云碧野
Description: A web interface for Lucene that uses XML as a lightweight protocol. Developers convert a data source (text, DB, MS Word, PDF, etc.) into XML, index it with the Lucene engine, and retrieve full-text search results over HTTP as XML. The output is easy to integrate with a JSP, ASP, or PHP front end, or to transform on the server side with XSLT.
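
The description above mentions transforming the XML output with XSLT on the server side. The following is a minimal sketch of that step using the JDK's standard javax.xml.transform API. The result document and the stylesheet are assumptions made up for illustration: the element names WebLuceneResult, ResultSet, Record, and title are guesses, not the project's actual DTD, and ResultXsltDemo is a hypothetical class that is not part of WebLucene.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo class; not part of the WebLucene sources. */
final class ResultXsltDemo {
    // Assumed shape of a search result; the real element names may differ.
    static final String RESULT_XML =
          "<WebLuceneResult><ResultSet>"
        + "<Record><title>Hello Lucene</title></Record>"
        + "<Record><title>XML and XSLT</title></Record>"
        + "</ResultSet></WebLuceneResult>";

    // Illustrative stylesheet that renders each record title as a list item.
    static final String HTML_XSL =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<ul><xsl:for-each select='WebLuceneResult/ResultSet/Record'>"
        + "<li><xsl:value-of select='title'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet text to the XML text and returns the output. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers both stylesheet compilation and transformation errors.
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(RESULT_XML, HTML_XSL));
    }
}
```

Because the front end only sees XML, the same result could be rendered by a different stylesheet (for example an RSS one) without touching the search code.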

File list:
weblucene (0, 2005-05-17)
weblucene\.checkstyle (85, 2004-10-04)
weblucene\.cvsignore (34, 2003-06-03)
weblucene\.project (578, 2003-06-03)
weblucene\build.properties.default (441, 2003-11-30)
weblucene\BUILD.txt (8217, 2004-11-21)
weblucene\build.xml (6592, 2004-10-04)
weblucene\CHANGES.txt (5461, 2004-11-21)
weblucene\CVS (0, 2005-05-17)
weblucene\CVS\Entries (573, 2004-11-21)
weblucene\CVS\Repository (29, 2004-10-06)
weblucene\CVS\Root (53, 2004-10-06)
weblucene\dump (0, 2005-05-17)
weblucene\dump\blog.xml (69607, 2004-10-04)
weblucene\dump\blogchina.inc.sample (230, 2003-12-16)
weblucene\dump\blog_dump.php (3435, 2004-10-04)
weblucene\dump\check_freshness.php (4106, 2004-02-26)
weblucene\dump\comments.xml (24656, 2003-12-16)
weblucene\dump\comments_dump.php (3180, 2003-12-16)
weblucene\dump\CVS (0, 2005-05-17)
weblucene\dump\CVS\Entries (460, 2004-10-06)
weblucene\dump\CVS\Repository (34, 2004-10-06)
weblucene\dump\CVS\Root (53, 2004-10-06)
weblucene\dump\dump.sh (138, 2004-10-04)
weblucene\dump\include.php (2389, 2004-10-04)
weblucene\dump\index.sh (479, 2004-10-04)
weblucene\dump\task_dump.php (1533, 2004-02-26)
weblucene\INSTALL.txt (7649, 2004-11-21)
weblucene\jalopy.xml (13825, 2003-06-03)
weblucene\LICENSE.txt (2670, 2003-06-03)
weblucene\TODO.txt (189, 2003-06-03)
weblucene\webapp (0, 2005-05-17)
weblucene\webapp\CVS (0, 2005-05-17)
weblucene\webapp\CVS\Entries (305, 2004-11-21)
weblucene\webapp\CVS\Repository (36, 2004-10-06)
weblucene\webapp\CVS\Root (53, 2004-10-06)
weblucene\webapp\rss.png (282, 2003-12-04)
weblucene\webapp\search.html (1218, 2004-08-24)
weblucene\webapp\search_tab.html (1219, 2004-08-24)
... ...

$Id: README.txt,v 1.5 2004/10/30 11:36:18 lhelper Exp $

For Chinese documentation, see: http://www.chedong.com/tech/weblucene.html

WebLucene
=========
WebLucene is a web interface for Lucene that uses XML as a lightweight protocol. Developers convert a data source (text, DB, MS Word, PDF, etc.) into XML, index it with the Lucene engine, and retrieve full-text search results over HTTP as XML. The output is easy to integrate with a JSP, ASP, or PHP front end, or to transform on the server side with XSLT.

Indexing Process
================

    MySQL   \                                                /  JSP
    Oracle  -  DB  -  ==> XML ==> (Lucene Index) ==> XML     -  ASP
    MSSQL   /                                                -  PHP
    MS Word /                                    \           /  XHTML
    PDF     /                                     =XSLT=>    -  TEXT
                                                             \  XML
                      \_______WebLucene_______/

i18n issue: because Java is Unicode based, users can index data sources (XML) in different charsets into one Lucene index (in Unicode) and output results according to the languages the client browser supports.

    GBK        \                              /  BIG5
    BIG5       -  UNICODE  ====>  Unicode     -  GB2312
    SJIS       -   (XML)           (XML)      -  SJIS
    ISO-8859-1 /                              \  ISO-8859-1

Searching Process
=================
Input/Output: "HTTP GET"/XML

    Client Browser Input ==(HTTP GET)==> WebLuceneServlet ==> XML Result Set
        ==(XSLT)==> XHTML output ==> Output to Client Browser

XML format search result
========================
    Lucene_result.dtd
    Chinese_gbk.xml     Simplified Chinese indexing source sample
    Chinese_big5.xml    Traditional Chinese indexing source sample
    Japanese_sjis.xml   Japanese indexing source sample
    English_en.xml      English indexing source sample

Every sample contains 5 articles.

indexing source: XML format
    Lucene_index.dtd
    WebLuceneSource
        Document:
            Field:
                title
                author
                content
                pub_date
                lang
                meta_info (not stored)
            Index:
                all_idx:    title + content + meta_info, for full-text searching
                author_idx: indexed only, not tokenized, for author match
                date_idx:   indexed only, not tokenized, for date range search
                lang_idx:   indexed only, not tokenized, for language filter search

searching result: XML format
    WebLuceneResult: simple search result
        ResultSet:
            Record: contents, with stored Field name
        Query
            QueryString
            OffSet
            PageSize
            OutputFormat
            Filter field type (match/prefix/before/after)
            SortType (score/doc/doc_desc)

result xslt transform:

source map:

1 lucene extension:
    CJKTokenizer: a simple tokenizer that supports European and East Asian languages
        org/apache/lucene/analysis/cjk/
    IndexOrderSearcher: docID-based result sorting
        org/apache/lucene/search/

2 web application:
    FileBasedPropertiesSupplier.java: properties supplier, supplies properties to SimplePropertiesFactory
    PropertyFileFilenameFilter.java: properties file filter, filters files by filename
    SimplePropertiesConsumer.java: properties consumer, acquires data from SimplePropertiesFactory
    SimplePropertiesFactory.java: properties container
        com/chedong/properties

    ParamUtil.java: validates request parameters
    RequestParser.java: a utility for parsing the HTTP request
        com/chedong/util/

    WebLuceneAdminServlet.java: global configuration viewer and re-loader
    WebLucenePropertiesConsumer.java:
    WebLucenePropertiesPreprocessor.java:
    WebLuceneServlet.java: search entrance; constructs a WebLuceneQuery and chooses the correct XSLT to transform the XML output
        com/chedong/weblucene/

    SAXIndexer: SAX-based Lucene XML source indexer
        com/chedong/weblucene/index/

    DOMSearcher.java: invokes the Lucene IndexSearcher, highlights result hits, and converts the search result to XML
    WebLuceneHighlighter.java: search result hit highlighter and abstractor
    WebLuceneQuery.java: search query bean
    WebLuceneResultSet.java:
    WebLuceneSearcherBase.java:
        com/chedong/weblucene/search/

    XsltCache: XSLT transformer caching
        com/chedong/xslt/

3 application file list:
    BUILD.txt       install document in Chinese
    README.txt      read me document
    INSTALL.txt     install document
    CHANGES.txt     change log
    LICENSE.txt     we use "The Apache Software License"
    build.xml       ant build file
    webapp/
        index.html              test entrance ==> WebLucene?dir=demo&q=keyword&encoding=utf-8&offset=10&size=10
        weblucene_results.dtd   search results XML definition
        weblucene_index.dtd     indexing source XML definition
        WEB-INF/
            web.xml     webapp configuration
            src/        java source directory
            test/       unit test directory
            classes/    java classes directory
            bin/        shell commands: java LuceneXMLIndexer input.xml output directory
            conf/
                weblucene.conf  global config file
                log4j.conf      config file for log4j
                blog.conf       config file for the demo, which can override the declarations in weblucene.conf
            var/        lucene indices
                blog/   demo directory
                    index/      lucene index library
                    html.xsl    xslt template for html
                    rss.xsl     xslt template for rss
            lib/        included jar files
                java-getopt.jar     java command line get options
                lucene.jar          Lucene: core full-text index engine
                xerces.jar          XML parser
                xalan.jar           XSLT
                log4j.jar           logger
                README.txt          the jar file download path
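
The test entrance listed under webapp/ takes all of its parameters from an HTTP GET query string (dir, q, encoding, offset, size). As a rough client-side sketch, the request URL could be assembled as below. The host and context path are hypothetical, WebLuceneUrl is an illustrative helper that is not part of the project, and encoding the query text in the same charset named by the encoding parameter is an assumption about how the servlet decodes it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper; not part of the WebLucene sources. */
final class WebLuceneUrl {
    /**
     * Builds a search URL using the parameter names from the README's
     * test entrance: dir, q, encoding, offset, size.
     */
    static String build(String baseUrl, String dir, String query,
                        String encoding, int offset, int size) {
        try {
            // Assumption: the servlet decodes q using the charset in "encoding".
            return baseUrl
                    + "?dir=" + URLEncoder.encode(dir, encoding)
                    + "&q=" + URLEncoder.encode(query, encoding)
                    + "&encoding=" + encoding
                    + "&offset=" + offset
                    + "&size=" + size;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // The base URL is a made-up local deployment path.
        System.out.println(build("http://localhost:8080/weblucene/WebLucene",
                "demo", "lucene search", "utf-8", 10, 10));
    }
}
```

The response would be the XML result set described above, which a JSP, ASP, or PHP front end can parse, or which the servlet can pass through html.xsl or rss.xsl.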
