siuying_segment

Category: Graphics and Imaging
Development tool: Java
File size: 1633 KB
Downloads: 9
Upload date: 2007-09-02 17:07:58
Uploader: gytlfj
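Description: Given the sentence "我是中國人" ("I am Chinese"), ChineseTokenizer splits it into five single-character tokens: "我", "是", "中", "國", "人". CJKTokenizer instead splits it into four overlapping two-character tokens: "我是", "是中", "中國", "國人". The problem with the former is that it ignores Chinese word boundaries, so a search for "國中" also matches "我是中國人", because both characters occur in it. The problem with the latter is that it produces many meaningless tokens such as "是中" and "國人", which needlessly enlarges the index and reduces search efficiency.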
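
The two splitting strategies described above can be illustrated with a minimal sketch in plain Java. It does not use the Lucene classes from this archive. The class name NGramSketch and the helpers unigrams and bigrams are hypothetical names chosen for illustration, and the "bag of tokens" match in main is a simplification of how a search engine matches terms. The sketch also assumes every character fits in a single Java char, which holds for the example sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical helpers, not the Lucene tokenizers themselves.
class NGramSketch {

    // One token per character, the behaviour the description attributes to ChineseTokenizer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, the behaviour the description attributes to CJKTokenizer.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "我是中國人";
        String query = "國中";

        System.out.println(unigrams(text)); // [我, 是, 中, 國, 人]
        System.out.println(bigrams(text));  // [我是, 是中, 中國, 國人]

        // Simplified matching: the query matches if all of its tokens occur in the text.
        System.out.println(unigrams(text).containsAll(unigrams(query))); // true (false positive)
        System.out.println(bigrams(text).containsAll(bigrams(query)));   // false
    }
}
```

For a sentence of n characters, the single-character split yields n tokens and the two-character split yields n - 1 tokens, but the two-character vocabulary is far larger, which is where the index growth mentioned above comes from.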

File list:
build.xml (1336, 2005-03-11)
src\org\apache\lucene\analysis\cjk\CJKAnalyzer.java (4615, 2005-02-24)
src\org\apache\lucene\analysis\cjk\CJKTokenizer.java (10402, 2005-02-24)
src\org\apache\lucene\analysis\cjk (0, 2005-02-24)
src\org\apache\lucene\analysis\cn\ChineseAnalyzer.java (3567, 2005-03-15)
src\org\apache\lucene\analysis\cn\ChineseFilter.java (4921, 2005-03-15)
src\org\apache\lucene\analysis\cn\ChineseTokenizer.java (5573, 2005-03-15)
src\org\apache\lucene\analysis\cn (0, 2005-03-16)
src\org\apache\lucene\analysis\cw\bothlexu8.txt (1979632, 2002-03-03)
src\org\apache\lucene\analysis\cw\CharStream.java (3798, 2005-03-11)
src\org\apache\lucene\analysis\cw\CStandardTokenizer.java (5744, 2005-03-11)
src\org\apache\lucene\analysis\cw\CStandardTokenizer.jj (6861, 2005-03-11)
src\org\apache\lucene\analysis\cw\CStandardTokenizerConstants.java (734, 2005-03-11)
src\org\apache\lucene\analysis\cw\CStandardTokenizerTokenManager.java (39991, 2005-03-11)
src\org\apache\lucene\analysis\cw\CWordAnalyzer.java (3489, 2005-03-11)
src\org\apache\lucene\analysis\cw\CWordFilter.java (4699, 2005-03-11)
src\org\apache\lucene\analysis\cw\CWordFilter.java~ (4557, 2005-03-11)
src\org\apache\lucene\analysis\cw\CWordTokenizer.java (2356, 2005-03-11)
src\org\apache\lucene\analysis\cw\CWordTokenizer.java~ (2020, 2005-03-11)
src\org\apache\lucene\analysis\cw\data\sforeign_u8.txt (755, 1999-08-28)
src\org\apache\lucene\analysis\cw\data\snotname_u8.txt (192, 1999-08-25)
src\org\apache\lucene\analysis\cw\data\snumbers_u8.txt (246, 2002-12-07)
src\org\apache\lucene\analysis\cw\data\ssurname_u8.txt (1703, 1999-08-25)
src\org\apache\lucene\analysis\cw\data\tforeign_u8.txt (781, 1999-10-23)
src\org\apache\lucene\analysis\cw\data\tnotname_u8.txt (188, 1999-08-25)
src\org\apache\lucene\analysis\cw\data\tnumbers_u8.txt (245, 2002-12-07)
src\org\apache\lucene\analysis\cw\data\tsurname_u8.txt (1703, 1999-08-25)
src\org\apache\lucene\analysis\cw\data (0, 2005-03-11)
src\org\apache\lucene\analysis\cw\ParseException.java (6566, 2005-03-11)
src\org\apache\lucene\analysis\cw\Segmenter.jav.old (32728, 2005-03-11)
src\org\apache\lucene\analysis\cw\segmenter.java (23840, 2005-03-11)
src\org\apache\lucene\analysis\cw\segmenter.java~ (23840, 2005-03-11)
src\org\apache\lucene\analysis\cw\SegmenterUtils.java (3046, 2005-03-11)
src\org\apache\lucene\analysis\cw\SegmenterUtils.java~ (2599, 2005-03-11)
src\org\apache\lucene\analysis\cw\simplexu8.txt (1310830, 2003-11-29)
src\org\apache\lucene\analysis\cw\test\SegmenterUtilsTest.java (1781, 2005-03-11)
src\org\apache\lucene\analysis\cw\test\SegmenterUtilsTest.java~ (2038, 2005-03-11)
src\org\apache\lucene\analysis\cw\test (0, 2005-03-11)
src\org\apache\lucene\analysis\cw\Token.java (2757, 2005-03-11)
src\org\apache\lucene\analysis\cw\TokenMgrError.java (4353, 2005-03-11)
... ...
