Chinese-Word-Segment-And-POS-Tagger

Category: Speech synthesis
Development tool: Python
File size: 3940KB
Downloads: 143
Upload date: 2011-02-19 12:55:54
Uploader: wlzmwq
Description: Implements a Chinese word segmentation and part-of-speech (POS) tagging program. Segmentation uses "three-word forward maximum matching". POS tagging uses an HMM, decoded with the Viterbi algorithm. "Three-word forward maximum matching" keeps the speed of the plain forward maximum matching algorithm while improving segmentation accuracy.
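The description does not spell out how the "three-word" variant works, so the following is only a sketch under assumptions. It shows plain forward maximum matching next to one plausible reading of the three-word idea: look ahead over chunks of up to three words and keep the first word of the best chunk. The tie-breaking rules (largest average word length, then smallest variance) are borrowed from the well-known MMSEG rules and are an assumption, as are all function names, the toy vocabulary, and `max_len`; none of this is taken from seg.py.

```python
def forward_max_match(text, vocab, max_len=4):
    """Plain forward maximum matching: always take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # a single character is the fallback
                words.append(piece)
                i += size
                break
    return words


def candidates(text, start, vocab, max_len):
    """The single character at `start`, plus every dictionary word starting there."""
    found = [text[start:start + 1]]
    for size in range(2, min(max_len, len(text) - start) + 1):
        if text[start:start + size] in vocab:
            found.append(text[start:start + size])
    return found


def chunks(text, start, vocab, max_len, depth=3):
    """Yield every run of up to `depth` consecutive candidate words from `start`."""
    if depth == 0 or start >= len(text):
        yield []
        return
    for word in candidates(text, start, vocab, max_len):
        for rest in chunks(text, start + len(word), vocab, max_len, depth - 1):
            yield [word] + rest


def chunk_score(chunk):
    """Assumed ranking: total length, then average length, then low variance."""
    lengths = [len(w) for w in chunk]
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return (sum(lengths), mean, -variance)


def three_word_match(text, vocab, max_len=4):
    """Keep the first word of the best three-word chunk, then repeat."""
    words, i = [], 0
    while i < len(text):
        best = max(chunks(text, i, vocab, max_len), key=chunk_score)
        words.append(best[0])
        i += len(best[0])
    return words


# Toy vocabulary and sentence (hypothetical, not from data\diction.txt).
vocab = {"研究", "研究生", "生命", "起源"}
text = "研究生命起源"
greedy = forward_max_match(text, vocab)    # ['研究生', '命', '起源']
lookahead = three_word_match(text, vocab)  # ['研究', '生命', '起源']
```

On this sentence the greedy matcher commits to "研究生" and strands "命", while the lookahead version prefers the chunk with evenly sized words. The lookahead costs more per step than the greedy scan, but the work stays bounded by `max_len` cubed per position.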

File list:
SegAndTag\chnsegtager_segtag_200828016029024.py (1509, 2009-05-07)
SegAndTag\CovertToUTF-8.py (573, 2009-05-06)
SegAndTag\dict.py (682, 2009-05-06)
SegAndTag\dict.pyc (1521, 2009-05-07)
SegAndTag\diction.py (2948, 2010-05-19)
SegAndTag\diction.py.bak (2873, 2010-05-19)
SegAndTag\seg.py (4155, 2009-05-07)
SegAndTag\seg.pyc (3830, 2009-05-07)
SegAndTag\selecttool.py (1542, 2009-05-07)
SegAndTag\selecttool.pyc (1928, 2009-05-07)
SegAndTag\viterbi.py (4822, 2009-05-07)
SegAndTag\viterbi.pyc (4717, 2009-05-07)
SegAndTag\word.py (367, 2009-05-06)
SegAndTag\word.pyc (1197, 2009-05-07)
SegAndTag\__init__.py (0, 2009-05-06)
data\dict.dat (5540923, 2009-05-06)
data\diction.txt (995341, 2009-05-06)
data\segoutput.txt (107813, 2009-05-07)
data\tagoutput.txt (147928, 2009-05-07)
data\testinput.txt (88778, 2009-05-06)
data\utf8train.txt (10670780, 2009-05-06)
SegAndTag (0, 2010-05-19)
data (0, 2009-05-07)
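
The description says POS tagging uses an HMM decoded with the Viterbi algorithm. As a rough illustration of that step, here is a minimal Viterbi decoder in log space. It is not taken from viterbi.py in the archive: the function name, the two-tag set, the `floor` smoothing constant, and every probability below are made-up assumptions chosen only to keep the example small.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `observations` under an HMM."""
    if not observations:
        return []

    def lg(p):
        # Floor zero probabilities so unseen events do not produce log(0).
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p].get(s, 0))) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + lg(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical numbers): "n" = noun, "v" = verb.
states = ["n", "v"]
start_p = {"n": 0.4, "v": 0.6}
trans_p = {"n": {"n": 0.5, "v": 0.5}, "v": {"n": 0.8, "v": 0.2}}
emit_p = {
    "n": {"研究": 0.2, "生命": 0.4, "起源": 0.4},
    "v": {"研究": 0.9, "起源": 0.1},
}
words = ["研究", "生命", "起源"]
tags = viterbi(words, states, start_p, trans_p, emit_p)  # ['v', 'n', 'n']
```

Working in log space avoids numeric underflow on long sentences. In a real tagger the three probability tables would be estimated from a tagged corpus rather than written by hand.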
