personNER

Category: Pattern recognition (vision, speech, etc.)
Development tool: Java
File size: 7298KB
Downloads: 93
Upload date: 2007-07-10 11:20:18
Uploader: yaersilan
Description: Source code for a tool that recognizes person names in text, based on the CRF (conditional random fields) statistical model; part of the open-source Mallet project.

File list:
personNER\personNER\mallet\bin\CVS\Entries (57, 2004-01-21)
personNER\personNER\mallet\bin\CVS\Repository (20, 2004-01-21)
personNER\personNER\mallet\bin\CVS\Root (31, 2004-01-21)
personNER\personNER\mallet\bin\personNER (319, 2004-01-21)
personNER\personNER\mallet\bin\prepend-license.sh (61, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage$1.class (806, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage$ClassificationComparator.class (1334, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage.class (8136, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\ConfusionMatrix.class (5745, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph$Legend.class (1848, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph.class (4327, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph2.class (2512, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\GraphItem.class (562, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\examples\DocumentClassifier.class (3112, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestClassifiers.class (4139, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestMaxEntTrainer.class (2880, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestNaiveBayes.class (7571, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AccuracyEvaluator.class (1648, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AdaBoost.class (1823, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AdaBoostTrainer.class (4124, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ApproximityPruning.class (6429, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\BaggingClassifier.class (1874, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\BaggingTrainer.class (2105, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Boostable.class (133, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Classification.class (2763, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Classifier.class (6689, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ClassifierEvaluating.class (364, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ClassifierTrainer.class (8651, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ConfidencePredictingClassifier.class (4329, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ConfidencePredictingClassifierTrainer.class (5481, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTree$Node.class (6798, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTree.class (4881, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTreeTrainer.class (4200, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\FeatureSelectingClassifierTrainer.class (1603, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Kernel.class (2844, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\LinearKernel.class (870, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt.class (6733, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt2.class (14748, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt3.class (14596, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt4.class (13904, 2004-01-21)
... ...

This directory contains a MALLET-based Person Name Extractor trained on ~1000 Enron email messages (calendar text entries, actually) labeled by William Cohen and others at CMU. We conducted a 4-fold cross validation on this data, and the average field-level F1 score is 0.8103.

*********************************************

To run the program, type:

  mallet/bin/personNER outFile inFile1 inFile2 ...

*********************************************

The program requires that Java version 1.4.x be in your PATH.

The argument "outFile" specifies a file in which to save the extraction results. This output is in the same "stand-off annotation" format used by William to label the data. The program will produce a large amount of diagnostic output on standard out, which can be ignored. (In fact, pipe it to /dev/null to increase speed.)

Here's an example of the output from a run on the Enron data:

  addToType baughman-d__calendar__1 237 11 predicted_name
  addToType baughman-d__calendar__11 584 11 predicted_name
  addToType baughman-d__calendar__11 849 13 predicted_name
  addToType baughman-d__calendar__12 161 5 predicted_name

Each line corresponds to one person name.
--- The first column can be ignored.
--- The second column is the input file name.
--- The third and fourth columns are the start position of the person name and its length.
--- The fifth column is the label.

*********************************************

The arguments "inFile1 inFile2 ..." specify the input files. There are sample files under the directory "samples", and the program uses them as input if no input files are specified.

*********************************************

If you have any questions, please contact Andrew McCallum or Wei Li, {mccallum,weili}@cs.umass.edu
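
The five-column output format described above is simple enough to read back programmatically. The following is a minimal sketch of a reader for it, not code from the personNER distribution: the class name StandoffSpans and its methods are hypothetical, and it assumes that columns are separated by whitespace, that file names contain no spaces, and that the start position is a 0-based character offset (the README does not say whether offsets are 0-based or counted in characters or bytes). It uses generics, so it needs a newer Java than the 1.4.x required by the tool itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of MALLET or the personNER distribution.
class StandoffSpans {

    /** One predicted name: input file, start position, length, and label. */
    static final class Span {
        final String file;
        final int start;
        final int length;
        final String label;

        Span(String file, int start, int length, String label) {
            this.file = file;
            this.start = start;
            this.length = length;
            this.label = label;
        }

        /** Cuts the name out of the file's text. ASSUMPTION: 0-based character offsets. */
        String extract(String text) {
            return text.substring(start, start + length);
        }
    }

    /** Parses lines of the form: addToType FILE START LENGTH LABEL. */
    static List<Span> parse(List<String> lines) {
        List<Span> spans = new ArrayList<Span>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip blank lines and anything that is not a five-column addToType record.
            if (cols.length != 5 || !cols[0].equals("addToType")) {
                continue;
            }
            spans.add(new Span(cols[1],
                               Integer.parseInt(cols[2]),
                               Integer.parseInt(cols[3]),
                               cols[4]));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Two of the sample lines shown in the README.
        List<String> lines = Arrays.asList(
            "addToType baughman-d__calendar__1 237 11 predicted_name",
            "addToType baughman-d__calendar__12 161 5 predicted_name");
        for (Span s : parse(lines)) {
            System.out.println(s.file + " start=" + s.start
                               + " length=" + s.length + " label=" + s.label);
        }
    }
}
```

To recover the name strings themselves, read each input file into a String and call extract on it with the matching Span. If the offsets turn out to be byte-based or 1-based, extract is the only method that needs adjusting.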
