personNER
Category: Pattern recognition (vision/speech, etc.)
Development tool: Java
File size: 7298 KB
Downloads: 93
Upload date: 2007-07-10 11:20:18
Uploader: yaersilan
Description: Source code for a tool that recognizes person names in text, based on the CRF (conditional random fields) statistical model. It is part of the open-source Mallet project.
File list:
personNER\personNER\mallet\bin\CVS\Entries (57, 2004-01-21)
personNER\personNER\mallet\bin\CVS\Repository (20, 2004-01-21)
personNER\personNER\mallet\bin\CVS\Root (31, 2004-01-21)
personNER\personNER\mallet\bin\personNER (319, 2004-01-21)
personNER\personNER\mallet\bin\prepend-license.sh (61, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage$1.class (806, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage$ClassificationComparator.class (1334, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\AccuracyCoverage.class (8136, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\ConfusionMatrix.class (5745, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph$Legend.class (1848, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph.class (4327, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\Graph2.class (2512, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\evaluate\GraphItem.class (562, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\examples\DocumentClassifier.class (3112, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestClassifiers.class (4139, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestMaxEntTrainer.class (2880, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\tests\TestNaiveBayes.class (7571, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AccuracyEvaluator.class (1648, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AdaBoost.class (1823, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\AdaBoostTrainer.class (4124, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ApproximityPruning.class (6429, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\BaggingClassifier.class (1874, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\BaggingTrainer.class (2105, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Boostable.class (133, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Classification.class (2763, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Classifier.class (6689, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ClassifierEvaluating.class (364, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ClassifierTrainer.class (8651, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ConfidencePredictingClassifier.class (4329, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\ConfidencePredictingClassifierTrainer.class (5481, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTree$Node.class (6798, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTree.class (4881, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\DecisionTreeTrainer.class (4200, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\FeatureSelectingClassifierTrainer.class (1603, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\Kernel.class (2844, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\LinearKernel.class (870, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt.class (6733, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt2.class (14748, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt3.class (14596, 2004-01-21)
personNER\personNER\mallet\class\edu\umass\cs\mallet\base\classify\MaxEnt4.class (13904, 2004-01-21)
... ...
This directory contains a MALLET-based Person Name Extractor trained
on ~1000 Enron email messages (more precisely, calendar text entries)
labeled by William Cohen and others at CMU.
We ran a 4-fold cross-validation on this data; the average
field-level F1 score was 0.8103.
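
As an illustration of how a field-level F1 score and its average over
folds can be computed, here is a minimal Java sketch. It is not the
evaluation code used for the number above. It assumes that
"field-level" means a predicted name counts as correct only when its
file, start position, and length all match a labeled name exactly. The
class name FieldF1 and the "file:start:length" span keys are
hypothetical, and the spans in main are made-up placeholders, not
Enron data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: exact-match F1 over name spans, plus a fold average. */
final class FieldF1 {

    /**
     * F1 for one fold. Each span is encoded as "file:start:length"
     * (an assumed encoding chosen for this sketch).
     */
    static double f1(Set<String> gold, Set<String> predicted) {
        Set<String> correct = new HashSet<>(predicted);
        correct.retainAll(gold);               // spans that match exactly
        if (correct.isEmpty()) {
            return 0.0;                        // also covers empty gold or predicted sets
        }
        double precision = (double) correct.size() / predicted.size();
        double recall = (double) correct.size() / gold.size();
        return 2 * precision * recall / (precision + recall);
    }

    /** Unweighted mean of the per-fold scores. */
    static double average(double[] foldScores) {
        if (foldScores.length == 0) {
            throw new IllegalArgumentException("no fold scores given");
        }
        double sum = 0.0;
        for (double score : foldScores) {
            sum += score;
        }
        return sum / foldScores.length;
    }

    public static void main(String[] args) {
        // Placeholder spans for a single fold, for illustration only.
        Set<String> gold = new HashSet<>(Arrays.asList("a:0:3", "a:10:4", "b:5:2"));
        Set<String> predicted = new HashSet<>(Arrays.asList("a:0:3", "b:5:2", "b:9:1", "c:1:1"));
        System.out.println("fold F1 = " + f1(gold, predicted));
    }
}
```

With four folds, the four values returned by f1 would be passed to
average to get a single summary score.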
*********************************************
To run the program, type:
mallet/bin/personNER outFile inFile1 inFile2 ...
*********************************************
The program requires Java version 1.4.x to be on your PATH.
The argument "outFile" specifies a file in which to save the
extraction results. This output is in the same "stand-off annotation"
format used by William Cohen to label the data.
The program will produce a large amount of diagnostic output on
standard output, which can be ignored. (In fact, pipe it to /dev/null
to increase speed.)
Here is an example of the output from running on the Enron data:
addToType baughman-d__calendar__1 237 11 predicted_name
addToType baughman-d__calendar__11 584 11 predicted_name
addToType baughman-d__calendar__11 849 13 predicted_name
addToType baughman-d__calendar__12 161 5 predicted_name
Each line corresponds to one person name.
--- The first column can be ignored.
--- The second column is the input file name.
--- The third and fourth columns are the start position of the person
name and its length.
--- The fifth column is the label.
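
The column layout above can be read back with a few lines of Java. The
following is a minimal sketch, not part of the distribution. The class
name NameAnnotation is hypothetical, and two things are assumptions
rather than facts from this README: that file names contain no
whitespace, and that the start position is a zero-based character
offset into the input file's text.

```java
/**
 * Hypothetical holder for one line of the extractor's output, e.g.
 * "addToType baughman-d__calendar__1 237 11 predicted_name".
 */
final class NameAnnotation {
    final String fileName; // second column: input file name
    final int start;       // third column: start position of the name
    final int length;      // fourth column: length of the name
    final String label;    // fifth column: label, e.g. predicted_name

    NameAnnotation(String fileName, int start, int length, String label) {
        this.fileName = fileName;
        this.start = start;
        this.length = length;
        this.label = label;
    }

    /** Parses one whitespace-separated line; the first column is ignored. */
    static NameAnnotation parse(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 5) {
            throw new IllegalArgumentException(
                    "expected 5 columns but found " + cols.length + ": " + line);
        }
        return new NameAnnotation(
                cols[1], Integer.parseInt(cols[2]), Integer.parseInt(cols[3]), cols[4]);
    }

    /**
     * Returns the annotated substring of the given file text.
     * ASSUMPTION: start is a zero-based character offset.
     */
    String extractFrom(String text) {
        return text.substring(start, start + length);
    }

    public static void main(String[] args) {
        NameAnnotation a =
                parse("addToType baughman-d__calendar__1 237 11 predicted_name");
        System.out.println(a.fileName + " start=" + a.start
                + " length=" + a.length + " label=" + a.label);
    }
}
```

If the offset assumption turns out to be wrong for your data (for
example, if positions are byte offsets), only extractFrom needs to
change.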
*********************************************
The arguments "inFile1 inFile2 ..." specify the input files. Sample
files are provided in the "samples" directory, and the program uses
them as input if no input files are specified.
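
If you want to launch the extractor from another Java program, the
invocation described above can be wrapped roughly as follows. This is
a sketch under stated assumptions, not part of the distribution: the
class name RunPersonNer is hypothetical, the working directory is
assumed to contain the mallet/ directory, the launcher is assumed to
be executable on a Unix-like system, and the wrapper itself needs
Java 9 or later for ProcessBuilder.Redirect.DISCARD (independent of
the Java version the extractor requires). Discarding standard output
plays the same role as piping it to /dev/null.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the mallet/bin/personNER launcher. */
final class RunPersonNer {

    /** Builds the command line: mallet/bin/personNER outFile inFile1 inFile2 ... */
    static List<String> buildCommand(String outFile, List<String> inFiles) {
        List<String> command = new ArrayList<>();
        command.add("mallet/bin/personNER");
        command.add(outFile);
        command.addAll(inFiles); // may be empty; the README says samples are then used
        return command;
    }

    /** Runs the extractor, discards its standard output, and returns its exit code. */
    static int run(String outFile, List<String> inFiles)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(outFile, inFiles));
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD); // drop diagnostic output
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);  // keep error messages visible
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: RunPersonNer outFile [inFile ...]");
            System.exit(2);
        }
        List<String> inFiles = Arrays.asList(args).subList(1, args.length);
        System.exit(run(args[0], inFiles));
    }
}
```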
*********************************************
If you have any questions, please contact Andrew McCallum or Wei Li,
{mccallum,weili}@cs.umass.edu