spanish_coref: a parsing and co-reference resolution pipeline for Spanish

  • G0_479289
    Uploader
  • 29MB
    File size
  • zip
    File format
  • 2022-06-12 04:03
    Upload date
This package contains a complete analysis pipeline for Spanish. It takes text (one sentence per line) as input and performs PoS tagging, named entity recognition and classification, parsing, and co-reference resolution. The output format is CoNLL annotated with entities. If you use the pipeline, please cite the paper listed in the description.
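Since the pipeline expects one sentence per line, raw text usually needs a preprocessing pass first. The following is a minimal sketch of such a pass, not part of the package: the function name `to_sentence_lines` is hypothetical, and the splitting rule (break after `.`, `!` or `?` followed by whitespace) is a naive assumption that will mis-split abbreviations such as "Sr." and should be replaced by a proper sentence splitter for real data.

```python
import re


def to_sentence_lines(text: str) -> str:
    """Return `text` with one sentence per line (naive heuristic).

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    # Collapse all whitespace (including newlines) into single spaces.
    flat = " ".join(text.split())
    # Split at whitespace that follows sentence-final punctuation.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return "\n".join(p for p in parts if p)


if __name__ == "__main__":
    print(to_sentence_lines("Eso es una prueba. ¿Funciona?\nSí."))
```

The result can be written to a file and passed to the pipeline as input; use `./parse.pm --help` to check which input formats the parser actually accepts.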
spanish_coref-master.zip
  • spanish_coref-master
  • coref_example.sh
    4KB
  • models
  • splitDatesModel.mco
    14.3MB
  • 3gram_enhancedAncora.model
    52MB
  • parse.pm
    12.6KB
  • es.cfg
    1022B
  • maltparser_tools
  • src
  • test.conll
    658.3KB
  • MaltParserServer.java
    5.8KB
  • MPClient.java
    3.5KB
  • .classpath
    391B
  • Readme.md
    4.1KB
  • LICENSE
    34.3KB
  • squoia
  • crf2conll.pm
    20KB
  • util.pm
    22.6KB
  • FreeLingModules
  • server_squoia.cc
    21.6KB
  • output_crf.cc
    7.6KB
  • grammar_es
  • afixos_desr.dat
    32.2KB
  • tagset.dat
    1.7KB
  • dicc_squoia.src
    17.9MB
  • np_desr.dat
    960B
  • splitter.dat
    173B
  • quantities.dat
    22.7KB
  • tokenizer.dat
    2.9KB
  • probabilitats.dat
    888.5KB
  • locucions_squoia.dat
    143.2KB
  • es_squoia.cfg
    3KB
  • nec.cc
    6.2KB
  • output_crf.h
    1.3KB
  • common
  • punct.dat
    381B
  • config_squoia
  • analyzer_squoia.h
    6.9KB
  • socket.h
    4KB
  • stats.h
    3.4KB
  • config_squoia.h
    29.6KB
  • analyzer_client.cc
    3.5KB
  • corzu_es
  • data
  • person.list
    15.4KB
  • male_names.txt
    362.2KB
  • all.male.sorted
    62.2KB
  • lastnames.txt
    32.6KB
  • female_names.txt
    223KB
  • all.fem.sorted
    105.1KB
  • corzu_es.py
    36.6KB
  • mle_weights_real
    167.8KB
  • extract_markables.py
    16.7KB
  • person.txt
    29.8KB
  • check_names.pl
    14.9KB
  • arcs_inferred_antecedents.py
    18.2KB
  • conll_to_html.py
    2.8KB
  • test_data
  • mao-s-china-at-60.es
    6.5KB
  • scripts
  • conll2senttok.pl
    584B
Description
This package contains a full analysis pipeline for Spanish that takes text (one sentence per line) as input and does PoS tagging, named entity recognition and classification, parsing and co-reference resolution. The output format is CoNLL annotated with entities.

Please reference this paper if you use the pipeline:

```
@inproceedings{zora136447,
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics},
  month = {April},
  title = {Co-reference Resolution of Elided Subjects and Possessive Pronouns in Spanish-English Statistical Machine Translation},
  author = {Annette Rios and Don Tuggener},
  publisher = {Association for Computational Linguistics},
  year = {2017},
  pages = {657--662},
  url = {http://dx.doi.org/10.5167/uzh-136447}
}
```

An annotated version of the news commentary 2011 corpus used in the paper is available from https://github.com/a-rios/CorefMT

## Installation

### FreeLing

`git clone https://github.com/TALP-UPC/freeling`

For installation, make sure to install from sources (the headers are needed), see: https://talp-upc.gitbooks.io/freeling-user-manual/content/installation.html

Compile the FreeLing analyzer with CRF output format for Wapiti:

```
# set both variables to the actual paths on your system
export FREELING_INSTALLATION_DIR=/path/to/freeling    # path to your installation of FreeLing
export PARSING_PIPELINE_DIR=/path/to/spanish_coref    # path to this package
g++ -c -o output_crf.o output_crf.cc -I$FREELING_INSTALLATION_DIR/include -I$PARSING_PIPELINE_DIR/FreeLingModules/config_squoia
g++ -c -o analyzer_client.o analyzer_client.cc -I$FREELING_INSTALLATION_DIR/include -I$PARSING_PIPELINE_DIR/FreeLingModules/config_squoia
g++ -std=gnu++11 -c -o server_squoia.o server_squoia.cc -I$FREELING_INSTALLATION_DIR/include -I$PARSING_PIPELINE_DIR/FreeLingModules/config_squoia
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$FREELING_INSTALLATION_DIR/lib
g++ -O3 -Wall -o server_squoia server_squoia.o output_crf.o -L$FREELING_INSTALLATION_DIR/lib -lfreeling -lboost_program_options -lboost_system -lboost_filesystem -lpthread
```

Named entity classification:

```
g++ -std=gnu++11 -o nec nec.cc -I$FREELING_INSTALLATION_DIR/include -I$PARSING_PIPELINE_DIR/FreeLingModules/config_squoia -L$FREELING_INSTALLATION_DIR/lib -lfreeling -lboost_program_options -lboost_system -lboost_filesystem -lpthread
```

analyzer_client:

```
g++ -O3 -Wall -o analyzer_client analyzer_client.o -L$FREELING_INSTALLATION_DIR/local/lib -lfreeling
export FREELINGSHARE=$FREELING_INSTALLATION_DIR/share/freeling
```

Once compiled, you can test the server (set `$PORT` to a free port of your choice):

```
./server_squoia -f $PARSING_PIPELINE_DIR/FreeLingModules/es_squoia.cfg --server --port=$PORT 2> logtagging &
echo "eso es una prueba" |./analyzer_client $PORT
```

Link server_squoia, analyzer_client and nec to the /bin folder (optional; if you do not link them, change the paths in es.cfg):

```
cd $PARSING_PIPELINE_DIR/bin
ln -s ../FreeLingModules/server_squoia .
ln -s ../FreeLingModules/analyzer_client .
ln -s ../FreeLingModules/nec .
```

For system-wide use, either link client and server to somewhere in your $PATH (e.g. in `/usr/local/bin`), or add their location to $PATH.

### Wapiti

https://wapiti.limsi.fr/

Follow the installation instructions, then adapt the path to wapiti in es.cfg.

### MaltParser

http://www.maltparser.org/download.html

Follow the installation instructions, see http://www.maltparser.org/install.html

Set maltPath in es.cfg to your installation of MaltParser.

Compile the server-client modules ($MALTPARSER_DIR = path to your MaltParser installation):

```
cd $PARSING_PIPELINE_DIR/maltparser_tools/src
javac -cp $MALTPARSER_DIR/maltparser-1.8/maltparser-1.8.jar MPClient.java
javac -cp $MALTPARSER_DIR/maltparser-1.8/maltparser-1.8.jar MaltParserServer.java
```

Move the compiled class files to ../bin:

`mv MaltParserServer.class MPClient.class ../bin/`

### Perl modules

Required:

```
Getopt::Long;
Storable;
File::Basename;
File::Spec::Functions
```

Parse with parse.pm:

```
cd $PARSING_PIPELINE_DIR
./parse.pm -c es.cfg
```

Use `./parse.pm --help` to see the input/output format options.

For an example of how to add co-reference annotations to your CoNLL file with corzu, see coref_example.sh
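
The pipeline output is described as CoNLL annotated with entities. For downstream processing, a reader along the following lines may be a useful starting point. It is only a sketch under stated assumptions: the function name `read_conll` is hypothetical, columns are assumed to be tab-separated, sentences are assumed to be separated by blank lines, and lines starting with `#` are assumed to be comments. The actual column layout produced by parse.pm and corzu is not specified here, so the rows are returned as plain lists of strings without interpreting any column.

```python
from typing import Iterable, List

Sentence = List[List[str]]  # one row of column values per token


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of token rows.

    Assumptions: tab-separated columns, blank line between sentences,
    lines starting with '#' are comments and are skipped.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        current.append(line.split("\t"))
    # The file may not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = "1\teso\teso\n2\tes\tser\n\n1\thola\thola\n"
    for sentence in read_conll(sample.splitlines()):
        print([row[1] for row in sentence])
```

To read a file instead of the inline sample, pass an open file handle, for example `read_conll(open("output.conll", encoding="utf-8"))`, where `output.conll` is a placeholder name.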