Chinese Word Segmentation Algorithms: Collected Research Materials (中文分词算法研究整理资料)

  • n3_839247
    Uploader
  • 26.2MB
    File size
  • rar
    File format
  • 0
    Favorites
  • VIP only
    Resource type
  • 0
    Downloads
  • 2022-05-11 10:32
    Upload date
中文分词算法研究整理资料.rar
  • 中文分词算法研究整理资料
  • 词典结构
  • 双数组trie树的基本构造及简单优化.doc
    41KB
  • 一种改进的高效分词词典机制.caj
    322.1KB
  • 基于双数组的分词词典研究与实现.nh
    1.9MB
  • 分词系统中常用的分词词典机制有.doc
    249KB
  • 汉语词典快速查询算法研究.caj
    485.1KB
  • 通用类trie树及自动生成.caj
    16.1KB
  • 一个基于改进的反序分词词典的中文分词算法.kdh
    97KB
  • 使用二级索引的中文分词词典.kdh
    163KB
  • 汉语词典的快速查询算法研究.caj
    210.1KB
  • 基于词频统计的中文分词的研究.pdf
    154.3KB
  • Chinese segmentation and new word detection using conditional random fields.pdf
    155.2KB
  • An empirical comparison of goodness measures for unsupervised Chinese word segmentation with a unified framework.pdf
    445.2KB
  • 基于动态规划的最小代价路径汉语自动分词.pdf
    340.6KB
  • 中文分词和词性标注模型.pdf
    95.4KB
  • 简与美3.txt
    3.4KB
  • Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation.pdf
    507.8KB
  • Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation.pdf
    344KB
  • 简与美2.txt
    3.4KB
  • 收集的一些中文分词的工具.txt
    6.3KB
  • Chinese Word Segmentation Using Minimal Linguistic Knowledge.pdf
    96.6KB
  • 基于词典的中文分词算法研究.pdf
    352.1KB
  • 基于搜索引擎的中文分词评估方法.pdf
    310.5KB
  • ictclas分析.txt
    45B
  • 中文分词算法概述.pdf
    197.9KB
  • A hybrid approach to word segmentation.pdf
    712.9KB
  • 詹卫东老个人网站.txt
    33B
  • 中文全切分快速分词方法.caj
    94.3KB
  • mmseg算法.txt
    49B
  • 设立切分标志法在中文地址自动分词中的改进与应用.pdf
    156.5KB
  • 基于层叠隐马模型的汉语词法分析.pdf
    377.2KB
  • Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging.pdf
    250.4KB
  • 基于层叠隐马科胡语法分析.pdf
    72.7KB
  • 一些链接.txt
    595B
  • Joint Word Segmentation and POS Tagging Using a Single Perceptron.pdf
    150.6KB
  • 基于隐马尔科夫模型的中文分词研究.pdf
    170.3KB
  • Contextual dependencies in unsupervised word segmentation.pdf
    134.8KB
  • 简与美5.txt
    3.7KB
  • word segmentaion readming list.txt
    41B
  • 中文分词技术的研究现状与困难.pdf
    564.7KB
  • 基于上下文信息提取的概率分词算法.pdf
    540.6KB
  • A stochastic finite-state word-segmentation algorithm for Chinese.pdf
    1.8MB
  • 中文文本姓名识别的研究.kdh
    1.4MB
  • Web mining research--A survey.pdf
    188.7KB
  • A brief survey of text mining.pdf
    478.8KB
  • Chinese Part of Speech Tagging one at a time or all at once world based or character based.pdf
    296.2KB
  • Combining segmenter and chunker for Chinese word segmentation.pdf
    168.2KB
  • 对互联网环境下中文分词系统的一种架构改进.pdf
    138.6KB
  • 基于词典和词频的中文分词方法.pdf
    422.8KB
  • 期刊.txt
    27B
  • 一种全切分与统计结合的分词系统.caj
    30.3KB
  • 基于优化最大匹配与统计结合的汉语分词方法.pdf
    384.1KB
  • A maximum entropy Chinese character-based parser.pdf
    183.7KB
  • 面向大规模信息检索的中文分词技术研究.ppt
    1.4MB
  • A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005.pdf
    99.6KB
  • Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling.pdf
    320.3KB
  • 基于字位置概率特征的条件随机场中文分词方法.pdf
    244.7KB
  • Chinese segmentation with a word-based perceptron algorithm.pdf
    210KB
  • 基于数据驱动的中文分词方法研究.pdf
    269.3KB
  • Chinese and Japanese word segmentation using word-level and character-level information.pdf
    120KB
  • Character-Level Dependencies in Chinese Usefulness and Learning.pdf
    169.1KB
  • 书面汉语自动分词系统.pdf
    349.9KB
  • An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging.pdf
    238.8KB
  • 基于有效子串标注的中文分词.pdf
    153KB
  • A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging.pdf
    197.8KB
  • 中文自动分词关键技术研究与实现.nh
    4MB
  • 基于SVM的词频统计中文分词研究.pdf
    523.4KB
  • 几种基于词典的中文分词算法评价.pdf
    263.5KB
  • 二次回溯中文分词方法.pdf
    247KB
  • The first international Chinese word segmentation bakeoff.pdf
    102.1KB
  • Hash快速分词算法.pdf
    230.5KB
  • 詹卫东老师中文信息处理.ppt
    512.5KB
  • 免费中文分词系统与资源收集.txt
    1.3KB
  • 匹配的.txt
    4.1KB
  • An improved Chinese word segmentation system with conditional random field.pdf
    1MB
  • Combining classifiers for Chinese word segmentation.pdf
    77.1KB
  • Chinese Word Segmentation as LMR Tagging.pdf
    44.7KB
  • 基于N-最短路径方法的中文词语粗分模型.pdf
    57.5KB
  • Adaptive Chinese word segmentation.pdf
    156.9KB
  • A hybrid approach to word segmentation and pos tagging.pdf
    564.7KB
  • 基于反序词典的中文分词技术研究.pdf
    176.2KB
  • HHMM-based Chinese lexical analyzer ICTCLAS.pdf
    161.2KB
  • Perceptron Learning for Chinese Word Segmentation.pdf
    90.9KB
  • Chinese word segmentation without using lexicon and hand-crafted training data.pdf
    587KB
  • ChineseWord Segmentation and Named.pdf
    1.6MB
  • A compression-based algorithm for Chinese word segmentation.pdf
    786.5KB
  • 简与美4.txt
    5KB
  • 基于专有名词优先的快速中文分词.pdf
    274KB
  • Subword-based tagging by conditional random fields for chinese word segmentation.pdf
    164.2KB
  • 基于条件随机场的中文分词方法.pdf
    168.6KB
  • 简与美7.txt
    5.1KB
  • A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks.pdf
    216.5KB
  • Automatic Adaptation of Annotation Standards Chinese Word Segmentation and POS Tagging – A Case Study.pdf
    125.5KB
  • 基于一种粗切分的最短路径中文分词研究.pdf
    253.5KB
  • 简与美6.txt
    7.8KB
  • 一种改进的中文分词算法.pdf
    141.3KB
  • 一种改进的统计与后串最大匹配的中文分词算法研究.pdf
    329.6KB
  • 基于语料库和面向统计学的自然语言处理技术介绍.pdf
    89.3KB
  • 简与美1.txt
    2.5KB
Content preview
<html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"> <meta name="generator" content="pdf2htmlEX"> <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> <link rel="stylesheet" href="https://static.pudn.com/base/css/base.min.css"> <link rel="stylesheet" href="https://static.pudn.com/base/css/fancy.min.css"> <link rel="stylesheet" href="https://static.pudn.com/prod/directory_preview_static/62f7b0b0f97302478e464a08/raw.css"> <script src="https://static.pudn.com/base/js/compatibility.min.js"></script> <script src="https://static.pudn.com/base/js/pdf2htmlEX.min.js"></script> <script> try{ pdf2htmlEX.defaultViewer = new pdf2htmlEX.Viewer({}); }catch(e){} </script> <title></title> </head> <body> <div id="sidebar" style="display: none"> <div id="outline"> </div> </div> <div id="pf1" class="pf w0 h0" data-page-no="1"><div class="pc pc1 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="https://static.pudn.com/prod/directory_preview_static/62f7b0b0f97302478e464a08/bg1.jpg"><div class="c x1 y1 w2 h2"><div class="t m0 x2 h3 y2 ff1 fs0 fc0 sc0 ls0 ws0"><span class="fc0 sc0">A </span><span class="_ _0"> </span><span class="ls1"><span class="fc0 sc0">Stochastic </span><span class="_ _1"> </span><span class="ls2"><span class="fc0 sc0">Finite-State </span></span></span></div><div class="t m0 x2 h3 y3 ff1 fs0 fc0 sc0 ls3 ws0"><span class="fc0 sc0">Word-Segmentation </span><span class="_ _0"> </span><span class="ls4"><span class="fc0 sc0">Algorithm </span><span class="_ _1"></span><span class="ls5"><span class="fc0 sc0">for </span></span></span></div><div class="t m0 x3 h3 y4 ff1 fs0 fc0 sc0 ls6 ws0"><span class="fc0 sc0">Chinese </span></div><div class="t m1 x2 h4 y5 ff2 fs1 fc0 sc0 ls2 ws0"><span class="fc0 sc0">Richard </span><span class="_ _2"></span><span class="ls0"><span class="fc0 sc0">Sproat* </span></span></div><div class="t m0 x2 h5 y6 ff2 fs2 fc0 sc0 ls7 ws0"><span class="fc0 sc0">Bell </span><span class="_ _3"></span><span class="ls8"><span 
class="fc0 sc0">Laboratories </span></span></div><div class="t m1 x2 h4 y7 ff2 fs1 fc0 sc0 ls5 ws0"><span class="fc0 sc0">William </span><span class="_ _2"></span><span class="ls9"><span class="fc0 sc0">Gale~ </span></span></div><div class="t m0 x2 h5 y8 ff2 fs2 fc0 sc0 lsa ws0"><span class="fc0 sc0">Bell </span><span class="_ _3"></span><span class="ls8"><span class="fc0 sc0">Laboratories </span></span></div><div class="t m1 x4 h4 y5 ff2 fs1 fc0 sc0 lsb ws0"><span class="fc0 sc0">Chilin </span><span class="_ _2"></span><span class="lsc"><span class="fc0 sc0">Shih~ </span></span></div><div class="t m0 x4 h5 y9 ff2 fs2 fc0 sc0 ls7 ws0"><span class="fc0 sc0">Bell </span><span class="_ _3"></span><span class="ls8"><span class="fc0 sc0">Laboratories </span></span></div><div class="t m1 x4 h4 ya ff2 fs1 fc0 sc0 lsd ws0"><span class="fc0 sc0">Nancy </span><span class="_ _3"></span><span class="lse"><span class="fc0 sc0">Chang </span><span class="_ _4"></span><span class="ls0"><span class="fc0 sc0">~ </span></span></span></div><div class="t m0 x4 h5 yb ff2 fs2 fc0 sc0 lsf ws0"><span class="fc0 sc0">University </span><span class="_ _3"></span><span class="lse"><span class="fc0 sc0">of </span><span class="ls10"><span class="fc0 sc0">Cambridge </span></span></span></div><div class="t m0 x3 h6 yc ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">The </span><span class="_ _5"> </span><span class="fc0 sc0">initial </span><span class="_ _5"> </span><span class="fc0 sc0">stage </span><span class="_ _5"> </span><span class="ls2"><span class="fc0 sc0">of </span><span class="_ _6"></span><span class="lsa"><span class="fc0 sc0">text </span><span class="_ _7"></span></span></span><span class="fc0 sc0">analysis </span><span class="_ _3"></span><span class="ls2"><span class="fc0 sc0">for </span><span class="_ _7"></span><span class="lsa"><span class="fc0 sc0">any </span><span class="_ _5"> </span><span class="ls11"><span class="fc0 sc0">NLP </span><span class="_ _8"> 
</span></span></span></span><span class="fc0 sc0">task </span><span class="_ _8"> </span><span class="ls12"><span class="fc0 sc0">usually </span><span class="_ _9"> </span></span><span class="fc0 sc0">involves </span><span class="_ _1"> </span><span class="fc0 sc0">the </span><span class="_"> </span><span class="fc0 sc0">tokenization </span><span class="_ _1"> </span><span class="ls2"><span class="fc0 sc0">of </span><span class="_ _3"></span></span><span class="fc0 sc0">the </span></div><div class="t m0 x3 h6 yd ff3 fs3 fc0 sc0 ls13 ws0"><span class="fc0 sc0">input </span><span class="_ _8"> </span><span class="ls0"><span class="fc0 sc0">into </span><span class="_ _5"> </span><span class="ls14"><span class="fc0 sc0">words. </span><span class="ls15"><span class="fc0 sc0">For </span><span class="_ _7"></span></span></span><span class="fc0 sc0">languages </span><span class="_ _8"> </span><span class="fc0 sc0">like </span><span class="_ _7"></span><span class="fc0 sc0">English </span><span class="_"> </span><span class="ls16"><span class="fc0 sc0">one </span><span class="_ _7"></span><span class="ls15"><span class="fc0 sc0">can </span><span class="_ _7"> </span></span></span><span class="fc0 sc0">assume, </span><span class="_"> </span><span class="fc0 sc0">to </span><span class="_ _2"></span><span class="fc0 sc0">a </span><span class="_ _a"></span><span class="ls17"><span class="fc0 sc0">first </span><span class="_ _7"></span><span class="ls0"><span class="fc0 sc0">approximation, </span><span class="_ _0"> </span><span class="fc0 sc0">that </span></span></span></span></div><div class="t m0 x2 h6 ye ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">word </span><span class="fc0 sc0">boundaries </span><span class="_ _a"></span><span class="ls18"><span class="fc0 sc0">are </span><span class="_ _b"></span><span class="lsa"><span class="fc0 sc0">given </span><span class="_ _6"></span><span class="ls19"><span class="fc0 sc0">by </span><span class="_ _a"></span><span 
class="ls0"><span class="fc0 sc0">whitespace </span><span class="_ _a"></span><span class="fc0 sc0">or </span><span class="_ _c"></span><span class="fc0 sc0">punctuation. </span><span class="_ _1"> </span><span class="lsa"><span class="fc0 sc0">In </span></span><span class="fc0 sc0">various </span><span class="ls13"><span class="fc0 sc0">Asian </span><span class="_ _3"></span></span><span class="fc0 sc0">languages, </span><span class="_ _6"></span><span class="fc0 sc0">including </span></span></span></span></span></div><div class="t m0 x2 h6 yf ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">Chinese, </span><span class="_ _b"></span><span class="fc0 sc0">on </span><span class="_ _6"></span><span class="fc0 sc0">the </span><span class="_ _c"></span><span class="fc0 sc0">other </span><span class="_ _b"></span><span class="fc0 sc0">hand, </span><span class="fc0 sc0">whitespace </span><span class="fc0 sc0">is </span><span class="_ _c"></span><span class="fc0 sc0">never </span><span class="fc0 sc0">used </span><span class="_ _a"></span><span class="fc0 sc0">to </span><span class="_ _b"></span><span class="ls12"><span class="fc0 sc0">delimit </span><span class="_ _c"></span><span class="ls0"><span class="fc0 sc0">words, </span><span class="lsc"><span class="fc0 sc0">so </span><span class="_ _b"></span><span class="ls0"><span class="fc0 sc0">one </span><span class="_ _c"></span><span class="ls1a"><span class="fc0 sc0">must </span><span class="ls0"><span class="fc0 sc0">resort </span><span class="_ _c"></span><span class="fc0 sc0">to </span><span class="_ _b"></span><span class="fc0 sc0">lexical </span></span></span></span></span></span></span></div><div class="t m0 x3 h6 y10 ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">information </span><span class="_ _6"></span><span class="fc0 sc0">to </span><span class="fc0 sc0">"reconstruct" </span><span class="_ _7"></span><span class="fc0 sc0">the </span><span class="_ _b"></span><span class="fc0 sc0">word-boundary </span><span 
class="_ _3"></span><span class="fc0 sc0">information. </span><span class="ls1b"><span class="fc0 sc0">In </span><span class="_ _c"></span><span class="ls0"><span class="fc0 sc0">this </span><span class="_ _c"></span><span class="fc0 sc0">paper </span><span class="_ _b"></span><span class="ls19"><span class="fc0 sc0">we </span><span class="_ _d"></span><span class="ls0"><span class="fc0 sc0">present </span><span class="_ _c"></span><span class="fc0 sc0">a </span><span class="_ _b"></span><span class="fc0 sc0">stochastic </span></span></span></span></span></div><div class="t m0 x5 h6 y11 ff3 fs3 fc0 sc0 lsa ws0"><span class="fc0 sc0">finite-state </span><span class="_ _2"></span><span class="ls13"><span class="fc0 sc0">model </span><span class="_ _c"></span><span class="ls0"><span class="fc0 sc0">wherein </span><span class="_ _8"> </span><span class="fc0 sc0">the </span><span class="_ _2"></span><span class="fc0 sc0">basic </span><span class="_ _3"></span><span class="fc0 sc0">workhorse </span><span class="_ _3"></span><span class="lsc"><span class="fc0 sc0">is </span><span class="_ _7"></span></span><span class="fc0 sc0">the </span><span class="_ _2"></span><span class="ff2 fs2 ls1c"><span class="fc0 sc0">weighted </span><span class="_ _8"> </span><span class="ls1d"><span class="fc0 sc0">finite-state </span><span class="_ _5"> </span><span class="ls1e"><span class="fc0 sc0">transducer. 
</span><span class="_"> </span></span></span></span><span class="fc0 sc0">The </span></span></span></div><div class="t m0 x3 h6 y12 ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">model </span><span class="_ _6"></span><span class="fc0 sc0">segments </span><span class="_ _1"> </span><span class="fc0 sc0">Chinese </span><span class="_ _2"></span><span class="ls1f"><span class="fc0 sc0">text </span><span class="_ _2"></span></span><span class="fc0 sc0">into </span><span class="_ _7"></span><span class="fc0 sc0">dictionary </span><span class="_ _5"> </span><span class="fc0 sc0">entries </span><span class="_ _7"></span><span class="fc0 sc0">and </span><span class="_ _7"></span><span class="fc0 sc0">words </span><span class="_ _7"></span><span class="fc0 sc0">derived </span><span class="_ _7"></span><span class="ls19"><span class="fc0 sc0">by </span><span class="_ _3"></span></span><span class="fc0 sc0">various </span><span class="_ _7"></span><span class="fc0 sc0">productive </span></div><div class="t m0 x3 h6 y13 ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">lexical </span><span class="_ _c"></span><span class="ls20"><span class="fc0 sc0">processes, </span><span class="ls21"><span class="fc0 sc0">and--since </span><span class="_ _3"></span><span class="ls0"><span class="fc0 sc0">the </span><span class="fc0 sc0">primary </span><span class="_"> </span><span class="fc0 sc0">intended </span><span class="_ _2"></span><span class="fc0 sc0">application </span><span class="_ _3"></span><span class="ls17"><span class="fc0 sc0">of </span><span class="_ _b"></span><span class="ls0"><span class="fc0 sc0">this </span><span class="_ _2"></span><span class="fc0 sc0">model </span><span class="_ _3"></span><span class="lsc"><span class="fc0 sc0">is </span><span class="_ _3"></span></span><span class="fc0 sc0">to </span><span class="_ _3"></span><span class="fc0 sc0">text-to-speech </span></span></span></span></span></span></div><div class="t m0 x2 h6 y14 ff3 fs3 fc0 sc0 ls22 
ws0"><span class="fc0 sc0">synthesis--provides </span><span class="_ _2"></span><span class="ls0"><span class="fc0 sc0">pronunciations </span><span class="_ _7"></span><span class="ls2"><span class="fc0 sc0">for </span><span class="_ _3"></span><span class="ls23"><span class="fc0 sc0">these </span><span class="_ _c"></span><span class="lsa"><span class="fc0 sc0">words. </span><span class="fc0 sc0">We </span><span class="ls0"><span class="fc0 sc0">evaluate </span><span class="_ _7"> </span><span class="fc0 sc0">the </span><span class="_ _3"></span><span class="ls12"><span class="fc0 sc0">system's </span><span class="_ _2"></span></span><span class="fc0 sc0">performance </span><span class="_ _2"></span><span class="ls19"><span class="fc0 sc0">by </span></span></span></span></span></span></span></div><div class="t m0 x2 h6 y15 ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">comparing </span><span class="_ _7"> </span><span class="ls24"><span class="fc0 sc0">its </span><span class="_ _3"></span></span><span class="fc0 sc0">segmentation </span><span class="_ _e"> </span><span class="fc0 sc0">"judgments" </span><span class="_ _e"> </span><span class="ls25"><span class="fc0 sc0">with </span><span class="_ _2"></span></span><span class="fc0 sc0">the </span><span class="ls25"><span class="fc0 sc0">judgments </span><span class="_ _2"></span><span class="ls2"><span class="fc0 sc0">of </span><span class="_ _b"></span><span class="ls0"><span class="fc0 sc0">a </span><span class="_ _3"></span><span class="fc0 sc0">pool </span><span class="ls2"><span class="fc0 sc0">of </span><span class="_ _b"></span><span class="lsa"><span class="fc0 sc0">human </span><span class="_ _7"></span><span class="ls0"><span class="fc0 sc0">segmenters, </span></span></span></span></span></span></span></div><div class="t m0 x2 h6 y16 ff3 fs3 fc0 sc0 ls0 ws0"><span class="fc0 sc0">and </span><span class="_ _3"></span><span class="fc0 sc0">the </span><span class="lsa"><span class="fc0 sc0">system 
</span><span class="_ _3"></span></span><span class="fc0 sc0">is </span><span class="fc0 sc0">shown </span><span class="_ _7"> </span><span class="fc0 sc0">to </span><span class="fc0 sc0">perform </span><span class="fc0 sc0">quite </span><span class="_ _2"></span><span class="fc0 sc0">well. </span></div><div class="t m2 x3 h7 y17 ff1 fs4 fc0 sc0 ls0 ws0"><span class="fc0 sc0">1. </span><span class="_ _7"></span><span class="ls11"><span class="fc0 sc0">The </span><span class="_ _3"></span><span class="fc0 sc0">Problem </span></span></div><div class="t m0 x3 h5 y18 ff2 fs2 fc0 sc0 ls26 ws0"><span class="fc0 sc0">Any </span><span class="ls10"><span class="fc0 sc0">NLP </span><span class="lse"><span class="fc0 sc0">application </span><span class="_ _3"></span><span class="ls1e"><span class="fc0 sc0">that </span><span class="_ _6"></span><span class="ls1c"><span class="fc0 sc0">presumes </span><span class="_ _3"></span><span class="ls27"><span class="fc0 sc0">as </span></span></span></span></span><span class="fc0 sc0">input </span><span class="_ _3"></span><span class="ls28"><span class="fc0 sc0">unrestricted </span><span class="_ _8"> </span><span class="ls2"><span class="fc0 sc0">text </span><span class="ls27"><span class="fc0 sc0">requires </span><span class="_ _2"></span><span class="lse"><span class="fc0 sc0">an </span><span class="_ _6"></span><span class="ls29"><span class="fc0 sc0">initial </span><span class="_ _3"></span><span class="lsd"><span class="fc0 sc0">phase </span></span></span></span></span></span></span></span></div><div class="t m0 x3 h5 y19 ff2 fs2 fc0 sc0 ls27 ws0"><span class="fc0 sc0">of </span><span class="ls2a"><span class="fc0 sc0">text </span><span class="_ _6"></span><span class="ls1d"><span class="fc0 sc0">analysis; </span><span class="_ _3"></span><span class="ls2b"><span class="fc0 sc0">such </span><span class="_ _3"></span><span class="ls8"><span class="fc0 sc0">applications </span><span class="_ _2"></span><span class="ls2c"><span 
class="fc0 sc0">involve </span><span class="_ _6"></span><span class="ls6"><span class="fc0 sc0">problems </span><span class="_ _6"></span><span class="ls2"><span class="fc0 sc0">as </span><span class="_ _3"></span><span class="ls2d"><span class="fc0 sc0">diverse </span><span class="_ _3"></span></span><span class="fc0 sc0">as </span><span class="ls2d"><span class="fc0 sc0">machine </span><span class="_ _7"> </span><span class="ls2e"><span class="fc0 sc0">translation, </span></span></span></span></span></span></span></span></span></span></div><div class="t m0 x2 h5 y1a ff2 fs2 fc0 sc0 lse ws0"><span class="fc0 sc0">information </span><span class="_ _8"> </span><span class="ls1"><span class="fc0 sc0">retrieval, </span><span class="_ _9"> </span><span class="ls2d"><span class="fc0 sc0">and </span><span class="_ _5"> </span><span class="ls8"><span class="fc0 sc0">text-to-speech </span><span class="_ _3"></span><span class="lsf"><span class="fc0 sc0">synthesis </span><span class="_ _8"> </span><span class="ls0"><span class="fc0 sc0">(TTS). 
</span><span class="_ _5"> </span><span class="ls10"><span class="fc0 sc0">An </span><span class="_ _7"> </span><span class="ls29"><span class="fc0 sc0">initial </span><span class="_ _9"> </span><span class="ls2f"><span class="fc0 sc0">step </span><span class="_ _7"></span></span></span></span></span></span></span></span></span><span class="fc0 sc0">of </span><span class="_ _6"></span><span class="ls26"><span class="fc0 sc0">any </span><span class="_ _7"></span><span class="ls30"><span class="fc0 sc0">text- </span></span></span></div><div class="t m0 x3 h5 y1b ff2 fs2 fc0 sc0 ls27 ws0"><span class="fc0 sc0">analysis </span><span class="_ _2"></span><span class="ls2b"><span class="fc0 sc0">task </span><span class="_ _3"></span><span class="lsa"><span class="fc0 sc0">is </span><span class="_ _2"></span><span class="ls11"><span class="fc0 sc0">the </span><span class="_ _7"></span><span class="lsf"><span class="fc0 sc0">tokenization </span><span class="_ _3"></span><span class="lse"><span class="fc0 sc0">of </span><span class="ls3"><span class="fc0 sc0">the </span><span class="_ _2"></span><span class="lsd"><span class="fc0 sc0">input </span><span class="_ _2"></span><span class="ls31"><span class="fc0 sc0">into </span><span class="_ _7"></span></span></span></span></span><span class="fc0 sc0">words. 
</span><span class="_ _7"></span></span><span class="fc0 sc0">For </span><span class="_ _3"></span><span class="ls0"><span class="fc0 sc0">a </span><span class="_ _7"></span><span class="ls32"><span class="fc0 sc0">language </span><span class="_ _7"></span><span class="ls1a"><span class="fc0 sc0">like </span><span class="_ _7"></span><span class="ls21"><span class="fc0 sc0">English, </span></span></span></span></span></span></span></span></div><div class="t m0 x2 h5 y1c ff2 fs2 fc0 sc0 ls2a ws0"><span class="fc0 sc0">this </span><span class="_"> </span><span class="ls6"><span class="fc0 sc0">problem </span><span class="_ _5"> </span><span class="ls1b"><span class="fc0 sc0">is </span><span class="_"> </span><span class="ls3"><span class="fc0 sc0">generally </span><span class="_ _f"> </span><span class="lse"><span class="fc0 sc0">regarded </span><span class="_ _0"> </span><span class="ls27"><span class="fc0 sc0">as </span><span class="_ _9"> </span><span class="lsb"><span class="fc0 sc0">trivial </span><span class="_"> </span><span class="ls30"><span class="fc0 sc0">since </span><span class="_ _9"> </span><span class="ls33"><span class="fc0 sc0">words </span><span class="_ _9"> </span><span class="ls2"><span class="fc0 sc0">are </span><span class="_"> </span><span class="ls2f"><span class="fc0 sc0">delimited </span><span class="_ _f"> </span></span><span class="fc0 sc0">in </span><span class="_"> </span><span class="fc0 sc0">English </span></span></span></span></span></span></span></span></span></span></div><div class="t m0 x3 h5 y1d ff2 fs2 fc0 sc0 ls2a ws0"><span class="fc0 sc0">text </span><span class="_ _5"> </span><span class="ls34"><span class="fc0 sc0">by </span><span class="_ _2"></span><span class="ls35"><span class="fc0 sc0">whitespace </span><span class="_ _f"> </span><span class="ls27"><span class="fc0 sc0">or </span><span class="_ _5"> </span><span class="ls10"><span class="fc0 sc0">marks </span><span class="_ _f"> </span><span class="lse"><span 
class="fc0 sc0">of </span><span class="_ _7"></span><span class="ls2d"><span class="fc0 sc0">punctuation. </span><span class="_ _0"> </span><span class="ls2b"><span class="fc0 sc0">Thus </span><span class="_"> </span><span class="ls17"><span class="fc0 sc0">in </span><span class="_ _9"> </span></span></span></span><span class="fc0 sc0">an </span><span class="_ _9"> </span><span class="lsb"><span class="fc0 sc0">English </span><span class="_ _0"> </span><span class="ls8"><span class="fc0 sc0">sentence </span><span class="_ _9"> </span><span class="ls2b"><span class="fc0 sc0">such </span><span class="_"> </span><span class="ls2"><span class="fc0 sc0">as </span></span></span></span></span></span></span></span></span></span></div><div class="t m0 x2 h6 y1e ff3 fs3 fc0 sc0 ls36 ws0"><span class="fc0 sc0">I'm </span><span class="_ _6"></span><span class="ls0"><span class="fc0 sc0">going </span><span class="_ _5"> </span><span class="fc0 sc0">to </span><span class="_ _3"></span><span class="fc0 sc0">show </span><span class="_ _8"> </span><span class="ls19"><span class="fc0 sc0">up </span><span class="_ _3"></span><span class="fc0 sc0">at </span><span class="_ _2"></span></span><span class="fc0 sc0">the </span><span class="_ _3"></span><span class="ls2"><span class="fc0 sc0">ACL </span><span class="_ _8"> </span><span class="ff2 fs2 ls3"><span class="fc0 sc0">one </span><span class="_ _7"> </span><span class="ls34"><span class="fc0 sc0">would </span><span class="_ _7"> </span><span class="ls37"><span class="fc0 sc0">reasonably </span><span class="_ _7"></span><span class="ls38"><span class="fc0 sc0">conjecture </span><span class="_ _7"></span><span class="ls1e"><span class="fc0 sc0">that </span><span class="_ _5"> </span><span class="ls27"><span class="fc0 sc0">there </span><span class="_ _5"> </span><span class="ls2"><span class="fc0 sc0">are </span><span class="_ _8"> </span></span><span class="fc0 sc0">eight 
</span></span></span></span></span></span></span></span></span></div><div class="t m0 x2 h5 y1f ff2 fs2 fc0 sc0 ls39 ws0"><span class="fc0 sc0">words </span><span class="ls10"><span class="fc0 sc0">separated </span><span class="_ _6"></span><span class="ls34"><span class="fc0 sc0">by </span><span class="_ _a"></span><span class="ls37"><span class="fc0 sc0">seven </span><span class="ls11"><span class="fc0 sc0">spaces. </span><span class="ls0"><span class="fc0 sc0">A </span><span class="_ _7"></span><span class="ls33"><span class="fc0 sc0">moment's </span><span class="ls2"><span class="fc0 sc0">reflection </span><span class="ls31"><span class="fc0 sc0">will </span><span class="_ _6"></span><span class="ls38"><span class="fc0 sc0">reveal </span><span class="_ _6"></span><span class="ls1e"><span class="fc0 sc0">that </span><span class="_ _3"></span></span><span class="fc0 sc0">things </span><span class="_ _7"></span><span class="ls36"><span class="fc0 sc0">are </span><span class="ls10"><span class="fc0 sc0">not </span></span></span></span></span></span></span></span></span></span></span></span></div><div class="t m0 x3 h5 y20 ff2 fs2 fc0 sc0 ls3 ws0"><span class="fc0 sc0">quite </span><span class="_ _8"> </span><span class="ls31"><span class="fc0 sc0">that </span><span class="_ _8"> </span><span class="ls11"><span class="fc0 sc0">simple. 
</span><span class="_ _5"> </span><span class="ls17"><span class="fc0 sc0">There </span><span class="_ _9"> </span><span class="ls36"><span class="fc0 sc0">are </span><span class="_ _8"> </span><span class="ls3a"><span class="fc0 sc0">clearly </span><span class="_ _7"></span><span class="ls27"><span class="fc0 sc0">eight </span><span class="_ _8"> </span><span class="ls35"><span class="fc0 sc0">orthographic </span><span class="_ _f"> </span><span class="ls33"><span class="fc0 sc0">words </span><span class="_ _7"></span><span class="ls2"><span class="fc0 sc0">in </span><span class="_ _7"></span></span></span></span></span></span></span></span></span></span><span class="fc0 sc0">the </span><span class="_ _7"></span><span class="ls32"><span class="fc0 sc0">example </span><span class="_ _7"> </span><span class="ls3a"><span class="fc0 sc0">given, </span></span></span></div><div class="t m0 x2 h6 y21 ff2 fs2 fc0 sc0 ls39 ws0"><span class="fc0 sc0">but </span><span class="_ _3"></span><span class="ls17"><span class="fc0 sc0">if </span><span class="_ _3"></span><span class="ls3"><span class="fc0 sc0">one </span><span class="_ _3"></span><span class="ls10"><span class="fc0 sc0">were </span><span class="_ _7"></span><span class="ls37"><span class="fc0 sc0">doing </span><span class="_ _2"></span><span class="ls38"><span class="fc0 sc0">syntactic </span><span class="_ _2"></span><span class="ls28"><span class="fc0 sc0">analysis </span><span class="_ _2"></span></span></span></span></span><span class="fc0 sc0">one </span><span class="_ _2"></span><span class="ls34"><span class="fc0 sc0">would </span><span class="_ _3"></span><span class="ls3b"><span class="fc0 sc0">probably </span><span class="_ _3"></span><span class="ls3c"><span class="fc0 sc0">want </span><span class="_ _2"></span><span class="ls2"><span class="fc0 sc0">to </span><span class="_ _3"></span><span class="lse"><span class="fc0 sc0">consider </span><span class="_ _7"></span><span class="ff3 fs3 ls36"><span 
class="fc0 sc0">I'm </span><span class="_ _2"></span></span></span><span class="fc0 sc0">to </span></span></span></span></span></span></span></div><div class="t m0 x3 h5 y22 ff2 fs2 fc0 sc0 ls11 ws0"><span class="fc0 sc0">consist </span><span class="_ _2"></span><span class="lse"><span class="fc0 sc0">of </span><span class="_ _2"></span><span class="ls3d"><span class="fc0 sc0">two </span></span></span></div><div class="t m2 x6 h7 y22 ff1 fs4 fc0 sc0 ls2a ws0"><span class="fc0 sc0">syntactic </span></div><div class="t m0 x7 h6 y22 ff2 fs2 fc0 sc0 ls10 ws0"><span class="fc0 sc0">words, </span><span class="_ _8"> </span><span class="ls3e"><span class="fc0 sc0">namely </span><span class="_ _2"></span><span class="ls0"><span class="fc0 sc0">I </span><span class="_"> </span></span></span><span class="fc0 sc0">and </span><span class="_ _5"> </span><span class="ff3 fs3 ls16"><span class="fc0 sc0">am. </span><span class="_"> </span></span><span class="ls1b"><span class="fc0 sc0">If </span><span class="_ _2"></span><span class="ls3"><span class="fc0 sc0">one </span><span class="_ _8"> </span></span><span class="fc0 sc0">is </span><span class="_ _7"></span><span class="ls8"><span class="fc0 sc0">interested </span><span class="_ _9"> </span><span class="ls2"><span class="fc0 sc0">in </span><span class="_ _8"> </span><span class="ls11"><span class="fc0 sc0">translation, </span></span></span></span></span></div><div class="t m0 x3 h6 y23 ff2 fs2 fc0 sc0 ls3 ws0"><span class="fc0 sc0">one </span><span class="_ _7"></span><span class="ls34"><span class="fc0 sc0">would </span><span class="_ _2"></span><span class="ls3b"><span class="fc0 sc0">probably </span><span class="_ _2"></span><span class="ls3f"><span class="fc0 sc0">want </span><span class="_ _8"> </span><span class="ls2"><span class="fc0 sc0">to </span><span class="_ _8"> </span><span class="ls8"><span class="fc0 sc0">consider </span><span class="_ _5"> </span><span class="ff3 fs3 ls0"><span class="fc0 sc0">show 
</span><span class="_ _7"></span><span class="ls19"><span class="fc0 sc0">up </span><span class="_ _8"> </span></span></span><span class="ls27"><span class="fc0 sc0">as </span><span class="_ _7"> </span><span class="ls0"><span class="fc0 sc0">a </span><span class="_ _f"> </span><span class="ls3a"><span class="fc0 sc0">single </span><span class="_ _8"> </span><span class="ls35"><span class="fc0 sc0">dictionary </span><span class="_ _5"> </span><span class="ls40"><span class="fc0 sc0">word </span><span class="_ _8"> </span><span class="ls30"><span class="fc0 sc0">since </span><span class="_ _7"></span><span class="ls36"><span class="fc0 sc0">its </span></span></span></span></span></span></span></span></span></span></span></span></span></div><div class="t m0 x3 h6 y24 ff2 fs2 fc0 sc0 ls8 ws0"><span class="fc0 sc0">semantic </span><span class="_ _7"></span><span class="fc0 sc0">interpretation </span><span class="_ _f"> </span><span class="ls1b"><span class="fc0 sc0">is </span><span class="_ _7"> </span><span class="ls10"><span class="fc0 sc0">not </span><span class="_ _8"> </span><span class="ls2"><span class="fc0 sc0">trivially </span><span class="_ _8"> </span><span class="ls2b"><span class="fc0 sc0">derivable </span><span class="_ _5"> </span><span class="ls1e"><span class="fc0 sc0">from </span><span class="_ _7"></span><span class="ls3"><span class="fc0 sc0">the </span><span class="_ _7"> </span><span class="ls2f"><span class="fc0 sc0">meanings </span><span class="_ _f"> </span><span class="lse"><span class="fc0 sc0">of </span><span class="_ _3"></span><span class="ff3 fs3 ls0"><span class="fc0 sc0">show </span><span class="_"> </span></span></span></span></span></span></span></span><span class="fc0 sc0">and </span><span class="_ _9"> </span><span class="ff3 fs3 ls0"><span class="fc0 sc0">up. 
</span></span></span></span></div><div class="t m0 x3 h5 y25 ff2 fs2 fc0 sc0 ls26 ws0"><span class="fc0 sc0">And </span><span class="_ _2"></span><span class="ls17"><span class="fc0 sc0">if </span><span class="_ _6"></span><span class="ls2d"><span class="fc0 sc0">one </span><span class="_ _6"></span><span class="ls1b"><span class="fc0 sc0">is </span><span class="_ _2"></span><span class="ls8"><span class="fc0 sc0">interested </span><span class="_ _7"></span><span class="ls2"><span class="fc0 sc0">in </span><span class="_ _3"></span><span class="ls0"><span class="fc0 sc0">TTS, </span><span class="_ _8"> </span></span></span></span></span><span class="fc0 sc0">one </span><span class="_ _3"></span><span class="ls34"><span class="fc0 sc0">would </span><span class="_ _6"></span><span class="ls6"><span class="fc0 sc0">probably </span><span class="_ _6"></span><span class="ls8"><span class="fc0 sc0">consider </span><span class="_ _7"></span><span class="ls11"><span class="fc0 sc0">the </span><span class="_ _2"></span><span class="ls2"><span class="fc0 sc0">single </span><span class="_ _2"></span></span></span></span></span></span><span class="fc0 sc0">orthographic </span></span></span></div><div class="t m0 x2 h6 y26 ff2 fs2 fc0 sc0 ls40 ws0"><span class="fc0 sc0">word </span><span class="_ _3"></span><span class="ff3 fs3 ls2"><span class="fc0 sc0">ACL </span><span class="_ _2"></span></span><span class="ls27"><span class="fc0 sc0">to </span><span class="_ _2"></span><span class="ls11"><span class="fc0 sc0">consist </span><span class="_ _7"></span></span><span class="fc0 sc0">of </span><span class="_ _6"></span><span class="ls2e"><span class="fc0 sc0">three </span></span></span></div><div class="t m2 x8 h7 y26 ff1 fs4 fc0 sc0 ls2d ws0"><span class="fc0 sc0">phonological </span></div><div class="t m0 x9 h5 y26 ff2 fs2 fc0 sc0 ls41 ws0"><span class="fc0 sc0">words--/eJ </span><span class="_ _7"></span><span class="ls11"><span class="fc0 sc0">s'i </span><span class="_ 
_2"></span><span class="ls3c"><span class="fc0 sc0">~l/--corresponding </span><span class="_ _f"> </span><span class="ls2"><span class="fc0 sc0">to </span><span class="_ _2"></span><span class="ls3"><span class="fc0 sc0">the </span></span></span></span></span></div><div class="t m0 x3 h5 y27 ff2 fs2 fc0 sc0 ls2f ws0"><span class="fc0 sc0">pronunciation </span><span class="_ _5"> </span><span class="ls27"><span class="fc0 sc0">of </span><span class="_ _3"></span><span class="ls1e"><span class="fc0 sc0">each </span><span class="_ _2"></span><span class="lse"><span class="fc0 sc0">of </span><span class="_ _6"></span><span class="ls3"><span class="fc0 sc0">the </span><span class="_ _7"> </span><span class="ls3a"><span class="fc0 sc0">letters </span><span class="_ _7"></span><span class="ls17"><span class="fc0 sc0">in </span><span class="_ _7"></span><span class="ls11"><span class="fc0 sc0">the </span><span class="_ _7"></span><span class="ls8"><span class="fc0 sc0">acronym. </span><span class="_ _8"> </span><span class="ls2"><span class="fc0 sc0">Space- </span><span class="_ _7"></span></span></span></span></span></span></span></span></span><span class="fc0 sc0">or </span><span class="_ _3"></span><span class="ls10"><span class="fc0 sc0">punctuation-delimited </span></span></span></div><div class="t m3 xa h8 y28 ff2 fs5 fc0 sc0 ls0 ws0"><span class="fc0 sc0">700 </span><span class="_ _3"></span><span class="ls3a"><span class="fc0 sc0">Mountain </span><span class="_ _2"></span><span class="lsb"><span class="fc0 sc0">Avenue, </span></span></span><span class="fc0 sc0">2d-451, </span><span class="_"> </span><span class="ls3a"><span class="fc0 sc0">Murray </span><span class="_ _7"> </span><span class="ls17"><span class="fc0 sc0">Hill, </span><span class="ls1b"><span class="fc0 sc0">NJ </span></span></span></span><span class="fc0 sc0">07974, </span><span class="_ _2"></span><span class="ls1f"><span class="fc0 sc0">USA. 
</span><span class="_ _3"></span><span class="ls42"><span class="fc0 sc0">E-mail: </span><span class="_ _6"></span><span class="ls8"><span class="fc0 sc0">rws@bell-labs.</span></span></span></span></div><div class="t m2 xb h9 y28 ff1 fs6 fc0 sc0 ls2f ws0"><span class="fc0 sc0">com </span></div><div class="t m3 xc h8 y29 ff2 fs5 fc0 sc0 ls0 ws0"><span class="fc0 sc0">&#8224; </span><span class="_"> </span><span class="fc0 sc0">700 </span><span class="_ _3"></span><span class="ls31"><span class="fc0 sc0">Mountain </span><span class="_ _2"></span><span class="lsb"><span class="fc0 sc0">Avenue, </span></span></span><span class="fc0 sc0">2d-451, </span><span class="_"> </span><span class="ls3a"><span class="fc0 sc0">Murray </span><span class="_ _7"> </span><span class="ls17"><span class="fc0 sc0">Hill, </span><span class="ls1b"><span class="fc0 sc0">NJ </span><span class="_ _6"></span></span></span></span><span class="fc0 sc0">07974, </span><span class="_ _2"></span><span class="ls1f"><span class="fc0 sc0">USA. </span><span class="_ _3"></span><span class="ls42"><span class="fc0 sc0">E-mail: </span><span class="_ _6"></span><span class="ls6"><span class="fc0 sc0">cls@bell-labs.
</span></span></span></span></div><div class="t m2 xb h9 y29 ff1 fs6 fc0 sc0 ls2f ws0"><span class="fc0 sc0">com </span></div><div class="t m3 xc h8 y2a ff2 fs5 fc0 sc0 ls43 ws0"><span class="fc0 sc0">&#8225; </span><span class="_ _6"></span><span class="ls0"><span class="fc0 sc0">600 </span><span class="_ _2"></span><span class="ls3a"><span class="fc0 sc0">Mountain </span><span class="_ _2"></span><span class="lsb"><span class="fc0 sc0">Avenue, </span></span></span><span class="fc0 sc0">2c-278, </span><span class="_ _7"> </span><span class="ls3a"><span class="fc0 sc0">Murray </span><span class="_ _7"> </span><span class="ls17"><span class="fc0 sc0">Hill, </span><span class="_ _c"></span><span class="fc0 sc0">NJ </span><span class="_ _6"></span><span class="ls0"><span class="fc0 sc0">07974, </span><span class="_ _7"> </span><span class="ls1f"><span class="fc0 sc0">USA. </span><span class="_ _6"></span><span class="ls42"><span class="fc0 sc0">E-mail: </span><span class="ls28"><span class="fc0 sc0">gale@research.</span><span class="ls40"><span class="fc0 sc0">att.</span><span class="_ _3"></span><span class="ls44"><span class="fc0 sc0">com </span></span></span></span></span></span></span></span></span></span></div><div class="t m3 xc h8 y2b ff2 fs5 fc0 sc0 ls0 ws0"><span class="fc0 sc0">&#167; </span><span class="_ _6"></span><span class="ls1"><span class="fc0 sc0">Cambridge, </span><span class="_ _2"></span><span class="lsa"><span class="fc0 sc0">UK. 
</span><span class="_ _3"></span><span class="ls42"><span class="fc0 sc0">E-mail: </span><span class="_ _6"></span><span class="ls45"><span class="fc0 sc0">nc201@eng.cam.ac.uk </span></span></span></span></span></div><div class="t m3 x2 h8 y2c ff2 fs5 fc0 sc0 ls0 ws0"><span class="fc0 sc0">&#169; </span><span class="_ _0"> </span><span class="fc0 sc0">1996 </span><span class="_ _2"></span><span class="ls1"><span class="fc0 sc0">Association </span><span class="ls5"><span class="fc0 sc0">for </span><span class="ls38"><span class="fc0 sc0">Computational </span><span class="_ _3"></span></span><span class="fc0 sc0">Linguistics </span></span></span></div></div></div><div class="pi" data-data='{"ctm":[1.884722,0.000000,0.000000,1.884722,0.000000,0.000000]}'></div></div> </body> </html>