[ACE entity-relation dataset]

Category: Source code / Documents
Development tool: Others
File size: 532KB
Downloads: 2
Upload date: 2022-11-20 08:27:00
Uploader: gdTest
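Description: The ACE2005 corpus, released by the Linguistic Data Consortium (LDC), consists of various types of data annotated for entities, relations and events, including English, Arabic and Chinese training data. Its goal is to support the development of automatic content extraction technology for the automatic processing of human language in text form. The ACE corpus addresses the recognition of five subtasks: entities, values, temporal expressions, relations and events. These tasks require systems to process the language data in documents and then output, for each document, ...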

File list:
ace (0, 2021-03-15)
ace\ACE-MK (0, 2021-03-15)
ace\ACE-MK\.DS_Store (12292, 2016-06-16)
ace\ACE-MK\._.DS_Store (120, 2016-06-16)
ace\ACE-MK\bc (0, 2021-03-15)
ace\ACE-MK\bc\.DS_Store (6148, 2016-06-16)
ace\ACE-MK\bc\._.DS_Store (120, 2016-06-16)
ace\ACE-MK\bc\CNN_CF_20030303.1900.00.apf.add.xml (4739, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030303.1900.02.apf.add.xml (15094, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030303.1900.05.apf.add.xml (4767, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030303.1900.06-1.apf.add.xml (4551, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030303.1900.06-2.apf.add.xml (4476, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030304.1900.01.apf.add.xml (11821, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030304.1900.02.apf.add.xml (15285, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030304.1900.04.apf.add.xml (14487, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030304.1900.06-2.apf.add.xml (2533, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.00-1.apf.add.xml (10857, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.00-2.apf.add.xml (4973, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.00-3.apf.add.xml (2371, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.02.apf.add.xml (11088, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.06-1.apf.add.xml (9278, 2016-01-19)
ace\ACE-MK\bc\CNN_CF_20030305.1900.06-2.apf.add.xml (2756, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030328.1600.07.apf.add.xml (13134, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.00-2.apf.add.xml (6237, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.00-3.apf.add.xml (3374, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.00-4.apf.add.xml (19436, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.00-5.apf.add.xml (3120, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.00-6.apf.add.xml (5544, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.01-1.apf.add.xml (3285, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.01-3.apf.add.xml (12193, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030329.1600.02.apf.add.xml (6406, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030330.1600.05-2.apf.add.xml (4810, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030330.1600.06.apf.add.xml (8455, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030402.1600.00-1.apf.add.xml (9249, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030402.1600.00-2.apf.add.xml (18370, 2016-01-19)
ace\ACE-MK\bc\CNN_IP_20030402.1600.00-3.apf.add.xml (3501, 2016-01-19)
... ...

ACE Meta-knowledge annotation format
=====================================

A custom XML format to encode meta-knowledge information about events in ACE 2005 has been created. It is formally defined in the DTD file "add.dtd", which is included as part of this distribution.

The meta-knowledge annotations are provided in the ACE-MK directory. This directory has six subdirectories, which follow the same split of the documents as the ACE 2005 English corpus data. Each of the six subdirectories, i.e. “bn”, “bc”, “nw”, “cts”, “wl” and “un”, corresponds to a different data source from which the documents were originally drawn.

The meta-knowledge annotation files within each of the subdirectories end with the extension “.add.xml”. The base name of each file matches that of the underlying “.sgm” file, corresponding to the original document in the ACE 2005 corpus, as well as the original annotation file in ACE 2005 (with the extension “.apf.xml”), which the meta-knowledge annotation file can be seen to augment. These original annotation files are the ones contained within the “timex2norm” subdirectory of each data source directory within the original ACE 2005 distribution (e.g. “bn/timex2norm”), since these are considered to be the final, consolidated annotation files.

An example of the custom XML format used to encode meta-knowledge information is shown below, after which descriptions of the different types of elements are provided.

[Example XML markup lost in extraction; surviving text content: said earlier reports apparently apparently earlier reports said]

The “source_file” and “document” elements match those at the start of the original corresponding ACE annotation file (with the extension “.apf.xml”), and they provide information about the underlying text document.
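The file-naming convention described above can be sketched in Python as a small path helper. This is only an illustration under stated assumptions: the function name `ace_counterparts` and the `ace_english_root` parameter are hypothetical, the suffix “.apf.add.xml” is taken from the file listing rather than from the README text, and the “.sgm” file is assumed to sit next to the “.apf.xml” file in the “timex2norm” subdirectory.

```python
from pathlib import PurePosixPath

# Suffix as it appears in the file listing (e.g. CNN_CF_20030303.1900.00.apf.add.xml).
MK_SUFFIX = ".apf.add.xml"


def ace_counterparts(mk_path, ace_english_root):
    """Return the (.apf.xml, .sgm) paths that a meta-knowledge file augments.

    mk_path          -- path to a file inside ACE-MK, e.g. "ACE-MK/bc/<name>.apf.add.xml"
    ace_english_root -- hypothetical location of the ACE 2005 English data
                        (the directory holding "bc", "bn", "cts", "nw", "un", "wl")
    """
    mk_path = PurePosixPath(mk_path)
    if not mk_path.name.endswith(MK_SUFFIX):
        raise ValueError(f"not a meta-knowledge annotation file: {mk_path.name}")

    base = mk_path.name[: -len(MK_SUFFIX)]   # shared base name
    source = mk_path.parent.name             # data source directory, e.g. "bc"

    # Assumption: both original files live in <source>/timex2norm.
    folder = PurePosixPath(ace_english_root) / source / "timex2norm"
    return folder / f"{base}.apf.xml", folder / f"{base}.sgm"


apf, sgm = ace_counterparts(
    "ACE-MK/bc/CNN_CF_20030303.1900.00.apf.add.xml", "English"
)
print(apf)
print(sgm)
```

Forward slashes are used here for portability; the listing above shows Windows-style separators, so the input path may need converting first.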
The information added during the meta-knowledge annotation effort is encoded in the following elements:

mk-cue
======

These elements encode information about meta-knowledge cue words and phrases (i.e., evidence for the assignment of particular meta-knowledge values) that have been identified during annotation.

Attributes
----------
ID - A unique ID for the cue
Type - The type of the cue, i.e., the type of meta-knowledge attribute for which it provides evidence. The value of this attribute is one of the following: “Subjectivity-Cue”, “Modality-Cue”, “Tense-Cue”, “Genericity-Cue”, “SourceType-Cue”

Children
--------
A single “extent” element always appears as child, which is used to denote the span covered in the corresponding text file.

mk-source
=========

These elements correspond to phrases in the text that correspond to information sources of events, and which have been identified during the meta-knowledge annotation effort. Note, however, that certain sources correspond to entities that were annotated during the original ACE annotation effort (e.g., people or organisations providing information). The “mk-source” elements only encode sources that do not correspond to originally annotated entities. Often, these newly annotated sources are vague phrases, such as “reports”, that correspond to unnamed sources.

Attributes
----------
ID - A unique ID for the source phrase

Children
--------
A single “extent” element always appears as child, which is used to denote the span covered in the corresponding text file.

event_mention
==============

The original ACE 2005 annotation includes both “event” and “event_mention” annotations. A particular event (e.g., a death or attack) may be mentioned several times in a document. The “event” elements group together these different mentions of the same event (i.e., an “event” element can have one or more “event_mention” elements as its children).
In the original ACE 2005 annotation, meta-knowledge attributes (i.e., MODALITY, POLARITY, TENSE and GENERICITY) were encoded at the level of “event” elements, i.e. the values of these attributes were expected to be the same for all mentions of an event. However, for our augmented meta-knowledge annotation, the meta-knowledge information is more appropriately attached at the level of event mentions. This is because, for example, different sources could be providing different information about the same event, and each of these sources may in turn have a different opinion about the event. Such factors mean that, when applying our augmented and more fine-grained meta-knowledge annotation scheme, it is far more appropriate to assign meta-knowledge information to each individual mention of an event.

Each “event_mention” element in the “.add.xml” files corresponds to an “event_mention” element in the associated “.apf.xml” file within the ACE 2005 corpus. The information provided within each “event_mention” element in the “.add.xml” file is intended to be used to augment/enrich the corresponding “event_mention” element in the “.apf.xml” file.

Attributes
----------
ID - The ID for the event mention. This ID corresponds to the ID of an “event_mention” element in the associated “.apf.xml” file within the ACE 2005 corpus (i.e. within the “timex2norm” subdirectory of the corresponding corpus partition).
MK-GENERICITY - The meta-knowledge GENERICITY attribute associated with the event mention. Possible values are “Specific” and “Generic”.
MK-MODALITY - The meta-knowledge MODALITY attribute associated with the event mention. Possible values are “Asserted”, “Presupposed”, “Speculated” and “Other”.
MK-POLARITY - The meta-knowledge POLARITY attribute associated with the event mention. Possible values are “Positive” and “Negative”.
MK-SOURCE-TYPE - The meta-knowledge SOURCE-TYPE attribute associated with the event mention. Possible values are “Author”, “Involved” and “ThirdParty”.
MK-SUBJECTIVITY - The meta-knowledge SUBJECTIVITY attribute associated with the event mention. Possible values are “Positive”, “Negative”, “Multi-valued” and “Neutral”.

Children
--------
Zero or more “event_mention_mk_evidence” elements, each of which links a meta-knowledge related text span to the event mention.

event_mention_mk_evidence
==========================

These elements indicate the meta-knowledge related text spans that are associated with their parent event mention elements. Such text spans correspond either to cue words/phrases that provide evidence for the assignment of a particular meta-knowledge attribute value, or to phrases denoting the information source of an event.

Attributes
----------
EVIDENCE-TYPE - The type of meta-knowledge evidence that the underlying text span provides. This can be either:
  - A cue word/phrase, in which case one of the following values is used: MODALITY-CUE, GENERICITY-CUE, POLARITY-CUE, SUBJECTIVITY-CUE, TENSE-CUE, SOURCETYPE-CUE
  - A word/phrase denoting an information source, in which case one of the following values is used: SOURCE-NAMED, if the phrase corresponds to a named source (e.g., a particular person, group or organisation); or SOURCE-UNNAMED, if the phrase is a vaguer reference to an information source that is not specifically named.
  NOTE: There can be more than one instance of a particular EVIDENCE-TYPE associated with a given event mention. For example, there may be multiple sources, or multiple words/phrases that provide evidence about the modality of an event.
REF-ID - An ID that references an existing text span annotation (i.e., the value of this attribute matches the value of the ID attribute of an existing element corresponding to a text span annotation).
The referenced text span annotation may be one of the following:
  - An “mk-cue” annotation in the same file
  - An “mk-source” annotation in the same file
  - An “entity_mention” annotation in the associated original annotation file (i.e., the file with the extension “.apf.xml” within the “timex2norm” subdirectory of the corresponding corpus partition). As mentioned above, information sources of events may correspond to entities that were previously identified as part of the original ACE 2005 annotation effort (people, organisations, etc.). In this case, the REF-ID value will correspond to the annotation that identifies the mention of the relevant entity within the same sentence.

Children
--------
A single “extent” element always appears as child, which is used to denote the span covered in the corresponding text file.

extent
=======

An element used to denote the span covered in the corresponding text file.

Children
--------
A single “charseq” element, which provides details of the character offsets in the underlying document, and the text span covered.

charseq
=======

An element providing precise details about the text span covered by an annotation. The content of this element is the exact text covered by the annotation.

Attributes
----------
START - The character offset in the underlying document file corresponding to the first character covered by the annotation.
END - The character offset in the underlying document file corresponding to the last character covered by the annotation.
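The element descriptions above can be tied together with a short Python sketch that reads the meta-knowledge attributes and evidence links of each event mention. It is a minimal illustration, not a reference reader: the `SAMPLE` document, its IDs, offsets and nesting are invented for the example (the authoritative structure is in "add.dtd"), and the function names `read_span`, `load_mentions` and `span_text` are hypothetical. Only the element and attribute names come from the descriptions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element/attribute names follow the README,
# but the nesting, IDs and offsets are assumptions made for illustration.
SAMPLE = """\
<source_file>
  <document>
    <mk-cue ID="C1" Type="Modality-Cue">
      <extent><charseq START="10" END="19">apparently</charseq></extent>
    </mk-cue>
    <mk-source ID="S1">
      <extent><charseq START="30" END="44">earlier reports</charseq></extent>
    </mk-source>
    <event_mention ID="EV1-1" MK-GENERICITY="Specific" MK-MODALITY="Speculated"
                   MK-POLARITY="Positive" MK-SOURCE-TYPE="ThirdParty"
                   MK-SUBJECTIVITY="Neutral">
      <event_mention_mk_evidence EVIDENCE-TYPE="MODALITY-CUE" REF-ID="C1">
        <extent><charseq START="10" END="19">apparently</charseq></extent>
      </event_mention_mk_evidence>
      <event_mention_mk_evidence EVIDENCE-TYPE="SOURCE-UNNAMED" REF-ID="S1">
        <extent><charseq START="30" END="44">earlier reports</charseq></extent>
      </event_mention_mk_evidence>
    </event_mention>
  </document>
</source_file>
"""


def read_span(element):
    """Return (START, END, text) from the element's extent/charseq child."""
    charseq = element.find("extent/charseq")
    return int(charseq.get("START")), int(charseq.get("END")), charseq.text


def span_text(document_text, start, end):
    """Slice a span out of document text; END is inclusive, so add 1."""
    return document_text[start:end + 1]


def load_mentions(xml_text):
    """Collect MK-* attributes and evidence links for every event mention."""
    root = ET.fromstring(xml_text)

    # Spans defined in this file; other REF-IDs point into the .apf.xml file.
    local_spans = {}
    for tag in ("mk-cue", "mk-source"):
        for element in root.iter(tag):
            local_spans[element.get("ID")] = read_span(element)

    mentions = []
    for mention in root.iter("event_mention"):
        evidence = []
        for item in mention.findall("event_mention_mk_evidence"):
            ref_id = item.get("REF-ID")
            evidence.append(
                (item.get("EVIDENCE-TYPE"), ref_id, read_span(item)[2],
                 ref_id in local_spans)  # False suggests an entity mention ref
            )
        mentions.append({
            "id": mention.get("ID"),
            "mk": {k: v for k, v in mention.attrib.items() if k.startswith("MK-")},
            "evidence": evidence,
        })
    return mentions


for mention in load_mentions(SAMPLE):
    print(mention["id"], mention["mk"]["MK-MODALITY"], mention["evidence"])
```

Because END points at the last covered character, a span has END - START + 1 characters, which is why `span_text` slices up to `end + 1`. How offsets are counted against the markup in the underlying “.sgm” file is not stated here, so `span_text` expects text that has already been prepared to match the offsets.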
