dictionary

Category: Web development
Development tool: XSLT
File size: 0KB
Downloads: 0
Upload date: 2023-10-08 18:32:43
Uploader: sh-1993
Description: A Ruby parser for the GCIDE English word dictionary that generates friendly structured JSON files for easy mass database import. Includes other resources if you need more data for an English dictionary database.

File list:
Gemfile (163, 2024-01-01)
Gemfile.lock (1044, 2024-01-01)
Rakefile (141, 2024-01-01)
dictionary-test.json (5367, 2024-01-01)
dictionary.json (45262333, 2024-01-01)
dist/ (0, 2024-01-01)
dist/gcide/ (0, 2024-01-01)
dist/gcide/gcide_a-entries.json (2829889, 2024-01-01)
dist/gcide/gcide_b-entries.json (2418898, 2024-01-01)
dist/gcide/gcide_c-entries.json (4106607, 2024-01-01)
dist/gcide/gcide_d-entries.json (2403540, 2024-01-01)
dist/gcide/gcide_e-entries.json (1697741, 2024-01-01)
dist/gcide/gcide_f-entries.json (1697716, 2024-01-01)
dist/gcide/gcide_g-entries.json (1287557, 2024-01-01)
dist/gcide/gcide_h-entries.json (1611810, 2024-01-01)
dist/gcide/gcide_i-entries.json (1779943, 2024-01-01)
dist/gcide/gcide_j-entries.json (366733, 2024-01-01)
dist/gcide/gcide_k-entries.json (366047, 2024-01-01)
dist/gcide/gcide_l-entries.json (1469215, 2024-01-01)
dist/gcide/gcide_m-entries.json (2371314, 2024-01-01)
dist/gcide/gcide_n-entries.json (844263, 2024-01-01)
dist/gcide/gcide_o-entries.json (1111809, 2024-01-01)
dist/gcide/gcide_p-entries.json (3573602, 2024-01-01)
dist/gcide/gcide_q-entries.json (223346, 2024-01-01)
dist/gcide/gcide_r-entries.json (1888977, 2024-01-01)
dist/gcide/gcide_s-entries.json (4587510, 2024-01-01)
dist/gcide/gcide_t-entries.json (2181246, 2024-01-01)
dist/gcide/gcide_u-entries.json (698287, 2024-01-01)
dist/gcide/gcide_v-entries.json (659220, 2024-01-01)
dist/gcide/gcide_w-entries.json (970664, 2024-01-01)
dist/gcide/gcide_x-entries.json (47534, 2024-01-01)
dist/gcide/gcide_y-entries.json (112555, 2024-01-01)
dist/gcide/gcide_z-entries.json (117944, 2024-01-01)
lib/ (0, 2024-01-01)
lib/parser.rb (2873, 2024-01-01)
... ...

# English Dictionary

![Tests](https://github.com/javierjulio/dictionary/workflows/Tests/badge.svg)

This is a minimally tested and incomplete parser of the Webster Unabridged English Dictionary from the [modified GCIDE XML](http://rali.iro.umontreal.ca/GCIDE/) that categorizes content to make it easy to find and parse. I did a lot of research to find a machine-readable English dictionary for a project where I didn't want to rely on a third-party API (e.g. Wordnik).

## Generate Simple JSON

From the project directory, run the following:

```sh
ruby parse.rb
```

This will generate a JSON file for each GCIDE XML file. Each object key is a unique word, and its value is an object containing the definitions (an array of objects, each with a definition, part of speech, field, and sequence). The files (excluding obsolete content) will contain ~99k unique words and ~160k definitions.

## Resources

### GCIDE

After reviewing all the resources, I went first with parsing this GCIDE XML. The next best solution seems to be the Wiktionary TSV.

* http://rali.iro.umontreal.ca/GCIDE/ (the ZIP download is further down the page)

### Wiktionary TSV

* http://aautar.digital-radiation.com/wiktionary-db/wiktionary.E20121127.tsv.zip
* http://semisignal.com/?p=5666 (TSV file linked to above and sample code)
* https://github.com/boyers/asler/tree/master/scratch

### Webster's Unabridged Dictionary (1913 - public domain)

* http://www.mso.anu.edu.au/~ralph/OPTED/index.html
* https://github.com/janosgyerik/dictgen (Plain Text parser)
* http://en.wiktionary.org/wiki/Wiktionary:Abbreviations_in_Webster

### Moby Word Lists

* https://github.com/drichert/moby (Ruby parser for hyphenation, parts-of-speech, and thesaurus)
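
## Example: Flattening the JSON for Import

The JSON shape described under "Generate Simple JSON" (word keys mapping to an object that holds an array of definitions) could be flattened into one row per definition before a bulk database import. The following is only a minimal sketch: the key names `definitions`, `definition`, `part_of_speech`, `field`, and `sequence` are assumptions, because the README names the fields but not their exact JSON keys. The sample entries are invented for illustration, and `flatten_entries` is a hypothetical helper that is not part of this project.

```ruby
require "json"

# Made-up sample data. The key names below are assumptions; check a generated
# file and adjust them to match the real output.
SAMPLE_JSON = <<~JSON
  {
    "aardvark": {
      "definitions": [
        {
          "definition": "A burrowing, insect-eating African mammal.",
          "part_of_speech": "n.",
          "field": "Zool.",
          "sequence": "1"
        }
      ]
    },
    "abacus": {
      "definitions": [
        {
          "definition": "A frame with beads for counting.",
          "part_of_speech": "n.",
          "sequence": "1"
        },
        {
          "definition": "The uppermost member of a column capital.",
          "part_of_speech": "n.",
          "field": "Arch.",
          "sequence": "2"
        }
      ]
    }
  }
JSON

# Hypothetical helper: turns { word => { "definitions" => [...] } } into a
# flat array with one hash per definition. Missing keys come through as nil.
def flatten_entries(entries)
  entries.flat_map do |word, entry|
    Array(entry["definitions"]).map do |d|
      {
        word: word,
        definition: d["definition"],
        part_of_speech: d["part_of_speech"],
        field: d["field"],
        sequence: d["sequence"]
      }
    end
  end
end

rows = flatten_entries(JSON.parse(SAMPLE_JSON))

# Print a tab-separated preview of each row.
rows.each { |row| puts row.values_at(:word, :sequence, :definition).join("\t") }
```

To try this against real output, the inline sample could be swapped for something like `JSON.parse(File.read("dist/gcide/gcide_a-entries.json"))`, assuming that file follows the structure described above. Flat rows of this kind map directly onto a single definitions table, which is usually the simplest target for a mass import.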
