ntto
所属分类:WEB开发
开发工具:GO
文件大小:0KB
下载次数:0
上传日期:2019-08-05 14:28:57
上 传 者:
sh-1993
说明: 小型n-三元组到行分隔JSON转换器和前缀切割器。,
(Small n-triples to line delimited JSON converter and prefix cutter.,)
文件列表:
.travis.yml (42, 2019-08-05)
LICENSE (1156, 2019-08-05)
Makefile (967, 2019-08-05)
RULES (800, 2019-08-05)
cmd/ (0, 2019-08-05)
cmd/ntto/ (0, 2019-08-05)
cmd/ntto/ntto.go (4381, 2019-08-05)
common.go (3763, 2019-08-05)
common_test.go (5786, 2019-08-05)
debian/ (0, 2019-08-05)
debian/ntto/ (0, 2019-08-05)
debian/ntto/DEBIAN/ (0, 2019-08-05)
debian/ntto/DEBIAN/control (221, 2019-08-05)
go.mod (37, 2019-08-05)
packaging/ (0, 2019-08-05)
packaging/buildrpm.sh (814, 2019-08-05)
packaging/ntto.spec (1612, 2019-08-05)
rules.go (10238, 2019-08-05)
ntto
====
[![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)
Minimal n-triples toolkit. It can:
* shrink n-triples by applying namespace abbreviations (given some rules)
* convert n-triples to line delimited JSON (.ldj)
To list the abbreviation rules, run:
$ ntto -d
To create an abbreviated NT file from an NT file, run:
$ ntto -o OUTPUT.NT -a FILE.nt
To create an abbreviated JSON file from an NT file, run:
$ ntto -a -j FILE.nt > OUTPUT.LDJ
To create an abbreviated JSON file from an NT file while ignoring conversion errors, run:
$ ntto -a -j -i FILE.nt > OUTPUT.LDJ
To create an abbreviated JSON file from an NT file while ignoring conversion errors and using a custom RULES file, run:
$ ntto -r RULES -a -j -i FILE.nt > OUTPUT.LDJ
Installation
------------
RPM and DEB packages can be found under [releases](https://github.com/miku/ntto/releases).
With a proper Go setup, a
$ go get github.com/miku/ntto/cmd/ntto
should work as well.
Usage
-----
$ ntto
Usage: ntto [OPTIONS] FILE
-a abbreviate n-triples using rules
-c dump constructed sed command and exit
-cpuprofile string
write cpu profile to file
-d dump rules and exit
-i ignore conversion errors
-j convert nt to json
-n string
string to indicate empty string replacement (default "")
-o string
output file to write result to
-r string
path to rules file, use built-in if none given
-v prints current version and exits
-w int
parallelism measure (default 4)
Mode of operation
-----------------
`ntto` takes a RULES file (alternatively uses some [hardwired](https://github.com/miku/ntto/blob/master/rules.go) rules) to abbreviate
common prefixes in a n-triple file. `ntto` does not do the replacements itself, but outsources it to external programs, like `replace` or `perl`.
With the help of `replace` ntto can shorten up to 3M lines per second. The resulting
file size can be up to 50% of the size of the original file.
Example rules file
------------------
$ cat RULES
# example rules file
dbp http://dbpedia.org/resource/
gnd http://d-nb.info/gnd/
dnbes http://d-nb.info/standards/elementset/gnd#
dnbac http://d-nb.info/standards/vocab/gnd/geographic-area-code#
dnbv http://d-nb.info/standards/vocab/gnd/
viaf http://viaf.org/viaf/
frbr http://rdvocab.info/uri/schema/FRBRentitiesRDA/
rdgr http://rdvocab.info/ElementsGr2/
# empty lines are ignored, as are comments
foaf http://xmlns.com/foaf/0.1/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
schema http://schema.org/
dc http://purl.org/dc/elements/1.1/
dcterms http://purl.org/dc/terms/
Performance data point
----------------------
$ wc -l file.nt
114171541
$ time ntto -o output.nt -a file.nt
real 1m51.202s
user 1m3.626s
sys 0m13.602s
$ time ntto -a -j file.nt > output.ldj
real 15m47.872s
user 16m19.516s
sys 2m3.013s
Sometimes, less is more, but YMMV:
$ time ntto -w 2 -a -j file.nt > output.ldj
real 12m3.619s
user 15m17.422s
sys 2m14.430s
近期下载者:
相关文件:
收藏者: