vibrrt

Category: Feature extraction
Development tool: R
File size: 0KB
Downloads: 0
Upload date: 2023-08-27 00:38:16
Uploader: sh-1993
Description: An R wrapper of Vibrato: Viterbi-based accelerated tokenizer

File list:
.Rbuildignore (117, 2023-12-23)
.devcontainer/ (0, 2023-12-23)
.devcontainer/devcontainer.json (654, 2023-12-23)
DESCRIPTION (668, 2023-12-23)
LICENSE (44, 2023-12-23)
LICENSE.md (1073, 2023-12-23)
NAMESPACE (634, 2023-12-23)
R/ (0, 2023-12-23)
R/bind_lr.R (2069, 2023-12-23)
R/bind_tf_idf2.R (5225, 2023-12-23)
R/collapse_tokens.R (1251, 2023-12-23)
R/dictionary.R (2596, 2023-12-23)
R/extendr-wrappers.R (425, 2023-12-23)
R/imports.R (109, 2023-12-23)
R/lex_density.R (1362, 2023-12-23)
R/mute_tokens.R (657, 2023-12-23)
R/pack.R (2533, 2023-12-23)
R/prettify.R (4514, 2023-12-23)
R/tokenize.R (4579, 2023-12-23)
R/utils.R (1894, 2023-12-23)
_pkgdown.yml (156, 2023-12-23)
inst/ (0, 2023-12-23)
inst/user.csv (231, 2023-12-23)
man/ (0, 2023-12-23)
man/as_tokens.Rd (739, 2023-12-23)
man/bind_lr.Rd (991, 2023-12-23)
man/bind_tf_idf2.Rd (2168, 2023-12-23)
man/collapse_tokens.Rd (786, 2023-12-23)
man/dict_path.Rd (584, 2023-12-23)
man/download_dict.Rd (721, 2023-12-23)
man/get_dict_features.Rd (939, 2023-12-23)
man/is_blank.Rd (459, 2023-12-23)
... ...

# vibrrt

> An R wrapper of ‘[Vibrato](https://github.com/daac-tools/vibrato)’:
> Viterbi-based accelerated tokenizer

[![vibrrt status badge](https://paithiov909.r-universe.dev/badges/vibrrt)](https://paithiov909.r-universe.dev)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/paithiov909/vibrrt/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/paithiov909/vibrrt/actions/workflows/R-CMD-check.yaml)

## Installation

``` r
install.packages("vibrrt", repos = "https://paithiov909.r-universe.dev")
```

## Usage

``` r
ipadic <- vibrrt::dict_path("ipadic-mecab-2_7_0")
if (!file.exists(ipadic)) {
  vibrrt::download_dict("ipadic-mecab-2_7_0")
}

gibasa::ginga[5:10] |>
  vibrrt::tokenize(sys_dic = ipadic) |>
  vibrrt::prettify(col_select = c("POS1", "POS2"))
```

## Benchmark

``` r
microbenchmark::microbenchmark(
  gibasa = gibasa::tokenize(gibasa::ginga, mode = "wakati"),
  vibrrt_ipadic = vibrrt::tokenize(
    gibasa::ginga,
    sys_dic = vibrrt::dict_path("ipadic-mecab-2_7_0"),
    mode = "wakati"
  ),
  times = 10L,
  check = "equal"
)
#> Unit: milliseconds
#>           expr      min       lq     mean   median       uq      max neval
#>         gibasa 104.3151 107.1517 113.6609 108.2799 116.5667 151.5655    10
#>  vibrrt_ipadic 400.1805 406.0459 437.5194 423.5166 456.0265 521.1660    10
```

``` r
microbenchmark::microbenchmark(
  gibasa = gibasa::tokenize(
    gibasa::ginga,
    sys_dic = "/usr/local/lib/python3.10/dist-packages/unidic_lite/dicdir",
    mode = "wakati"
  ),
  vibrrt_unidic = vibrrt::tokenize(
    gibasa::ginga,
    sys_dic = vibrrt::dict_path("unidic-mecab-2_1_2"),
    mode = "wakati"
  ),
  times = 5L
)
#> Unit: milliseconds
#>           expr       min       lq     mean   median        uq      max neval
#>         gibasa  386.1541  390.373 1158.620  544.906  630.9402 3840.727     5
#>  vibrrt_unidic 2334.7467 2352.088 2628.865 2404.555 2474.3871 3578.548     5
```
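
To make the two benchmark tables easier to compare, the following base R sketch computes how many times larger the vibrrt median is than the gibasa median for each dictionary. The timings are copied from the output above; the `medians` data frame and its column names are illustrative and not part of either package.

``` r
# Median timings in milliseconds, copied from the benchmark output above.
# The data frame and its column names are hypothetical, for illustration only.
medians <- data.frame(
  dictionary = c("ipadic", "unidic"),
  gibasa = c(108.2799, 544.906),
  vibrrt = c(423.5166, 2404.555)
)

# Ratio of the vibrrt median to the gibasa median, rounded to two decimals.
medians$ratio <- round(medians$vibrrt / medians$gibasa, 2)

print(medians)
```

In these particular runs the vibrrt median comes out at roughly 3.9 times the gibasa median with IPAdic and roughly 4.4 times with UniDic. Medians are used rather than means because the gibasa mean in the second table is pulled up by a single slow run (max 3840.727 ms). The UniDic comparison also appears to use two different dictionary builds (`unidic_lite` for gibasa, `unidic-mecab-2_1_2` for vibrrt), so that ratio may not be a strict like-for-like measurement.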
