tokenizers-ruby

Category: Feature extraction
Development tool: Rust
File size: 62KB
Downloads: 0
Upload date: 2023-05-10 06:45:58
Uploader: sh-1993
Description: Fast state-of-the-art tokenizers for Ruby

File list:
CHANGELOG.md (1623, 2023-10-07)
Cargo.lock (23765, 2023-10-07)
Cargo.toml (88, 2023-10-07)
Gemfile (86, 2023-10-07)
LICENSE.txt (11358, 2023-10-07)
Rakefile (1394, 2023-10-07)
ext (0, 2023-10-07)
ext\tokenizers (0, 2023-10-07)
ext\tokenizers\Cargo.toml (499, 2023-10-07)
ext\tokenizers\extconf.rb (84, 2023-10-07)
ext\tokenizers\src (0, 2023-10-07)
ext\tokenizers\src\decoders.rs (13355, 2023-10-07)
ext\tokenizers\src\encoding.rs (2534, 2023-10-07)
ext\tokenizers\src\error.rs (484, 2023-10-07)
ext\tokenizers\src\lib.rs (5796, 2023-10-07)
ext\tokenizers\src\models.rs (14780, 2023-10-07)
ext\tokenizers\src\normalizers.rs (16121, 2023-10-07)
ext\tokenizers\src\pre_tokenizers.rs (17561, 2023-10-07)
ext\tokenizers\src\processors.rs (6787, 2023-10-07)
ext\tokenizers\src\tokenizer.rs (16467, 2023-10-07)
ext\tokenizers\src\trainers.rs (28180, 2023-10-07)
ext\tokenizers\src\utils (0, 2023-10-07)
ext\tokenizers\src\utils\mod.rs (75, 2023-10-07)
ext\tokenizers\src\utils\normalization.rs (2642, 2023-10-07)
ext\tokenizers\src\utils\regex.rs (656, 2023-10-07)
lib (0, 2023-10-07)
lib\tokenizers.rb (1860, 2023-10-07)
lib\tokenizers (0, 2023-10-07)
lib\tokenizers\char_bpe_tokenizer.rb (655, 2023-10-07)
lib\tokenizers\decoders (0, 2023-10-07)
lib\tokenizers\decoders\bpe_decoder.rb (141, 2023-10-07)
lib\tokenizers\decoders\ctc.rb (214, 2023-10-07)
lib\tokenizers\decoders\metaspace.rb (194, 2023-10-07)
... ...

# Tokenizers Ruby

:slightly_smiling_face: Fast state-of-the-art [tokenizers](https://github.com/huggingface/tokenizers) for Ruby

[![Build Status](https://github.com/ankane/tokenizers-ruby/workflows/build/badge.svg?branch=master)](https://github.com/ankane/tokenizers-ruby/actions)

## Installation

Add this line to your application's Gemfile:

```ruby
gem "tokenizers"
```

## Getting Started

Load a pretrained tokenizer

```ruby
tokenizer = Tokenizers.from_pretrained("bert-base-cased")
```

Encode

```ruby
encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.tokens
encoded.ids
```

Decode

```ruby
tokenizer.decode(encoded.ids)
```

Load a tokenizer from files

```ruby
tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")
```

## Training

Check out the [Quicktour](https://huggingface.co/docs/tokenizers/quicktour) and equivalent [Ruby code](https://github.com/ankane/tokenizers-ruby/blob/master/test/quicktour_test.rb#L8).

## History

View the [changelog](https://github.com/ankane/tokenizers-ruby/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/tokenizers-ruby/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/tokenizers-ruby/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec rake compile
bundle exec rake download:files
bundle exec rake test
```
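The "Load a tokenizer from files" step in Getting Started passes a `vocab.json` and a `merges.txt` to `Tokenizers::CharBPETokenizer`. As a rough illustration of what an ordered list of merge rules does, here is a minimal pure-Ruby sketch of byte-pair-encoding merges applied to a single word. It is not the gem's implementation (that lives in the Rust extension). The `apply_bpe` helper and the toy merge rules are hypothetical, and details such as end-of-word suffixes and vocabulary lookup are left out.

```ruby
# Toy sketch of applying BPE merge rules to one word.
# `apply_bpe` and the rules below are hypothetical illustrations,
# not part of the tokenizers gem.
def apply_bpe(word, merges)
  # Earlier rules have higher priority (lower rank).
  ranks = merges.each_with_index.to_h
  symbols = word.chars

  loop do
    best_index = nil
    best_rank = nil

    # Find the adjacent pair with the lowest rank.
    symbols.each_cons(2).with_index do |pair, i|
      rank = ranks[pair]
      next if rank.nil?

      if best_rank.nil? || rank < best_rank
        best_rank = rank
        best_index = i
      end
    end

    # Stop when no adjacent pair matches any rule.
    break if best_index.nil?

    # Replace the two symbols with their concatenation.
    symbols[best_index, 2] = [symbols[best_index] + symbols[best_index + 1]]
  end

  symbols
end

# Hypothetical merge rules, in priority order.
merges = [["l", "o"], ["lo", "w"], ["e", "r"]]
puts apply_bpe("lower", merges).inspect
# prints ["low", "er"]
```

In a real tokenizer, each resulting symbol would then be mapped to an integer id through the vocabulary, which is the role the `vocab.json` file plays.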
