nanotokenizer
Category: Feature extraction
Development tool: C++
Upload date: 2024-01-30 13:51:16
Uploader: sh-1993
Description: Nanoscale tokenizer in C++
File list:
hat-trie/
LICENSE
Makefile
minijson.h
rwkv_vocab_v20230424.json
rwkv_world_tokenizer.cc
rwkv_world_tokenizer.hh
rwkv_world_tokenizer_example.cc
# Nanoscale tokenizer in C++
A nanoscale tokenizer written in C++.
Currently, the RWKV world tokenizer is implemented.
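
The upstream RWKV world tokenizer is generally described as a greedy longest-match tokenizer over a trie of vocabulary entries. The block below is a minimal sketch of that idea, not this repository's code: `TrieNode`, `GreedyTokenizer`, `AddToken`, and `Encode` are hypothetical names, and a plain `std::map`-based byte trie stands in for hat-trie.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical byte-level trie node; the repository itself uses hat-trie.
struct TrieNode {
  std::map<unsigned char, std::unique_ptr<TrieNode>> children;
  int token_id = -1;  // -1 means "no token ends at this node"
};

// Illustrative greedy longest-match tokenizer (assumed design).
class GreedyTokenizer {
 public:
  // Registers one vocabulary entry (token bytes -> id).
  void AddToken(const std::string &token, int id) {
    assert(!token.empty() && id >= 0);
    TrieNode *node = &root_;
    for (unsigned char c : token) {
      std::unique_ptr<TrieNode> &child = node->children[c];
      if (!child) child = std::make_unique<TrieNode>();
      node = child.get();
    }
    node->token_id = id;
  }

  // Encodes `text` by repeatedly taking the longest token that matches at
  // the current position. Returns false if no token matches somewhere.
  bool Encode(const std::string &text, std::vector<int> *ids) const {
    assert(ids != nullptr);
    ids->clear();
    size_t pos = 0;
    while (pos < text.size()) {
      const TrieNode *node = &root_;
      int best_id = -1;
      size_t best_len = 0;
      // Walk the trie as far as the input allows, remembering the last
      // node at which a complete token ended.
      for (size_t i = pos; i < text.size(); ++i) {
        auto it = node->children.find(static_cast<unsigned char>(text[i]));
        if (it == node->children.end()) break;
        node = it->second.get();
        if (node->token_id >= 0) {
          best_id = node->token_id;
          best_len = i - pos + 1;
        }
      }
      if (best_id < 0) return false;  // unmatched input, no fallback here
      ids->push_back(best_id);
      pos += best_len;
    }
    return true;
  }

 private:
  TrieNode root_;
};
```

With the toy vocabulary `a`, `ab`, `abc`, `b`, encoding `abcb` picks `abc` first and then `b`, because the longest match wins at each position.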
## Features
* Easy to embed
* Reads the vocabulary from JSON (through minijson)
### Additional features for the RWKV world tokenizer (TODO)
* [ ] UTF-8 fallback (with a fallback token_id; default: 65530)
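
Since the UTF-8 fallback is still a TODO, its exact behavior is not specified. One possible reading, sketched below, is to emit the fallback token id once per unmatched UTF-8 character and skip that whole character. Everything here is an assumption for illustration: `Matcher`, `Utf8SeqLength`, and `EncodeWithFallback` are hypothetical names, and only the default id 65530 comes from the item above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Default fallback id, taken from the TODO item above.
constexpr int kFallbackTokenId = 65530;

// Returns the byte length (1-4) of the UTF-8 sequence starting at text[pos],
// judged from the lead byte only and clamped to the end of the string.
// Invalid lead bytes (e.g. stray continuation bytes) count as one byte.
inline size_t Utf8SeqLength(const std::string &text, size_t pos) {
  assert(pos < text.size());
  const unsigned char c = static_cast<unsigned char>(text[pos]);
  size_t len = 1;
  if ((c & 0xE0) == 0xC0) len = 2;
  else if ((c & 0xF0) == 0xE0) len = 3;
  else if ((c & 0xF8) == 0xF0) len = 4;
  const size_t remaining = text.size() - pos;
  return len < remaining ? len : remaining;
}

// Hypothetical matcher: returns the byte length of the longest token that
// starts at `pos` (0 if none) and writes its id to `*id`.
using Matcher = std::function<size_t(const std::string &, size_t, int *)>;

// Encodes `text`, emitting `fallback_id` once for every UTF-8 character
// that the matcher cannot cover (assumed semantics).
inline std::vector<int> EncodeWithFallback(const std::string &text,
                                           const Matcher &match,
                                           int fallback_id = kFallbackTokenId) {
  std::vector<int> ids;
  size_t pos = 0;
  while (pos < text.size()) {
    int id = -1;
    const size_t len = match(text, pos, &id);
    if (len > 0) {
      ids.push_back(id);
      pos += len;
    } else {
      ids.push_back(fallback_id);
      pos += Utf8SeqLength(text, pos);  // skip the whole character
    }
  }
  return ids;
}
```

Emitting one fallback token per character rather than per byte keeps the output length closer to what a reader would expect, but a per-byte fallback would be an equally valid design.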
## TODO
* [ ] Make the code free of C++ exceptions
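
A common way to avoid exceptions is to report failure through a `bool` return value plus an optional error string. The sketch below shows that style on a small, hypothetical helper (`ParseTokenId` is not part of this repository) that uses `std::strtol` instead of the throwing `std::stoi`.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a non-negative decimal token id without
// throwing. On failure it returns false and, if `err` is non-null,
// stores a short message there.
inline bool ParseTokenId(const std::string &s, int *out, std::string *err) {
  assert(out != nullptr);
  errno = 0;
  char *end = nullptr;
  const long v = std::strtol(s.c_str(), &end, 10);
  // Reject empty input and trailing garbage such as "12x".
  if (s.empty() || end != s.c_str() + s.size()) {
    if (err) *err = "not a decimal integer: \"" + s + "\"";
    return false;
  }
  // Reject overflow and negative ids.
  if (errno == ERANGE || v < 0 || v > INT_MAX) {
    if (err) *err = "token id out of range: " + s;
    return false;
  }
  *out = static_cast<int>(v);
  return true;
}
```

Callers check the return value instead of wrapping the call in `try`/`catch`, which also allows building with exceptions disabled where the toolchain supports it.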
## Third party libraries
* minijson: MIT license https://github.com/syoyo/minijson
* hat-trie: MIT license https://github.com/Tessil/hat-trie
* rwkv_world_tokenizer: Apache 2.0 license https://github.com/mlc-ai/tokenizers-cpp
* rwkv_vocab_v20230424.json: license unconfirmed, presumably Apache 2.0 as well. https://github.com/BlinkDL/ChatRWKV