nanotokenizer

Category: Feature extraction
Development tool: C++
Upload date: 2024-01-30 13:51:16
Uploader: sh-1993
Description: Nanoscale tokenizer in C++

File list:
hat-trie/
LICENSE
Makefile
minijson.h
rwkv_vocab_v20230424.json
rwkv_world_tokenizer.cc
rwkv_world_tokenizer.hh
rwkv_world_tokenizer_example.cc

# Nanoscale tokenizer in C++

Nanoscale tokenizer in C++. Currently RWKV world tokenizer is implemented.

## Features

* Easy to embed
* Read vocab from JSON (through minijson)

### Additional feature to RWKV world tokenizer (TODO)

* [ ] UTF-8 fallback (with fallback token_id, default 65530)

## TODO

* [ ] Make C++ Exception free

## Third party libraries

* minijson: MIT license https://github.com/syoyo/minijson
* hat-trie: MIT license https://github.com/Tessil/hat-trie
* rwkv_world_tokenizer: Apache 2.0 license https://github.com/mlc-ai/tokenizers-cpp
* rwkv_vocab_v20230424.json: Not sure, but would be Apache 2.0 also. https://github.com/BlinkDL/ChatRWKV
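
To illustrate the idea behind the features and the fallback TODO above, here is a minimal sketch of greedy longest-match tokenization with a fallback token id. It is not nanotokenizer's actual API: the function name `greedy_encode` is hypothetical, a plain `std::unordered_map` stands in for the hat-trie, and a tiny in-memory vocab stands in for `rwkv_vocab_v20230424.json`. Emitting the fallback id for a single unmatched byte is only one possible reading of the "UTF-8 fallback" item, which the README leaves unspecified.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper for illustration only; not part of nanotokenizer.
// Encodes `text` by repeatedly taking the longest vocab entry that matches
// at the current position (greedy longest match).
std::vector<std::int32_t> greedy_encode(
    const std::unordered_map<std::string, std::int32_t>& vocab,
    const std::string& text,
    std::int32_t fallback_id = 65530) {
  // The longest token in the vocab bounds how far ahead we need to look.
  std::size_t max_len = 0;
  for (const auto& kv : vocab) {
    if (kv.first.size() > max_len) max_len = kv.first.size();
  }

  std::vector<std::int32_t> ids;
  std::size_t pos = 0;
  while (pos < text.size()) {
    const std::size_t remaining = text.size() - pos;
    std::size_t len = remaining < max_len ? remaining : max_len;
    bool matched = false;
    // Try the longest candidate first, then shrink one byte at a time.
    for (; len > 0; --len) {
      const auto it = vocab.find(text.substr(pos, len));
      if (it != vocab.end()) {
        ids.push_back(it->second);
        pos += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Assumption: an unmatched byte maps to the fallback id and is skipped.
      ids.push_back(fallback_id);
      pos += 1;
    }
  }
  return ids;
}
```

With the vocab `{"a": 1, "b": 2, "ab": 3, "abc": 4}`, encoding `"abcabz"` gives `4, 3, 65530`: `abc` wins over `ab` and `a` at the start, `ab` is the longest match at the second position, and `z` is not in the vocab. A trie such as hat-trie serves the same purpose as the shrinking-substring loop here, but finds the longest prefix in a single pass over the input.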
