gpt3-tokenizer-php
Category: Feature extraction
Development tool: PHP
File size: 571KB
Downloads: 0
Upload date: 2023-03-09 23:55:27
Uploader: sh-1993
Description: GPT-3 tokenizer for PHP (gpt3-tokenizer-php)
File list:
.editorconfig (337, 2023-02-07)
.php-cs-fixer.dist.php (212, 2023-02-07)
CHANGELOG.md (327, 2023-02-07)
LICENSE (1062, 2023-02-07)
composer.json (1096, 2023-02-07)
composer.lock (139789, 2023-02-07)
data (0, 2023-02-07)
data\characters.json (2615, 2023-02-07)
data\encoder.json (1042301, 2023-02-07)
data\vocab.bpe (456318, 2023-02-07)
phpstan.neon (66, 2023-02-07)
rector.php (569, 2023-02-07)
src (0, 2023-02-07)
src\Encoder.php (8906, 2023-02-07)
tests (0, 2023-02-07)
tests\EncoderTest.php (2106, 2023-02-07)
# GPT3 tokenizer
PHP Text Tokenizer for GPT models
## About
A PHP toolkit that tokenizes text the same way the GPT family of models does.
Forked from https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP to fit our usage, fix bugs, and add unit tests.
## Usage
This tool needs the mbstring PHP extension to handle non-ASCII characters in the tokenized text correctly; see the [mbstring installation instructions](https://www.php.net/manual/en/mbstring.installation.php).
PHP 8.1 is also required.
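The two requirements above can be checked at startup before the tokenizer is used. The following is a minimal sketch, not part of the library: the function name `missingTokenizerRequirements` and its messages are hypothetical, and it assumes that PHP 8.1 is a minimum version rather than an exact one.

```php
<?php

declare(strict_types=1);

/**
 * Returns a list of unmet requirements (empty when the environment is fine).
 * Hypothetical helper for illustration; it is not provided by the library.
 * Assumption: PHP 8.1 is treated as the minimum supported version.
 */
function missingTokenizerRequirements(string $phpVersion, bool $hasMbstring): array
{
    $problems = [];

    // version_compare() returns true here when $phpVersion is lower than 8.1.0.
    if (version_compare($phpVersion, '8.1.0', '<')) {
        $problems[] = "PHP 8.1 is required, found {$phpVersion}";
    }

    if (!$hasMbstring) {
        $problems[] = 'the mbstring extension is not loaded';
    }

    return $problems;
}

// Check the running interpreter and report anything that is missing.
$problems = missingTokenizerRequirements(PHP_VERSION, extension_loaded('mbstring'));

foreach ($problems as $problem) {
    echo "Requirement not met: {$problem}" . PHP_EOL;
}
```

Passing the version and the extension flag as arguments, instead of reading them inside the function, keeps the check easy to exercise with different values.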
```php
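// Load Composer's autoloader (assumes the package was installed with Composer).
require __DIR__ . '/vendor/autoload.php';
use Semji\GPT3Tokenizer\Encoder;
$prompt = "Many words map";
$encoder = new Encoder();
// Keep the result of encode() (the tokens for the prompt) for later use.
$tokens = $encoder->encode($prompt);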
```