gpt3-tokenizer-java
Category: GPT/ChatGPT
Development tool: Java
File size: 1556KB
Downloads: 0
Upload date: 2023-04-09 13:00:00
Uploader: sh-1993
Description: Java implementation of a GPT3/4 tokenizer.
File list:
LICENSE (1065, 2023-04-09)
build.gradle (3081, 2023-04-09)
gradle (0, 2023-04-09)
gradle\wrapper (0, 2023-04-09)
gradle\wrapper\gradle-wrapper.jar (60756, 2023-04-09)
gradle\wrapper\gradle-wrapper.properties (202, 2023-04-09)
gradlew (8188, 2023-04-09)
gradlew.bat (2747, 2023-04-09)
settings.gradle (42, 2023-04-09)
src (0, 2023-04-09)
src\main (0, 2023-04-09)
src\main\java (0, 2023-04-09)
src\main\java\com (0, 2023-04-09)
src\main\java\com\didalgo (0, 2023-04-09)
src\main\java\com\didalgo\gpt3 (0, 2023-04-09)
src\main\java\com\didalgo\gpt3\ByteSequence.java (6694, 2023-04-09)
src\main\java\com\didalgo\gpt3\ChatFormatDescriptor.java (1196, 2023-04-09)
src\main\java\com\didalgo\gpt3\Encoding.java (7044, 2023-04-09)
src\main\java\com\didalgo\gpt3\GPT3Tokenizer.java (7935, 2023-04-09)
src\main\java\com\didalgo\gpt3\TokenCount.java (5658, 2023-04-09)
src\main\resources (0, 2023-04-09)
src\main\resources\com (0, 2023-04-09)
src\main\resources\com\didalgo (0, 2023-04-09)
src\main\resources\com\didalgo\gpt3 (0, 2023-04-09)
src\main\resources\com\didalgo\gpt3\cl100k_base.tiktoken (1681126, 2023-04-09)
src\main\resources\com\didalgo\gpt3\p50k_base.tiktoken (836186, 2023-04-09)
src\main\resources\com\didalgo\gpt3\r50k_base.tiktoken (835554, 2023-04-09)
src\test (0, 2023-04-09)
src\test\java (0, 2023-04-09)
src\test\java\com (0, 2023-04-09)
src\test\java\com\didalgo (0, 2023-04-09)
src\test\java\com\didalgo\gpt3 (0, 2023-04-09)
src\test\java\com\didalgo\gpt3\ByteSequenceTest.java (2840, 2023-04-09)
src\test\java\com\didalgo\gpt3\GPT3TokenizerTest.java (17632, 2023-04-09)
... ...
# GPT3/4 Java Tokenizer
[![License: MIT](https://img.shields.io/github/license/didalgo2/gpt3-tokenizer-java?style=flat-square)](https://opensource.org/license/mit/)
![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/didalgo2/gpt3-tokenizer-java/gradle.yml?style=flat-square)
[![Maven Central](https://img.shields.io/maven-central/v/com.didalgo/gpt3-tokenizer?style=flat-square)](https://central.sonatype.com/artifact/com.didalgo/gpt3-tokenizer/0.1.2)
This is a Java implementation of a GPT3/4 tokenizer, loosely ported from [Tiktoken](https://github.com/openai/tiktoken) with the help of [ChatGPT](https://openai.com/blog/chatgpt).
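The `.tiktoken` resource files bundled with the project follow Tiktoken's plain-text vocabulary format: one entry per line, consisting of the Base64-encoded token bytes, a space, and the token's integer rank. The following is a minimal sketch of reading that format, not this library's loader; the class name `TiktokenLineSketch` and the method `parseRanks` are hypothetical, and the Latin-1 string key is a simplification chosen to keep one `char` per byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of com.didalgo.gpt3.
class TiktokenLineSketch {

    /** Parses lines of the form "<base64 token bytes> <rank>" into a token-to-rank map. */
    static Map<String, Integer> parseRanks(String contents) {
        Map<String, Integer> ranks = new LinkedHashMap<>();
        for (String rawLine : contents.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            String[] parts = line.split(" ");
            byte[] tokenBytes = Base64.getDecoder().decode(parts[0]);
            // Token bytes are not always valid UTF-8, so ISO-8859-1 is used
            // here to map each byte to exactly one char (a simplification).
            String key = new String(tokenBytes, StandardCharsets.ISO_8859_1);
            ranks.put(key, Integer.parseInt(parts[1]));
        }
        return ranks;
    }

    public static void main(String[] args) {
        // "IQ==" decodes to "!" and "Ig==" decodes to a double quote.
        Map<String, Integer> ranks = parseRanks("IQ== 0\nIg== 1\n");
        System.out.println(ranks);
    }
}
```

A production loader would more likely key the map on a byte-sequence type rather than a `String`, since tokens are arbitrary byte strings.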
## Usage Examples
### Encoding Text to Tokens
```java
// Needs java.util.List plus GPT3Tokenizer and Encoding from com.didalgo.gpt3.
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
List<Integer> tokens = tokenizer.encode("example text here");
```
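For readers unfamiliar with what an encoder of this kind does internally: Tiktoken-style tokenizers split text into pieces and then repeatedly merge the adjacent pair whose merged form has the lowest rank in the vocabulary, until no mergeable pair remains. The toy sketch below illustrates that merge loop only. It works on characters rather than bytes, uses a made-up three-entry vocabulary, and the names `BpeMergeSketch` and `merge` are hypothetical; it is not the code in `GPT3Tokenizer`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of rank-based pair merging; not part of com.didalgo.gpt3.
class BpeMergeSketch {

    /** Splits a word into single characters, then merges lowest-rank pairs first. */
    static List<String> merge(String word, Map<String, Integer> ranks) {
        List<String> parts = new ArrayList<>();
        for (char c : word.toCharArray()) {
            parts.add(String.valueOf(c));
        }
        while (parts.size() > 1) {
            int bestIndex = -1;
            int bestRank = Integer.MAX_VALUE;
            // Find the adjacent pair whose concatenation has the lowest rank.
            for (int i = 0; i < parts.size() - 1; i++) {
                Integer rank = ranks.get(parts.get(i) + parts.get(i + 1));
                if (rank != null && rank < bestRank) {
                    bestRank = rank;
                    bestIndex = i;
                }
            }
            if (bestIndex < 0) {
                break; // no adjacent pair is in the vocabulary
            }
            String merged = parts.get(bestIndex) + parts.get(bestIndex + 1);
            parts.set(bestIndex, merged);
            parts.remove(bestIndex + 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up vocabulary for the example; real ranks come from a .tiktoken file.
        Map<String, Integer> ranks = new HashMap<>();
        ranks.put("ab", 0);
        ranks.put("abc", 1);
        ranks.put("bc", 2);
        System.out.println(merge("abcd", ranks)); // prints [abc, d]
    }
}
```

In a real tokenizer each resulting piece is then looked up in the same rank table to obtain its token ID.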
### Decoding Tokens to Text
```java
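// Needs java.util.Arrays and java.util.List.
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
// Placeholder token IDs; in practice these come from encode().
List<Integer> tokens = Arrays.asList(123, 456, 789);
String text = tokenizer.decode(tokens);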
```
### Counting Number of Tokens in Chat Messages
```java
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
// `messages` is assumed to hold the chat messages to count, built elsewhere.
int tokens = TokenCount.fromMessages(messages, tokenizer, ChatFormatDescriptor.forModel("gpt-3.5-turbo"));
```
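Chat token counts are larger than the sum of the tokens in each message's text, because the chat format wraps every message in a few extra tokens and primes the reply with a few more. The sketch below shows the general shape of that arithmetic under stated assumptions: the class `ChatTokenCountSketch`, its `Message` type, and the three overhead parameters are hypothetical, and the token counter is a stand-in that counts words. The real overheads depend on the model, and this sketch does not inspect how `TokenCount` or `ChatFormatDescriptor` compute them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative arithmetic only; not the implementation of TokenCount.
class ChatTokenCountSketch {

    /** Hypothetical message shape: role, content, and an optional name (null if absent). */
    static final class Message {
        final String role;
        final String content;
        final String name;

        Message(String role, String content, String name) {
            this.role = role;
            this.content = content;
            this.name = name;
        }
    }

    /**
     * Sums text tokens plus fixed overheads.
     * perMessage, perName, and perRequest are assumed, model-dependent constants.
     */
    static int count(List<Message> messages, ToIntFunction<String> tokenCounter,
                     int perMessage, int perName, int perRequest) {
        int total = perRequest; // tokens that prime the assistant's reply
        for (Message m : messages) {
            total += perMessage; // framing tokens around each message
            total += tokenCounter.applyAsInt(m.role);
            total += tokenCounter.applyAsInt(m.content);
            if (m.name != null) {
                total += tokenCounter.applyAsInt(m.name) + perName;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in counter: whitespace-separated words, NOT a real BPE tokenizer.
        ToIntFunction<String> words =
                s -> s.trim().isEmpty() ? 0 : s.trim().split("\\s+").length;
        List<Message> messages = Arrays.asList(
                new Message("system", "You are helpful", null),
                new Message("user", "Hello there", "alice"));
        // Overheads of 3, 1, and 3 are illustrative values only.
        System.out.println(count(messages, words, 3, 1, 3)); // prints 18
    }
}
```

With a real tokenizer, the stand-in counter would be replaced by something like the size of the list returned from `encode`.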
## License
This project is licensed under the MIT License.