gpt3-tokenizer-java

Category: GPT/ChatGPT
Development tool: Java
File size: 1556KB
Downloads: 0
Upload date: 2023-04-09 13:00:00
Uploader: sh-1993
Description: Java implementation of a GPT3/4 tokenizer.

File list:
LICENSE (1065, 2023-04-09)
build.gradle (3081, 2023-04-09)
gradle (0, 2023-04-09)
gradle\wrapper (0, 2023-04-09)
gradle\wrapper\gradle-wrapper.jar (60756, 2023-04-09)
gradle\wrapper\gradle-wrapper.properties (202, 2023-04-09)
gradlew (8188, 2023-04-09)
gradlew.bat (2747, 2023-04-09)
settings.gradle (42, 2023-04-09)
src (0, 2023-04-09)
src\main (0, 2023-04-09)
src\main\java (0, 2023-04-09)
src\main\java\com (0, 2023-04-09)
src\main\java\com\didalgo (0, 2023-04-09)
src\main\java\com\didalgo\gpt3 (0, 2023-04-09)
src\main\java\com\didalgo\gpt3\ByteSequence.java (6694, 2023-04-09)
src\main\java\com\didalgo\gpt3\ChatFormatDescriptor.java (1196, 2023-04-09)
src\main\java\com\didalgo\gpt3\Encoding.java (7044, 2023-04-09)
src\main\java\com\didalgo\gpt3\GPT3Tokenizer.java (7935, 2023-04-09)
src\main\java\com\didalgo\gpt3\TokenCount.java (5658, 2023-04-09)
src\main\resources (0, 2023-04-09)
src\main\resources\com (0, 2023-04-09)
src\main\resources\com\didalgo (0, 2023-04-09)
src\main\resources\com\didalgo\gpt3 (0, 2023-04-09)
src\main\resources\com\didalgo\gpt3\cl100k_base.tiktoken (1681126, 2023-04-09)
src\main\resources\com\didalgo\gpt3\p50k_base.tiktoken (836186, 2023-04-09)
src\main\resources\com\didalgo\gpt3\r50k_base.tiktoken (835554, 2023-04-09)
src\test (0, 2023-04-09)
src\test\java (0, 2023-04-09)
src\test\java\com (0, 2023-04-09)
src\test\java\com\didalgo (0, 2023-04-09)
src\test\java\com\didalgo\gpt3 (0, 2023-04-09)
src\test\java\com\didalgo\gpt3\ByteSequenceTest.java (2840, 2023-04-09)
src\test\java\com\didalgo\gpt3\GPT3TokenizerTest.java (17632, 2023-04-09)
... ...

# GPT3/4 Java Tokenizer

[![License: MIT](https://img.shields.io/github/license/didalgo2/gpt3-tokenizer-java?style=flat-square)](https://opensource.org/license/mit/) ![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/didalgo2/gpt3-tokenizer-java/gradle.yml?style=flat-square) [![Maven Central](https://img.shields.io/maven-central/v/com.didalgo/gpt3-tokenizer?style=flat-square)](https://central.sonatype.com/artifact/com.didalgo/gpt3-tokenizer/0.1.2)

This is a Java implementation of a GPT3/4 tokenizer, loosely ported from [Tiktoken](https://github.com/openai/tiktoken) with the help of [ChatGPT](https://openai.com/blog/chatgpt).

## Usage Examples

### Encoding Text to Tokens

```java
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
List<Integer> tokens = tokenizer.encode("example text here");
```

### Decoding Tokens to Text

```java
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
List<Integer> tokens = Arrays.asList(123, 456, 789);
String text = tokenizer.decode(tokens);
```

### Counting Number of Tokens in Chat Messages

```java
GPT3Tokenizer tokenizer = new GPT3Tokenizer(Encoding.CL100K_BASE);
// `messages` holds the chat messages to count; its construction is not shown here.
int tokens = TokenCount.fromMessages(messages, tokenizer, ChatFormatDescriptor.forModel("gpt-3.5-turbo"));
```

## License

This project is licensed under the MIT License.
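
## Sketch: What Chat Token Counting Adds Up

The chat counting call in the usage examples takes a model-specific format descriptor, which suggests that the total is more than the tokens of the message text alone. The following is a minimal, self-contained sketch of that idea and not this library's implementation: it assumes each message costs a fixed overhead plus the tokens of its role and content, and that the reply adds a fixed priming cost. The class `ChatTokenSketch`, the `Message` type, the method `countTokens`, and the overhead values 4 and 3 are all hypothetical names and numbers chosen for illustration. A whitespace word counter stands in for a real tokenizer.

```java
import java.util.List;
import java.util.function.ToIntFunction;

public class ChatTokenSketch {

    /** Hypothetical message type: only the two fields counted in this sketch. */
    public static final class Message {
        public final String role;
        public final String content;

        public Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /**
     * Sums a fixed per-message overhead, the token length of each role and
     * content string, and a fixed overhead for priming the reply.
     * The overhead values are parameters because they are assumptions here.
     */
    public static int countTokens(List<Message> messages,
                                  ToIntFunction<String> tokenLength,
                                  int tokensPerMessage,
                                  int tokensPerReply) {
        int total = 0;
        for (Message m : messages) {
            total += tokensPerMessage;
            total += tokenLength.applyAsInt(m.role);
            total += tokenLength.applyAsInt(m.content);
        }
        return total + tokensPerReply;
    }

    public static void main(String[] args) {
        // Stand-in for a real tokenizer: counts whitespace-separated words.
        ToIntFunction<String> wordCount =
                s -> s.trim().isEmpty() ? 0 : s.trim().split("\\s+").length;

        List<Message> messages = List.of(
                new Message("system", "You are a helpful assistant."),
                new Message("user", "Hello there"));

        // (4 + 1 + 5) + (4 + 1 + 2) + 3 = 20 with the illustrative overheads.
        System.out.println(countTokens(messages, wordCount, 4, 3)); // prints 20
    }
}
```

With a real tokenizer, `tokenLength` could be replaced by a function that returns the size of the encoded token list for a string; the overhead values would then need to match the target model's chat format.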
