go-gpt-3-encoder
Category: GPT/ChatGPT
Development tool: Go
File size: 560KB
Downloads: 0
Upload date: 2023-03-12 18:53:58
Uploader: sh-1993
Description: Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
File list:
LICENSE (1070, 2023-03-13)
Makefile (1044, 2023-03-13)
encoder.go (7871, 2023-03-13)
encoder.json (1042301, 2023-03-13)
encoder_test.go (10162, 2023-03-13)
go.mod (377, 2023-03-13)
go.sum (2057, 2023-03-13)
utils.go (317, 2023-03-13)
vocab.bpe (456318, 2023-03-13)
# go-gpt-3-encoder
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3.
## About
GPT2 and GPT3 use byte pair encoding to turn text into a series of integers to feed into the model. This is a Go implementation of OpenAI's original Python encoder/decoder which can be found [here](https://github.com/openai/gpt-2/blob/master/src/encoder.py).
This code was inspired by a [JavaScript implementation](https://github.com/latitudegames/GPT-3-Encoder) and was partially generated by OpenAI itself!
## Install
```bash
go get github.com/samber/go-gpt-3-encoder
```
## Usage
The minimum supported Go version is the one declared in `go.mod`.
```go
package main

import (
	"fmt"
	"log"

	tokenizer "github.com/samber/go-gpt-3-encoder"
)

func main() {
	// Create the encoder.
	encoder, err := tokenizer.NewEncoder()
	if err != nil {
		log.Fatal(err)
	}

	str := "This is an example sentence to try encoding out on!"

	// Encode the text into a slice of integer token IDs.
	encoded, err := encoder.Encode(str)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("We can look at each token and what it represents:")
	for _, token := range encoded {
		fmt.Printf("%d -- %s\n", token, encoder.Decode([]int{token}))
	}

	// Decode the full token sequence back into text.
	decoded := encoder.Decode(encoded)
	fmt.Printf("We can decode it back into: %s\n", decoded)
}
```
## Contribute
Some corner cases are not covered by this library. See `@TODO` in tests.