GPTEncoder

Category: GPT/ChatGPT
Development tool: Swift
Upload date: 2023-04-01 07:39:58
Uploader: sh-1993
Description: Swift BPE encoder/decoder for OpenAI GPT models. A programmatic interface for tokenizing text for the OpenAI ChatGPT API.

File list:
GPTEncoder.podspec (1246, 2023-04-01)
LICENSE (1070, 2023-04-01)
Package.swift (488, 2023-04-01)
Sources/ (0, 2023-04-01)
Sources/GPTEncoder/ (0, 2023-04-01)
Sources/GPTEncoder/Atomic.swift (854, 2023-04-01)
Sources/GPTEncoder/Extensions.swift (480, 2023-04-01)
Sources/GPTEncoder/GPTEncoder.swift (5204, 2023-04-01)
Sources/GPTEncoder/GPTEncoderResources.swift (852, 2023-04-01)
Sources/GPTEncoder/Helper.swift (1496, 2023-04-01)
Sources/GPTEncoder/Resources/ (0, 2023-04-01)
Sources/GPTEncoder/Resources/encoder.json (1042301, 2023-04-01)
Sources/GPTEncoder/Resources/vocab.bpe (456318, 2023-04-01)
Tests/ (0, 2023-04-01)
Tests/GPTEncoderTests/ (0, 2023-04-01)
Tests/GPTEncoderTests/GPTEncoderTests.swift (776, 2023-04-01)

# GPTEncoder

![Alt text](https://imagizer.imageshack.com/v2/640x480q70/922/a8ueTO.png "image")

Swift BPE Encoder/Decoder for OpenAI GPT Models. A programmatic interface for tokenizing text for the OpenAI GPT API.

The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.

You can use the tool below to understand how a piece of text would be tokenized by the API, and the total count of tokens in that piece of text.

This library is based on the [Node.js gpt-3-encoder](https://github.com/latitudegames/GPT-3-Encoder) and the [OpenAI Official Python GPT Encoder/Decoder](https://github.com/openai/gpt-2).

I've also created [GPTTokenizerUI](https://github.com/alfianlosari/GPTTokenizerUI), an SPM library you can integrate into your app to provide a GUI for entering text and showing the tokenization results used by the GPT API.

![Alt text](https://imagizer.imageshack.com/v2/640x480q70/922/CEVvrE.png "image")

## Supported Platforms

- iOS/macOS/watchOS/tvOS
- Linux

## Installation

### Swift Package Manager

- File > Swift Packages > Add Package Dependency
- Add https://github.com/alfianlosari/GPTEncoder.git

### CocoaPods

```ruby
platform :ios, '15.0'
use_frameworks!

target 'MyApp' do
  pod 'GPTEncoder', '~> 1.0.3'
end
```

## Usage

```swift
let encoder = SwiftGPTEncoder()

let str = "The GPT family of models process text using tokens, which are common sequences of characters found in text."
let encoded = encoder.encode(text: str)

print("String: \(str)")
print("Encoded this string looks like: \(encoded)")
print("Total number of token(s): \(encoded.count) and character(s): \(str.count)")

// Decode one token at a time to see what each token represents.
print("We can look at each token and what it represents")
encoded.forEach { print("Token: \(encoder.decode(tokens: [$0]))") }
print(encoded)

// Decode the whole token array back into the original string.
let decoded = encoder.decode(tokens: encoded)
print("We can decode it back into:\n\(decoded)")
```

### Encode

To encode a `String` into an array of `Int` tokens, call `encode` with the string.

```swift
let encoded = encoder.encode(text: "The GPT family of models process text using tokens, which are common sequences of characters found in text.")
// Output: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13]
```

### Decode

To decode an array of `Int` tokens back into a `String`, call `decode` with the tokens array.

```swift
let decoded = encoder.decode(tokens: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13])
// Output: "The GPT family of models process text using tokens, which are common sequences of characters found in text."
```

### Clear Cache

Internally, a cache is used to improve performance when encoding the tokens. You can reset this cache at any time.

```swift
encoder.clearCache()
```
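To make the caching remark a little more concrete, here is a toy sketch of byte-pair-encoding merges with a per-word cache. It is not the code in `GPTEncoder.swift`: the `ToyBPE` type, the `Pair` key, and the three merge rules are all hypothetical and invented for illustration, and the sketch splits words into characters, whereas GPT tokenizers operate on bytes and load their merge rules from `vocab.bpe`.

```swift
import Foundation

// Hypothetical key type for an adjacent pair of symbols.
struct Pair: Hashable {
    let left: String
    let right: String
}

// Toy BPE merger with a per-word cache (illustration only,
// not the library's implementation).
struct ToyBPE {
    // Lower rank means the pair is merged earlier.
    let ranks: [Pair: Int]
    private(set) var cache: [String: [String]] = [:]

    init(merges: [(String, String)]) {
        var table: [Pair: Int] = [:]
        for (index, merge) in merges.enumerated() {
            table[Pair(left: merge.0, right: merge.1)] = index
        }
        ranks = table
    }

    mutating func merge(word: String) -> [String] {
        // Return the cached result if this word was seen before.
        if let hit = cache[word] { return hit }

        // Start from single characters (an assumption of this sketch).
        var symbols = word.map { String($0) }
        while symbols.count > 1 {
            // Find the adjacent pair with the lowest merge rank.
            var best: (index: Int, rank: Int)? = nil
            for i in 0..<(symbols.count - 1) {
                let pair = Pair(left: symbols[i], right: symbols[i + 1])
                if let rank = ranks[pair], best == nil || rank < best!.rank {
                    best = (index: i, rank: rank)
                }
            }
            guard let found = best else { break } // no known pair left

            // Merge every occurrence of that pair, scanning left to right.
            let left = symbols[found.index]
            let right = symbols[found.index + 1]
            var merged: [String] = []
            var i = 0
            while i < symbols.count {
                if i < symbols.count - 1, symbols[i] == left, symbols[i + 1] == right {
                    merged.append(left + right)
                    i += 2
                } else {
                    merged.append(symbols[i])
                    i += 1
                }
            }
            symbols = merged
        }

        cache[word] = symbols
        return symbols
    }

    mutating func clearCache() {
        cache.removeAll()
    }
}

// Made-up merge rules: "l"+"o", then "lo"+"w", then "e"+"r".
var bpe = ToyBPE(merges: [("l", "o"), ("lo", "w"), ("e", "r")])
print(bpe.merge(word: "lower")) // ["low", "er"]
bpe.clearCache()
```

In this sketch, a repeated word skips the merge loop entirely, which is the kind of saving a cache can provide. Clearing it trades that speed for memory.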
