pii-redaction-openai-pdftron
Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: JavaScript
File size: 1411KB
Downloads: 0
Upload date: 2022-04-16 17:59:43
Uploader: sh-1993
Description: This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.
File list:
LICENSE (1071, 2022-04-17)
client (0, 2022-04-17)
client\package-lock.json (1120823, 2022-04-17)
client\package.json (936, 2022-04-17)
client\public (0, 2022-04-17)
client\public\favicon.ico (3870, 2022-04-17)
client\public\index.html (1721, 2022-04-17)
client\public\logo192.png (5347, 2022-04-17)
client\public\logo512.png (9664, 2022-04-17)
client\public\manifest.json (492, 2022-04-17)
client\public\robots.txt (67, 2022-04-17)
client\src (0, 2022-04-17)
client\src\App.css (596, 2022-04-17)
client\src\App.js (855, 2022-04-17)
client\src\index.css (366, 2022-04-17)
client\src\index.js (219, 2022-04-17)
client\tools (0, 2022-04-17)
client\tools\copy-webviewer-files.js (275, 2022-04-17)
screenshot.png (706479, 2022-04-17)
server (0, 2022-04-17)
server\app.js (7055, 2022-04-17)
server\files (0, 2022-04-17)
server\files\legal-contract.pdf (514405, 2022-04-17)
server\modules (0, 2022-04-17)
server\modules\mimeType.js (478, 2022-04-17)
server\package-lock.json (142772, 2022-04-17)
server\package.json (541, 2022-04-17)
server\server.js (250, 2022-04-17)
# pii-redaction-openai-pdftron
This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.
![Screenshot](https://github.com/andreysaf/pii-redaction-openai-pdftron/blob/main/screenshot.png?raw=true "Screenshot")
## Installation
Inside `server/`, create a new file called `config.env` and add your demo keys from [PDFTron](https://www.pdftron.com/download-center/mac/) and [OpenAI](https://beta.openai.com/docs/introduction):
```
PORT=9000
PDFTRONKEY=
OPENAI_API_KEY=
```
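A missing key may only surface later as a failed API call, so a quick check at startup can help. The sketch below is an assumption about how such a check might look, not code from the repository: `findMissingConfig` is a hypothetical helper, and it presumes the values from `config.env` have already been loaded into an object such as `process.env`.

```javascript
// Keys expected in server/config.env (taken from the template above).
const REQUIRED_KEYS = ['PORT', 'PDFTRONKEY', 'OPENAI_API_KEY'];

// Hypothetical helper: returns the names of required keys that are
// absent or blank in the given environment object.
const findMissingConfig = (env) =>
  REQUIRED_KEYS.filter((key) => !env[key] || String(env[key]).trim() === '');

// Example usage at server startup (assumes config.env is already loaded):
// const missing = findMissingConfig(process.env);
// if (missing.length > 0) throw new Error(`Missing config: ${missing.join(', ')}`);
```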
Then run the following in the terminal, starting the client and the server in separate sessions:
```
cd client
npm i
npm start
```
```
cd server
npm i
npm start
```
## Walkthrough
The Node.js server acts as file storage. The [PDFTron Node.js SDK](https://www.pdftron.com/documentation/nodejs/get-started/integration/) extracts text, searches the document, and creates markup annotations. [OpenAI](https://openai.com/api/) detects names and addresses in the text provided by PDFTron.
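The flow described above can be sketched as a small pipeline. This is an illustration only: `runPiiPipeline` and the three stage functions it receives (`extractText`, `detectPii`, `markMatches`) are hypothetical names, not part of the repository or of either SDK. In the real server, the stages would be backed by PDFTron and OpenAI calls.

```javascript
// Hypothetical orchestration of the three stages described above.
// Each stage is injected, so the flow can be exercised without either SDK.
const runPiiPipeline = async ({ extractText, detectPii, markMatches }, filePath) => {
  // 1. Text extraction (PDFTron in the real server).
  const text = await extractText(filePath);
  // 2. PII detection (OpenAI in the real server); expected to resolve
  //    to an array of strings such as names and addresses.
  const terms = await detectPii(text);
  // 3. Search and markup for each detected term (PDFTron in the real server).
  const marked = await markMatches(filePath, terms);
  return { terms, marked };
};
```

Passing the stages in as functions keeps the order of operations visible and lets each stage be swapped or stubbed independently.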
### PII Identification
`getNamesAndAddressesFromOpenAI` accepts the text extracted from a document and builds a `prompt` containing a natural-language instruction to extract names and addresses. The instruction can be modified to search for other information. For testing purposes, the function is commented out; uncomment it and build your `prompt` as needed.
```javascript
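// Asks the completion model to pull names and addresses out of the extracted text.
const getNamesAndAddressesFromOpenAI = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `Extract names and address from this text: ${text}`,
    temperature: 0, // keep the output as deterministic as possible
    max_tokens: ***, // value masked in the source; set your own token limit
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};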
```
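Because the prompt is plain text, asking for other kinds of information comes down to changing the instruction. The sketch below is illustrative rather than the repository's code: `buildPiiPrompt` and `parseCompletionLines` are hypothetical helpers, and the `response.data.choices[0].text` shape is an assumption about the completion response that should be checked against the version of the `openai` package in use.

```javascript
// Hypothetical helper: builds an extraction prompt for an arbitrary list of fields.
const buildPiiPrompt = (text, fields = ['names', 'addresses']) =>
  `Extract ${fields.join(', ')} from this text: ${text}`;

// Hypothetical helper: turns the completion text into a list of candidate
// search terms, one per non-empty line.
// Assumes the response exposes the text at response.data.choices[0].text.
const parseCompletionLines = (response) => {
  const choices = (response && response.data && response.data.choices) || [];
  const raw = choices.length > 0 ? choices[0].text || '' : '';
  return raw
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
};
```

The resulting strings could then serve as search terms for locating each occurrence in the document before redaction.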
### Summarization
Summarization of the contract works much like the PII search: inside the `prompt`, `Tl;dr` is appended to the end of the string to be summarized. For testing purposes, the function is commented out; uncomment it and build your `prompt` as needed.
```javascript
const summarizeTheContract = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `${text} \n\nTl;dr`,
    temperature: 0.7,
    max_tokens: 60,
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};
```
Here is a sample summary of the file included in the repository:
```
This is a contract between a company and a bank for the sale of goods. The company agrees to sell the goods to the bank for a sum of money, and the bank agrees to purchase the goods from the company. The contract includes terms and conditions for the sale and purchase of the goods
```
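Long contracts may not fit in a single completion request, so the text may need trimming before `Tl;dr` is appended. The sketch below assumes a simple character budget: `buildSummaryPrompt` and its default of 8000 characters are hypothetical choices for illustration, not values from the repository or limits documented by OpenAI.

```javascript
// Hypothetical helper: trims the text to a character budget, then appends
// the same "Tl;dr" suffix used by summarizeTheContract.
const buildSummaryPrompt = (text, maxChars = 8000) => {
  const body = text.slice(0, maxChars); // no-op when the text is already short enough
  return `${body} \n\nTl;dr`;
};
```

Trimming by characters is crude because models count tokens rather than characters; a tokenizer-based limit would be more precise if one is available.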