pii-redaction-openai-pdftron

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: JavaScript
File size: 1411KB
Downloads: 0
Upload date: 2022-04-16 17:59:43
Uploader: sh-1993
Description: This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.

File list:
LICENSE (1071, 2022-04-17)
client (0, 2022-04-17)
client\package-lock.json (1120823, 2022-04-17)
client\package.json (936, 2022-04-17)
client\public (0, 2022-04-17)
client\public\favicon.ico (3870, 2022-04-17)
client\public\index.html (1721, 2022-04-17)
client\public\logo192.png (5347, 2022-04-17)
client\public\logo512.png (9664, 2022-04-17)
client\public\manifest.json (492, 2022-04-17)
client\public\robots.txt (67, 2022-04-17)
client\src (0, 2022-04-17)
client\src\App.css (596, 2022-04-17)
client\src\App.js (855, 2022-04-17)
client\src\index.css (366, 2022-04-17)
client\src\index.js (219, 2022-04-17)
client\tools (0, 2022-04-17)
client\tools\copy-webviewer-files.js (275, 2022-04-17)
screenshot.png (706479, 2022-04-17)
server (0, 2022-04-17)
server\app.js (7055, 2022-04-17)
server\files (0, 2022-04-17)
server\files\legal-contract.pdf (514405, 2022-04-17)
server\modules (0, 2022-04-17)
server\modules\mimeType.js (478, 2022-04-17)
server\package-lock.json (142772, 2022-04-17)
server\package.json (541, 2022-04-17)
server\server.js (250, 2022-04-17)

# pii-redaction-openai-pdftron

This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.

![Screenshot](https://github.com/andreysaf/pii-redaction-openai-pdftron/blob/main/screenshot.png?raw=true "Screenshot")

## Installation

Inside `server/`, create a new file called `config.env` and place the demo key from [PDFTron](https://www.pdftron.com/download-center/mac/) and the API key from [OpenAI](https://beta.openai.com/docs/introduction) in it:

```
PORT=9000
PDFTRONKEY=
OPENAI_API_KEY=
```

Then run the following in the terminal:

```
cd client
npm i
npm start
```

```
cd server
npm i
npm start
```

## Walkthrough

The Node.js server acts as file storage. The [PDFTron Node.js SDK](https://www.pdftron.com/documentation/nodejs/get-started/integration/) extracts text, searches, and creates markup annotations. [OpenAI](https://openai.com/api/) detects names and addresses in the text provided by PDFTron.

### PII Identification

`getNamesAndAddressesFromOpenAI` accepts text extracted from a document and builds a `prompt` containing a natural-language command to extract names and addresses. The prompt can be modified to search for other information. For testing purposes the function is commented out. Please uncomment it and build your `prompt` as needed.

```javascript
const getNamesAndAddressesFromOpenAI = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `Extract names and address from this text: ${text}`,
    temperature: 0,
    max_tokens: ***,
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};
```

### Summarization

Summarization of the contract works in a similar way to the PII search: inside the `prompt`, `Tl;dr` is appended to the end of the string to be summarized. For testing purposes the function is commented out. Please uncomment it and build your `prompt` as needed.

```javascript
const summarizeTheContract = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `${text} \n\nTl;dr`,
    temperature: 0.7,
    max_tokens: 60,
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};
```

Here is a sample summarization of the file in the repository.

```
This is a contract between a company and a bank for the sale of goods. The company agrees to sell the goods to the bank for a sum of money, and the bank agrees to purchase the goods from the company. The contract includes terms and conditions for the sale and purchase of the goods
```
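Returning to the PII identification step: the README does not show how the text returned by `getNamesAndAddressesFromOpenAI` becomes individual terms that PDFTron can search for. The following is a minimal sketch of one way to do that, not the repository's actual code. It assumes the model answers with one name or address per line, possibly with list markers in front. The helper name `parsePiiCandidates` is hypothetical.

```javascript
// Hypothetical helper, not part of the original repository.
// Splits a completion's raw text into unique, trimmed candidate strings.
// Assumption: the model returns one name or address per line.
const parsePiiCandidates = (completionText) => {
  const seen = new Set();
  const candidates = [];
  for (const line of completionText.split('\n')) {
    // Drop list markers such as "- ", "* " or "1. " that a model may prepend.
    const cleaned = line.replace(/^\s*(?:[-*•]|\d+[.)])\s+/, '').trim();
    // Skip blank lines and entries that were already collected.
    if (cleaned && !seen.has(cleaned)) {
      seen.add(cleaned);
      candidates.push(cleaned);
    }
  }
  return candidates;
};
```

With the client version used above, the completion text is likely available as `response.data.choices[0].text`. This is an assumption about the response shape and should be checked against the installed `openai` package. Each returned string could then be passed to the PDFTron text search as a candidate for redaction. Because the model's output format is not guaranteed, the candidates should be reviewed before any redaction is applied.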
