porsche-scrape

Category: Data Mining / Data Warehouse
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2022-06-22 08:51:18
Uploader: sh-1993
Description: Takes PDFs from the Porsche Newsroom and tries to extract meaningful data from them in JSON format.

File list:
DiogoSecures.md (2607, 2022-06-22)
HardEarnedPoints.md (4246, 2022-06-22)
PorscheWins.md (4078, 2022-06-22)
PorscheWinsPDFToJSONLines.py (3725, 2022-06-22)
pdfToJSON.py (925, 2022-06-22)
porsche-scrape.yaml (1443, 2022-06-22)
requestArticles.md (1116, 2022-06-22)
requestArticles.py (636, 2022-06-22)

# Porsche Scrape

This project takes PDFs from the Porsche Newsroom and tries to extract meaningful data from them in JSON format.

First comes a basic scrape of all the PDFs published from July 1 to July 15, which yields 25 PDFs. We then use PyMuPDF to extract information from those PDFs and convert it to JSON. Finally, we process the extracted data to remove extraneous content so that it can be used properly in the neural network (NN) training stages, which come next.
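The extraction and cleanup steps might look roughly like the sketch below. It is not the code in `pdfToJSON.py` or `PorscheWinsPDFToJSONLines.py`; the function names, the `file`/`pages` record layout, and the cleanup rules (collapse whitespace, drop blank lines and bare page numbers) are all assumptions made for illustration. Only `fitz.open` and `page.get_text()` are actual PyMuPDF calls.

```python
import json
import re
from pathlib import Path


def extract_pages(pdf_path):
    """Return the plain text of every page in a PDF, using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def clean_text(text):
    """Hypothetical cleanup: collapse whitespace, drop blanks and page numbers."""
    lines = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line or line.isdigit():
            continue  # skip empty lines and lines that are only a number
        lines.append(line)
    return "\n".join(lines)


def to_json_line(name, pages):
    """Serialize one document as a single JSON Lines record (assumed layout)."""
    record = {"file": name, "pages": [clean_text(p) for p in pages]}
    return json.dumps(record, ensure_ascii=False)


def convert_folder(pdf_dir, out_path):
    """Write one JSON line per PDF found in pdf_dir."""
    with open(out_path, "w", encoding="utf-8") as out:
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            out.write(to_json_line(pdf.name, extract_pages(pdf)) + "\n")
```

Calling `convert_folder("pdfs", "articles.jsonl")` would then produce one record per downloaded PDF; both paths are placeholders. Writing one JSON object per line keeps each article independent, which tends to be convenient when the records are later streamed into a training pipeline.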
