pq_parser

Category: Database systems
Development tool: Jupyter Notebook
File size: 228KB
Downloads: 0
Upload date: 2020-06-16 14:16:18
Uploader: sh-1993
Description: Script to parse text file downloads from ProQuest's Global Newsstream database into a CSV of metadata and full text.

File list:
LICENSE.md (14831, 2020-06-16)
imgs (0, 2020-06-16)
imgs\pq_dl.png (86442, 2020-06-16)
imgs\pq_save.png (121078, 2020-06-16)
imgs\pq_save_feb20.png (21910, 2020-06-16)
pq_parser.ipynb (10471, 2020-06-16)

## Parse ProQuest Metadata

This notebook includes a Python function that parses newspaper articles downloaded from ProQuest Global Newsstream into one CSV file with metadata and full text (when full text is available).

Created by Cody Hennesy and David Naughton (University of Minnesota, Twin Cities, Libraries). Email Cody (chennesy@umn.edu) with any questions.

UPDATE (Feb 3, 2020): ProQuest re-enabled the save as text option, so the parsing code included here is once again working.

For an alternative approach using R and saving documents as HTML files, see [Jae Yeon Kim's Tidy Ethnic News parser](https://github.com/jaeyk/tidyethnicnews).

See also: [Factiva parser](https://github.com/chennesy/factiva_parser)

~~NOTE: As of ~ September 15, 2019, ProQuest disabled the Save as "Text" option for multiple search results that this script requires to function. Requests are in to restore this functionality.~~
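
To give a rough idea of what parsing a text download into CSV rows can look like, here is a minimal sketch. It is not the code from `pq_parser.ipynb`. It assumes that each article in the text file is set off by a line of underscores and that metadata appears as `Field name: value` lines; the field names (`Title`, `Publication title`, `Publication date`, `Full text`) and the helper names `parse_records` and `records_to_csv` are hypothetical and may not match a real ProQuest export.

```python
import csv
import io
import re

# Assumption: records are separated by a line made of 20 or more underscores.
RECORD_SEPARATOR = re.compile(r"^_{20,}[ \t]*$", re.MULTILINE)
# Assumption: metadata lines have the shape "Field name: value".
FIELD_LINE = re.compile(r"^([A-Z][A-Za-z /]{1,40}): (.*)$")


def parse_records(text):
    """Split a text export into a list of {field: value} dictionaries."""
    records = []
    for chunk in RECORD_SEPARATOR.split(text):
        record = {}
        current = None
        for line in chunk.splitlines():
            match = FIELD_LINE.match(line)
            if match:
                # A new "Field: value" line starts a new field.
                current = match.group(1)
                record[current] = match.group(2).strip()
            elif current and line.strip():
                # Any other non-blank line continues the previous field.
                record[current] += " " + line.strip()
        # Keep only chunks that look like articles (hypothetical rule).
        if "Title" in record:
            records.append(record)
    return records


def records_to_csv(records, fields=("Title", "Publication title",
                                    "Publication date", "Full text")):
    """Write the chosen fields to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


# Made-up sample input in the assumed layout.
sample = """____________________________________________________________
Title: First headline
Publication title: Example Times
Full text: Line one of the story.
Line two of the story.
____________________________________________________________
Title: Second headline
Publication title: Example Post
____________________________________________________________
"""

records = parse_records(sample)
csv_text = records_to_csv(records)
print(csv_text)
```

One limitation of this sketch: a full-text line that happens to start with a capitalized phrase followed by a colon would be read as a new field, so a real parser would likely check field names against a known list.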
