Web_Crawling_And_Scraping

Category: Julia programming
Development tool: Julia
File size: 0KB
Downloads: 0
Upload date: 2024-01-31 16:49:16
Uploader: sh-1993
Description: Repository of code for the "Crawling and Scraping the Web" assignment for Northwestern's Data Engineering with Go course

File list:
scrapedOutputGo.jl
web_crawling_and_scraping.go

In this project, we imagine a data scientist at a company whose leadership team is interested in using Go for its web crawling and scraping. The company has previously had success crawling webpages and scraping data from them with Python's Scrapy framework. Because the leadership team believes that Go computes and processes data faster than Python, it has asked the data scientist to study whether Go can do the same job: crawl across webpages, scrape information, compile the data into JSON, and export it to an output .jl file. If so, the leadership team may be interested in switching its data engineering code from Python to Go.

Accordingly, the data scientist created the "web_crawling_and_scraping" Go script included in this repository to crawl the Wikipedia pages for the following topics and scrape information from them: Robotics, Robot Reinforcement Learning, Robot Operating System, Intelligent Agent, Software Agent, Robotic Process Automation, Chatbot, Applications of Artificial Intelligence, and Android (Robot). For each page, the script extracted the title, the URL, and the body text. The Go program then compiled this data into a JSON object for each page and exported all of the scraped data to the "scrapedOutputGo" output .jl file.

If I were the data scientist making a recommendation to the leadership team based on these findings, I would likely recommend that the company begin using Go instead of Python for its data engineering projects. Previous experiments at this company (described in the Command_Line_Applications and Benchmarking-Project GitHub repositories) found Go to be computationally much faster than Python. Given that Go can also produce equally good output when crawling and scraping the web, Go should be the company's data engineering language of choice.
