seo-web-scraping-pycon-ca-2018

Category: Search engines
Development tool: Jupyter Notebook
File size: 1161 KB
Downloads: 0
Upload date: 2018-11-03 17:55:06
Uploader: sh-1993
Description: SEO-focused web scraping with Python

File list (size in bytes, date):
backup-html (0, 2018-11-04)
backup-html\2018.pycon.ca.html (31365, 2018-11-04)
backup-html\pythonbytes.fm.html (29717, 2018-11-04)
backup-html\www.rei.com_c_mens-shoes_f_f-waterproof.html (399880, 2018-11-04)
data (0, 2018-11-04)
data\info.txt (16, 2018-11-04)
jupyter-notebooks (0, 2018-11-04)
jupyter-notebooks\seo-web-scraping-pycon-ca-2018.ipynb (1182295, 2018-11-04)
jupyter-notebooks\src (0, 2018-11-04)
jupyter-notebooks\src\awkward.jpg (28244, 2018-11-04)
jupyter-notebooks\src\html-tag.png (34882, 2018-11-04)
jupyter-notebooks\src\http-request.png (29045, 2018-11-04)
jupyter-notebooks\src\http-response-codes.jpg (137821, 2018-11-04)
jupyter-notebooks\src\info.txt (28, 2018-11-04)
jupyter-notebooks\src\pycon-ca-logo.png (20086, 2018-11-04)
jupyter-notebooks\src\request-structure.png (22586, 2018-11-04)
jupyter-notebooks\src\selenium-logo.jpeg (30210, 2018-11-04)

# Python for SEO: Web Scraping

In this workshop, I want to show off the open-source tools we (at [Ayima](https://www.ayima.com/)) leverage from Python’s ecosystem, and present them in a guided format.

- We’ll first look at the basics of web requests with the requests library and show simple HTML parsing with BeautifulSoup4.
- Then we’ll get into more advanced details of each, including request sessions, passing cookies, custom user agents and more detailed HTML parsing techniques.
- Finally, we’ll show how Selenium can be used to render JavaScript when making requests.

### Dependencies

- Python (3.6+)
- [requests](https://pypi.org/project/requests/)
- [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
- [pandas](https://pypi.org/project/pandas/)
- [selenium](https://pypi.org/project/selenium/) (+ [chromedriver](http://chromedriver.chromium.org/downloads))

### No Internet Connection?

Parts of the tutorial can still be completed without an internet connection.

1. Start a local server from the `backup-html` folder (`http.server` listens on port 8000 by default, which is where the `localhost:8000` URLs in the next step point):
   ```
   cd backup-html
   python -m http.server
   ```
2. Replace all URLs with the equivalent local files, e.g.:
   ```
   https://2018.pycon.ca -> http://localhost:8000/2018.pycon.ca.html
   ```
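The outline mentions request sessions, cookies and custom user agents. Below is a minimal sketch using `requests.Session`; it is not taken from the workshop notebook, and the bot name, contact URL and cookie name/value are hypothetical placeholders to replace with your own.

```python
import requests

# Hypothetical User-Agent string for illustration; identify your own crawler here.
USER_AGENT = "Mozilla/5.0 (compatible; MySEOBot/0.1; +https://example.com/bot)"


def make_session(user_agent: str = USER_AGENT) -> requests.Session:
    """Create a session that reuses connections and sends the same headers and cookies."""
    session = requests.Session()
    # Overrides the default "python-requests/x.y" User-Agent for every request in this session.
    session.headers.update({"User-Agent": user_agent})
    # The session's cookie jar is sent with each request; cookies set by the
    # server (Set-Cookie) are stored in the same jar. Name/value are hypothetical.
    session.cookies.set("example_cookie", "1")
    return session


def fetch(session: requests.Session, url: str) -> requests.Response:
    """GET a URL, printing any redirect hops and the final status code."""
    # A timeout keeps the scraper from hanging indefinitely on a slow server.
    response = session.get(url, timeout=10)
    # response.history holds the redirect responses (e.g. 301/302) that led here.
    for hop in response.history:
        print(hop.status_code, hop.url)
    print(response.status_code, response.url)
    return response
```

To check what would be sent without touching the network, build a `requests.Request("GET", url)` and pass it to `session.prepare_request(...)`; the returned object's `headers` contain both the custom `User-Agent` and a `Cookie` header.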
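For the HTML-parsing part of the outline, here is a sketch of pulling common on-page SEO elements (title, meta description, meta robots, canonical link, `<h1>` headings) out of a page with BeautifulSoup. The function name and the choice of fields are illustrative assumptions, not the workshop's own code; `"html.parser"` is Python's built-in parser, so no extra parser package is required.

```python
from bs4 import BeautifulSoup


def extract_seo_elements(html: str) -> dict:
    """Return a dict of common on-page SEO elements found in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")

    def meta_content(name):
        # `name` is also find()'s first parameter, so the attribute goes in attrs=.
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content") if tag else None

    # rel is a multi-valued attribute, so this matches rel="canonical" among other values.
    canonical = soup.find("link", rel="canonical")
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "meta_description": meta_content("description"),
        "meta_robots": meta_content("robots"),
        "canonical": canonical.get("href") if canonical else None,
        "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
    }
```

Pass it `response.text` from a `requests` call. Collecting one such dict per URL gives a list that `pandas.DataFrame(...)` turns directly into a table with one row per page.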
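The URL rewriting in step 2 of the offline instructions can be automated. Judging from the file names in `backup-html`, each backup appears to be named after its URL with the scheme dropped, `/` replaced by `_`, and `.html` appended. The hypothetical helper below assumes that inferred convention; it ignores query strings and fragments.

```python
from urllib.parse import urlsplit


def to_local_url(url: str, port: int = 8000) -> str:
    """Map a live URL to its assumed backup file served by `python -m http.server`."""
    parts = urlsplit(url)
    # Drop a trailing slash so "https://pythonbytes.fm/" and "https://pythonbytes.fm" match.
    path = parts.path.rstrip("/")
    # Assumed naming convention: host + path with "/" replaced by "_".
    name = (parts.netloc + path).replace("/", "_")
    return f"http://localhost:{port}/{name}.html"
```

With the local server running, a call such as `requests.get(to_local_url("https://2018.pycon.ca"), timeout=10)` then fetches the backup copy instead of the live site.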
