search-engines 联合开发网

Pudn.com > 下载中心 > Python编程 > search-engines

search-engines

python search-engine google bing scraping

所属分类：Python编程
开发工具：Python
文件大小：0KB
下载次数：0
上传日期：2024-01-26 23:46:25
上传者：sh-1993

说明：查询和抓取搜索引擎（谷歌、谷歌新闻、雅虎、雅虎新闻、必应、必应新闻、Ask、狗桩、狗桩新闻）
(Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News))

文件列表:

search_engines/
tests/
poetry.lock
pyproject.toml

### **Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News)** ---- ## Installation ```pip install search_engines``` ## Overview Each search engine has a module {engine_name}.py which two functions: ```python extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str] ``` and ```python get_search_url(query: str, latest: bool = True, country: str = 'us') -> str ``` ## Usage Example Construct a URL for the first results page of searching "Tesla TSLA" in Bing Search. ```python from search_engines import bing_search url = bing_search.get_search_url('Tesla TSLA') ``` Load the URL using a simple HTTP client or web browser and extract the page HTML. This package does not make any restrictions on clients can be used. We'll use the `requests` library for this example. ```python import requests resp = requests.get(url) html = resp.text ``` We can now extract search results from the HTML. The returned `results` list will be a list of dictionaries with keys `url`, `title`, `preview_text`, `page_number`. If we want to scrape multiple pages, we can load the next page using the returned `next_page_url`, and again extracting the results using `extract_search_results`. ```python results, next_page_url = bing_search.extract_search_results(html, url) ``` ## Contributions Add new search engines!

近期下载者：

相关文件：

评论：[我要评论] [举报此文件]

收藏者：