# Cached Historical Data Fetcher
Python utility for fetching any historical data using caching.
Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, weather, etc.
## Installation
Install this via pip (or your favourite package manager):
```shell
pip install cached-historical-data-fetcher
```
## Features
- Uses a cache built on top of [`joblib`](https://github.com/joblib/joblib), [`lz4`](https://github.com/lz4/lz4) and [`aiofiles`](https://github.com/Tinche/aiofiles).
- Ready to use with [`asyncio`](https://docs.python.org/3/library/asyncio.html), [`aiohttp`](https://github.com/aio-libs/aiohttp) and [`aiohttp-client-cache`](https://github.com/requests-cache/aiohttp-client-cache). Uses `asyncio.gather` to fetch chunks in parallel. (For performance reasons, relying on `aiohttp-client-cache` alone is probably not a good idea when fetching a large number of chunks (web requests).)
- Based on [`pandas`](https://github.com/pandas-dev/pandas) and supports `MultiIndex`.
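The parallel chunk fetching mentioned above can be sketched with plain `asyncio.gather` and `pandas`. This is a simplified illustration of the pattern, not the library's actual internals; `fetch_chunk` is a hypothetical stand-in for a real web request:

```python
import asyncio
from pandas import DataFrame, Timedelta, Timestamp, concat

async def fetch_chunk(start: Timestamp) -> DataFrame:
    # hypothetical fetcher: in practice this would be an HTTP request
    await asyncio.sleep(0)  # simulate I/O
    return DataFrame({"day": [start.day]}, index=[start])

async def fetch_all(start: Timestamp, periods: int) -> DataFrame:
    # build one coroutine per chunk and run them concurrently
    starts = [start + Timedelta(days=i) for i in range(periods)]
    frames = await asyncio.gather(*(fetch_chunk(s) for s in starts))
    return concat(frames)

df = asyncio.run(fetch_all(Timestamp("2023-09-30", tz="UTC"), 3))
print(df)
```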
## Usage
### `HistoricalDataCache`, `HistoricalDataCacheWithChunk` and `HistoricalDataCacheWithFixedChunk`
Override the `get_one()` method to fetch data for one chunk. The `update()` method calls `get_one()` for each unfetched chunk, concatenates the results, and saves them to the cache.
```python
from typing import Any

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds
    interval = Timedelta(days=1)  # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D")  # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
```
```shell
day
2023-09-30 00:00:00+00:00 30
2023-10-01 00:00:00+00:00 1
2023-10-02 00:00:00+00:00 2
```
**See [example.ipynb](example.ipynb) for a real-world example.**
### `IdCacheWithFixedChunk`
Override the `get_one()` method to fetch data for one id, in the same way as in `HistoricalDataCacheWithFixedChunk`.
After `ids` has been updated via `set_ids()`, the `update()` method calls `get_one()` for every unfetched id, concatenates the results, and saves them to the cache.
```python
from typing import Any

from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one id."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache()  # create cache
cache.set_ids(["a"])  # set ids
cache.set_ids(["b"])  # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True))  # discard previous cache and fetch again
cache.set_ids(["b", "c"])  # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update())  # fetch only new data
```
```shell
id+hello
a a+hello
b b+hello
id+hello
a a+hello
b b+hello
c c+hello
```
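The incremental behavior of `update()` boils down to a set-difference over ids: only ids missing from the cache are fetched and appended. A minimal synchronous sketch of that idea (not the library's internals; `update` here is a hypothetical helper):

```python
from pandas import DataFrame, concat

def update(cache: DataFrame, ids: list[str]) -> DataFrame:
    # fetch only ids that are not already in the cache, then append
    missing = [i for i in ids if i not in cache.index]
    new = DataFrame({"id+hello": [i + "+hello" for i in missing]}, index=missing)
    return concat([cache, new])

cache = update(DataFrame({"id+hello": []}, index=[]), ["a", "b"])
cache = update(cache, ["b", "c"])  # only "c" is fetched; "a" and "b" come from cache
print(cache)
```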
## Contributors
Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!