cached-historical-data-fetcher

Category: Operating system development
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-10-07 03:14:56
Uploader: sh-1993
Description: Python utility for fetching any historical data using caching. Suitable for news, posts, weather, etc.

File list (size in bytes, date):
.all-contributorsrc (303, 2024-01-03)
.copier-answers.yml (651, 2024-01-03)
.editorconfig (292, 2024-01-03)
.flake8 (46, 2024-01-03)
.idea/ (0, 2024-01-03)
.idea/cached-historical-data-fetcher.iml (333, 2024-01-03)
.idea/watcherTasks.xml (2731, 2024-01-03)
.idea/workspace.xml (1502, 2024-01-03)
.pre-commit-config.yaml (2090, 2024-01-03)
.readthedocs.yml (611, 2024-01-03)
CHANGELOG (2117, 2024-01-03)
CHANGELOG.md (12, 2024-01-03)
CONTRIBUTING.md (3928, 2024-01-03)
LICENSE (1078, 2024-01-03)
commitlint.config.js (244, 2024-01-03)
docs/ (0, 2024-01-03)
docs/Makefile (634, 2024-01-03)
docs/_static/ (0, 2024-01-03)
docs/changelog.md (34, 2024-01-03)
docs/conf.py (2456, 2024-01-03)
... ...

# Cached Historical Data Fetcher

*(badges: CI status, documentation status, test coverage, Poetry, black, pre-commit, PyPI version, supported Python versions, license)*

Python utility for fetching any historical data using caching. Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, weather, etc.

## Installation

Install this via pip (or your favourite package manager):

```shell
pip install cached-historical-data-fetcher
```

## Features

- Uses a cache built on top of [`joblib`](https://github.com/joblib/joblib), [`lz4`](https://github.com/lz4/lz4) and [`aiofiles`](https://github.com/Tinche/aiofiles).
- Ready to use with [`asyncio`](https://docs.python.org/3/library/asyncio.html), [`aiohttp`](https://github.com/aio-libs/aiohttp) and [`aiohttp-client-cache`](https://github.com/requests-cache/aiohttp-client-cache); a sketch appears at the end of this README. Uses `asyncio.gather` to fetch chunks in parallel. (For performance reasons, relying on `aiohttp-client-cache` alone is probably not a good idea when fetching a large number of chunks, i.e. web requests.)
- Based on [`pandas`](https://github.com/pandas-dev/pandas) and supports `MultiIndex` (also sketched below).

## Usage

### `HistoricalDataCache`, `HistoricalDataCacheWithChunk` and `HistoricalDataCacheWithFixedChunk`

Override the `get_one()` method to fetch data for one chunk. The `update()` method calls `get_one()` for each unfetched chunk, concatenates the results, and saves them to the cache.

```python
from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds
    interval = Timedelta(days=1)  # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D")  # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
```

```shell
                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2
```

**See [example.ipynb](example.ipynb) for a real-world example.**

### `IdCacheWithFixedChunk`

Override the `get_one()` method to fetch data for one chunk, in the same way as in `HistoricalDataCacheWithFixedChunk`. After updating `ids` by calling `set_ids()`, the `update()` method calls `get_one()` for every unfetched id, concatenates the results, and saves them to the cache.

```python
from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache()   # create cache
cache.set_ids(["a"])  # set ids
cache.set_ids(["b"])  # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True))  # discard previous cache and fetch again

cache.set_ids(["b", "c"])  # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update())  # fetch only new data
```

```shell
  id+hello
a  a+hello
b  b+hello
  id+hello
a  a+hello
b  b+hello
c  c+hello
```

## Contributors

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!
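## Additional usage sketches

### Running `update()` from a script

The snippets above use top-level `await`, which works in a notebook or an async REPL but not in a plain script. A minimal sketch, assuming the `MyCacheWithFixedChunk` class from the Usage section is already defined, wrapping the call in `asyncio.run`:

```python
import asyncio

async def main() -> None:
    # `update()` returns the concatenated DataFrame for all chunks,
    # fetching only those not already in the cache.
    df = await MyCacheWithFixedChunk().update()
    print(df)

asyncio.run(main())
```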
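### Fetching chunks over HTTP with `aiohttp`

The Features list mentions readiness for `aiohttp`; the sketch below shows one way to wire it in. The endpoint URL and JSON shape are hypothetical, so treat this as an illustration under those assumptions rather than the library's documented pattern:

```python
from typing import Any

import aiohttp
from pandas import DataFrame, Timedelta, Timestamp

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk

class NewsCache(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 1.0           # be polite to the server
    interval = Timedelta(days=1)  # one chunk per day
    start_index = Timestamp.utcnow().floor("D") - Timedelta(days=7)

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        # Hypothetical JSON endpoint returning a list of records for one day.
        url = f"https://example.com/news?date={start:%Y-%m-%d}"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                records = await response.json()
        return DataFrame(records, index=[start] * len(records))
```

Because `update()` gathers chunks concurrently, a `ClientSession` shared across calls would be cheaper than one per chunk; the per-call session above just keeps the sketch self-contained.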
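### Returning a `MultiIndex` frame

The Features list also notes `MultiIndex` support. A minimal sketch, with made-up symbols, showing `get_one()` returning a frame indexed by (date, symbol):

```python
from typing import Any

from pandas import DataFrame, MultiIndex, Timedelta, Timestamp

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk

class MultiSymbolCache(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0
    interval = Timedelta(days=1)
    start_index = Timestamp.utcnow().floor("D") - Timedelta(days=2)

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        symbols = ["AAA", "BBB"]  # hypothetical identifiers
        index = MultiIndex.from_product([[start], symbols], names=["date", "symbol"])
        return DataFrame({"value": [1.0, 2.0]}, index=index)
```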
