frontpage-archive-2024
Category: Virtual/Augmented Reality - VR/AR
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2024-02-07 07:01:57
Uploader: sh-1993
Description: Hourly commits of German online press headlines
File list:
docs/snapshots/
src/
archive.py
requirements.txt
# Archive of news front pages
[![Scraper](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml/badge.svg)](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml)
Collects the articles listed on each site's front page once per hour and stores them in [JSON files](docs/snapshots).
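The README does not document how `archive.py` extracts articles. As a rough illustration only, here is a minimal sketch that assumes an "article" is simply a link with visible text on the front page. The class `LinkCollector`, the functions `extract_articles` and `snapshot_json`, and the `url`/`title` fields are hypothetical and are not the project's actual code or file format. The sketch uses only the standard library's `html.parser`.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical collector: records every <a> element that has visible text."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.articles: list[dict] = []
        self._href: str | None = None  # href of the <a> we are currently inside
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so layout changes do not alter the title
            title = " ".join("".join(self._text).split())
            if title:
                self.articles.append(
                    {"url": urljoin(self.base_url, self._href), "title": title}
                )
            self._href = None


def extract_articles(html: str, base_url: str) -> list[dict]:
    """Return the linked headlines of one front page (hypothetical format)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles


def snapshot_json(articles: list[dict]) -> str:
    # Sorted keys and a fixed indent make the output deterministic,
    # so identical input always serializes to identical bytes.
    return json.dumps(articles, indent=2, sort_keys=True, ensure_ascii=False)
```

For example, `extract_articles('<a href="/politik/a.html">Headline</a>', "https://www.example.de/")` resolves the relative link against the page URL. Keeping the JSON output deterministic means a front page that did not change produces a byte-identical file.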
## Scraped sites
| id | since | files | url |
|:------------------------------------------------------------|:-----------|--------:|:---------------------------------|
| [bild.de](docs/snapshots/bild.de) | 2022-01-28 | 26 | https://www.bild.de |
| [compact-online.de](docs/snapshots/compact-online.de) | 2022-01-29 | 5 | https://www.compact-online.de/ |
| [faz.net](docs/snapshots/faz.net) | 2022-01-29 | 14 | https://www.faz.net/ |
| [fr.de](docs/snapshots/fr.de) | 2022-01-28 | 8 | https://www.fr.de/ |
| [gmx.net](docs/snapshots/gmx.net) | 2022-01-29 | 7 | https://www.gmx.net/ |
| [heise.de](docs/snapshots/heise.de) | 2022-01-28 | 12 | https://www.heise.de/ |
| [spiegel.de](docs/snapshots/spiegel.de) | 2022-01-28 | 21 | https://www.spiegel.de/ |
| [spiegeldaily.de](docs/snapshots/spiegeldaily.de) | 2022-01-28 | 5 | https://www.spiegeldaily.de/ |
| [sueddeutsche.de](docs/snapshots/sueddeutsche.de) | 2022-01-29 | 7 | https://www.sueddeutsche.de/ |
| [t-online.de](docs/snapshots/t-online.de) | 2022-01-29 | 8 | https://www.t-online.de/ |
| [volksstimme.de](docs/snapshots/volksstimme.de) | 2022-01-29 | 8 | https://www.volksstimme.de/ |
| [web.de](docs/snapshots/web.de) | 2022-01-29 | 15 | https://web.de |
| [welt.de](docs/snapshots/welt.de) | 2022-01-29 | 16 | https://www.welt.de |
| [zeit.de](docs/snapshots/zeit.de) | 2022-01-29 | 16 | https://www.zeit.de/ |
| [zeitfuerdieschule.de](docs/snapshots/zeitfuerdieschule.de) | 2022-01-29 | 5 | https://www.zeitfuerdieschule.de |
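The `id` links in the table point to one directory per site under `docs/snapshots/`. The README does not say how the table is produced. Assuming the `files` column simply counts the files in that directory, the `id` and `files` columns could be regenerated with a sketch like the following. `site_rows` and `site_table` are hypothetical helpers. The `since` and `url` columns are left out because their source is not described.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def site_rows(snapshot_root: Path) -> list[str]:
    """One markdown row per site directory (assumes one directory per site id)."""
    rows = []
    for site_dir in sorted(p for p in snapshot_root.iterdir() if p.is_dir()):
        # Counts files recursively; the real "files" column may be defined differently
        n_files = sum(1 for f in site_dir.rglob("*") if f.is_file())
        rows.append(f"| [{site_dir.name}](docs/snapshots/{site_dir.name}) | {n_files} |")
    return rows


def site_table(snapshot_root: Path) -> str:
    header = ["| id | files |", "|:---|---:|"]
    return "\n".join(header + site_rows(snapshot_root))


if __name__ == "__main__":
    # Demo on a throwaway directory tree instead of the real repository
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("bild.de/a.json", "bild.de/b.json", "faz.net/c.json"):
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("[]")
        print(site_table(root))
```

Sorting the directories keeps the row order stable, so regenerating the table does not itself create a diff.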
Well, let's see how far this goes with a free GitHub account.

Many websites embed click IDs and random UUIDs in their pages, so every file changes in each snapshot. As a result, each snapshot currently adds about 10 MB to the repository size (the size of the `.git` directory). That's not going to work for long :-(
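The growth follows from how git stores content: it records a new blob for every file whose bytes changed, so a single fresh UUID is enough to make a file "new". Taking the 10 MB figure at face value and assuming one snapshot per hour (as the repository description suggests), that is roughly 240 MB per day, or about 7 GB over 30 days. One way to reduce the churn, which the README does not say the project uses, is to normalize volatile tokens before writing a file. The sketch below assumes click IDs arrive as query parameters. The names in `TRACKING_PARAMS` are placeholders, and `normalize_url` and `content_hash` are hypothetical helpers.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical click-id parameter names; real sites use their own.
TRACKING_PARAMS = {"clickid", "fbclid", "gclid"}

UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def normalize_url(url: str) -> str:
    """Drop tracking query parameters and mask UUIDs with a fixed placeholder."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Note: re-encoding the query may normalize its percent-encoding
    cleaned = urlunsplit(parts._replace(query=urlencode(query)))
    return UUID_RE.sub("<uuid>", cleaned)


def content_hash(text: str) -> str:
    """Stand-in for 'did the stored bytes change?' (git itself hashes blobs)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two fetches of the same link that differ only in `gclid` hash differently as raw strings, but normalize to the same `https://www.example.de/artikel?id=7`, so an unchanged front page would no longer produce a new blob.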
### TODO
- https://www.n-tv.de/
- https://www.handelsblatt.com/
- https://www.taz.de/
- https://www.wa.de/
- https://www.rnd.de/
- https://www.nzz.ch/
- https://www.bazonline.ch/
- https://www.focus.de/
- https://www.tagesschau.de/
- https://www.heise.de/tp/
- https://www.golem.de/
- https://www.kicker.de/
- https://www.achgut.com/
- https://www.stern.de/