frontpage-archive-2024

Category: Virtual/Augmented Reality (VR/AR)
Development tool: Python
Upload date: 2024-02-07 07:01:57
Uploader: sh-1993
Description: Hourly commits of German online press headlines

File list:
docs/snapshots/
src/
archive.py
requirements.txt

# Archive of news front pages

[![Scraper](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml/badge.svg)](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml)

Collects articles and stores them in [json files](docs/snapshots).

## Scraped sites:

| id                                                          | since      |   files | url                              |
|:------------------------------------------------------------|:-----------|--------:|:---------------------------------|
| [bild.de](docs/snapshots/bild.de)                           | 2022-01-28 |      26 | https://www.bild.de              |
| [compact-online.de](docs/snapshots/compact-online.de)       | 2022-01-29 |       5 | https://www.compact-online.de/   |
| [faz.net](docs/snapshots/faz.net)                           | 2022-01-29 |      14 | https://www.faz.net/             |
| [fr.de](docs/snapshots/fr.de)                               | 2022-01-28 |       8 | https://www.fr.de/               |
| [gmx.net](docs/snapshots/gmx.net)                           | 2022-01-29 |       7 | https://www.gmx.net/             |
| [heise.de](docs/snapshots/heise.de)                         | 2022-01-28 |      12 | https://www.heise.de/            |
| [spiegel.de](docs/snapshots/spiegel.de)                     | 2022-01-28 |      21 | https://www.spiegel.de/          |
| [spiegeldaily.de](docs/snapshots/spiegeldaily.de)           | 2022-01-28 |       5 | https://www.spiegeldaily.de/     |
| [sueddeutsche.de](docs/snapshots/sueddeutsche.de)           | 2022-01-29 |       7 | https://www.sueddeutsche.de/     |
| [t-online.de](docs/snapshots/t-online.de)                   | 2022-01-29 |       8 | https://www.t-online.de/         |
| [volksstimme.de](docs/snapshots/volksstimme.de)             | 2022-01-29 |       8 | https://www.volksstimme.de/      |
| [web.de](docs/snapshots/web.de)                             | 2022-01-29 |      15 | https://web.de                   |
| [welt.de](docs/snapshots/welt.de)                           | 2022-01-29 |      16 | https://www.welt.de              |
| [zeit.de](docs/snapshots/zeit.de)                           | 2022-01-29 |      16 | https://www.zeit.de/             |
| [zeitfuerdieschule.de](docs/snapshots/zeitfuerdieschule.de) | 2022-01-29 |       5 | https://www.zeitfuerdieschule.de |

Well, let's see how far this goes with a free GitHub account.

Many websites transmit click-ids and random uuids in their documents so there is a change in every file in each snapshot. Anyways, currently each snapshot adds about 10 MB to the repository size (size of `.git` directory). That's not going to work for long :-(

### TODO

- https://www.n-tv.de/
- https://www.handelsblatt.com/
- https://www.taz.de/
- https://www.wa.de/
- https://www.rnd.de/
- https://www.nzz.ch/
- https://www.bazonline.ch/
- https://www.focus.de/
- https://www.tagesschau.de/
- https://www.heise.de/tp/
- https://www.golem.de/
- https://www.kicker.de/
- https://www.achgut.com/
- https://www.stern.de/
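
The note about click-ids and random uuids changing every file suggests one possible mitigation: normalizing each snapshot before it is written, so that only real content changes produce a diff. The following is a minimal sketch of that idea, not the repository's actual code. The snapshot layout, the `VOLATILE_PARAMS` names, and the helpers `strip_volatile_params`, `normalize`, and `dump_stable` are all assumptions made for illustration; the real sites may use entirely different parameter names.

```python
import json
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; each site would need its own list.
VOLATILE_PARAMS = {"click_id", "clickid", "cid", "sid"}

# Matches the canonical 8-4-4-4-12 hexadecimal UUID text form.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def strip_volatile_params(url: str) -> str:
    """Drop query parameters listed in VOLATILE_PARAMS from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VOLATILE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def normalize(value):
    """Recursively replace volatile tokens in a JSON-like structure."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            value = strip_volatile_params(value)
        return UUID_RE.sub("<uuid>", value)
    return value


def dump_stable(snapshot) -> str:
    """Serialize with sorted keys so that key order never causes a diff."""
    return json.dumps(
        normalize(snapshot), indent=2, sort_keys=True, ensure_ascii=False
    )
```

With this sketch, two scrapes of the same front page that differ only in a `click_id` value or an embedded UUID serialize to identical text, so git would store no new blob for them. Note that `urlencode` may re-encode the remaining query values, so normalized URLs are stable but not guaranteed to be byte-identical to the originals.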
