ArchiveBox

所属分类:hotest
开发工具:Python
文件大小:0KB
下载次数:0
上传日期:2023-07-07 16:38:50
上 传 者sh-1993
说明:  开源自托管web存档。获取URL浏览器历史书签Pocket Pinboard等,保存HTML、JS、PDF、媒体等。。。,
(Open source self-hosted web archiving. Takes URLs browser history bookmarks Pocket Pinboard etc., saves HTML, JS, PDFs, media, and more...,)

文件列表:
.dockerignore (236, 2023-12-19)
.npmignore (203, 2023-12-19)
CNAME (13, 2023-12-19)
Dockerfile (14736, 2023-12-19)
LICENSE (1070, 2023-12-19)
SECURITY.md (1620, 2023-12-19)
_config.yml (127, 2023-12-19)
archivebox/ (0, 2023-12-19)
archivebox/.flake8 (259, 2023-12-19)
archivebox/LICENSE (10, 2023-12-19)
archivebox/__init__.py (27, 2023-12-19)
archivebox/__main__.py (160, 2023-12-19)
archivebox/cli/ (0, 2023-12-19)
archivebox/cli/__init__.py (4858, 2023-12-19)
archivebox/cli/archivebox_add.py (4092, 2023-12-19)
archivebox/cli/archivebox_config.py (1701, 2023-12-19)
archivebox/cli/archivebox_help.py (777, 2023-12-19)
archivebox/cli/archivebox_init.py (1396, 2023-12-19)
... ...

ArchiveBox
Open-source self-hosted web archiving.

Quickstart | Demo | GitHub | Documentation | Info & Motivation | Community | Roadmap
"Your own personal internet archive" (网站存档 / 爬虫)
curl -sSL 'https://get.archivebox.io' | sh
 
**ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view sites you want to preserve offline.** You can set it up as a [command-line tool](https://github.com/ArchiveBox/ArchiveBox/blob/master/#quickstart), [web app](https://github.com/ArchiveBox/ArchiveBox/blob/master/#quickstart), and [desktop app](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/electron-archivebox) (alpha), on Linux, macOS, and Windows (WSL/Docker). **You can feed it URLs one at a time, or schedule regular imports** from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See input formats for a full list. **It saves snapshots of the URLs you feed it in several formats:** HTML, PDF, PNG screenshots, WARC, and more out-of-the-box, with a wide variety of content extracted and preserved automatically (article text, audio/video, git repos, etc.). See output formats for a full list. The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats [for decades](https://github.com/ArchiveBox/ArchiveBox/blob/master/#background--motivation) after it goes down.


bookshelf graphic   logo   bookshelf graphic

Demo | Screenshots | Usage
. . . . . . . . . . . . . . . . . . . . . . . . . . . .


**  Get ArchiveBox with `docker` / `apt` / `brew` / `pip3` / `nix` / etc. ([see Quickstart below](https://github.com/ArchiveBox/ArchiveBox/blob/master/#quickstart)).** ```bash # Get ArchiveBox with Docker or Docker Compose (recommended) docker run -v $PWD/data:/data -it archivebox/archivebox:dev init --setup # Or install with your preferred package manager (see Quickstart below for apt, brew, and more) pip3 install archivebox # Or use the optional auto setup script to install it curl -sSL 'https://get.archivebox.io' | sh ``` ** Example usage: adding links to archive.** ```bash archivebox add 'https://example.com' # add URLs one at a time archivebox add < ~/Downloads/bookmarks.json # or pipe in URLs in any text-based format archivebox schedule --every=day --depth=1 https://example.com/rss.xml # or auto-import URLs regularly on a schedule ``` ** Example usage: viewing the archived content.** ```bash archivebox server 0.0.0.0:8000 # use the interactive web UI archivebox list 'https://example.com' # use the CLI commands (--help for more) ls ./archive/*/index.json # or browse directly via the filesystem ```


cli init screenshot cli init screenshot server snapshot admin screenshot server snapshot details page screenshot

## Key Features - [**Free & open source**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/blob/dev/LICENSE), doesn't require signing up online, stores all data locally - [**Powerful, intuitive command line interface**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) with [modular optional dependencies](https://github.com/ArchiveBox/ArchiveBox/blob/master/#dependencies) - [**Comprehensive documentation**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki), [active development](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap), and [rich community](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community) - [**Extracts a wide variety of content out-of-the-box**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/51): [media (youtube-dl or yt-dlp), articles (readability), code (git), etc.](https://github.com/ArchiveBox/ArchiveBox/blob/master/#output-formats) - [**Supports scheduled/realtime importing**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving) from [many types of sources](https://github.com/ArchiveBox/ArchiveBox/blob/master/#input-formats) - [**Uses standard, durable, long-term formats**](https://github.com/ArchiveBox/ArchiveBox/blob/master/#saves-lots-of-useful-stuff-for-each-imported-link) like HTML, JSON, PDF, PNG, and WARC - [**Usable as a oneshot CLI**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage), [**self-hosted web UI**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#UI-Usage), [Python API](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://docs.archivebox.io/en/latest/modules.html) (BETA), [REST API](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/496) (ALPHA), or [desktop app](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/electron-archivebox) (ALPHA) - [**Saves all pages to archive.org as well**](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#save_archive_dot_org) by default for redundancy (can be [disabled](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) for local-only mode) - Advanced users: support for archiving [content requiring login/paywall/cookies](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#chrome_user_data_dir) (see wiki security caveats!) - Planned: support for running [JS during archiving](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/51) to adblock, [autoscroll](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/80), [modal-hide](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/175), [thread-expand](https://github.com/ArchiveBox/ArchiveBox/blob/master/https://github.com/ArchiveBox/ArchiveBox/issues/345)...


grassgrass
# Quickstart **  Supported OSs:** Linux/BSD, macOS, Windows (Docker/WSL)   **  CPUs:** amd64, x86, arm8, arm7 (raspi>=3) Note: On arm7, the `playwright` package, provides easy `chromium` management, is not yet available. Do it manually with alternative methods.
####   Easy Setup
Docker docker-compose (macOS/Linux/Windows)     recommended   (click to expand)
Docker Compose is recommended for the easiest install/update UX + best security + all the extras working out-of-the-box.

  1. Install Docker and Docker Compose on your system (if not already installed).
  2. Download the docker-compose.yml file into a new empty directory (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml'
    
  3. Run the initial setup and create an admin user.
    docker compose run archivebox init --setup
    
  4. Optional: Start the server then login to the Web UI http://127.0.0.1:8000 Admin.
    docker compose up
    # completely optional, CLI can always be used without running a server
    # docker compose run [-T] archivebox [subcommand] [--args]
    
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.

Docker docker (macOS/Linux/Windows)
  1. Install Docker on your system (if not already installed).
  2. Create a new empty directory and initialize your collection (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    docker run -v $PWD:/data -it archivebox/archivebox init --setup
    
  3. Optional: Start the server then login to the Web UI http://127.0.0.1:8000 Admin.
    docker run -v $PWD:/data -p 8000:8000 archivebox/archivebox
    # completely optional, CLI can always be used without running a server
    # docker run -v $PWD:/data -it [subcommand] [--args]
    
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.

curl sh automatic setup script bash auto-setup script (macOS/Linux)
  1. Install Docker on your system (optional, highly recommended but not required).
  2. Run the automatic setup script.
    curl -sSL 'https://get.archivebox.io' | sh
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See setup.sh for the source code of the auto-install script.
See "Against curl | sh as an install method" blog post for my thoughts on the shortcomings of this install method.


####   Package Manager Setup
aptitude apt (Ubuntu/Debian)
  1. Add the ArchiveBox repository to your sources.
    # On Ubuntu == 20.04, add the sources automatically:
    sudo apt install software-properties-common
    sudo add-apt-repository -u ppa:archivebox/archivebox
    
    # On Ubuntu >= 20.10 or <= 19.10, or other Debian-style systems, add the sources manually:
    echo "deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main" | sudo tee /etc/apt/sources.list.d/archivebox.list
    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C258F79DCC02E369
    sudo apt update
    
  2. Install the ArchiveBox package using apt.
    sudo apt install archivebox
    sudo python3 -m pip install --upgrade --ignore-installed archivebox   # pip needed because apt only provides a broken older version of Django
    
  3. Create a new empty directory and initialize your collection (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    archivebox init --setup           # if any problems, install with pip instead
    
    Note: If you encounter issues with NPM/NodeJS, install a more recent version.

  4. Optional: Start the server then login to the Web UI http://127.0.0.1:8000 Admin.
    archivebox server 0.0.0.0:8000
    # completely optional, CLI can always be used without running a server
    # archivebox [subcommand] [--args]
    
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the debian-archivebox repo for more details about this distribution.

homebrew brew (macOS)
  1. Install Homebrew on your system (if not already installed).
  2. Install the ArchiveBox package using brew.
    brew tap archivebox/archivebox
    brew install archivebox
    
  3. Create a new empty directory and initialize your collection (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    archivebox init --setup         # if any problems, install with pip instead
    
  4. Optional: Start the server then login to the Web UI http://127.0.0.1:8000 Admin.
    archivebox server 0.0.0.0:8000
    # completely optional, CLI can always be used without running a server
    # archivebox [subcommand] [--args]
    
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the homebrew-archivebox repo for more details about this distribution.

Pip pip (macOS/Linux/Windows)
  1. Install Python >= v3.7 and Node >= v14 on your system (if not already installed).
  2. Install the ArchiveBox package using pip3.
    pip3 install archivebox
    
  3. Create a new empty directory and initialize your collection (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    archivebox init --setup
    # install any missing extras like wget/git/ripgrep/etc. manually as needed
    
  4. Optional: Start the server then login to the Web UI http://127.0.0.1:8000 Admin.
    archivebox server 0.0.0.0:8000
    # completely optional, CLI can always be used without running a server
    # archivebox [subcommand] [--args]
    
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the pip-archivebox repo for more details about this distribution.

Arch pacman / FreeBSD pkg / Nix nix (Arch/FreeBSD/NixOS/more)
See below for usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.


####   Other Options
Docker docker + Electron electron Desktop App (macOS/Linux/Windows)
  1. Install Docker on your system (if not already installed).
  2. Download a binary release for your OS or build the native app from source