.dockerignore (236, 2023-12-19)
.npmignore (203, 2023-12-19)
CNAME (13, 2023-12-19)
Dockerfile (14736, 2023-12-19)
LICENSE (1070, 2023-12-19)
SECURITY.md (1620, 2023-12-19)
_config.yml (127, 2023-12-19)
archivebox/ (0, 2023-12-19)
archivebox/.flake8 (259, 2023-12-19)
archivebox/LICENSE (10, 2023-12-19)
archivebox/__init__.py (27, 2023-12-19)
archivebox/__main__.py (160, 2023-12-19)
archivebox/cli/ (0, 2023-12-19)
archivebox/cli/__init__.py (4858, 2023-12-19)
archivebox/cli/archivebox_add.py (4092, 2023-12-19)
archivebox/cli/archivebox_config.py (1701, 2023-12-19)
archivebox/cli/archivebox_help.py (777, 2023-12-19)
archivebox/cli/archivebox_init.py (1396, 2023-12-19)
... ...
**ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view sites you want to preserve offline.**
You can set it up as a [command-line tool](#quickstart), [web app](#quickstart), and [desktop app](https://github.com/ArchiveBox/electron-archivebox) (alpha), on Linux, macOS, and Windows (WSL/Docker).
**You can feed it URLs one at a time, or schedule regular imports** from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See [input formats](#input-formats) for a full list.
**It saves snapshots of the URLs you feed it in several formats:** HTML, PDF, PNG screenshots, WARC, and more out-of-the-box, with a wide variety of content extracted and preserved automatically (article text, audio/video, git repos, etc.). See [output formats](#output-formats) for a full list.
The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats [for decades](#background--motivation) after it goes down.
**Get ArchiveBox with `docker` / `apt` / `brew` / `pip3` / `nix` / etc. ([see Quickstart below](#quickstart)).**
```bash
# Get ArchiveBox with Docker or Docker Compose (recommended)
docker run -v $PWD/data:/data -it archivebox/archivebox:dev init --setup
# Or install with your preferred package manager (see Quickstart below for apt, brew, and more)
pip3 install archivebox
# Or use the optional auto setup script to install it
curl -sSL 'https://get.archivebox.io' | sh
```
**Example usage: adding links to archive.**
```bash
archivebox add 'https://example.com' # add URLs one at a time
archivebox add < ~/Downloads/bookmarks.json # or pipe in URLs in any text-based format
archivebox schedule --every=day --depth=1 https://example.com/rss.xml # or auto-import URLs regularly on a schedule
```
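To picture what "any text-based format" can mean in practice, the sketch below pulls URL-shaped strings out of arbitrary text before they would be piped into `archivebox add`. It is only an illustration and not ArchiveBox's actual parser: `extract_urls` and `URL_PATTERN` are hypothetical names, and the regex is far simpler than what a real importer needs.

```python
import re
from typing import List

# Hypothetical, deliberately simple pattern: http(s) URLs up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text: str) -> List[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Trim trailing punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = '{"href": "https://example.com/page"} and also https://example.com/rss.xml.'
    for url in extract_urls(sample):
        print(url)
```

The printed lines could then be piped to `archivebox add` on stdin, in the same way as the `bookmarks.json` example above.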
**Example usage: viewing the archived content.**
```bash
archivebox server 0.0.0.0:8000 # use the interactive web UI
archivebox list 'https://example.com' # use the CLI commands (--help for more)
ls ./archive/*/index.json # or browse directly via the filesystem
```
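Browsing via the filesystem can also be scripted. The sketch below assumes only the layout implied by the `ls` line above, that is, one `index.json` per snapshot folder under `./archive/`. The function name `load_snapshot_indexes` is hypothetical, and because the fields inside each `index.json` are not described here, the example prints only the keys it finds.

```python
import json
from pathlib import Path
from typing import Any, List


def load_snapshot_indexes(data_dir: str = ".") -> List[Any]:
    """Load every archive/*/index.json under data_dir, skipping unreadable files."""
    indexes = []
    for index_path in sorted(Path(data_dir).glob("archive/*/index.json")):
        try:
            indexes.append(json.loads(index_path.read_text(encoding="utf-8")))
        except (OSError, ValueError):
            # A partially written or corrupt index should not stop the whole scan.
            continue
    return indexes


if __name__ == "__main__":
    for index in load_snapshot_indexes("."):
        if isinstance(index, dict):
            # Field names are not assumed; just show which keys are present.
            print(sorted(index.keys()))
```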
## Key Features
- [**Free & open source**](https://github.com/ArchiveBox/ArchiveBox/blob/dev/LICENSE), doesn't require signing up online, stores all data locally
- [**Powerful, intuitive command line interface**](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) with [modular optional dependencies](#dependencies)
- [**Comprehensive documentation**](https://github.com/ArchiveBox/ArchiveBox/wiki), [active development](https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap), and [rich community](https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community)
- [**Extracts a wide variety of content out-of-the-box**](https://github.com/ArchiveBox/ArchiveBox/issues/51): [media (youtube-dl or yt-dlp), articles (readability), code (git), etc.](#output-formats)
- [**Supports scheduled/realtime importing**](https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving) from [many types of sources](#input-formats)
- [**Uses standard, durable, long-term formats**](#saves-lots-of-useful-stuff-for-each-imported-link) like HTML, JSON, PDF, PNG, and WARC
- [**Usable as a oneshot CLI**](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage), [**self-hosted web UI**](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#UI-Usage), [Python API](https://docs.archivebox.io/en/latest/modules.html) (BETA), [REST API](https://github.com/ArchiveBox/ArchiveBox/issues/496) (ALPHA), or [desktop app](https://github.com/ArchiveBox/electron-archivebox) (ALPHA)
- [**Saves all pages to archive.org as well**](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#save_archive_dot_org) by default for redundancy (can be [disabled](https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) for local-only mode)
- Advanced users: support for archiving [content requiring login/paywall/cookies](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#chrome_user_data_dir) (see wiki security caveats!)
- Planned: support for running [JS during archiving](https://github.com/ArchiveBox/ArchiveBox/issues/51) to adblock, [autoscroll](https://github.com/ArchiveBox/ArchiveBox/issues/80), [modal-hide](https://github.com/ArchiveBox/ArchiveBox/issues/175), [thread-expand](https://github.com/ArchiveBox/ArchiveBox/issues/345)...
# Quickstart
**Supported OSs:** Linux/BSD, macOS, Windows (Docker/WSL) **CPUs:** amd64, x86, arm8, arm7 (raspi>=3)

Note: On arm7, the `playwright` package, which provides easy `chromium` management, is not yet available. Install `chromium` manually using an alternative method.
#### Easy Setup
**`docker-compose`** (macOS/Linux/Windows), recommended

Docker Compose is recommended for the easiest install/update UX + best security + all the extras working out-of-the-box.

- Install Docker and Docker Compose on your system (if not already installed).
- Download the `docker-compose.yml` file into a new empty directory (can be anywhere).

      mkdir ~/archivebox && cd ~/archivebox
      curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml'

- Run the initial setup and create an admin user.

      docker compose run archivebox init --setup

- Optional: Start the server, then log in to the Web UI at http://127.0.0.1:8000 (Admin).

      docker compose up
      # completely optional, CLI can always be used without running a server
      # docker compose run [-T] archivebox [subcommand] [--args]

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
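For scripted use, the `docker compose run [-T] archivebox [subcommand] [--args]` pattern shown above can be wrapped in a few lines of Python. This is a minimal sketch: the helper names `compose_command` and `run_archivebox` are hypothetical, and it assumes the script runs from the directory that contains `docker-compose.yml`.

```python
import subprocess
from typing import List, Sequence


def compose_command(subcommand: str, args: Sequence[str] = (), tty: bool = True) -> List[str]:
    """Build the argv for `docker compose run [-T] archivebox [subcommand] [--args]`."""
    argv = ["docker", "compose", "run"]
    if not tty:
        # -T disables pseudo-TTY allocation, which helps in cron jobs and pipes.
        argv.append("-T")
    argv += ["archivebox", subcommand, *args]
    return argv


def run_archivebox(subcommand: str, args: Sequence[str] = (), tty: bool = True) -> int:
    """Run the command in the current directory and return its exit code."""
    return subprocess.run(compose_command(subcommand, args, tty), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call run_archivebox(...) to actually execute it.
    print(" ".join(compose_command("add", ["https://example.com"], tty=False)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when URLs contain characters such as `&` or `?`.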
**`docker`** (macOS/Linux/Windows)

- Install Docker on your system (if not already installed).
- Create a new empty directory and initialize your collection (can be anywhere).

      mkdir ~/archivebox && cd ~/archivebox
      docker run -v $PWD:/data -it archivebox/archivebox init --setup

- Optional: Start the server, then log in to the Web UI at http://127.0.0.1:8000 (Admin).

      docker run -v $PWD:/data -p 8000:8000 archivebox/archivebox
      # completely optional, CLI can always be used without running a server
      # docker run -v $PWD:/data -it archivebox/archivebox [subcommand] [--args]

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
**`bash`** auto-setup script (macOS/Linux)

- Install Docker on your system (optional, highly recommended but not required).
- Run the automatic setup script.

      curl -sSL 'https://get.archivebox.io' | sh

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See `setup.sh` for the source code of the auto-install script.
See the "Against curl | sh as an install method" blog post for my thoughts on the shortcomings of this install method.
#### Package Manager Setup
**`apt`** (Ubuntu/Debian)

- Add the ArchiveBox repository to your sources.

      # On Ubuntu == 20.04, add the sources automatically:
      sudo apt install software-properties-common
      sudo add-apt-repository -u ppa:archivebox/archivebox

      # On Ubuntu >= 20.10 or <= 19.10, or other Debian-style systems, add the sources manually:
      echo "deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main" | sudo tee /etc/apt/sources.list.d/archivebox.list
      sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C258F79DCC02E369
      sudo apt update

- Install the ArchiveBox package using `apt`.

      sudo apt install archivebox
      sudo python3 -m pip install --upgrade --ignore-installed archivebox   # pip needed because apt only provides a broken older version of Django

- Create a new empty directory and initialize your collection (can be anywhere).

      mkdir ~/archivebox && cd ~/archivebox
      archivebox init --setup           # if any problems, install with pip instead

  Note: If you encounter issues with NPM/NodeJS, install a more recent version.

- Optional: Start the server, then log in to the Web UI at http://127.0.0.1:8000 (Admin).

      archivebox server 0.0.0.0:8000
      # completely optional, CLI can always be used without running a server
      # archivebox [subcommand] [--args]

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the `debian-archivebox` repo for more details about this distribution.
**`brew`** (macOS)

- Install Homebrew on your system (if not already installed).
- Install the ArchiveBox package using `brew`.

      brew tap archivebox/archivebox
      brew install archivebox

- Create a new empty directory and initialize your collection (can be anywhere).

      mkdir ~/archivebox && cd ~/archivebox
      archivebox init --setup         # if any problems, install with pip instead

- Optional: Start the server, then log in to the Web UI at http://127.0.0.1:8000 (Admin).

      archivebox server 0.0.0.0:8000
      # completely optional, CLI can always be used without running a server
      # archivebox [subcommand] [--args]

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the `homebrew-archivebox` repo for more details about this distribution.
**`pip`** (macOS/Linux/Windows)

- Install Python >= v3.7 and Node >= v14 on your system (if not already installed).
- Install the ArchiveBox package using `pip3`.

      pip3 install archivebox

- Create a new empty directory and initialize your collection (can be anywhere).

      mkdir ~/archivebox && cd ~/archivebox
      archivebox init --setup
      # install any missing extras like wget/git/ripgrep/etc. manually as needed

- Optional: Start the server, then log in to the Web UI at http://127.0.0.1:8000 (Admin).

      archivebox server 0.0.0.0:8000
      # completely optional, CLI can always be used without running a server
      # archivebox [subcommand] [--args]

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
See the `pip-archivebox` repo for more details about this distribution.
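Before running `pip3 install archivebox`, it may help to confirm that the Python >= v3.7 and Node >= v14 prerequisites listed above are met. The following is a rough sketch, not part of ArchiveBox: the helper names are hypothetical, and it assumes `node --version` prints a string like `v14.17.0`.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional

MIN_PYTHON = (3, 7)   # minimum Python version stated in the pip instructions
MIN_NODE_MAJOR = 14   # minimum Node major version stated in the pip instructions


def parse_node_major(version_output: str) -> Optional[int]:
    """Parse the major version from `node --version` output such as 'v14.17.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def node_major_version() -> Optional[int]:
    """Return the installed Node major version, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
    return parse_node_major(result.stdout)


if __name__ == "__main__":
    python_ok = sys.version_info >= MIN_PYTHON
    node_major = node_major_version()
    node_ok = node_major is not None and node_major >= MIN_NODE_MAJOR
    print("Python >= 3.7:", "ok" if python_ok else "too old")
    print("Node >= 14:", "ok" if node_ok else "missing or too old")
```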
**`pacman`** / **`pkg`** / **`nix`** (Arch/FreeBSD/NixOS/more)
See below for usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
#### Other Options
**`docker`** + **`electron`** Desktop App (macOS/Linux/Windows)
- Install Docker on your system (if not already installed).
- Download a binary release for your OS or build the native app from source