rust-sqlite
所属分类:数据库系统
开发工具:Rust
文件大小:0KB
下载次数:0
上传日期:2023-10-19 11:56:19
上 传 者:
sh-1993
说明: 在Rust中重写SQLite以供学习和娱乐,
(Rewriting SQLite in Rust for Learning and Fun,)
文件列表:
Cargo.toml (579, 2023-12-01)
LICENSE (1060, 2023-12-01)
TODO.md (4611, 2023-12-01)
docs/ (0, 2023-12-01)
docs/BTreeInternalNode.png (31719, 2023-12-01)
docs/COMPARISON.md (94, 2023-12-01)
docs/ModuleAccess.md (341, 2023-12-01)
docs/ModuleBTree.md (12489, 2023-12-01)
docs/ModulePhysicalExecution.md (1411, 2023-12-01)
docs/ModuleWAL.md (2446, 2023-12-01)
docs/PagerSubModuleInterconnection.png (77433, 2023-12-01)
docs/SQLQueryJourney.md (1128, 2023-12-01)
docs/spark/ (0, 2023-12-01)
docs/spark/QueryPlan.md (3598, 2023-12-01)
readings/ (0, 2023-12-01)
readings/Architecture of a Database System (2007).pdf (931364, 2023-12-01)
readings/Sibsankar Haldar - SQLite Database System Design and Implementation-Sibsankar Haldar (2016).pdf (15247918, 2023-12-01)
sample.db (16384, 2023-12-01)
src/ (0, 2023-12-01)
src/access/ (0, 2023-12-01)
src/access/buffer_pool.rs (3919, 2023-12-01)
src/access/mod.rs (21, 2023-12-01)
src/bin/ (0, 2023-12-01)
src/bin/rsql.rs (2388, 2023-12-01)
src/btree/ (0, 2023-12-01)
src/btree/bt_cursor.rs (20287, 2023-12-01)
src/btree/mod.rs (19, 2023-12-01)
src/concurrency/ (0, 2023-12-01)
src/concurrency/mod.rs (45, 2023-12-01)
src/concurrency/transaction.rs (47, 2023-12-01)
src/concurrency/transaction_id.rs (46, 2023-12-01)
src/lib.rs (223, 2023-12-01)
src/logical/ (0, 2023-12-01)
src/logical/expression/ (0, 2023-12-01)
src/logical/expression/mod.rs (1, 2023-12-01)
src/logical/mod.rs (20, 2023-12-01)
src/model/ (0, 2023-12-01)
src/model/cell_table_interior.rs (1212, 2023-12-01)
... ...
# rust-sqlite
Rewriting SQLite in Rust for Learning and for Fun.
This was inspired by
- [CodeCrafters SQLite programming challenge](https://app.codecrafters.io/courses/sqlite), please pay them a visit.
- [SQLite detailed documentation](https://www.sqlite.org/fileformat.html).
- [Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion).
- [Let's Build a Simple Database](https://cstack.github.io/db_tutorial/)
## 1. Getting Started
```
docs # detailed doc for implementation and design records, step by step guidelines, module walkthrough, etc.
readings # related more comprehensive readings like books and articles
src # source code with unit tests, for a more detailed module description, look at Architecture section
bin # binary cli entry point with main function
access # access layer
concurrency # handling concurrency control: transactions for example
logical # things with logical layer like logical plan, not much here as we use arrow-datafusion for this
model # main domain model of sqlite database like TableLeafCell that mapped to sqlite3 doc concepts
physical # things related to physical planning and execution
wal # Write Ahead Logging for atomicity, recovery, etc.
storage # module handling physical storage to file on disk
util
presentation.rs # how sqlite present returned result to cli stdout (rows)
varint.rs # varint encode and decode
tests # integration tests and test resources
```
```
# execute program against a sqlite database
# this table has 4 rows
cargo run -- sql tests/resources/sample.db "select name from apples"
# this table has 6895 rows and span > 1 db page
cargo run -- sql tests/resources/superheroes.db "select * from superheroes"
# suppress warnings
RUSTFLAGS=-Awarnings cargo run -- sql sample.db "select name from apples"
# see what returns by sqlite
sqlite3 sample.db "select * from apples"
# output >= debug logs
export RUST_LOG=debug
sqlite3 sample.db "select * from apples"
RUST_LOG=debug sqlite3 sample.db "select * from apples" # this also works
```
Testing
```
# run tests
cargo test
# showing warnings and stdout
cargo test -- --nocapture
# run all tests with prefix test_move_to_right and show print output
cargo test move_to_right -- --nocapture
```
Sample.db schema
- apples: id integer primary key, name text, color text
- oranges: id integer primary key, name text, description text
## 2. Architecture
Differences to SQLite official implementation:
- Database Frontend (Tokenizer, Parser) is replaced by DataFusion.
- Virtual Machine is replaced by using DataFusion and a custom Physical Layer for query processing and execution.
Layers
```
SQL String
-----Logical Layer-----
Query Planning: Tokenizer, Parser (datafusion)
Logical Plan (datafusion)
-----Physical Layer-----
Physical Planner (custom)
Physical Plan (custom)
-----Access Layer-----
BTree Module (custom)
Buffer Pool (custom)
Concurrency Control (custom)
WAL Write Ahead Logging (custom)
-----Storage Layer-----
Disk Manager
-----Physical File------
File (e.g. on disk) following SQLite database file format
```
Logical Layer
- This layer is responsible for interpreting the SQL string and converting it into a logical plan
that represents the operations to be performed on the data.
- It involves some important steps
- Tokenizer: SQL string is tokenined into tokens.
- Parser: Tokens are used to build Abstract Syntax Tree (AST).
- Query Planning: AST is transformed into Logical Plan.
- [DataFusion](https://github.com/apache/arrow-datafusion) library is used for this layer.
Physical Layer
- Physical Planner: takes the `LogicalPlan` of `arrow-datafusion` and transform it to an executable physical plan called `Exec`.
- why physical planning of `arrow-datafusion` is not used?
- custom-built in order to custom this layer to have SQLite functionalities.
For example, physical plan of a table scan will scan the table in the database file in SQLite format.
Access Layer
- The access layer is responsible for managing how data is accessed and manipulated.
This includes managing data structures like B-Trees, handling concurrency to ensure data integrity,
and handling recovery and consistency in case of system failures.
- BTree Module
- managing the B-Tree data structure used for storing and retrieving data.
- Buffer Pool
- The Buffer Pool is a cache of data that resides in memory for faster access, for locking, transaction managemeng, etc.
- When data is read from the disk, it is first loaded into the buffer pool.
- It also handles the replacement policy when the buffer is full, typically using an LRU (Least Recently Used) policy.
- Concurrency Control
- ensures that multiple concurrent operations (e.g. write) can happen and do not
impact data integrity (data corruption, missing amount, etc.).
- Write Ahead Logging
- providing Atomicity, Recovery, etc. for the db.
- managing recovery process and ensure consistency in case of failures/ crashes.
- uses techniques like logging (e.g. Write Ahead Logging) and periodic checkpoints.
Storage Layer
- Disk Manager
- logical abstraction over physical file system and disk access
- provides interfaces of physical disk operations: reads, writes, flushes, etc.
Physical File: actual file in sqlite3 file format.
Sequence Diagram: SQL String to returned result
TODO
## 3. Known Limitations
Schema
- sqlparser-rs and datafusion seems not having knowledge re primary key and auto-increment.
Parsing DDL which has `id integer primary key autoincrement` lost knowledge of `primary key autoincrement`.
Data types
- Arrow supports Utf8 only. Sqlite has Text in (UTF-8, UTF-16BE or UTF-16LE) so
only utf8 is supported.
## 4. References
### Readings
1. Paper - Architecture of a Database System (2007). Overview of important components to relational database systems.
2. Book - SQLite Database System Design and Implementation, Sibsankar Haldar (2016).
3. Article - [Series: What would SQLite look like if written in Rust?](https://dev.to/thepolyglotprogrammer/what-would-sqlite-look-like-if-written-in-rust-part-2-4g66)
### Projects
- https://github.com/datafuselabs
- [Apache OpenDAL](https://github.com/apache/incubator-opendal)
- [GrepTimeDB](https://github.com/GreptimeTeam/greptimedb)
- [rqlite, A lightweight, distributed relational database built on SQLite in Golang](https://github.com/rqlite/rqlite)
近期下载者:
相关文件:
收藏者: