lf

所属分类:数据库系统
开发工具:C
文件大小:3845KB
下载次数:0
上传日期:2023-01-08 09:26:26
上 传 者sh-1993
说明:  完全去中心化完全复制的键值存储
(Fully Decentralized Fully Replicated Key Value Store)

文件列表:
LICENSE.txt (16727, 2022-08-17)
Makefile (1179, 2022-08-17)
cmd (0, 2022-08-17)
cmd\lf (0, 2022-08-17)
cmd\lf\lf.go (51491, 2022-08-17)
doc (0, 2022-08-17)
doc\DESIGN.md (4917, 2022-08-17)
doc\badhorse.sh (1057, 2022-08-17)
doc\momentum.pdf (77193, 2022-08-17)
doc\underconstruction.gif (1969, 2022-08-17)
doc\wharrgarbl.jpg (27583, 2022-08-17)
go.mod (263, 2022-08-17)
go.sum (1724, 2022-08-17)
native (0, 2022-08-17)
native\common.h (14097, 2022-08-17)
native\db.c (80000, 2022-08-17)
native\db.h (11059, 2022-08-17)
native\iset.h (1712, 2022-08-17)
native\map.h (10008, 2022-08-17)
native\mappedfile.h (2450, 2022-08-17)
native\sqlite3 (0, 2022-08-17)
native\sqlite3\INSTALL (15744, 2022-08-17)
native\sqlite3\sqlite3.c (7892675, 2022-08-17)
native\sqlite3\sqlite3.h (559672, 2022-08-17)
native\sqlite3\sqlite3ext.h (33981, 2022-08-17)
native\suint96.h (4025, 2022-08-17)
native\vector.h (1945, 2022-08-17)
pkg (0, 2022-08-17)
pkg\lf (0, 2022-08-17)
pkg\lf\api-make.go (7661, 2022-08-17)
pkg\lf\api-query.go (13017, 2022-08-17)
pkg\lf\api.go (5273, 2022-08-17)
pkg\lf\base62.go (2698, 2022-08-17)
pkg\lf\blob.go (2128, 2022-08-17)
pkg\lf\clientconfig.go (3259, 2022-08-17)
pkg\lf\comment.go (2244, 2022-08-17)
... ...

# LF: Fully Decentralized Fully Replicated Key/Value Store *(c)2019-2021 [ZeroTier, Inc.](https://www.zerotier.com/)* *Licensed under the [Mozilla Public License (MPL) Version 2.0](LICENSE.txt)* *NOTE: this project is not currently maintained. We are still working on decentralization of the root infrastructure but due mostly to performance constraints will probably pursue a simpler and faster system to do so. That system may include some of these ideas but probably not all of them and will probably be written in Rust. That being said, we're leaving this up for others to check out as it contains a lot of interesting ideas. Think of it as a R&D incubation project from "ZeroTier Labs."* ## Introduction LF (pronounced "aleph") is a fully decentralized fully replicated key/value store. Fully decentralized means anyone can run a node without obtaining special permission and all nodes are effectively equal. Fully replicated means every node stores the entire data set. LF is built on a [directed acyclic graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph) data model that makes synchronization easy and allows many different security and conflict resolution strategies to be used. One way to think of LF's DAG is as a gigantic [conflict-free replicated data type](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) (CRDT). Proof of work is used to rate limit writes to the shared data store on public networks and as one thing that can be taken into consideration for conflict resolution. Other things that can be considered (at the querying client's discretion) are local subjective heuristics at the node and certificates issued by a certificate authority. The name LF comes from the short story [The Aleph](https://en.wikipedia.org/wiki/The_Aleph_%28short_story%29) by Jorge Luis Borges and the novel [Mona Lisa Overdrive](https://en.wikipedia.org/wiki/Mona_Lisa_Overdrive) by William Gibson. Borges' story involves a point in space that contains all other points, a fitting metaphor for a data store where every node stores everything. Gibson's novel features a sci-fi take on Borges' concept. At one point a character calls it the "LF" because "aleph" has been mis-heard as an acronym. We used LF because there's already an open source project called Aleph, it gives the command line client `lf` a short name, and because two levels of nerdy literary recursion are cool. ### Why Does This Exist? The purpose of LF is to provide for fully open decentralized systems what things like [etcd](https://github.com/etcd-io/etcd) and [consul](https://www.consul.io) provide in centrally managed environments, namely a fast reliable store for small but critical pieces of information. These are things like keys, certificates, identity information, configuration files, IPs, DNS names, URLs, data hashes, and so on. Most decentralized systems use distributed hash tables (DHTs) for this purpose. DHTs scale well but are slow, require a reliable global network to maintain full access to the data set, and are vulnerable to ["Sybil"](https://en.wikipedia.org/wiki/Sybil_attack) type attacks. We at ZeroTier wanted something very fast, secure, and continuously available even under unreliable network conditions. This prompted us to develop something fundamentally new. As far as we know nothing quite like LF exists (we looked). The closest analogs are cryptocurrency block chains and CRDT-based distributed databases. ### Features and Benefits * **Easy to use and deploy**, ships with useful defaults and credentials to use an open public network. * **Fully decentralized** system with open participation and no single points of failure. (Private LF databases can be created that require certificates, but this is optional.) * **Fast sub-second nearline queries** against the entire global data set at all times, even when network is down. * **Versatile security model** allowing user choice between different conflict resolution mechanisms that can be used alone or in combination with one another. These include local heuristics, proof of work "weight," elective trust of other nodes, and certificates. * **Flexible record lookup** API allowing multiple nested keys and range queries against ***-bit ordinals associated with each key. * **Everything is encrypted** including record keys making the system private and secure even though all data is replicated globally. Records whose keys are not known cannot even be enumerated or looked up. * **Liveness signaling** through the *pulse* mechanism to enable LF to be used to advertise service, object, or user availability in near-real-time. ### Limitations and Disadvantages * **LF is only good for small bits of information** that don't change very often. It's explicitly not designed for large data. * **[CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) trade-off: AP** (availability, partition-tolerance). The database is eventually consistent and locks are not supported. * **High CPU, memory, storage, and bandwidth requirements** make LF unsuitable for small and resource constrained devices. * **Storage requirements grow over time** in a manner not unlike a block chain. Fortunately [storage is getting cheaper over time too](https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/). The data model and protocol are designed to permit partial data discarding and fractional nodes. These features are not implemented yet and likely won't be needed for years. ## Building and Running LF builds and runs on Linux, Mac, and probably Free/Open/NetBSD. It won't work on Windows yet but porting shouldn't be too hard if anyone wants it. It's mostly written in Go (1.11+ required) with some C for performance critical bits. It depends on a recent version of SQLite which is included in source form to avoid problems due to excessively old versions on some systems. To build on most platforms just type `make`. You will need Go 1.11 or newer (type `go version` to check) and a relatively recent C compiler supporting the C99 standard. ## Getting Started Both a command line client and a full node implementation are present in the same `lf` binary. Just run it and it will print help. LF ships out of the box with its command line client configured to query `lf.zerotier.com`, a public node using the default public database operated by ZeroTier. That means you can try a simple query right away: ```text $ ./lf get bad horse# Bad Horse, Bad Horse | bad#0 horse#0 Bad Horse, Bad Horse | bad#0 horse#1 He rides across the nation, the thoroughbred of sin | bad#0 horse#2 He got the application that you just sent in | bad#0 horse#3 It needs evaluation, so let the games begin | bad#0 horse#4 A heinous crime, a show of force | bad#0 horse#5 (a murder would be nice of course!) | bad#0 horse#6 Bad Horse, Bad Horse | bad#0 horse#7 Bad Horse, He's Bad! | bad#0 horse#8 The evil league of evil is watching so beware | bad#0 horse#9 The grade that you receive'll be your last, we swear | bad#0 horse#10 So make the bad horse gleeful, or he'll make you his mare | bad#0 horse#11 You're saddled up; there's no recourse | bad#0 horse#12 It's "hi-yo, silver!" | bad#0 horse#13 Signed: Bad Horse. | bad#0 horse#14 ``` Drop the `./` before `lf` if you already installed the binary somewhere in your path. Don't forget the trailing hash sign on `horse#` or you will only get the first record (more on this later). These are the lyrics to [Bad Horse](https://www.youtube.com/watch?v=VNhhz1yYk2U) from the musical [Dr. Horrible's Sing-Along Blog](https://drhorrible.com). Yes, we put that in the public data store. We hope it's fair use. ### Selectors, Ordinals, and Values Record keys are cryptographic objects called *selectors* that are generated from plain text selector *names* and *ordinals*. The name of a selector is any text or binary string. The ordinal is an unsigned ***-bit integer. (If no ordinal is specified, it is zero.) Names can only be looked up directly but if you know a name you can ask for ranges of its ordinals. Bad Horse is stored as a series of records with two selectors and with the ordinals in the second selector placing them in order. You can see them in the output above. Now try a few variations: ```text $ ./lf get bad horse Bad Horse, Bad Horse | bad#0 horse#0 $ ./lf get bad horse#2#10 He rides across the nation, the thoroughbred of sin | bad#0 horse#2 He got the application that you just sent in | bad#0 horse#3 It needs evaluation, so let the games begin | bad#0 horse#4 A heinous crime, a show of force | bad#0 horse#5 (a murder would be nice of course!) | bad#0 horse#6 Bad Horse, Bad Horse | bad#0 horse#7 Bad Horse, He's Bad! | bad#0 horse#8 The evil league of evil is watching so beware | bad#0 horse#9 The grade that you receive'll be your last, we swear | bad#0 horse#10 ``` When you ask for `bad horse#` (in the very first example) the trailing hash is expanded to `#0#18446744073709551615`. That huge number is the maximum value of a ***-bit unsigned integer. Leaving off the trailing hash is equivalent to `#0` and gets only ordinal zero. Using `#2#10` asks for ordinals two through ten inclusive. Try it! #### Conflict Resolution If LF is open and decentralized, what happens if someone does this? ```text $ ./lf set bad horse#0 'Good Horse, Good Horse' ``` A record will be created and published. Chances are nobody will see it. The already existing Bad Horse records have three big things going for them: 1. They're older and therefore more work (in a proof of work sense) has been added to the DAG since their creation. Each record references older records and when it gets stitched into the DAG its work is transferred to its ancestors all the way back to the beginning of time, increasing a metric called their *weight*. A determined attacker willing to buy a lot of compute power could overcome this, but the more time passes the more costly this attack becomes. 2. They are going to have lower *local reputation* scores on current nodes, including the one you are likely querying. When a record arrives at a running node that collides with an existing fully synchronized and verified record, it gets flagged as suspect. Lower reputation records also don't get picked to be links for new records, meaning they pick up weight more slowly and get replicated more slowly (if at all). 3. *Oracle nodes* will gossip about the impostor, but not the original. An oracle node is a node configured to generate *commentary*, special records that perform the dual function of adding work to the DAG and commenting on records these nodes find suspect. The commentary of all oracle nodes is available but users are free to decide whether or not to trust it. But what does happen to lower weight or lower reputation records? Try this: ```text $ ./lf -json get bad horse#0 ``` The LF API returns results as an array of arrays. Each array in the first dimension represents a set of records that share the same set of selector keys. The array in the second dimension is these records sorted in descending order of trust according to whatever conflict resolution mechanism(s) are active. The default behavior of the command line client is to show only the "winner" for each set of selectors, but `-json` tells it to request and dump everything. Since names are first come first serve, short names like `bad` aren't the sorts of names you'll want to use for the first selector for records in a production system. Good naming strategies include reverse-DNS-order names similar to Java class names (e.g. `com.zerotier...`), unique GUIDs, and random strings. The latter options are good for systems that don't want to advertise their keys globally and want to avoid making their records available to users not in-the-know. Just remember that [naming things is one of the two hard things in computing](https://www.martinfowler.com/bliki/TwoHardThings.html). ### Running a Full Node Running a node on the public network is easy: ```text $ ./lf node-start & ``` If you are a normal user the node will keep its files in `$HOME/.lf`. If you are root it will be `/var/lib/lf`. Obviously you'll want to set this up via *systemd* or some other process-supervising system to run it on a real server. Nodes listen by default on P2P port 9908 and HTTP port 9***0. These can be changed with command line options. If you're on a server on the open Internet and are root (or can bind port 443) you can use `-letsencrypt ` to enable SSL and automatically obtain a certificate from [Let's Encrypt](https://letsencrypt.org). Here are some of the node-specific files you'll see in the node's home directory: * `lf.pid`: PID of running node * `node.log`: Node log output (`tail -f` this file to watch your node synchronize) * `identity-secret.pem`: Secret ECC key for node's commentary and also for encryption of node-to-node P2P communications * `authtoken.secret`: HTTP API bearer auth token. * `peers.json`: Periodically updated to cache information about P2P peers of this node * `genesis.lf`: Genesis records for the network this node participates in * `node.db`: SQLite indexing and meta-data database * `records.lf`: Flat data file containing all records in binary serialized format. This file will get quite large over time. * `weights.b??`: Record proof of work weights with 96-bit weights striped across three memory mapped files to reduce I/O load during weight updates * `graph.bin`: Memory mapped record linkage graph data structure * `wharrgarbl-table.bin`: Static table used by proof of work algorithm, auto-generated on first use The node will also create or modify `client.json` to add its own local (127.0.0.1) HTTP URL so that client queries on the local system will use it. Watch `node.log` after you start your server for the first time and you'll see it synchronizing with the network. This can take a while. Once the node is fully synchronized you should be able to make queries against any data. A few caveats for running nodes: * We recommend a ***-bit system with a bare minimum of 1gb RAM for full nodes. Full nodes usually use between 384mb and 1gb of RAM and may also use upwards of 1gb of virtual address space for memory mapped files. 32-bit systems may have issues with address space exhaustion. * Don't locate the node's files on a network share (NFS, CIFS, VM-host mount, etc.) as LF makes heavy use of memory mapping and this does not always play well with network drives. It could be slow, unreliable, or might not work at all. * Hard-stopping a node with `kill -9` or a hard system shutdown could corrupt the database. This might get improved in the future. ### Pulses for Liveness Signaling In real world systems it's very often useful to be able to provide information about which objects are "alive." Examples include users in a chat systems, nodes in a distributed network, or services in a distributed micro-service architecture. Signaling liveness in LF by constantly updating records would be very inefficient. LF therefore provides (since 0.9.18) a much more efficient method called a **pulse**. Each LF record contains a ***-bit field called *PulseToken*. This field contains the final hash that results from iteratively evaluating a ***-bit AES-based short input cryptographic hash function (see `th***.go`) 525,600 times. The first hash in this hash chain is based on a hash of the record's plain text selector name(s), ordinal(s), timestamp, and owner secret key. 525,600 is non-coincidentally the number of minutes in one year. To generate a pulse the owner of a record computes and broadcasts hash N in the record's pulse hash chain where N equals 525,600 minus the number of minutes that have elapsed since the record's timestamp. Any node can verify the validity of a pulse by hashing the pulse M times where M equals the number of minutes being signaled by the pulse and verifying that the result equals the record's *PulseToken*. Since pulses can only work up to one year since a record's original timestamp, after one year a new record containing the same data will have to be made. The pulse feature reduces the frequency of full record rewrites for liveness messaging from once-per-minute or once-per-few-minutes to once-per-year. This is effectively the same as the [S/KEY](https://en.wikipedia.org/wiki/S/KEY) one-time password scheme originally invented by cryptographer Leslie Lamport, only in this case each one-time password indicates a timestamp. Pulses are extremely lightweight. They consist of a ***-bit hash plus three bytes for the minute count. Ten million objects could signal liveness with one minute resolution and consume only 1.8 megabytes per second of bandwidth (1.8% of a one gigabit connection). A billion objects would require a bit more than two gigabits of bandwidth. If scalability of the pulse mechanism becomes a concern broadcasting could be replaced by a sharded or DHT-like pulse signaling algorithm. This lightness comes at a cost of some security. A ***-bit hash is not sufficient to prevent a determined attacker with large amount of compute power or storage from falsifying pulse messages, but the cost is high enough to deter casual or mass scale abuse. If you need to convey liveness information with a higher level of security than that offered by the pulse mechanism, a secondary check must be employed. Since pulses typically signal liveness the simplest approach would be to directly verify liveness by querying the object or service in question before concluding definitively that it is alive. On the public network there's a record called `pulse.test` that can be used to test pulses. Try `./lf -json get pulse.test` to output verbose JSON information about this record: ```json { "Hash": "=iJniIEzkyHOg6a3h1j84NfK9zAimbKOIdtueFiBmhqW", "Size": 253, "Record": { "Value": "\buejQNR7CCbAQXvwnfSMKDs6uEmQcIA92E0olsb4c", "Owner": "@BwAIObsaWYF1CiOXgol", "Links": ["=WWXjsYR5RofL0HX9kQ5SgXxaG4K53UBAJX9cAaAVBbz", "=yJbYgUszcV0sHRuuqpADsdu1tmM5DSIXdZuIkMiiZds"], "Timestamp": 15***413035, "PulseToken": 2307216339445404689, "Type": 0, "Selectors": [ { "Ordinal": "\b015cQjQWnbkEmtGKpOoJX", "Claim": "\bYRsSUaW7KRTQulx9HPPxmMjleJFuA2rCfN9J2FRKXLtRSashX1glxLt" } ], "Work": "\b1hUKPQBAAW6CoI6KeBP", "WorkAlgorithm": 1, "Signature": "\b14RFxlZgEuGoalSdN6Z4SyknXw6QMU145kYLDUIvcPY5AsRH6IyfCFUyAxONARQpUPsosc3VgCLq" }, "Value": "This record is being pulsed.", "Pulse": 15***413815, "Trust": 0.9, "LocalTrust": 1, "OracleTrust": 1, "Weight": [0, 0, 0, 71774169], "Signed": false } ``` One of ZeroTier's nodes has a *cron* job running to emit this pulse every minute. The field *Pulse* will indicate the record's timestamp plus the number of minutes indicated by the latest received pulse. Pulses are broadcast using a best-effort rumor mill algorithm. A node caches information about the latest pulses it has observed but nodes do not get old pulses when they synchronize nor are pulses re-transmitted if a node re-joins after being offline for a period of time. Pulses are ephemeral signals of liveness, not permanent parts of the data set. ### Oracles and Elective Trust LF's intrinsic conflict resolution mechanism relies upon proof of work as a hard to for ... ...

近期下载者

相关文件


收藏者