bloom_filter

所属分类:collect
开发工具:Crystal
文件大小:0KB
下载次数:0
上传日期:2022-01-08 13:05:29
上 传 者sh-1993
说明:  Crystal lang中的Bloom筛选器实现,
(Bloom filter implementation in Crystal lang,)

文件列表:
.travis.yml (124, 2023-08-25)
CHANGELOG.md (90, 2023-08-25)
LICENSE (1081, 2023-08-25)
Makefile (201, 2023-08-25)
samples/ (0, 2023-08-25)
samples/basic.cr (340, 2023-08-25)
samples/benchmark.cr (800, 2023-08-25)
samples/dump_and_load.cr (289, 2023-08-25)
samples/optimal.cr (141, 2023-08-25)
samples/union_and_intersection.cr (415, 2023-08-25)
samples/visualize.cr (289, 2023-08-25)
shard.yml (117, 2023-08-25)
spec/ (0, 2023-08-25)
spec/bloom_filter_spec.cr (4257, 2023-08-25)
spec/spec_helper.cr (45, 2023-08-25)
src/ (0, 2023-08-25)
src/bloom_filter.cr (780, 2023-08-25)
src/bloom_filter/ (0, 2023-08-25)
src/bloom_filter/filter.cr (4712, 2023-08-25)
src/bloom_filter/version.cr (43, 2023-08-25)

# Bloom Filter [![Build Status](https://travis-ci.org/crystal-community/bloom_filter.svg?branch=master)](https://travis-ci.org/crystal-community/bloom_filter) Implementation of [Bloom Filter](https://en.wikipedia.org/wiki/Bloom_filter) in [Crystal lang](http://crystal-lang.org/). * [Installation](#installation) * [Usage](#usage) * [Basic](#basic) * [Creating a filter with optimal parameters](#creating-a-filter-with-optimal-parameters) * [Dumping to file and loading](#dumping-into-a-file-and-loading) * [Union and intersection](#union-and-intersection) * [Visualization](#visualization) * [Benchmark](#benchmark) * [Contributors](#contributors) ## Installation Add this to your application's `shard.yml`: ```yaml dependencies: bloom_filter: github: crystal-community/bloom_filter ``` ## Usage ### Basic ```crystal require "bloom_filter" # Create filter with bitmap size of 32 bytes and 3 hash functions. filter = BloomFilter.new(bytesize = 32, hash_num = 3) # Insert elements filter.insert("Esperanto") filter.insert("Toki Pona") # Check elements presence filter.has?("Esperanto") # => true filter.has?("Toki Pona") # => true filter.has?("Englsh") # => false ``` ### Creating a filter with optimal parameters Based on your needs(expected number of items and desired probability of false positives), your can create an optimal bloom filter: ```crystal # Create a filter, that with one million inserted items, gives 2% of false positives for #has? method filter = BloomFilter.new_optimal(1_000_000, 0.02) filter.bytesize # => 1017796 (993Kb) filter.hash_num # => 6 ``` ### Dumping into a file and loading It's possible to save existing bloom filter as a binary file and then load it back. ```crystal filter = BloomFilter.new_optimal(2, 0.01) filter.insert("Esperanto") filter.dump_file("/tmp/bloom_languages") loaded_filter = BloomFilter.load_file("/tmp/bloom_languages") loaded_filter.has?("Esperanto") # => true loaded_filter.has?("English") # => false ``` ### Union and intersection Having two filters of the same size and number of hash functions, it's possible to perform union and intersection operations: ```crystal f1 = BloomFilter.new(32, 3) f1.insert("Esperanto") f1.insert("Spanish") f2 = BloomFilter.new(32, 3) f2.insert("Esperanto") f2.insert("English") # Union f3 = f1 | f2 f3.has?("Esperanto") # => true f3.has?("Spanish") # => true f3.has?("English") # => true # Intersection f4 = f1 & f2 f4.has?("Esperanto") # => true f4.has?("Spanish") # => false f4.has?("English") # => false ``` ### Visualization If you want to see how your filter looks like, you can visualize it: ```crystal f1 = BloomFilter.new(16, 2) f1.insert("Esperanto") puts "f1 = (Esperanto)" puts f1.visualize f2 = BloomFilter.new(16, 2) f2.insert("Spanish") puts "f2 = (Spanish)" puts f2.visualize f3 = f1 | f2 puts "f3 = f1 | f2 = (Esperanto, Spanish)" puts f3.visualize ``` Output: ``` f1 = (Esperanto) █ █ f2 = (Spanish) █ █ f3 = f1 | f2 = (Esperanto, Spanish) █ █ █ █ ``` In this way, you can actually see which bits are set:) ## Benchmark Performance of Bloom filter depends on the following parameters: * Size of the filter * Number of hash functions * Length of the input string To run benchmark from `./samples/benchmark.cr`, simply run make task: ``` $ make benchmark Number of items: 100000000 Filter size: 117005Kb Hash functions: 7 String size: 13 user system total real insert 0.004227 0.000000 0.004227 ( 2.769349) has? (present) 0.007980 0.000000 0.007980 ( 5.223778) has? (missing) 0.004318 0.000000 0.004318 ( 2.829521) ``` ## Contributors - [greyblake](https://github.com/greyblake) Potapov Sergey - creator, maintainer - [funny-falcon](https://github.com/funny-falcon) Sokolov Yura - better hash algorithms

近期下载者

相关文件


收藏者