Classical Bloom Filter¶
This implementation uses bitvector to store the bloom filter array.
from pdsa.membership.bloom_filter import BloomFilter
bf = BloomFilter(1000000, 5)
bf.add("hello")
bf.test("hello")
Build a filter¶
You can build a new filter either from specifiyng its length and number of hash functions, or from the expected capacity and error probability.
Build filter from its length and number of hash function¶
from pdsa.membership.bloom_filter import BloomFilter
bf = BloomFilter(100000, 5)
Note
Memory for the filter is assigned by chunks, therefore the length of the filter can be rounded up to use it in full.
Build filter from the expected capacity and error probability¶
In this case length of the filter and number of hash functions will be calculated to handle the requested number of elements with the requested error.
from pdsa.membership.bloom_filter import BloomFilter
bf = BloomFilter.create_from_capacity(10000, 0.02)
Add element into the filter¶
bf.add("hello")
Note
It is possible to add into the filter any elements (internally it uses repr() of the python object to calculate hash values for elements that are not integers, strings or bytes.
Test if element is in the filter¶
bf.test("hello") == True
"hello" in bf
Size of the filter in bytes¶
print(bf.sizeof())
Length of the filter¶
print(len(bf))
Count of unique elements in the filter¶
print(bf.count())
Warning
It is only an approximation, since there is no reliable way to determine the number of unique elements that are already in the filter.