Counting Bloom Filter

This implementation uses 4-bit counter implementation to store counts of elements and bitvector to store the bloom filter array.

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter(1000000, 5)
bf.add("hello")
bf.test("hello")
bf.remove("hello")

Build a filter

You can build a new filter either from specifiyng its length and number of hash functions, or from the expected capacity and error probability.

Build filter from its length and number of hash function

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter(100000, 5)

Note

Memory for the filter is assigned by chunks, therefore the length of the filter can be rounded up to use it in full.

Build filter from the expected capacity and error probability

In this case length of the filter and number of hash functions will be calculated to handle the requested number of elements with the requested error.

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter().create_from_capacity(10000, 0.02)

Add element into the filter

bf.add("hello")

Note

It is possible to add into the filter any elements (internally it uses repr() of the python object to calculate hash values for elements that are not integers, strings or bytes.

Test if element is in the filter

bf.test("hello") == True

"hello" in bf

Delete element from the filter

bf.remove("hello")

Warning

The implementation uses 4-bit counters that freeze at value 15. So, the deletion, in fact, is a probabilistically correct only.

Size of the filter in bytes

print(bf.sizeof())

Length of the filter

print(len(bf))

Count of unique elements in the filter

print(bf.count())

Warning

It is only an approximation, since there is no reliable way to determine the number of unique elements that are already in the filter.