Counting Bloom Filter¶

This implementation uses 4-bit counter implementation to store counts of elements and bitvector to store the bloom filter array.

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter(1000000, 5)
bf.add("hello")
bf.test("hello")
bf.remove("hello")

Build a filter¶

You can build a new filter either from specifiyng its length and number of hash functions, or from the expected capacity and error probability.

Build filter from its length and number of hash function¶

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter(100000, 5)

Note

Memory for the filter is assigned by chunks, therefore the length of the filter can be rounded up to use it in full.

Build filter from the expected capacity and error probability¶

In this case length of the filter and number of hash functions will be calculated to handle the requested number of elements with the requested error.

from pdsa.membership.counting_bloom_filter import CountingBloomFilter

bf = CountingBloomFilter.create_from_capacity(10000, 0.02)

Add element into the filter¶

bf.add("hello")

Note

It is possible to add into the filter any elements (internally it uses repr() of the python object to calculate hash values for elements that are not integers, strings or bytes.

Test if element is in the filter¶

bf.test("hello") == True

"hello" in bf

Delete element from the filter¶

bf.remove("hello")

Warning

The implementation uses 4-bit counters that freeze at value 15. So, the deletion, in fact, is a probabilistically correct only.

Size of the filter in bytes¶

print(bf.sizeof())

Length of the filter¶

print(len(bf))

Count of unique elements in the filter¶

print(bf.count())

Warning

It is only an approximation, since there is no reliable way to determine the number of unique elements that are already in the filter.