Quantile Digest (q-digest)

Quantile Digest, or q-digest, is a tree-based stream summary algorithm that was proposed by Nisheeth Shrivastava, Subhash Suri et al. in 2004 in the context of monitoring distributed data from sensors.

from pdsa.rank.qdigest import QuantileDigest

qd = QuantileDigest(3, 5)
for i in range(100):
    qd.add(random.randrange(0, 8))
qd.compress()

qd.quantile_query(0.5)
qd.inverse_quantile_query(5)
qd.interval_query(2, 6)

Build a q-digest

Quantile Digest is designed to be built on integer numbers from a known range.

The range of the supported integers is defined by the number of bytes in thier maximal representation. Thus, for k-bytes integers, the range will be [0, 2^k - 1].

from pdsa.rank.qdigest import QuantileDigest

qd = QuantileDigest(3, 5)

Note

The ranges up to 32 bytes only are supported in the current implementation.

Add element into q-digest

qd.add(5)

Quantile Query

Given a fraction q from [0, 1], the quantile query is about to find the value whose rank in a sorted sequence of the n values is q * n.

qd.quantile_query(0.95)

Inverse Quantile Query

Given an element, the inverse quantile query is about to find its rank in sorted sequence of values.

qd.inverse_quantile_query(4)

Interval (range) Query

Given a value the interval (range) query is about to find the number of elements in the given range in the sequence of elements.

qd.interval_query(3, 6)

Merge q-digests

qd1.merge(qd2)

Warning

Only q-digets with same compression_factor and range are possible to merge correctly.

Length of the q-digest

Length of the q-digest is the number of buckets (nodes) included into the q-digest.

print(len(qd))

Size of the q-digest in bytes

print(qd.sizeof())

Warning

Since we can’t calculate exact size of a dict in Cython, this function return some estimation based an ideal size of keys, values of each bucket.

Count of elements in the q-digest

print(qd.count())

Warning

While we can’t say exactly which elements are in the q-digest, (because the compression is a lossy operation), it’s still possible to say how many in total elements were added.