Quantile Digest (q-digest)

Quantile Digest, or q-digest, is a tree-based stream summary algorithm that was proposed by Nisheeth Shrivastava, Subhash Suri et al. in 2004 in the context of monitoring distributed data from sensors.

from pdsa.rank.qdigest import QuantileDigest

qd = QuantileDigest(3, 5)
for i in range(100):
    qd.add(random.randrange(0, 8))

qd.interval_query(2, 6)

Build a q-digest

Quantile Digest is designed to be built on integer numbers from a known range.

The range of the supported integers is defined by the number of bytes in thier maximal representation. Thus, for k-bytes integers, the range will be [0, 2^k - 1].

from pdsa.rank.qdigest import QuantileDigest

qd = QuantileDigest(3, 5)


The ranges up to 32 bytes only are supported in the current implementation.

Add element into q-digest


Quantile Query

Given a fraction q from [0, 1], the quantile query is about to find the value whose rank in a sorted sequence of the n values is q * n.


Inverse Quantile Query

Given an element, the inverse quantile query is about to find its rank in sorted sequence of values.


Interval (range) Query

Given a value the interval (range) query is about to find the number of elements in the given range in the sequence of elements.

qd.interval_query(3, 6)

Merge q-digests



Only q-digets with same compression_factor and range are possible to merge correctly.

Length of the q-digest

Length of the q-digest is the number of buckets (nodes) included into the q-digest.


Size of the q-digest in bytes



Since we can’t calculate exact size of a dict in Cython, this function return some estimation based an ideal size of keys, values of each bucket.

Count of elements in the q-digest



While we can’t say exactly which elements are in the q-digest, (because the compression is a lossy operation), it’s still possible to say how many in total elements were added.