# Count Sketch¶

Count Sketch is a simple space-efficient probabilistic data structure that is used to estimate frequencies of elements in data streams and can address the Heavy hitters problem. It was proposed by Moses Charikar, Kevin Chen, and Martin Farach-Colton in 2002.

## Build a sketch¶

You can build a new sketch either from specifiyng its dimensions (number of counter arrays and their length), or from the expected overestimation diviation and standard error probability.

### Build filter from its dimensions¶

```from pdsa.frequency.count_min_sketch import CountSketch

cs = CountSketch(num_of_counters=5, length_of_counter=2000)
```

### Build filter from the expected errors¶

In this case the number of counter arrays and their length will be calculated corresponsing to the expected overestimation and the requested error.

```from pdsa.frequency.count_min_sketch import CountSketch

cs = CountSketch.create_from_expected_error(deviation=0.000001, error=0.01)
```

Note

The deviation is the error ε in answering the paricular query. For example, if we expect 10^7 elements and allow the fixed overestimate of 10, the deviation is 10/10^7 = 10^{-6}.

The error is the standard error δ (0 < error < 1).

Note

The Count–Min Sketch is approximate and probabilistic at the same time, therefore two parameters, the error ε in answering the paricular query and the error probability δ, affect the space and time requirements. In fact, it provides the guarantee that the estimation error for frequencies will not exceed ε x n with probability at least 1 – δ.

## Index element into the sketch¶

```cs.add("hello")
```

Note

It is possible to index into the counter any elements (internally it uses repr() of the python object to calculate hash values for elements that are not integers, strings or bytes.

## Estmiate frequency of the element¶

```print(cs.frequency("hello"))
```

Warning

It is only an approximation of the exact frequency.

## Size of the sketch in bytes¶

```print(cs.sizeof())
```

## Length of the sketch¶

```print(len(cs))
```