Details
-
Type:
Task
-
Status: Open
-
Priority:
Major
-
Resolution: Unresolved
-
Fix Version/s: None
-
Component/s: None
-
Labels:None
Description
HyperLogLog is an algorithm/structure by which cardinality of a set may be estimated. It is much more efficient than count(distinct x) when a degree of error is permissible and can be used in warehousing applications, where multiple estimates of set cardinality can be merged to produce an estimate of the set union.
See blog.aggregateknowledge.com for a lot of discussion of the algorithm and for an PostgreSQL based implementation similar to what would be suggested for inclusion in Maria.
http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Even according to this blog, there are different ways of approximating count(distinct x). And I don't feel myself versed enough in this to choose one approach over the others.
On the other hand, that link above shows how someone has implemented a PostgreSQL extension for calculating HyperLogLog. You can do the same and impement a UDF for MariaDB that calculates HyperLogLog. And then, if you'd like it, we can add it to the official MariaDB distribution.