Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4167

Support for HyperLogLog aggregate functions

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HyperLogLog is an algorithm/structure by which cardinality of a set may be estimated. It is much more efficient than count(distinct x) when a degree of error is permissible and can be used in warehousing applications, where multiple estimates of set cardinality can be merged to produce an estimate of the set union.

      See blog.aggregateknowledge.com for a lot of discussion of the algorithm and for an PostgreSQL based implementation similar to what would be suggested for inclusion in Maria.

      http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            serg Sergei Golubchik added a comment -

            Even according to this blog, there are different ways of approximating count(distinct x). And I don't feel myself versed enough in this to choose one approach over the others.

            On the other hand, that link above shows how someone has implemented a PostgreSQL extension for calculating HyperLogLog. You can do the same and impement a UDF for MariaDB that calculates HyperLogLog. And then, if you'd like it, we can add it to the official MariaDB distribution.

            Show
            serg Sergei Golubchik added a comment - Even according to this blog, there are different ways of approximating count(distinct x). And I don't feel myself versed enough in this to choose one approach over the others. On the other hand, that link above shows how someone has implemented a PostgreSQL extension for calculating HyperLogLog. You can do the same and impement a UDF for MariaDB that calculates HyperLogLog. And then, if you'd like it, we can add it to the official MariaDB distribution.

              People

              • Assignee:
                Unassigned
                Reporter:
                mjohnston Mathew Johnston
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - 3 days
                  3d
                  Remaining:
                  Remaining Estimate - 3 days
                  3d
                  Logged:
                  Time Spent - Not Specified
                  Not Specified