We’re happy to announce our newest open-source project, java-hll, a HyperLogLog implementation in Java that is storage-compatible with the previously released postgresql-hll and js-hll implementations. And as the rule of three dictates, we’ve also extracted the storage specification that makes them interoperable into it’s own repository. Currently, all three implementations support reading storage specification v1.0.0, while only the PostgreSQL and Java implementations fully support writing v1.0.0. We hope to bring the JS implementation up to speed, with respect to serialization, shortly.
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures — perform arbitrary unions and intersections on effectively unbounded sets all on the client — was exhilarating to us. We knew it could be done but we simply didn’t have the time.
Note: There’s more interesting math in the error bounds that we haven’t explored. Presenting error bounds on a measurement that cannot mathematically be less than zero is problematic. For instance, if you have a ruler that can only measure to 1/2″ and you measure an object that truly is 1/8″ long you can say “all I know is this object measures under 0.25 inches”. Your object cannot measure less than 0 inches, so you would never say 0 minus some error bound. That is, you DO NOT say 0.0 ± 0.25 inches. Similarly with set intersections there is no meaning to a negative intersection. We did some digging and just threw our hands up and tossed in what we feel are best practices. In the js-hll code we a) never show negative values and b) we call “spurious” any calculation that results in an answer within 20% of the error bound. If you have a better answer, we would love to hear it!