Open Source Release: java-hll

We’re happy to announce our newest open-source project, java-hll, a HyperLogLog implementation in Java that is storage-compatible with the previously released postgresql-hll and js-hll implementations. And as the rule of three dictates, we’ve also extracted the storage specification that makes them interoperable into it’s own repository. Currently, all three implementations support reading storage specification v1.0.0, while only the PostgreSQL and Java implementations fully support writing v1.0.0. We hope to bring the JS implementation up to speed, with respect to serialization, shortly.


  1. A couple of maybe useful proposals for improvement:

    – BitUtil.leastSignificantBit() will be much slower than Long.numberOfTrailingZeroes(), even testing for zero separately, as the latter is intrinsified to s single instruction. This is a simple test on random numbers:

    Java: elapsed 678740292, 0.678740292 ns/call
    Test-based: elapsed 1890064725, 1.890064725 ns/call

    “Test-based” is the code in BitUtil. “Java” is
    public static int javaLsb( final long value ) {
    return value == 0 ? -1 : Long.numberOfTrailingZeros( value );

    It’s an almost threefold slowdown (and the costs of memory access and looping are constant, so it’s probably even more). This is on a recent Haswell architecture—older architectures might give you just a twofold slowdown, but definitely the Java version is much faster (and you do not need to maintain additional code :).

    – union() in the homogeneous FULL case can be improved in speed around 10x using broadword programming. The cost of extracting, maximizing and writing back each register is very high: instead, you just need a few passes of logical word-by-word operations. You can have a look at the Java code of the HyperBall.max() function in WebGraph’s distribution, in you’re interested. It depends also on how frequently you use the FULL representation.

    • Sebastiano,

      Thanks so much for the tips! I’ll definitely be getting my hands dirty in the next few weeks improving the performance of the library.

  2. First off: great work!

    I did find a small bug: The large estimator (both implementation and cut-off) are incorrect; it should use ‘double twoToL = Math.pow(2.0, totalBits – 1);’ instead of the shift (it can easily be >64 which will give an overflow)

    As an improvement, ‘Fill(0)’ is the obvious winner; apart from that, I’d consider changing the state to use the Empty set again after a Clear call.

    • Hi Stefan,

      Thank you for pointing out that bug and for your kinds words! I’ll be spending some quality time in the coming month cleaning up the performance issues you and Sebastiano (above) mentioned as well as this bug. Though you probably deduced this yourself, the large estimator cutoff code is still safe for regwidth = 5 as log2m cannot be greater than 30.


      • Hi Timon,

        You’re welcome.

        As for performance, I would use the DeBruijn code for counting right hand zero’s, as described at . I found that it’s quite hard to beat that one in .NET – and if Java does its work comparable it should give roughly the same results.

        A small question: I’ve been wondering if it would be possible to change the implementation of HLL to create a histogram of the upper bits as ‘buckets’ (x-axis) and the cardinality within the bucket as y-axis? As values I intend to use non-hashed int or long values. I don’t care much about the bucket width (whatever goes goes…) – but this should be possible, right?

        Kind regards,

      • I know this might seem condenscent, but: do you people *ever* make microbenchmarks before making statements like that?

        Java: elapsed 72316560, 0.7231656 ns/call
        Test-based: elapsed 188450361, 1.88450361 ns/call
        De Brujin: elapsed 288912842, 2.88912842 ns/call

      • Sebastiano Vigna,

        Yes, I made quite a few actually. Perhaps you haven’t read my comment; it clearly states that I tested it in .NET, not in Java. Apparently your findings are that Java has different characteristics than .NET. Although I haven’t tested that, I think your conclusion also highly depends on the test data and your CPU.

        A similar approach holds for 2log’s, for which I’ve published the code on this page: . As you can also read, simple ns/call doesn’t cut it; different architectures give quite different results.

        As for test-based: this largely depends on the data you feed it; branch prediction can easily break it. I’m not going into details about that; there’s already more than enough information on the internet about that. Since the data that’s feeded into HLL is sort-of random by nature, I wouldn’t recommend this unless tests say otherwise. From the benchmarks that I’ve ran in .NET with random data, this was the clear loser.

        In .NET the DeBruijn method is the fastest and most stable; I’ve tested all implementations that you can find on the bithacks page as well as a few other approaches, both for the Intel x64 and x86 architectures. I haven’t tested on ARM or AMD architectures.

  3. In case you did not notice, this library is in Java. The fact that you claim that benchmarks might depend on the CPU while suggesting choices based on the experience with another entirely platform (!) is quite striking. The two systems are quite different—I’m not going into details.

    In general, Java is quite slow at accessing arrays. Historically, elegant solutions such as the De Briujin LSB have not performed so well because of the table lookup. You can see, for example, the comparison between selection algorithms at . It is quite more complicated than LSB or log2, but it gives the idea of how results can be quite different when array access is involved depending on whether your are writing in C, Java or .NET.

    The benchmark has been designed so that 99% of the calls to the test-based solution exit at the second test. Thus, it is extremely easy in terms of branch prediction. The other algorithms are almost data-oblivious (the exception is the test for zero before Long.numberOfTrailingZeros()) and execute exactly the same operations independently of the argument. There is no data dependency.

    In general, in 2014 suggesting to outsmart the compiler and the JVM is a bad approach that should be discouraged, unless you have very cogent reasons, like a game-changing new research algorithm. The JVM can choose for you the best hardware implementation on the fly, without having to hardwire algorithms that might not match the peculiarities of the architecture you’re running on. That’s why the low-level methods in Integer and Long have been introduced in the first place.

    • Sebastiano, yes I did notice, and clearly stated both the platform and the assumptions in my reply.

      I’m interested in HLL for various reasons and created an implementation as well in .NET. I bumped on this forum and started to read. Because I found a remark about performance I posted a suggestion (the one I use, which was carefully tested). I see no reason for the hostility.

      As for JVM, you do know that not everyone uses that, right? I mean, IKVM.Net is quite a commonly used tool, to name just one. In my experience with both .NET and Java, both VM’s aren’t nearly as optimized as people would like to believe. In fact, things have gotten worse – in C++ you know what’s going to happen, with modern VM’s you don’t until you start benchmarking. Other approaches than the ones that have been implemented might just do the trick; it’s just a matter of finding and then testing them all. So no, contrary of your belief, I don’t trust my compiler. But that’s just my opinion and there’s advantages as well; I’m well aware of them.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: