There are many reasons why real-time streaming is a necessity in today’s data analytics infrastructure. Just having data isn’t good enough: it must be timely, and it must be actionable. Here is an analogy:
It’s 11:45 PM on a Friday night, and you’re out at Streaming Seven, a swanky new club, after a long week of work. The Fire Marshal walks up and asks the doorman, “How many people are in the establishment?” The doorman, who has been keeping a streaming count, glances at his hand counter and says, “300. Our capacity is 500.” The Fire Marshal nods and moves on.
The next night, you join your friends at their favorite hangout. Again the Fire Marshal comes in and asks the same question: “How many people are in the establishment?” This time the manager looks around frantically and replies, “I don’t know!” The Fire Marshal orders the lights turned on, the doors closed, and the bar to stop serving drinks, then conducts a headcount. The manager is losing money by the minute. You’re left standing there with an empty drink, thinking about Streaming Seven.
In this fast-paced industry, data sets keep growing while the window for analyzing them keeps shrinking, so it becomes more and more critical to filter data and stream it to multiple processing endpoints in real time. Deferring the analytics until later is very costly, not only in dollars but also in processing time, complex and unreliable data stores, and waiting customers.
Here at AK, we use Rsyslog in conjunction with ZeroMQ to build our real-time streaming infrastructure. My next few posts will dive into these technologies and explain why we chose them. Stay tuned!