Real-Time Streaming with Rsyslog and ZeroMQ

How do we go about streaming data in real time? At AK, we use Rsyslog in conjunction with ZeroMQ and a little AK secret sauce. Why Rsyslog? We looked for existing technology that solved more than 90% of the problem. System logging has existed since the beginning of modern UNIX operating systems, and it has evolved into real-time log routers and aggregators.

  • Rsyslogd allows for multiple inputs and outputs.
  • Rsyslogd allows for multiple routes based on stream type, origination (location and/or application), and destination.

As such, AK has written a ZeroMQ Rsyslog module (a quick consumer sketch follows the list below):

  • ZeroMQ input/output interface (connect/bind, push/pull)
  • pub/sub type coming soon
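
To give a feel for the consumer side, here is a minimal Python sketch using pyzmq that pulls log events from a ZeroMQ output. The endpoint address and the PUSH/PULL pairing shown are illustrative assumptions for this example, not documented defaults of our module.

```python
# Minimal sketch: pull log events emitted by an rsyslog ZeroMQ output.
# The endpoint and the PUSH/PULL pairing are illustrative assumptions.
import zmq

context = zmq.Context()
receiver = context.socket(zmq.PULL)
# Connect to wherever the rsyslog output module is bound (hypothetical address).
receiver.connect("tcp://127.0.0.1:5558")

while True:
    # Each message is one log line as routed by rsyslog.
    event = receiver.recv_string()
    print(event)
```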

Simply put, we at AK have adopted a real-time data streaming process by integrating the Rsyslog service with the ZeroMQ library. This has allowed us to move from a brittle system of large scheduled data migrations and deferred processing to a lighter-weight model of real-time streaming data and processing. There are many benefits to this, including high scalability, durability and verification, real-time streaming among multiple data centers, and efficiency. We believe that others who face the same challenges of counting and providing insights into massive data sets will follow in moving to a real-time data analytics platform.

Then there is the pub-sub ZeroMQ integration. This is beyond cool, since it basically allows us to expose a tap into the event stream. You want to connect to the event stream and try out some new algorithm? It’s trivial: put ZeroMQ on the front and start listening. You want to grab a few minutes’ worth of events as they come in? Just connect and take what you need. No more going off to the log server, finding the logs, parsing them, breaking them up, and so on. Just tap and go.
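
For illustration, here is what “tap and go” might look like in Python with pyzmq; the endpoint and the empty subscription filter (i.e. take everything) are assumptions made for this example.

```python
# Sketch of "tap and go": subscribe to the live event stream and listen.
# The endpoint and topic filter below are hypothetical, for illustration only.
import zmq

context = zmq.Context()
tap = context.socket(zmq.SUB)
tap.connect("tcp://127.0.0.1:5556")          # hypothetical stream publisher
tap.setsockopt_string(zmq.SUBSCRIBE, "")     # empty filter = every event

# Grab a few minutes' worth of events, try out a new algorithm, disconnect.
for _ in range(100000):
    event = tap.recv_string()
    # ... feed `event` to the experimental algorithm here ...
```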

Real-Time Streaming for Data Analytics

There are many reasons why real-time streaming is a necessity in today’s data analytics infrastructure. Just having data isn’t good enough. It must be timely and it must be actionable. Here is an analogy:

It’s 11:45 PM on a Friday night and you’re out at Streaming Seven, a swanky new club, after a long week of work. The Fire Marshal comes in and asks the doorman, “How many people are in the establishment?” The doorman, who has been keeping a streaming count, looks at his hand counter and says, “300; our capacity is 500.” The Fire Marshal nods and moves on.

The next night you join your friends at their favorite hang-out. Again the Fire Marshal comes in and asks the same question: “How many people are in the establishment?” In this bar the manager looks around frantically and replies, “I don’t know!” The Fire Marshal orders the lights turned on, the doors closed, and the drinks stopped, and does a headcount. The manager is losing money by the minute. You’re left standing there with an empty drink, thinking about Streaming Seven.

Given that data sets continue to grow in size while the window for analyzing them invariably shrinks in this fast-paced industry, it becomes more and more critical to filter and stream data to multiple processing end points in real time. Deferring the analytics until later is very costly, not only in dollars but also in processing time, complex and unreliable data stores, and waiting customers.
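
To make the doorman’s “streaming count” concrete, here is a toy Python/pyzmq sketch that keeps an always-current tally as events arrive instead of deferring a batch count; the endpoint and subscription filter are assumptions for the example.

```python
# Toy sketch: maintain an always-current count as events stream in,
# rather than running a deferred batch count when someone finally asks.
# Endpoint and subscription filter are illustrative assumptions.
import zmq

context = zmq.Context()
events = context.socket(zmq.SUB)
events.connect("tcp://127.0.0.1:5556")       # hypothetical stream endpoint
events.setsockopt_string(zmq.SUBSCRIBE, "")  # count every event

count = 0
while True:
    events.recv_string()
    count += 1  # the "hand counter": the answer is ready whenever it is asked for
```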

Here at AK we use Rsyslog in conjunction with ZeroMQ to provide our real-time streaming infrastructure. My next few posts are going to dive into these technologies and outline why we’ve chosen them. Stay tuned!

Something Cool From AK on GitHub

Today we at AK have released an rsyslog module for 0MQ on GitHub! We are very excited about our move to a real-time processing model and wanted to share. Keep checking back to this blog for more info!

0MQ input and output modules for rsyslog

The Myth of Clouds

Over the last few years, people like us have been fighting off the “forces that be” pushing to move everything into a public cloud. The belief is that if you move your software stack into a public cloud, you will no longer need to worry about such tedious tasks as capacity planning, capital expenditures, and staffing a service delivery team. You will just magically have 100% uptime for your product.

Another myth that particularly concerns me is the idea that public clouds remove the need to have industry experts in networking, systems, and data on hand. In reality, you change your process and product to fit inside their systems. Today’s problems around real-time analytics and massively large data sets have grown larger than what fits inside conventional thought, and sadly, cloud computing has become conventional thought for many in the Internet services world.

Hello!

Allow me to introduce myself. I am Dale, head of Service Delivery at AK. Our small but highly skilled team has many years of experience with large service delivery platforms such as AOL, Salesforce, Netflix, and others. We have a unique view into building and managing cutting-edge service delivery infrastructures with the right balance of cost, scale, and performance required to handle today’s massively large data sets.

In this blog, we will share some of the tips and tricks we have learned about scaling for real-time ingest and processing, maximizing uptime, and other exciting approaches to scaling and management.