We Love Rsyslog

As we scale up, we appreciate rsyslog more every day. Yesterday I captured this shot of iptraf running on one of our event routing hosts: 158 Mbit/s in and 583 Mbit/s out, for a combined total of roughly 740 Mbit/s of traffic handled by a single rsyslog server. The server is both filtering and routing by various message properties, and at this rate of traffic the hardware isn’t even breaking a sweat. We realize we’re a bit off the beaten path with our use of rsyslog for message routing at AK. This screen capture is part of the story as to why:

[Screenshot: rsyslog routing some serious traffic]

Real-Time Streaming with Rsyslog and ZeroMQ

How do we go about streaming data in real time? At AK, we use Rsyslog in conjunction with ZeroMQ and a little AK secret sauce. Why Rsyslog? We looked for existing technology that already solved more than 90% of the problem. System logging has existed since the early days of modern UNIX operating systems, and it has evolved into real-time log routing and aggregation.

  • Rsyslogd allows for multiple inputs and outputs.
  • Rsyslogd allows for multiple routes based on stream type, origination (location and/or application), and destination.
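To make the routing idea concrete, here is a minimal Python sketch of property-based routing. It is not rsyslog's rule engine and not our configuration: the `Event` fields, the output names, and the rules in `ROUTES` are all hypothetical, chosen only to mirror the stream type, location, and application properties mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    stream: str    # stream type, e.g. "click" (hypothetical value)
    location: str  # originating data center (hypothetical field)
    app: str       # originating application (hypothetical field)
    body: str


@dataclass(frozen=True)
class Route:
    matches: Callable[[Event], bool]  # filter on message properties
    outputs: Tuple[str, ...]          # names of the outputs to forward to


def destinations(event: Event, routes: List[Route]) -> List[str]:
    """Return every output the event should go to, in rule order, without duplicates."""
    selected: List[str] = []
    for route in routes:
        if route.matches(event):
            for output in route.outputs:
                if output not in selected:
                    selected.append(output)
    return selected


# Hypothetical rule set: a single input can fan out to several outputs.
ROUTES = [
    Route(lambda e: e.stream == "click", ("counter", "archive")),
    Route(lambda e: e.location == "dc-east", ("east-replica",)),
    Route(lambda e: e.app == "debugger", ("archive",)),
]

event = Event(stream="click", location="dc-east", app="adserver", body="...")
print(destinations(event, ROUTES))
```

One event can match several rules, so the same message may be delivered to more than one destination. That is the multiple-inputs, multiple-outputs behavior the list above describes.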

Building on this, AK has written a ZeroMQ module for Rsyslog:

  • ZeroMQ input/output interface (connect/bind, push/pull)
  • pub/sub type coming soon

Simply put, we at AK have moved to a real-time data streaming process by integrating the Rsyslog service with the ZeroMQ library. This has allowed us to move from a brittle system of large scheduled data migrations and deferred processing to a lighter-weight model of real-time streaming and processing. There are many benefits to this, including high scalability, durability and verification, real-time streaming among multiple data centers, and efficiency. We believe that others who face the same problems of counting and providing insights into massive data sets will follow us in moving to a real-time data analytics platform.

Then there is the pub-sub ZeroMQ integration. This is beyond cool, since it basically allows us to expose a tap into the event stream. You want to simply connect to the event stream and try out some new algorithm? It’s trivial. Put ZeroMQ on the front and start listening. You want to grab a few minutes’ worth of events as they come in? Just connect and take what you need. No more going off to the log server, finding the logs, parsing them, breaking them up, and so on. Just tap and go.
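As a rough illustration of what "tap and go" could look like, here is a Python sketch using the pyzmq bindings. It assumes the rsyslog side publishes events as UTF-8 text on a ZeroMQ PUB socket. The endpoint shown in the comments is hypothetical, and the helper names `grab_events` and `open_tap` are ours for this example, not part of the module.

```python
import time
from typing import Callable, List, Optional


def grab_events(recv: Callable[[], Optional[str]],
                seconds: float,
                clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Collect whatever recv() yields until `seconds` have elapsed.

    recv() returns one event, or None if nothing arrived before its timeout.
    """
    deadline = clock() + seconds
    events: List[str] = []
    while clock() < deadline:
        event = recv()
        if event is not None:
            events.append(event)
    return events


def open_tap(endpoint: str, topic: str = "") -> Callable[[], Optional[str]]:
    """Connect a SUB socket to `endpoint` and return a recv() callable."""
    import zmq  # pyzmq (third-party); imported here so grab_events has no dependency

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.setsockopt_string(zmq.SUBSCRIBE, topic)  # "" subscribes to everything
    sock.setsockopt(zmq.RCVTIMEO, 100)            # wait at most 100 ms per recv
    sock.connect(endpoint)

    def recv() -> Optional[str]:
        try:
            return sock.recv_string()
        except zmq.Again:  # timeout expired with no message
            return None

    return recv


# Usage (needs a live publisher; the endpoint below is a made-up placeholder):
#   recv = open_tap("tcp://events.example.internal:5556")
#   for line in grab_events(recv, seconds=120):
#       print(line)
```

A subscriber only sees events published after it connects, which fits the "grab a few minutes as they come in" use case. It is not a way to replay history.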

Real-Time Streaming for Data Analytics

There are many reasons why real-time streaming is a necessity in today’s data analytics infrastructure. Just having data isn’t good enough. It must be timely and it must be actionable. Here is an analogy:

It’s 11:45 PM on a Friday night and you’re out at Streaming Seven, a swanky new club, after a long week of work. The Fire Marshal comes and asks the doorman, “How many people are in the establishment?” The doorman, who has been keeping a streaming count, looks at his hand counter and says, “300, our capacity is 500.” The Fire Marshal nods and moves on.

The next night you join your friends in their favorite hang-out. Again the Fire Marshal comes and asks the same question: “How many people are in the establishment?” In this bar the manager looks around frantically and replies, “I don’t know!” The Fire Marshal orders the lights on, the doors closed, and drink service stopped, then does a headcount. The manager is losing money by the minute. You’re left standing there with an empty drink, thinking about Streaming Seven.

Given that data sets continue to increase in size while the window for analyzing them invariably shrinks in this fast-paced industry, it becomes more and more critical to filter and stream data to multiple processing end points in real time. It is very costly to defer the analytics until later, not only in dollars but also in processing time, complex and unreliable data stores, and waiting customers.

Here at AK we use Rsyslog in conjunction with ZeroMQ to provide our real-time streaming infrastructure. My next few posts are going to dive into these technologies and outline why we’ve chosen them. Stay tuned!

Something Cool From AK on GitHub

Today we at AK have released an rsyslog module for 0MQ on GitHub! We are very excited about how we have moved to a real-time processing model and wanted to share. Keep checking back to this blog for more info!

0MQ input and output modules for rsyslog

The Myth of Clouds

Over the last few years, people like us have been fighting off pressure from the “forces that be” to move everything into a public cloud. The belief is that if you move your software stack into a public cloud you will no longer need to worry about such tedious tasks as capacity planning, capital expenditures, and staffing of a service delivery team. You will just magically have 100% uptime for your product.

One of the other myths that is particularly concerning to me is that public clouds remove the requirement to have industry experts in networking, systems, and data on hand. In practice, you will change your process and product to fit inside of their systems. Today’s problems around real-time analytics and massively large data sets have grown larger than what fits inside of conventional thought, and sadly cloud computing has become a conventional thought for many in the Internet services world.