linux.conf.au 2014

Perth, Western Australia - 6th to 10th January 2014

Finding signal in the monitoring noise with Flapjack

Project: Flapjack
Wiki Page: Finding signal in the monitoring noise with Flapjack

Working in operations in 2014 is hard.*

More applications are running in the cloud, the infrastructures we manage are getting bigger and bigger, and responsibility for them is being divided up across multiple teams.

Then something breaks. All hell breaks loose. Your on-call engineer receives 900 SMS in 30 seconds. Her phone melts. You can’t distinguish the signal from the noise. It takes an hour to fix the problem.

Weren’t computers meant to solve these problems?

Enter Flapjack: a distributed event processing + monitoring alert routing system. Flapjack sits at the end of your monitoring pipeline and works out who it should send alerts to. Sounds pretty simple? Flapjack tries to make it so.

There are still really hard problems to solve when working out who to notify about a detected failure, and what to do when lots of things fail simultaneously.

You should be interested in Flapjack if:

- You want to track down failures faster by rolling up your alerts across multiple monitoring systems.
- You monitor large infrastructures that have multiple teams responsible for keeping them up.
- You want to dip your toe in the water and try alternative check execution engines like Sensu in parallel to Nagios.

In this tutorial, Jesse Reynolds and Lindsay Holmwood will take you on a whirlwind tour of Flapjack - what it is, how it solves problems, where it’s going - with a hands-on lab that you can start applying in your organisation tomorrow.

Attendees of this tutorial will come away with an understanding of:

- How to install + configure Flapjack
- How to use Flapjack when migrating away from Nagios as a check execution engine
- How to work with Flapjack’s APIs to integrate with your existing systems

Please prepare your laptop for this tutorial by following this guide.

*Disclaimer: this abstract was written in 2013. Things may have since gotten awesome and we’re all sitting on the beach in the Bahamas drinking piña coladas. But this is highly unlikely.

Lindsay Holmwood

Lindsay Holmwood is an engineering manager living in the Australian Blue Mountains. He runs a distributed infracoders team at Bulletproof that builds hassle-free tools, and was responsible for ensuring 100% uptime for the 2010 + 2011 + 2012 Movember campaigns. In his spare time, Lindsay organises the monthly Sydney DevOps Meetups. He also won third place at the 1996 Sydney Royal Easter Show LEGO building competition.

Jesse Reynolds

Jesse is an infrastructure and web operations system administrator and developer. Jesse co-founded Virtual Artists in 1993, one of Australia's first web development and hosting agencies, and has worked with Fujitsu Australia, The University of New South Wales, and Carbon Planet. Jesse is currently at Bulletproof, working from home in the Adelaide Hills as an R&D engineer.

Jesse is a core developer on the Flapjack project.