Saturday, March 9, 2013

Book review: Effective Monitoring and Alerting

I've just finished reviewing the book, 'Effective Monitoring and Alerting' by Slawek Ligus, published by O'Reilly Media.
The book deals with system operations and administration. Given how much weight the maintenance of production software carries these days, and the emerging convergence of developers and operations driven by the DevOps movement, which has gained a fair amount of traction, the importance of this text cannot be overstated.
The author presents the concepts in a technology-agnostic manner and focuses on the latest trends in operations.
The book gives you a short, high-level overview of the topic in 150 pages and covers the common tasks involved in monitoring and alerting for operations.

It starts with an introduction to monitoring and alerting and the issues surrounding them. Over the next few chapters, monitoring and alerting are discussed in greater detail, with explanations of how to interpret monitoring data and of the nuances of alarms.
After these topics, the implementation, challenges, and implications of scaling monitoring and alerting to large deployments are discussed, along with some sound advice drawn from real-world projects. The book then closes by listing the principles that capture the essence of the topic:

Get in the habit of measuring
Draw Conclusions Reliably
Monitor Extensively
Alarm Selectively
Work smart, not hard
Learn from the experiences of others
Have a tactic
Run a bank of cases
Enjoy the process

While these might come across as common words of advice, they are presented in a practical yet succinct manner.
The book focuses on a niche in which people who are not system administrators often rely on guesswork and intuition, and it provides a lot of insight of the kind that senior administrators can impart. While all the chapters focus on theory, setting up OpenTSDB is covered in an appendix. I've gone through it, and it differs from what is generally available in blogs and online tutorials: it covers the setup from the perspective of an actual deployment rather than focusing on a specific technology (like HBase or Nagios) used in the exercise.
Overall, this was a power-packed guide that covered a range of concepts, but it started from an advanced level in several of them, leaving me lost in a few areas where I have no prior experience.
Disclaimer: I received a copy of the book under the Blogger Review Program - O'Reilly Media.
