Saturday, April 5, 2014

Book Review: Agile Data Science



Agile Data Science is an introductory guide to all things related to big data, aimed at beginners in the field. It attempts to explain the technologies associated with big data in a very simplified manner, which can be helpful to first-timers but quite confusing at the implementation level, mainly because the book lacks the space to do justice to the topics it covers.
The first half of the book covers theory and the setup of tools that can easily be set up for a data analysis project, and the remaining half goes into building such a project.

My main point of contention was the way the book shifted gears: the first section was comprehensible enough, but in the second section the level of detail ramped up quickly, making the content hard to follow, and I ended up relying on the available source code.

It starts by explaining the big data and cloud technologies prevalent today, and then introduces the reader to data, the cloud and various handy tools for working with them.
The next part contains an end-to-end application that mines emails and provides analytics on them, a very real-world implementation that is covered from various angles.

However, as open source projects keep changing rapidly, the relevance of the example and of the best practices it follows is questionable. This matters all the more because startups each follow their own app stacks when addressing big data challenges.

On a personal note, I would not recommend this book to a learner, as a person is better off developing in bits and pieces, and various tutorials on the internet serve that purpose better. The text is useful only if you need insight into how big data challenges are met using open source technologies.

Note: I was provided a copy of this book under O'Reilly's blogger review program.