Wednesday, August 22, 2012

Problem with the AS/400 JDBC driver

While iterating over a ResultSet obtained from an AS/400 data source today, I kept getting this error:
descriptor index not valid
After searching for this error, I found a solution that is worth sharing with everyone.
JDBC numbers result set columns starting from 1, not 0. So when reading data from the ResultSet object, resultSet.getString(1) fetches the first column, not the second column as it might seem. An index of 0, or one past the last column, is out of range, and that is what triggers the error.
It would be nice if the error message explained this in a less cryptic way. This is one of those nagging problems that come back from time to time, since JDBC's one-based column numbering breaks with Java's usual zero-based indexing.
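A minimal sketch of the fix: read each column with an index that runs from 1 to `getColumnCount()`. So that it runs without an AS/400 system, `fakeResultSet` is a hypothetical in-memory stand-in built with `java.lang.reflect.Proxy`. It supports only `next`, `getMetaData` and `getString(int)`, and its error text merely imitates the message above rather than reproducing the real driver. With a real connection, you would pass the driver's ResultSet to `readRow` instead.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ColumnIndexDemo {

    // Reads every column of the current row. JDBC column indexes run
    // from 1 to getColumnCount(), not from 0 to getColumnCount() - 1.
    public static List<String> readRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<String> values = new ArrayList<>();
        for (int col = 1; col <= meta.getColumnCount(); col++) {
            values.add(rs.getString(col));
        }
        return values;
    }

    // Hypothetical in-memory stand-in for a driver's ResultSet so the sketch
    // runs without a database. Supports only next(), getMetaData() and
    // getString(int); assumes at least one row.
    public static ResultSet fakeResultSet(String[][] rows) {
        ClassLoader loader = ColumnIndexDemo.class.getClassLoader();
        ResultSetMetaData meta = (ResultSetMetaData) Proxy.newProxyInstance(
                loader, new Class<?>[] {ResultSetMetaData.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("getColumnCount")) {
                        return rows[0].length;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
        int[] cursor = {-1}; // starts before the first row, like a real ResultSet
        return (ResultSet) Proxy.newProxyInstance(
                loader, new Class<?>[] {ResultSet.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "next":
                            return ++cursor[0] < rows.length;
                        case "getMetaData":
                            return meta;
                        case "getString": {
                            int col = (Integer) args[0];
                            String[] row = rows[cursor[0]];
                            if (col < 1 || col > row.length) {
                                // Imitates the driver rejecting an out-of-range index.
                                throw new SQLException("descriptor index not valid");
                            }
                            return row[col - 1]; // 1-based JDBC index -> 0-based array slot
                        }
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet rs = fakeResultSet(new String[][] {{"1", "ALICE"}, {"2", "BOB"}});
        while (rs.next()) {
            System.out.println(readRow(rs)); // [1, ALICE] then [2, BOB]
        }
    }
}
```

Running `main` prints `[1, ALICE]` and then `[2, BOB]`. Changing the loop to start at 0 makes the very first `getString` call throw, which is the same trap described above.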

Tuesday, August 21, 2012

Book Review: Hadoop: The Definitive Guide, Second Edition by Tom White

Anyone interested in big data management today has at least a passing familiarity with Hadoop, an open-source implementation of the MapReduce programming model. Here's my review of the second edition of one of the most comprehensive books on the topic.
As a longtime Hadoop enthusiast who had already read the first edition, I was interested in finding out what this second edition has in store for its readers.

The book builds on its predecessor: apart from the addition of Hive and Sqoop, a case study covering graph visualization in social networks has been added. The Hadoop version covered has also been updated; as a developer, I'd still recommend using the latest stable release, since Hadoop is an active project. As in the original edition, Tom White, himself a committer on the project, adds various project insights along the way.

From a first-time Hadoop adopter's point of view, too, the text is easy to follow, and it flattens Hadoop's learning curve to a great extent.
The book starts by building context: it presents the history and ecosystem of Hadoop and gives the reader a high-level overview. The next few chapters cover Hadoop's underpinnings, namely the MapReduce model and its implementation in Hadoop. They go into the practical aspects of running any Hadoop application in detail, including HDFS file manipulation and MapReduce operation. An exhaustive list of MapReduce techniques that come up in everyday development with the Hadoop API is then covered, along with examples.
Another highlight of this book is how thoroughly it covers running and deploying Hadoop in various configurations. Closely knit data management tools in the Hadoop ecosystem and its subprojects, such as Pig, Hive, HBase, ZooKeeper and Sqoop, are covered as well.
This is followed by various case studies that make an interesting read, though it was disheartening to see no major updates to them compared with the previous edition.

For someone who already owns the original edition, the second edition does not offer much that is new, but for a reader who has not read any previous edition, it is a comprehensive book.
Note: This book was provided to me for review under the O'Reilly Blogger Review Program.