Monday, June 17, 2013

A rant on design patterns

There are a few things a Java programmer is expected to demonstrate after a few years of working in the trenches, and design patterns are cardinal among them. Apparently it is not sufficient to implement a design pattern where it is begging to be implemented, to shore up sagging code logic and improve readability, extensibility, or both; one must also insert it into every nook, corner and alley of the code, whether or not it serves a purpose there.
I recently submitted code to a multinational company and was rejected for precisely this reason: my code was not extensible, they said.

However, what was bitterly amusing was that I was shipped a document saying KISS should be followed and that the emphasis would be on test-driven development. The frustrating question in this experience is: how can one keep things simple and stupid, or minimal, while stuffing in design patterns? Either I do not understand the patterns and there is some pattern that would have minimized the code even further, or minimizing the code is exactly what I did in the first place. Creating extra methods and class hierarchies merely to conform to a pattern, when the implementation is barely ten lines of code, seems like overkill in most cases, even if the code is meant to be extensible.
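To illustrate the kind of overkill I mean, here is a hypothetical sketch (the discount example and all names are mine, not from the submission): a full Strategy hierarchy on one side, and the same logic as a three-line method on the other.

```java
// Hypothetical example: choosing a discount rate two ways.

// The "pattern" version: an interface, two classes, and wiring at the call site.
interface DiscountStrategy { double rate(); }
class RegularDiscount implements DiscountStrategy { public double rate() { return 0.05; } }
class PremiumDiscount implements DiscountStrategy { public double rate() { return 0.10; } }

class PatternOverkill {
    // The plain version: the same behaviour in one small method.
    static double rate(boolean premium) {
        return premium ? 0.10 : 0.05;
    }

    public static void main(String[] args) {
        // Both routes produce the same result for this tiny problem.
        DiscountStrategy s = new PremiumDiscount();
        assert s.rate() == rate(true);
        assert new RegularDiscount().rate() == rate(false);
    }
}
```

The Strategy version earns its keep only once there are genuinely many interchangeable behaviours; for two fixed rates, the hierarchy is ceremony.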

I think that for any task a programmer needs to solve, there are arguably ten ways of performing it, each with its own advantages and caveats. Whoever reviews the code (for recruitment, as an exercise, while merging open-source changes, or even for a team member) has to get this basic fundamental right.

Thursday, June 6, 2013

MapReduce 2.0: Next-Generation MapReduce using YARN

There have been a lot of long-overdue changes in Apache Hadoop, the most popular framework for performing MapReduce over big data.
As users of Hadoop may recall, one vulnerability in MapReduce was the presence of a single node, the JobTracker, which performed job scheduling as well as the management and error handling of the map and reduce operations carried out by those jobs. Until now there have been workarounds that address the failure of this job-management node, but the mixing of responsibilities within the JobTracker was never addressed; in particular, several aspects regarding scalability, cluster utilization, change and upgrades, and support for workloads other than MapReduce itself.

YARN (Yet Another Resource Negotiator) tackles this problem by splitting resource management and job scheduling into separate daemons that operate per application, where an application is either a single job or a graph of interrelated jobs. Resource management is done by a global ResourceManager (RM), which resembles the old JobTracker but exists independently of any job or application. The work of each application is driven by an ApplicationMaster (AM), which acts as an intermediary between the RM and the nodes; the nodes in turn are managed through NodeManagers (NM), which relay node health and resource requirements to the RM as needed.
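As a rough sketch of how these daemons are wired together, a minimal yarn-site.xml looks something like the following (property names are from the Hadoop 2.x configuration reference; the hostname is a placeholder):

```xml
<!-- yarn-site.xml: minimal sketch; "rmhost" is a placeholder hostname -->
<configuration>
  <!-- Where the global ResourceManager runs -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rmhost</value>
  </property>
  <!-- Auxiliary service NodeManagers run so MapReduce's shuffle works on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```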
The YARN architecture

A key advantage of YARN is that it supports algorithms other than MapReduce. Some problems need intermediate or real-time results while processing and traditionally cannot be expressed in MapReduce, graph processing being one example. Simply put, in the context of Hadoop, if one node's results are required by another after or during the processing of one or both records, then batch processing is not the answer.

People often get confused between the newer version of Hadoop (the top-level project) and MRv2, since Hadoop supports both MRv1 and MRv2. The change is purely architectural, and it can determine the performance of the application in use, especially after the map stage, where MapReduce implementations often left nodes underutilized during the reduce phase. One key thing to note here is that YARN is a framework in itself, and what Hadoop does is run MapReduce as a job on YARN. Hadoop thus preserves its existing API while reaping the benefits of the dynamic node allocation done internally by YARN.
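In practice, the switch is a configuration choice rather than a code change: existing MapReduce jobs keep their API, and mapred-site.xml selects YARN as the execution framework (the property below is from the Hadoop 2.x configuration reference):

```xml
<!-- mapred-site.xml: run existing MapReduce jobs on YARN instead of MRv1 -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```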

Wednesday, June 5, 2013

Musings on refactoring

I happened to come across the Refactoring in Ruby book, which is an adaptation of Fowler's refactoring book from Java into Ruby. On reading it, one thing that strikes you is that refactoring persists across languages and platforms. Java is verbose and allows various coding styles and idiosyncrasies, which is evidently why a lot of Java programmers churn out unnecessarily complex code. The proliferation of IDEs has also encouraged this malpractice of creating code that is not just sub-optimal but needlessly complex.

However, finding the same in Ruby has traditionally been difficult, since Ruby, like its Japanese inventor, has been neat and concise in its approach, basically allowing a lot of bang for a small amount of code. But as the community grows, there is an acute need to maintain standards in Ruby as well, because its developer base is exploding. In my opinion, refactoring is a practice that every developer needs to apply and enforce within their team as a cardinal rule of programming. Applying this context to the updated refactoring book, I feel that with Ruby the answers have become more to the point, but at the cost of grossly sloppy code. It could have been better demonstrated in a language that is equally neat yet offers variety like Java. Python obviously springs to mind, where at one end there is C-like procedural code and at the other end there are Python experts proclaiming the use of Pythonic code.
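To show how a refactoring carries across languages, here is the classic Extract Method example from Fowler's book, sketched in Java rather than Ruby (the printOwing name follows the book; the bodies are condensed for illustration):

```java
class InvoicePrinter {
    // Before: one method mixing the banner and the details.
    static String printOwingBefore(String name, double amount) {
        String out = "*** Customer Owes ***\n";
        out += "name: " + name + "\n";
        out += "amount: " + amount + "\n";
        return out;
    }

    // After Extract Method: each concern gets a small, named method.
    static String printOwing(String name, double amount) {
        return banner() + details(name, amount);
    }

    static String banner() {
        return "*** Customer Owes ***\n";
    }

    static String details(String name, double amount) {
        return "name: " + name + "\namount: " + amount + "\n";
    }

    public static void main(String[] args) {
        // The refactoring preserves behaviour exactly.
        assert printOwing("Ada", 42.0).equals(printOwingBefore("Ada", 42.0));
    }
}
```

The mechanics are identical in Ruby; only the syntax changes, which is exactly the point the adapted book makes.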

Seeing the same refactoring in another language feels alien for a while, but then it is bliss to realize that the same problems existed in the other language as well; it was just that a learner like me got carried away by the syntactic differences.

At the end of the day, it is the learner who needs to decide what to include and, more importantly, what not to include in the changes to be made to the codebase.