Sumit Bisht: 7/1/13

Wednesday, July 31, 2013

Performing cron jobs in rails through clockwork

The Clockwork gem lets us run cron-style jobs in a Ruby application. I was using it today and found its documentation to be a bit of a blocker. The GitHub wiki gives the following example:

require 'clockwork'
include Clockwork

handler do |job|
  puts "Running #{job}"
end

every(10.seconds, 'frequent.job')
every(3.minutes, 'less.frequent.job')
every(1.hour, 'hourly.job')

every(1.day, 'midnight.job', :at => '00:00')

However, I needed to run custom tasks, which is not covered there. Here's how I got it working:

# Load boot.rb first (sets up Bundler), then the full Rails environment
require_relative "../config/boot"
require_relative "../config/environment"
require 'clockwork'

include Clockwork

handler do |job|
 puts "running the scheduled job #{job}."
end

every(30.seconds, 'test job execution'){Module::Class.new.method_to_call}

I added the require_relative calls for environment.rb and boot.rb to load the Rails framework from this clock.rb file, which is placed in the lib folder. This example calls a custom module placed in the lib folder of the same Rails app.
As this application is to be deployed on Heroku, we need two dynos: one for the Rails app and another for the scheduled task. We define this through the Procfile at the Rails root with the following contents:

web:   bundle exec unicorn -p $PORT -c ./config/unicorn.rb
clock: bundle exec clockwork app/lib/clock.rb

This does everything I need on Heroku. For long-running tasks you may want to use Delayed Job, but that involves yet another dyno to manage, so I left it out of this solution.

Tuesday, July 30, 2013

Easier hadoop job creation through mrjob

Today, I came across mrjob, a Python package for creating MapReduce jobs on a Hadoop cluster using Python. This is indeed useful, as other Python-based MapReduce frameworks like octopy are easier to set up but do not scale well. It was created and open-sourced by Yelp, the food rating website. You can find it on GitHub here.
Basically, mrjob lets you do the following:

Write multi-step MapReduce jobs in pure Python
Test on your local machine
Run on a Hadoop cluster
Run in the cloud using Amazon Elastic MapReduce (EMR)

Thus, it allows a developer to have separate development, staging and production environments.
mrjob is available for Python 2.5+ and installed without a glitch on my Python 2.7 setup.
The same Python script can be run either standalone or as a regular job on a Hadoop cluster. It can define any of a mapper, a combiner, and a reducer function. For example, the sample program consists of the following:

from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield word.lower(), 1

    def combiner(self, word, counts):
        yield word, sum(counts)

    def reducer(self, word, counts):
        yield word, sum(counts)


if __name__ == '__main__':
    MRWordFreqCount.run()

The raw data is sent to the mapper() function, which emits a (word, 1) pair for each word it finds. This data is then passed to whichever process handles the next step, serialized as JSON (other protocols, including binary ones, are also available and may suit larger workloads better). These protocols are essentially the serialization mechanism between the program and its backend.
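To make that flow concrete, here is a minimal sketch in plain Python that imitates it without mrjob or Hadoop. It is an illustration under assumptions, not mrjob's actual implementation: the helper names encode, decode and run_local are made up for this example, and the tab-separated JSON line format is my understanding of the default protocol rather than something verified here. The combiner is left out because it would only pre-sum counts before they are sent.

```python
import json
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    # Sum all the counts collected for one word.
    yield word, sum(counts)


def encode(key, value):
    # Hypothetical helper: one record as JSON key, a tab, then JSON value.
    return json.dumps(key) + "\t" + json.dumps(value)


def decode(record):
    # Hypothetical helper: the inverse of encode().
    key, value = record.split("\t", 1)
    return json.loads(key), json.loads(value)


def run_local(lines):
    # Map step: serialize every pair as it would travel between processes.
    wire = [encode(k, v) for line in lines for k, v in mapper(line)]

    # Shuffle step: deserialize and group the values by key.
    grouped = defaultdict(list)
    for record in wire:
        key, value = decode(record)
        grouped[key].append(value)

    # Reduce step: one reducer call per distinct key.
    result = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            result[key] = total
    return result


if __name__ == "__main__":
    print(run_local(["The cat", "the dog's cat"]))
```

The point of the sketch is that mapper and reducer never share memory: everything between them is text on a wire, which is why the choice of protocol matters.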
The configuration file decides whether the job runs as Python-based MapReduce on the same machine, on a local or remote Hadoop cluster, or on Amazon Elastic MapReduce. To set this up, I simply put a .mrjob.conf file in my home folder with the following:

runners:
  hadoop:
    base_tmp_dir: /tmp/hjobs
    python_archives: &python_archives
    setup_cmds: &setup_cmds
    upload_files: &upload_files
  local:
    base_tmp_dir: /home/sumit/mrjob/tmp
    python_archives: *python_archives

I ran the supplied test both on my local Python install and on my local Hadoop cluster without any problems, and I am suitably impressed with its compactness, which might come in handy for me or you someday!

Tuesday, July 23, 2013

The bane of being a polyglot developer

In recent months, I have come to a bitter realization after giving quite a few interviews while looking for a job. My resume was decorated with most of what I had been doing in my earlier job as well as what I did in my spare time. However, where I failed were the areas that require the most mugging up. Consider the following questions:

What is the difference between get and load methods in hibernate?
What are the different active record validations?
What are the different types of active resource nesting?
What is the difference between event handling and event bubbling?

Answers to these are no further than a Google search or documentation lookup away, but for all of these questions I wore a blank look. The first one was a Hibernate question, and after searching for answers I came to know that it was actually one of the most popular Hibernate interview questions. The other two were Rails questions, but since I have done very little professional Ruby work, I could not explain them. The last one was a jQuery killer, appearing after I had rated myself 5/10 in JavaScript, and was again nothing but a technology buzzword.

Very few interviewers were interested in asking challenging questions on scalability, design patterns or architectural issues. This really leaves me angry at recruiters who are only looking to fill positions with technology X people, and also at myself for allowing myself to stray. The real trouble, I feel, is that I am perhaps too easily persuaded. I started with Java, then moved over to J2EE, and for a time felt that I had nailed it when I finally deployed an EJB 2.0 environment. However, then came Spring and Hibernate, and I felt that they were the actual Mecca and Medina for me. Later on, I was sucked in by the very vocal Ruby community that said, 'wait a minute, you a java guy, we can make you 10X productive'. Duh!
Then came a string of languages, Python, .NET, Scala, F# and whatever titbits I had left, from playing with Arduinos at home to mainframes at work (and I still harbor a desire to purchase a Raspberry Pi one day and do more cool hacks on it).
However, this plethora of languages and technologies has taken a toll on my confidence, especially when I come across people who only know or like a single technology and are comfortable in it.
But that's not for me, as I thrive on change. For example, I am writing this rant after 10 pm and today I've:

studied Redis in the morning
tried to implement a scalable solution on Redis to handle >10 million records
done AdWords and Facebook API integration into a Rails 4 app at the office during the day
and I will still be writing/improving the draft chapters of the upcoming book that I am writing on Robot Framework.

I think Steve Yegge captures some of my deficiencies clearly, as I am an advanced learner in a lot of the technologies that I have on my resume, and I guess that's the way it is going to be in the near future.

Sunday, July 7, 2013

Book Review: Embedded Android

This is a review of the book 'Embedded Android' by Karim Yaghmour, which is an essential reference for any developer customizing the Android platform. As I am primarily an application developer when it comes to Android and a device developer when it comes to embedded device management, I read this book as a novice attempting to glean some understanding of the Android internals.

The first thing that struck me was how up to date the book is with respect to the Android ecosystem, and how some of the mystifying concepts of Android were explained, like the comparison between GNU/Linux and its kernel, that of Dalvik with Java, and the algorithms employed by the OS to optimize for resource-constrained devices. Next, Karim took me on a guided tour of parts of Android that I had not imagined, such as building and customizing the Android SDK itself and custom ROMs for headless devices.

Make no mistake, this is not a book on Android application development or its native counterpart; rather, it explains part by part the different aspects involved in developing the Android project itself.

Book contents

The book starts with an introduction to the Android project's history and its differences from classical open source projects. It then dives deep into the internals of the Android OS, covering the architecture and comparing it with Linux. It then covers Dalvik and system services.
The next couple of chapters focus entirely on the Android Open Source Project and involve building various components of the project. The build system of Android is also discussed and compared with conventional makefile-based systems. One thing worth mentioning is the presence of build recipes and hacks that give more insight into what is being covered.
Hardware and popular systems come next, including development on different boards and SoCs, followed by a thorough discussion of the filesystem, components and commonly used tools.
Finally, the book discusses the Android framework, where different utilities, extensions to support newer hardware, components, services and parts of the Android OS are covered.

The appendixes cover topics such as legacy user-space, adding support for new hardware, and customizing the default lists of packages. The default init.rc files are also provided, along with informative links to various websites that cover the latest on the topics discussed in the book.

While I cannot honestly comment on the usability of the book from an Android modder's or image customizer's point of view, in general I found the book to be of great use, and armed with my preexisting knowledge of Android development and Linux, I was able to understand the topics very well.

Disclaimer: This book was provided to me by O'Reilly as part of their blogger review program.

Tuesday, July 2, 2013

DevOps Dojo

And for those who think devops is just a fancy term for maintenance, I've got 2 words: Automation & Collaboration.
The concept involves more than just integrating the different teams. However, for many jobs that list devops in their job description, this is not the case. A popular tweet doing the rounds on Twitter captures the essence of devops: 'It’s not always about starting small, it’s about tackling the hardest apps first'.

Atlassian recently released an advertisement in which it cleverly showcases its in-house products. It is heartening to see the extensive use of open source projects together with, understandably, Atlassian products. The website is cleverly put together from two different dojos that cover the technical and cultural aspects of this integration.