Friday, December 31, 2010

A paradigm mismatch ?

Here are my comments on a recurring topic of conversation: 6th semester projects in the Master of Computer Applications (MCA) course. A lot of people are considering web-based applications, aka websites, for their last-semester MCA projects. Since this involves a whole stack of technologies, one can observe that it in turn helps a person, or more importantly his mentor, to hide a poor design behind this complexity.
Coming back to the topic, a lot of my friends and people from various coaching centers have preconceived notions about which technology to use as their server-side programming language. One can also observe the faint but still existent rumblings of the religious fervor of 'my language is better than yours'.
To start off with, students try and then ignore Java as an alternative because it is too complex and huge for their web applications (it is a big pain, but it has its benefits). However, this fear can be allayed by the fact that one can choose from a wide variety of frameworks (MVC ones like Struts and Spring MVC, or component-based ones like Tapestry or JSF) and technologies (a plethora of middleware) within the Java ecosystem itself. PHP is touted as a compelling alternative, but for me, the scripting approach only works for demonstrative purposes. Anything larger needs either huge patience or frameworks, which are immature so far on this platform; I am really not impressed by frameworks like Zend, to name one. Microsoft .NET deserves a special mention because it comes already baked in (with the IDE and app server ready, for instance) and has a meaningful architecture (through code-behind). However, the very strength of this framework becomes its nemesis, as students are not encouraged to question or hack into the innards of any tool they use. The upcoming ASP.NET MVC framework, too, looks like a copied and chaotic exercise in doing something that is coming too late.
My personal outlook on the whole situation is that no single technology rules the roost. If you ask me, it is Java for building middleware-heavy (business-intensive) and expandable websites (ones where you can fit in a lot of components and features). PHP is really cute, as you get the time to play around with CSS and JavaScript based functionality around your site (after all, data is king and your site functions as its glorified frontend). If I require a lot of versatility as well as ease of development, I'll go for .NET, as it really hits that sweet spot when you are willing to work inside the confines of the framework.
I had this discussion with myself a few days back in a moment of clarity, as I was contemplating my own and others' 6th semester projects for the MCA course. I specifically ignored Rails (and frameworks based on it), as one generally treads a safe line in our course. Researching and hacking into a shiny new technology and writing a perfect but undocumented and incomplete project would only lead to its cancellation.
BTW, I had time to write all this stuff on New Year's Eve as tomorrow is my 5th semester ERP examination. So good luck to me and a very happy new year 2011 AD to my readers!

Monday, December 6, 2010

The stuff behind the fluff – Is Green IT an exercise in vain?

Cloud computing is generally thought to be efficient, but a recent study challenges that view. According to a UK study reported by DW World, cloud computing has a higher environmental impact than was previously thought. The sheer number of data centers as well as of consumers would collectively create this problem, which would be further intensified by the use of richer digital media. What surprised me in this study was the analysis that casual home users would lead the excessive usage through the sharing and duplication of rich media.
In the past, Information Storage and Management focused solely on enterprise-specific needs. For handling binary media formats, technologies like CAS (Content-Addressed Storage) are already in place. But according to this study, there would be a demand of 3200 megabytes per person per day in just a few years' time. Given the pace of Internet proliferation and advancement, this is not a vague guess but a worrisome one.
The article pointed out the following courses of action:
  • Creation of better/faster computers – as stated, this is not really possible due to the lag between demand and Moore's Law.
  • Green data centers – here too politics plays an important role; the BRIC nations in particular have a cavalier attitude towards implementing clean energy sources.

What was missing here, in my opinion, was the point that the research addressed the symptoms of the problem rather than its cause. So, instead of only addressing the data storage issues, we need to rigorously safeguard against redundancy in such data. The Semantic Web is one such approach. Other approaches, such as tagging digital media and keeping a single copy of identical media (different locations with the same signature), could solve this problem. Also, instead of consolidation, work on distributed computing should be promoted.
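The "single copy for identical media" idea can be sketched in a few lines. This is only a toy illustration, not how any particular storage product works: the `DedupStore` class, the file names and the choice of SHA-256 as the signature are all assumptions made for the example.

```python
import hashlib


def signature(data: bytes) -> str:
    """Return a SHA-256 hex digest used here as the content signature."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Toy content-addressed store: one stored copy per distinct content."""

    def __init__(self):
        self.blobs = {}      # signature -> bytes (each content stored once)
        self.locations = {}  # location name -> signature

    def put(self, location: str, data: bytes) -> str:
        sig = signature(data)
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(sig, data)
        self.locations[location] = sig
        return sig

    def get(self, location: str) -> bytes:
        return self.blobs[self.locations[location]]


store = DedupStore()
video = b"same holiday video bytes"  # hypothetical media content
store.put("alice/holiday.mp4", video)
store.put("bob/copy-of-holiday.mp4", video)
store.put("carol/other.mp4", b"different bytes")
print(len(store.locations), "locations,", len(store.blobs), "stored copies")
```

Three locations end up pointing at only two stored copies, because the two identical uploads share one signature.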
Distributed computing can readily be done using map/reduce algorithms or through application development platforms like Apache Hadoop. The data exchange issue can be addressed using peer-to-peer transfer as in BitTorrent. These wouldn't create problems themselves because of the ever-decreasing cost and improving performance of online bandwidth.
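To make the map/reduce idea concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) counting words. It does not use Hadoop; the function names and the sample documents are made up for illustration, and a real platform would run the map and reduce phases on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["green data centers", "green IT and data"]  # hypothetical input
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))
print(counts["green"], counts["data"], counts["it"])
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many nodes without changing the result.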

Wednesday, November 24, 2010

Hurdles Towards Virtualization

Hurdles Towards Virtualization: What we face on its way to implementation

This post is dedicated to the current challenges that we face in the creation and use of virtualization. I am writing this at a time when virtualization is something that is sorely missing in everyday computing for me. For instance, during the past few weeks, both my hard disks developed bad sectors and I had a miserable experience upgrading Ubuntu on my friend's disk. We need a mechanism to seamlessly move operating systems from one machine to another. For this, we need virtualization.
Given the current hype and hoopla over this term, many people believe that this is a recent phenomenon, which is not the case. Virtualization has its roots at least 40 years back, in the concept of time-sharing systems. Similarly, IBM's System/360 mainframes (through CP/CMS) heralded this new era of 'virtual machines' (VMs). In order to manage these VMs, we need a special program known as a hypervisor. Now the challenge with a hypervisor is that it needs to provide a secure, equivalent, efficient and controlled resource environment for its guests (which are the virtual machines).

So what are the challenges?
The first challenge that comes to my mind is the performance of the VM. If the guest is not getting enough processing time, then there is no benefit in such an arrangement. This can often happen when the guest OS performs a privileged operation, such as handling a hardware interrupt or making a system call that results in the acquisition of hardware locks. As the guest has to run safely, such an operation has to be carefully emulated by the hypervisor.
There is also the problem of time loss, or clock skew. As there may be more than one OS per CPU running on a hypervisor at a given instant, their instructions are interleaved on the same CPU. From a VM's perspective, however, only its own instructions exist, so they take longer to execute (as other VMs also hog CPU cycles) than the VM expects. Naturally, this delay accumulates, leading to problems such as incorrect time-slicing of multiple processes at the guest OS/VM level.
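The skew can be illustrated with a deliberately simplified model. The sketch below assumes a strict round-robin hypervisor that runs one instruction per host tick; real schedulers are far more elaborate, and `finish_tick` is a hypothetical helper written only for this example.

```python
def finish_tick(workloads, vm_index):
    """Simulate round-robin scheduling, one instruction per host tick.

    workloads[i] is the number of instructions VM i wants to run.
    Returns the host tick at which the chosen VM's last instruction runs.
    """
    remaining = list(workloads)
    tick = 0
    while remaining[vm_index] > 0:
        for i in range(len(remaining)):
            if remaining[i] > 0:
                tick += 1          # one host tick is spent on VM i
                remaining[i] -= 1
                if i == vm_index and remaining[i] == 0:
                    return tick
    return tick


expected = 100                            # ticks the guest believes it needs
actual = finish_tick([100, 100, 100], 0)  # three equally busy guests
print("expected", expected, "actual", actual, "skew", actual - expected)
```

With three equally busy guests, the first guest's 100 instructions finish at host tick 298, so a guest that counts only its own instructions drifts almost 200 ticks behind the host.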
The sharing of processing time also has a profound effect on the performance of the overall system. For instance, a low priority process on a high priority guest would get precedence over a higher priority process on a low priority guest. This problem has not been solved to date, and only fringe work has been done in this regard. Perhaps it has persisted because a lot of OSes are still designed for a single operator at a time, without any restrictions.
Memory management and addressing are also among the prevailing problems. Memory cache management is a problem that grows rapidly as the number of intermediaries between the processes and the hardware increases. This can be understood using a simple example. Suppose the host uses an LRU (least recently used) method of page caching while its guest uses an MRU (most recently used) method. A page that the guest has already discarded may still be held by the host, so now we have wasteful data in host memory. The other extreme of this situation is when the data is frequently required in this setup; now the host would start to thrash. We need to control this thrashing and ensure that page faults remain constant and distributed over a period of time, through 'handshaking' communication between the guest and the hypervisor about the occurrence of faults.
The final problem that plagues virtualization systems is the addressing of memory, which arises because the hypervisor and the guest follow different memory addressing schemes. The memory addresses of processes are divided into segments, pages and further into offsets. So every process has 2 addresses and 3 memory references, leading to extra overhead. This is avoided through a translation lookaside buffer (TLB), which generally stores the recently used segment and page addresses. But for virtualized environments this isn't enough: a normal TLB translation transforms a virtual address into a real one, but with 3 address levels (the actual hardware address, the 'emulated' hardware address for the guest and the virtual address inside the guest) there is the overhead of an extra translation. Thankfully, this is being addressed by hardware vendors through hardware support such as Intel's EPT (part of VT-x) or AMD's NPT on one end, and through the utilisation of these enhancements by hypervisors such as VirtualBox and VMware on the other.
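The extra translation level can be sketched as follows. This is a toy model under stated assumptions: flat single-level page tables held in dictionaries, a 4 KB page size, and made-up page numbers. `NestedTranslator` is a hypothetical name and does not describe how EPT or NPT are implemented in hardware.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch

guest_page_table = {0: 7, 1: 3}    # guest virtual page -> guest "physical" page
host_page_table = {7: 42, 3: 19}   # guest physical page -> host physical page


class NestedTranslator:
    """Two-step address translation with a TLB caching the combined result."""

    def __init__(self, guest_table, host_table):
        self.guest_table = guest_table
        self.host_table = host_table
        self.tlb = {}         # guest virtual page -> host physical page
        self.table_walks = 0  # number of page-table lookups performed

    def translate(self, guest_virtual_address):
        page, offset = divmod(guest_virtual_address, PAGE_SIZE)
        if page not in self.tlb:
            # TLB miss: both tables must be consulted (the extra overhead).
            guest_physical = self.guest_table[page]
            self.table_walks += 1
            host_physical = self.host_table[guest_physical]
            self.table_walks += 1
            self.tlb[page] = host_physical
        return self.tlb[page] * PAGE_SIZE + offset


mmu = NestedTranslator(guest_page_table, host_page_table)
first = mmu.translate(PAGE_SIZE + 10)   # miss: two table lookups
second = mmu.translate(PAGE_SIZE + 20)  # hit: served from the TLB
print(first, second, mmu.table_walks)
```

The first access to a page costs two lookups instead of one, while later accesses to the same page are answered from the cached guest-virtual to host-physical mapping.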
Hopefully in the near future, we can expect to see virtualized environments become more prevalent, not only in heavy-duty systems but on hardware ranging from the least expensive personal computers to mission-critical systems, given the increasing maturity of virtualization.

Thursday, October 28, 2010

Are we a dying breed ?

Is it really beneficial to build and maintain an online persona when it is fraught with problems at every corner? This thought was echoed in a thread on javaranch.com, which happens to be a very popular social forum for Java technology enthusiasts.
One of my acquaintances had also, quite recently, expressed reservations about the same due to the lack of appreciation. Whether his comment is just his perception or the truth remains to be seen. Professional blogging too remains an elusive concept for me, and as far as I have observed, it remains largely an exercise in isolation or through a 'tool'.

In Google Trends for blogging, the search volume index appears to have dropped in the last quarter of 2010 to its lowest point since the usage curve became consistent in 2007. References to blogs in the news also became erratic and dropped during this period.

This leads to an interesting question: is Twitter really ushering in an era of microblogging because people don't want to go that extra mile to share their thoughts? Adding social networking to the above comparison raises no eyebrows, as it is quite clear that social networking has taken the lead among online citizens.

In the aftermath of this change, you can expect greater emphasis on the management of information over these platforms, both for commercial and research purposes, as well as a general change in the dissemination of information. This is because instead of the traditional author-reader/follower roles, we're fundamentally shifting towards a peer-to-peer approach with spontaneous interactions. So instead of following your favorite blog and waiting for the posts to come in, real-time and probably geographically aware interactions rule the roost. Social networking today has spread much beyond socializing with friends, family and that 'hot chick' profile (pun intended) to become an information sharing portal. It is customizable as a mini Internet where every user can be a generator of content.

So if you are a techie or business savvy, you can utilize this change for your benefit. What do you feel about this? Are there any new developments in this arena that you are aware of? Please do comment about it.

Tuesday, October 12, 2010

From theory to practice- Transitioning isn't as easy as it seems

I've been involved recently with the task of creating different projects. In order to do this, I followed the simplest and most important rule in the book – just put the most dedicated individuals on your team. However, with the induction of each new member into the team, the allocation of responsibilities became more complex, and communication too started becoming a thorn in the flesh for the project vision that I had in mind. Through agile approaches, though, I am currently able to lead and integrate my team effectively.
I didn't have enough time to bask in the glory of my newly found team equilibrium before my junior students asked me to mentor their project too (we're having a common project by IBM, namely TGMC 2010). Okay, I said, and went into their class for a briefing. Upon entering the class, I was surrounded by a group of around 70 students, all willing to enter (and most of them already have) a contest requiring people to submit Ajax-driven websites that leverage SOA as their middleware, but very few having experience in even basic HTML. To get people up and running, I created a blog and wrote a basic how-to (http://bbdnitm-tgmc.blogspot.com/2010/10/getting-up-to-speed-towards-building.html). However, after 4 days, I am yet to get a comment from these people and have resigned myself to the fact that they need to work really hard if they ever want to see anything at the end of the tunnel. A few of them might deliver a workable solution after 6 months, but for the rest of them, the pain would be inevitable. From my side, I am willing to mentor any person who is a quick learner of JSP and middleware stuff, and although I don't know much about the usage of IBM's products, I am willing to learn and help others ease into deployment on IBM-specific technologies like DB2 and WebSphere/RAD. This incident has led me to think about project management in a new light, and what I now feel personally is that it is as important to have an able team as it is to find a mentor. Apart from this, it is an individual's approach that matters, rather than his/her skills.

Monday, September 13, 2010

A case for Security on cloud

A case for Security on cloud – What vendors tell developers and what they actually do

There has been considerable commentary on the issue of security over clouds. My observation so far has led me to conclude that on the Internet there are two factions – one that favors CSPs (Cloud Service Providers) and places its trust in them, and the other that is quite skeptical and wary of the CSPs over privacy and security issues. This issue has permeated individuals and enterprises alike. In keeping with the main issue of 'trust', CSPs have been quick to point out (for their own good more than anything else) the black-boxed nature of their offerings.
This discussion has become more important due to the ever-increasing share of cloud in the IT infrastructure of almost all organizations, which is not surprising considering the ever-growing need to economize and increase flexibility in firms across every domain.
As is apparent from the above chart, cloud computing has moved from the network and storage level (SAN, NAS) to complete machine-level virtualization (VMware, Xen) to platform as a service, and is moving towards completely managed applications (of which Salesforce serves as an excellent example). This implies that dependence on CSPs is increasing at an almost exponential pace. Apart from the risks in reliability, interoperability and vendor lock-in, there have been jitters in the IT industry, and rightly so, because of the black-box approach to solutions that CSPs sell while explaining the benefits of abstracting security, as I'll mention below.
However, things aren't as rosy as they appear and life is not as simple as we imagine it to be (sigh!): security for an entire solution is different from that of an application. It is not an aspect or a component that we can hand over to a container or bolt on to the application. In this case, the approach towards security differs, as it has to be applied at all the levels of the application stack.
Infrastructure Security
This is important in the context of public clouds, as the terms infrastructure security and infrastructure-as-a-service security tend to be confused. There is a slight difference between the two, although both have similar ramifications for the customer's threat, risk and compliance management. Some of the leading CSPs, like Amazon and Google, have had incidents where their data confidentiality and integrity were breached. Proper access control over data is as important as keeping the information secure. Here again, the emphasis is not only on information security but also on the security of the infrastructure being offered (there have been cases where discarded AMIs have been used maliciously).
In order to mitigate the increased risk factors, we need to rethink whether we need a public, a private or a hybrid cloud. Similarly, for data in transit we need to implement rigorous encryption algorithms, which again involve a trade-off between security and usability.
The security approach follows an onion pattern where host security forms the basis followed by virtualization software and virtual server security, which then propagates into network and finally application security.

Data Security (Information Storage and Management)
This is an important part of information storage and management because in a cloud setup, the way data moves about tends to change. Since the data gets stored remotely, its security takes on a new dimension. Data security now requires attention to the following aspects:
Data-in-transit
Data-at-rest
Processing of data, including multitenancy
Data lineage
Data provenance
Data remanence


Authenticity and Accessibility on cloud
The trio of Authentication, Authorization and Auditing (AAA) services comprises Identity and Access Management (IAM). Unfortunately, it remains to be seen whether cloud providers are actually committed to providing them in a readily usable manner, not only because it is inconvenient to provide them properly but also because the roles and rules of the stakeholders involved change frequently. However, a lot of work is going on here, largely due to the federation of IAM entities and the emergence of web service standards such as SAML and WS-Federation. What is needed here is something like IAM-as-a-service. This would not only address weak ISM models, but also make up for the lack of federated structures.

Is Security as a Service (Gosh, another SaaS acronym) sufficient?
The Security-as-a-service model, like the original SaaS, is also subscription based. This, however, is nothing new, as many vendors have traditionally been providing email filtering and anti-virus scanning.
This service today needs greater scrutiny and importance than ever before, largely due to the rise of crime over the cloud, and it needs them before the cloud platform assumes even greater dimensions. These are just some of the difficulties that I've mentioned. Similarly, vulnerability management and IAM have been around for a while, but have not truly been incorporated as a service up till now.
Hopefully in the near future, the shift of enterprises towards cloud computing and the maturity of the service-oriented model will promote this service paradigm, and ever more security-conscious customers will fuel the growth of security over the cloud.

Thursday, September 2, 2010

Commenting about Comments

//TODO : Finish this rant
#There have been a lot of discussions about the use of comments in program code. This was something that I had overlooked so far and only used when it appealed to me, which is itself a rare occasion.

#The book '97 Things Every Programmer Should Know' offers a lot of insight on this issue. In it, Cal Evans recollects a lesson that he learned from his teacher (getting poor grades for not commenting). The same thing, in fact, happened to me too.

#Recently, a lecturer teaching Software Engineering (who is quite vociferous in his teachings) proclaimed in our class, “Comments are just a waste of time, and people (coders, perhaps?) should be kicked in their ass for writing comments. Comments indicate that the programmer was wasting time writing them instead of writing LOCs”.
However, I learned a lesson similar to Cal's in my class exams held in the previous week. In the exam, there was a question that required us to print Pascal's triangle using C#. I wrote a program that I conjured up on the spot, hoping that it would yield returns in terms of marks. But to my utter surprise, the program in question was awarded a big 0.
While explaining the paper, this teacher (who teaches the .NET framework and C#) declared the opposite of the commenting practices listed above.
In my opinion, you really need to insert a comment only when the code requires some external prodding to be understood.

So, I guess that it is probably useful to create and maintain comments in a codebase even if it is self-describing. In my view, for the long term a comment should only be written specifically for technical documentation. Javadoc or MSDN-style comments not only serve the purpose of ordinary comments, but also help future developers maintain and extend the software, as they provide insight in various ways (separate documentation and IDE support).

In The Elements of Programming Style (Computing McGraw-Hill), Kernighan and Plauger note that “a comment is of zero (or negative) value if it is wrong.” This, along with my thoughts on the usability and proportionality of comments, is echoed in this javaranch discussion thread. Hopefully, such an embarrassing situation will not arise for me again just because of a lack of comments in an otherwise correctly functioning piece of code.
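As an illustration of the proportion argued for above, here is a sketch of the exam exercise with a documentation comment plus a single inline comment where the code needs prodding. It is written in Python rather than the C# of the exam, it is not the program I submitted, and the name `pascal_triangle` is chosen only for this example.

```python
def pascal_triangle(rows):
    """Return the first `rows` rows of Pascal's triangle as lists of ints.

    Documentation comments like this one play the role of Javadoc or
    MSDN-style comments: they describe the contract, not the mechanics.
    """
    triangle = []
    for n in range(rows):
        row = [1] * (n + 1)
        # Each interior entry is the sum of the two entries above it;
        # the edges stay 1, so only indices 1..n-1 need computing.
        for k in range(1, n):
            row[k] = triangle[n - 1][k - 1] + triangle[n - 1][k]
        triangle.append(row)
    return triangle


for row in pascal_triangle(5):
    print(" ".join(str(value) for value in row).center(20))
```

The docstring is what tools and IDEs surface to future maintainers, while the inline comment explains the one step that is not obvious from the code itself.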

Saturday, August 28, 2010

Why performance matters #1


One of the many issues that I've faced as an enterprise Java developer has been performance. It is important to understand the issues surrounding performance and address them before the problems escalate into unresolvable ones. It is probably the enterprise Java community's fixation on architectural design patterns and conventions that leads us to overlook various performance-related issues.
The community has today moved a long way from MVC frameworks and monolithic J2EE 1.4 servers to increasing adoption of various open source frameworks and portals. Enterprise applications too have evolved from websites with component-based solutions to portals and coarse-grained SOA applications built on top of web services. However, one thing remains constant: the layered execution model of any application. This can be demonstrated as follows:




Our Java EE Application
Application Framework
Application Server
Java Runtime Environment
Operating System
Hardware


It is important to note that the performance of the solution doesn't depend on the application alone; rather, it depends on correct performance tuning and optimization of all the layers associated with the application. This form of issue is known as vertical complexity.
In actual applications, however, there are a lot of discrete components, each with its own 'stack' of complexity. Together, these form another kind of complexity known as horizontal complexity. These issues become apparent only when the application faces heavy load in its lifecycle or when it begins to operate beyond a single JVM. This can result in the following problems:
  • Slow execution of the application- beyond the agreed-upon threshold in SLAs
  • Application performance degradation over a period of time- memory leaks, resource allocation bugs
  • Erratic CPU utilization and application freezing
  • Performance anomalies which occur in production- hard to reproduce, which escape load testing
So, it is not hard to imagine why organizations and teams look to solve these problems. The solution to these problems also has a fancy name, “Application Performance Management” (APM), which is a set of recipes and approaches meant to address the issues listed above. In development, this is applied to memory analysis, code profiling and test coverage. Memory analysis can be done using any standard debugger, and most IDEs usually have a nice interface to do so. Profiling, however, involves some serious investment in the profiler that you are planning to use. Fortunately, some IDEs like NetBeans have built-in profilers, but external profilers like YourKit are more relied upon. A code coverage tool does the same thing for performance testing as it normally does for unit testing- identifying whether a given piece of code is covered under performance testing or not.

Returning to our 'stack' for a moment, we can easily visualize the tasks needed for performance optimization at different levels.
While performance at the application level can be handled via APM, the underlying application framework also plays an important role. For instance, if the framework is in beta and the application is using some new feature of it, chances are that it might have a bug that can wreak havoc upon the application and create confusing problems. The application server also needs to be properly tuned and customized for clustering and performance scaling if needed. In other cases too, there is a need for system administration of these servers, as they not only act as the host for the application but also provide monitoring (which can further be used for performance measurement) and manage external resources needed by the application. The JVM is an often overlooked slice of the stack which, if tuned correctly, can deliver well-timed garbage collection cycles and sound memory management for the application as well as its dependencies. It never hurts to have knowledge of the JVM memory areas like the stack, heap and perm spaces, which are the places where the application and its server reside and operate. The operating system and hardware are the obvious choices for performance optimization because they provide the underlying resource allocation, scalability, computation and other aspects that the software needs. From a developer's point of view, the operating system and hardware are generally not modifiable, but the overlying layers are. So it is better to have an understanding of these concepts. In a future post, I'll start by analyzing specific performance issues relating to a tier and post the recipes and solutions that I have come up with so far.

Monday, August 2, 2010

IBM Offers free System Z Mastery exam to students

The IBM System z is a part of IBM's zEnterprise system, which combines mainframe, POWER and System x in a single system and, as claimed by IBM, can save more than 50% in costs.
This includes z/OS, which is a widely used mainframe operating system. Today the mainframe plays a central role in the daily operations of most of the world's largest corporations. Even the advent of private and hybrid clouds has yet to impact the mainframe market significantly, and as mainframes continue to evolve, the demand for professionals working on them rises.

Enough of the briefest of the introductions about this technology.
I wanted to share the information that IBM is offering the System z Mastery exam to students for free (they are giving vouchers) until the end of this year (before 31st December 2010).
You can visit http://www.ibm.com/developerworks/university/systemz/masterytest/st... for more details.
The course materials are also included at the website.
These exams are important as they can serve as your career decider; mainframe administration is a field in itself.

Unlike configuring server systems targeted at SMBs, this involves programming, monitoring, fault tolerance and scalability challenges.
Hope you have fun!

Thursday, July 15, 2010

An Open Source Summer Event

We, the members of the Open Source University Meetup, recently conducted an on-campus event this summer on the 5th of July. As always, it was a successful event with a large turnout and participation of students from different years and branches.
This was an altogether different event compared to the previous few. In this event, we held discussions on how to run our group in future and also decided what we, as an open source lobby in our college, ought to be doing next.
As the market for open source software continues to mature, the number of tools, frameworks and libraries keeps increasing, which in short propels this phenomenon even further.

We are going to have new students coming into our college very soon in the upcoming academic year, and this presents an excellent opportunity for us to reiterate and rekindle our spirit of openness and sharing. Special thanks this time go to Gaurav, Sushant, Raghuvendra, Lalit and others for assembling everyone and making this event a success.

Mainly, we held brainstorming sessions on the following areas :


  1. Cool Technologies that we can discuss in depth in future.
  2. Promoting various Oracle based software.
  3. Spreading the need of sharing and contributing in the IT industry.
  4. Trying out open source technologies and helping spread its awareness and adoption.
  5. Conducting events of varied themes and involving various activities- everyone in community can chip in. 


As always, by conducting these events, we continue to equip ourselves better in event management, improve the contents of our presentations and do more than ever for the community.

See the event pics at my OSUM profile albums
http://osum.sun.com/photo/albums/on-campus-event-5th-july

Sunday, June 27, 2010

Enterprise Ruby : First Impressions

I cannot stop marvelling at the 'gem' of a useful piece of software that I installed today (www.rubyenterpriseedition.com). As its name was Enterprise Ruby, my earlier impression was that it was some proprietary software (I had heard of websites like Twitter using it), but to my pleasant surprise it turned out to be open source software distributed by Phusion (the same guys behind the Passenger deployment tool). It is provided for all *NIX systems, with a ready-to-run .deb installer for Ubuntu.
Well, to the point itself, Enterprise Ruby features:
# An enhanced garbage collector. This allows one to reduce memory usage of Ruby on Rails applications (or any Ruby application that takes advantage of the feature) by 33% on average.
# An improved memory allocator. This increases Ruby's performance drastically.
# Various developer tools for debugging memory usage and garbage collector behaviour

A lot of this optimization is POSIX dependent, which is why there is no such version for Windows yet. The FAQ page (http://www.rubyenterpriseedition.com/faq.html) explains this and a lot more about the improved platform, which is why a lot of production servers are adopting it.

However, the nicest thing in the  whole installation was that you got a lot of commonly used gems installed, which is really appreciative because we otherwise have to manually install or repeatedly try to install via the gem repository (a thing that sadly fails most of the time for me).

Hopefully you find this information helpful. If you feel anything contradictory to that posted above, please do comment as I might be wrong in some areas that I haven't checked out yet.

Thursday, June 17, 2010

Towards RESTful Web Services

When we talk about distributed applications, the internet is the enabling technology that comes to mind. Sure, one can think of many other ways and protocols for getting software to run on more than one computer at the same time, but the internet, or the World Wide Web, remains the biggest network out there, and its ubiquity makes it appealing for implementing any such distributed application.
You can even think of a website as a distributed application, but here the client (that is, you) is a passive person or computer that can only consume what is fed to it. Even dynamic websites ultimately generate static content (as does Web 2.0, for that matter, which is built on user experience rather than anything else), so although a website may look like a distributed application, no computation is actually being distributed. In a truly distributed application, however, the portions of the application reside on different computers and communicate with each other over the internet. This interchange of data and processing requests is what such applications need.
Before I befuddle you further with the idiosyncrasies of complex, widely used distributed application technologies such as the WS-* stack, we need to understand a simple fact: on the web, the content of a page is generated as HTML, which the browser on your computer reads and displays as a web page. If we change this format to a different one (still textual, not binary) such as XML or JSON, the end users of our application change: they are now programs rather than people.
So ultimately, there is not much difference between a web site and a web service: what makes a web site easy for a human being to use can be applied, over the same internet and in the same manner, to a web service used by a program. The common web service protocol has been SOAP, which is essentially an application layer on top of whatever transport protocol it is operating on. This created a lighter way of building distributed and interoperable applications, but at an increasing cost of complexity. Instead of reinventing the wheel, wouldn't it be great if we simply took the best of the internet and obtained something refreshingly easy, elegant and useful for our purposes?
This is what REST, or REpresentational State Transfer, is all about. It is not just an alternative method of creating web services, but arguably the easier one, as it handles addressing and discovery using the established design patterns of the internet. The simplicity of this approach is not its weakness, as you might be thinking, but its strength. Other web service technologies can claim maturity and tools that hide their complexity, but REST is steadily overcoming its own limitations in this regard. In my future posts, I'll explain how the changing face of REST is going to make it a force to reckon with.
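To make the idea of "same resource, different representation" concrete, here is a minimal sketch in plain Java. It is only an illustration under my own assumptions: the class name, the sample resource and the way the Accept header is inspected are all made up for this post, and escaping of special characters is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one resource, two textual representations.
class RepresentationSketch {

    // Choose the representation from the Accept header sent by the client.
    static String render(Map<String, String> resource, String acceptHeader) {
        if (acceptHeader != null && acceptHeader.contains("application/json")) {
            return toJson(resource); // meant for programs
        }
        return toHtml(resource);     // meant for people using a browser
    }

    // Naive JSON rendering (no escaping; illustration only).
    static String toJson(Map<String, String> resource) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : resource.entrySet()) {
            if (sb.length() > 1) {
                sb.append(",");
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append("}").toString();
    }

    // Naive HTML rendering of the same data (no escaping; illustration only).
    static String toHtml(Map<String, String> resource) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (Map.Entry<String, String> e : resource.entrySet()) {
            sb.append("<li>").append(e.getKey()).append(": ")
              .append(e.getValue()).append("</li>");
        }
        return sb.append("</ul>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> post = new LinkedHashMap<>();
        post.put("title", "Towards RESTful Web Services");
        post.put("year", "2010");
        System.out.println(render(post, "text/html"));
        System.out.println(render(post, "application/json"));
    }
}
```

The data and the address stay the same; only the format handed back changes, depending on who (or what) is asking.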

Thursday, May 13, 2010

Scalable Application Architecture with Memory Caches

This post is dedicated to an important feature required for creating highly performant and scalable applications. Data has traditionally been stored in specialized software, aptly known as a database. However, some applications let a large number of users (thousands, if not more) interact with them simultaneously and make massive use of specific data that is generally static in nature. Keeping that data in memory instead of in a file on a hard disk results in a huge performance gain, as requests for it no longer have to be read from a database located on disk. Think of an in-memory data cache as a fast store for such transient tasks, where the data is held in the RAM of the cache machines rather than persisted to disk.

The caching mechanism (the framework responsible for maintaining the cache) has to do the following main tasks:
  • Maintain a cache (the obvious one)
  • Determine the pattern of requests (which requests are most frequent)
  • Flush out resources and load new ones (different strategies can be used here, the most common one being LRU, i.e. least recently used)
  • Make sure that the data maintained in the cache is correct (and if not, to what extent it is stale)
  • Enforce consistency of the cache across different machines (generally needed on cloud or clustered machines)
  • Manage resource utilization (evict data from the cache if the server needs the memory)
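The LRU strategy mentioned in the list above can be sketched in a few lines of plain Java. This is only a toy, single-machine illustration (the class name and capacity are my own assumptions, and real cache frameworks do far more), built on the access-order mode of java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU eviction; not thread-safe.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order goes from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after every put; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // "a" becomes the most recently used entry
        cache.put("c", "3");  // over capacity, so "b" (least recently used) is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```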

Caching is generally found in distributed applications that are targeted at a large number of users, whether or not they are already in production. Today, most memory cache frameworks do not offer a mechanism for synchronizing the data stored in the database with the memory cache. To overcome this problem, we need to explicitly set an expiration value on the cached object so that it gets refreshed on a request after a certain period. This performance optimization can be applied not only to database operations, but also to other data such as repeated web service calls, computation results, static content, etc.

A popular interface standard for Java is JCache, which was proposed as JSR 107. Memory cache software, e.g. memcached [http://www.memcached.org], stores data as key-value pairs. When a request arrives, the keys are searched, which results in either a cache hit or a cache miss. JSR 107 has been adopted in different implementations, one of which is Google App Engine, a cloud platform supporting Python and Java runtimes. Here, it comes in the form of a memcache service for the Java runtime. This can be better explained with the following example:
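The expiration idea can be sketched as follows. This is a rough illustration under my own assumptions, not how any particular framework does it: the class name is made up, the loader function merely stands in for a database query or web service call, and the current time is passed in explicitly so that the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a cache whose entries expire after a fixed period.
class ExpiringCacheSketch<K, V> {

    // A cached value together with the time at which it stops being valid.
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader; // stands in for the slow source (e.g. a database)

    ExpiringCacheSketch(long ttlMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    // Returns the cached value, reloading it on a miss or once it has expired.
    V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null || nowMillis >= entry.expiresAtMillis) {
            V fresh = loader.apply(key);
            entries.put(key, new Entry<>(fresh, nowMillis + ttlMillis));
            return fresh;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        ExpiringCacheSketch<String, String> cache =
                new ExpiringCacheSketch<>(1000, key -> "loaded #" + (++loads[0]));
        System.out.println(cache.get("user:1", 0));    // miss: goes to the loader
        System.out.println(cache.get("user:1", 500));  // hit: served from memory
        System.out.println(cache.get("user:1", 1500)); // expired: loaded again
    }
}
```

The trade-off is that a reader may see data that is up to one expiration period old, which is usually acceptable for the mostly static data discussed here.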


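import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
......

// Obtain a handle to the memcache service
MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

.....

// Store a value under a key
cache.put("key", "value");

....

// Look the key up; get() returns null on a cache miss
Object value = cache.get("key");

.....
// Remove the entry from the cache
cache.delete("key");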


This really makes application scalability easier for developers (cloud environments do impose the responsibility of creating applications that can scale quickly). As of now, a similar feature doesn't exist in Windows Azure, and what the future holds for this technology there cannot be predicted.
Thus it is not surprising that this technology is used by prime websites like YouTube, Wikipedia, Amazon, SourceForge, Metacafe, Facebook, Twitter, etc., and in large quantities at that (e.g. Facebook uses over 25 TB of memcache). So, it is imperative for software developers to understand the working and development of this technology.

Saturday, April 17, 2010

My professional life as of now

Like the other years, this year too has been a hectic one for me so far and, building upon my existing skills, I've tried to continuously improve myself as a human being and as a professional. My policy so far has been 'Kaizen', the Japanese word that literally means continuous improvement.
So far, the greatest improvement that I've made in the first quarter of the current year has not been in programming languages or frameworks or, for that matter, any software development practice, but in allied fields. Currently, I am working on solving business problems practically via alternative means like ERP, CRM and CMS software.
As I write this blog post, I am also reading an excellent e-book titled 'Cases on Strategic Information System' that has been given to my entire Computer Applications class by our Management Information Systems teacher. It contains 24 insightful case studies on the subject and, although I have gone through only a handful of them, they are problems really worth pondering.
Apart from this management hogwash, the rest is business as usual. At the core of my heart, I still have misgivings about management as an ethical study and am satisfied that I resisted the temptation to take a managerial course after doing my bachelor's.
At my college, the brand value of Sumit Bisht (yes, that's me) continues to rise. During the past month, I have successfully conducted a technical hands-on lab and given various presentations in front of the entire department. Apart from this, I have frustratingly and agonizingly tried to learn programming for business intelligence and reporting and am making some headway there too. Hopefully, this strategic initiative yields me rich returns in future.
That said, the software developer in me has not taken a sabbatical and, thanks to the Google Summer Of Code 2010, I am actively involved in open source software development; even if my proposal is rejected, I will carry on with these exciting projects. This is my first time in this event and, quite frankly, I am impressed with the amount of attention this activity has generated. I looked into a lot of projects and finally selected a couple which, in my opinion, were doable with ease. Thus, this summer is going to be as promising and engaging as the last one.
Ciao for now

Thursday, March 18, 2010

Programming Wisdom

Recently, I had the opportunity to read the book '97 Things Every Programmer Should Know', published by O'Reilly. Normally I do not read computer books from cover to cover, but this was different. The book has 97 two-page insights by some very experienced programmers and people who really know their stuff. As I read each pearl of wisdom, it was an engaging experience, as every page seemed to give me a new insight.
However, as there are some common best practices in our trade, there was some redundancy in the text as well. Nevertheless, it is a journey that newbies like us (without any actual experience) really appreciate. So, like an engaging novel, this book really is a treasure trove for the best practitioners and average programmers alike.
This was similar to my previous experiences of reading expert advice. Last year, I had the opportunity to read Bruce Tate's 'Beyond Java'. However, this text differs from Tate's as it is more 'politically correct' and doesn't focus on the advice of a single person. What is also wonderful about this book is that it is platform and language independent, and the advice it offers regarding programming is certainly among the most talked about in the industry.
In conclusion, I'd like to pay tribute to the authors, as they were simply wonderful in their explanations (rather than engaging in obscure jargon). Hats off to them.
Here's the listing of the wisdom (drum-roll) given in this book:
1.       Act with Prudence
2.       Apply Functional Programming Principles
3.       Ask, “What Would the User Do?” (You Are Not the User)
4.       Automate Your Coding Standard
5.       Beauty Is in Simplicity
6.       Before You Refactor
7.       Beware the Share
8.       The Boy Scout Rule
9.       Check Your Code First Before Looking to Blame Others
10.   Choose Your Tools with Care
11.   Code in the Language of the Domain
12.   Code Is Design
13.   Code Layout Matters
14.   Code Reviews
15.   Coding with Reason
16.   A Comment on Comments
17.   Comment Only What the Code Cannot Say
18.   Continuous Learning
19.   Convenience Is Not an -ility
20.   Deploy Early and Often
21.   Distinguish Business Exceptions from Technical
22.   Do Lots of Deliberate Practice
23.   Domain-Specific Languages
24.   Don’t Be Afraid to Break Things
25.   Don’t Be Cute with Your Test Data
26.   Don’t Ignore That Error!
27.   Don’t Just Learn the Language, Understand Its Culture
28.   Don’t Nail Your Program into the Upright Position
29.   Don’t Rely on “Magic Happens Here”
30.   Don’t Repeat Yourself
31.   Don’t Touch That Code!
32.   Encapsulate Behavior, Not Just State
33.   Floating-Point Numbers Aren’t Real
34.   Fulfill Your Ambitions with Open Source
35.   The Golden Rule of API Design
36.   The Guru Myth
37.   Hard Work Does Not Pay Off
38.   How to Use a Bug Tracker
39.   Improve Code by Removing It
40.   Install Me
41.   Interprocess Communication Affects Application Response Time
42.   Keep the Build Clean
43.   Know How to Use Command-Line Tools
44.   Know Well More Than Two Programming Languages
45.   Know Your IDE
46.   Know Your Limits
47.   Know Your Next Commit
48.   Large, Interconnected Data Belongs to a Database
49.   Learn Foreign Languages
50.   Learn to Estimate
51.   Learn to Say, “Hello, World”
52.   Let Your Project Speak for Itself
53.   The Linker Is Not a Magical Program
54.   The Longevity of Interim Solutions
55.   Make Interfaces Easy to Use Correctly and Hard to Use Incorrectly
56.   Make the Invisible More Visible
57.   Message Passing Leads to Better Scalability in Parallel Systems
58.   A Message to the Future
59.   Missing Opportunities for Polymorphism
60.   News of the Weird: Testers Are Your Friends
61.   One Binary
62.   Only the Code Tells the Truth
63.   Own (and Refactor) the Build
64.   Pair Program and Feel the Flow
65.   Prefer Domain-Specific Types to Primitive Types
66.   Prevent Errors
67.   The Professional Programmer
68.   Put Everything Under Version Control
69.   Put the Mouse Down and Step Away from the Keyboard
70.   Read Code
71.   Read the Humanities
72.   Reinvent the Wheel Often
73.   Resist the Temptation of the Singleton Pattern
74.   The Road to Performance Is Littered with Dirty Code Bombs
75.   Simplicity Comes from Reduction
76.   The Single Responsibility Principle
77.   Start from Yes
78.   Step Back and Automate, Automate, Automate
79.   Take Advantage of Code Analysis Tools
80.   Test for Required Behavior, Not Incidental Behavior
81.   Test Precisely and Concretely
82.   Test While You Sleep (and over Weekends)
83.   Testing Is the Engineering Rigor of Software Development
84.   Thinking in States
85.   Two Heads Are Often Better Than One
86.   Two Wrongs Can Make a Right (and Are Difficult to Fix)
87.   Ubuntu Coding for Your Friends
88.   The Unix Tools Are Your Friends
89.   Use the Right Algorithm and Data Structure
90.   Verbose Logging Will Disturb Your Sleep
91.   WET Dilutes Performance Bottlenecks
92.   When Programmers and Testers Collaborate
93.   Write Code As If You Had to Support It for the Rest of Your Life
94.   Write Small Functions Using Examples
95.   Write Tests for People
96.   You Gotta Care About the Code
97.   Your Customers Do Not Mean What They Say

Tuesday, March 2, 2010

It’s not that great Idea, Sirji!

This post is about the false nature of the so-called green IT revolution created by the media. To a large extent, the adoption of technology has helped us reduce the pollution of the earth, but are e-devices really giving us the benefit that is promised over the regular ways? This interview with Don Carli, Executive Vice President of SustainCommWorld LLC and Senior Research Fellow with the Institute for Sustainable Communication, addresses some of the false claims made by e-reader companies.
Although the penetration of such devices is not much in our country, it is just a matter of time before we see everyone preferring e-readers to books. Apart from e-readers, a large number of smart devices today promise a cleaner environment, but if recycling them is the only way to be truly green, then why can we not improve the recycling of paper in the first place?
In the article mentioned above, one of the disturbing things was the exponential increase in the energy requirements of data centers. Here too, various companies are quick to highlight their green data centers that reduce their dependence on non-renewable energy sources. But one wonders: if there are these green data centers, then why is the energy requirement of data centers still increasing?
Hence it is important that the media portray the true state of so-called environment friendly products, and not false claims like those of a leading telecom provider of India that encourages mobile usage to save trees in its newly launched promotion campaign.

Wednesday, February 24, 2010

Open-Source Content Management Systems that You can Use to build the website of your dreams

Terms and conditions (unfortunately) apply

Gone are the days when people had to turn to developers to create content-rich and feature-rich websites for them. The developers, in turn, formed large groups and used (almost religiously) different programming languages and platforms to build the web application of their choice. But specialized software has made this process largely automated today for general purpose websites.
These are called Content Management Systems (CMS), and are highly customizable web applications. To put it in other words, a CMS is a fully built web application that is just waiting for you to customize it.
A CMS provides a dynamic website, complete with different modules, types of users, extensibility/add-on options, exporting, tagging, commenting, etc., that can be set up through a web based GUI. Technically, a CMS website has a two tier architecture involving the website and the database. Generally, there is little or no support for middleware, web services and other subtleties. As most of these are written in PHP with MySQL as the database, knowledge in these areas definitely helps. However, for purists of other language platforms, there are CMSs written for other platforms too.
However, the use of CMSs doesn't mean the end of the customized (conventional) websites that are created by programmers today. A CMS is largely targeted towards the creation of web content for individual use and by small and medium scale organizations. All you need is a basic set of skills, like knowledge of servers, to get it up and running. Occasionally, you might run into problems and have to search web based resources and communities for a solution. This process might also involve some hacking inside the application, but broadly speaking, one person having some, if not all, knowledge of website maintenance can easily cater to a large number of such applications.
Broadly speaking, CMSs can be categorized into the following categories:

1. Portals or General Purpose
These are the most generic versions of CMSs and can be used by non-technical users to publish their own content. Various community websites (like osum.sun.com), online resumes, etc. can be created using these. They support a plug-in architecture, which is used to add specific features to them. The top CMS solutions here are Joomla (http://www.joomla.org) and Drupal (http://www.drupal.org).
2. Blogs
There are specialized versions of CMSs that are aimed at ease of creation and usage of weblogs or blogs. As millions of people use different kinds of blogs today, there is a need for creating these blogs easily. The best known CMS in this category that comes to my mind is WordPress (http://www.wordpress.org).
3. E-Commerce
Even after the dot-com bust, there is a steady demand for e-commerce, as the internet is the way of the future. By and large, e-commerce websites provide facilities for product inspection and purchase. This involves both the buyer and the seller in different aspects. So, there exists a well defined set of expectations in this area. Magento (http://www.magentocommerce.com), along with many others, offers easy e-commerce website creation.
4. Wikis
Given the popularity of Wikipedia (http://www.wikipedia.org) and the usage of wikis, both internally and externally, it is not a surprise that wikis are an integral part of Web 2.0 technologies. For ease of creation of wikis, there are a large number of specialized CMSs like MediaWiki (http://www.mediawiki.org) and DokuWiki (http://www.dokuwiki.org), along with others.
5. Forums
Searching for solutions to problems over the internet isn't limited to googling or yahooing (or whichever mutant you decide upon), but involves a community oriented approach. Most websites today offer forums and some are even dedicated to running these forums. One of the reasons Ubuntu is so compelling to use is that for almost any Google search involving Ubuntu problems, there is a link to the Ubuntu forums. Having said that, I'd like to point towards JForum (http://www.jforum.net) and MyBB (http://www.mybboard.net), whose implementations I use almost every other day.
6. E-Learning
This is also a facet of society that is going to be affected by the internet. Usage of this avenue is still at a nascent stage. However, like BPO, this is also expected to pick up and has a large potential, especially from the Indian IT industry's point of view. Some of the CMSs that came to my knowledge are Dokeos (http://www.dokeos.com) and Moodle (http://www.moodle.com).
7. Collaboration
Information Systems (IS) begin inside an organization. In order to build a successful intranet, various organizations have gone that extra mile and created or purchased an in-house solution for themselves. However, open source CMS solutions are also there for the technology and budget constrained crowd. These are also specialized versions of their counterparts and can be used to collaborate between different IS within an organization, like MIS and ERP, and between different departments like production, marketing and HR. Alfresco (http://www.alfresco.com) and Nuxeo (http://www.nuxeo.com) are some of the well-known CMSs aimed at enterprise collaboration.

Thus there are a large number of CMSs that are waiting to be used (with minimal programmer intervention) and are available free of cost. These are also open source software, so if you are interested, you can see how they are actually coded and tweak them to your needs. Being open source also has the added benefit of a community driven approach that helps people learn and troubleshoot their problems.

After reading this, I am sure that you must be itching to get started. Wishing you the best of Luck, go add some wings (or webs ?) to your dreams.

Thursday, February 4, 2010

Importance of a Build Utility-3

Wrapping up my discussion about the build utilities, I’ll be covering Maven (http://maven.apache.org) Build Tool in this blog post.
As already discussed, there is a continuous thirst amongst developers to streamline and automate their projects and applications. Going one step further from the build routine leads to application creation and maintenance, which is what Maven automates.
Maven is a departure from the existing build technologies, which follow a procedural route; instead, it provides a declarative way, using an XML file, to describe the project.
Large scale projects need a large number of libraries. This requires external files in the programming context (read: .jar files on the classpath). In Maven, we specify these requirements as dependencies. When the build runs, these dependencies are downloaded from a central repository and kept in a directory on the local computer (the local repository) for later use.
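To see where a downloaded dependency ends up, here is a small sketch in plain Java of how a dependency's coordinates map to a path inside the local repository (by default the .m2/repository folder under the user's home directory). The class and method names are hypothetical and the coordinates in main are just an example; the layout itself is the standard one, but treat the code as an illustration rather than anything Maven itself exposes.

```java
// Hypothetical sketch: where a dependency's .jar sits inside the local repository.
class LocalRepositoryPathSketch {

    // groupId dots become directories, followed by artifactId, version and the jar name.
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Example coordinates, chosen only for illustration.
        System.out.println(jarPath("org.apache.commons", "commons-lang3", "3.12.0"));
        // prints org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar
    }
}
```

Because every project on the machine resolves the same coordinates to the same path, a library is downloaded once and then shared across builds.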
Note that Maven relies heavily on Convention over Configuration, so if you try to 'bend' the defaults on your own, it can cause some headache-inducing problems (which means that Maven is not for everything and everyone).
Project Types:  Archetype
An Archetype is a project template that is used to create a module. This gives us a starting point for our application. We can create a project from such an archetype with:
mvn archetype:generate
Now we have to supply some values for our project: the project type, the group ID, the artifact ID, the version and the package name of our application.
After the successful execution of this command, the project structure gets created and we have a pom.xml file at the project root. This file is called the POM (Project Object Model), as it contains the information that Maven needs in a declarative form that is reusable across different projects.
When we build our project, it undergoes a lot of stages or phases (like compiling, unit testing, packaging, deploying, etc) in an ordered manner.
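The ordering of phases can be sketched in plain Java: asking for one phase means every phase before it runs first. The phase list below is a simplified subset of Maven's default lifecycle, and the class and method names are my own, so take it as a rough model rather than Maven's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how requesting a phase implies all the earlier ones.
class LifecycleSketch {

    // Simplified subset of the default build lifecycle, in execution order.
    static final List<String> PHASES = Arrays.asList(
            "validate", "compile", "test", "package", "verify", "install", "deploy");

    // Returns every phase that runs when the given phase is requested.
    static List<String> phasesRunFor(String target) {
        int index = PHASES.indexOf(target);
        if (index < 0) {
            throw new IllegalArgumentException("unknown phase: " + target);
        }
        return PHASES.subList(0, index + 1);
    }

    public static void main(String[] args) {
        // Roughly what happens for: mvn package
        System.out.println(phasesRunFor("package"));
        // prints [validate, compile, test, package]
    }
}
```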

A Small Example

First we will generate our project skeleton using the archetype:generate command.

Now, we will get this:

[Screenshot: output of mvn archetype:generate]

Here, we set the project type (in this case, I am choosing 18, for a blank web application).
Then, we will specify the GAV (Group, Artifact and Version) values as well as the default package. After confirmation, our project structure is created.

[Screenshot: the generated project structure]

Now we can start developing our application. As soon as we are finished, we can package it with the mvn package command.

Note that we did not create any build scripts ourselves; Maven only asked us about the type of project that we need. So our productivity increases and we can adhere to unified build standards.
If you look at the /target folder, there will be a simple-webapp.war file, which can be deployed to any Java web server. To run this on a Tomcat server, for instance, simply issue mvn tomcat:run

Apart from the build lifecycle discussed here, there are other lifecycles such as cleaning and site generation. It is worth mentioning that we can turn the project into an Eclipse project with mvn eclipse:eclipse, and similarly for other tools.
From now on, we mainly have to take care of the pom.xml file, which contains the dependencies that may arise as and when we expand our application. Everything else is left to the build utility.

Conclusion
Although it is early for me to comment upon the future of this or any other build tool, we can expect more automation in future.
Cloud based application servers could provide automatic classpath management (like IDEs), and Maven may not only take over from Ant as the de facto build tool in Java, but also inspire numerous spin-offs for other technologies.