Limitations and Challenges in Cloud Computing for Applications
April 14, 2009 at 2:00 pm | In Architecture - Design, Tips,Tricks and code, Trends-Predictions, Unsolved Problems | Leave a Comment
I was supposed to be involved in a discussion about cloud computing at Cloudcamp Bangalore, but due to other commitments, I could not attend the event. I had a small writeup about the limitations and challenges in application clouds. Here is the full text of it.
Cloud computing is a way of providing dynamically scalable and available resources, such as computation and storage, as a service to users, who can use it to deploy their applications and data. Cloud computing can handle data in both the public and the private domain. But this seemingly harmless way of thinking about building applications has its own set of issues. I am primarily referring to application cloud providers, the kind where you deploy your applications, not storage and service clouds. Google AppEngine would be a good example of the cloud that I am describing. I note some of the issues here:
From the user’s perspective:
- New, unstructured and non-standard paradigm of programming: Each cloud has its own supported programming language and syntax requirements, though most of these clouds expose the typical hashtable-based cache and datastore interfaces. There is an urgent need for standardization of these interfaces and of the methods of programming them. One of the reasons why shared hosting environments work great is that, as a programmer, I know that I can move my PHP/Perl code to another server and it will work without too much fuss. Moving from one of the dozen-odd cloud providers to another requires considerable development effort, not to mention time (for businesses, this could spell doom). A look back at history shows languages like SQL and C being standardized to stop exactly this sort of undesirable proliferation.
- Restrictions on the programming model: For cloud-based applications to be highly available, they must be easy to mirror dynamically on multiple machines. Once these applications are mirrored, they can be served on demand by load-balancing servers, which makes them highly available, and the user doesn’t face delays in being serviced. This is an old trick used by busy websites from the early days of web publishing, but those solutions were custom built for each website. Extending this concept to cloud-based platforms that service thousands of applications requires the platform providers to automate the task of replication and mirroring. This job is easier said than done. The process can be made seamless when the program stores as little state information as possible. By state, I mean transactional variables, static variables, variables in the context of the entire application etc. These things are almost a given in traditional programming environments but are very hard to come by in cloud-based environments. The unnatural way of dealing with this situation is to use the datastore or the cache to store the state of an application. There are also a lot of other restrictions, like lack of privileges to install third-party libraries and no access to the file system to write files (which forces you to use the datastore and pay for it).
- A good local debugging experience: A good local development environment and debugging experience are a must for programming on the cloud. Most cloud providers do not provide good local development environments. There is also a lack of good IDEs that can help with programming and debugging programs written for the cloud. The providers that do provide a local debug experience do not simulate real cloud-like conditions. Both from my personal experience and from conversations with other developers, I have come to realize that most people face problems when moving code from their local development servers to the actual cloud. This is only due to inconsistencies between the behavior of the local development environment and that of the cloud.
- Appropriate metrics and documentation of programming best practices: On a cloud, since a user pays for almost every CPU cycle, appropriate metrics on usage of processing time and memory must be presented to the users. Typically, a profile of the application with function names and the corresponding time taken, memory used and processing cycles used will help the developer tune his/her code to optimize usage of processing power. The best solution is for cloud providers to abstract common code patterns into optimized libraries, so that users can be assured that they are running the most efficient code for a certain operation. An example of this is Apache Pig, which gives a scripting-like interface for analyzing data stored in Apache Hadoop’s HDFS. Also, most cloud providers do not provide enough statistics or profiling capabilities.
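To make the per-function profile mentioned above concrete, here is a minimal sketch using Python's standard cProfile and pstats modules. The handler render_page is a hypothetical stand-in for application code, and nothing here reflects what any particular cloud provider exposes; it only shows the kind of report (function names with call counts and times) that would help a developer tune code.

```python
import cProfile
import io
import pstats


def render_page(n):
    """Hypothetical request handler standing in for real application code."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(render_page, 100000)
    print(report)
```

A report like this covers time and call counts only. Memory use and billing units would need provider-specific tooling.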
From the provider’s perspective:
Here I look at challenges that cloud providers have to face:
- Ensuring availability of the cloud: This proves to be crucial as clouds host critical business applications, for which downtime would mean monetary losses. Effective monitoring and load-balancing solutions have to be built. Most clouds employ virtualization technology to get the most out of any resource. In such cases, tools should be written to spot a resource hog early and move that application to a more powerful grid or machine, so that the other users get their share of the cloud without delays.
- Ensuring consistency: Both the data and the code are replicated on the cloud, and maintaining consistency of data is extremely crucial. This is the reason why most transactional updates are not allowed on the cloud. Example: sequence objects, which are almost a given in traditional databases, are not provided, probably because maintaining state across machines for such statements is non-trivial. Problems like distributed updates, locking, partitioning and sharding arise when dealing with data. Constructs that handle these should be provided to the users, as most of them are a given in the non-cloud deployment space.
Most datastores provided by cloud vendors (except the ones that provide cloud-based database services) do not support relational models, which means all object relations have to be established programmatically. This can lead to bad code, unnecessary joins, cascading problems and tons of other problems that developers faced before they had relational datastores.
- Program verification: One of the biggest worries about deploying applications on the cloud is the correctness of the program in execution. Erroneous conditions, like infinite loops, can not only put the machine at risk of being overloaded and unavailable, but also cost the user a significant amount of money. Tools like static analyzers should be used to check code uploaded to the cloud for infinite loops, possible race conditions, null references, unreachable code etc. The uploaded code should also be optimized, or suggestions should be provided to the users about how they could optimize it to best utilize the available resources.
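As a rough illustration of the kind of static check described above, the sketch below uses Python's standard ast module to flag while loops whose condition is a constant true value and whose body contains no break, return or raise. The function name find_suspect_loops is made up for this example, and the check is only a heuristic: a real analyzer would need far more than this, since detecting infinite loops in general is undecidable.

```python
import ast


def find_suspect_loops(source):
    """Return line numbers of `while` loops that look infinite.

    Heuristic only: a loop is flagged when its test is a constant truthy
    value (e.g. `while True:`) and no break, return or raise appears
    anywhere inside its body. A break that belongs to a nested loop also
    counts as an exit here, so some infinite loops will be missed.
    """
    exits = (ast.Break, ast.Return, ast.Raise)
    suspects = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        test = node.test
        if not (isinstance(test, ast.Constant) and bool(test.value)):
            continue
        has_exit = any(
            isinstance(child, exits)
            for stmt in node.body
            for child in ast.walk(stmt)
        )
        if not has_exit:
            suspects.append(node.lineno)
    return sorted(suspects)


if __name__ == "__main__":
    uploaded = "while True:\n    counter = 1\n"
    print(find_suspect_loops(uploaded))
```

A provider could run a check of this sort at upload time and warn the user rather than reject the code outright, since the heuristic can be wrong in both directions.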
Conclusion: The cloud should become a complete, non-restrictive platform for applications. There should be no restrictions on the constructs, functionality and privileges available on the cloud. Also, it should be dead simple to move everyday applications onto the cloud without too much rework. This could mean writing migration utilities, import/export options and other artifacts that make the transition to a cloud much easier. This will prove essential as most live applications, at least currently, do not run on a cloud, and helping them migrate easily will mean more revenue and adoption.
Uncertainty in programming – the Loch Ness monster of the programming world
March 11, 2009 at 5:09 pm | In Tips,Tricks and code, python, rant | 1 Comment
Programming has come many a mile since the 70s. A wide array of languages, methodologies, frameworks and other similar artifacts have made the life of a programmer really simple. These artifacts have incrementally solved problems faced by programmers and slowly, but steadily, wrapped the programmer’s view of a program into a set of abstractions. One of the first abstractions that was built, looking at the history of programming languages, was the ability to hide the underlying differences in hardware and system software and present a unified way of programming and manipulating the system. This is what we call the modern-day high-level programming language.
If the programming language, an abstraction of the real machine code, ever helped solve a problem, it was that of uncertainty. Take an example of the piece of code given below.
// sample code to add two numbers
int a = 10, b = 20, c = 0;
c = a + b;
Console.WriteLine(c.ToString());
When I run this code on any machine, I am assured that the value of c will be equal to 30. I know that when I access the variable “c” the next time, I will find it contains the value 30. I know that two instructions from now, the variables a, b and c will be available for further manipulation.
My recent attempts at programming on the cloud have taught me several lessons, the most important one being that programming to deploy on a cloud is almost like writing programs that you can never be certain about. You can never maintain application state. This means no static variables, no relational datastore, no freedom to write into the filesystem etc. Think about it for a second and it will make sense why these seemingly harmless actions are prohibited. Filesystem access is a big no-no anywhere, but as for static variables, persistent classes, singletons etc, running these on many actual/virtual machines means that all these entities, with their values, have to be moved/replicated across the cloud. This becomes a non-trivial problem, especially when the state keeps changing constantly. I could live with all these restrictions by coding painful but effective workarounds. What I can’t do is work with uncertainty. Here is an example:
import logging

from google.appengine.api import memcache


def get_greetings(self):
    """get_greetings()
    Checks the cache to see if there are cached greetings.
    If not, call render_greetings and set the cache

    Returns:
        A string of HTML containing greetings.
    """
    greetings = memcache.get("greetings")
    if greetings is not None:
        return greetings
    else:
        # Cache miss: rebuild the HTML and try to cache it for 10 seconds.
        greetings = self.render_greetings()
        if not memcache.add("greetings", greetings, 10):
            logging.error("Memcache set failed.")
        return greetings
The code is an example of using the built-in caching mechanism on AppEngine. Notice the line of code given below; it’s supposed to return the value of the item in the cache with the key greetings
greetings = memcache.get("greetings")
Here’s the question: what guarantee do I have that the value which I inserted into the cache with a large timeout is actually available? Whenever I write this line of code, do I also have to write the failsafe code (the else branch that re-renders the greetings and adds them to the cache again)? I am trying to model state using variables in the cache, mainly because it’s the next best thing to persistent classes and is less expensive (computationally and financially) than the key/value datastore. How do I do this reliably? I can’t trust that the cache will be available, so I have to keep updating the failsafe mechanism (in the case of AppEngine, the datastore), which is inefficient and highly taxing on the application. What has given rise to this situation is the environment of the cloud. It’s not a new problem by any means. With the introduction of new languages, language constructs and other programmatic abstractions, this kind of uncertainty in programming has always reared its ugly head. The Loch Ness monster of the programming world. And it will continue to do so, which is why we will always have constructs like assert(). My greatest worry is that I don’t see an elegant solution in the foreseeable future.
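One common way to at least confine this uncertainty to a single place is a read-through / write-through wrapper: every write goes to the durable store first and then to the cache, and every read falls back to the store on a miss. The sketch below is not AppEngine code. FlakyCache is a hypothetical in-memory stand-in for a cache that can drop entries at any time, and a plain dict plays the role of the datastore. It does not remove the cost of the datastore write complained about above; it only keeps the failsafe logic out of the calling code.

```python
class FlakyCache:
    """Hypothetical stand-in for a cache that may evict entries at any time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def evict_all(self):
        """Simulate the cache dropping everything, e.g. under memory pressure."""
        self._data.clear()


def write_through(cache, store, key, value):
    """Write to the durable store first, then refresh the cache."""
    store[key] = value
    cache.set(key, value)


def read_through(cache, store, key):
    """Read from the cache; on a miss, reload from the store and re-cache."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.set(key, value)
    return value


if __name__ == "__main__":
    cache, store = FlakyCache(), {}
    write_through(cache, store, "greetings", "<p>hello</p>")
    cache.evict_all()  # the cache forgets, the store does not
    print(read_through(cache, store, "greetings"))
```

The caller only ever sees read_through and write_through, so the question "is the value really in the cache?" is answered once, inside the wrapper, instead of at every call site.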
New Programming Paradigms
March 11, 2009 at 5:06 pm | In Tips,Tricks and code, rant | Leave a Comment
Over the last two or three years, I have seen the introduction of many new pseudo programming languages (if I can call them that) that help users build applications over the web. Most of these languages are built to work with a service or as a service. I shall switch wildly between a web service and the language used to interact with that web service, so get the message when I switch from one to the other. Let me take one of these languages, called YQL. A sample instruction would look like this:
/* Get the latest 10 photos from flickr where the photo name contains cat */
select * from flickr.photos.search where text='Cat' limit 10
As you can clearly see, the language makes querying a service and receiving its response really, really simple. This is how most new pseudo languages are. They work with service endpoints and emulate an existing programming language’s syntax to do that. These languages are built with mashups in mind. The dangers of such an offering are already apparent. Services are good as long as they are up and live. Take for example any of the Google or Yahoo APIs and you will find wrappers written by people in such pseudo languages to make your life simple. Even in the enterprise space there are such languages being built, which query custom services and make building applications really, really simple.
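To show what "working with a service endpoint" amounts to, here is a small sketch that turns the YQL statement above into a request URL. The endpoint address and the q and format parameter names are assumptions written from memory of the public YQL REST interface, not something taken from this post, and the sketch only builds the URL without checking whether the service answers.

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint as remembered; verify before use.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement, response_format="json"):
    """Encode a YQL statement as the query string of a GET request URL."""
    query = urlencode({"q": statement, "format": response_format})
    return f"{YQL_ENDPOINT}?{query}"


if __name__ == "__main__":
    statement = "select * from flickr.photos.search where text='Cat' limit 10"
    print(build_yql_url(statement))
```

The whole "language" travels as one URL parameter, which is why the application is only as reliable as the service behind that URL.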
Another observation of mine involves loose typing in these languages. Most new languages are loosely typed. Most of them take after Python, which lets the user take care of the typing. SQL has by far been the most emulated language amongst these pseudo languages. Take for example JoSQL, which adds SQL-like capabilities to operations like file handling, or LINQ in .NET, which exposes a SQL-like interface to data structures. These improvisations have dramatically reduced the time it takes to turn ideas into code and rapidly prototype an application.
There are limitations to using such improvisations, some that even I can vouch for. Loosely typed and unstructured languages are good as long as you are not working on large-scale systems. If you are hacking up a solution to a problem that you are facing, these pseudo languages look like real problem solvers, but when it comes to working in teams on projects that need to go into production, you start getting into big problems. Though I am a Python fanboy, I faced problems when I was working with Python and Perl on a large project with a team. Interfaces would be unclear, poor documentation would literally spell doom, and there were tons of other problems that we never thought we would face. There are others who complain of the very same thing. I am guessing we will see a flood of such languages in the future, thanks largely to applications evolving slowly into services, and it will be difficult to gauge the quality of these services. Twitter tried to make its API more stable, but the mechanism they chose didn’t satisfy many developers. Let’s hope we figure out a way to make these more reliable and stable. I guess it’s the developer’s call to be judicious about what language and service to choose when building applications.
Trends in online advertising
December 7, 2008 at 12:49 pm | In Trends-Predictions, search | 1 Comment
Advertising has come a long way since its inception. The simple concept of endorsing links to online resources has become a dominating factor on the web. But given the state of online advertising now, what is the roadmap and what can we expect in the future?
The answer is of course non-trivial and I will only make a fool of myself by trying to predict it, but there are certain inferences I have made based on my observations, which I shall pen down. I divide the broad category of PC users into prospects and adless users. Prospects are users who are new to or ignorant of the concept of online advertising, people like my mother, who doesn’t know that people endorse links for money. This category could also include people who are open to targeted advertising and see value in it. These are people who wholeheartedly click on interesting links. The other category I call adless users: users who have been around the internet long enough to recognize irrelevant ads and can spot and ignore ads in a page.
The more time a user spends online, the higher the probability of him/her realizing that the web is filled with irrelevant ads and, over time, becoming an adless user. As a result, almost all users tend to move towards becoming adless users. This is dangerous for marketers, ad companies, publishers etc, as there is a whole ecosystem depending solely on money made out of ads. As new users discover the web, their prospect phase is what publishers can hope to cash in on, but eventually the shift will happen. What happens then?
Search engines are arguably the best places for advertising and probably the best place for demonstrating the phenomenon I call intrusive or endorsed content. Take the example below.

Ads will stop being sidekicks and move into the foreground; I have shown the shift pictorially. Payperpost got the next concept right: people won’t read ads, but they will read social media, so pay people to write about your product/service etc. More results on search engines will be endorsed, and most of them already are. How do you know a review you are reading of some product isn’t already endorsed? Now here is the strange-loop bit: you could say you will search for bad reviews instead of good ones, like this. It won’t take long for the advertisers to see this trend as well and then pay people to write moderately bad reviews, in turn endorsing the product. You know that they know that you are looking for bad reviews!!
A surprising result on top caught my eye. A visit to the site will tell you immediately that the site isn’t half as good as the second or the third result, but still it’s on top. SEO has come a long way, and cheating search engines into making a page popular isn’t that hard. You can hire professionals to do that job. That, in a weird sense, is a form of endorsing: a professional SEO group can start taking bids for making pages more popular and start their own cartel for endorsed content.
The other strange phenomenon I see is that people recognize big search brands, Google in particular, but don’t necessarily relate to the results (you can’t possibly relate to the results). You could have the Google homepage serving ads from ask.com and nobody would know the difference if the results looked like Google returned them. That’s probably the reason there still are companies trying to capitalize on the search market. Take a look at the results page below.

In this case the difference between a result from the index and an endorsement is a mere patch of color. How difficult do you think it is to remove that demarcation during difficult times? Ethical boundaries as meagre as color differences can be crossed very easily, and corporations have shown time and again that it can be done.
Thanks to the falling prices of bandwidth and also to social media, video is the next big delivery mechanism, and it was quite understandable that Google paid a billion and a half to capitalize on YouTube’s huge market share and put intrusive ads on videos (you don’t have a choice there, no Adblock Plus!!). The same goes for pictures and audio. Radio, papers and television have been doing it for years.
The world thought that we moved away from pop up advertising but we have just made the situation far worse. Ads will become more and more intrusive and there could come a time when content and advertisement are indistinguishable. More on this later.
Human in the loop searches
October 24, 2008 at 8:20 pm | In search | Leave a Comment
For quite some time now, I have been using social media sites to do my searching. It’s not that traditional search results are bad, just that for most of the results that I am trying to get to, social media sites are doing a far better job. Take for example a search on accessibility or cognitive psychology. It’s painful to get through the clutter and get to results that actually pertain to sites that describe accessibility and have information on it. But a couple of searches on Digg and Delicious and I have tons of results at my disposal. Traditional web searches work well for certain types of queries, like word lookups, product lookups, news etc. Results for non-trivial queries have a tough time gaining page rank and will usually fail to show up on the results screen. The central point here is that, for certain searches, you just need the wisdom of the crowds.
I know when I look at sites like Digg, Reddit and Delicious that people have gone through these links and painstakingly tagged and saved them, which means that with a very high probability it’s not marketing gibberish or spam. Folksonomy should definitely be given the credit for making life more organized. While searching for certain tags (Delicious), I also discover other related tags and then run more filtered searches to improve the relevance of the results. The web, at least for the moment, is said to be partial to content on computers (technology in general, the iPhone in particular :) ), and for the zillion other domains that the web doesn’t do justice to, traditional ranking methods do little to improve relevance. Human in the loop is definitely better for such queries.
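The "discover related tags" step can be pictured as simple tag co-occurrence: for every saved link that carries the tag being searched for, count the other tags on it. The sketch below assumes bookmarks are available as a mapping from a link to its set of tags; the sample data and the function name related_tags are made up for illustration, and real bookmarking sites may well rank related tags differently.

```python
from collections import Counter


def related_tags(bookmarks, tag, top_n=3):
    """Rank tags that co-occur with `tag` across the given bookmarks."""
    counts = Counter()
    for tags in bookmarks.values():
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical bookmarks: link identifier -> tags people gave it.
    saved_links = {
        "link-1": {"accessibility", "wcag", "design"},
        "link-2": {"accessibility", "wcag"},
        "link-3": {"accessibility", "screenreader"},
        "link-4": {"python"},
    }
    print(related_tags(saved_links, "accessibility"))
```

Each suggested tag can then be combined with the original one to run the narrower, more relevant search described above.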
One of the projects that I have worked on targets this very need for non-trivial searches. Silverfish is a semantics extraction engine for academic documents and courses. The indexed results and the social aspect of the site are used to update researchers on the latest in their fields of interest and also to recommend fresh material. When more and more people start using the internet, there will be an increased demand for searches not related to technology, and in such cases human-in-the-loop searches will definitely take center stage. The future of search is definitely going to be more interesting.
Desitech.in – Technologies for the Indian audience
October 9, 2008 at 3:04 pm | In Web News | 1 Comment
Visit Desitech.in for technology news in India
A friend of mine, along with a bunch of other contributors, has started Desitech.in, a blog site that covers a wide variety of upcoming technologies, events (now that’s something that I look forward to) and interesting startups. Here is their description of the site:
Desitech is a technology journal which covers events, products and technologies relevant to the Indian audience. The journal features various columns including event announcements, event coverage, interviews with personalities, startup profiles, product comparison, etc
Seems to be interesting. Prash, best of luck on the project.
I turn two today
September 29, 2008 at 9:24 pm | In funny | 1 Comment
Yes, it’s two years since I started functioning. My statistics are pretty impressive for a blog from a nobody in technology, i.e. without writing hit-sensitive content on technology. I even have a small set of loyal readers who take the time to read through the gibberish that my author writes. But still, it’s been a pretty good run till now. Here are some of my statistics.
These stats are for both me and my mirror, actually the place where I was born.
Posts : 322 posts
Comments: 378
Spam Comments :44,083
Pageviews: 75,289
Visitors: 32,877
Average 2.5 min/visit
Total money made redeemed = Rs 800
Total money made yet to be redeemed = Rs 1269 ( waiting for 4500, till it redeems, damn adsense)
One of my articles even made it to 25 newspapers in the US. Some have even been added to the contribute section of MSN India
Anyway, I hope I get the same response (or better) for many more years to come and hope my author can put in credible and creative content and not make do with such lame attempts at posting.
( Author : ah screw you!! )
#@!* you hippie !!
Python – Still Getting started
September 27, 2008 at 6:49 pm | In python | 2 Comments
Tags: eclipse, gedit, itsmeritesh, linux, python
My newfound interest in learning Python, probably one of the coolest programming languages, has left me clamouring for more Python goodies. The learning curve for Python, at least the scripting part, was extremely small and easy, or maybe that’s because I already know Perl. Here’s my set of getting-started tips on Python. And believe me, if you haven’t started yet, please do.
- Most Linux distributions come with a Python interpreter; if yours does not, use a package manager to get the latest interpreter.
- Read the Python Tutorial by Guido van Rossum. It is the most (100 pages) you will ever read about Python, because it’s faster to do than to read about Python.
- Any good editor will do, but if you are one of those IDEated individuals, use the PyDev plugin for Eclipse.
- For a simpler feel, you can get the Python plugins for Gedit from this location. Just install them and my favorite, Gedit, becomes Python-wise. My friend also suggests using Emacs for Python development.
That’s about it!! Please feel free to add comments on what else you read, did or installed to become a Python developer.
Ruby on Rails – Comprehensive tutorial
September 22, 2008 at 12:22 pm | In Tips,Tricks and code | Leave a Comment
I recently gave a tutorial on Ruby on Rails at my school. It’s pretty comprehensive, and most of the material I used for the tutorial is present on my personal wiki. Please find the tutorial on my wiki here:
http://riteshnayak.com/wiki/index.php?title=Ruby_on_Rails
Also, people who are using Rails 2.0, please go to the 2.0 section directly. If you start with the 1.0 tutorial, you may get lost after a certain time and the results won’t show. Check what version of Rails you have installed and then use the corresponding section. Best of luck!!
Tags: rubyon rails, ror, rails, itsmeritesh
More modifications to More Y!
August 21, 2008 at 7:24 pm | In General | Leave a Comment
If you haven’t seen More Y!, then it’s high time you did. It’s a simple search plugin built on top of the BOSS platform that optimizes the search experience for widescreen displays. What’s even cooler is the adoption rate. My statistics tell me that a lot of people use this service, and nothing satisfies a technologist more than seeing his technology being adopted.
After a fight with my significant other, who is trying really hard to improve relevance on Yahoo search results, I realized that I wasn’t doing justice to search results by ordering the top results in top-down fashion. I have changed the code to now show results in left-right fashion. Now the most popular results will be available at the top of the screen and you don’t have to strain those precious eyeballs of yours to look all the way down to see the results. A simple change, but for a good cause.
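The difference between the two orderings is easy to see in a small sketch. This is not the plugin's actual code (which isn't shown here); the function names and the three-column layout are assumptions made for illustration. Both functions take results already sorted by rank and return a grid of rows.

```python
def top_down(results, columns):
    """Fill column by column: ranks run down each column first."""
    rows = -(-len(results) // columns)  # ceiling division
    grid = [[None] * columns for _ in range(rows)]
    for i, item in enumerate(results):
        grid[i % rows][i // rows] = item
    return grid


def left_right(results, columns):
    """Fill row by row: the top-ranked results share the first row."""
    return [results[i:i + columns] for i in range(0, len(results), columns)]


if __name__ == "__main__":
    ranked = [1, 2, 3, 4, 5, 6]  # stand-ins for results, best first
    print(top_down(ranked, 3))
    print(left_right(ranked, 3))
```

With six results in three columns, the top-down layout puts ranks 1, 3 and 5 in the first row, while the left-right layout puts ranks 1, 2 and 3 there, which is the point of the change.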
When I get the time, I shall also incorporate search suggestions on your toolbar, but till then make do with this.

