Limitations and Challenges in Cloud Computing for Applications
April 14, 2009 at 2:00 pm | In Architecture - Design, Tips,Tricks and code, Trends-Predictions, Unsolved Problems
I was supposed to take part in a discussion about cloud computing at CloudCamp Bangalore but, due to other commitments, I could not attend the event. I had prepared a short writeup on the limitations and challenges of application clouds. Here is the full text of it.
Cloud computing is a way of providing dynamically scalable, available resources such as computation and storage as a service, which users can use to deploy their applications and data. Cloud computing can handle data in both the public and the private domain. But this seemingly harmless way of building applications has its own set of issues. I am primarily referring to application cloud providers, the kind where you deploy your applications, not storage and service clouds. Google AppEngine is a good example of the kind of cloud I am describing. I note some of the issues here:
From the user’s perspective:
- New, unstructured and non-standard programming paradigm: Each cloud has its own supported programming language and syntax requirements, though most clouds expose the typical hashtable-based cache and datastore interfaces. There is an urgent need to standardize these interfaces and the ways of programming against them. One of the reasons shared hosting environments work so well is that, as a programmer, I know I can move my PHP/Perl code to another server and it will work without much fuss. Moving from one of the dozen-odd cloud providers to another requires considerable development effort, not to mention time (for businesses, this could spell doom). A look back at history shows languages like SQL and C being standardized to stop exactly this sort of undesirable proliferation.
- Restrictions on the programming model: For cloud-based applications to be highly available, they must be easy to mirror dynamically on multiple machines. Once mirrored, they can be served on demand by load-balancing servers, which makes them highly available and spares the user delays in being serviced. This is an old trick used by busy websites since the early days of web publishing, but those solutions were custom built for each website. Extending the concept to cloud platforms that serve thousands of applications requires the platform providers to automate replication and mirroring, which is easier said than done. The process can be made seamless when the program stores as little state as possible. By state, I mean transactional variables, static variables, application-wide variables and so on. These are almost a given in traditional programming environments but are very hard to come by in cloud-based environments. The unnatural workaround is to use the datastore or the cache to hold the application’s state. There are many other restrictions too, such as no privileges to install third-party libraries and no access to the file system to write files (which forces you to use the datastore and pay for it).
- A good local debugging experience: A good local development and debugging environment is a must for programming on the cloud. Most cloud providers do not provide one. There is also a lack of good IDEs that help with writing and debugging programs for the cloud. The providers that do offer a local debug experience do not simulate real cloud-like conditions. Both from personal experience and from conversations with other developers, I have come to realize that most people run into problems when moving code from their local development servers to the actual cloud, largely because the local dev environment behaves differently from the cloud.
- Appropriate metrics and documentation of programming best practices: On a cloud, since a user pays for almost every CPU cycle, appropriate metrics on processing time and memory usage must be presented to the users. Typically, a profile of the application listing function names with the time taken, memory used and processing cycles consumed by each will help developers tune their code to use less processing power. Most cloud providers do not provide enough statistics or profiling capabilities. The best solution is for cloud providers to abstract common code patterns into optimized libraries, so that users can be assured they are running the most efficient code for a given operation. An example of this is Apache Pig, which gives a scripting-like interface for data analysis on Apache Hadoop.
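The per-function profile described in the last point can be approximated on a developer machine with Python's standard cProfile module. This is only a minimal sketch: the functions `slow_sum` and `render_page` are hypothetical stand-ins for application code, and the numbers measure local wall-clock time, not whatever a cloud provider actually bills for.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: sums integers in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def render_page():
    """Hypothetical request handler that calls the hot spot."""
    return "total=%d" % slow_sum(100000)


def profile_call(func):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(render_page)
print(report)
```

The report lists each function with its call count and cumulative time, which is roughly the view a provider would need to expose per request for a developer to tune costs.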
From the provider’s perspective:
Here I look at challenges that cloud providers have to face:
- Ensuring availability of the cloud: This is crucial, as clouds host critical business applications for which downtime means monetary loss. Effective monitoring and load-balancing solutions have to be built. Most clouds employ virtualization technology to get the most out of every resource. In such cases, tools should be written to detect a resource hog early and move the application to a more powerful grid or machine, so that other users get their share of the cloud without delays.
- Ensuring consistency: Both data and code are replicated on the cloud, and maintaining consistency of data is crucial. This is the reason most transactional updates are not allowed on the cloud. For example, sequence objects, which are almost a given in traditional databases, are not provided, probably because maintaining state across machines for such constructs is non-trivial. Problems like distributed updates, locking, partitioning and sharding arise when dealing with data. Constructs to handle them should be provided to users, as most of them are a given in the non-cloud deployment space.
Most datastores provided by cloud vendors (except the ones that provide cloud-based database services) do not support relational models, which means all object relations have to be established programmatically. This can lead to bad code, unnecessary joins, cascading problems and many of the other problems developers faced before they had relational datastores.
- Program verification: One of the biggest worries about deploying applications on the cloud is the correctness of the program in execution. Erroneous conditions such as infinite loops not only put the machine at risk of being overloaded and unavailable, but can also cost the user a significant amount of money. Static analysis tools should be used to check code uploaded to the cloud for infinite loops, possible race conditions, null references, unreachable code and the like. The uploaded code should also be optimized, or users should be given suggestions on how to optimize it to make the best use of the available resources.
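To make the program verification point concrete, here is a rough sketch of the kind of check a provider could run on uploaded Python code, using the standard `ast` module. It is a heuristic, not a real verifier: it only flags `while` loops whose condition is a constant truthy value and whose body contains no `break`, `return` or `raise`. The class and function names are made up for this illustration.

```python
import ast


class InfiniteLoopFinder(ast.NodeVisitor):
    """Collects line numbers of `while <constant>` loops with no visible exit."""

    def __init__(self):
        self.suspects = []

    def visit_While(self, node):
        # `while True:` and `while 1:` both parse to a truthy ast.Constant.
        always_true = isinstance(node.test, ast.Constant) and bool(node.test.value)
        if always_true and not self._has_exit(node):
            self.suspects.append(node.lineno)
        self.generic_visit(node)  # keep looking inside nested code

    @staticmethod
    def _has_exit(loop):
        # Heuristic: any break/return/raise anywhere in the body counts as an
        # exit, even one that belongs to a nested loop or function.
        exits = (ast.Break, ast.Return, ast.Raise)
        return any(
            isinstance(child, exits)
            for stmt in loop.body
            for child in ast.walk(stmt)
        )


def find_suspect_loops(source):
    """Return the line numbers of loops that look like they never terminate."""
    finder = InfiniteLoopFinder()
    finder.visit(ast.parse(source))
    return finder.suspects


SAMPLE = """
def spin():
    while True:
        pass

def poll(queue):
    while True:
        if queue:
            break
"""

print(find_suspect_loops(SAMPLE))
```

Deciding termination in general is undecidable, so a check like this can only catch the obvious cases and would have to be paired with runtime limits.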
Conclusion: The cloud should become a complete, non-restrictive platform for applications. There should be no restrictions on constructs, functionality or privileges on the cloud. It should also be dead simple to move everyday applications onto the cloud without much rework. This could mean writing migration utilities, import/export options and other artifacts that make the transition to a cloud much easier. This will prove essential because most live applications, at least currently, do not run on a cloud, and helping them migrate easily will mean more revenue and adoption.
Uncertainty in programming – the Loch Ness monster of the programming world
March 11, 2009 at 5:09 pm | In Tips,Tricks and code, python, rant | 1 Comment
Programming has come many a mile since the ’70s. A wide array of languages, methodologies, frameworks and similar artifacts has made the life of a programmer much simpler. These artifacts have incrementally solved problems faced by programmers and slowly, but steadily, wrapped the programmer’s view of a program into a set of abstractions. Looking at the history of programming languages, one of the first abstractions to be built was the ability to hide the underlying differences in hardware and system software and present a unified way of programming and manipulating the system. This is what we call the modern-day high-level programming language.
If the programming language, an abstraction of the real machine code, ever helped solve a problem, it was that of uncertainty. Take an example of the piece of code given below.
// Sample code to add two numbers
int a = 10, b = 20, c = 0;
c = a + b;
Console.WriteLine(c.ToString());
When I run this code on any machine, I am assured that the value of c will be 30. I know that when I access the variable “c” the next time, it will still contain 30. I know that two instructions from now, the variables a, b and c will be available for further manipulation.
My recent attempts at programming on the cloud have taught me several lessons, the most important being that programming for deployment on a cloud is almost like writing programs you can never be certain about. You can never maintain application state. This means no static variables, no relational datastore, no freedom to write to the filesystem and so on. Think about it for a second and it makes sense why these seemingly harmless actions are prohibited. Filesystem access is a big no-no anywhere, but as for static variables, persistent classes, singletons and the like, running them on many actual or virtual machines means all these entities, with their values, have to be moved or replicated across the cloud. This becomes a non-trivial problem, especially when the state keeps changing. I could live with all these restrictions by coding painful but effective workarounds. What I can’t do is work with uncertainty. Here is an example:
import logging

from google.appengine.api import memcache

def get_greetings(self):
    """get_greetings()
    Checks the cache to see if there are cached greetings.
    If not, call render_greetings and set the cache

    Returns:
        A string of HTML containing greetings.
    """
    greetings = memcache.get("greetings")
    if greetings is not None:
        return greetings
    else:
        # Cache miss: rebuild the value and try to cache it for 10 seconds.
        greetings = self.render_greetings()
        if not memcache.add("greetings", greetings, 10):
            logging.error("Memcache set failed.")
        return greetings
The code is an example of using the built-in caching mechanism on AppEngine. Notice the line of code given below; it is supposed to return the value of the item in the cache with the key greetings.
greetings = memcache.get("greetings")
Here’s the question: what guarantee do I have that a value I inserted into the cache with a large timeout is actually available? Whenever I write this line of code, do I also have to write the failsafe code (the else branch that re-renders the greetings and puts them back in the cache)? I am trying to model state using variables in the cache, mainly because it is the next best thing to persistent classes and is less expensive (computationally and financially) than the key/value datastore. How do I do this reliably? I can’t trust that the cache will be available, so I have to keep constantly updating the failsafe mechanism (in the case of AppEngine, the datastore), which is inefficient and highly taxing on the application. What has given rise to this situation is the environment of the cloud. It is not a new problem by any means. With the introduction of new languages, language constructs and other programmatic abstractions, this kind of uncertainty in programming has always reared its ugly head: the Loch Ness monster of the programming world. And it will continue to do so, which is why we will always have constructs like assert(). My greatest worry is that I don’t see an elegant solution in the foreseeable future.
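One common way to live with this is to treat the cache purely as an optimisation and keep the durable store as the source of truth. The sketch below assumes exactly that and runs without AppEngine: `FlakyCache` is a made-up in-memory stand-in for memcache, a plain dict stands in for the datastore, and `read_through` and `write_through` are hypothetical helper names. It does not remove the uncertainty, it only confines it to one place, and every write still pays for the datastore.

```python
import time


class FlakyCache:
    """In-memory stand-in for a memcache-style cache; entries may vanish."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        value, expires_at = self._items.get(key, (None, 0.0))
        if value is None or time.time() >= expires_at:
            self._items.pop(key, None)  # drop missing or expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.time() + ttl_seconds)

    def evict(self, key):
        """Simulates the cache dropping an entry under memory pressure."""
        self._items.pop(key, None)


def write_through(cache, store, key, value, ttl_seconds=60):
    """Write to the durable store first, then refresh the cache."""
    store[key] = value
    cache.set(key, value, ttl_seconds)


def read_through(cache, store, key, ttl_seconds=60):
    """Return the value for key; a cache miss falls back to the store."""
    value = cache.get(key)
    if value is None:
        value = store[key]  # durable source of truth
        cache.set(key, value, ttl_seconds)
    return value


store = {}            # stand-in for the datastore
cache = FlakyCache()
write_through(cache, store, "greetings", "<p>hello</p>")
cache.evict("greetings")  # the cache loses the entry behind our back
greetings = read_through(cache, store, "greetings")
print(greetings)
```

With the fallback wrapped in one helper, the calling code no longer has to repeat the failsafe branch at every cache read.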
New Programming Paradigms
March 11, 2009 at 5:06 pm | In Tips,Tricks and code, rant
Over the last two or three years, I have seen the introduction of many new pseudo programming languages (if I can call them that) that help users build applications over the web. Most of these languages are built to work with a service, or as one. I shall switch freely between a web service and the language used to interact with it, so bear with me when I move from one to the other. Let me take one of these languages, called YQL. A sample instruction would look like this:
/* Get the latest 10 photos from flickr where the photo name contains cat */
select * from flickr.photos.search where text='Cat' limit 10
As you can clearly see, the language makes querying a service and receiving its response really simple. This is how most new pseudo languages are: they work with service endpoints and emulate an existing programming language’s syntax to do so. These languages are built with mashups in mind. The dangers of such an offering are already apparent: services are good only as long as they are up and live. Take any of the Google or Yahoo APIs, for example, and you will find wrappers written by people in such pseudo languages to make your life simple. Even in the enterprise space, such languages are being built to query custom services and make building applications really simple.
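A statement like the one above travels to the service as an ordinary HTTP query string. The sketch below only builds that URL with the Python standard library and sends nothing. The endpoint address and the `q` and `format` parameter names are assumptions on my part (they are not given in this post), and `build_yql_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: public YQL endpoint as it was commonly documented; not from this post.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement, response_format="json"):
    """Return a GET URL that carries the YQL statement as the `q` parameter."""
    query = urlencode({"q": statement, "format": response_format})
    return YQL_ENDPOINT + "?" + query


statement = "select * from flickr.photos.search where text='Cat' limit 10"
url = build_yql_url(statement)
print(url)

# Round-trip check: the statement survives URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == statement)
```

The whole "language" is a string sent to one endpoint, which is why the application is only as reliable as that endpoint.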
Another observation of mine involves loose typing in these languages. Most new languages are loosely typed. Most of them take after Python, which leaves it to the user to take care of types. SQL has been by far the most emulated language among these pseudo languages. Take, for example, JoSQL, which adds SQL-like capabilities to operations like file handling, or LINQ in .NET, which exposes an SQL-like interface to data structures. These improvisations have dramatically reduced the time it takes to turn ideas into code and rapidly prototype an application.
There are limitations to using such improvisations, some that even I can vouch for. Loosely typed and unstructured languages are good as long as you are not working on large-scale systems. If you are hacking up a solution to a problem you are facing, these pseudo languages look like real problem solvers, but when it comes to working in teams on projects that need to go into production, you start running into big problems. Though I am a Python fanboy, I faced problems when working with Python and Perl on a large project with a team. Interfaces would be unclear, poor documentation would literally spell doom, and there were tons of other problems we never thought we would face. There are others who complain of the very same thing. I am guessing we will see a flood of such languages in the future, thanks largely to applications slowly evolving into services, and it will be difficult to gauge the quality of these services. Twitter tried to make its API more stable, but the mechanism it chose didn’t satisfy many developers. Let’s hope we figure out a way to make these services more reliable and stable. I guess it is the developer’s call to be judicious about which language and service to choose when building applications.
Ruby on Rails – Comprehensive tutorial
September 22, 2008 at 12:22 pm | In Tips,Tricks and code
I recently gave a tutorial on Ruby on Rails at my school. It’s pretty comprehensive, and most of the material I used for the tutorial is on my personal wiki. Please find the tutorial on my wiki here:
http://riteshnayak.com/wiki/index.php?title=Ruby_on_Rails
Also, people who are using Rails 2.0, please go to the 2.0 section directly. If you start with the 1.0 tutorial, you may get lost after a while and the results won’t show. Check which version of Rails you have installed and then use the corresponding section. Best of luck!
Tags: rubyon rails, ror, rails, itsmeritesh
Yahoo BOSS – simple, open and awesome
August 5, 2008 at 2:54 pm | In Tips,Tricks and code | 3 Comments
Reached home early yesterday and read an article about Yahoo BOSS and its open nature. In my effort to kill time till dinner, I sat and read through the documentation for BOSS, and it turns out it’s the easiest open search ever. I had been using Google Coop for my site search, but somehow the techie inside me couldn’t rest at the thought of someone else doing the tech for me. BOSS looked really tempting with its good results and recent indexes, so I sat down to build my own site search.

BOSS is simple. Really, really simple, and it can do wonders if you are planning to build a search engine with your own flavor. Unlike other search services, BOSS gives you XML/JSON, meaning you can re-order results and present them any way you like. Add Flash, CSS, JavaScript, canvas elements, whatever, to build that unique search experience. After the Cuil ripoff Yuil, which got taken down and was relaunched as 4Hoursearch (why did they call it that? figure it out, Einstein!), I was sure BOSS would be easy, but I didn’t know it would be this easy.
It’s only recently that I started learning Python, and I suck at it, so I picked my old favorite PHP as the language of choice (I suck at PHP too, but less than at Python). I got myself an Application ID to use BOSS, used PHP’s SimpleXML parser to fetch a URL of my choice, and voila, I had my results in an array. I wrote some really rudimentary CSS to match the aesthetics of my site, and my site search was done.
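The fetch-and-parse step is small in any language. The site search described here was written in PHP with SimpleXML. The sketch below shows the same idea in Python with the standard `xml.etree.ElementTree` module, run against a hand-written sample instead of a live request. The element names (`result`, `title`, `url`, `abstract`) and the sample data are assumptions for illustration, and the real BOSS response schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the real BOSS schema may use other element names.
SAMPLE_RESPONSE = """
<response>
  <result>
    <title>First post</title>
    <url>http://example.com/first</url>
    <abstract>An example abstract.</abstract>
  </result>
  <result>
    <title>Second post</title>
    <url>http://example.com/second</url>
    <abstract>Another example abstract.</abstract>
  </result>
</response>
"""


def parse_results(xml_text):
    """Turn every <result> element into a plain dict of its text fields."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("url", default=""),
            "abstract": item.findtext("abstract", default=""),
        }
        for item in root.iter("result")
    ]


results = parse_results(SAMPLE_RESPONSE)
for entry in results:
    print(entry["title"], "->", entry["url"])
```

Once the results are plain dicts, re-ordering and styling them is ordinary application code.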
Check out my Yahoo BOSS powered site search here!
If it wasn’t for my crappy PHP skill level, I’m sure I could have wrapped up the entire thing, right from “Duh, what is BOSS?” to the implementation, in under an hour. If I do find more time to kill whilst I wait for dinner, I shall experiment with different displays for search results from BOSS. Y! BOSS is truly open and, in keeping with the open source spirit, I have shared my rudimentary site search code. You won’t believe it, but the code, with proper conventions, HTML and CSS, comes to 65 lines. Isn’t it awesome?
Get the code for site search here.
Another wonderful manifestation of this concept is that you can now build custom search engines that will search only the sites that you catalog for information you need. Check out Y! BOSS.
Tags: yahoo, BOSS, sitesearch, search
Do not forget to Remember the Milk
July 29, 2008 at 2:08 pm | In Tips,Tricks and code
I am over-obsessed with organizing the things I deal with. I love doing things like organizing my bookmarks, tagging all my photos/blogs/folders and documenting every piece of software, things that other people consider a chore. I feel organizing and optimization both add a lot to personal productivity. OK, to be really honest, I hate having to look for things, especially things I know I have. I also hate it when it takes me more than 5 minutes to find what I want, and being organized is a clever way to take all the hard work out of being lazy. I am also involved in a zillion things at once and usually forget a lot of important events (I think it’s time to invest in a personal secretary). For that reason I just love online calendars. I currently use Google Calendar, which is really awesome, and I wanted a quick way to add things to it. Then I found Remember the Milk.
The reminder service is awesome. Firstly, it integrates with my Google calendar as well as my Outlook calendar. Next, there is a Firefox addon that lets you see your task schedule in Gmail. And to top it off, I can post events from Twitter and also from my messenger. It’s just truly awesome.
The service also understands common sentences and converts them to a date and time. If I write, say, “do this day after tomorrow”, it adds “do this” two days from now. I haven’t yet seen the full capability of the service, but almost all the English sentences I have typed have ended up as dates. It’s pretty clever when you think about it. If you are like me and want to get organized, use rtmilk.
Other tools that I use for organizing the information around me
1. Outlook and Google Calendar
2. Tagging files in Vista ( not in xp )
3. Personal site search ( which helps me more than it helps others )
4. Post its
5. Reminders on my phone
Tags: rememberthemilk, rtm, organizing, calendar, schedule
Ruby on Rails – Getting started
July 4, 2008 at 2:09 pm | In Tips,Tricks and code | 1 Comment

Every night for the past three days, I have been sitting religiously in front of my 8-year-old desktop running Gutsy (both my laptops aren’t with me now), trying to learn Ruby on Rails, or ror for short. I must say, I have been completely fascinated by its possibilities and look forward to building some really cool and useful apps with it in the near future. I have also been looking for tutorials on ror and not finding much help online. A friend of mine pointed me to The Book for ror, which I am trying to get my hands on. I have never been the type that learns a new language from a book; I like to get my hands dirty and try things out. That’s the way I learn. So, based on my experience learning ror, I’m penning down my own getting-started guide.
What can ror do? It makes developing web applications really, and I mean really, simple. You give it a table structure and Rails automatically builds the table, the forms for inserting, deleting and displaying the fields, and an MVC architecture by default. Controllers and views are generated, and all you have to do as a web developer is add CSS to the generated files so that they look awesome. There is a demo that shows how you can build a blogging engine using Rails in under 15 minutes. Now isn’t that cool! It’s a really cool hack that has made the job of writing everyday web applications really easy. Two of my favorite online applications have been built using ror. Check them out.

For starters, Ruby is a programming language: about as old as Java, very English-like and, I guess, mostly interpreted. Rails is a wonderful platform-like hack, written in Ruby, that does a lot of cool things for you. Rails sits on top of Ruby, and you build applications using the two together.
Installation: I found many places that listed installation instructions, but this one worked best.
http://wiki.rubyonrails.org/rails/pages/RailsOnUbuntu

This brings me to another wonderful piece of software that weighs in at under 400 KB: Gems, an apt-style package manager built to fetch Ruby-related software.
Once you are set up, it’s time to learn the language, and to learn Ruby I would definitely suggest why’s (poignant) guide to Ruby. Nothing beats this online book … sorry, online masterpiece, in explaining the aspects of the Ruby language. I’m still not done with it, but I can’t wait to read more.
Since ror is for web development, you obviously need a web server. You can use the prepackaged WEBrick server (good for small-scale development). For more serious programming, use Mongrel, Apache or lighttpd. Instructions to configure Apache and lighttpd for ror can be found here. Get MySQL for the database and you will be done.
Lastly, you need a very good article to get you started, and this is the one that got me started. It’s not perfect, considering it’s almost 2 years old, but the errors that show up will help you learn much better. And that’s it, we are done. Best of luck learning ror.
Tags: ruby, rails , ruby on rails, getting started, tutorials
Email Checklist from Seth Godin
June 18, 2008 at 1:07 pm | In Tips,Tricks and code, gyaan
Great tips on sending an email, from Seth Godin himself. Reading them, I realized a lot of my own follies.
Before you hit send on that next email, perhaps you should run down this list, just to be sure:
1. Is it going to just one person? (If yes, jump to #10)
2. Since it’s going to a group, have I thought about who is on my list?
3. Are they blind copied?
4. Did every person on the list really and truly opt in? Not like sort of, but really ask for it?
5. So that means that if I didn’t send it to them, they’d complain about not getting it?
6. See #5. If they wouldn’t complain, take them off!
7. That means, for example, that sending bulk email to a list of bloggers just cause they have blogs is not okay.
8. Aside: the definition of permission marketing: Anticipated, personal and relevant messages delivered to people who actually want to get them. Nowhere does it say anything about you and your needs as a sender. Probably none of my business, but I’m just letting you know how I feel. (And how your prospects feel).
9. Is the email from a real person? If it is, will hitting reply get a note back to that person? (if not, change it please).
10. Have I corresponded with this person before?
11. Really? They’ve written back? (if no, reconsider email).
12. If it is a cold-call email, and I’m sure it’s welcome, and I’m sure it’s not spam, then don’t apologize. If I need to apologize, then yes, it’s spam, and I’ll get the brand-hurt I deserve.
13. Am I angry? (If so, save as draft and come back to the note in one hour).
14. Could I do this note better with a phone call?
15. Am I blind-ccing my boss? If so, what will happen if the recipient finds out?
16. Is there anything in this email I don’t want the attorney general, the media or my boss seeing? (If so, hit delete).
17. Is any portion of the email in all caps? (If so, consider changing it.)
18. Is it in black type at a normal size?
19. Do I have my contact info at the bottom? (If not, consider adding it).
20. Have I included the line, “Please save the planet. Don’t print this email”? (If so, please delete the line and consider a job as a forest ranger or flight attendant).
21. Could this email be shorter?
22. Is there anyone copied on this email who could be left off the list?
23. Have I attached any files that are very big? (If so, google something like ‘send big files’ and consider your options.)
24. Have I attached any files that would work better in PDF format?
25. Are there any :-) or other emoticons involved? (If so, reconsider).
26. Am I forwarding someone else’s mail? (If so, will they be happy when they find out?)
27. Am I forwarding something about religion (mine or someone else’s)? (If so, delete).
28. Am I forwarding something about a virus or worldwide charity effort or other potential hoax? (If so, visit snopes and check to see if it’s actually true).
29. Did I hit ‘reply all’? If so, am I glad I did? Does every person on the list need to see it?
30. Am I quoting back the original text in a helpful way? (Sending an email that says, in its entirety, “yes,” is not helpful).
31. If this email is to someone like Seth, did I check to make sure I know the difference between its and it’s? Just wondering.
32. If this is a press release, am I really sure that the recipient is going to be delighted to get it? Or am I taking advantage of the asymmetrical nature of email–free to send, expensive investment of time to read or delete?
33. Are there any little animated creatures in the footer of this email? Adorable kittens? Endangered species of any kind?
34. Bonus: Is there a long legal disclaimer at the bottom of my email? Why?
35. Bonus: Does the subject line make it easy to understand what’s to come and likely it will get filed properly?
36. If I had to pay 42 cents to send this email, would I?
OpenId a must for new properties
March 21, 2008 at 4:42 pm | In Tips,Tricks and code, Web News
I have been a big fan of OpenId for a long time and often advise people about the benefits of using it. What I really disliked was that big names were missing from the OpenId directories. That has changed: Yahoo is beta testing being an OpenId provider, and the news is great. First of all, it almost triples the number of users who have OpenIds, and since almost every internet user has a Yahoo account (approx 30 million), the proposition gets a whole lot better.
I have been trying to make a web property OpenId enabled, and it’s a cinch. Just download and install one of the libraries available for a multitude of programming languages, follow some basic configuration steps, map the OpenId users to your user management system and you are done. Here is a list of all the plugins available for OpenId enabling your site.
OpenId has really come of age and, with Yahoo announcing support, it has become a necessity for almost every web app. I would even go so far as to mandate that websites, old and new, enable OpenId and save their users from the painful signup-and-confirm cycle. The sheer number of OpenId holders should be motivation enough for properties to go the OpenId way.
Performance and its importance for websites
November 25, 2007 at 3:22 pm | In Tips,Tricks and code, Web 2.0, rant
Lately, performance has taken the web world by storm. From the people in my company who came to improve the performance of our websites to the people who gave talks about performance at unconferences held in the city, performance seems to be the thing to talk about.
A recent trip to the Yahoo Developer Network portal also showed some glaringly visible tributes to the field of performance.