State of higher education in India with a focus on Computer Science

July 26, 2009 at 11:32 am | In General, Trends-Predictions | 3 Comments

I came back from attending a session on the state of graduate education in India, and here is the summary:

  • Just over 450,000 students in India graduate with an Engineering degree
  • Of these, 150,000 graduate with a degree in either Computer Science or Information Technology.
  • There are about 1500 Engineering colleges in India.
  • Many of these colleges don’t even have a full professor on their rolls.
  • Currently there are about 750 students pursuing a PhD in 15 of the most reputed institutions in the country, which means that about 80 to 90 students graduate with a PhD from one of these 15 institutions.
  • The 15 reputed institutions include the IITs, NITs, two of the IIITs (Hyderabad and Bangalore) and some autonomous institutions like BITS and Vellore.
  • The percentage of students who take up graduate education after their engineering degree in India is very low.
  • About a quarter of the students who secure PhDs from universities in the US are Indians.
  • Students of Indian and Chinese origin make up half of the graduate school students in America.
  • Most people who secure their PhDs from universities in India either join small, focused research groups in IT companies or take up faculty positions.
  • This year the number of students applying for graduate education has increased dramatically, which is only further evidence that graduate education is seen as a substitute for jobs and not as something of value in itself.
  • A couple of IITs got about 700 applications for master's and PhD positions.

Apart from all this, the research output in India is not very high. Groups doing theory are considered to be doing some state-of-the-art research; the other departments are not very highly regarded (I have a problem with this generalization, but we will keep that for another discussion). The researchers present in the discussion had plenty to say about the causes of the dismal state of higher education. Some of the points mentioned were:

  • Lack of good, trained and motivated faculty members. This was attributed to the fact that salaries in academia are not on par with those in industry. (The pay commission's revisions should do some good in this direction.)
  • Lack of exposure to the opportunities, challenges and rewards of research careers. (This is true for colleges that are not very reputed: the quality of the faculty members is not up to the mark, which means they don't have enough exposure themselves … you get the point.)
  • Societal pressure to secure jobs, and that too through college placements, rather than pursuing something that the student really wants to do. A survey of the choices students make during the engineering seat selection process would confirm this. I even know of people who took up courses they had no interest in just because they were offered in a college where the placements were good.
  • Lack of funding for graduate students to attend conferences, workshops etc. (This was contested by a lot of people; I think the problem lies in making students aware of the funds that are available for such purposes.)
  • Discrimination between students who graduate from the IITs and those from other institutions. (Strong alumni networks are nothing new, though; other colleges should aim to strengthen their alumni networks and not work as silos.)

This is where I found the IIITs (particularly Hyderabad and Bangalore) to be very innovative in their approach. They are situated in the heartland of what can be considered the seat of innovation in India. Both of them collaborate strongly with the indigenous and multinational companies based in their respective cities and provide a wonderful platform for students to explore a mix of academic research and industry-relevant work in information technology. Both IIIT-H and IIIT-Bangalore have achieved recognition for their quality in industry and academia, and that too in good time. I am positive that in a few years' time these institutions will be deeply connected to the research and development communities of the information technology industry in India and will contribute significantly to the intellectual output of the country.

Disclaimer: the numbers mentioned in this post are thanks to Ashwani Sharma, part of the External Research Programs team at Microsoft Research India.

Limitations and Challenges in Cloud Computing for Applications

April 14, 2009 at 2:00 pm | In Architecture - Design, Tips,Tricks and code, Trends-Predictions, Unsolved Problems | Leave a Comment

I was supposed to be involved in a discussion about cloud computing at Cloudcamp Bangalore, but due to other commitments, I could not attend the event. I had a small writeup about the limitations and challenges in Application clouds. Here is the full text of it.

Cloud computing is a way of providing dynamically scalable and available resources such as computation, storage etc. as a service to users, who can use it to deploy their applications and data. Cloud computing can handle data in both the public and the private domain. But this seemingly harmless way of thinking about building applications has its own set of issues. I am primarily referring to application cloud providers, the kind where you deploy your applications, not storage and service clouds. Google App Engine would be a good example of the kind of cloud that I am describing. I note some of the issues here:

From the user's perspective:

  1. New, unstructured and non-standard programming paradigm: Each cloud has its own supported programming languages and syntax requirements, though most of these clouds expose the typical hashtable-based cache and datastore interfaces. There is an urgent need to standardize these interfaces and the methods of programming against them. One of the reasons why shared hosting environments work great is that, as a programmer, I know I can move my PHP/Perl code to another server and it will work without too much fuss. Moving from one of the dozen-odd cloud providers to another requires considerable development effort, not to mention time (for businesses, this could spell doom). A look back at history shows languages like SQL, C etc. being standardized to stop exactly this sort of undesirable proliferation.
  2. Restrictions on the programming model: For cloud-based applications to be highly available, they must be easy to mirror dynamically on multiple machines. Once these applications are mirrored, they can be served on demand by load-balancing servers, which makes them highly available, and the user doesn't face delays in being serviced. This is an old trick used by busy websites from the early days of web publishing, but those solutions were custom-built for each website. Extending this concept to cloud-based platforms that service thousands of applications requires the platform providers to automate the task of replication and mirroring. This is easier said than done. The process can be made seamless when the program stores as little state information as possible. By state, I mean transactional variables, static variables, variables in the context of the entire application etc. These things are almost a given in traditional programming environments but are very hard to come by in cloud-based environments. The unnatural way of dealing with this situation is to use the datastore or the cache to store the state of an application. There are also a lot of other restrictions, like the lack of privileges to install third-party libraries and no access to the file system to write files (which forces you to use the datastore and pay for it).
  3. A good local debugging experience: A good local development environment and debugging experience are a must for programming on the cloud. Most cloud providers do not provide good local development environments. There is also a lack of good IDEs that can help with writing and debugging programs for the cloud. The providers that do offer a local debugging experience do not simulate real cloud-like conditions. Both from my personal experience and from conversations with other developers, I have come to realize that most people face problems when moving code from their local development servers to the actual cloud. This is largely due to inconsistencies between the behavior of the local development environment and that of the cloud.
  4. Appropriate metrics and documentation of programming best practices: On a cloud, since a user pays for almost every CPU cycle, appropriate metrics on the usage of processing time and memory must be presented to the users. Typically, a profile of the application with function names and their corresponding time taken, memory used and processing cycles used will help the developer tune his/her code to optimize the usage of processing power. The best solution for this is for cloud providers to abstract common code patterns into optimal libraries, so that users can be assured that they are running the most optimal code for a certain operation. An example of this is Apache Pig, which gives a scripting-like interface for data analysis on Apache Hadoop. Also, most cloud providers do not provide enough statistics or profiling capabilities.
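
To make the point about state in item 2 concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `DictStore` is a hypothetical stand-in for a provider's datastore or cache (it is not the API of App Engine or any other cloud), and `handle_request` is a made-up handler. The idea it shows is that when all state lives in the external store, any mirrored copy of the handler can serve any request.

```python
class DictStore:
    """Hypothetical stand-in for a cloud datastore or cache (a plain dict here)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handle_request(store, user_id):
    """Count visits per user without keeping any state in the process.

    All state lives in `store`, so any replica of this handler can serve
    any request and see the same data.
    """
    key = "visits:" + user_id
    visits = store.get(key, 0) + 1
    store.put(key, visits)
    return visits


# Two "replicas" are simply two calls to the same stateless function that
# share one store; neither call relies on memory left over from the other.
shared_store = DictStore()
first = handle_request(shared_store, "alice")   # imagine replica A
second = handle_request(shared_store, "alice")  # imagine replica B
print(first, second)
```

Note that the read-then-write in `handle_request` is not atomic. On a real datastore, two replicas updating the same key at once would need whatever transaction or atomic-increment facility the provider offers, which is exactly the kind of construct discussed under consistency in the next section.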

From the provider's perspective:

Here I look at challenges that cloud providers have to face:

  1. Ensuring availability of the cloud: This is crucial, as clouds host critical business applications for which downtime would mean monetary losses. Effective monitoring and load-balancing solutions have to be built. Most clouds employ virtualization technology to get the most out of any resource. In such cases, tools should be written to identify a resource hog early and move the application to a more powerful grid or machine, so that the other users get their share of the cloud without delays.
  2. Ensuring consistency: Both data and code are replicated on the cloud, and maintaining the consistency of data is extremely crucial. This is the reason why most transactional updates are not allowed on the cloud. Example: sequence objects, which are almost a given in traditional databases, are not provided, probably because maintaining state across machines for such statements is non-trivial. Problems like distributed updates, locking, partitioning, sharding etc. arise when dealing with data. Constructs for handling them should be provided to users, as most of them are a given in the non-cloud deployment space.
    Most datastores provided by cloud vendors (except the ones that provide cloud-based database services) do not support relational models, which means all object relations have to be established programmatically. This can easily lead to bad code, unnecessary joins, cascading problems and tons of other problems that developers faced before they had relational datastores to work with.
  3. Program verification: One of the biggest worries about deploying applications on the cloud is the correctness of the program in execution. Erroneous conditions, like infinite loops, can not only put the machine at risk of being overloaded and unavailable, but also cost the user a significant amount of money. Techniques like static analysis should be used to analyze code uploaded to the cloud and check it for infinite loops, possible race conditions, null references, unreachable code etc. The uploaded code should also be optimized, or suggestions should be provided to the users about how they could optimize it to best utilize the available resources.
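
As a rough illustration of the static analysis idea in item 3, here is a small sketch using Python's standard `ast` module. The function name `find_suspect_loops` and the sample code are hypothetical, and the check is only a heuristic: detecting infinite loops in general is undecidable, so a real provider-side tool would need to be far more careful than this.

```python
import ast


def find_suspect_loops(source):
    """Return line numbers of `while` loops whose condition is a constant
    true value and whose body contains no `break`, `return` or `raise`.

    Heuristic only: it misses most infinite loops, and a `break` that
    belongs to a nested inner loop is (wrongly) counted as an exit.
    """
    suspects = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        test = node.test
        always_true = isinstance(test, ast.Constant) and bool(test.value)
        if not always_true:
            continue
        has_exit = any(
            isinstance(child, (ast.Break, ast.Return, ast.Raise))
            for stmt in node.body
            for child in ast.walk(stmt)
        )
        if not has_exit:
            suspects.append(node.lineno)
    return sorted(suspects)


# Hypothetical uploaded code: the first loop never exits, the second does.
SAMPLE = """
def poll():
    while True:
        work()

def read(q):
    while True:
        item = q.get()
        if item is None:
            break
"""

print(find_suspect_loops(SAMPLE))
```

A provider could run a check like this at upload time and warn the user rather than reject the code, since a flagged loop may still be intentional (a worker loop, for instance).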

Conclusion: The cloud should become a complete, nonrestrictive platform for applications. There should be no restrictions on the constructs, functionality and privileges available on the cloud. Also, it should be dead simple to move everyday applications onto the cloud without too much rework. This could mean writing migration utilities, import/export options and other artifacts that make the transition to a cloud much easier. This will prove essential, as most live applications, at least currently, do not run on a cloud, and helping them migrate easily will mean more revenue and adoption.

Trends in online advertising

December 7, 2008 at 12:49 pm | In Trends-Predictions, search | 1 Comment

Advertising has come a long way since its inception. The simple concept of endorsing links to online resources has become a dominant force on the web. But given the state of online advertising now, what is the roadmap, and what can we expect in the future?

The answer is of course non-trivial and I will only make a fool of myself by trying to predict it, but there are certain inferences I have made based on my observations, which I shall pen down. I divide the broad category of PC users into prospects and adless users. Prospects are users who are new to or ignorant of the concept of online advertising, people like my mother, who doesn't know that people endorse links for money. This category could also include people who are open to targeted advertising and see value in it. These are people who wholeheartedly click on interesting links. The other category I call adless users: users who have been around the internet long enough to recognize irrelevant ads and can spot and ignore ads in a page.

The more time a user spends online, the higher the probability that he/she realizes the web is filled with irrelevant ads and, over time, becomes an adless user. As a result, almost all users tend to move towards becoming adless users. This is dangerous for marketers, ad companies, publishers etc., as there is a whole ecosystem depending solely on money made out of ads. As new users discover the web, their prospect phase is what publishers can hope to cash in on, but eventually the shift will happen. What happens then?

Search engines are arguably the best places for advertising and probably the best place for demonstrating the phenomenon I call intrusive or endorsed content. Take the example below.

[Image: Advertising Meltdown]

Ads will stop being sidekicks and move into the foreground; I have shown the shift pictorially. PayPerPost got the next concept right: people won't read ads, but they will read social media, so pay people to write about your product/service etc. More results on search engines will be endorsed, and most of them already are. How do you know that a review you are reading of some product isn't already endorsed? Now here is the strange loop bit: you could say you will search for bad reviews instead of good ones. It won't take long for the advertisers to see this trend as well and then pay people to write moderately bad reviews, in turn endorsing the product. You know that they know that you are looking for bad reviews!!

A surprising result on top caught my eye. A visit to the site will tell you immediately that the site isn't half as good as the second or the third result, but still it's on top. SEO has come a long way, and cheating search engines into making a page popular isn't that hard. You can hire professionals to do that job. That, in a weird sense, is a form of endorsing: a professional SEO group can start taking bids for making pages more popular and start its own cartel for endorsed content.

The other strange phenomenon I see is that people recognize big search brands, Google in particular, but don't necessarily relate to the results (you can't possibly relate to the results). You could have the Google homepage serving ads from ask.com and nobody would know the difference if the results looked like Google returned them. That's probably the reason there are still companies trying to capitalize on the search market. Take a look at the results page below.

[Image: Ad meltdown 2]

In this case the difference between a result from the index and an endorsement is a mere patch of color. How difficult do you think it is to remove that demarcation during difficult times? Ethical boundaries as meagre as color differences can be crossed very easily, and corporations have shown time and again that it can be done.

Thanks to the falling prices of bandwidth and also to social media, video is the next big delivery mechanism, and it was quite understandable that Google paid a billion and a half to capitalize on YouTube's huge market share and put intrusive ads on videos (you don't have a choice there, no Adblock Plus!!). The same goes for pictures and audio. Radio, papers and television have been doing it for years.

The world thought that we had moved away from pop-up advertising, but we have just made the situation far worse. Ads will become more and more intrusive, and there could come a time when content and advertisement are indistinguishable. More on this later.

Do you program this way ?

July 30, 2008 at 6:23 pm | In Trends-Predictions | Leave a Comment

Programming has evolved beyond comprehension. Every day you hear of some innovation either in programming languages or in techniques. Languages themselves are evolving to keep up with current development trends. I have been programming for almost ten years and am still amazed by the new programming techniques. Here are some of the cool innovations that I have seen in programming.

- Scaffolding: this Rails feature makes the MVC architecture both dead simple and mandatory. Just run the rake and rails commands to create all the MVC functionality. Add some jazz to the front end and you can make a standard database application in under 10 minutes.

- .NET web services: during my struggle with web services in Java, I never realized that most of the programming I was doing with Axis could actually be automated, and that's what Visual Studio did. Define a web service, ask it to generate functions for it, and you won't believe the functionality that it implements. VS then goes on to wrap the service with a class and all its calls and exceptions. Imagine my surprise when I saw a 1000-line program generated by just the click of a button.

- Chaining: imagine doing banner.generate().show().navigate().hide().unload(). Ruby, jQuery and, to a lesser extent, Python allow it.

- Reflection: going through an entire package structure (jars) or through assemblies (dlls) to figure out what class you want to instantiate, and then creating an object of it at run time. Tons of design patterns are becoming obsolete thanks to reflection.

- AOP: using AOP has been a fad for a long time, but a recent technique made me think twice about the power of AOP. Imagine a tool that uses deep reflection to figure out where you access the database and then automatically adds the aspects that are relevant to the module. The programmer doesn't even need to know what aspects are essential to the module he/she is developing. The tool automatically adds the functionality. Example: you are writing into the database; the tool figures out that database entries should be 1024-bit encrypted and adds the required encryption functionality. Now isn't that cool?

- Program the cloud: write programs without knowing the hardware behind them; just know that your program will scale like no other using the underlying cluster of hardware that makes up the cloud.
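
Two of the items above, chaining and reflection, are easy to sketch in a few lines of Python. The `Banner` class and the `instantiate` helper are hypothetical names made up for this illustration; `importlib.import_module` and `getattr` are the standard-library pieces that do the reflective lookup.

```python
import importlib


class Banner:
    """Hypothetical class echoing the banner example in the list above."""

    def __init__(self):
        self.log = []

    def generate(self):
        self.log.append("generate")
        return self  # returning self is what makes chaining possible

    def show(self):
        self.log.append("show")
        return self

    def hide(self):
        self.log.append("hide")
        return self


def instantiate(module_name, class_name, *args):
    """Reflection: look a class up by name at run time and create an object."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(*args)


# Chaining: each call returns the same object, so calls can be strung together.
banner = Banner().generate().show().hide()
print(banner.log)

# Reflection: the class to create is chosen by name, not hard-coded.
counter = instantiate("collections", "Counter", "hello")
print(counter["l"])
```

In a real application the module and class names passed to `instantiate` would typically come from configuration, which is how reflection ends up replacing some factory-style design patterns.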

 These are the ones that I can remember while writing this post. Do add more programming innovations that you come across in the comments.


Are days of the RDBMS numbered ?

December 19, 2007 at 8:01 pm | In Architecture - Design, Trends-Predictions, rant | Leave a Comment

Most programmers know databases and their importance. Thanks to the new generation of software as a service and web services, traditional RDBMSs are being used more sparingly, and their usage is bound to decline further as enterprises adopt the SaaS platform.

Data has far outgrown the domain of plain text. Today we talk of multimedia data, URLs, semantic data and many more application-specific formats. Information on the web comes as JSON, XML, microformats etc., often served over REST interfaces. With this variety in data formats and representations comes an inherent need for flexibility in the storage and querying of such information. Almost all database users know of the conceptual modelling required for the design of any database, the key principle being that the tighter the model, the more efficient the database. The integrity of the database is only as good as the integrity of the data. But you cannot talk of data integrity with the kind of formats available today.

Clearly, markup data dominates the web. Though databases have developed features to better support, store and validate markup data, databases were never initially designed to store such a wide variety of loosely organized data. Querying such markup data is fruitless, and so is any attempt to index, sort or aggregate it. Developing a custom database capable of all the above operations could be a solution, but given the non-standardized nature of this data and the likelihood that it will change, you would have a tough time scouring the web for changes. Plus, these databases would not be semantically interoperable.

Developers are taking notice of a new scheme of storing data; I call it the bucket store. The design is roughly the same as that of a hash table, where data blocks are stored in buckets and hashes are used to index or refer to these buckets. A little improvisation, in the form of upper layers like domains, groups and so on that play the role of the schemas and tables of a database, makes the data easily classifiable. The advantages of this scheme are heterogeneity in data formats and the absence of constraints.
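
The bucket store idea can be sketched as a toy in Python. This is an in-memory illustration under my own assumptions (the `BucketStore` class, its method names and the fixed bucket count are all made up); it is not how S3, SimpleDB or CouchDB are implemented.

```python
import hashlib


class BucketStore:
    """Toy bucket store: domain -> fixed number of buckets -> key/value blobs."""

    def __init__(self, buckets_per_domain=8):
        self.buckets_per_domain = buckets_per_domain
        self.domains = {}

    def _bucket(self, domain, key):
        # Hash the key to choose a bucket inside the domain.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        index = int(digest, 16) % self.buckets_per_domain
        buckets = self.domains.setdefault(
            domain, [dict() for _ in range(self.buckets_per_domain)]
        )
        return buckets[index]

    def put(self, domain, key, value):
        self._bucket(domain, key)[key] = value

    def get(self, domain, key, default=None):
        return self._bucket(domain, key).get(key, default)


# Values are opaque blobs: a dict, an XML string and raw bytes sit side by
# side, and no schema or constraint is enforced on any of them.
store = BucketStore()
store.put("slides", "deck-1", {"title": "Intro", "pages": 12})
store.put("slides", "deck-2", "<deck><title>Raw XML</title></deck>")
store.put("photos", "deck-1", b"\x89PNG")
print(store.get("slides", "deck-1")["pages"])
```

The domain layer plays the role that a table plays in a database: the same key (`deck-1`) can exist in two domains without clashing, but nothing stops a caller from storing differently shaped values under one domain.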

Several products offer such services at dirt-cheap prices. Take Amazon's S3, the recently launched SimpleDB, or CouchDB, which offers a host-it-yourself version of this kind of storage. Amazon S3 has businesses running on top of it; of the many, I can recall SlideShare running on S3. With the advent of more mashups and heterogeneous data being churned out by the web, more such non-DBMS storage options will be employed. Provided that this paradigm implements all the features important to enterprises, like security, access control, backups, transactions etc., and that mature modeling methodologies that can rival ER modeling are proposed, I don't see any problem in this becoming the most viable and cost-effective option for data storage.

The New Digital Divide – saga of the legacy lovers

November 25, 2007 at 3:23 pm | In Trends-Predictions, Unsolved Problems, rant | Leave a Comment

Gone are the days of the digital divide; there is a new kind of divide amongst many computer professionals now. It's the generation gap. It's hard to comprehend this statement, but anybody who has been exposed to at least 5 years of industry dynamics will know exactly what I am talking about. Call it Moore's law affecting software or just the plain old generation gap, there is a clear demarcation between people who appreciate new concepts and those who prefer things the 90s way.

There is a set of people who like the innovation happening on the web front and are adopting 2.0 technologies like there is no tomorrow. Everything from office automation to project management is now managed online with productivity service providers. Concepts like wikis, blogs, forums etc. are fast appearing as mainstream applications in organizations. Surely, as technology evolves and takes new shape, we will see a dramatic shift in the adoption of these new tools.

In contrast, there are the other people, who have been around for a long time and have seen a lot of productivity applications. To these people, technology is nothing more than a fast-changing fad, and they prefer to stick to their old-time favorites. Take people who have seen the mainframe era; such folk just don't appreciate concepts like distributed computing, virtual servers etc. Quotes like "our mainframes never needed mirroring" are common. These are people who still live reminiscing about the innovations of their time, like spreadsheets and ERPs.

It may be hard to believe, but these people form the majority of the so-called power users of organizations, and legacy software (pun intended) is maintained and supported just for their use. It's disturbing to know that enterprise software lags open source software by at least 3 years in terms of innovation. This lag can clearly be attributed to the legacy lovers who insist on using the software they are accustomed to. Where does product development go in such a case? Office 2007 is seeing very slow adoption due to a change in its usability. Will this set of users be responsible for the sluggishness of product development? Who will convince these users to adopt newer software? More importantly, how? What will these users demand 20 years from now?

It's a strange question, but yes, it's an emerging market.

Personalization is one cookie away

November 25, 2007 at 3:18 pm | In Trends-Predictions, Unsolved Problems, Web News, rant | Leave a Comment

I wrote about personalization some time back and about how we should actually be approaching this problem. Google has got its act together and is building your own lightweight personalization meter, but it's for ads :-(

Google is going to put a cookie in your browser that will record information every time you read an ad served by Google.

Online Community Organizer – a job for the future

July 20, 2007 at 4:40 pm | In Trends-Predictions, rant, socionets | Leave a Comment

Note : I blog on my personal space at riteshnayak.com/blog . This is a mirror of the content.

Everybody's writing about the new social organizer phenomenon, so I thought I could add my two cents to it.

What if you want to hire someone to build an online community? Somebody to create and maintain a virtual world in which all the players in an industry feel like they need to be part of it? It would help if that person understood technology, at least well enough to know what it could do. They would need to be able to write. But they also have to be able to seduce stragglers into joining the group in the first place, so they have to be able to understand a marketplace, do outbound selling and non-electronic communications.

Seth Godin writes about the Online Community Organizer as the job of the future.


Collaborative apps and Collective human intelligence

July 20, 2007 at 4:35 pm | In Architecture - Design, Trends-Predictions, Web 2.0, rant | Leave a Comment

Collaborative apps have been around for quite some time now, but they have stayed very close to corporate apps, which are used primarily in a business scenario. A simple example would be the productivity 2.0 apps like Zoho or Google Docs. The only other breed of collaborative app has been games, which is again a huge draw. It's true that this genre of applications is still finding its foothold on the web, and as time progresses you will find killer new applications that explore new possibilities with collaborative apps.

I had written about Amazon's Mechanical Turk and how it used the power of collaboration combined with automated project management to get arduous work done by people. Taking and extending the same paradigm are newer applications that try to achieve some good through collaboration. Just as the SETI project uses your computational resources when idle, these applications use the power of human intelligence to contribute to a greater cause.


Getting Familiar with Google Gears

July 20, 2007 at 4:32 pm | In Trends-Predictions, Web 2.0, socionets | Leave a Comment

Google Gears was released recently as an effort to promote the offline web. I have written time and again about this genre of web applications and have spoken about advancements like the Dojo Offline Toolkit, AIR and the new Silverlight that try to blur the line between web and desktop applications.

Google Gears is designed ingeniously. Gears is an ActiveX plugin on IE and an XPI on Firefox (both installable). Gears then works in your browser for any application designed to use the Gears technology. The foremost application that uses Gears is Google Reader, which can store and retrieve almost 2000 articles. The transition between the online and offline web is supposed to be seamless, with one taking over when the connectivity is out and the other when it's back. In Reader, you have to explicitly make the shift from online to offline, something like the work offline option in IE.
