Are the days of the RDBMS numbered?

December 19, 2007 at 8:01 pm | In Architecture - Design, Trends-Predictions, rant | No Comments

Most programmers know databases and their importance. Thanks to the new generation of software as a service and web services, traditional RDBMSs are used more sparingly, and their use is bound to decline further as enterprises adopt the SaaS platform.

Data has far outgrown the domain of plain text. Today we talk of multimedia data, URLs, semantic data and many more application-specific formats. Information on the Web travels as JSON, XML and microformats, often over REST interfaces. With this variety in data formats and representations comes an inherent need for flexibility in storing and querying such information. Almost all database users know the conceptual modelling that any database design requires, the key principle being that the tighter the model, the more efficient the database. The integrity of the database is only as good as the integrity of the data. But you cannot talk of data integrity with the kinds of formats available today.

Clearly, markup data dominates the web. Though databases have developed features to better support, store and validate markup data, they were never originally designed to store such a wide variety of loosely organized data. Querying such markup data is fruitless, and so is any attempt to index, sort or aggregate it. Developing a custom database capable of all these operations could be a solution, but given the non-standardized nature of this data and how likely it is to change, you would have a tough time scouring the web for changes. Plus, these databases would not be semantically interoperable.

Developers are taking notice of a new scheme for storing data; I call it the bucket store. The design is roughly that of a hash table: data blocks are stored in buckets, and hashes are used to index or refer to these buckets. With a little improvisation, upper layers such as domains and groups are added to play the role that schemas and tables play in a database, which makes the data easy to classify. The advantages of this scheme are heterogeneity in data formats and the absence of constraints.
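To make the bucket store idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the hash-table-with-domains design described above, not how S3, SimpleDB or CouchDB are actually built. The class name `BucketStore`, its methods and the `buckets_per_domain` parameter are all made up for this example.

```python
import hashlib
from collections import defaultdict


class BucketStore:
    """Toy in-memory bucket store: domain -> bucket number -> {key: value}."""

    def __init__(self, buckets_per_domain=16):
        # Hypothetical tuning knob: how many buckets each domain is split into.
        self.buckets_per_domain = buckets_per_domain
        # A domain plays the role a schema or table plays in an RDBMS.
        self._domains = defaultdict(lambda: defaultdict(dict))

    def _bucket_id(self, key):
        # Hash the key to pick a bucket; any stable hash function would do.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.buckets_per_domain

    def put(self, domain, key, value):
        # No schema and no constraints: the value can be any object.
        self._domains[domain][self._bucket_id(key)][key] = value

    def get(self, domain, key, default=None):
        # Chained .get calls avoid creating empty domains or buckets on reads.
        bucket = self._domains.get(domain, {}).get(self._bucket_id(key), {})
        return bucket.get(key, default)


store = BucketStore()
# Two items with completely different shapes live in the same domain.
store.put("slides", "deck-1", {"title": "Intro", "tags": ["web"]})
store.put("slides", "deck-2", "<deck><title>Raw XML</title></deck>")
print(store.get("slides", "deck-1")["title"])  # prints Intro
```

Notice that nothing stops a dictionary and a raw XML string from sitting side by side in the same domain. That is exactly the heterogeneity, and the missing integrity checking, that sets this scheme apart from a relational table.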

Several products offer such services at dirt-cheap prices. Take Amazon’s S3, the recently launched SimpleDB, or CouchDB, which offers a host-it-yourself version of this kind of storage. Amazon S3 has businesses running on top of it; of the many, I can recall SlideShare. As the web churns out more mashups and heterogeneous data, more such non-DBMS storage options will be employed. Provided that this paradigm implements all the features important to enterprises, such as security, access control, backups and transactions, and that mature modeling methodologies that can rival ER are proposed, I don’t see any problem in this becoming the most viable and cost-effective option for data storage.

Hahaha

December 19, 2007 at 3:56 pm | In funny | No Comments

Won’t you blog about this song?

Here Comes Another Bubble - The Richter Scales
Uploaded by kingofdoper

Analytics gets a facelift and new tracking code

December 19, 2007 at 3:29 pm | In D/w-BI-Analytics, Web News | No Comments

Google updated its Analytics product by launching a new tracking script named ga.js. The new script handles more complexity than the original urchin.js script, and the Analytics tool itself has some new features that can be used only with ga.js tracking. You can also compare multiple metrics on the same graph, thanks to the new graphing features included in the recent update. The updates seem really neat, and there is a detailed PDF with migration instructions and how-tos.

The older tracking code, urchin.js, will continue to work for at least another year, but it will get no feature updates and will not be compatible with the newer features. For me, this change means revisiting lots (and I mean lots) of pages where I have embedded the tracking code. Let’s hope it goes smoothly.
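For anyone facing the same chore, a small script can at least list the pages that still reference the old script. This is a rough sketch, assuming the pages are static HTML files on disk and that the old code can be spotted by the string urchin.js. The function name `find_legacy_pages` and the `*.html` pattern are my own choices, and the script only finds pages; it does not rewrite them.

```python
from pathlib import Path

LEGACY_MARKER = "urchin.js"  # the older tracking script named above


def find_legacy_pages(root, pattern="*.html"):
    """Return the files under root whose text still mentions the legacy script."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        # Ignore decoding errors so one oddly encoded page does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if LEGACY_MARKER in text:
            hits.append(path)
    return hits
```

Calling `find_legacy_pages("path/to/site")` returns a sorted list of paths to work through. Pages generated from a template or a database would need a different approach.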

Computers are better marketers than AdSense publishers

December 19, 2007 at 3:28 pm | In Web News | No Comments

The Algorithm is taking over and there is nothing we can do :-)

In what can be termed a not-so-wise decision, Google has decided to strip its customers of the privilege of choosing where to put their ads. You will no longer see the “Advertise on this site” link on the Google ads served up. Whether this was a good decision is still debatable. The reason: the ads that people place on sites themselves do not get clicked as much as the ones that the AdSense computers pick out. The numbers back up their theory.

It’s still a feeling of insecurity, if you ask me. As a publisher, I would like to publish my ads on certain sites that I know will give me better ROI. This situation is very similar to the proof of the four color theorem, which was carried out by computer and is hard for a human to check by hand. How will Google prove that the ads served through the AdSense engines really do help in better marketing the publisher’s wares? Is there even a quantifiable measure to check the validity of those ads? Sure, the numbers back up the case, but the minority could definitely have done better in terms of sales. Not all transactions on AdSense show up as ROI; it is only what the publisher decides to track. If the publisher has no shopping cart or online selling model, how will the AdSense impact be felt? All these questions remain unanswered. But for now the algorithm rules supreme.

How easy is it to build web properties?

December 5, 2007 at 12:24 pm | In Web 2.0, open source | No Comments

You may think building a web property needs not just code but scalable hardware, but you will be surprised to know you don’t need anything but a browser. Thanks to hosted infrastructure, you can run and control your online business from within a browser. Even the former, code, is used sparingly in today’s businesses. That’s the flexibility the web offers.

Let’s look at some specifics to see what I am really talking about. You will say the first thing a web-based venture needs is servers and colocation centres. WRONG!! You have EC2 or EDGE grids for computing, S3 or similar services for storage, and even if you do require the pleasures of your own server, you can try one of the virtualization technologies offered by web hosts. That should just about cover the hardware side of things, unless you are trying to break the record for calculating the largest number of decimal digits of Pi (which stood frighteningly close to a trillion digits when I last checked).

Now comes the software side of things. Most web businesses thrive on prebuilt, ready-to-deploy open source software. Be it blogs, wikis, forums or bug tracking tools, there is always an open source package for whatever it is that you want to do. What’s even better is that some of these packages are offered as hosted services, so all you need to do is include some paths and you are up and running with the latest and most stable version of the software. The advantage of this scheme is that you don’t need to upgrade your installations manually; the provider does it.

Supposing you do require the luxury of your own server, then invest in one of the upcoming virtualization technologies provided by many top hosts. They give you shell access and the comforts of your own server complete with install privileges and best of all, you don’t maintain it.

So, now that the hardware and software parts of the company are done, all you have to do is think of an innovative idea and get the ball rolling.

Towards Semantic Interoperability

December 5, 2007 at 12:22 pm | In Unsolved Problems, rant | 1 Comment

Is it true? Is the Indian IT industry surviving because of a lack of semantic interoperability?

A lack of semantic interoperability is a mismatch in the format and representation of data between two parallel applications that prevents them from interacting with each other or blocks migration from one to the other. Take ERPs like SAP and Oracle Apps: they essentially perform the same computations and solve the same problems, but do they interoperate? No. The data representations of both applications are bespoke, unique to one product or line of products. That is understandable, considering that market domination is obtained by developing custom formats and providing custom decoders, but there is a bigger concern. Large players collate their data, usually with a Business Intelligence solution in place, and with different products deployed at different centres, the additional overhead of converting the data is inevitable. Also, two separate applications cannot talk to each other even when they are linked up sequentially; you have to bring in additional middleware for format conversion and build a common bus.
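The format-conversion middleware mentioned above can be pictured as a pair of field mappings onto a shared vocabulary. The Python sketch below is purely illustrative: the field names, the two mapping tables and the functions `to_canonical` and `from_canonical` are hypothetical, and none of them is taken from SAP, Oracle Apps or any real product.

```python
# Hypothetical field names: neither record layout comes from a real product.
ERP_A_TO_CANONICAL = {"cust_no": "customer_id", "amt": "amount", "curr": "currency"}
ERP_B_TO_CANONICAL = {
    "CustomerNumber": "customer_id",
    "Total": "amount",
    "CurrencyCode": "currency",
}


def to_canonical(record, mapping):
    """Rename one application's fields to the vocabulary shared on the bus."""
    missing = [field for field in mapping if field not in record]
    if missing:
        raise KeyError(f"record lacks expected fields: {missing}")
    return {canonical: record[source] for source, canonical in mapping.items()}


def from_canonical(record, mapping):
    """Inverse of to_canonical: rename shared fields back to an application's names."""
    return {source: record[canonical] for source, canonical in mapping.items()}


invoice_a = {"cust_no": "C-17", "amt": 250.0, "curr": "INR"}
on_the_bus = to_canonical(invoice_a, ERP_A_TO_CANONICAL)
invoice_b = from_canonical(on_the_bus, ERP_B_TO_CANONICAL)
print(invoice_b)
# prints {'CustomerNumber': 'C-17', 'Total': 250.0, 'CurrencyCode': 'INR'}
```

A bridge like this only renames fields. Whether "amount" means the same thing on both sides (with or without tax, say) is a question of semantics, and no mapping table answers it. Every new application also needs its own mapping, which is where the maintenance work piles up.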

Efforts like RSS, SOAP and others have been successful only to the extent that the format is correct; the semantics are still loose. It has to come together sometime, if not now then later, when the data is huge. And the answer is yes, the Indian IT industry survives because of this lack of semantic interoperability. Most of the workforce is maintaining these enterprise bridges (which hold together these different applications), if not building them. A majority of the work in the service sector has to do with writing compatibility plugins, migration scripts and patch-ups. If we had SI, we wouldn’t have jobs.
