Is that data credible?

April 28, 2007 at 4:01 pm | In Unsolved Problems, rant

I just read this wonderful article by Nicholas Carr, which talks about information credibility on sites like Wikipedia. Sure, the phenomenon is big, with almost everyone turning to Wikipedia as a substitute for an encyclopedia, but is that information credible? It's common folk like us (not experts) who add entries to wikis, and we are free to contort facts as we please. Since wikis have become dependencies in many cases, I see a disturbing trend of technology and information being abused through the acts of a few. It's evolutionary, I suppose, which Nicholas points out subtly by quoting an article by Larry Sanger (of Wikipedia fame).

 In the Middle Ages, we were told what we knew by the Church; after the printing press and the Reformation, by state censors and the licensers of publishers; with the rise of liberalism in the 19th and 20th centuries, by publishers themselves, and later by broadcast media - in any case, by a small, elite group of professionals.

Larry Sanger

In the midst of all this, at least there is a realization that not all information is credible, and that who controls or contributes the information is as essential as the information itself. I wouldn't know what category to put that into, but we have an unsolved problem on our hands: information relevance and credibility. Though the promise of a semantic web does tend to soften the relevance, or contextual, aspect of the problem, the credibility part of it is lost. Take this blog post, for example: the quoted text is actually from Larry Sanger's article, which was referenced in Nicholas Carr's article, which in turn found its way here. Now, supposing I were to look at that text alone without knowing the background, how would I know who was responsible for that piece of information?
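To make that quotation chain concrete, here is a rough sketch of what carrying the origin along with the text could look like. It is only an illustration under my own assumptions: the names `ProvenanceRecord`, `make_record`, `requote` and `matches` are made up for this example, and the digest is a plain SHA-256 hash, not a real digital signature.

```python
import hashlib
from dataclasses import dataclass


def digest(text: str) -> str:
    """Return a SHA-256 hex digest of the text, ignoring whitespace differences."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record tying a piece of text to the people who passed it on."""
    text_digest: str
    chain: tuple  # ordered from the original author to the latest quoter


def make_record(text: str, author: str) -> ProvenanceRecord:
    """Start a chain with the original author."""
    return ProvenanceRecord(digest(text), (author,))


def requote(record: ProvenanceRecord, quoter: str) -> ProvenanceRecord:
    """Return a new record with one more hop appended to the chain."""
    return ProvenanceRecord(record.text_digest, record.chain + (quoter,))


def matches(record: ProvenanceRecord, text: str) -> bool:
    """Check that the text is still the one the record was created for."""
    return record.text_digest == digest(text)


quote = "In the Middle Ages, we were told what we knew by the Church"
record = make_record(quote, "Larry Sanger")
record = requote(record, "Nicholas Carr")
record = requote(record, "this blog post")

print(" -> ".join(record.chain))  # Larry Sanger -> Nicholas Carr -> this blog post
print(matches(record, quote))  # True
print(matches(record, quote.replace("Church", "State")))  # False
```

A record like this only says where the text claims to have come from and whether it was altered along the way. Nothing stops someone from writing a false chain, and it says nothing about whether the quoted statement is true.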

Knowledge and information management techniques are not well equipped to handle such situations. Though images have watermarks and EXIF information, text has only a very loosely coupled digital signature. Though such systems could probably solve the origination problem, the problem of information purity and of telling fact from fiction is still left unaddressed. A conservative solution of allowing experts to monitor and control data (which Larry Sanger's next project, Citizendium, plans to implement) just abstracts the problem to a higher level. Besides, how many experts can you find in specialized areas who would voluntarily monitor the data available in their domain? More work and research needs to be done to ascertain the accuracy of data and, more importantly, of facts. I cannot picture machines solving the problem on their own, though a combination of semantically linked documents and a lot of mining could give partial results. What do you think? Have you come across any solutions to this sort of problem?
