March 09, 2004

Mining Without Mesothelioma

Posted by nerdling | March 9, 2004 10:32 AM

Though Google is by far the best search engine I've encountered, it has its limitations. Search engines build their indexes by following links from one static HTML page to the next, so anything that can't be reached that way never gets indexed. That means even Google, in all its power and majesty, can crawl only an estimated 1% of the web. Most of the remaining information makes up what is called the deep Web: a complex maze of databases and other structured information that cannot be searched because it sits outside the linked HTML pages that standard search engines are built to index.
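To make the limitation concrete, here is a minimal sketch in Python of a crawler that only follows hyperlinks. Everything in it is an assumption for illustration: the `SITE` dictionary is a made-up toy site, and `LinkExtractor` and `crawl` are hypothetical names, not how Google or any real engine is implemented. The point is only that a page generated in response to a form query is never discovered, because no link leads to it.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical toy site: paths mapped to HTML. Not a real website.
SITE = {
    "/": '<a href="/about">About</a> <a href="/records">Records</a>',
    "/about": '<p>About us.</p> <a href="/">Home</a>',
    "/records": '<form action="/search"><input name="q"></form>',
    # Exists only as the answer to a form query; nothing links to it.
    "/search?q=asbestos": "<p>Record 1187: asbestos exposure data</p>",
}


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags and ignores forms entirely."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Breadth-first crawl that follows hyperlinks only."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        parser = LinkExtractor()
        parser.feed(SITE.get(path, ""))
        for link in parser.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/")
hidden = set(SITE) - indexed
print("indexed:", sorted(indexed))
print("never reached:", sorted(hidden))
```

The crawler reaches the three linked pages but never sees the database record, since it would have to fill in and submit the form on `/records` to get there. That unreached page stands in for the deep Web.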

That, however, is starting to change, and the result could be a major revolution in the way we view information on the internet, as well as in how much and what kind of information the average person is able to view. Government documents, databases of customer information and scientific research are just the beginning. Yahoo! recently announced the new direction it is pursuing with its search engine now that it has split with Google: the Content Acquisition Program (CAP). The announcement also reveals what promises to be the future of web search technology: mining the deep Web.
