Monday, February 23, 2009

Exploring a ‘Deep Web’ That Google Can’t Grasp


Jeffrey D. Allred for The New York Times

At the University of Utah, Prof. Juliana Freire is working on DeepPeep, an ambitious effort to index every public database online.

By ALEX WRIGHT
Published: February 22, 2009

One day last summer, Google’s search engine trundled quietly past a milestone. It added the one trillionth address to the list of Web pages it knows about. But as impossibly big as that number may seem, it represents only a fraction of the entire Web.

Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.

The challenges that the major search engines face in penetrating this so-called Deep Web go a long way toward explaining why they still can’t provide satisfying answers to questions like “What’s the best fare from New York to London next Thursday?” The answers are readily available — if only the search engines knew how to find them.

Now a new breed of technologies is taking shape that will extend the reach of search engines into the Web’s hidden corners. When that happens, it will do more than just improve the quality of search results — it may ultimately reshape the way many companies do business online.

Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. While that approach works well for the pages that make up the surface Web, these programs have a harder time penetrating databases that are set up to respond to typed queries.
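
The link-following behaviour described above can be sketched in a few lines. This is a minimal illustration over a made-up in-memory "web", not any search engine's actual crawler; the page names and the `TOY_WEB` structure are assumptions introduced for the example.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "catalog-search"],
    "about": ["home"],
    "catalog-search": [],   # a search form: no links to the records behind it
    "record-17": ["home"],  # reachable only by submitting a query to the form
}

def crawl(web, start):
    """Breadth-first traversal that follows hyperlinks only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

reached = crawl(TOY_WEB, "home")
```

Because no page links to `record-17`, the traversal never reaches it, which is the sense in which database content stays hidden from a crawler that only follows links.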

“The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (www.kosmix.com), a Deep Web search start-up whose investors include Jeffrey P. Bezos, chief executive of Amazon.com. Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources.

“Most search engines try to help you find a needle in a haystack,” Mr. Rajaraman said, “but what we’re trying to do is help you explore the haystack.”

That haystack is infinitely large. With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine — no matter how powerful — to sift through every possible combination of data on the fly.

To extract meaningful data from the Deep Web, search engines have to analyze users’ search terms and figure out how to broker those queries to particular databases. For example, if a user types in “Rembrandt,” the search engine needs to know which databases are most likely to contain information about art (say, museum catalogs or auction houses), and what kinds of queries those databases will accept.
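
One simple way to picture the brokering step is to score each database by how many of the query's terms it is known to contain. The sketch below assumes such term profiles already exist; the database names, their vocabularies, and the `route_query` function are all hypothetical and stand in for whatever a real system would learn.

```python
# Hypothetical catalog: database name -> terms known to occur in it.
DATABASE_PROFILES = {
    "museum_catalog": {"rembrandt", "vermeer", "painting", "etching"},
    "auction_house": {"rembrandt", "picasso", "lot", "estimate"},
    "flight_schedules": {"london", "fare", "departure"},
}

def route_query(query, profiles):
    """Return database names ranked by how many query terms each profile contains."""
    terms = set(query.lower().split())
    scores = {name: len(terms & vocab) for name, vocab in profiles.items()}
    # Keep only databases with at least one matching term; break ties by name.
    return sorted(
        (name for name, score in scores.items() if score > 0),
        key=lambda name: (-scores[name], name),
    )

art_targets = route_query("Rembrandt", DATABASE_PROFILES)
```

Here a query for "Rembrandt" is routed to the two art-related databases and never sent to the flight schedules.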

That approach may sound straightforward in theory, but in practice the vast variety of database structures and possible search terms poses a thorny computational challenge.

“This is the most interesting data integration problem imaginable,” says Alon Halevy, a former computer science professor at the University of Washington who is now leading a team at Google that is trying to solve the Deep Web conundrum.

Google’s Deep Web search strategy involves sending out a program to analyze the contents of every database it encounters. For example, if the search engine finds a page with a form related to fine art, it starts guessing likely search terms — “Rembrandt,” “Picasso,” “Vermeer” and so on — until one of those terms returns a match. The search engine then analyzes the results and develops a predictive model of what the database contains.

In a similar vein, Prof. Juliana Freire at the University of Utah is working on an ambitious project called DeepPeep (www.deeppeep.org) that eventually aims to crawl and index every database on the public Web. Extracting the contents of so many far-flung data sets requires a sophisticated kind of computational guessing game.

“The naïve way would be to query all the words in the dictionary,” Ms. Freire said. Instead, DeepPeep starts by posing a small number of sample queries, “so we can then use that to build up our understanding of the databases and choose which words to search.”

Based on that analysis, the program then fires off automated search terms in an effort to dislodge as much data as possible. Ms. Freire claims that her approach retrieves better than 90 percent of the content stored in any given database. Ms. Freire’s work has recently attracted overtures from one of the major search engine companies.
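
The guessing game described above can be illustrated with a toy version: start from a few sample queries, then reuse words found in the results as further queries. This is only a sketch of the general idea, not DeepPeep's or Google's actual method; the `RECORDS` data and the `search` and `probe` functions are assumptions made up for the example.

```python
# Hypothetical hidden database: records are visible only through search().
RECORDS = [
    "rembrandt self portrait etching",
    "vermeer girl pearl earring painting",
    "picasso guernica painting",
    "monet water lilies painting",
    "rodin thinker sculpture",
]

def search(term):
    """Stand-in for submitting one term to a search form."""
    return {i: text for i, text in enumerate(RECORDS) if term in text.split()}

def probe(seed_terms, max_queries=20):
    """Issue seed queries, then reuse words found in results as further queries."""
    pending = list(seed_terms)
    tried = set()
    found = set()
    while pending and len(tried) < max_queries:
        term = pending.pop(0)
        if term in tried:
            continue
        tried.add(term)
        for rec_id, text in search(term).items():
            if rec_id not in found:
                found.add(rec_id)
                # Words from newly discovered records become candidate queries.
                pending.extend(w for w in text.split() if w not in tried)
    return found, tried

found, tried = probe(["rembrandt", "picasso"])
coverage = len(found) / len(RECORDS)
```

In this toy data the probe recovers four of the five records. The sculpture record shares no words with the others, so no harvested term ever leads to it, which shows why the choice of sample queries matters.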

As the major search engines start to experiment with incorporating Deep Web content into their search results, they must figure out how to present different kinds of data without overcomplicating their pages. This poses a particular quandary for Google, which has long resisted the temptation to make significant changes to its tried-and-true search results format.

“Google faces a real challenge,” said Chris Sherman, executive editor of the Web site Search Engine Land. “They want to make the experience better, but they have to be supercautious with making changes for fear of alienating their users.”

Beyond the realm of consumer searches, Deep Web technologies may eventually let businesses use data in new ways. For example, a health site could cross-reference data from pharmaceutical companies with the latest findings from medical researchers, or a local news site could extend its coverage by letting users tap into public records stored in government databases.

This level of data integration could eventually point the way toward something like the Semantic Web, the much-promoted — but so far unrealized — vision of a Web of interconnected data. Deep Web technologies hold the promise of achieving similar benefits at a much lower cost, by automating the process of analyzing database structures and cross-referencing the results.

“The huge thing is the ability to connect disparate data sources,” said Mike Bergman, a computer scientist and consultant who is credited with coining the term Deep Web. Mr. Bergman said the long-term impact of Deep Web search had more to do with transforming business than with satisfying the whims of Web surfers.

Wednesday, February 18, 2009

Home Work 11

Please post your answers for the class of the 21st here.

Announcement for 21st Feb.09 Class

We are going ONLINE for the class of Sat. 21st of February 09. All you need to do is go to this site, "http://www.mogulus.com/info/about", and answer these questions:

a) Why is Mogulus the largest Internet TV broadcaster?
b) What changes has Mogulus TV brought to broadcasting technology, in both software and hardware?
c) Watch the YouTube video tutorials posted on our blog and summarise the steps of "how to broadcast yourself".


These tasks are easy; you can do them at home and post them on "Home Work 11".

See you on the 28th of February 09, and be well equipped with your presentation!

Best wishes,

Dr. Supit K.

Monday, February 16, 2009

iPhone application helps Blackjack cheaters (AFP)

SAN FRANCISCO (AFP) - Las Vegas casino operators are on the lookout for blackjack cheaters using a card-counting iPhone application designed to help players win.
Nevada State gaming control officials have sent warnings to casinos about card-counting software that turns iPhone smart mobile telephones or iPod Touch MP3 players into illegal tools for beating the odds at blackjack tables.
"Once this program is installed on the phone through the iTunes website it can make counting cards easy," Nevada gaming control board member Randall Sayre wrote in a February 5 letter to casino operators.
"When the program is used in the 'Stealth Mode' the screen of the phone will remain shut off, and as long as the user knows where the keys are located the program can be run effortlessly without detection."
Players using the program simply tap a virtual button on the screen each time a card 10 or higher is turned up and tap a different button for lower-value cards.
A mini-software program continually updates a "true count," which with one peek can provide feedback regarding a player's chances of winning by getting cards with total values that are closest to 21 points without exceeding that amount.
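
The arithmetic behind such a tally is simple. The sketch below assumes a two-category count in which high cards subtract one and all other cards add one, with the "true count" taken as the running count divided by the decks still undealt. The weights, the treatment of the ace as a high card, the six-deck shoe, and the `true_count` function are assumptions for illustration and are not details given in the report.

```python
HIGH_RANKS = {"10", "J", "Q", "K", "A"}  # assumed meaning of "10 or higher"

def true_count(cards_seen, decks_in_shoe=6):
    """Running count per remaining deck for a two-category (high/low) tally."""
    running = 0
    for rank in cards_seen:
        # Assumed weights: high cards count -1, every other card counts +1.
        running += -1 if rank in HIGH_RANKS else 1
    decks_left = decks_in_shoe - len(cards_seen) / 52
    if decks_left <= 0:
        raise ValueError("no cards left in the shoe")
    return running / decks_left

example = true_count(["2", "5", "K", "7"])
```
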
Nevada officials said they were tipped that players in American Indian-run casinos in Northern California have been using the card-counting software on the popular Apple devices.
It is illegal in Nevada to have or use card-counting gadgets in casinos, but players are allowed to try to keep count in their heads.

Friday, February 6, 2009

6116 students

1.Justin Wade 5129502 justnbangkok@hotmail.com 0873471786 LoopEsolo
2.Haitao Wu(Lester) 5129512 apexbkk@gmail.com 0851518505 LesterXaris
3.Sudarat Sukhaphinad 5119512 gulshy81@yahoo.com 0858904091 SudaratYardley
4.Daniel Aigbona 5119521 danosi66@yahoo.com 0850862736 naruna Broadfoot
5.GuoHai 4929433 Jgykmmod@gmail.com 0846631031 Hai Mocha
6.Fuxiu Jiang 5119519 jfx2005@hotmail.com 0865458577 Linda Jenvieve

Dr.Supit karnjanapun karn006 Haiku 0863690007