Archive for July, 2008

28
Jul

Cuil Runnings Part 2

Part 1 of Cuil Runnings described the self-proclaimed Google-beating search engine and summarized the industry’s initial response after launch. The reception was … frosty, to say the least. Was the response justified? I ran several tests using common search types to determine whether Cuil is fit to replace Google on discerning desktops.

Test 1: Music Group

For engineers, these ex-Googlers did a poor job of estimating the initial server load, but let’s not be petty: it’s only day one. The search results matter most, in particular page 1 of the results, which is where most people look. The total number of search results, while often used as a selling point, means very little in practical terms since most surfers only look at the top 10 or 20.

Cuil Search Engine Test 1: 4Hero

My first test used a moderately popular music group, the British RnB / Drum n Bass outfit 4Hero. Cuil really showed its best side for this example. First, the results are displayed in a three-column magazine style (a two-column display is selectable). To the right is a series of categories related to 4Hero, including the group’s albums and related artists. If you were to click on the link titled “Goldie” (another Drum n Bass artist), Cuil would search for “4Hero Goldie”. This accessible drill-down display will be indispensable for people doing actual research.

Test 2: Ancient Civilization

Cuil Search Engine Test 2: Nubia

Inspired by the implications of the first test, I keyed in an ancient civilization (Nubia) to determine how easily research could be accomplished. Again, the category drill-down was available on the right, but this time a result filter also appeared above the search results. Since Nubia is a pretty broad topic, Cuil gave the option to filter the existing results on major subtopics like Ancient Nubia and Rhadopis of Nubia. Cuil’s interface is closer to an interactive encyclopedia than a straight search engine.

Test 3: Direct Website Reference

cuil4.jpg

Cuil really fails when it comes to identifying the names of actual websites. This is important because a worrying number of people still find websites by entering the English name in a search engine and clicking on the first link they see. I tried that with Jack’s NewsWatch, and no link to the site root appeared on the first page of results. In fact, the first link on the list is to Jack’s new-found nemesis at StageLeft. The average site owner probably doesn’t want a large percentage of potential search traffic going to his detractors. A significant percentage of the search results also came from third-party services like HaloScan and blog aggregators. By contrast, Google’s first link went directly to the front page of Jack’s site, with fewer of the parasitic sites appearing in later links. The same test was performed with other blog sites (Cynics Unlimited, Crux of the Matter, Blue Like You), and only Google linked to the root of the actual site within the first page. Cuil’s first search result for Small Dead Animals links to a site attacking the blog owner. Its first search result for Blink 7 links to Blink 182’s band site.

Cuil was much better at identifying major websites like CNN, but that’s hardly an indication of a search engine’s ability to determine link relevance.
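For anyone who wants to repeat the site-name test, the check boils down to asking where, if anywhere, the site's front page shows up among the first-page links. Here is a minimal Python sketch of that check. The function names and the sample URLs are made up for illustration, and it assumes you have already copied the first-page result URLs into a list by hand; it does not talk to Cuil or Google.

```python
from urllib.parse import urlparse


def is_site_root(url, site_host):
    """Return True if url points at the front page of site_host."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Treat "www.example.com" and "example.com" as the same host.
    if host.startswith("www."):
        host = host[4:]
    return host == site_host.lower() and parsed.path in ("", "/")


def first_root_rank(first_page_urls, site_host):
    """Return the 1-based rank of the first root link, or None if absent."""
    for rank, url in enumerate(first_page_urls, start=1):
        if is_site_root(url, site_host):
            return rank
    return None


# Placeholder data, not real search results.
sample_results = [
    "http://critic.example.org/why-this-blog-is-wrong",
    "http://aggregator.example.net/feeds/blog",
    "http://www.blog.example.com/",
]
print(first_root_rank(sample_results, "blog.example.com"))
```

A rank of 1 is what a site owner would hope for; `None` means the front page never made it onto page 1.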

Test 4: Recent News Articles

Cuil Search Engine Test 4: Recent News Article

Testing Cuil’s ability to retrieve the latest news stories involved running two tests per article. First, the full title of the article was entered verbatim into the search engine to determine whether a link to the actual article, or a reprint of it, appeared on the first page. The second test was completed using keywords. The tests are listed below (keywords in parentheses):

  • Associated Press: Bush OKs execution of Army death row prisoner (keywords: Bush Ronald Gray Execution)
  • Reuters: Zimbabwe crisis negotiations deadlocked (keywords: Zimbabwe negotiations)
  • Canadian Press: Bell Canada to cut 2,500 jobs to lower operating costs ahead of takeover (keywords: Bell Canada job cuts)

Cuil returned no results for the titles of the AP or CP articles. The Reuters title returned an excerpt from a Zimbabwe site unrelated to the article in question. Amazingly, Cuil had no results at all for the keywords related to the AP article, meaning not even info on traditional websites. The CP keywords returned several Wikipedia pages about Bell and one WSWS article about Bell cutting jobs … in 1999. By contrast, Google found all of the articles by title and by keywords within the first page. Cuil may be indexing more pages than Google, but it surely isn’t doing so with great speed.
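The two-pass procedure used for each article (search by exact title, then by keywords, and look for the article on page 1) can be written down as a small Python sketch. Everything here is an assumption for illustration: `search` stands in for whatever returns the first-page result titles for a query, the matching rule is a simple case-insensitive substring test, and the canned results are placeholders rather than what either engine actually returned.

```python
def normalize(text):
    """Lowercase and collapse whitespace so titles compare loosely."""
    return " ".join(text.lower().split())


def found_on_first_page(result_titles, article_title):
    """True if any first-page result title contains the article title."""
    target = normalize(article_title)
    return any(target in normalize(title) for title in result_titles)


def run_article_test(search, article_title, keywords):
    """Run the title pass and the keyword pass for one article."""
    return {
        "title": found_on_first_page(search(article_title), article_title),
        "keywords": found_on_first_page(search(keywords), article_title),
    }


# Placeholder first-page results keyed by query (hypothetical data).
canned = {
    "Zimbabwe crisis negotiations deadlocked": [
        "Reuters: Zimbabwe crisis negotiations deadlocked",
    ],
    "Zimbabwe negotiations": [
        "Placeholder page about an unrelated Zimbabwe topic",
    ],
}


def fake_search(query):
    """Stand-in for a real engine: look the query up in the canned data."""
    return canned.get(query, [])


print(run_article_test(
    fake_search,
    "Zimbabwe crisis negotiations deadlocked",
    "Zimbabwe negotiations",
))
```

A stricter version would also accept reprints under a different headline, which a plain substring match will miss.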

Conclusion

Cuil’s interface is beautiful and intuitive. General-purpose researchers and students will quickly take to its OLAP-style interface and numerous search refinement options. The search engine itself needs help, however. Cuil failed to recognize all but the most ubiquitous site names, while third-party sites and junk aggregators pushed actual site content out of the top listings. Cuil performed abysmally at retrieving current events or recently updated sites, which is unacceptable in a 24-hour news environment. As of now, Google has little to fear.

28
Jul

Cuil Runnings Part 1

Cuil Search Engine

Cuil (pronounced “cool”) is the creation of Google alumna Anna Patterson, who is working in conjunction with her husband (former IBM employee Tom Costello) and two other ex-Google engineers. Patterson’s last major search engine effort was purchased by the mighty Google in 2004. Costello’s previous efforts include a 1990s search engine called Xift and IBM’s WebFountain technology. One of the other engineers, Louis Monier, is the former Chief Technology Officer of AltaVista, considered by many to be the best search engine in the pre-Google webverse. This group has credentials. They also have funding, to the tune of $33 million in venture capital investments.

Cuil’s self-proclaimed advantages over the competition (read: Google) are as follows:

  • More links. The Cuil search engine claims an index spanning 120 billion web pages, dwarfing both Google’s most recently reported figure of 8.2 billion web pages and the industry’s estimate of 40 billion pages.
  • More privacy. Cuil promises not to track the habits of individual users, purporting to track general web trends instead. This feature seems designed to appeal to the privacy experts who have complained about Google’s invasive data gathering efforts.
  • Content-based rankings. Cuil’s engine reportedly places more emphasis on the content of the page than on which pages link to it. This is a potential advantage both to users more interested in research than buzz and to content providers who concentrate on quality rather than social networking to build their sites.

Survey Says …

Alas, many engines have come and faltered in the face of Google’s massive 62% market share (USA). How well did Cuil hold up on its opening day? Not too well, judging by reports in the IT media:

“If you are going to roll out a new search engine, please try to make one that has more going for it than a silly name and cheap, misleading PR. Thus we have Cuil, the search engine rolled out this last week by some ex-Google folks who see a market opportunity. While all the people involved seem competent and have great resumes, the site itself out-and-out stinks”
-John C. Dvorak

“Cuil went live last night and then went down after only a couple of hours of operation due to an apparently overwhelming response which lead to a server melt down. At the time of writing this article they were back up again, but you’d have thought that with all the hype around their launch they would have been better prepared?”
-New Zealand Herald

“What’s the first thing people check in a new, more-powerful Internet search? Their own name, of course. The SAI staff ran our own names through Cuil’s search. It hadn’t heard of some of us, while for others it returned our bylines next to pictures of… other people.
SAI’s commenters noted that searches for terms like “penguins” or “failure” returned zero results.”
-Silicon Valley Insider

Cuil’s lackluster performance is explained briefly in an equally critical CNET article:

“Cuil isn’t set up as a massively parallel search network the way, say, Google is. Tom Costello had explained this to me a bit when we talked last week. Each of Cuil’s search appliances is specialized to a particular subcategory of results. There are machines that understand and index sports; others are experts on medicine, etc. As these search machines get overloaded, Sollitto said, they drop offline for some queries, and the machines left online return less-than-relevant results that then appear at the top of users’ pages.”
-WebWare

Overall, Cuil’s launch was one of the least successful in recent tech history. Is the criticism fair, however? Proceed to Part 2 to find out!