
Cuil Search Engine Causes Crashes

By David Hamilton, theWHIR.com, September 9, 2008

September 9, 2008 -- (WEB HOST INDUSTRY REVIEW) -- Website administrators are reporting that Cuil's (cuil.com) Twiceler indexing bot is bombarding websites with traffic, causing them to crash and spurring discussion in the IT community about how to save websites from being searched and destroyed.

As far back as April, blogs like TooCan.com have identified Twiceler as an unregistered, undocumented bot that changes its name in response to being blocked and ignores robots.txt, the file that tells bots which parts of a website they should not crawl or index.
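For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler would consult it before fetching a page. It uses Python's standard `urllib.robotparser`; the robots.txt contents, the helper name `is_allowed`, and the example.com URLs are illustrative assumptions, not taken from any site mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out any crawler calling itself "Twiceler",
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Twiceler", "http://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "http://example.com/index.html"))
```

robots.txt is purely advisory: the check runs inside the crawler, so a bot that skips it, as Twiceler is accused of doing, is not stopped by the file at all.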

Suspicions about Twiceler were revived when the blog TechCrunch (www.techcrunch.com) reported on the potential problems with the bot after receiving an anonymous email.

"I don't know what spawned it, but when Cuil attempts to index a site, it does so by completely hammering it with traffic," the email read. "So much, that it completely brings the site down. We're 24 hours into this "index" of the site, and I've had to restrict traffic to the site down to 2 packets per second, while discarding the rest, or otherwise it makes the site unusable."

Complaints have been heard on forums where users have been sharing information and ways to shield their sites from Twiceler. One administrator posting from Kent, England noted that one of his sites suffered nearly 70,000 repeat visits from the indexing bot.

"Twiceler has been rampaging on one of the sites I administer," he posted. "It leeched enormous amounts of bandwidth, nearly 2Gb this month until it was blocked."

He confirmed earlier reports that the bot does not obey standard robots.txt directives, and he wrote that the only way to block it is by denying access in the .htaccess file.
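The administrator's fix was an Apache .htaccess rule, which the article does not reproduce. As a rough illustration of the same idea, refusing requests at the server instead of relying on the bot's cooperation, here is a minimal Python WSGI sketch. The name `block_bots` and the pattern `twiceler` are assumptions for illustration only.

```python
import re

# Assumed pattern: any User-Agent string containing "twiceler" (case-insensitive).
BLOCKED_AGENTS = re.compile(r"twiceler", re.IGNORECASE)


def block_bots(app):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(agent):
            # Refuse before the wrapped application does any work.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

A filter like this only works while the bot keeps announcing the same name. Since Twiceler reportedly changes its name when blocked, blocking by IP address range may be needed as well.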

Cuil operational engineer James Akers responded to complaints, noting that Twiceler is still under development.

"Twiceler is the crawler that we are developing for our new search engine. It is important to us that it obey robots.txt, and that it not crawl sites that do not wish to be crawled," he wrote in a statement. "If you wish, I will be glad to add your site to our list of sites to exclude."

He also noted that malicious bots may be posing as Twiceler, and that administrators should check that the bot crawling their website really is Twiceler and not an impostor.
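The article does not say how administrators should perform this check. One common general technique for verifying a crawler is forward-confirmed reverse DNS: look up the hostname for the visiting IP address, confirm it belongs to the crawler operator's domain, then resolve that hostname back and confirm it returns the same IP. The sketch below assumes that approach. The function name, the suffix `.crawler.example`, and the addresses in the usage note are hypothetical, and nothing here is confirmed as Cuil's recommended procedure.

```python
import socket


def is_genuine_crawler(ip, trusted_suffix,
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    ip             -- address the request came from
    trusted_suffix -- domain suffix the operator is assumed to use,
                      with a leading dot, e.g. ".crawler.example"
    reverse/forward are injectable so the check can be tested offline.
    """
    try:
        hostname = reverse(ip)[0]           # PTR lookup: IP -> hostname
        if not hostname.lower().endswith(trusted_suffix.lower()):
            return False                    # hostname is outside the operator's domain
        return ip in forward(hostname)[2]   # A lookup must map back to the same IP
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # treat any failed lookup as "not verified".
        return False
```

Called as `is_genuine_crawler("192.0.2.10", ".crawler.example")`, the function performs live DNS lookups. A request that merely copies the crawler's User-Agent string fails the check, because its IP address will not resolve into the operator's domain.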

  • (2) Comments

Comment by Anonymous on Wednesday, September 24, 2008

Fair is fair and Cuil is fairly cool and the media hoopla on the Cuil launch is well deserved and totally understandable (even if a bit harsh). After all, Cuil was built by a team of top-notch ex-Google engineers.

But did you know that another new search engine -- built by a team of top-notch ex-Google users -- has surpassed Cuil in traffic this month?

And with nary a lick of media love.

Check out NeXplore Search (www.NeXplore.com) vs. Cuil (www.cuil.com) for the month of September using whatever website traffic comparison tool you prefer -- Google Trends, Alexa, Compete, etc.

Cuil’s focus -- more algorithmic complexity.

NeXplore’s focus -- a more visually engaging and productive search results page.

Seems pretty clear which approach real folk prefer...

Comment by Anonymous on Wednesday, September 24, 2008

Reading between the lines, it's quite obvious Twiceler's screwy design wasn't by accident. You have an Irish techie who convinced VCs to dump 33 million in small change into his concept, which was based mainly on their massive index size. But guess what: when the VC auditors came to see the index size for real, Mr. Tom had to fill it with something. So he came up with a spur-of-the-moment idea to run the system dictionary against every site's web directory, and presto, Cuil has now generated millions if not billions of index entries for unique IP addresses. Of course they are all 40(1,2,3) errors, but who cares, since he was selling index size to his VCs, not content. He passes the auditors' test, they open up the bank account, and it's muffins and chocolates forever (or at least until the money runs out).
