Robert Cringely says Google is on its way to becoming the biggest CDN ever. It controls more network fiber than any other company in the world, and it's building two huge data centers in South Carolina alone. Google's enormous capacity, coupled with end users' increasingly voracious appetite for all things web-based, will lead to a future in which Google acts as a giant proxy server for the Internet. It will be "our phone company, our cable company, our stereo system and our digital video recorder"...
Greg Linden disagrees. He thinks that instead of cornering the market on bandwidth, Google is trying to build a world of infinite storage and CPU power. Well, *and* bandwidth.
No web hosting provider will have access to Google's economy of scale. Or out-of-this-world local government incentives. (Unless you're a state-licensed operator in China, but that's a different story.) Sorry, there isn't and won't be any way around this. BUT one resource that Google won't be dishing out in buckets is... knowledge.
I came to this realization while reading this article on MySpace's infrastructure. Once again, I spotted it on Data Center Knowledge!
MySpace was originally built on Perl + Apache + MySQL, but before the site went live, its developers switched to Windows + ColdFusion + MS SQL. Later on everything was rewritten in C# + ASP.NET. The site started out with just one database, then write vs read transactions were split between one master DB and two slaves. Soon this evolved into vertical partitioning, with a separate database for each and every feature. Finally its technical team settled on running a separate database instance for each block of 1 million accounts. All logins come through one single front-end DB, which redirects each user to the database containing the data associated with his account.
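To make that last step concrete, here is a minimal sketch of range-based partitioning with a front-end lookup. It is not MySpace's actual code: the `LoginDirectory` class, the `userdb000`-style instance names, and the assumption that account IDs are handed out sequentially starting at 1 are all hypothetical, chosen only to illustrate the idea.

```python
ACCOUNTS_PER_SHARD = 1_000_000  # one database instance per block of 1M accounts


def shard_for_account(account_id: int) -> int:
    """Return the zero-based shard index for a sequential account ID."""
    if account_id < 1:
        raise ValueError("account IDs are assumed to start at 1")
    return (account_id - 1) // ACCOUNTS_PER_SHARD


class LoginDirectory:
    """Hypothetical stand-in for the single front-end login database."""

    def __init__(self) -> None:
        self._account_ids = {}  # username -> account ID
        self._next_id = 1

    def register(self, username: str) -> int:
        """Assign the next sequential account ID to a new user."""
        if username in self._account_ids:
            raise ValueError(f"{username!r} is already registered")
        account_id = self._next_id
        self._account_ids[username] = account_id
        self._next_id += 1
        return account_id

    def route(self, username: str) -> str:
        """Name the database instance that holds this user's data."""
        account_id = self._account_ids[username]
        return f"userdb{shard_for_account(account_id):03d}"


directory = LoginDirectory()
directory.register("tom")
print(directory.route("tom"))  # the first user lands on the first instance
```

The appeal of this scheme is that adding capacity means adding an instance for the next block of accounts, with no need to move existing users. The price is that the front-end directory becomes a single point that every login depends on.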
In the beginning MySpace had server-based storage, then it built a storage area network with room for more disk drives. At one point the company had two full-time technicians (!) manually distributing SAN resources between database instances. Finally it moved on to 3PARdata's virtualized storage solution, on which all disk drives can be accessed as one single pool of capacity. In 2007, MySpace plans to replicate its SAN (which is currently in LA) in two other locations, to eliminate its dependency on a single data center.
Other major changes MySpace has made to its infrastructure include a caching layer between its web and database servers. In addition to minimizing DB lookups, machines on the caching tier are used to store temporary session data, which isn't given permanent database space. Also, MySpace was among the earliest adopters of MS SQL Server 2005, the better to take advantage of its 64-bit support. The 32-bit SQL Server 2000 limited MySpace to 4 GB RAM per server. Now its standard config is 64 GB RAM.
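For readers who haven't built one, a caching tier of this kind can be sketched in a few lines. The example below is an illustration only, not MySpace's implementation: the `CacheTier` class, its method names, and the 30-minute session lifetime are assumptions, and a plain in-memory dictionary stands in for what would really be a pool of dedicated cache machines.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheTier:
    """Hypothetical cache sitting between web servers and the database."""

    def __init__(self, load_from_db: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db  # called only on a cache miss
        self._clock = clock
        self._pages: Dict[str, str] = {}
        self._sessions: Dict[str, Tuple[float, dict]] = {}
        self.db_lookups = 0  # counts how often the database was hit

    def get(self, key: str) -> str:
        """Serve from the cache; fall back to the database on a miss."""
        if key not in self._pages:
            self.db_lookups += 1
            self._pages[key] = self._load_from_db(key)
        return self._pages[key]

    def put_session(self, session_id: str, data: dict,
                    ttl_seconds: float = 1800.0) -> None:
        """Keep session data in the cache only; it never reaches the DB."""
        self._sessions[session_id] = (self._clock() + ttl_seconds, data)

    def get_session(self, session_id: str) -> Optional[dict]:
        """Return session data, or None if it is missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]
            return None
        return data


cache = CacheTier(load_from_db=lambda key: f"profile data for {key}")
cache.get("tom")
cache.get("tom")
print(cache.db_lookups)  # the second request never touched the database
```

A real tier would also need eviction and invalidation when the underlying data changes, which this sketch leaves out.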
Throughout endless trial and error, MySpace's developers were under enormous pressure to keep the service online. Could they have benefited from having a team of Internet infrastructure experts to bounce ideas off of? Or from comparing their experience with benchmarking data from other fast-growing sites?
At this point, MySpace has grown beyond any hosting provider's past knowledge. On the other hand, right now, this minute, the founder of tomorrow's MySpace could be ordering his first server at your data center. Ask him what he's up to. Do some research on whether any of your current customers are building similar applications, and what kind of growing pains they've gone through. Turn your hosting company into a repository of knowledge. Because it's the one card you can play against infinite storage, CPU power and bandwidth.