- Bandwidth (Why Low?): Digg acts as an intermediary and therefore does not require much bandwidth. Users post links to stories on Digg, and readers simply follow a link to the website that hosts the story. The publishing website provides the bandwidth for the user to read the story.
- CPU (Why Low?): For the same reason that bandwidth is low, Digg’s processing power goes mainly to handling links and the voting features. It also powers the indexing of the articles into the various categories (business, lifestyle, sports, etc.).
- Disk (Why Low?): Digg allots its disk space primarily to store temporary links to stories and to personal account information. The stories themselves are stored on the host website.
- RAM (Why Low?): The amount of RAM needed to display the links to articles is fairly small, since the memory needed to serve the articles themselves is provided by the stories’ host websites.
- Scalability (Why High?): Using a LAMP bundle, Digg is able to increase the efficiency of its servers. The site is fairly basic, does not require many resources, and is easy to scale.
The concept behind the 129th-ranked social news website, Digg, is a simple one. Users can submit stories and either “digg” up or “bury” others by casting votes. With the release of Digg v4, developers did away with the bury function, but the format of Digg’s story submission and voting system has been adopted by other social networking sites that want to tap into Digg’s former popularity. Quantcast estimates that Digg’s monthly unique visitors in the U.S. alone number 15.1 million. A survey performed by Compete.com also reported that the domain “digg.com” attracted at least 236 million visitors annually by 2008.
Digg started out as an experiment in 2004, conducted by Kevin Rose, Owen Byrne, Ron Gorodetzky, and Jay Adelson. The original design was ad free, but as Digg became more popular, Google AdSense was added. The second version of Digg, which was released in 2005, added a friends list and the ability to “digg” a story without being directed to a success page, as well as a new interface. Version 3, released in 2006, featured specific categories for Technology, Science, World & Business, Videos, Entertainment and Gaming, as well as a section titled View All, which displayed all the categories.
Digg has grown large enough to see some problems emerge, such as the “Digg effect,” which creates a sudden increase in traffic when a particular story or several stories are “dugg” to excess.
In 2010, Digg CEO Jay Adelson announced plans to completely reconstruct Digg’s website, introducing changes that will essentially eliminate the duplication problem, prevent “power users” from overpopulating the site with their submissions, and offer a personalized homepage. The new website will also contain features to prohibit “trolling” or “group-burying.” Adelson described the changes, saying, “We’ve got a new backend, a new infrastructure layer, a new services layer, new machines: everything.” Another big change is the company’s switch from a MySQL database to Cassandra.
- 26 million unique visitors in a month
- 30 million users
- 2 billion requests a month
- 13,000 requests a second, peak at 27,000 requests a second
- 3 Sys Admins, 2 DBAs, 1 Network Admin, 15 coders, QA team
- Several billion page views per month
- None of the scaling challenges faced had anything to do with PHP
- Dozens of DB servers
- Six dedicated graph database servers to run the Recommendation Engine
- Six to ten machines that serve files from MogileFS
Digg opened its API (Application Programming Interface) to the public in 2007, giving developers the ability to write tools and applications based on queries of Digg’s public data, dating back to its beginning in 2004.
Initially, Digg used an IDDB infrastructure, which allowed for the partitioning of both indexes, namely numeral sequences and unique character indexes, and actual tables across multiple storage servers. Digg started out with a single Linux server running Apache 1.3, PHP 4, and MySQL 4.0, using the default MyISAM storage engine. Two features in particular made the LAMP server cluster attractive to Digg and helped it perform well as it grew.
At one time, Digg operated at least 100 servers in multiple data centers. Eventually, as more users joined the site, it moved to an architecture that used a load balancer to send queries to PHP servers, which were fed by MySQL slave servers and a MySQL master server.
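The master and slave arrangement described above can be illustrated with a minimal sketch. It assumes that writes go to a single master and that reads are spread round-robin across the slaves; the class name, the server names, and the rule that anything other than a SELECT counts as a write are all hypothetical and are not taken from Digg’s code.

```python
import itertools


class ReplicatedDatabase:
    """Hypothetical router: writes go to the master, reads rotate over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # cycle() yields the slaves in order, forever (simple round-robin).
        self._slave_cycle = itertools.cycle(slaves)

    def route(self, sql):
        # Simplifying assumption: any statement that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slave_cycle) if is_read else self.master


db = ReplicatedDatabase("mysql-master", ["mysql-slave-1", "mysql-slave-2"])
read_target = db.route("SELECT title FROM stories WHERE id = 1")
write_target = db.route("UPDATE stories SET diggs = diggs + 1 WHERE id = 1")
```

In a setup like this, read capacity can be increased by adding slaves, while every write still passes through the one master.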
It was a standard setup, but Digg also used Memcached, an in-memory key-value cache, which helped it serve web pages with constantly changing, personalized content. In essence, Memcached kept frequently used pieces of data in memory, so that dynamic pages could be assembled and updated without querying the database each time and slowing down the site.
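A common way to use a cache of this kind is the cache-aside pattern: look in the cache first, and query the database only on a miss. The sketch below is an illustration under stated assumptions, not Digg’s implementation. `TinyCache` is a hypothetical in-process stand-in for a Memcached client, and `load_story_counts_from_db` is a hypothetical placeholder for a slow database query.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a Memcached client."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def load_story_counts_from_db(story_id):
    # Placeholder for a slow query; the call counter is only for illustration.
    load_story_counts_from_db.calls += 1
    return {"story_id": story_id, "diggs": 42}


load_story_counts_from_db.calls = 0


def get_story_counts(cache, story_id, ttl_seconds=30):
    key = f"story_counts:{story_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: the database is not touched
    fresh = load_story_counts_from_db(story_id)
    cache.set(key, fresh, ttl_seconds)  # keep it for later requests
    return fresh
```

The short time-to-live is a design choice: it lets fast-changing data such as vote counts be slightly stale in exchange for far fewer database reads.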
Digg also took advantage of sharding, breaking down the database and isolating heavy loads so as not to impede performance. Sharding differs from partitioning in that the data is distributed across different physical machines, whereas partitioning typically occurs on the same machine.
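As an illustration of how a row might be assigned to a machine, the following sketch hashes a user ID and takes the result modulo the number of shards. The shard names and the choice of key are assumptions made for the example; the source does not say how Digg selected its shard keys.

```python
import hashlib

# Hypothetical shard names; each would be a separate physical database server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user ID to one shard, the same one every time."""
    # A stable hash is used because Python's built-in hash() of a string
    # can differ between interpreter runs.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


target = shard_for(12345)  # all of this user's rows live on this shard
```

One known limitation of a simple modulo scheme is that changing the number of shards remaps most keys, so growing deployments often use a lookup table or consistent hashing instead.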
Amazon.com. “Digg.com Site Info.” Alexa. http://www.alexa.com/siteinfo/digg.com (accessed October 23, 2010).
“Digg.com Traffic and Demographic Statistics.” Quantcast. http://www.quantcast.com/digg.com (accessed October 23, 2010).
Iskold, Alex. “The Digg Effect.” ReadWriteWeb. http://www.readwriteweb.com/archives/the_digg_effect.php (accessed October 24, 2010).
Kantar Media Company. “Digg.com.” Site Analytics. http://siteanalytics.compete.com/digg.com/ (accessed October 23, 2010).
Marcus, Stephanie. “A Brief History of Digg.” Mashable. http://mashable.com/2010/08/25/history-of-digg/ (accessed October 23, 2010).
Quinn, John. “Saying Yes to NoSQL; Going Steady with Cassandra.” Digg Blog. http://about.digg.com/blog/saying-yes-nosql-going-steady-cassandra/ (accessed October 24, 2010).
Stump, Joe. “How Digg Works.” Digg Blog. http://about.digg.com/blog/how-digg-works (accessed October 24, 2010).