OpenWetWare:Infrastructure

Ideas for keeping OpenWetWare running

 * Drew + iCampus funding is available for off-campus hosting.
 * At Wikimania, Wikipedia mentioned it is considering PlanetLab (codeen.org), a global caching system with hundreds of servers around the world. MIT is already part of it, so it should be easy to join. This doesn't solve reliability, just bandwidth and scaling.
 * Coral Cache - a content distribution network
 * Levels of outsourcing IT
 * Co-locating a server at a professional web hosting company would go a long way toward increasing reliability of the site. These companies have physically secure datacenters with backup power and redundant Internet connections.  This kind of service would cost $150-200 a month.
 * Dedicated and Managed hosting

Checklist

 * redundant power (different substations, UPS, generators)
 * redundant network (number of connections, physically separate)
 * DNS servers
 * Backups (amount of space, frequency, time to restore, access to individual files, location, retention period)
 * support (phone or email)
 * Hardware replacement policy
 * OS support
 * Application support (compiling and supporting 3rd party apps)
 * Bandwidth/traffic

Resources

 * Service providers
 * Web Server Talk
 * Web Hosting Talk
 * Managed hosting forum
 * Datacenter review
 * Colotraq - bids from managed hosting providers


 * Cheap IP Takeover
 * Distributing Server Load with Round-Robin DNS
 * MySQL Cluster
 * Replication
 * Identifying and Avoiding Dishonest Hosting Providers?
 * Recommend a server hosting company? (December 21, 2005)
 * I like pair.com for stability and generous bandwidth allowance, but I like ev1servers.net because they do not have an adult content policy.
 * try theplanet.com - they're on a wicked fast backbone, and it looks like they offer gamer-specific plans. My experience with them is they're very responsive as well
 * I've been very happy with Layered Tech. Dedicated servers start at around $70/month with 1TB of bandwidth / month. I picked them because they offered Debian and because they offer very little in the way of handholding sysadmin support and are therefore cheaper. If you know how to be root yourself, it's a good option.
 * Seconding pair.com. Ten years, and my server has been down for a total of 23 minutes due to catastrophic drive failure. Otherwise, not even a hiccup.
 * Deal discussion
 * theplanet and liquidweb: service is hit or miss; rackspace: expensive but worth it


 * The host with the most
 * Please do not use 1and1. They're freaking dreadful as many mefites (including myself) have stated many, many times on Ask and the blue.
 * I have to give a vote for EV1Servers. I've tried a lot of companies, but have been with EV1 with several servers for a few years now. Not even a blip in the service during Rita.. amazing reliability.

URL forwarding
via proxy using mod_rewrite?
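A sketch of what this could look like, assuming Apache with mod_rewrite and mod_proxy loaded (hostnames are hypothetical):

```apache
# Forward all requests to another host via an internal proxy.
# The [P] flag hands the request to mod_proxy instead of sending the client
# an external redirect, so the URL shown in the browser does not change.
<VirtualHost *:80>
    ServerName openwetware.mit.edu
    RewriteEngine On
    RewriteRule ^/(.*)$ http://backend.example.com/$1 [P]
    # Rewrite Location headers in backend responses back to the public name
    ProxyPassReverse / http://backend.example.com/
</VirtualHost>
```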

DNS failover

 * No-IP Advanced Server Monitoring
   * 24/7 server monitoring, every 5 minutes
   * $150/yr
 * NeuStar (UltraDNS) SiteBacker
   * time to switch: 2 mins to determine that the server is down plus 2-8 mins to propagate the DNS change (3-4 mins on average) - the change may not be honored by downstream DNS servers but usually is
   * 11 DNS clusters around the world, users are directed to the closest one, no single point of failure
   * clients include Amazon, MySpace, Oracle, Sharper Image, etc.
 * DNS Made Easy - DNS failover and system monitoring
   * $60/yr
   * 10 million queries per month
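All of these services work the same basic way: a monitor polls the server and, on failure, rewrites a low-TTL DNS record to point at a standby. A sketch of the relevant zone-file records (names and addresses are hypothetical):

```
; Low TTL (300s) so a failover change propagates quickly,
; at the cost of more DNS queries.
www.example.org.  300  IN  A  192.0.2.10
; on failure, the monitoring service swaps the record
; to the standby, e.g. 203.0.113.20
```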

IP takeover
works only with servers on the same local network segment
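The mechanics are simple: the backup machine adds the failed master's IP as an alias and broadcasts a gratuitous ARP so neighbors on the segment update their ARP caches. A minimal sketch (the function only echoes the commands here as a dry run; drop the `echo`s and run as root on the real backup; the address and interface are hypothetical):

```shell
#!/bin/sh
# Dry-run sketch of IP takeover: print the commands a backup node would run.
takeover() {
    vip="$1"     # address/prefix to take over, e.g. 192.0.2.10/24
    iface="$2"   # interface on the shared LAN segment
    echo ip addr add "$vip" dev "$iface"
    # Gratuitous ARP so switches/routers learn the new owner immediately
    echo arping -c 3 -U -I "$iface" "${vip%/*}"
}
takeover 192.0.2.10/24 eth0
```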

Clustering

 * Linux Virtual Server
 * VMWare
 * MySQL replication/cluster
 * SAN (Storage Area Network) with GFS (Global File System)
 * High Availability systems under Linux
   * master node and backup node
   * fault detected: DNS redirects requests to the backup node's IP. Problem: DNS caching
   * fault detected: the master's IP is assigned to the backup machine (IP takeover)
 * The High Availability Linux Project
 * Ultra Monkey is a project to create load balanced and highly available network services
 * File Area Network
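For the MySQL replication option above, basic master-slave replication needs very little configuration. A sketch of the relevant my.cnf fragments (server IDs and hostnames are arbitrary):

```ini
# master my.cnf
[mysqld]
server-id = 1
# binary log that slaves replay
log-bin   = mysql-bin

# slave my.cnf
[mysqld]
server-id = 2
# then on the slave, point it at the master and start replication:
#   CHANGE MASTER TO MASTER_HOST='master.example.com', ...;
#   START SLAVE;
```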

Wikipedia

 * Wikipedia's solution is to use round-robin DNS to distribute traffic among geographically separated server clusters
 * How does the data (MySQL and images) get synchronized between clusters?
 * Do web servers read data from MySQL slaves and write data to the master?
 * Is SAN mirroring used?
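Round-robin DNS itself is just multiple A records for one name; resolvers rotate through them, spreading clients roughly evenly. A zone-file sketch (hypothetical name and addresses):

```
; One A record per cluster; answers are rotated among them.
www.wikipedia.example.  3600  IN  A  192.0.2.1
www.wikipedia.example.  3600  IN  A  198.51.100.1
www.wikipedia.example.  3600  IN  A  203.0.113.1
```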

Performance tuning

 * Slashdotting - estimates put the peak of the mass influx of page requests at anywhere from several hundred to several thousand hits per minute.
 * From the Slashdot FAQ (7/12/04):
   * Slashdot typically serves 80 million pages per month. We serve around 3 million pages on weekdays, and slightly less on weekends.
   * thousands of hits at the site in minutes
 * OWW serves ~40,000 pages/day (1.2 million pages/mo) as of August 2006
   * ~75x difference from Slashdot's weekday volume
   * ~25% of OWW traffic is generated by spiders/bots (mostly Google)
   * server load at peak times is around 25-30%; at normal times, 5-10% (as reported by sar)
   * caching would give a 2-10x boost; LAMP tweaks can give some boost as well
   * the server can handle ~4x the traffic in the current software/hardware configuration
   * therefore we have an 8-40x performance reserve on current hardware simply by tuning the software, but this needs to be confirmed by benchmarking
   * a static version of the site would be able to handle Slashdot-style traffic surges easily
   * ~25 changes/hour (50 changes total) between 4 and 6pm on 2006-08-18 (Friday)
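The ~75x figure above checks out against the weekday numbers (3,000,000 Slashdot pages/day vs ~40,000 for OWW):

```shell
#!/bin/sh
# Sanity-check the traffic ratio quoted above using weekday page views.
slashdot_daily=3000000
oww_daily=40000
ratio=$((slashdot_daily / oww_daily))
echo "${ratio}x"    # 75x
```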


 * Surviving a Slashdotting with a Celeron 466
   * I upped the number of min and max servers, set a server timeout of 15 seconds, and made sure new server processes were started every 500 requests or so to keep runaway processes from killing the server.
   * My first course of action was to figure out how to make the page static so that it wouldn't have to hit MySQL each time someone requested that page. (dewikify whole site?)
   * One last thing I did was kill all unnecessary processes such as SMTP.
   * Once I put the box back in, it took a bit more Apache tweaking to get it perfect, but eventually my spare-parts Celeron 466 was handling a Slashdotting at only 30% CPU utilization!
   * Usually I do around 2-2.5 GB of web traffic per month, but in one day I did over 4 GB!
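The tweaks described above map onto a handful of Apache prefork directives; the values below are just the ones the author mentions plus illustrative pool sizes, not recommendations:

```apache
# httpd.conf fragment matching the tuning described above.
# Drop stalled connections quickly:
Timeout 15
# Raised server-pool limits (illustrative values):
StartServers 10
MinSpareServers 10
MaxSpareServers 30
# Recycle child processes periodically to contain runaway processes:
MaxRequestsPerChild 500
```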


 * [http://weblogtoolscollection.com/archives/2004/07/26/staticise-analysis/ Staticise analysis] - load graph of a 5 connections/second test with and without the Staticise WP plugin


 * How to prepare your site for the Digg Effect
   * implement caching (WP-Cache)
   * use "time wget example.com"; the time for the request dropped from 1.36 seconds to 0.02 seconds
   * stress test with ab
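Concretely, the two measurements look like this (run them against your own site; example.com stands in for it):

```shell
# Time a single page fetch before and after enabling the cache
time wget -q -O /dev/null http://example.com/

# Stress test with ApacheBench: 1000 requests, 50 concurrent
ab -n 1000 -c 50 http://example.com/
```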


 * Configuring Apache: Don't Succumb to the Slashdot Effect
   * original article
   * list of Apache settings and their suggested values
   * MaxClients 500 -- be careful with that one. You need to run tests that actually consume the maximum number of clients to see whether you start to swap and/or hit an I/O-wait bottleneck before you get there, which you almost certainly will with many types of dynamic pages running 500 clients. For example, if you have PHP pages averaging an 8MB allocation, 500 clients means you're using 4GB just for PHP.
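The arithmetic behind that warning, as a quick sketch (8MB per PHP process is the article's example figure):

```shell
#!/bin/sh
# RAM consumed if every client slot runs a PHP process at once
max_clients=500
per_proc_mb=8
echo "$((max_clients * per_proc_mb)) MB"   # 4000 MB, i.e. ~4GB just for PHP
```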

 * Separate servers for static and dynamic content using mod_proxy and mod_rewrite: http://www.onlamp.com/pub/a/onlamp/2004/02/05/lamp_tuning.html , http://phplens.com/lens/php-book/optimizing-debugging-php.php
 * Configuring Apache for maximum performance
 * Apache tips and links
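A sketch of the static/dynamic split mentioned above, assuming a lightweight Apache on port 80 serving static files and proxying dynamic paths to a heavier backend on port 8080 (the paths and ports are hypothetical):

```apache
# Frontend: serve static content directly, proxy only the wiki to the backend.
DocumentRoot /var/www/static
RewriteEngine On
# Anything under /wiki/ is dynamic; hand it to the backend via mod_proxy.
RewriteRule ^/wiki/(.*)$ http://127.0.0.1:8080/wiki/$1 [P]
ProxyPassReverse /wiki/ http://127.0.0.1:8080/wiki/
```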

 * As your Site Grows - article on Wikimedia
   * 1 x 4GB Squid
   * 5 x 512MB 3GHz P4 Apaches
   * 1 x 4GB dual-Opteron database server with 6 fast disks in RAID 10
   * One or two of the Apaches will have extra RAM - up to two or three gigabytes per computer
   * One set like this should handle about 150 to 200 page views per second. For press links and slashdottings, the Squid should handle at least twice that (300 to 400 views/sec?).


 * Problems
   * Apache spawning too many children (out of memory)
 * Solutions
   * Caching helps
     * PHP cache (precompiled opcodes)
     * static page cache (files)
   * Make all/some pages static (reduce PHP/wikitext parsing and DB access)
   * Slashdot effect: Assistance and prevention - Wikipedia article: Coral, redirect to mirror nets
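A file-based page cache of the kind listed above can be sketched in a few lines of shell: serve the cached copy if it is fresh, otherwise regenerate and store it. `render_page` is a hypothetical stand-in for the expensive PHP/wikitext rendering step (`stat -c %Y` is GNU coreutils):

```shell
#!/bin/sh
# Minimal file-based page cache: reuse a rendered page for up to TTL seconds.
CACHE_DIR=./cache
TTL=300

render_page() {   # hypothetical stand-in for the dynamic render
    echo "<html>page: $1</html>"
}

cached_page() {
    mkdir -p "$CACHE_DIR"
    f="$CACHE_DIR/$1.html"
    now=$(date +%s)
    if [ -f "$f" ] && [ $((now - $(stat -c %Y "$f"))) -lt "$TTL" ]; then
        cat "$f"                   # cache hit: no render, no DB access
    else
        render_page "$1" | tee "$f"   # miss: render, store, and serve
    fi
}

cached_page Main_Page
```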


 * Scaling MySQL

Hardware

 * An Opteron 246 runs at 2GHz and is roughly equivalent to a 3.2GHz Intel Xeon
 * Dell PowerEdge 2850 review by PC Magazine
 * Hardware (vs. software) firewalls
   * reduce load on the server
   * provide VPN capabilities
   * keep probes away from the servers

Xeon 5148 LV vs. L5335

 * Intel Xeon: Woodcrest vs. Clovertown


 * Benchmarks comparing 4-core 3.0GHz Mac Pros (Woodcrest) with 8-core 2.66GHz Mac Pros (Clovertown) showed a 31% improvement for the 8-core machine in highly multithreaded benchmarks such as Cinebench


 * Intel Clovertown: Quad Core for the Masses
   * Clovertown is two Woodcrest processors joined in a single package
   * Each pair of cores shares a single 4MB L2 cache, just as in Woodcrest, and each pair shares a single 1066/1333MHz front-side bus.


 * The L5335 is $140 cheaper when configuring a system at Dell