1. What is "High Performance"?
reasonable reliability
several days of uptime
no memory leaks
low average service request time, even under high load
~0.002s or so
handle high numbers of requests
2500 req/s or so
handle a high number of keep-alive connections
5000-6000 simultaneous connections
2. Initial Contenders
Direct Request
lighttpd
apache
Reverse Proxy
squid
lighttpd/mod_proxy
seems to have trouble under high load - maybe more tuning?
Software Load Balancing
balance
haproxy
plb
iptables
3. Primary Load Pattern
Images - lots of them - all small. Median file size is 4.5KB; average file size is 9KB.
Highly random io - long tail of infrequently accessed items translates into very active disks
4. Early Winners
squid
does well with a small number of items in the cache (400k items)
fundamentally, a reverse proxy is wasteful since we control these files
after a while, squid falls over due to some increasing cpu usage (see a detailed posting in the squid mailing list)
no solution was offered
at peak, squid does 2200 req/s, but needs a restart every hour to maintain reasonable response time
cpu is always the bottleneck
yes, i applied the epoll() patch - these numbers are with epoll
5. Later Attempts
lighttpd
couldn't hold up to the random io - the single threaded nature destroys throughput since the process blocks on every page fault
lighttpd w/multiprocess
better - spawn a small number of worker processes and do multiaccept
on a given piece of hardware, 6-8 processes saturate the disks and manage 2.5MB/s traffic, 900 req/s
pretty good, but not as good as squid
memory usage was a little funky - possible leaks, but very hard to track down
process still bottlenecked by io page faults
lighttpd w/asynchronous io
borrow ideas from squid and the Flash web server (no, not Flash Media Server - it's a research project from Rice University)
problems
linux aio is not a drop in solution
files must be opened O_DIRECT
this bypasses the page-cache, so you always hit disk
no asynchronous calls for open, stat, close
solutions
use worker threads to open, stat, mmap and touch all the file pages
queue waiting requests for the same file object
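The two solutions above can be sketched roughly as follows. This is a minimal Python illustration, not the actual lighttpd patch (which is C): the class name `Prefetcher`, its methods, and the callback signature are all hypothetical. Worker threads open, stat, mmap and touch every page of a file so the page faults happen off the event thread, and requests for a file that is already being loaded are queued behind the in-flight load instead of starting a second one.

```python
import mmap
import os
import queue
import threading


class Prefetcher:
    """Hypothetical sketch: worker threads fault file pages in ahead of the event thread."""

    def __init__(self, num_threads=4):
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self._waiters = {}  # path -> callbacks waiting on that file
        for _ in range(num_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def request(self, path, callback):
        """Ask for `path`. Returns False if the request joined an in-flight load."""
        with self._lock:
            if path in self._waiters:
                self._waiters[path].append(callback)
                return False
            self._waiters[path] = [callback]
        self._jobs.put(path)
        return True

    def _worker(self):
        while True:
            path = self._jobs.get()
            try:
                result = self._load(path)
            except OSError as exc:
                result = exc  # hand the error to the waiters
            with self._lock:
                callbacks = self._waiters.pop(path)
            # A real server would post completions back to the event loop;
            # here the callbacks simply run on the worker thread.
            for callback in callbacks:
                callback(path, result)

    @staticmethod
    def _load(path):
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(fd).st_size
            if size == 0:
                return b""  # mmap cannot map an empty file
            mapping = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
        finally:
            os.close(fd)  # the mapping keeps its own reference to the file
        # Touch one byte per page so any blocking page fault happens here.
        for offset in range(0, size, mmap.PAGESIZE):
            mapping[offset]
        return mapping
```

Mapping whole files like this only makes sense for small files, which matches the load pattern described earlier.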
premature optimizations
keep a small list of popular files open
"guess" when a file is hot and don't go through the aio queue (you assume that hot files won't block because they are in the page-cache)
this alleviates some of the queue contention mentioned below
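The notes do not say how the "guess" is made, so the following is only one plausible heuristic, sketched in Python under stated assumptions: a small LRU table counts recent requests per file, and a file is treated as hot once it has been requested more than `hot_after` times within `window` seconds. The class name, the thresholds and the injectable `clock` parameter are all hypothetical.

```python
import time
from collections import OrderedDict


class HotFileGuess:
    """Hypothetical heuristic: recently and frequently requested files are assumed cached."""

    def __init__(self, capacity=128, hot_after=3, window=10.0, clock=time.monotonic):
        self._entries = OrderedDict()  # path -> (hits, window_start)
        self._capacity = capacity
        self._hot_after = hot_after
        self._window = window
        self._clock = clock

    def record_and_check(self, path):
        """Record one request for `path`; return True if it may skip the io queue."""
        now = self._clock()
        hits, started = self._entries.pop(path, (0, now))
        if now - started > self._window:
            hits, started = 0, now  # stale entry: start a new counting window
        hits += 1
        self._entries[path] = (hits, started)  # re-inserted as most recently used
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used path
        return hits > self._hot_after
```

A wrong guess is not fatal: the request simply blocks on a page fault, as it would have without asynchronous io. The same bounded table could also hold open file descriptors for the popular files.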
results - high throughput 10MB/s traffic, 3000 req/s, 6500-8000 open connections
even at nearly 80-90% disk utilization (according to iostat) the vast majority of requests are served "instantly"
optimal number of io threads is somewhere between 2 and 4 per spindle (depending on whether the disks are RAID1 or JBOD)
implementation problems
futex() on contended queues wastes cpu. this isn't really a problem unless your machine has a huge number of disk spindles, because you will run out of IO first (unless you are serving an entirely in-memory data set, in which case you shouldn't even be reading this)
sloppy code on my part - abusing the original idea of the stat_cache as a shared file structure and for passing data between the primary event thread and the io threads
excessive polling of the aio completion queue - I should have just hacked into the main event loop. I don't know why I didn't. It seems idiotic to get squeamish after a few hundred line changes to code you barely know.
this is entirely built for small files - it mmaps whole files only. a better solution would have to completely integrate into the chunk_queue inside lighttpd
if you don't have enough disk spindles to sustain the load after restarting a machine, for example after a kernel upgrade or panic, you can't expect to handle all of the requests in a timely fashion. i do 503 load shedding - meaning that once the internal aio request queue overflows, I return 503 responses until it clears out a bit. If you make your 503 a small image file, most users won't mind - also, this shouldn't happen too often
there are still weird memory usage patterns I can't quite explain - not sure if they are true leaks, or just that memory isn't freed until it's really too late
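The 503 load shedding mentioned above can be sketched as a bounded queue with two thresholds. This is a Python illustration under assumptions, not the patch's code: the name `AdmissionControl` and the `high_water` / `low_water` values are hypothetical, and "until it clears out a bit" is interpreted here as draining to a low-water mark before accepting again.

```python
from collections import deque


class AdmissionControl:
    """Hypothetical sketch: shed load with 503s while the io queue is overfull."""

    def __init__(self, high_water=1024, low_water=768):
        if not 0 <= low_water < high_water:
            raise ValueError("need 0 <= low_water < high_water")
        self._pending = deque()
        self._high = high_water
        self._low = low_water
        self._shedding = False

    def submit(self, request):
        """Return True if queued, False if the caller should answer 503."""
        backlog = len(self._pending)
        if self._shedding and backlog <= self._low:
            self._shedding = False  # queue has cleared out a bit
        if not self._shedding and backlog >= self._high:
            self._shedding = True  # queue overflowed
        if self._shedding:
            return False
        self._pending.append(request)
        return True

    def take(self):
        """Called by an io thread; returns the next request or None."""
        return self._pending.popleft() if self._pending else None
```

The gap between the two thresholds keeps the server from flapping between accepting and rejecting on every request when the queue sits right at its limit.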
6. Conclusions
lighttpd w/asynchronous io is a huge win over other servers
squid's io model is good, but cpu bogged down by internal cache maintenance?
writing portable code is a pain in the ass - this patch was tuned up for linux, but written on my powerbook (yes, this is probably asking for trouble)
had I heard of nginx (Russian web server project) before doing this, I would have spent some time evaluating it
lighttpd is a fun code base - simple
it is not overly well documented, nor are variable names sufficiently descriptive in many cases