You might have noticed that the new site is not particularly fast. This is because our server is nearly saturating our outbound connection to the Internet. This is what one of the two log analyzers I have running is telling me:
During July, we served up almost 75GB of vmod files. (That’s not the whole month, only from 16 July, the day the new site came online, onwards.) In the first five days of August, we’ve served another 17GB of vmod files. Module files constitute 72% (July) and 80% (August) of “viewed” traffic (“viewed” traffic excludes HTTP requests which result in redirects, 404s, etc.), but the traffic is well-spread over the modules: None of the top eight modules contributing the most traffic for August have been downloaded more than 45 times, for example.
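To put those totals in perspective, here is a rough back-of-the-envelope sketch in Python that turns them into per-day and average-throughput figures. It assumes 1 GB = 10^9 bytes and treats 16 to 31 July as 16 days; the function name is only for illustration. These are averages over the whole period, so peak-hour load will be higher than what this prints.

```python
SECONDS_PER_DAY = 86_400


def average_rate(gigabytes, days):
    """Return (GB per day, average Mbit/s) for a volume served over a period.

    Assumes decimal gigabytes (1 GB = 10**9 bytes).
    """
    gb_per_day = gigabytes / days
    bits = gigabytes * 1e9 * 8
    mbit_per_s = bits / (days * SECONDS_PER_DAY) / 1e6
    return gb_per_day, mbit_per_s


# Module traffic from the log analyzer: 16-31 July (16 days), 1-5 August (5 days).
for label, gb, days in [("July", 75, 16), ("August", 17, 5)]:
    per_day, mbps = average_rate(gb, days)
    print(f"{label}: {per_day:.1f} GB/day, about {mbps:.2f} Mbit/s on average")
```

Since the July figure is “almost 75GB” rather than an exact count, the result is an estimate, not a measurement.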
We’re already sending out all files with a major MIME type of “text” gzipped; all of the HTML, CSS, and JavaScript together amounts to 1% of total traffic, so even if we optimized it away entirely it would make no appreciable difference. The next largest file type after vmod is vmdx. After that, we have PNGs, at 4.5%. I’ve already run optipng on all of the PNGs on the site, so I don’t see much more savings to be had there.
What I conclude from this is that we need more outgoing bandwidth or we need to divert at least some of the module downloads somewhere else.
Here are some possible solutions I’ve thought of:
1. Get more outgoing bandwidth where we’re hosted.
2. Co-locate the server at a place with more outgoing bandwidth.
3. Set up mirrors for the modules.
4. Put all the modules on Amazon S3.
5. Use BitTorrent to distribute modules in addition to direct downloads.
#1 might solve our problem for the present with the minimum amount of disruption. It won’t be free. It might also be only a temporary solution—we might find ourselves saturating a larger connection, too.
#2 might also solve our problem. It definitely won’t be free, and we’d already be dangerously close to hitting the point where you pay for extra bandwidth.
#3 would be highly effective at reducing the traffic to our server, but would require two changes: we’d need a way of showing (and rotating) mirror links in the module library, and we’d need at least one reliable volunteer to run a mirror. The former is not a big challenge; it’s something I already know how to do, more or less. The latter is more difficult, not because it’s hard to set up a mirror (it’s really trivial, you can do it with rsync) but because it’s something of a commitment to do it. However, even a single mirror would help tremendously: if it took half of the module downloads, it would divert about 40% of our total traffic. Location is another consideration: in order to be really useful for sharing the traffic load, I think we’d need a mirror in North America or Europe, because that’s where most of the requests originate. Sending half of our module download requests to Australia would, I think, just make for slow downloads (sorry Brent, Ben), though having mirrors outside of North America and Europe could help increase download speeds in those places, so they might still be useful.
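As a rough illustration of what rotating mirror links could look like, here is a minimal Python sketch. It is not how the module library works today. The hostnames, the path layout, and both function names are made up for the example, and it assumes requests are split evenly between the main server and the mirrors. The second function does the arithmetic behind the 40% estimate, using the August figure of 80% for the module share of traffic.

```python
import random
from typing import Optional

# Hypothetical base URLs; these are placeholders, not real hosts.
MAIN_SITE = "https://main.example.org/modules"
MIRRORS = ["https://mirror1.example.net/modules"]


def download_url(filename: str, rng: Optional[random.Random] = None) -> str:
    """Pick the main site or one of the mirrors with equal probability."""
    chooser = rng if rng is not None else random
    base = chooser.choice([MAIN_SITE, *MIRRORS])
    return f"{base}/{filename}"


def diverted_share(module_share: float, mirror_count: int) -> float:
    """Fraction of total traffic that leaves the main server,
    assuming module downloads are split evenly across all hosts."""
    return module_share * mirror_count / (mirror_count + 1)


print(download_url("example.vmod"))
print(f"{diverted_share(0.80, 1):.0%} of total traffic diverted with one mirror")
```

An even split is the simplest choice; weighting the pick by the requester’s location would be the natural next step if we had mirrors on several continents.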
#4 would be really simple and presumably provide good performance. Amazon S3 is a cloud storage service. I looked up the prices; we’d pay about $45/month for the amount of traffic we see right now.
#5, distributing modules by BitTorrent, is potentially quite spiffy, in that it might let us offload almost all of the downloads to someplace else and provide great speed, without any cost to us. We could run a tracker on the main server, and anybody who wanted to could grab everything and seed it; this would be kind of like running a self-organizing collection of mirrors. (E.g., I would probably seed from my backup server at home. I’d shut it off at night, but, unlike with mirrors, that wouldn’t cause any broken links.) What I’m not sure about with this one is whether many of our users would be comfortable using BitTorrent. (There is one client which is a Java applet—we could provide links which use that—but unfortunately it’s not open-source.) If we couldn’t get people to choose to download via BitTorrent, then we wouldn’t see much reduction in traffic.
In all cases, what the user sees for downloading modules would be nearly the way it is now, except that in the case of mirrors and BitTorrent, there would be a choice of links.
Thoughts?