Ah. Clustering.
As some of you may know, we’ve launched the new “my.opera”:http://my.opera.com/ here at Opera. There’s been a few cumbersome issues with regards to the site. It’s basically been slow. Very slow. The apparent bottleneck from the beginning was suspected to be in the database backend, and so it was.
A while ago, I played around with “mysql”:http://www.mysql.com/ clustering using the native NDB format. NDB is some sort of network layer that allows you to have instantanious slaves replicate the masters data. Sounds promising, right? No. It worked just fine, but actually according to my immediate sloppy benchmarks, we didn’t gain any performance, we actually lost up to 400%! Obviously, we didn’t go with that.
The project wasn’t that necessarry back then, so it was dropped. Now, although, my.opera’s bottleneck being the database, we had to take up the project again. NDB was of course not an option, so the choice stood between buying a state-of-the-art insane-o-matic single SQL server, or going with this “home-hacked” low-end clustering. I rooted for the clustering, so I was put in charge of making it all happen.
Here’s the outline: One server is dedicated to doing writes, and writes only. It also does a single SELECT-statement, but only when it is being queried for LAST_INSERT_ID()'', since this is unfortunately not replicated to the slaves. Then, we needed one box for doing loadbalancing on the readservers. I chose to go with the "LVS":http://www.linuxvirtualserver.org/ method of doing kernel-based TCP/IP Level 4 load balancing via modifying the packets destination to the actual readserver. Also called "Direct Routing". I use "keepalived":http://www.keepalived.org to monitor whether the read is responding on 3306/tcp every few seconds. If it should happen to die horribly, it will get taken out of the loop. Then, we have N amount of read servers behind the LVS who *only* do reads. No writes. That's left to the dedicated write server. These slaves replicate their databases from the write server, by reading the write servers binary logfile continously. It doesn't -- unlike NDB -- guarantee instantanious replication, but close to it. It's good enough for us.
Then! (yes, we're not done yet), we modified our database abstraction layer to distinguish between whether there's a read or a write happening, and according to the conclusion of that, either connect to our single write server, or connect to our loadbalancer. Oh, on the loadbalancer I chose the so calledwlc”-algorhitm, which uses the specified weight of the server, combined with “lc”, or least connections. So that seems to scale rather well. The good thing is that we can add all the read servers we want on the fly, and just add them to the loadbalancer, and we’re flying even higher. We’ll see if that becomes necessarry.
So right now, as of this time of writing, the site is powered by our homemade clustering solution. Logically it should have gained about 300% in speed, and if not more than that, it has. We’re all very happy. :-)
My coworker “Vetle”:http://my.opera.com/vetler has also written a “small entry”:http://my.opera.com/vetler/blog/show.dml/18929 about it.
I wrote a small “script”:http://files.printf.dk/software/clustersync.txt to make the process of resyncing slaves easier.
All in a days work, eh? :-)
By the looks of it, I would say that you should take a good look at emic’s MySQL clustering, which provides synchronous MySQL replication over the asynchronous replication MySQL provides. It would also remove your need for a seperate load balancer :) Also I would recommend that you test NDB again, it is quite fast, but also you should remember that HA has a price, and so does multimaster replication. And last but not least, no database is faster than it’s creator and or user (i.e. if something is slow, someone could be doing something stupid somewhere?). AND for the love of god, dont use MyISAM (dunno if you are, guess not, but i’ll just leave it here for the completeness of my comment)! Tabel-level locking, how i hate you! Now InnoDB’s row-level-locking is getting there, so if you update allot, it’s really worth the ALTER TABLE :) Anyhow, thats just my 2 cents, kroner, or �re. I hear you are comming to town this weekend, I smell a beer in the oven! “Whether you think you can or you can’t, your right!”