2008

2007

2006

2005

2004

Migrating from MySQL to PostgreSQL

▁ oct 16 2007

(If you’re not tech-savvy, please stop reading now.)

I’m waiting for a query to complete. It’s almost midnight. I’ve been at it all day. I’m trying to migrate a large amount of data from MySQL to PostgreSQL. There has been many walls to run into. Ran into most of them.

More after the break.

The database is quite large, a few tens-of-millions of rows. The data has been populated over time, using the SQLAlchemy (from hereon ‘SA’) framework. I created the tables by hand, back then.

My PostgreSQL database doesn’t have the structure, but that’s OK, cause SA can do that for me. It does it nicely too. Except for…, well, it doesn’t deal well with views. A view is just another table in SA.

OK, so you manually remove those by manipulating your metadata-object.

OK, so you have your tabledefs up and running. You want to create your view by hand now. You can do that in the psql prompt.

Now you want to import your data from MySQL. You’ve created a file with ‘mysqldump’. Great. Oh wait, when you try to import it, you get all kinds of nasty ‘referential integrity’ errors. That just means that postgres is anal and refuses to insert any data into a table with a FK that references data that does not yet exist. MySQL does not do any kind of prioritizing of what tables to drop first—it does not look at your data to sequence it properly.

So what to do? Well, first, lets go over the step where you export the data from your MySQL database. You’re of course using ‘mysqldump’. You’ve told it not to create the tables, and all that. Great. Now you have to explicitly define the table drop sequence. Yep, you heard me. It’s a tough job, but it needs to be done.

You end up with something like:

mysqldump <database> -v -n -t --compatible=ansi,postgresql 
--complete-insert=TRUE --extended insert=FALSE --compact 
--default-character-set=UTF8 -r <outfile> 
--tables table1 table3 table2 table4

Mhm. So far so good. You now have a dumpfile, properly sequenced, and you’re ready to import! No you’re not. MySQL (despite of setting the —compatible option) still throws some SQL in your file that doesn’t work properly.

Like… Using backslash’ to quote a ‘. My blog software won’t let me write actual backslashes, but it’s one of these backwards: /. Postgres wants ” or E’…’. Doing the E-magic is a big parsing-hassle, so you simply replace it with ”. Oh, it also inserts some ‘DELIMITER’ lines. Those are not useful, you need to get rid of them. I wrote a script for this:

#!/usr/bin/perl -w
use strict;
while(<STDIN>) {
    s/\\'/''/g;
    s/^DELIMITER/--DELIMITER/g;
    print;
}

So now when you dump your data, either do it to STDOUT and pipe it to your perlscript, or simply do:

cat dumpfile.sql | perl fixme.pl > dumpfile-fixed.sql

You can delete your original dumpfile now.

OK, so now you can get to work;

psql <db>
db=# \i dumpfile-fixed.sql

Lo and behold, you might have a successful import. You might not. I don’t. You know why? Cause I got some ‘dirty’ rows in my original database. That is, rows that reference a FK value that doesn’t exist. I changed some values along the way. MySQL didn’t barf.

Which brings me back to why I’m waiting for that query to finish. It’s a two-level nested select on about 100000x5000x8000 rows. That’ll be quick!

And when that’s done, I of course need to think up a way to prune the dirty rows.

Back to waiting.

← Previous: Coniuro  //  Next: Leopard Annoyances #1

comments

Mads Sülau Jørgensen, 1 year, 1 month ago:

Well now, migrating from MySQL to PostgreSQL is not all that hard. Start with dumping the tabel definitions from MySQL, and hand-rewrite them into a PostgreSQL friendly format. Store the output in one file per tabel. Then dump the data from MySQL with the options Jesper outlined above. Again, do this into seperate files again. Then convert INSERT to COPY, and create your constraints when you have imported all your data. That way, PostgreSQL wont complaint about missing references while importing the database. A good read on COPY: http://www.postgresql.org/docs/techdocs.15 Hope you get the perl conversion script working.

powered by