Mysql Tools

Diffing Databases

When working with external agencies who modify the contents of a database and send a dump, we will often want to see what has changed whilst it has been in their possession. Whilst we could diff the dump file before sending it and the one that was returned, it can be very verbose and hard to follow.

With this in mind, we can split the database dumps in to multiple files (for each table) and diff the database this way instead.

Splitting the dump

To split the database dump, use the following bash script.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#!/bin/bash
####
# Split MySQL dump SQL file into one file per table
# based on https://gist.github.com/jasny/1608062
####
#adjust this to your case:
START="/-- Table structure for table/"
# or
#START="/DROP TABLE IF EXISTS/"
if [ $# -lt 1 ] || [[ $1 == "--help" ]] || [[ $1 == "-h" ]] ; then
        echo "USAGE: extract all tables:"
        echo " $0 DUMP_FILE"
        echo "extract one table:"
        echo " $0 DUMP_FILE [TABLE]"
        exit
fi
if [ $# -ge 2 ] ; then
        #extract one table $2
        csplit -s -ftable $1 "/-- Table structure for table/" "%-- Table structure for table \`$2\`%" "/-- Table structure for table/" "%40103 SET T
IME_ZONE=@OLD_TIME_ZONE%1"
else
        #extract all tables
        csplit -s -ftable $1 "$START" {*}
fi
[ $? -eq 0 ] || exit
mv table00 head
FILE=`ls -1 table* | tail -n 1`
if [ $# -ge 2 ] ; then
        mv $FILE foot
else
        csplit -b '%d' -s -f$FILE $FILE "/40103 SET TIME_ZONE=@OLD_TIME_ZONE/" {*}
        mv ${FILE}1 foot
fi
for FILE in `ls -1 table*`; do
        NAME=`head -n1 $FILE | cut -d$'\x60' -f2`
        cat head $FILE foot > "$NAME.sql"
done
mv $NAME.sql table_$NAME.sql
rm head foot table*

This script will throw errors as it processes. Once we have run the script on the two dump files, we can use Meld or any other good diffing tool to compare the tables we are interested in.

Making dumps easier to read

As diff tools work line by line and database dumps are compressed to use less whitespace, we will often want to expand the dump files to have new lines. The following one liner will loop over all files and replace the space between insert values with new line characters; making the diffing easier to read.

Be sure to run the script on both sets of files.

1
2
3
4
for f in ./*;
do
    sed -i "s#),(#),\n(#g" $f;
done

Bulk cleaning databases

If you need to drop multiple databases that follow a naming pattern, you can use the following command

Warning

Take backups first

See other related pages