Blog Import - Lessons Learned

So I have been blogging (very irregularly) since 2003. If I remember correctly, I dabbled with Movable Type before eventually settling on WordPress. When I recently switched to Pelican as a static blogging engine, I didn't bother importing my old posts from WordPress. For one thing, Pelican's import tool only works on a WordPress XML export, and all I had was a database backup.

Over the holidays, I decided to try converting all that content to Markdown. It's been a challenge, but I was able to bring over most of it.

First, I needed a quick script to pull posts from the wp_posts table of the WordPress database. I came up with the following:

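import MySQLdb as sql
import MySQLdb.cursors  # makes sql.cursors.SSDictCursor available
import os
import traceback

def connect():
    # adjust host/credentials/database name to match the restored backup
    cnx = sql.connect(host='', user='root', passwd='root', db='blog')
    return cnx

def write_post(row):
    try:
        year = row['post_date'].year
        slug = row['post_name'].replace('-', '_')

        # one directory per year, e.g. content/2004
        os.makedirs('content/%s' % year, exist_ok=True)

        # <ID>_<slug>.md is a reconstructed filename pattern (assumption)
        path = 'content/%s/%s_%s.md' % (year, row['ID'], slug)
        with open(path, 'w', encoding='utf-8') as fp:
            # Pelican metadata header
            fp.write('Title: %s\n' % row['post_title'])
            fp.write('Date: %s\n' % row['post_date'])
            fp.write('Tags: imported\n')
            fp.write('Slug: %s\n' % slug)
            # a blank line separates the metadata from the post body
            fp.write('\n')
            fp.write(row['post_content'])
    except Exception:
        # report the failing post and keep going with the rest
        traceback.print_exc()

if __name__ == '__main__':
    cnx = connect()

    with cnx:
        # server-side dict cursor: rows are streamed instead of loaded all at once
        cursor = cnx.cursor(sql.cursors.SSDictCursor)
        cursor.execute("""
            select ID, post_date, post_title, post_name, post_content
            from wp_posts where post_status = 'publish'
        """)

        rows = cursor.fetchmany(size=10)

        while rows:
            for row in rows:
                write_post(row)
            rows = cursor.fetchmany(size=10)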
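The header-writing part of write_post doesn't actually need a database. Here's a rough sketch of how it could be split into pure functions, so the generated Pelican file can be checked against a hand-built row before pointing the script at real data. The helper names (slugify, post_path, render_post) are hypothetical, and the content/<year>/<ID>_<slug>.md naming scheme is an assumption, not necessarily the exact pattern the script used.

```python
import datetime


def slugify(post_name):
    """Turn a hyphenated WordPress post_name into an underscore slug."""
    return post_name.replace('-', '_')


def post_path(row, root='content'):
    """Build <root>/<year>/<ID>_<slug>.md (hypothetical naming scheme)."""
    return '%s/%s/%s_%s.md' % (root, row['post_date'].year, row['ID'],
                               slugify(row['post_name']))


def render_post(row):
    """Return the full text of a Pelican Markdown file for one wp_posts row."""
    header = [
        'Title: %s' % row['post_title'],
        'Date: %s' % row['post_date'],
        'Tags: imported',
        'Slug: %s' % slugify(row['post_name']),
    ]
    # Pelican expects a blank line between the metadata block and the body.
    return '\n'.join(header) + '\n\n' + row['post_content']


if __name__ == '__main__':
    sample = {
        'ID': 42,
        'post_date': datetime.datetime(2004, 5, 1, 9, 30),
        'post_title': 'Hello again',
        'post_name': 'hello-again',
        'post_content': '<p>First post.</p>',
    }
    print(post_path(sample))  # content/2004/42_hello_again.md
    print(render_post(sample))
```

With that split, write_post would shrink to creating the directory and writing render_post(row) to post_path(row).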

There was more post-processing that I should have done in the import script: removing some legacy HTML tags and joining against the tag tables to populate the Tags field. Instead, I came up with a few regex patterns, ran them with Atom's search and replace across the content directory, and ended up going through all 300+ posts by hand to re-tag and re-categorize them.
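Both of those steps could have lived in the import script. The sketch below is one way to do it, not what I actually ran. The regex rules are hypothetical examples of legacy markup (font tags, inline styles, empty paragraphs). WordPress doesn't keep tags in a single table: in the stock schema they're rows in wp_terms, linked to posts through wp_term_taxonomy (with taxonomy = 'post_tag') and wp_term_relationships. An in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical clean-up rules, applied in order as (pattern, replacement) pairs.
CLEANUP_RULES = [
    (re.compile(r'</?font[^>]*>', re.IGNORECASE), ''),             # drop <font> tags, keep inner text
    (re.compile(r'\s+style="[^"]*"', re.IGNORECASE), ''),          # strip double-quoted inline styles
    (re.compile(r'<p>\s*(?:&nbsp;)?\s*</p>', re.IGNORECASE), ''),  # remove empty paragraphs
]


def clean_content(html):
    """Run every clean-up rule over a post body and return the result."""
    for pattern, replacement in CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html


# Stock WordPress schema: tags are terms whose taxonomy is 'post_tag'.
TAG_QUERY = """
    SELECT tr.object_id, t.name
    FROM wp_term_relationships tr
    JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
    JOIN wp_terms t ON t.term_id = tt.term_id
    WHERE tt.taxonomy = 'post_tag'
    ORDER BY tr.object_id, t.name
"""


def load_tags(cursor):
    """Map post ID -> comma-separated tag names (expects tuple rows, not dicts)."""
    cursor.execute(TAG_QUERY)
    tags = defaultdict(list)
    for post_id, name in cursor.fetchall():
        tags[post_id].append(name)
    return {post_id: ', '.join(names) for post_id, names in tags.items()}


def build_sample_db():
    """In-memory SQLite stand-in for the three WordPress taxonomy tables."""
    db = sqlite3.connect(':memory:')
    db.executescript("""
        CREATE TABLE wp_terms (term_id INTEGER, name TEXT);
        CREATE TABLE wp_term_taxonomy (term_taxonomy_id INTEGER, term_id INTEGER, taxonomy TEXT);
        CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
        INSERT INTO wp_terms VALUES (1, 'python'), (2, 'mysql'), (3, 'Uncategorized');
        INSERT INTO wp_term_taxonomy VALUES (10, 1, 'post_tag'), (11, 2, 'post_tag'), (12, 3, 'category');
        INSERT INTO wp_term_relationships VALUES (42, 10), (42, 11), (42, 12);
    """)
    return db


if __name__ == '__main__':
    print(load_tags(build_sample_db().cursor()))  # {42: 'mysql, python'}
    print(clean_content('<p style="x"><font color="red">Hi</font></p><p>&nbsp;</p>'))  # <p>Hi</p>
```

Against the real database, the same TAG_QUERY should work with a plain cnx.cursor() (MySQLdb's default cursor returns tuples), and write_post could call clean_content on row['post_content'] and use the looked-up tags instead of the hard-coded "imported".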

Below are the biggest takeaways from this process: