So I have been blogging (very irregularly) since 2003. If I remember correctly, I dabbled with Movable Type before eventually settling on WordPress. When I recently switched to Pelican as a static blogging engine, I didn't bother to import the posts from my WordPress installation (for one thing, the Pelican import tool only operates on a WordPress XML export, and all I had was a database backup).
Over the holidays, I decided to make an attempt at converting all that content to Markdown. It's been a challenge, but I was able to pull over most of it.
First, I needed a quick script to pull posts from wp_posts in a WordPress database. I came up with the following:
    # Written for Python 3 with the mysqlclient package, which provides the MySQLdb module.
    import os
    import sys
    import traceback

    import MySQLdb as sql
    import MySQLdb.cursors


    def connect():
        cnx = sql.connect(host='127.0.0.1', user='root', passwd='root', db='blog')
        return cnx


    def write_post(row):
        """Write one wp_posts row as a Markdown file with a Pelican metadata header."""
        try:
            year = row['post_date'].year
            slug = row['post_name'].replace('-', '_')
            # Posts are grouped into one directory per year.
            if not os.path.exists('content/%s' % year):
                os.makedirs('content/%s' % year)
            path = 'content/%s/%s_%s.md' % (year, row['ID'], slug)
            with open(path, 'w', encoding='utf-8') as fp:
                fp.write('Title: %s\n' % row['post_title'])
                fp.write('Date: %s\n' % row['post_date'])
                fp.write('Tags: imported\n')
                fp.write('Category: \n')
                fp.write('Slug: %s\n' % slug)
                # A blank line separates the metadata from the post body.
                fp.write('\n')
                fp.write(row['post_content'])
                fp.write('\n')
        except Exception as e:
            # Stop at the first failure so a bad row is noticed right away.
            print(e)
            traceback.print_exc()
            sys.exit(1)


    if __name__ == '__main__':
        cnx = connect()
        try:
            # Server-side cursor that returns each row as a dict.
            cursor = cnx.cursor(sql.cursors.SSDictCursor)
            cursor.execute("""
                select ID, post_date, post_title, post_name, post_content
                from wp_posts
                where post_status = 'publish';
            """)
            # Fetch in small batches rather than loading every post at once.
            rows = cursor.fetchmany(size=10)
            while len(rows) > 0:
                for row in rows:
                    print(row['post_name'])
                    write_post(row)
                rows = cursor.fetchmany(size=10)
            cursor.close()
        finally:
            cnx.close()
There was more post-processing that I should have done in this script (removing some legacy HTML tags, joining with the tags table to populate that field). Instead, I came up with a few regex patterns and used Atom to do a search and replace within the content path. In the end, I just went through all 300+ posts and re-tagged and re-categorized them.
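For what it's worth, the tag cleanup could have lived in the script instead of the editor. Here is a rough sketch of the idea, not the patterns I actually ran in Atom: the `LEGACY_TAGS` list and the `strip_legacy_tags` helper are made-up names for illustration, and the tags in the list are only guesses at what "legacy" might cover.

```python
import re

# Hypothetical list of legacy tags to strip; adjust to whatever your old posts contain.
LEGACY_TAGS = ("font", "center", "span")

# Matches the opening or closing form of any listed tag, including its attributes.
_TAG_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(LEGACY_TAGS), re.IGNORECASE)


def strip_legacy_tags(text):
    """Remove the listed tags but keep the text they wrapped."""
    return _TAG_RE.sub("", text)
```

Something like this could be called on `row['post_content']` just before it is written out, which would leave other markup (links, bold, images) untouched.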
Below are the biggest takeaways from this process:
<wpgid>32</wpgid> for example) into an inline image. Well, that doesn't really help once that plugin is gone. If you want anything to last, statically host your own images or link to images on services you control yourself (Flickr, for example).
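If you are in the same spot, it helps to at least find out which posts still depend on the plugin. A minimal sketch, assuming every reference looks like the single example above (a numeric ID wrapped in `wpgid` tags); the real plugin syntax may well have had other forms, and `find_gallery_ids` is just an illustrative name:

```python
import re

# Assumes the tag always wraps a plain numeric ID, as in the one example in this post.
WPGID_RE = re.compile(r"<wpgid>\s*(\d+)\s*</wpgid>")


def find_gallery_ids(text):
    """Return the gallery image IDs referenced in a post body, in order of appearance."""
    return [int(match) for match in WPGID_RE.findall(text)]
```

Running this over each exported file gives a list of posts and image IDs that need a real image link put back in.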