WordPress conversion script

Moving a site from WordPress to Mango just got a whole lot easier. The latest Mango source features a script which connects to an existing WordPress installation using XML-RPC, processes every page and post, and saves each as a Markdown text file in your posts directory. Handy!

Here are the steps to give this a go:

  1. Install html2text:

    easy_install html2text
    
  2. Create a new Django project, if required:

    django-admin.py startproject blog
    
  3. Navigate to the new Django project (or an existing Django project):

    cd blog/
    
  4. Open the project's settings file:

    open settings.py
    
  5. Set your preferred time zone*, then save:

    TIME_ZONE = 'Pacific/Auckland'
    
  6. Download the latest Mango source:

    hg clone https://bitbucket.org/davidchambers/mango
    
  7. Navigate to Mango's settings directory:

    cd mango/settings/
    
  8. Create a file to store customizations to Mango's settings:

    touch custom.py
    
  9. Open this file:

    open custom.py
    
  10. Tell Mango where to put the documents by adding DOCUMENTS_PATH, then save:

    DOCUMENTS_PATH = '~/posts/'
    
  11. Navigate to the extras directory:

    cd ../extras/
    
  12. Run the import script, using appropriate arguments:

    python wp.py http://example.com/blog/ admin 12345
    

What the script does

The script communicates with the WordPress blog using XML-RPC, retrieving all the pages and posts. It then pulls out the relevant information and formats it as Mango expects. The icing on the cake is that html2text converts the body copy to Markdown – WordPress might think it's fine to store content as HTML, but we think it's totally wrong.
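For the curious, here's a rough sketch of what the retrieval step might look like. This is not the code in wp.py. It assumes Python 3 (where the XML-RPC client lives in `xmlrpc.client`), WordPress's stock `xmlrpc.php` endpoint, and the `wp.getPages` and `metaWeblog.getRecentPosts` methods that WordPress exposes. The helper names, the blog ID and the post limit are made up for illustration.

```python
from urllib.parse import urljoin


def xmlrpc_endpoint(blog_url):
    """Return the XML-RPC URL for a blog (assumes the stock xmlrpc.php)."""
    return urljoin(blog_url.rstrip("/") + "/", "xmlrpc.php")


def fetch_documents(server, username, password, blog_id=1, limit=1000):
    """Collect pages and posts as simple dicts.

    `server` is an xmlrpc.client.ServerProxy (or anything offering the
    same methods). blog_id and limit are illustrative defaults, not
    values taken from Mango.
    """
    pages = server.wp.getPages(blog_id, username, password)
    posts = server.metaWeblog.getRecentPosts(blog_id, username, password, limit)
    documents = []
    for kind, items in (("page", pages), ("post", posts)):
        for item in items:
            documents.append({
                "kind": kind,
                "slug": item["wp_slug"],
                "title": item["title"],
                # Still HTML at this point; html2text.html2text() would
                # turn it into Markdown.
                "html": item["description"],
            })
    return documents
```

Against a real blog, usage would be something like `server = xmlrpc.client.ServerProxy(xmlrpc_endpoint('http://example.com/blog/'))` followed by `fetch_documents(server, 'admin', '12345')`. Passing the server in as an argument keeps the function easy to try out with a stand-in object.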

The resulting files are named after their respective slugs and saved in your posts directory. For example, these are the files generated from a fresh WordPress installation:

about.text

    About
    =====

    This is an example of a WordPress page, you could edit this to put information
    about yourself or your site so readers know where you are coming from. You can
    create as many pages like this one or sub-pages as you like and manage all of
    your content inside of WordPress.

hello-world.text

    date: 24 June 2010
    time: 10:42pm

    Hello world!
    ============

    Welcome to **WordPress**. This is your first post. Edit or delete it, then
    start blogging!

      1. one

      2. two

      3. three
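
To make the layout concrete, here's a small sketch that builds the same text from a title, a Markdown body and an optional timestamp. The metadata format is simply inferred from the two files above, and the function names are hypothetical rather than Mango's own.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def filename_for(slug):
    """Name the file after the slug, e.g. hello-world.text."""
    return slug + ".text"


def render_document(title, body, published=None):
    """Return file contents; pages pass no timestamp, posts pass one."""
    lines = []
    if published is not None:
        hour = published.hour % 12 or 12  # 12-hour clock, no leading zero
        suffix = "am" if published.hour < 12 else "pm"
        lines.append("date: %d %s %d" % (
            published.day, MONTHS[published.month - 1], published.year))
        lines.append("time: %d:%02d%s" % (hour, published.minute, suffix))
        lines.append("")
    # Title underlined with "=" to the same length, then the body.
    lines += [title, "=" * len(title), "", body.strip(), ""]
    return "\n".join(lines)
```

Calling `render_document('Hello world!', body, datetime(2010, 6, 24, 22, 42))` yields a header matching the one in hello-world.text, while leaving out the timestamp gives the page layout seen in about.text.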

* A word about TIME_ZONE

Mango always uses UTC when presenting content to the world. The default templates make use of the <time> element to provide hooks for localization of dates and times via JavaScript.

An author, though, should be free to use her local time when adding dates and times to posts. For this reason Mango expects these dates and times to use the TIME_ZONE specified in the project's settings file. Mango handles conversion of these times to UTC for display on the Web.

When importing from WordPress, the timestamps are already in UTC; Mango performs the reverse conversion to ensure that post dates and times (as they appear in the Markdown files) are localized to the author's time zone.
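
Here's a minimal sketch of the two conversions. It isn't Mango's actual code: it assumes Python 3.9 or later with the standard `zoneinfo` module and an available time zone database, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TIME_ZONE = "Pacific/Auckland"  # same value as in settings.py


def utc_to_local(utc_dt, tz_name=TIME_ZONE):
    """Import direction: UTC timestamp -> author's local time."""
    if utc_dt.tzinfo is None:
        # Treat naive values as UTC.
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))


def local_to_utc(local_dt, tz_name=TIME_ZONE):
    """Display direction: author's local time -> UTC."""
    if local_dt.tzinfo is None:
        # Treat naive values as being in the author's time zone.
        local_dt = local_dt.replace(tzinfo=ZoneInfo(tz_name))
    return local_dt.astimezone(timezone.utc)
```

With these, a hypothetical UTC timestamp of 10:42am on 24 June 2010 comes out as 10:42pm the same day in Auckland, and feeding that local time back through `local_to_utc` returns the original UTC value.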

Conversion scripts for other blogging platforms?

Blogger's probably next on the list, based on my completely unresearched assumption of popularity. If you'd like to try Mango and plan to migrate content from an existing site, let me know what you're moving from and I'll bump your script to the top of the list.

While it is possible to convert posts by hand, having done so myself I'll say that I hope never to need to do so again. I wish I'd written this script two months ago!