WordPress conversion script
Moving a site from WordPress to Mango just got a whole lot easier. The latest Mango source features a script which connects to an existing WordPress installation using XML-RPC, processes every page and post, and saves each as a Markdown text file in your posts directory. Handy!
Here are the steps to follow in order to give this a go:
-
Install html2text:
easy_install html2text
-
Create a new Django project, if required:
django-admin.py startproject blog
-
Navigate to the new Django project (or an existing Django project):
cd blog/
-
Open the project's settings file:
open settings.py
-
Set your preferred time zone*, then save:
TIME_ZONE = 'Pacific/Auckland'
-
Download the latest Mango source:
hg clone https://bitbucket.org/davidchambers/mango
-
Navigate to Mango's settings directory:
cd mango/settings/
-
Create a file to store customizations to Mango's settings:
touch custom.py
-
Open this file:
open custom.py
-
Tell Mango where to put the documents by adding DOCUMENTS_PATH, then save:
DOCUMENTS_PATH = '~/posts/'
-
Navigate to the extras directory:
cd ../extras/
-
Run the import script, using appropriate arguments:
python wp.py http://example.com/blog/ admin 12345
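The command above passes three positional arguments. Here is a minimal sketch of how a script could read them, assuming they are the blog URL, a username, and a password, in that order. This is not the actual wp.py; the argument names and the parse_args helper are made up for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments used in the example command."""
    parser = argparse.ArgumentParser(
        prog="wp.py",
        description="Import WordPress pages and posts (illustrative sketch).",
    )
    # Assumed meaning of each position, based on the example invocation.
    parser.add_argument("url", help="root URL of the WordPress blog")
    parser.add_argument("username", help="WordPress account name")
    parser.add_argument("password", help="WordPress account password")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("Importing from", args.url, "as", args.username)
```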
What the script does
The script communicates with the WordPress blog using XML-RPC, retrieving all the pages and posts. It then pulls out the relevant information and formats it as Mango expects. The icing on the cake is the fact that html2text is used to convert body copy to Markdown – WordPress might think it's fine to store content as HTML, but we think it's totally wrong.
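For the curious, the retrieval step might look roughly like the sketch below. It is not the code from wp.py: it uses Python 3's standard xmlrpc.client, and it assumes the blog exposes the usual xmlrpc.php endpoint along with the wp.getPages and metaWeblog.getRecentPosts methods. The field names (wp_slug, title, description, date_created_gmt), the blog id of 1, and all of the helper names are assumptions on my part.

```python
import xmlrpc.client


def xmlrpc_endpoint(blog_url):
    """Return the assumed XML-RPC URL for a blog rooted at blog_url."""
    return blog_url.rstrip("/") + "/xmlrpc.php"


def fetch_entries(blog_url, username, password, limit=1000):
    """Fetch pages and posts as lists of dicts (requires network access)."""
    server = xmlrpc.client.ServerProxy(
        xmlrpc_endpoint(blog_url), use_builtin_types=True
    )
    blog_id = 1  # assumption: a single-site install
    pages = server.wp.getPages(blog_id, username, password, limit)
    posts = server.metaWeblog.getRecentPosts(blog_id, username, password, limit)
    return pages, posts


def extract_fields(entry):
    """Pull out the fields needed to build a text file (assumed key names)."""
    return {
        "slug": entry["wp_slug"],
        "title": entry["title"],
        "html": entry["description"],
        "created_utc": entry.get("date_created_gmt"),
    }


def to_markdown(html):
    """Convert an HTML body to Markdown using html2text."""
    # Third-party package; imported here so the rest runs without it.
    import html2text

    return html2text.html2text(html)
```

Passing use_builtin_types=True asks xmlrpc.client to hand back ordinary datetime objects rather than its own DateTime wrapper, which keeps any later date handling simple.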
The resulting files are named based on their respective slugs, and saved in your posts directory. For example, these are the files that are generated from a new WordPress installation:
about.text
About
=====

This is an example of a WordPress page, you could edit this to put information about yourself or your site so readers know where you are coming from. You can create as many pages like this one or sub-pages as you like and manage all of your content inside of WordPress.
hello-world.text
date: 24 June 2010
time: 10:42pm

Hello world!
============

Welcome to **WordPress**. This is your first post. Edit or delete it, then start blogging!

1. one
2. two
3. three
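To make the file layout concrete, here is a rough sketch of how an entry could be turned into one of these files. It is my own illustration rather than the code in wp.py. The exact placement of blank lines is an assumption, and format_entry and save_entry are hypothetical helpers. Pages carry no date or time metadata in the example above, so the timestamp is optional.

```python
import calendar
from datetime import datetime
from pathlib import Path


def format_entry(title, body_markdown, created=None):
    """Build the text of a file: optional metadata, underlined title, body."""
    lines = []
    if created is not None:
        hour12 = created.hour % 12 or 12  # 0 and 12 both display as 12
        suffix = "am" if created.hour < 12 else "pm"
        month = calendar.month_name[created.month]
        lines.append(f"date: {created.day} {month} {created.year}")
        lines.append(f"time: {hour12}:{created.minute:02d}{suffix}")
        lines.append("")
    lines.append(title)
    lines.append("=" * len(title))  # underline matches the title length
    lines.append("")
    lines.append(body_markdown.strip())
    return "\n".join(lines) + "\n"


def save_entry(posts_dir, slug, text):
    """Write text to <posts_dir>/<slug>.text and return the resulting path."""
    path = Path(posts_dir).expanduser() / f"{slug}.text"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(format_entry("Hello world!", "Welcome to **WordPress**.",
                       datetime(2010, 6, 24, 22, 42)))
```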
* A word about TIME_ZONE
Mango always uses UTC when presenting content to the world. The default
templates make use of the <time> element to provide hooks for localization
of dates and times via JavaScript.
An author, though, should be free to use her local time when adding dates and
times to posts. For this reason Mango expects these dates and times to use the
TIME_ZONE specified in the project's settings file. Mango handles conversion
of these times to UTC for display on the Web.
When importing files from WordPress, the timestamps are already in UTC; Mango performs the reverse conversion to ensure that post dates and times (as they appear in the Markdown files) are localized to the author's time zone.
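The two conversions can be sketched as follows. This is an illustration, not Mango's implementation: the helper names are invented, and a fixed UTC+12 offset stands in for Pacific/Auckland so the example stays self-contained. Anything real would want a named zone (for instance zoneinfo.ZoneInfo('Pacific/Auckland') on recent Pythons) so that daylight saving is handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offset standing in for the project's TIME_ZONE.
NZST = timezone(timedelta(hours=12), "NZST")


def utc_to_local(naive_utc, local_tz):
    """Import direction: a UTC timestamp becomes the author's local time."""
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(local_tz).replace(tzinfo=None)


def local_to_utc(naive_local, local_tz):
    """Display direction: the author's local time becomes UTC."""
    aware = naive_local.replace(tzinfo=local_tz)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


if __name__ == "__main__":
    imported = datetime(2010, 6, 24, 10, 42)  # illustrative UTC value
    local = utc_to_local(imported, NZST)
    print(local)                      # local time written to the file
    print(local_to_utc(local, NZST))  # back to UTC for display
```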
Conversion scripts for other blogging platforms?
Blogger's probably next on the list, based on my completely unresearched assumption of popularity. If you'd like to try Mango and plan to migrate content from an existing site, let me know what you're moving from and I'll bump your script to the top of the list.
While it is possible to convert posts by hand, having done so myself I'll say that I hope never to need to do so again. I wish I'd written this script two months ago!