Tool independence: The export format

The first challenge in moving a weblog is getting a snapshot of your weblog data in a format that can be imported into the new tool. To create an export that works with most tools, at least for the moment, you’ll want to export your existing weblog’s data using the Movable Type import/export format.

WordPress doesn’t have an MT export built into the tool (more on this later); as I mentioned in a previous post, I used Scott Hanson’s new WordPress to MT export script to export my posts, categories, comments, and other data. Once I copied his file, I edited the parameters given in the script, providing the same username, password, and database name I had added to the wp-config.php file. I also edited the file to use Movable Type’s default line break setting, __default__, rather than ‘markdown’, which is the text formatting option currently set as the default in the import tool. These items are easy to find in the script using a text editor.
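To make the target of such a script concrete, here is a minimal sketch of rendering one post as MT-style export text. It is not Scott's script. The field names (TITLE, AUTHOR, DATE, STATUS, CONVERT BREAKS, CATEGORY), the date layout, and the five-dash and eight-dash separators are my understanding of the MT format and should be checked against the Movable Type documentation. The function name and the sample values are hypothetical.

```python
from datetime import datetime


def format_entry(title, author, date, categories, body,
                 status="publish", convert_breaks="__default__"):
    """Render one post as a block of MT-style export text.

    Assumed layout: metadata lines, a five-dash line, the BODY section,
    another five-dash line, then an eight-dash line that ends the entry.
    """
    lines = [
        f"TITLE: {title}",
        f"AUTHOR: {author}",
        # Assumed date layout: MM/DD/YYYY hh:mm:ss AM|PM
        f"DATE: {date.strftime('%m/%d/%Y %I:%M:%S %p')}",
        f"STATUS: {status}",
        f"CONVERT BREAKS: {convert_breaks}",
    ]
    lines += [f"CATEGORY: {name}" for name in categories]
    lines += ["-----", "BODY:", body, "-----", "--------"]
    return "\n".join(lines) + "\n"


# Hypothetical sample post, for illustration only.
sample = format_entry(
    title="Hello",
    author="me",
    date=datetime(2004, 6, 1, 15, 30, 0),
    categories=["Technology"],
    body="First post.",
)
print(sample)
```

Setting convert_breaks to "__default__" mirrors the edit described above. A real export would also need to carry comments and any extended body text, which this sketch leaves out.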

When I loaded the export script page in a browser, the exported data printed out to the page, a technique made popular by Movable Type. Once the export finished (a message displays at the end of the page), I used the browser’s File/Save As functionality to save the page to a local file called bb.export.

(Another approach is to use the Unix wget utility on the command line as follows: wget http://somedom.com/import.php. This saves the exported data as ‘import.php’, and you can then use the data as needed.)
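If wget isn't available, the same fetch-and-save step can be sketched in a few lines of Python using only the standard library. The function name save_export is hypothetical, and the sketch assumes the export page needs no login.

```python
import os
import shutil
import urllib.request


def save_export(url, dest_path):
    """Fetch the export page at `url` and write it unchanged to `dest_path`.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Stream the response to disk rather than holding it all in memory.
        shutil.copyfileobj(response, out)
    return os.path.getsize(dest_path)
```

Calling save_export("http://somedom.com/import.php", "bb.export") would then save the data straight to the file name used above, with no renaming needed afterwards.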

After the file was created, I FTP’d it to a newly created subdirectory, import, within the MT cgi-bin directory. All that was needed at that point was to open up my MT installation, select import/export, and pick the import option. I chose myself as the default author and ‘publish’ as the default post status; since all the entries had categories, I didn’t need the default category. I also have titles for all entries, so I didn’t need to fill in the start and end title HTML fields, either. Clicking the link to do the import should load the data, and the data migration part of the tool port is finished.

(See the Movable Type documentation for more information about importing data into a MT weblog.)

A regular WordPress supporter, Carthik, also has a version of a WP-to-MT export tool that uses WordPress global variables for the database settings. You can access it here. Carthik had his export tool finished before Scott started working on his, but had withheld publication because he’s now working on what he and Matt Mullenweg, the lead WP developer, are calling a “lossless XML export”: an import/export format that is going to be included with WordPress 1.3 and licensed as GPL for others to use if they wish.

One reason that the WordPress folks are creating this new format is that there have been problems with the existing MT format in the past. I have exported and imported data several times using this format and haven’t had issues recently, but others have had problems, specifically with fragile points of breakage in the scripts, such as dependence on a dashed line to separate entries. When I first used the import format to move from Blogger to MT, the import kept stopping whenever it ran into a sequence of dash characters and the import functionality thought, “Well, that’s it. She’s done.” Once I edited for this problem, another would surface, making my move from Blogger to MT the most painful tool move I’ve done.
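The dashed-line problem is easy to reproduce. Below is a minimal sketch of a naive splitter, not MT's actual import code. It assumes, as I understand the format, that a line of eight dashes ends an entry and a line of five dashes ends a section within an entry. The sample export text is made up for illustration.

```python
ENTRY_SEP = "--------"   # assumed: eight dashes end an entry
SECTION_SEP = "-----"    # assumed: five dashes end a section within an entry


def split_entries(export_text):
    """Naively split export text into entries, each a list of section strings.

    Any line consisting solely of a separator is treated as a boundary,
    even when it is really part of a post body.
    """
    entries, sections, current = [], [], []
    for line in export_text.splitlines():
        if line == ENTRY_SEP:
            if current:
                sections.append("\n".join(current))
            if sections:
                entries.append(sections)
            sections, current = [], []
        elif line == SECTION_SEP:
            sections.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep whatever is left if the text does not end with a separator.
    if current:
        sections.append("\n".join(current))
    if sections:
        entries.append(sections)
    return entries


# Two well-formed entries (made-up sample data).
clean = (
    "TITLE: One\n-----\nBODY:\nHello\n-----\n--------\n"
    "TITLE: Two\n-----\nBODY:\nWorld\n-----\n--------\n"
)
# The same export, but the first post's body contains a dashed line.
broken = clean.replace("Hello", "Hello\n--------\nmore text")

print(len(split_entries(clean)))   # 2
print(len(split_entries(broken)))  # 3: the first post was cut in two
```

The format has no way to escape a dashed line inside a body, so the splitter cannot tell content from structure. This is the kind of breakage a tagged format such as XML avoids.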

However, there is no denying that the MT export/import format is the most widely supported format among weblogging tools. To have tool independence, in this case, you need to depend on one specific tool’s import/export format, at least until enough vendors support a replacement.

Technical issues of clean transformations aside, a challenge with a new universal format is the underlying data model the tools share. For instance, a ‘post’ that has zero or more ‘comments’ and at least one ‘category’ is more or less a standard data model across all weblogging tools. However, beyond this simple core model, the tools differ widely.

For instance, Movable Type supports keywords but not key/value pairs. Keywords are just a list of terms attached to a weblog entry, while key/value pairs have both a term and an associated value. WordPress supports key/value pairs, and I use these in the ‘about this entry’ box in the top-left of the page. When I moved the data to the Movable Type test site, I lost this key/value pairing. Even if a new export format included these key/value pairs, there would be no place to receive them in the target weblogging tool, in this case Movable Type. The most we could do is strip the key portion of the pair and just take the value, which would defeat their usefulness.
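A small sketch shows what stripping the key costs. The field names and values are hypothetical, and to_keywords merely stands in for whatever a converter would have to do when the target tool has only a keyword list.

```python
def to_keywords(custom_fields):
    """Flatten key/value pairs to a bare keyword list, discarding the keys."""
    return [str(value) for value in custom_fields.values()]


# Two entries whose key/value pairs mean different things (hypothetical data).
entry_a = {"listening_to": "rain"}
entry_b = {"weather": "rain"}

print(to_keywords(entry_a))  # ['rain']
print(to_keywords(entry_b))  # ['rain']
```

After the conversion, the two entries are indistinguishable. Nothing in the keyword list says whether ‘rain’ was the weather or the soundtrack, and no import back into WordPress could restore the pairing.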

Now, if we use something like RSS or Atom to act as the transport medium, it might seem as if these would then ensure a common data model, because most tools support one or more of these feed syntaxes. The assumption is that if a tool supports the feed, it has to support the data that provides the feed, and therefore a minimum data model is guaranteed. Right?

Well, not necessarily….

If a syndication feed supports complex or hierarchical categories but these are optional, and one weblog tool supports them but another doesn’t, using the syndication feed to export the data from one tool to the other will result in loss of data. Using XML won’t improve this situation or prevent the loss.
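The same kind of loss can be sketched for categories. This assumes a hypothetical source tool that writes hierarchical categories as slash-separated paths and a target tool that keeps only a flat name. The post ids and category paths are made up.

```python
def flatten_category(path, sep="/"):
    """Keep only the leaf of a hierarchical category path."""
    return path.split(sep)[-1]


# Hypothetical posts filed under different parents that share a leaf name.
posts = {
    "post-1": "Technology/Weblogging",
    "post-2": "Writing/Weblogging",
}

flat = {post_id: flatten_category(path) for post_id, path in posts.items()}
print(flat)  # {'post-1': 'Weblogging', 'post-2': 'Weblogging'}
```

The feed carried the hierarchy faithfully. The loss happens on the receiving side, where the target's data model has nowhere to put the parent.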

That’s where the MT import/export format comes in handy at this time. The power behind it isn’t in its syntax, which is problematic, but in the underlying data model. The MT format has, by virtue of its wide usage, defined a minimal shared model that most tools support. An XML-based version of this model could then provide a more robust import/export format. This is a win/win for all tools, and one that we as customers should encourage.

However, until a good majority of tools support this XML format, whether it’s based on RSS, Atom, or even something entirely new, the de facto standard for most tools is the existing MT text-based format. This is what I will be using for the rest of these writings.