A little XSLT hack for extracting titles from WordPress exports

In preparation for writing my end-of-decade review, I wanted a quick, plain-text listing of my posts on this blog, so I requested an export from WordPress. Then I realized that the export was in a modified RSS format rather than anything easily skimmed, and that it included “posts” (attachment items) for every single photo upload, even though I didn’t request any media as part of my export. RSS is an application of XML, so I dug out my rusty XSLT knowledge (acquired in the early 2000s while building a homebrew photo-gallery system) to hack up a quick transformation of the XML into a list of post dates and titles. (Originally I did titles only, but decided that the dates would be useful information to include.)

Here is the script:

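<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:wp="http://wordpress.org/export/1.2/"
                version='1.0'>
  <!-- The wp: prefix must map to the same namespace URI that is
       declared at the top of the WordPress export file -->

  <!-- Emit plain text rather than XML -->
  <xsl:output method='text' encoding='utf-8'/>

  <!-- Process only real posts; this skips the attachment items created
       for uploaded photos, as well as pages and other non-post items -->
  <xsl:template match="/">
    <xsl:apply-templates select="/rss/channel/item[wp:post_type = 'post']"/>
  </xsl:template>

  <!-- Print one "date: title" line per post -->
  <xsl:template match="/rss/channel/item">
    <xsl:value-of select="wp:post_date/text()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="title/text()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>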

(Also available as a GitHub gist)

To use it, extract the XML dump from the ZIP file you downloaded from WordPress, save the stylesheet above as post-titles.xsl, and run xsltproc post-titles.xsl *.xml.
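If xsltproc isn’t handy, the same filter can be sketched in Python with the standard library’s xml.etree.ElementTree. This is a swapped-in alternative, not part of the original hack: it assumes the export declares the wp prefix as http://wordpress.org/export/1.2/ (the URI the stylesheet uses), and the script name post_titles.py and the function post_titles are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# Assumed namespace URI for the wp: prefix (WXR 1.2); it must match the
# xmlns:wp declaration at the top of your export file.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}


def post_titles(root: ET.Element) -> list[str]:
    """Return one 'date: title' string per item whose wp:post_type is 'post'.

    `root` is the <rss> element of a WordPress export.
    """
    lines = []
    # Equivalent to the XPath /rss/channel/item, evaluated from <rss>
    for item in root.findall("./channel/item"):
        # Skip attachments (photo uploads), pages, and other non-post items
        if item.findtext("wp:post_type", default="", namespaces=WP_NS) != "post":
            continue
        date = item.findtext("wp:post_date", default="", namespaces=WP_NS)
        title = item.findtext("title", default="")
        lines.append(f"{date}: {title}")
    return lines


if __name__ == "__main__":
    # Hypothetical usage: python3 post_titles.py *.xml
    for path in sys.argv[1:]:
        for line in post_titles(ET.parse(path).getroot()):
            print(line)
```

Like the stylesheet, this prints items in the order they appear in the export, and it compares wp:post_type exactly, so only items typed “post” are listed.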

This entry was posted in Administrivia, Computing.