In preparation for writing my end-of-decade review, I wanted a quick, plain-text listing of my posts on this blog, so I requested an export from WordPress. It turned out that the export is in a modified RSS format, not anything that’s easily parseable, and that it includes “posts” for every single photo upload even though I didn’t request any media as part of my export. RSS is an application of XML, so I dug out my rusty XSLT knowledge (acquired in the early 2000s while building a homebrew photo-gallery system) to hack up a quick transformation of the XML into a list of post dates and titles. (Originally I output titles only, but decided that the dates would be useful to include.)
Here is the script:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:wp="http://wordpress.org/export/1.2/"
                version='1.0'>
  <xsl:output method='text' encoding='utf-8'/>

  <!-- Select only real posts, skipping attachments, pages, and so on. -->
  <xsl:template match="/">
    <xsl:apply-templates select="/rss/channel/item[wp:post_type = 'post']"/>
  </xsl:template>

  <!-- Emit one "date: title" line per post. -->
  <xsl:template match="/rss/channel/item">
    <xsl:value-of select="wp:post_date/text()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="title/text()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
(Also available as a GitHub gist)
To use it, extract the dump from the ZIP file you downloaded from WordPress, save the stylesheet above as post-titles.xsl, and run:
xsltproc post-titles.xsl *.xml
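If xsltproc isn't available, roughly the same listing can be produced with Python's standard library. This is only a sketch of the same idea, not part of the original script: the function name list_posts is made up for illustration, and the namespace URI is copied from the stylesheet, so an export that declares a different WordPress export version would need it adjusted.

```python
import sys
import xml.etree.ElementTree as ET

# Namespace URI copied from the stylesheet; an export from a different
# WordPress version may declare another one (assumption to check).
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}


def list_posts(xml_data):
    """Return 'date: title' strings for each item whose wp:post_type is 'post'.

    xml_data may be a str or bytes object holding the whole export file.
    """
    root = ET.fromstring(xml_data)  # root element is <rss>
    lines = []
    for item in root.findall("./channel/item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=WP_NS)
        if post_type != "post":
            continue  # skip attachments, pages, and other non-post items
        date = item.findtext("wp:post_date", default="", namespaces=WP_NS)
        title = item.findtext("title", default="")
        lines.append(f"{date}: {title}")
    return lines


if __name__ == "__main__":
    # Hypothetical usage: python post_titles.py *.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for line in list_posts(f.read()):
                print(line)
```

Reading the file as bytes lets the XML parser honor whatever encoding the export declares.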