We're retiring a Confluence server, migrating some spaces to another server, but many just aren't updated any more. We wanted to take snapshots of those spaces for later reference.
We exported each as PDF, a reference to squirrel away somewhere for later. But that only includes image attachments. It ignores all other file types.
The XML export does include all attachments, but names the files with their internal ID rather than something more human readable.
So I threw together this Python script which parses the entities.xml file in an XML export, gathering the filenames from there and mapping them to their ids. It creates a folder named for the space, then sub folders named for each page with an attachment, then copies those attachment files in there.
I hope it is useful to someone else.
Online forums and learning are now in one easy-to-use experience.
By continuing, you accept the updated Community Terms of Use and acknowledge the Privacy Policy. Your public name, photo, and achievements may be publicly visible and available in search engines.