My goal is to use the Confluence API to get the content of a page, parse it, edit it, and update that same page with the edited content.
At first, I assumed Confluence's storage format was HTML. Based on that, my original plan was to use Python's BeautifulSoup module to parse and edit the content once I retrieved it from the Confluence API. I now know that the Confluence page storage format is "XHTML-based". I've tried to parse it with various BeautifulSoup parsers (lxml, xml, html.parser) but they all get caught on the standard-breaking macro elements like this:
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="283daa7d-46af-4d6d-a177-a00b4a2bc342"><ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body></ac:structured-macro>
Is there a preferred method for parsing this?
I am having the same issue! If you found a solution could you share? Thanks!
I never found a parser that worked perfectly for Confluence. Ultimately, I was forced to edit Confluence pages using regular expressions.
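For anyone taking the regex route described above, here is a minimal sketch of what such an edit might look like. It only touches the CDATA payload inside `ac:plain-text-body` elements and leaves everything else in the storage string alone. The function name `rewrite_plain_text_bodies` and the `transform` callback are hypothetical names chosen for illustration, not part of any Confluence library.

```python
import re

# Matches the CDATA payload of every <ac:plain-text-body> element.
# Group 2 is the payload; groups 1 and 3 are put back unchanged.
CDATA_BODY = re.compile(
    r"(<ac:plain-text-body><!\[CDATA\[)(.*?)(\]\]></ac:plain-text-body>)",
    re.DOTALL,
)


def rewrite_plain_text_bodies(storage, transform):
    """Apply `transform` to each plain-text macro body in a storage string."""
    return CDATA_BODY.sub(
        lambda m: m.group(1) + transform(m.group(2)) + m.group(3),
        storage,
    )


sample = r"<ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body>"
result = rewrite_plain_text_bodies(sample, lambda body: body.replace("CDRL", "SOW"))
print(result)
```

Because the CDATA wrapper is copied through verbatim, the result stays in the same shape Confluence stored it in. The transformed text must not itself contain `]]>`, since that would end the CDATA section early.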
I have had the same issue. Ultimately, I have resorted to using an html parser (where that works), an xml parser (where that works), and regex for everything else.
Same issue here. Need to find a way to parse Confluence XHTML incl. macro notation.
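One approach that may get around the parser errors: XML parsers tend to reject the storage format because the `ac:` and `ri:` prefixes are never declared, and because it can contain HTML entities such as `&nbsp;`. The sketch below wraps the fragment in a temporary root element that declares those prefixes, parses it with the standard library's `xml.etree.ElementTree`, and strips the wrapper again on the way out. The namespace URIs and the helper names `parse_storage` and `serialize_storage` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Assumed namespace URIs: any stable URI should do, because the wrapper
# element that declares them is removed again before upload.
NS = {
    "ac": "http://atlassian.com/content",
    "ri": "http://atlassian.com/resource/identifier",
}
AC = "{" + NS["ac"] + "}"
XML_BUILTIN = {"amp", "lt", "gt", "quot", "apos"}

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep the ac:/ri: prefixes on output


def _numeric_entity(match):
    # Turn HTML-only entities (e.g. &nbsp;) into numeric ones XML accepts.
    # Caveat: this also rewrites entity-like text inside CDATA sections.
    name = match.group(1)
    if name in XML_BUILTIN or name not in name2codepoint:
        return match.group(0)
    return f"&#{name2codepoint[name]};"


def parse_storage(fragment):
    """Parse a storage-format fragment into an ElementTree element."""
    fragment = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", _numeric_entity, fragment)
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in NS.items())
    return ET.fromstring(f"<root {decls}>{fragment}</root>")


def serialize_storage(root):
    """Serialize the tree and drop the temporary <root> wrapper."""
    xml = ET.tostring(root, encoding="unicode")
    if xml.endswith("/>"):  # wrapper had no content at all
        return ""
    return xml[xml.index(">") + 1 : xml.rindex("</root>")]


sample = (
    "<p>Status:&nbsp;draft</p>"
    '<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1">'
    "<ac:plain-text-body><![CDATA[*\\[CDRL:\\]*]]></ac:plain-text-body>"
    "</ac:structured-macro>"
)
root = parse_storage(sample)
macro_names = [m.get(AC + "name") for m in root.iter(AC + "structured-macro")]
body_text = root.find(".//ac:plain-text-body", NS).text
root.find("p").text = "Status: final"
updated = serialize_storage(root)
```

One caveat: ElementTree reads CDATA as ordinary text and writes it back escaped rather than as a CDATA section, and I have not confirmed that Confluence accepts every macro body in that form. If lxml is available, `lxml.etree.XMLParser(strip_cdata=False)` keeps CDATA sections intact and may be the safer choice for pages with many plain-text macros.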
Hey Bill, thanks for your response.
Alas, while these two libraries are helpful wrappers for the Confluence API, as far as I can tell, neither is able to parse the XHTML that the API returns as the content of a page.
Copilot suggested the following, which I can confirm worked:
import requests
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup

# Replace these variables with your Confluence details
confluence_url = 'https://<my site>.atlassian.net/wiki/rest/api/content/'
page_id = '<my page id>'  # Replace with your page ID
username = '<my username>'
api_token = '<my token>'


def get_confluence_page_text(page_id):
    # body.view is the rendered HTML of the page
    url = f"{confluence_url}{page_id}?expand=body.view"
    auth = HTTPBasicAuth(username, api_token)
    response = requests.get(url, auth=auth)
    if response.status_code == 200:
        page_data = response.json()
        page_html = page_data['body']['view']['value']
        # Use BeautifulSoup to extract text from HTML
        soup = BeautifulSoup(page_html, 'html.parser')
        page_text = soup.get_text()
        return page_text
    else:
        raise Exception(f"Failed to fetch page: {response.status_code} - {response.text}")


if __name__ == "__main__":
    try:
        page_text = get_confluence_page_text(page_id)
        print(page_text)
    except Exception as e:
        print(e)
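Note that the script above requests `body.view`, which is the rendered HTML. That suits text extraction, but for the read-edit-write round trip in the original question the page would more likely be fetched with `?expand=body.storage,version` and written back with a PUT to the same content URL. Below is a sketch of building that PUT body from the GET response. `build_update_payload` is a hypothetical helper name, and the sample `page` dict is invented data shaped like the REST response.

```python
def build_update_payload(page, new_storage_value):
    """Build the JSON body for updating a page via the content REST endpoint.

    `page` is the decoded JSON from a GET with ?expand=body.storage,version.
    """
    return {
        "id": page["id"],
        "type": page["type"],
        "title": page["title"],
        # The update must carry the next version number.
        "version": {"number": page["version"]["number"] + 1},
        "body": {
            "storage": {
                "value": new_storage_value,
                "representation": "storage",
            }
        },
    }


# Invented sample data for illustration only.
page = {
    "id": "12345",
    "type": "page",
    "title": "Example page",
    "version": {"number": 7},
    "body": {"storage": {"value": "<p>old</p>", "representation": "storage"}},
}
payload = build_update_payload(page, "<p>new</p>")
```

With the variables from the script above, the payload could then be sent with something like `requests.put(f"{confluence_url}{page_id}", json=payload, auth=auth)`.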
This might not be the solution you want; however, if you are interested in converting Confluence wiki text to docx format (while preserving the wiki formatting), you can try the jirawiki2docx Python library.
https://pypi.org/project/jirawiki2docx/
@Woodson Miles I am using the ConfluencePS PowerShell module to achieve this. It's quite simple to fetch and upload page contents using the Get-ConfluencePage and Set-ConfluencePage commands.
You can get this module from PowerShell Gallery.