How to retain Word document formatting when converting to Confluence?

Miguel Garcia May 3, 2019

Hi community,

We have many MS Word documents (most in the 10-30 page range) that we'd like to convert to Confluence pages, but the resulting pages lose quite a bit of the formatting and would create a huge amount of manual cleanup for us. These Word files include tables (with and without merged cells), images, numbered lists, bullet lists with hierarchies, and more. Specifically, the loss of text alignment and tabs will likely cause us the most aggravation, since the spacing in most of these documents is critical. Copy/pasting tab characters seems to work well enough, but copy/pasting has other limitations that might be just as bad.

I'm not a programmer, but I thought that we might have better results if, rather than importing or copy/pasting, we convert the Word files to code (HTML or XML or ???) and then use a source code editor plugin to copy it into a Confluence page's source code. That keeps throwing errors when we try it, though, I guess because the languages don't exactly match up. Maybe this could work if we did it the right way or made a couple of tweaks to the process?

So in a nutshell, my question is: what's the best way nowadays to convert Word documents into Confluence pages so that you lose as little content/formatting as possible? If initial setup takes a while but creates a repeatable process, it would be worth it because we have many documents.

Thanks so much for any help!

3 answers

1 accepted

1 vote
Answer accepted
James Dellow
Community Champion
May 3, 2019

Confluence stores content in what they call Confluence storage format - it is 'XHTML-based', but not pure XML or normal HTML, as it contains special tags related to Confluence functionality.
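To illustrate why ordinary HTML can throw errors when pasted into the source, here is a minimal Python sketch that checks whether a fragment is well-formed XML, which an XHTML-based format requires. The helper name `well_formedness_error` and the namespace URIs in the wrapper are placeholders assumed for this example (they exist only so the `ac:` and `ri:` prefixes resolve), and this is not how Confluence itself validates content.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): they only need to be declared so
# that the ac: and ri: prefixes used by Confluence-specific tags can be parsed.
WRAPPER = '<root xmlns:ac="urn:example:ac" xmlns:ri="urn:example:ri">{}</root>'


def well_formedness_error(fragment: str):
    """Return None if the fragment parses as XML, else the parser's message."""
    try:
        ET.fromstring(WRAPPER.format(fragment))
        return None
    except ET.ParseError as exc:
        return str(exc)


# Typical HTML: <br> is never closed, which is fine in HTML but not in XHTML.
html_style = "<p>Line one<br>Line two</p>"
# The same content with a self-closed <br/>.
xhtml_style = "<p>Line one<br/>Line two</p>"
# A Confluence-specific macro tag, which a plain HTML converter will not produce.
macro = (
    '<ac:structured-macro ac:name="info"><ac:rich-text-body>'
    "<p>Note</p></ac:rich-text-body></ac:structured-macro>"
)

for label, fragment in [("html", html_style), ("xhtml", xhtml_style), ("macro", macro)]:
    print(label, "->", well_formedness_error(fragment) or "well-formed")
```

A check like this only catches unclosed or mismatched tags. It says nothing about which tags and attributes Confluence will keep once the page is saved.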

Some of the issues could be CSS related too, particularly for text alignments and tabs.

What do your documents look like if you use a standalone Word to HTML converter? Try using one that is designed to help people publish content drafted in Word so it can be copied into a generic Web Content Management System - they'll strip some of the incompatible formatting from Word.

But if this is causing you enough pain and retaining the formatting is important to your business, I would consider engaging a developer to help solve this.

Miguel Garcia May 6, 2019

Thanks James.  I did only some limited research into converters thinking that the Save As feature in Word basically did the same thing, but that is not at all true as I'm finding out! 

Today I tried https://documentconverter.pro/ and both the desktop app and online app are working much better than other methods so far.  The desktop app also has the option to do multiple files at once, which will certainly come in handy.  If anyone has recommendations for the most useful Word-to-HTML converters, I'm all ears.

Thanks James!

2 votes
Christel Gray
Contributor
February 9, 2023

I know this is an older discussion, but we have the same issue. Seems it should be easier to copy from Word and paste into Confluence without having to jump through hoops. Has anyone come up with a good solution?

1 vote
Bill Bailey
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 4, 2019

The first thing to keep in mind is that this is HTML, so you have to think in terms of what is possible there. For example, tabs are a foreign concept in HTML. And you shouldn't really be using them much in Word either (too many people use Word like a typewriter).
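Since HTML has no tab stops, text that was aligned with tabs usually has to become a table (or be styled with CSS) to keep its columns. A minimal Python sketch of the table approach follows. The function name `tabbed_lines_to_table` is hypothetical, and it assumes each line uses one tab per column boundary, which real Word documents often do not.

```python
from html import escape


def tabbed_lines_to_table(text: str) -> str:
    """Turn tab-separated lines into a simple XHTML table, one row per line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each tab marks a column boundary; escape &, < and > in the cell text.
        cells = "".join(
            f"<td>{escape(cell.strip())}</td>" for cell in line.split("\t")
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table><tbody>" + "".join(rows) + "</tbody></table>"


sample = "Part\tQty\nBolt & nut\t12"
print(tabbed_lines_to_table(sample))
```

The table keeps the columns lined up regardless of font or window width, which tab characters cannot guarantee in a browser.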

And even if you try to paste HTML into the source editor, and even if it doesn't throw errors, it will often strip out most manual formatting.

The editor in Confluence is limited by design. But there are things you can replicate with custom CSS and user macros. It will take some work, though. And if you want to control the format tightly, then you need to become a power user.

Bottom line, you will have to stop thinking in Word and move to thinking in Confluence and using its macros and methods for formatting content.

Miguel Garcia May 6, 2019

Yes, that's the idea.  It's just the conversion to Confluence that is the challenge now.  Once we're up and running, we plan on making full use of Confluence's macros and other formatting features.  Thanks Bill for the input.

Bill Bailey
Rising Star
May 6, 2019

Generally my process for Word docs is to import them, then use Regex in the Source Editor to clean out all the low-level formatting, then go from there.

Jaime Murillo July 9, 2020

Bill_Bailey,
I find myself in the same situation as Miguel.
We have a very large number of existing documents that we are looking to import into Confluence, but format is very important.
Could you please elaborate on your comment about your Word import process using Regex?
Thanks in advance.

Rahul mukhi
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
August 21, 2020

Bill_Bailey,
I find myself in the same situation as Miguel and Jaime.
We too have a large number of existing documents that we are looking to import into Confluence, but format is very important.
Could you please elaborate on your comment about your Word import process using Regex?

Or any other workaround, please.
Thanks in advance.

Bill Bailey
Rising Star
August 21, 2020

There is a source editor plugin you can install. I use it to run Regex over the imported HTML and clean it up. It is best to start with clean HTML when working with imported content.

Once you have clean HTML, you can adjust the formatting using Confluence tools.
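As a rough idea of what a Regex cleanup pass over imported Word HTML might look like, here is a minimal Python sketch. The patterns in `CLEANUP_RULES` and the function name `clean_word_html` are assumptions chosen for illustration, not the actual expressions used in the process described above. The same patterns could be tried one at a time in a find-and-replace dialog that supports regular expressions.

```python
import re

# Hypothetical cleanup rules (pattern, replacement), applied in order.
CLEANUP_RULES = [
    (r'\s+style="[^"]*"', ""),          # inline style attributes
    (r'\s+class="Mso[^"]*"', ""),       # Word's Mso* class attributes
    (r"</?o:p[^>]*>", ""),              # Word's <o:p> paragraph markers
    (r"</?(?:span|font)\b[^>]*>", ""),  # span/font wrappers (inner text is kept)
    (r"<p>(?:\s|&nbsp;)*</p>", ""),     # paragraphs left empty by the steps above
]


def clean_word_html(html: str) -> str:
    """Strip low-level Word formatting while keeping the structural tags."""
    for pattern, replacement in CLEANUP_RULES:
        html = re.sub(pattern, replacement, html, flags=re.IGNORECASE)
    return html


sample = (
    '<p class="MsoNormal" style="margin-left:36.0pt">'
    '<span style="font-size:11.0pt">Step one<o:p></o:p></span></p>'
    '<p class="MsoNormal"><o:p>&nbsp;</o:p></p>'
)
print(clean_word_html(sample))  # prints <p>Step one</p>
```

Rules like these throw away indentation and font sizes on purpose, so any spacing that matters has to be rebuilt afterwards with Confluence's own tools. Regular expressions are also fragile on nested or unusual markup, so it is worth testing on a copy of the page first.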

Yanick Schlatter July 10, 2023

Hi @Bill Bailey 

How can we clean out low-level formatting with Regex? Can you give us an example or a guidance page?

Thanks a lot!
