Why will a ppt file index while the pptx version will not?

Steve Boyle
Contributor
May 5, 2020

I have several PowerPoint files that I'd like to upload to Confluence, and I'd like Confluence to index their contents. Some of the files are in the older ppt format and some are in the newer pptx format. All of the ppt files index correctly in Confluence, but only the very smallest, single-slide pptx files index properly. I turned on debug logging and I'm seeing this:
Error reading content of PowerPoint document: Document too big for text extraction, bailing out

I'm familiar with the Attachment Size setting and these further settings:

  • atlassian.indexing.attachment.maxsize
  • officeconnector.excel.extractor.maxlength
  • officeconnector.textextract.word.docxmaxsize
  • atlassian.indexing.contentbody.maxsize

I've tried tweaking those values and I always hit the same error above. Is there a way to tell Confluence to accept large pptx files?

I've been able to work around the issue by saving my pptx files in ppt format, but this is not ideal.

Thank you.

1 answer

0 votes
Diego
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
May 18, 2020

Hello @Steve Boyle!

As I understand, your instance will not index any *.PPTX file you upload into it.

With this behavior in mind, there are a few things I would like to check with you:

  1. How big (in Megabytes or Kilobytes) are the *.PPT files that are indexed?
  2. How big (in Megabytes or Kilobytes) are the *.PPTX files that are not indexed?
  3. What happens if you save one of the files that are originally *.PPT as *.PPTX and try indexing again?
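
To gather the sizes asked about above, a small script along these lines could help. It is only a sketch: it reports the size on disk of every .ppt and .pptx file in a directory and, for .pptx files, the uncompressed size of the slide XML inside the archive. A .pptx file is a ZIP archive, and `ppt/slides/` is the standard Office Open XML location for slide parts. Whether Confluence's limit is measured against the file size, the uncompressed XML, or the extracted text is not stated in this thread, so the XML figure is only a rough proxy. The function names `slide_xml_bytes` and `report` are made up for this example.

```python
import sys
import zipfile
from pathlib import Path


def slide_xml_bytes(pptx_path):
    """Sum the uncompressed size of the slide XML parts inside a .pptx archive."""
    with zipfile.ZipFile(pptx_path) as archive:
        return sum(
            info.file_size
            for info in archive.infolist()
            # Slide parts live under ppt/slides/; the .rels files there are skipped.
            if info.filename.startswith("ppt/slides/") and info.filename.endswith(".xml")
        )


def report(directory):
    """Return (name, size on disk in KB, slide XML in KB or None) for each deck."""
    rows = []
    for path in sorted(Path(directory).iterdir()):
        suffix = path.suffix.lower()
        if suffix not in (".ppt", ".pptx"):
            continue
        disk_kb = path.stat().st_size / 1024
        # Legacy .ppt files are a binary format, not a ZIP, so no XML figure applies.
        xml_kb = slide_xml_bytes(path) / 1024 if suffix == ".pptx" else None
        rows.append((path.name, disk_kb, xml_kb))
    return rows


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, disk_kb, xml_kb in report(target):
        extra = f"{xml_kb:.1f} KB slide XML" if xml_kb is not None else "n/a (binary format)"
        print(f"{name}: {disk_kb:.1f} KB on disk, {extra}")
```

Run it with the directory holding the presentations as the only argument. Comparing the two figures for a .pptx file that fails to index against one that succeeds may show whether compressed size or content size tracks the failure.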

 

I am asking about file sizes so I can check with the responsible team whether any of the following limits also apply to *.PPTX files.

If the uploaded file is one of the following types, Confluence will only extract up to:

  • 1 MB of text from Excel (.xlsx)
  • 8 MB of text from PDF (.pdf)
  • 10 MB of text from other text files (including .txt, .xml, .html, .rtf, etc.)
  • 16 MB of text from Word (.docx)

 

Looking forward to your reply.

Steve Boyle
Contributor
May 18, 2020

Hi,

Thank you for your reply. We've been able to index small, simple PPTX files, under about 300 KB. When we save a PPTX file in PPT format, the PPT version always indexes, even when the PPTX version would not. We've been able to index PPT files up to around 10 MB.

PPT files: indexed up to at least 10 MB (megabytes)

PPTX files: indexed up to around 300 KB (kilobytes)

If I have a PPTX file that will not index, I can save it as PPT and it will index. If I take that same PPT file and save it back to PPTX, the PPTX will not index. I can't say I've tried converting a file that was originally PPT to PPTX. To summarize: PPTX->PPT works; PPTX->PPT->PPTX does not work.

Thanks,
Steve Boyle

Chandu
Atlassian Team
March 6, 2021

@Steve Boyle 

I'm not sure whether this response comes too late.

Have you had a chance to try setting the system property officeconnector.powerpoint.extractor.maxlength=<size>?

Deployment type: Server
Version: 7.4.0