Hello,
I am working on a project that uses the Jira API to read the content of attached files (PDF, DOCX, XLSX, etc.). I can retrieve the content of plain text files without problems. However, when I fetch other file types such as DOCX, PDF, and XLSX, the response does not contain the main text of the document in any language, including English. Instead, it seems to contain only metadata related to the file format.
I am aware that some Jira plugins manage to load this data correctly. Could you advise me on how to make API calls that return the indexed contents of these file types? Here is an example of the code I am using:
import requests

# MASTERURL, headers, auth and file are defined earlier in the script.
fileContentUrl = "https://" + MASTERURL + ".atlassian.net/rest/api/3/attachment/content/" + file['id']
response_content = requests.request(
    "GET",
    fileContentUrl,
    headers=headers,
    auth=auth
)
response_content.encoding = 'utf-8'
print(response_content.text)
Any insights or suggestions on how to resolve this issue would be greatly appreciated. Thank you!
Hello @f0ffaf
You've misunderstood what the Get attachment content endpoint does. It returns a stream of binary data, in bytes, from the attachment, which you then 'reconstitute' back into a copy of that attachment.
The endpoint cannot 'read' what is inside attached documents such as PDFs and translate that back into what you are calling 'language' (the words, text, etc.). Extracting the text is something your own code has to do after downloading the file.
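As a rough illustration of that last step, here is a minimal sketch in Python that works on the raw bytes (`response_content.content` in requests, not `.text`). It assumes the attachment is a DOCX file, which is a ZIP archive with the body text in `word/document.xml`; that is also why decoding the bytes as UTF-8 shows mostly metadata-like fragments. The helper names `save_attachment` and `docx_bytes_to_text` are made up for this example and are not part of the Jira API. PDF and XLSX files need their own parsers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def save_attachment(data: bytes, path: str) -> None:
    """Write the downloaded bytes to disk, recreating the original file."""
    with open(path, "wb") as fh:
        fh.write(data)


def docx_bytes_to_text(data: bytes) -> str:
    """Return the paragraph text found in the raw bytes of a .docx file.

    Hypothetical helper: only reads the main document body, so headers,
    footers, footnotes and comments are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; join their text nodes.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

With the request from the question, this would be called as `docx_bytes_to_text(response_content.content)`, or `save_attachment(response_content.content, "copy.docx")` to simply keep a copy of the file.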
If you Google "jira cloud rest api get attachment content" you will find all the times this same question has been asked before.