I need a SQL query that searches the contents of all attachments, including file history revisions, for a string.
I am using a SQL Server database. The query should return the filenames of attachments that contain a specific string. I want to search the attachments on all pages and blog posts, across all versions.
Use something like this if you're looking for content attachments.
SELECT 'All Content Attachments', count(*) FROM [dbo].[CONTENT] WHERE CONTENTTYPE = 'ATTACHMENT';
You can't, and there are two reasons.
First, some attachments are not directly readable. Imagine zip or pdf files - you need software to open and read them, SQL would not be able to find plain text in them.
Second, attachments are not held in the database (unless you're on an old unsupported version of Confluence where you've chosen to enable that). So SQL can't find them, as they won't be where it can look.
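To illustrate the first reason, here is a minimal Python sketch (not anything Confluence itself runs). It builds a small zip archive in memory and shows that a plain byte search over the compressed data misses a string that is easy to find once the archive is actually opened. The file name and sample text are made up for the demonstration.

```python
import io
import zipfile


def make_zip(inner_name, text):
    """Build an in-memory zip archive holding one deflate-compressed text file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(inner_name, text)
    return buffer.getvalue()


def raw_contains(data, needle):
    """Naive search: look for the needle in the raw bytes, as a plain text scan would."""
    return needle.encode("utf-8") in data


def zip_contains(data, needle):
    """Format-aware search: open the archive and search each member's decoded text."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return any(
            needle in archive.read(name).decode("utf-8", errors="replace")
            for name in archive.namelist()
        )


if __name__ == "__main__":
    data = make_zip("notes.txt", "status report for project-phoenix\n" * 50)
    print("raw byte search:", raw_contains(data, "project-phoenix"))
    print("after opening the zip:", zip_contains(data, "project-phoenix"))
```

The same applies to PDF, Word and similar formats: each one needs its own reader before a text search means anything.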
Thank you Nic! I appreciate your help. If I want to search the contents of files/attachments in Confluence, can you please suggest the best way to do it?
Do you think getting each attachment file through the REST API, using its attachment ID, and then searching it for a specific string is a good idea?
1. I get the attachment:
content
2. Then I need to use a script to search for a specific string in the attachment.
But I need to do this for all the attachments in all spaces and blogs. Do you think this sounds like a good idea or do you have another solution for this?
That is a good approach, but you will still need to think about how you "open" (i.e. read) the attachments after you have downloaded them.
I should have said before, though: Confluence can index the contents of attachments, as long as they are in a format it understands. So you might want to consider using the built-in search. That can also be done over REST, with CQL.
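As a rough sketch of what a CQL search over REST could look like, the Python below only builds the request URL; it does not send anything. It assumes the content search resource at `/rest/api/content/search` with a `cql` parameter and the CQL fields `type` and `text`, so check those against the REST documentation for your Confluence version. The base URL and the function name are hypothetical.

```python
from urllib.parse import urlencode


def build_attachment_search_url(base_url, term, limit=25):
    """Build a URL for a CQL search restricted to attachments containing `term`.

    Assumptions: the /rest/api/content/search resource and the `type` and
    `text` CQL fields are available on the target Confluence instance.
    """
    if '"' in term:
        # Quote escaping in CQL is not handled in this sketch, so refuse such terms.
        raise ValueError("search term must not contain double quotes")
    cql = f'type = attachment AND text ~ "{term}"'
    query = urlencode({"cql": cql, "limit": limit})
    return f"{base_url.rstrip('/')}/rest/api/content/search?{query}"


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    print(build_attachment_search_url("https://confluence.example.com", "invoice"))
```

You would then send that URL with whatever authenticated HTTP client you already use, and page through the results. Only attachments that Confluence was able to index will match.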
Thank you very much Nic for your support.
I am wondering whether I could get the "extracted_text" of the attachment files through a SQL query.
Please refer to the following link for my reference to extracted_text:
https://confluence.atlassian.com/doc/hierarchical-file-system-attachment-storage-704578486.html
When a text based file is uploaded in Confluence (for example Word, PowerPoint, etc), its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.
The extracted text file will be named with the version number, for example 2.extracted_text, and stored alongside the file versions themselves (within level 8 in the explanation above). We only keep the extracted text for the latest version, not earlier versions of a file.
Regards,
Ilakk
That will only work for files that are converted.
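Since the extracted text lives on the file system rather than in the database, a script is a better fit than SQL here. Below is a minimal Python sketch that walks an attachments directory and reports which `*.extracted_text` files contain a string. It assumes, based on the documentation quoted above, that each extracted text file is named `<version>.extracted_text` and sits in a folder named after the attachment's content ID; the function name and the root path are hypothetical. As noted, it can only find files that Confluence converted, and only their latest version.

```python
from pathlib import Path


def find_in_extracted_text(attachments_root, needle):
    """Return (attachment_id, version) pairs whose extracted text contains `needle`.

    Assumption: files are laid out as <...>/<attachment id>/<version>.extracted_text.
    The search is case-insensitive.
    """
    needle_lower = needle.lower()
    hits = []
    for path in sorted(Path(attachments_root).rglob("*.extracted_text")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle_lower in text.lower():
            # Parent folder name is taken as the attachment ID, file stem as the version.
            hits.append((path.parent.name, path.stem))
    return hits


if __name__ == "__main__":
    # Hypothetical path; point this at the attachments folder of your Confluence home.
    for attachment_id, version in find_in_extracted_text("/path/to/attachments", "budget"):
        print(attachment_id, version)
```

The IDs it returns could then be looked up in the CONTENT table to get the filenames, which is the part SQL is suited for.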