Scraping enterprise Confluence pages

Hi,

At my company, Confluence is deployed on a private enterprise server.

I am trying to scrape page content and save it to cloud storage. When I send a GET request with Python's requests module, passing my username and token as the auth parameters, I get the following error:

ConnectionError: HTTPSConnectionPool(host='confluence.enterpriseserver.xyz', port=443): Max retries exceeded with url: /display/space/CloudDocs (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7efd7765d0f0>: Failed to resolve 'confluence.enterpriseserver.xyz' ([Errno -2] Name or service not known)"))

Note: The URL in the error message has been changed for privacy reasons.

Can anyone suggest a proper workaround for this problem?

Thanks

2 answers

Hi Disha, I hope this finds you well.

Is your Confluence instance located in a secured network? Do you need a VPN to access it? If so, make sure you run your script with the VPN turned on.
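
The error itself is a DNS failure ("Failed to resolve ... Name or service not known"), so the request never reached the server and your credentials have not been checked yet. A quick way to confirm whether the machine running the script can resolve the hostname at all is sketched below. Note that can_resolve is a helper name I made up for illustration, and the hostname in the usage comment is the placeholder from your error message.

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the local resolver can map hostname to an address."""
    try:
        # getaddrinfo performs the same kind of lookup that fails
        # inside requests/urllib3 with "Name or service not known".
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        return False


# Usage (placeholder hostname taken from the question):
#   print(can_resolve('confluence.enterpriseserver.xyz'))
```

If this returns False on the machine where the script runs but the page opens in your browser on another machine, the script is most likely running outside the network (or VPN) that knows about the private hostname.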

You may also try to connect this way:

import urllib3
import getpass
import requests

url = r'https://your_confluence.com/'

user = 'your_user_id'
password = getpass.getpass("password, now: ")

# -------------------------------------------------

headers = {
    'X-Atlassian-Token': 'no-check',
    'Content-Type': 'application/json',
}
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
auth = (user, password)


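# Example request that returns info about the given user.
# verify=False skips TLS certificate verification; use it only if the
# server's certificate cannot be validated from your machine.
user_info = requests.get(url + 'rest/api/user',
                         params={'username': user},
                         headers=headers, auth=auth, verify=False).json()

print(user_info)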
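
Once the connection works, the page content can usually be read through the REST content endpoint rather than by scraping the /display/... HTML. The following is only a rough sketch, assuming your instance exposes rest/api/content with the spaceKey, title and expand=body.storage query parameters. The function name fetch_page_storage and the values in the usage comment are hypothetical.

```python
def fetch_page_storage(session, base_url, space_key, title):
    """Return the storage-format body of one page, or None if nothing matches.

    `session` is any object with a requests-style get() method,
    for example a requests.Session with auth already configured.
    """
    response = session.get(
        base_url.rstrip('/') + '/rest/api/content',
        params={'spaceKey': space_key, 'title': title,
                'expand': 'body.storage'},
    )
    response.raise_for_status()
    results = response.json().get('results', [])
    if not results:
        return None
    # The first match carries the page body in Confluence storage format.
    return results[0]['body']['storage']['value']


# Usage (hypothetical values):
#   import requests
#   session = requests.Session()
#   session.auth = ('your_user_id', 'your_password')
#   body = fetch_page_storage(session, 'https://your_confluence.com',
#                             'space', 'CloudDocs')
```

Passing the session in keeps the authentication choice (password, token, certificate settings) in one place, and the returned string can then be written to whatever cloud storage you use.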