
Scraping Enterprise Confluence pages

Disha Patel January 25, 2024

Hi,

In my company, Confluence is deployed on the enterprise's private server.

I am trying to scrape the page content and save it in cloud storage. When I perform a GET request with Python's requests module, passing my username and token as the auth parameters, I get the following error:

ConnectionError: HTTPSConnectionPool(host='confluence.enterpriseserver.xyz', port=443): Max retries exceeded with url: /display/space/CloudDocs (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7efd7765d0f0>: Failed to resolve 'confluence.enterpriseserver.xyz' ([Errno -2] Name or service not known)"))

Note: the URL in the error message has been changed for privacy reasons.
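For reference, here is a minimal sketch of the GET request itself, made against Confluence's REST API (`/rest/api/content` with `spaceKey`, `title` and `expand=body.storage`) rather than the `/display/...` page URL. It assumes a Confluence Server/Data Center instance with personal access tokens enabled; such a token is normally sent as an `Authorization: Bearer` header rather than as a username/password pair. The host, token, and helper names (`parse_display_url`, `build_content_url`, `fetch_page_html`) are placeholders, and only the standard library is used. It will still fail with the same name-resolution error until the host can be resolved from wherever the script runs.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values (hypothetical): replace with your own instance and token.
BASE_URL = "https://confluence.enterpriseserver.xyz"
TOKEN = "your-personal-access-token"


def parse_display_url(path: str) -> tuple[str, str]:
    """Split a '/display/<SPACEKEY>/<Page+Title>' path or URL into (space_key, title)."""
    parts = urllib.parse.urlparse(path).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "display":
        raise ValueError(f"not a /display/ URL: {path!r}")
    # Titles in /display/ URLs encode spaces as '+'.
    return parts[1], urllib.parse.unquote_plus(parts[2])


def build_content_url(base_url: str, space_key: str, title: str) -> str:
    """Build the REST URL that looks a page up by space key and title."""
    query = urllib.parse.urlencode(
        {"spaceKey": space_key, "title": title, "expand": "body.storage"}
    )
    return f"{base_url.rstrip('/')}/rest/api/content?{query}"


def fetch_page_html(base_url: str, token: str, path: str) -> str:
    """Return the storage-format (XHTML) body of the page at a /display/ path."""
    space_key, title = parse_display_url(path)
    request = urllib.request.Request(
        build_content_url(base_url, space_key, title),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    if not data.get("results"):
        raise LookupError(f"no page titled {title!r} in space {space_key!r}")
    return data["results"][0]["body"]["storage"]["value"]
```

Usage would be `fetch_page_html(BASE_URL, TOKEN, "/display/space/CloudDocs")`, where `space` and `CloudDocs` come from the redacted URL in the error above.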

Can anyone suggest a proper workaround for this problem?

 

Thanks

2 answers

Dmitri Bukovski March 12, 2024

Hi Disha, I hope this finds you well.

Is your Confluence instance located in a secured network? Do you need to use VPN to access it? If so, make sure that you run your script with VPN turned on.

You may also try to connect this way:

import getpass

import requests
import urllib3

url = 'https://your_confluence.com/'

user = 'your_user_id'
password = getpass.getpass("Password: ")

# -------------------------------------------------

headers = {
    'X-Atlassian-Token': 'no-check',
    'Content-Type': 'application/json',
}
# verify=False below skips TLS certificate checks; this silences the resulting warning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
auth = (user, password)


### Example request that returns info about the given user.

response = requests.get(url + 'rest/api/user', params={'username': user},
                        headers=headers, auth=auth, verify=False)
response.raise_for_status()  # fail loudly on 401/403/404 instead of parsing an error page
user_info = response.json()

print(user_info)
Benjamin
Community Champion
January 25, 2024

@Disha Patel - Based on the error message, the script can't resolve the domain name. Is the machine where you are running this script connected to a DNS server that can resolve that domain name? This doesn't look like a script issue; it looks more like a network issue.
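One way to confirm that this is a name-resolution problem rather than a script problem is to ask the resolver directly from the same environment, before any HTTP code runs. A minimal sketch using only the standard library; `resolve_host` is a hypothetical helper name:

```python
import socket


def resolve_host(host: str, port: int = 443) -> list[str]:
    """Return the IP addresses the local resolver finds for host (empty list if none)."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Errno -2 ("Name or service not known") is the same failure urllib3 reports.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If `resolve_host("confluence.enterpriseserver.xyz")` returns an empty list from the notebook, the notebook's resolver has no record for the internal host name, whichever HTTP library is used.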

Disha Patel January 26, 2024

I am running this script in a GCP Jupyter notebook.

Benjamin
Community Champion
January 26, 2024

Thanks. Do you know if your Jupyter notebook is able to reach a DNS server that can resolve the domain? Also, is it set up with a DNS server IP?
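Assuming the notebook kernel runs on Linux (typical for managed Jupyter images, but an assumption here), the DNS servers the resolver uses are normally listed as `nameserver` lines in `/etc/resolv.conf`. A small sketch to list them; `nameservers` and `system_nameservers` are hypothetical helpers:

```python
from pathlib import Path


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the IPs from 'nameserver <ip>' lines, in the order the resolver tries them."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only lines whose first field is exactly 'nameserver' count; comments and
        # other directives (search, options, ...) are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def system_nameservers(path: str = "/etc/resolv.conf") -> list[str]:
    """Read the resolver config file; returns [] if it does not exist."""
    conf = Path(path)
    return nameservers(conf.read_text()) if conf.exists() else []
```

If `system_nameservers()` only lists `169.254.169.254` (the usual default on Google Compute Engine VMs), a private corporate host name will generally not resolve unless the GCP network has been configured to forward that domain to the corporate DNS.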

Disha Patel January 27, 2024

I am not sure about that, but I can say that this Jupyter notebook environment is very secure and might have firewall restrictions or restricted egress traffic.
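If the name does resolve but egress is filtered, the symptom usually changes from a NameResolutionError to a connection timeout or refusal. A sketch that tests raw TCP reachability to port 443 with a short timeout, independent of HTTP and authentication; `can_connect` is a hypothetical helper:

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (gaierror), timeouts, refused and unreachable errors.
        return False
```

Used together with a plain DNS lookup, this separates "cannot resolve the name" from "resolves, but the firewall blocks port 443".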

Given the above-mentioned restrictions, is there any other way to get Confluence data and store it in GCS (Google Cloud Storage)?
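One possible workaround, assuming the restriction cannot be lifted, is to split the job: run the scraper on a machine that can already reach Confluence (for example a host inside the corporate network or on the VPN) and have that machine write the results to GCS, so the notebook only needs access to the bucket. A minimal sketch of the upload half, assuming the `google-cloud-storage` client library is installed on that machine with credentials configured; the bucket name, object naming scheme, and helper names are hypothetical:

```python
import json
from datetime import datetime, timezone


def object_name(space_key: str, title: str, when: datetime) -> str:
    """Build a hypothetical object path like 'confluence/SPACE/Title/20240127T123000Z.json'."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    safe_title = title.replace("/", "_")  # '/' would create extra path segments
    return f"confluence/{space_key}/{safe_title}/{stamp}.json"


def page_record(space_key: str, title: str, html: str) -> str:
    """Serialize one scraped page as JSON text."""
    return json.dumps({"space": space_key, "title": title, "body_storage": html})


def upload_page(bucket_name: str, space_key: str, title: str, html: str) -> str:
    """Upload one page to gs://<bucket_name>/... and return the object name."""
    # Imported here so the helpers above work without the GCS client installed.
    from google.cloud import storage

    name = object_name(space_key, title, datetime.now(timezone.utc))
    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_string(
        page_record(space_key, title, html), content_type="application/json"
    )
    return name
```

Timestamped object names keep earlier snapshots of a page instead of overwriting them; a fixed name per page would be the simpler choice if only the latest version matters.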
