Sporadic WebGUI failure

Recently I have come across a sporadic issue where the Jira Server web GUI stops loading.

This system is running NGINX.

When the system ends up in this state, I have checked the services and confirmed that they are all still running.

I am unable to locate the cause of the issue. Could someone please point me in the right direction as to what is causing it?

A simple reboot brings it all back online.

1 answer

0 votes

Hi Kale,

There are many, many possibilities. Let's try to gather more information to see if we can narrow them down!

When this problem comes up, what are the specific symptoms? Example: do you get an error in the browser, or does the browser window simply appear to be loading (and only give you a white page) for 60+ seconds without actually stopping its attempt to load?
Do you have other sites configured (such as a test page or error page) in nginx that you can use to confirm that nginx has not hung?
Do any errors appear in Jira's log file when this occurs?
Do you have nginx logging enabled, and do the nginx logs show that nginx responded to access requests during the timeframe this happens?
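
To make the second and fourth questions concrete, here is a minimal Python sketch (not an official Atlassian or nginx tool) that requests the site once through nginx and once from Jira directly, then reports which layer stopped answering. The function names are hypothetical, and the URLs in the commented usage are assumptions: adjust them to your nginx address and to the port Jira's own connector listens on.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return 'ok', 'timeout', or 'error' for a single GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as exc:
        # The server answered; only 5xx counts as a failure here.
        return "ok" if exc.code < 500 else "error"
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.
        return "timeout" if isinstance(exc.reason, socket.timeout) else "error"
    except socket.timeout:
        # Read-phase timeouts are raised directly.
        return "timeout"
    except OSError:
        return "error"


def diagnose(nginx_result, jira_result):
    """Turn the two probe results into a rough hint about where to look."""
    if jira_result == "timeout":
        return "Jira accepted the connection but did not answer; check the GC logs"
    if jira_result == "error":
        return "Jira is down or refusing connections; check Jira's logs"
    if nginx_result != "ok":
        return "Jira answers directly but not through nginx; check the nginx logs"
    return "both layers answered; look at the browser or network instead"


# Example usage (both URLs are assumptions, not values from this thread):
# print(diagnose(probe("http://localhost/"), probe("http://localhost:8080/status")))
```

Running this from the server itself while the problem is happening separates "nginx has hung" from "Jira has hung" without needing a second site configured in nginx.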

Some general thoughts:

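nginx has a directive you can set in the server block for your reverse proxy so that it stops waiting on Jira after a set time and returns an nginx error instead. The default is already 60 seconds, so setting it explicitly mainly makes the behavior easy to confirm. You can try adding this in your server block to rule out problems with nginx: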
```
proxy_read_timeout 60s;
```

Particular Jira plugins could cause severe performance issues that might show symptoms like this. Have you recently upgraded Jira or any plugins that might coincide with the time you started noticing this behavior?

I think possibly the most likely scenario (without additional information) is that your Java heap for Jira may be undersized, it's using the CMS garbage collector, and what you're seeing is the JVM trying (unsuccessfully) to complete a full garbage collection. You could confirm this by looking at the GC logs which should be in <jira's install directory>/logs. A tool like GCViewer would be helpful to analyze your GC logs if you're not familiar with Java's Garbage Collection mechanisms. I would be interested in knowing the following related to this theory:

What are your current maximum and minimum heap values? (If you're not sure what these are, I'd also settle for the "Total Memory" value on Jira's System Information page.)
Have you configured Jira to use G1GC? (You can search the System Information page for "G1GC"; if your browser can't find it on this page, the answer is "no".)
Does GCViewer show your heap usage creeping up and stopping, or can you see a Full GC event in the logs around the time things seem to stop working?
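
If GCViewer is more than you need at first, a rough Python sketch like the one below can list the full collections in a GC log together with their pause times. It is only an approximation under stated assumptions: the two regular expressions target the common Java 8 (`Full GC ... real=N secs`) and Java 9+ unified logging (`Pause Full ... Nms`) line shapes, and your log format may differ depending on the JVM version and logging flags. The function names and the 5 second threshold are hypothetical.

```python
import re

# Assumed Java 8 style line: "... [Full GC (...) ... real=4.12 secs]"
JAVA8_FULL_GC = re.compile(r"Full GC.*real=(\d+(?:\.\d+)?) secs")
# Assumed Java 9+ unified logging line: "... Pause Full (...) ... 4123.456ms"
UNIFIED_FULL_GC = re.compile(r"Pause Full.*?(\d+(?:\.\d+)?)ms\s*$")


def full_gc_pauses(lines):
    """Yield (line_number, pause_seconds) for each full GC line found."""
    for number, line in enumerate(lines, start=1):
        match = JAVA8_FULL_GC.search(line)
        if match:
            yield number, float(match.group(1))
            continue
        match = UNIFIED_FULL_GC.search(line)
        if match:
            # Unified logging reports milliseconds; convert to seconds.
            yield number, float(match.group(1)) / 1000.0


def long_pauses(lines, threshold_seconds=5.0):
    """Return only the full GC pauses at or above the threshold."""
    return [(n, s) for n, s in full_gc_pauses(lines) if s >= threshold_seconds]


# Example usage (the path is a placeholder for your own GC log file):
# with open("/path/to/gc.log") as handle:
#     for line_number, seconds in long_pauses(handle):
#         print(line_number, seconds)
```

Full collections that pause for many seconds, clustered around the time the page stopped responding, would support the undersized heap theory.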

Finally, I would just mention that restarting the services individually (Jira and nginx) rather than doing a full system reboot is going to help you narrow down where the cause of this is. If possible, eliminate variables by only changing one thing at once.

Cheers,
Daniel


Hi @Daniel Eads

Thanks for this detailed response.

It has been some time since I posted this. As per my message, the issue is sporadic and, touch wood, it has not recurred yet. I will still dive in, investigate your comments above, and see what I find.


To answer your questions:

When the issue appears, the browser finishes loading the "page" and does not try to load anything else; it behaves as if the site has "loaded". There are no error messages or error codes, just an "un-loaded" page.
No, I do not; this is a good point, and I could look at this. However, when I load up tools like htop, I can see the process is still running with no abnormal behaviour. Additionally, I can run systemctl and see that the process is active and running.
I have trawled the error log files I know of and found nothing. Could you suggest where I should look in case I have missed something?
Good question on this one, I am not sure. I will investigate this one.

To answer your questions below about memory and GC, I will investigate this shortly.


Hi @Daniel Eads

So to touch on the memory in the box. Please see attached.

Screen Shot 2020-11-06 at 3.53.32 pm.png


I am currently working out how to use the GCViewer app.


Thanks for the extra info! Given all the above (particularly the default heap allocation of 1820 MB), I think the most likely scenario is still that the slowdown is caused by a full garbage collection. Since you mentioned it's been some time since this slowdown happened, it's possible your GC logs have rolled over and wouldn't show the last memory exhaustion.

As a preventative measure, you might consider increasing your Xmx value by 512 MB if there's enough free RAM on your server to do so while still leaving the base operating system enough RAM to carry out its operations safely. For example, if Jira and nginx are the only applications running on the server, you would probably want to ensure it had at least 6 GB of RAM in total, and then increase the Xmx value to 2332 MB.

We've got guidance on increasing your heap size in this article. Setting the Xmx (maximum heap size) and Xms (minimum heap size) to the same value can also help reduce the time it takes for a GC to complete.
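
As a worked version of that arithmetic, here is a small Python sketch. The function names are hypothetical, the 6 GB figure is the rough minimum suggested above for a server running only Jira and nginx, and `-Xms`/`-Xmx` are the standard JVM heap options; where you set them depends on how Jira is installed.

```python
def new_heap_mb(current_mb, increase_mb=512):
    """Current heap size plus the suggested preventative increase."""
    return current_mb + increase_mb


def heap_flags(heap_mb):
    """JVM options that set minimum and maximum heap to the same value."""
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


def enough_ram(total_ram_mb, minimum_total_mb=6 * 1024):
    """Check total server RAM against the assumed 6 GB minimum."""
    return total_ram_mb >= minimum_total_mb


target = new_heap_mb(1820)   # 1820 + 512 = 2332
print(heap_flags(target))    # -Xms2332m -Xmx2332m
```

Only apply the increase if `enough_ram` holds for your server; otherwise the operating system may be left short of memory.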

Since you've not noticed this issue in some months, I'm thinking it's likely that just a preventative boost of the heap size may stave off issues for you going forward. By all means, keep learning about the GC allocation so this is less likely to be a mystery - but from what you've seen so far, I think tuning the Xmx/Xms slightly might be a quick way to ensure stability in the coming months.

Cheers,
Daniel
