Large Server profile, GC errors, out of memory

Dave Varon October 11, 2019

Hi,

I'm getting out-of-memory (OOM) errors on a system with the following specs:

  • JIRA Server
  • Jira Core 7.7.1
  • ~10 users
  • ~20 projects
  • ~1.5 million issues
  • AWS m4.2xlarge (8 vCPUs, 32 GB RAM)
  • PostgreSQL colocated on the same host
  • Xmx=16384m (16 GB)

We have a large import job (thousands of new issues) followed by REST API-driven updates with built-in 15-second delays. At first we hit OOMs after about 75 minutes, so we raised Xmx from 2 GB to 3, then 4, then 8, and now 16 GB. Now it takes 3-4 hours before CPU utilization jumps to ~80%, GC errors multiply, and we eventually hit OOMs again.

I've looked at the sizing recommendations, but all are based on scaling multiple dimensions simultaneously: issues, users, projects, etc.  We only have issues at scale.

I'm digging deeper into plugins, indexing, and search performance as suggested by other posts, but I'm hoping there is an obvious thing to try, like "Oh yeah, issue count is the dominant scaling factor because of xyz, so double your instance size to 16 CPUs/64 GB". Or "tune your GC according to the recommendations in this document", or "it's definitely a search problem, try to optimize that".

Please note that we have been steadily scaling with this process for over two years and have only experienced this problem two or three times. This week, however, it has been happening consistently.

 

Thanks!

1 answer

Answer accepted
Andy Heinzer
Atlassian Team
October 15, 2019

Hi Dave,

Sorry to hear that you are having performance issues with Jira Server. Thanks for detailing the specifics of your environment.

First, I should let you know that the specific version of Jira you are using right now (7.7.1) is known to have a number of performance problems on very large instances like yours. There are a few bugs we have fixed in later versions, and you would noticeably benefit from those fixes by upgrading Jira itself, such as:

That is not even a comprehensive list of the improvements made, but each of those issues documents a bug that I believe you could be seeing here and that is fixed in later versions of Jira. In addition to those performance improvements, a number of high and critical security vulnerabilities have recently been disclosed for Jira Server. More details are at https://confluence.atlassian.com/jira/security-advisories-112853939.html.

Ultimately, I believe that upgrading Jira is the best avenue to explore for addressing these performance problems. I would still like to offer some other guidance, as I understand that upgrading Jira is not a trivial task, especially for large environments.

There are a number of steps you can take without upgrading to try to improve performance and/or help identify the performance bottlenecks.

  1. Offload PostgreSQL to another server. While you could technically keep both on the same server, separating them makes performance issues like this much easier to troubleshoot. With the Jira Xmx value raised to 16 GB, which is half the total system memory, we're really not leaving sufficient resources for PostgreSQL and the base operating system to operate as expected. The Jira sizing guides we have published assume a separate, dedicated database server.
  2. With heaps over 6 GB in size, we almost always recommend the G1 garbage collector (G1GC). You can enable it with the startup parameter -XX:+UseG1GC
  3. The Garbage Collection (GC) Tuning Guide can be helpful for identifying problem areas as well. However, I suspect you will most likely run into known bugs in your current Jira version that Atlassian has already fixed in later versions.
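
As a rough illustration of points 1 and 2, here is a minimal Python sketch that applies the rule of thumb above (heaps over 6 GB get G1GC) and shows how much memory a given Xmx leaves for a colocated database and the operating system. The function name `jvm_memory_plan` and its parameters are hypothetical helpers for this example, not part of Jira or the JVM; only the `-Xmx` and `-XX:+UseG1GC` flags are real JVM options.

```python
def jvm_memory_plan(total_ram_mb, heap_mb, g1_threshold_mb=6 * 1024):
    """Suggest JVM flags and report what is left for the database and OS.

    Hypothetical helper: the 6 GB threshold is the rule of thumb from the
    answer above, not a hard limit enforced by the JVM.
    """
    if heap_mb <= 0 or heap_mb > total_ram_mb:
        raise ValueError("heap must be positive and fit in total RAM")

    flags = [f"-Xmx{heap_mb}m"]
    if heap_mb > g1_threshold_mb:
        # Large heap: switch to the G1 collector, as recommended above.
        flags.append("-XX:+UseG1GC")

    return {
        "jvm_flags": " ".join(flags),
        # Upper bound only: the JVM also uses non-heap memory
        # (metaspace, thread stacks, code cache) on top of Xmx.
        "left_for_db_and_os_mb": total_ram_mb - heap_mb,
        "heap_share": heap_mb / total_ram_mb,
    }


# The setup from the question: 32 GB host with a 16 GB heap.
plan = jvm_memory_plan(total_ram_mb=32 * 1024, heap_mb=16384)
print(plan["jvm_flags"])              # -Xmx16384m -XX:+UseG1GC
print(plan["left_for_db_and_os_mb"])  # 16384
```

With the numbers from the question, at most half of the host's memory remains for PostgreSQL, the OS, and the JVM's own non-heap overhead, which is why moving the database to its own server is the first suggestion.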

I suspect you are seeing a number of performance problems specific to indexing in Jira. Indexing was a very common performance bottleneck in that version of Jira, and Jira 8 overhauled it immensely. It's also possible that other plugins you are using with Jira put additional overhead on the system that is not really noticeable until you scale to a size like yours. I would start with these steps. Please let me know if you have any questions or concerns.

Cheers,

Andy

Dave Varon November 15, 2019

Hi Andy, thanks for all these recommendations. I'll update the ticket here with status as we proceed through them. I'm expecting to resume stress testing in earnest in the next week or two.
