We are facing serious performance degradation with our JIRA Software 7.6.17. Every few days (recently every day) our JIRA application starts to generate a huge number of input/output operations (mostly reads), which grows by the minute and degrades performance severely enough that we have to restart the system service. After a restart JIRA works with satisfying performance (more or less) - until the next time.
We want to find out which JIRA component or operation causes this amount of I/O. Any idea how to do this? We use incoming mail extensively (44 accounts, some of them quite busy, i.e. dozens of messages with attachments per minute).
Unfortunately, in our setup ${catalina.home}, ${jira.home} and ${jira.home}/caches all reside on a single filesystem, so we cannot tell from the virtual disk stats where all this I/O goes. We are considering splitting the directories, but that takes time and requires a maintenance window. Maybe there is another, simpler solution, like increasing the log level of certain JIRA components. Any help will be appreciated.
JIRA Software 7.6.17 and ServiceDesk 3.9.17 installed in a single instance.
OS: CentOS 7 x86_64, Java: Oracle JDK 1.8.0_102-b14
Virtualization: ESXi 6.7, disk storage on a medium-class array with 10k HDDs, RAID5
UPDATE: it turns out that in the latest occurrences of the issue we see far more reads than writes. This pretty much rules out logging as a root cause.
In my experience, this happens in three broad cases. Obviously that's limited to what I've run into, so it could well be something else I have not encountered.
1. Swap
When a Java process uses up too much RAM, the OS starts making heavier use of "swap" space - using disk for things it can't keep in RAM.
If your people are triggering processes that outstrip the available memory, you will see disk I/O increase massively while the Java process needs the extra space. The I/O should, however, drop back to normal once the process has finished.
This will also happen if you have something with a "memory leak". With a leak, though, the disk access will not go back down. A memory leak means something is allocated memory to work with but never releases it for other things to use. As more and more memory gets blocked out like that, the OS will increasingly use swap space, so you'll see I/O climb until the system can't cope with it.
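If swap is a suspect, it's cheap to check before digging into Jira itself. A minimal sketch with standard CentOS 7 tooling (sar and pidstat assume the sysstat package is installed):

free -m          # is swap being used at all?
vmstat 5 5       # si/so columns = pages swapped in/out per second
sar -B 5 5       # paging stats and major faults

If si/so stay at or near zero while the disk I/O is raging, swap is not your problem.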
2. People
There are some functions in Jira that absolutely hammer your disks, notably the XML backup, indexing, and exports. Could your people be triggering actions that chew up the disk I/O?
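If HTTP access logging is enabled, the access log is a cheap place to check whether people (or scripts) were hitting expensive endpoints when the I/O spiked. A rough sketch - the log location and the exact endpoints worth grepping for are assumptions, adjust them to your install:

cd /opt/atlassian/jira/logs    # hypothetical install path
grep -hoE '/rest/api/[0-9]+/search[^ "]*|/secure/admin/XmlBackup[^ "]*|/sr/jira.issueviews[^ "]*' access_log.* \
  | sort | uniq -c | sort -rn | head -20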
3. Incorrect infrastructure
Someone has put the index on a shared filesystem. Don't do that. It practically guarantees problems like the one you are seeing and can eventually corrupt your data.
I would look for those first, but even if none of them turns out to be the culprit, my answer would not be to try to improve the disk performance; it would be to look at why Jira is doing this. That means looking at what files are being created/updated/destroyed during a period of high I/O.
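On the OS side, a rough sketch of that approach (tool availability varies; pidstat needs sysstat, iotop and fatrace may need to be installed separately, and <jira_pid> is a placeholder for the pid of the JIRA java process):

pidstat -d 5                    # per-process kB read/written per second
iotop -o -b -k -d 5             # only processes actually doing I/O
lsof -p <jira_pid> | less       # which files the JIRA JVM currently has open
fatrace | grep java             # live file access events, if fatrace is packaged for your distro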
Thank you.
Swapping is not the issue; we have already checked that.
XML (application-level) backups and exports are not used at all. Indexing is used, and it is our primary suspect, so we want to move it to another filesystem to see how the I/O splits.
At the moment the whole ${jira.home} (including caches/indexes) sits on a single filesystem. It is a dedicated, per-VM, VMFS/VMDK-based virtual disk. No sharing involved (apart from the shared disk array).
Have you at least identified which files (if "storage" means the filesystem in this context) are being written or read extensively?
Yes, "storage" here means the JIRA application's filesystem storage, not the database.
This is the essence of my question - how do we identify the directories/files that receive this many read/write operations? If we find the files, we are close to the root cause.
Because of the scale (millions of files in the ${jira.home} directory), any attempt like find /jira/home/ -type f -mmin -1 -print takes hours. I can only think of tracing it through application logs, but I don't want to raise everything to DEBUG level, because that could make things even worse, i.e. hurt performance further due to the increased log write rate.
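One cheaper angle, before touching log levels or walking the directory tree: the kernel already keeps per-process and per-thread I/O counters under /proc. A sketch (run as root or as the JIRA service user; the pid value is a placeholder):

JIRA_PID=12345                          # pid of the JIRA java process
cat /proc/$JIRA_PID/io                  # cumulative read_bytes / write_bytes for the whole JVM
# top 10 threads by bytes read - sample twice, a minute apart, and compare:
for t in /proc/$JIRA_PID/task/*; do
  echo "$(awk '/^read_bytes/ {print $2}' "$t/io") $(basename "$t")"
done | sort -rn | head

The busiest thread IDs can then be matched against the nid= values in a thread dump, as suggested in the next comment; the thread names often point straight at the component (indexer, mail handler, REST worker, and so on).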
How about taking a thread dump at a moment when the I/O is at its worst?
Maybe looking at the active threads could reveal what's going on? (And it could be cheaper, or at least only "expensive for a short period of time", compared to increasing the log level or watching the filesystem at the OS level...)
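A minimal sketch of that (run as the OS user that owns the JIRA java process; the pid and thread id below are placeholders). Note that top -H ranks threads by CPU, so for an I/O-bound problem it is best combined with the per-thread counters from /proc mentioned above:

JIRA_PID=12345
top -H -b -n 1 -p "$JIRA_PID" | head -40      # note the busiest native thread IDs (TIDs)
jstack "$JIRA_PID" > /tmp/jira-threads.txt    # or: kill -3 "$JIRA_PID" (dump goes to catalina.out)
printf '0x%x\n' 4242                          # convert a busy TID to hex ...
grep -i 'nid=0x1092' /tmp/jira-threads.txt    # ... and look it up as nid=0x... in the dump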
OK, I could use Atlassian's support-data.sh tool and proceed with the dump. The script runs with errors, but the thread dump part seems to be working.
Not sure if I can analyze the dump - I'm definitely not a Java expert. But I'll try.
Today we're mounting some throughput-sensitive directories (${jira.home}/caches/indexes, most importantly) on a separate volume to measure the I/O rate. Hopefully this sheds some light on the mystery.
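For the record, a rough outline of that kind of split, assuming a new virtual disk has already been presented to the VM (device name, service name and ownership below are assumptions; do it in a maintenance window with JIRA stopped):

systemctl stop jira
mkfs.xfs /dev/sdb1                            # the new virtual disk
mv /datastore01/appdata/jira/caches/indexes /datastore01/appdata/jira/caches/indexes.old
mkdir /datastore01/appdata/jira/caches/indexes
mount /dev/sdb1 /datastore01/appdata/jira/caches/indexes   # plus an /etc/fstab entry
cp -a /datastore01/appdata/jira/caches/indexes.old/. /datastore01/appdata/jira/caches/indexes/
chown -R jira:jira /datastore01/appdata/jira/caches/indexes
systemctl start jira
iostat -xm 5                                  # afterwards: per-device r/s, w/s, MB/s make the attribution obvious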
We just recovered from another situation like that. Unfortunately, taking a thread dump was not possible, again because of errors during the operation (don't know what the hell that PID 626 is - no process with this ID is running, maybe the script's shell itself):
/tmp/support-data.sh
JIRA PID detected 39637
JIRA_HOME detected in /datastore01/appdata/jira
JAVA_BIN detected java
Do you want to test disk access speed? (y/n) n
Do you want to capture thread dumps? (y/n) y
Collecting information about the running JIRA instance. It will take approximately 1 minute.
39637: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 626: No such process
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 626: No such process
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
at sun.tools.jstack.JStack.main(JStack.java:106)
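For what it's worth, this kind of attach failure often comes down to jstack running as a different OS user than the one owning the JVM, or to a jstack binary from a different JDK than the one the process runs on. Two workarounds that usually still produce a dump (the service user name and JDK path below are assumptions):

sudo -u jira /usr/java/jdk1.8.0_102/bin/jstack 39637 > /tmp/jira-threads.txt
kill -3 39637        # asks the JVM itself to dump its threads; output lands in catalina.out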
The case is most likely solved. Here's what we found.
After we disabled the offending application (a non-critical integration that queries JIRA over its REST API), the read/write rate on the index filesystem returned to acceptable values. We've reported the problem to that application's support and are waiting for a fix. Our monitoring now clearly shows normal I/O levels, so we don't expect issues like this in the near future.
As our JIRA instance runs behind a reverse proxy (Apache httpd based), we're also implementing a guard to filter out all ^/rest/api/[0-9]+/search(/.+)?$ requests without a fields= query string argument, to enforce proper integration in the future.
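A hedged sketch of such a guard using mod_rewrite - the file name is just a placeholder and, depending on how the vhosts are laid out, the directives may need to live inside the VirtualHost block that does the ProxyPass to JIRA:

cat >> /etc/httpd/conf.d/jira-proxy.conf <<'EOF'
RewriteEngine On
# reject REST search calls that do not restrict the returned fields
RewriteCond %{QUERY_STRING} !(^|&)fields= [NC]
RewriteRule ^/rest/api/[0-9]+/search(/.+)?$ - [F]
EOF
apachectl configtest && systemctl reload httpd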