After upgrading from JIRA 4.4.4 to 5.2.4, we noticed that the number of JIRA's open file descriptors started reaching higher numbers than we'd seen before, eventually exceeding the ulimit, which caused various problems. We've since adjusted the ulimit and are OK, but we noticed last week that the number of open descriptors climbed to about 3500, and then plateaued there for several days. Many/most of these point to index cache files that had been deleted.
We don't know if this is normal or a symptom of a problem. Why 3500? Why deleted files? Can we safely assume that JIRA or Java will manage these descriptors in a way that won't exceed our ulimit settings (currently 8192)? Are there administrative practices we need to perform to avoid a problem with this?
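For anyone wanting to reproduce the counts, here's a rough sketch of one way to check (not our exact script; it assumes a Linux /proc filesystem, and the PID is a placeholder for the JIRA JVM's real PID):

```python
import os
from pathlib import Path

# Placeholder PID: substitute the JIRA (Tomcat/JVM) process ID on your host.
JIRA_PID = 12345

# Each entry in /proc/<pid>/fd is a symlink to the open file; deleted files
# show up with a "(deleted)" suffix on the link target.
fd_dir = Path(f"/proc/{JIRA_PID}/fd")
fds = list(fd_dir.iterdir())
deleted = []
for fd in fds:
    try:
        target = os.readlink(fd)
    except OSError:
        continue  # descriptor was closed between the listing and the readlink
    if target.endswith("(deleted)"):
        deleted.append(target)

print(f"open descriptors: {len(fds)}")
print(f"pointing at deleted files: {len(deleted)}")

# The per-process nofile limit these counts should be compared against.
with open(f"/proc/{JIRA_PID}/limits") as limits:
    for line in limits:
        if line.startswith("Max open files"):
            print(line.rstrip())
```

(Reading another user's /proc entries needs the same UID or root.)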
Thanks.
Tom
We don't have all the details resolved, and we certainly never again saw this "flat top" in the charts of open file descriptors, but I'm not sure that matters. The long-standing problem we experienced with unbounded growth of open file descriptors was the more serious one.
The file descriptor leak stopped when we moved the cache directories to a local/native file system rather than an NFS-mounted one. We're still digging to see exactly why we can't run the way we did before the upgrade, and it may turn out to be some interaction between specific versions of the OS, NFS, Java, and of course JIRA.
For performance reasons, we may never go back to the NFS hosted cache directories, but moving away from them certainly solved our problem.
Out of interest, when you say "NFS hosted cache directories" - do you mean you had the Jira "cache" directories on an NFS drive?
Yes. The entire install (product and data directories) has been on NFS file systems since we first started using JIRA, because we run JIRA on multiple hosts but keep the resources for all the instances on the same file system, which included the cache directories. Now the <JIRADATA>/caches directory is a symlink to a directory in /var, and the leak has stopped.
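Roughly, the relocation looks like this (a sketch with illustrative paths only, not our exact layout; JIRA should be stopped while the directory is moved):

```python
import shutil
from pathlib import Path

# Illustrative paths: adjust to your own JIRA home and local disk.
jira_data = Path("/nfs/jira/data")        # <JIRADATA> on the NFS mount
local_caches = Path("/var/jira-caches")   # target directory on a local file system

caches = jira_data / "caches"

# With JIRA stopped, move the caches directory onto local disk...
local_caches.parent.mkdir(parents=True, exist_ok=True)
shutil.move(str(caches), str(local_caches))

# ...and leave a symlink behind so JIRA still finds <JIRADATA>/caches.
caches.symlink_to(local_caches, target_is_directory=True)
```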
No help to you, but I have seen similar problems hosting Nexus's Lucene index on an NFS-mounted dir. Long and short of it is, keep Lucene well away from NFS.
Thanks for the tip.
We're not used to being concerned about this sort of issue. We have been using NFS to host home directories and many, many critical resources so we can share all of these across a large and diverse cluster, and have been doing so literally for decades in an intensively used development and build environment. There have been a few NFS problems over the years, yet we have been reliably building millions of lines of code nearly continuously in this environment, with an extremely low problem rate (maybe one every couple of years). So we've come to take all this for granted, and the problems we've experienced with JIRA's caches are the first like this. That said, JIRA is the first intensively used Java web app we've supported in this cluster, so we probably still have some things to learn.
Ah, I see.
I've tried putting the index on several file systems. Anything other than raw, local, direct access has been an unmitigated disaster. NFS, SAN, Samba - they all seem to put such a drag on the Lucene index that Jira crawls to a halt and dies. I never got to the bottom of it, but a simple test of moving the index to "the fastest, nastiest local hard drive you can get your paws on" made all the problems vanish.
On the other hand, all the rest of the data - absolutely fine. Plugins, home directory contents, attachments - not a problem.
Nic - Thanks for sharing. If there were recommendations along these lines in the setup documentation, we missed them, but it does make sense to take steps to keep these sorts of accesses as fast as possible. We don't see obvious changes in the performance or other user-visible behavior since we made the change, but for sure we can now reduce the nofiles ulimit to something more reasonable, and we can stop doing maintenance restarts so often.
Thanks for sharing your experience - I've seen it hinted at several times in comments, but never really had anyone follow up to confirm that their particular storage was definitely the problem.
I was starting to think it might be just that if I was around, NFS or SAN disks would start to shred themselves for no reason other than to make my life difficult. ;-)
Quick Update - I guess we just inspired this ...
From the linked javadoc, this looks like exactly your issue:
This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
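To illustrate what "delete on last close" means in practice, here is a small standalone sketch (nothing JIRA-specific): on a local POSIX file system an unlinked file stays readable through any descriptor that is still open, and the space is only reclaimed when the last descriptor closes. NFS has to emulate this on the client side (the .nfsXXXX "silly rename" trick), which would seem consistent with the deleted-but-still-open index files described above.

```python
import os
import tempfile

# Create a scratch file, open a reader on it, then delete it while still open.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".segment", delete=False)
tmp.write("pretend this is a Lucene segment file")
tmp.close()

reader = open(tmp.name)
os.unlink(tmp.name)              # "delete" the file while a descriptor is open

# On a local POSIX file system the data is still readable: the file is only
# really removed when the last descriptor closes ("delete on last close").
print(reader.read())             # still prints the contents
print(os.path.exists(tmp.name))  # False: the directory entry is already gone
reader.close()                   # now the space is actually reclaimed
```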
Indeed. Apparently we just got lucky with JIRA 4.4, but now, not so much.
Hi Nic,
Can you confirm your experiences with a tmpfs file system on Linux? We are using JIRA 5.2.11.
We have put our indexes into a RAM disk, but today we saw JIRA crashing with more than 7000 files open. It looks like the number of open files has been gradually increasing over the course of about two weeks.
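In case the trend data is useful, here is a sketch of the sort of sampling that would chart the growth (the PID, interval, and log path are placeholders; Linux /proc assumed):

```python
import os
import time
from datetime import datetime

JIRA_PID = 12345        # placeholder: the JIRA JVM's PID on your host
INTERVAL_SECONDS = 300  # sample every five minutes

# Append a timestamped count of open descriptors so the trend can be charted.
while True:
    count = len(os.listdir(f"/proc/{JIRA_PID}/fd"))
    with open("/var/tmp/jira-fd-count.log", "a") as log:
        log.write(f"{datetime.now().isoformat()} {count}\n")
    time.sleep(INTERVAL_SECONDS)
```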
Regards,
Dieter
The JIRA supported-platforms docs do still say "don't use NFS for the JIRA home dir" because Lucene indexes don't work well with it. I'm seeing problems with a RAM disk too, I think, but it may have nothing at all to do with SSD and more to do with whatever is causing an AlreadyClosedException. Every time that happens I see a new file descriptor accessing the same Lucene index file (issues).
Yes - Our local team has an issue actively being worked (JSP-149435), and presumably that will yield something in the near term.
I am guessing that this is being worked on by Atlassian support. Let us wait for the results.
Do you have some actual numbers? It will be interesting to check whether this is an actual bug.