We are trying to set up a Disaster Recovery solution for our Atlassian applications (so far, Crowd, JIRA, Confluence and Crucible). The production environment consists of the following servers:
Our approach is to replicate every server to a cold server in a different geographical location, use rsync to keep the data folders up to date, and keep a secondary database server current through database replication.
The first issue is filtering which files rsync replicates, so that we do not overwrite cold-server settings such as the database configuration (which should point to the failover DB server).
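The file filtering described above could be sketched as a small Python helper that builds the rsync command with exclusions. The excluded file names (`dbconfig.xml` for JIRA, `confluence.cfg.xml` for Confluence), the paths and the host name are assumptions for illustration, not a verified list for every app or version.

```python
import shlex

# Assumption: files that hold per-server database settings and must not
# be overwritten on the cold server. Verify the list for your versions.
EXCLUDED_FILES = ["dbconfig.xml", "confluence.cfg.xml"]


def build_rsync_command(source_dir, target, excluded=EXCLUDED_FILES):
    """Return the rsync argument list; nothing is executed here."""
    cmd = ["rsync", "-a", "--delete"]
    # One --exclude per protected file. Excluded files are also skipped
    # by --delete, so the cold server keeps its own copy.
    cmd += [f"--exclude={name}" for name in excluded]
    # Trailing slash: copy the directory contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", target]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host name.
    cmd = build_rsync_command(
        "/var/atlassian/jira-home", "dr-jira:/var/atlassian/jira-home/"
    )
    print(shlex.join(cmd))
```

Keeping the exclusion list in one place means it can be reviewed after every upgrade, which is where this approach is most likely to break.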
The second problem is filtering which tables get replicated. Recent releases of the Atlassian apps store the User Directory configuration in the DB. This means that if we do not filter these settings, the failover JIRA server would point to the production Crowd instead of the failover one.
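If the database happens to be MySQL (an assumption; nothing above says which database is used), the table filtering could be expressed with the replica's `replicate-ignore-table` option. The sketch below only generates the option lines. The schema name `jiradb` and the table list are assumptions that would need checking against the actual schema of each application and version.

```python
# Assumption: tables believed to hold the User Directory configuration.
# Check the real schema before relying on this list.
DIRECTORY_TABLES = [
    "cwd_directory",
    "cwd_directory_attribute",
    "cwd_directory_operation",
]


def replication_filter_lines(schema, tables=DIRECTORY_TABLES):
    """Return one replicate-ignore-table line per table to skip."""
    return [f"replicate-ignore-table={schema}.{table}" for table in tables]


if __name__ == "__main__":
    # Hypothetical schema name; the lines would go in the replica's config.
    for line in replication_filter_lines("jiradb"):
        print(line)
```

The ignored tables then have to be maintained by hand on the failover database, which is part of the administrative burden this setup carries.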
We haven't completed this setup yet, but would like to hear any thoughts about it and other possible solutions to provide resilience to our Atlassian environment. I'm especially concerned about the administrative burden this will bring when upgrading the live environment. Also, any changes to the configuration files and/or configuration settings stored in the DB in future releases would probably break our cold failover environment.
Sounds unnecessarily complex... your JDBC URL should contain the DNS alias for the database server, so that if the database is failed over, the same URL automatically points to the DR database system. Unless you are a very small company, I would have thought your DBAs should provide this for you.
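As a rough illustration of the alias idea: the application only ever sees the alias, and failover is a DNS change rather than a config change. In this minimal Python sketch the PostgreSQL-style URL, the alias `jiradb.example.com` and the function names are all assumptions for the example.

```python
import socket


def build_jdbc_url(alias, port, database):
    """The URL names only the alias, so it stays the same after failover."""
    return f"jdbc:postgresql://{alias}:{port}/{database}"


def current_target(alias, resolver=socket.gethostbyname):
    """Return the address the alias resolves to right now."""
    return resolver(alias)


if __name__ == "__main__":
    # Hypothetical alias, port and database name.
    print(build_jdbc_url("jiradb.example.com", 5432, "jira"))
```

Calling `current_target` before and after a failover test is one simple way to confirm the alias really moved.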
I don't use Crowd, but the same thing applies to LDAP servers. You point to one that gets round-robinned by DNS, and any that are down get dropped automatically. So I'd suggest you just set up DR for Crowd and use F5s or whatever to automatically direct the Crowd URL to the correct Crowd server.
We use a clustered filesystem, so in the event of failover the filesystem is automatically mounted on the DR machine. If we had to change configuration files, or ensure that they had not been synced, that would just increase the chance of a problem in an already panicked situation.
In short, at least for the DB thing, try to leverage whatever your DBAs recommend.
Thanks for the tick, hopefully other people will chime in with more information. One final piece of advice: test it! And then again every 6 months or so.
Have to say your solution is embarrassingly simple :)
I agree it'd be good to hear about other people's implementations.
I'm thinking of creating static entries in the failover servers' hosts files to point to the LDAP and DB servers. This way we can test it without bringing the prod environment up, and there'll be fewer steps to follow in case of failover. We are thinking of doing this manually, no F5s ;)
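A minimal sketch of what those static entries could look like, generated from a mapping so the same list can be reviewed and reused on each failover server. The host names and addresses are hypothetical.

```python
def render_hosts_entries(mapping):
    """Render hosts-file lines ("address<TAB>name") from {name: address}."""
    return [f"{address}\t{name}" for name, address in sorted(mapping.items())]


if __name__ == "__main__":
    # Hypothetical names and failover addresses.
    failover_targets = {
        "crowd.example.com": "10.20.0.11",
        "jiradb.example.com": "10.20.0.12",
    }
    print("\n".join(render_hosts_entries(failover_targets)))
```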
On this topic: Atlassian has just released a dedicated best practice guide for High Availability. It covers a cold failover scenario and includes implementation details on reverse proxying, monitoring, replication and failover mechanisms:
https://confluence.atlassian.com/display/ATLAS/Failover+for+JIRA
How does one access this document? We're about to start a migration/combination and this doc would really come in handy.
No... I can't access it anymore. Presumably, now that Data Center is available, this document has been retired?
It would be useful for the rest of us, as I need to test our cold standby environment, and it's been a few months since I last reviewed this doc!
Can someone at Atlassian free it up from its black hole?
Hi, you can find the newest version of the document here: https://confluence.atlassian.com/display/ENTERPRISE/Failover+for+JIRA+Data+Center
Hi Christine, I don't see any data other than a basic image.
Your previous doc had Heartbeat and DRBD information and a bit on database replication.
Cheers
Sadly, the new link doesn't have much information at all. There are many of us who are either not using Jira Data Center yet, or choose not to for various reasons. For example, my company has datacenters in different geographical regions, and Jira Data Center doesn't cluster between different geographic locations yet. So for us, the cold failover approach makes more sense. But I can't seem to find cold failover documents for Jira *anywhere* on Atlassian's site; the few pages that still exist appear to be restricted. I see stuff for Confluence, Bamboo, Stash... but not Jira. If I were a conspiracy theorist, it would appear that we are being heavily encouraged to use Jira Data Center. ;)