Hi @Shrikant Bijapurkar _NTT DATA_, this health check error typically appears while a full reindex is running on a node or has just completed.
During a full reindex, the current index is deleted and a new index is built from scratch. While the reindex is running on the node, an internal Jira service, NodeReindexService, runs every 5 seconds to distribute index changes from one node to another so that the indexes stay up to date on all nodes in the cluster. This process is resource-intensive, and sometimes syncing the indexes between nodes takes long enough that the index health check times out, which produces this error message.
Once the reindex completes, you can copy the updated index to the other nodes so that they hold the latest data. How long this takes depends on the instance and index size, but it is typically fast, and once it finishes the error should clear at the next health check. This is a common issue when a full reindex is triggered or has just completed.
If the error does not resolve on its own after an hour or so, there are several possible causes to check, as described in the KB documents below.
Before going through the KB documents, I would suggest understanding how indexing works in Jira Data Center, as that will help you understand the issue and find the cause of this error.
https://confluence.atlassian.com/jirakb/how-indexing-works-in-jira-1167744587.html
https://confluence.atlassian.com/adminjiraserver/search-indexing-938847710.html
Related KB documents for the cluster index replication errors.
If your Data Center is set up correctly, then about 90% of the time this error resolves on its own once the updated index has been copied to the other active nodes and the next health check passes within the specified time.
Let me know in case of any questions.
Thanks a lot for your answer, Ravina.
Attached picture shows our 3 nodes.
One of these goes down when a full reindex is triggered.
@Shrikant Bijapurkar _NTT DATA_ The nodes do not go down: as you can see, the uptime of all three nodes is the same, and if a node had gone down its uptime would have been reset. However, while a full reindex is running, the node does not accept active traffic until the reindex completes. It is therefore recommended to remove the node where the reindex is triggered from the load balancer so that user traffic is redirected to the other active nodes.
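The uptime reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name, the tolerance value, and the node names and uptime figures are all hypothetical, and the heuristic assumes the nodes were originally started at roughly the same time.

```python
def restarted_nodes(uptimes_seconds, tolerance_seconds=300):
    """Return nodes whose uptime is much shorter than the longest-running node.

    uptimes_seconds: dict of node name -> uptime in seconds (hypothetical input,
    e.g. read manually from the cluster nodes page).
    A node that really went down would show a reset (much smaller) uptime.
    """
    if not uptimes_seconds:
        return []
    longest = max(uptimes_seconds.values())
    return [
        node
        for node, uptime in uptimes_seconds.items()
        if longest - uptime > tolerance_seconds
    ]


# Hypothetical values: all three nodes have nearly identical uptime,
# so none of them is flagged as restarted.
print(restarted_nodes({"node-a": 86400, "node-b": 86390, "node-c": 86405}))
```

An empty result matches the situation described here: the node was unavailable to users during the reindex, but it never actually restarted.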
Regarding the "FROM" and "TO" nodes for the index copy: the node where the full reindex was triggered and completed (in your case, the node that appears down/inaccessible to users) is your FROM node. You can also identify the nodes by node ID, since each node has a unique ID. For example, if you triggered the full reindex on node i-078239b... and it completed, that is your FROM node, and node i-02b7afca... is your TO node for the first copy. Click the Copy Index button and wait for some time. Then repeat the action for the other node: keep the same FROM node, set node i-0368df0... as TO, and click Copy Index again.
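As a rough sketch of the copy order described above, the following Python snippet lists the FROM/TO pairs to enter in the copy index screen. It does not call Jira at all; the function name and the node names are hypothetical placeholders, not real node IDs.

```python
def plan_index_copies(source_node, cluster_nodes):
    """Return (FROM, TO) pairs: one copy from the reindexed node to every other node."""
    if source_node not in cluster_nodes:
        raise ValueError(f"{source_node} is not part of the cluster")
    # The FROM node stays the same for every copy; only the TO node changes.
    return [(source_node, target) for target in cluster_nodes if target != source_node]


# Hypothetical three-node cluster where the full reindex ran on node-a.
nodes = ["node-a", "node-b", "node-c"]
for step, (src, dst) in enumerate(plan_index_copies("node-a", nodes), start=1):
    print(f"Step {step}: copy index FROM {src} TO {dst}")
```

With three nodes this gives two copy actions, both using the reindexed node as FROM.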
Since you have set up auto scaling to add a node when CPU load increases, that new node joins the cluster once it starts, and the NodeReindexService then looks for the latest copy of the index from the other nodes or the shared home and builds the index for the new node. So when a new node is scaled out, there is technically no need to perform the copy index action.
So the best practices to follow for full indexing are:
Let me know in case of any questions.
Thanks a bunch for your detailed reply, Ravina.
I understood the FROM and TO nodes.
We are already following the best practices, viz.
1. We run the full reindexing during off-hours, when there is minimal Jira traffic.
2. The other 2 nodes are always up.
I will set the FROM and TO nodes after the next full reindexing, which we run tomorrow, completes.
Hopefully the warnings will not show up the next day.
If all goes as expected, I will surely accept your answers.
Thanks anyway. This has been helpful so far.
The Full reindexing on i-02b7afcad5a3bc955 node completed within 2 hours.
This is very effective; the background reindexing used to take 20 hours or more.
Then I went to the Copy area and entered the following:
From:
Current node: i-02b7afcad5a3bc955
To:
i-03618df0a70329128
When I press Copy Index, the page scrolls up and now i-03618df0a70329128 shows as current node, prompting to start full reindexing again on i-03618df0a70329128 node.
Actually, I was hoping it would take some time to copy the newly created index to i-03618df0a70329128 and then allow me to also copy it from i-02b7afcad5a3bc955 to the 3rd node, viz. i-078239bb88fd7a123.
But I am unable to do that. The To box is no longer editable.
What am I doing wrong?
In the interim, the Health check warning messages are popping up, and I fear the teams would all see those tomorrow morning in Japan, before we start work in India.
I ran the Full reindex on one of the 2 nodes.
And now the Warning messages have stopped.
So I guess it takes a little while for the new index to propagate to the other nodes.
I might be wrong.
Yes. As I said earlier, the NodeReindexService runs on the node to sync the indexes between nodes and to add any missing entries to the local index. Because many other services and processes are also running in Jira, the index replication time between nodes varies.