Hi,
I am a Confluence admin, and I have hit this problem several times. I have read almost all the related articles in the community but didn't find an answer. I need your help!
Env:
Confluence: 6.15.8 (Server)
Database: MySQL 5.7 (character set=utf8, collation=utf8_bin)
OS: CentOS 6.9
The process & command:
root 172001 1 99 Apr17 ? 08:41:31 /data0/confluence/confluence_home/jre//bin/java -Djava.util.logging.config.file=/data0/confluence/confluence_home/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -XX:ReservedCodeCacheSize=384m -XX:+UseCodeCacheFlushing -Dconfluence.context.path= -Dorg.apache.tomcat.websocket.DEFAULT_BUFFER_SIZE=32768 -Dsynchrony.enable.xhr.fallback=true -Xms32g -Xmx32g -XX:+UseG1GC -Datlassian.plugins.enable.wait=300 -Djava.awt.headless=true -XX:G1ReservePercent=20 -Xloggc:/data0/confluence/confluence_home/logs/gc-2020-04-17_16-22-37.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -XX:-PrintGCDetails -XX:+PrintGCDateStamps -XX:-PrintTenuringDistribution -Dignore.endorsed.dirs= -classpath /data0/confluence/confluence_home/bin/bootstrap.jar:/data0/confluence/confluence_home/bin/tomcat-juli.jar -Dcatalina.base=/data0/confluence/confluence_home -Dcatalina.home=/data0/confluence/confluence_home -Djava.io.tmpdir=/data0/confluence/confluence_home/temp org.apache.catalina.startup.Bootstrap start
root 172492 172001 44 Apr17 ? 03:31:30 /data0/confluence/confluence_home/jre/bin/java -classpath /data0/confluence/confluence_home/temp/2.1.0-release-confluence_6.15-32f7299a.jar:/data0/confluence/confluence_home/confluence/WEB-INF/lib/mysql-connector-java-5.1.39-bin.jar -Xss2048k -Xmx1g synchrony.core sql
[root@java115-online qiaoxiaolin001]# vim /data0/confluence/confluence_home/logs/gc-2020-04-1
Description:
1. All users get an "unable to communicate with confluence, please contact your administrator" message when they try to create or edit a page.
2. The problem has happened more than 6 times, each time on a Friday afternoon, which is weird. Is Friday afternoon a rush hour, or does some scheduled job run regularly then?
3. Confluence server CPU usage rose quickly from 200% to 1800% (32 cores) after the error occurred.
4. Lots of rollback log entries like this (Hibernate doRollback, logged at WARN):
2020-04-17 16:17:49,503 WARN [http-nio-8090-exec-218] [confluence.impl.hibernate.ConfluenceHibernateTransactionManager] doRollback Performing rollback. Transactions:
->[com.atlassian.confluence.api.impl.service.content.draft.ContentDraftServiceImpl.publishEditDraft]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT (Session #599987164)
[null]: PROPAGATION_REQUIRES_NEW,ISOLATION_DEFAULT,readOnly (Session #1499872420)
-- referer: http://wiki.lianjia.com/pages/resumedraft.action?draftId=594127569&draftShareId=de61105b-eaa2-4081-8967-01b38c6fa2da& | url: /rest/api/content/594127389 | traceId: 93b7f73415de52d6 | userName: wangning040
5. Lots of WARN log entries like this: post-commit tasks took a long time, with durations climbing from the 5000 ms warning threshold to 81525 ms
2020-04-17 16:15:37,428 WARN [http-nio-8090-exec-496] [confluence.util.profiling.DurationThresholdWarningTimingHelperFactory] logMessage Execution time for post-commit task com.atlassian.confluence.pages.DefaultPageManager$$Lambda$4896/707434629@3edbbc34 took 81525 ms (warning threshold is 5000 ms)
6. JVM stats:
2020-04-17 16:18:27,770 INFO [http-nio-8090-exec-443] [atlassian.confluence.status.SystemErrorInformationLogger] writeToLog
Request Unique ID : b1b12381-afb9-426a-85a8-689827e17bbf
--------------------------
JVM Stats
--------------------------
usedMemory = 14940083728
usedMemoryInMegabytes = 14247
availableHeap = 19419654640
freeMemoryInMegabytes = 18520
allocatedHeap = 34359738368
freeAllocatedHeap = 19419654640
totalMemory = 34359738368
totalMemoryInMegabytes = 32768
availablePermGen = 0
maxPermGen = -1
maxHeap = 34359738368
usedHeap = 14940083728
freeMemory = 19419654640
usedPermGen = -1
--------------------------
7. Sometimes I see "GC allocate failure" in the log.
8. The only thing I can do is restart the service.
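To check whether the post-commit delays in item 5 build up gradually before each incident or jump all at once, the warnings can be pulled out of the application log and grouped by minute. The following is only a minimal sketch: the regular expression is modelled on the single log line quoted above, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Pattern modelled on the post-commit warning quoted in item 5.
POST_COMMIT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} WARN .*"
    r"Execution time for post-commit task .* took (?P<ms>\d+) ms"
)


def post_commit_durations(lines):
    """Return (timestamp, duration_ms) for each post-commit warning line."""
    results = []
    for line in lines:
        match = POST_COMMIT.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            results.append((ts, int(match.group("ms"))))
    return results


def max_per_minute(durations):
    """Keep only the worst duration seen in each minute, in time order."""
    worst = {}
    for ts, ms in durations:
        minute = ts.replace(second=0)
        worst[minute] = max(ms, worst.get(minute, 0))
    return dict(sorted(worst.items()))
```

Feeding it the lines of the Confluence application log (for example `post_commit_durations(open(path))`, where `path` is wherever the log lives on the server) and printing `max_per_minute(...)` gives a per-minute timeline that can be lined up against the CPU graph.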
What I tried:
1. Raised ReservedCodeCacheSize from 256M to 384M because of "CodeCache is full" log messages and a "C2 compiler thread" using a lot of CPU
2. Raised the Tomcat max threads from 50 to 500
3. Memory settings: -Xms32g -Xmx32g
4. Restart the service at midnight to avoid the LDAP sync issue
What I will try:
1. If the same issue occurs again, use "Restart Synchrony" on the collaborative editing page instead of restarting Confluence
What I want:
Find the root cause and fix it, so that nothing needs to be restarted.
Thanks!
Also, I searched the stack traces following this article and found some Synchrony functions:
https://confluence.atlassian.com/confkb/how-to-analyse-thread-dumps-788039144.html
at com.atlassian.confluence.plugins.synchrony.service.http.SynchronyRequestExecutor.execute(SynchronyRequestExecutor.java:43)
at com.atlassian.confluence.plugins.synchrony.service.SynchronyAbstractManager.execute(SynchronyAbstractManager.java:72)
at com.atlassian.confluence.plugins.synchrony.service.SynchronyExternalChangesManager.performExternalChange(SynchronyExternalChangesManager.java:76)
at com.atlassian.confluence.plugins.synchrony.service.SynchronyExternalChangesManager.syncContentOnUpdate(SynchronyExternalChangesManager.java:64)
at com.atlassian.confluence.plugins.synchrony.service.SynchronyContentService.syncContentOnUpdate(SynchronyContentService.java:111)
...
...
...
at com.atlassian.confluence.web.filter.DebugFilter.doFilter(DebugFilter.java:46)
at com.atlassian.core.filters.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:32)
It's a long shot, but did you check for database congestion? Maybe there aren't enough connections configured on the Confluence server and/or the database side:
https://confluence.atlassian.com/confkb/startup-check-database-connection-pool-size-960713815.html
Think about deploying JavaMelody (check the Marketplace, it is a free add-on) to find the root cause of your issue.
Best
JP
Thanks for your reply!
I have checked the DB connections:
In confluence.cfg.xml: <property name="hibernate.c3p0.max_size">3000</property>
In mysql: max_connections =5000
The maximum number of connections in use is 23 according to our Grafana monitoring,
so I think there are enough connections.
(I'm a little confused: we have 3000+ active users per day, but only 23 connections?)
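On the 23-connection figure: a pooled connection is only held while a request has a database transaction open, so the number in use at any moment follows Little's law (average concurrency = arrival rate × average hold time) rather than the daily user count. A rough sketch, where the per-user request count, the working-day length, and the hold time are made-up illustrative values and not measurements from this instance:

```python
def expected_busy_connections(requests_per_second, seconds_held_per_request):
    """Little's law: average concurrency = arrival rate * average hold time."""
    return requests_per_second * seconds_held_per_request


# Hypothetical load, for illustration only (none of these values are measured):
users_per_day = 3000          # from the post: 3000+ active users per day
requests_per_user = 200       # assumed DB-backed requests per user per day
working_seconds = 8 * 3600    # assumed 8-hour working day
hold_seconds = 0.05           # assumed 50 ms holding a connection per request

rate = users_per_day * requests_per_user / working_seconds
print(round(expected_busy_connections(rate, hold_seconds), 2))
```

Under assumptions like these the average is only a handful of connections, so a peak of 23 is plausible. It also means a sudden rise in hold time (slow queries or slow post-commit work) would show up as a jump in connections in use.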
Any other advice for me? Thanks!
A new clue:
I found many HTTP threads stuck in "java.net.SocketInputStream.socketRead0", which may explain the poor response times.
The thread dump looks like this:
"http-nio-8090-exec-504" #78468 daemon prio=5 os_prio=0 tid=0x00007f8dd49ca000 nid=0x28688 runnable [0x00007f82cb5f1000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)...
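To see how many threads are in this state, and whether the same threads also carry the Synchrony frames quoted earlier (for example `SynchronyRequestExecutor`), a full thread dump can be split per thread and searched. This is a minimal sketch that assumes the usual jstack text layout shown above (a quoted thread name on the header line, frames starting with "at "); the function names are made up for illustration.

```python
from collections import Counter


def split_threads(dump_text):
    """Split a jstack-style dump into (header, frames) pairs.

    A thread entry starts with a line beginning with a double quote;
    its frames are the following lines that start with "at ".
    """
    threads = []
    header, frames = None, []
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            if header is not None:
                threads.append((header, frames))
            header, frames = line, []
        elif line.startswith("at ") and header is not None:
            frames.append(line[3:])
    if header is not None:
        threads.append((header, frames))
    return threads


def threads_containing(threads, needle):
    """Names of threads with at least one frame containing `needle`."""
    names = []
    for header, frames in threads:
        if any(needle in frame for frame in frames):
            names.append(header.split('"')[1])
    return names


def top_frame_counts(threads):
    """Count how many threads share each innermost (top) frame."""
    return Counter(frames[0] for _, frames in threads if frames)
```

For example, intersecting `threads_containing(threads, "SocketInputStream.socketRead0")` with `threads_containing(threads, "SynchronyRequestExecutor")` would show whether the blocked HTTP threads are waiting on a response from Synchrony or on something else, such as the database driver.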
Are you using a proxy server?
In my Confluence, collaborative editing is enabled, and the Synchrony proxy is managed by Confluence on port 8091, as the documentation recommends (when I start Confluence, the Synchrony process starts automatically).
PS: Could it be related to the VPN network? Most of our members have worked from home since February, and the problems have happened during that time (I'm in China).
Thank you!