Hi,
There are times when our Bamboo agents lose connectivity with the main Bamboo server, even though there are no network connectivity issues that we can see.
The exception message is:
org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException; nested exception is org.apache.activemq.transport.RequestTimedOutIOException
It occurs after several unsuccessful heartbeat attempts:
INFO | jvm 4 | 2011/12/15 11:11:51 | 2011-12-15 11:11:51,403 INFO [QuartzScheduler_Worker-4] [AgentHeartBeatJob] Not sending a new heartbeat since an old one is still being sent, last successful transmission time was 44 seconds ago, dropping the current heartbeat...
This happens to all of our agents at the same time. They usually come back after about 20 minutes, but their builds usually fail with "agent has gone offline" as the error.
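For anyone wanting to see how long the heartbeats stall before the agents drop, here is a rough sketch that pulls the timestamp and the "seconds ago" value out of log lines like the one quoted above. The regular expression is based only on that single quoted line, so the exact log layout is an assumption, and `stalled_heartbeats` is a made-up helper name, not anything Bamboo provides.

```python
import re

# Assumption: agent log lines look like the AgentHeartBeatJob line quoted above.
HEARTBEAT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO .*?"
    r"\[AgentHeartBeatJob\] Not sending a new heartbeat.*?"
    r"last successful transmission time was (?P<secs>\d+) seconds ago"
)


def stalled_heartbeats(lines):
    """Return (timestamp, seconds_since_last_success) for each dropped heartbeat."""
    results = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            results.append((match.group("ts"), int(match.group("secs"))))
    return results
```

Running this over the agent logs for the affected period and comparing the timestamps across agents should show whether the stalls start at the same moment everywhere, which would point at the server rather than at any one agent.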
Has anyone run into this problem before? My first thought was that it was a networking error, but I can confirm that the network connection between Bamboo and its agents stays up the whole time, even when the agents lose connection with the server.
Thanks
You might want to take a look at this, although I'm not sure if it will help in your case:
http://confluence.atlassian.com/plugins/viewsource/viewpagesrc.action?pageId=216957427
Also I would suggest carefully monitoring the load on the Bamboo server host. If the host CPU is overloaded or the disk IO bandwidth is maxed out, it could cause agent connectivity problems.
This can particularly be a problem when running many remote agents.
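As a starting point for that kind of monitoring, here is a minimal sketch that samples the 1-minute load average and compares it to the number of cores. It assumes a Unix-like host (`os.getloadavg` is not available on Windows, where something like Performance Monitor would be needed instead), and the threshold of 1.0 per core is an arbitrary assumption, not a Bamboo recommendation. It does not measure disk IO at all.

```python
import os


def load_per_core(load_avg, cores):
    """Normalise a load average by the number of CPU cores."""
    return load_avg / max(cores, 1)


def is_overloaded(load_avg, cores, threshold=1.0):
    """True when load per core exceeds the (assumed) threshold."""
    return load_per_core(load_avg, cores) > threshold


def sample_host_load():
    """Return the 1-minute load average, or None where it is unavailable."""
    if not hasattr(os, "getloadavg"):
        return None  # e.g. Windows
    try:
        return os.getloadavg()[0]
    except OSError:
        return None


if __name__ == "__main__":
    load = sample_host_load()
    cores = os.cpu_count() or 1
    if load is None:
        print("load average not available on this platform")
    else:
        print(f"load={load:.2f} cores={cores} overloaded={is_overloaded(load, cores)}")
```

Logging this every minute or so and lining it up against the times the agents drop off would show whether the two are related.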
Thanks for that, we have 19 agents at the moment and we've recently added a few, so it might very well be the Bamboo server host. I've disabled the timeout to see if that might fix the problem, but moving to a faster server and/or reducing the number of agents will be the likely solution.
We're currently running 47 remote agents off a single Bamboo server with 20-25 running builds at any time. Our server is a several-year-old 1RU Dell, dual-core, 3GB RAM with a single 7200RPM drive. Windows Server 2003 R2 x64. So nothing special.
All the agents are stable now, although we've had similar problems to yours in the past. Our problems seemed to be related to disk IO. A couple of things have helped:
(1) Disabling the virus scanner on the Bamboo home directory (duh!)
(2) Defragmenting the drive and ensuring there is plenty of free space.
If your server specs are comparable you should have no problems running 19 agents.
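For the free-space part, a quick check can be scripted. This is only a sketch: the path to pass in is wherever your Bamboo home directory lives, and the 15% minimum is an assumed rule of thumb rather than a documented requirement.

```python
import shutil


def free_fraction(path):
    """Fraction of the volume holding `path` that is currently free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def has_headroom(path, minimum=0.15):
    """True when at least `minimum` (assumed 15%) of the volume is free."""
    return free_fraction(path) >= minimum
```

Calling `has_headroom` with the Bamboo home directory from a scheduled task gives an early warning before the drive fills up.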
I disabled six of the agents yesterday afternoon and the problem hasn't reappeared, so it does look like it's the Bamboo server's performance. We're running Bamboo on a VM on a Dell blade server, but we were going to migrate it to a more powerful machine anyway; this just gives us one more reason to do so. Thank you for your answers, they were really helpful.