We regularly use continuous integration and continuous deployment in the delivery of our projects. Our team predominantly uses GitHub Actions these days, but we’ve supported a variety of CI tooling, including Jenkins and GitLab Pipelines, and a number of our projects still run on Jenkins.
We recently noticed a cryptic issue where our build agents would fail to start properly. We run our agents as spot instances on EC2, so the Jenkins server would provision an agent, terminate it when it failed to start, and then repeat the process in a never-ending loop.
The error message that we encountered was:
INFO: Connecting to 10.0.0.123 on port 22, with timeout 10000.
Jan 17, 2024 10:33:26 PM hudson.plugins.ec2.EC2Cloud
INFO: No SSH key verification (ssh-ed25519 aa:bb:cc:dd:ee:ff:11:22:33:44:55:66:77:88:99:00) for connections to EC2 (ec2-EC2 Spot Slaves) - Jenkins Slave AMI (t3a.medium) (i-01ab234567cd890ab)
Jan 17, 2024 10:33:26 PM hudson.plugins.ec2.EC2Cloud
INFO: Connected via SSH.
Jan 17, 2024 10:33:26 PM hudson.plugins.ec2.EC2Cloud
INFO: Creating tmp directory (/tmp) if it does not exist
Jan 17, 2024 10:33:28 PM hudson.plugins.ec2.EC2Cloud
INFO: Verifying: which scp
/usr/bin/scp
Jan 17, 2024 10:33:28 PM hudson.plugins.ec2.EC2Cloud
INFO: Copying remoting.jar to: /tmp
Jan 17, 2024 10:33:28 PM hudson.plugins.ec2.EC2Cloud
INFO: Launching remoting agent (via Trilead SSH2 Connection): java -jar /tmp/remoting.jar -workDir /home/jenkins
ERROR: unexpected stream termination
java.io.EOFException: unexpected stream termination
    at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:459)
    at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:404)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:437)
    at hudson.plugins.ec2.ssh.EC2UnixLauncher.launchScript(EC2UnixLauncher.java:284)
    at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:298)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
The error message isn’t descriptive or helpful: it says that the connection to the agent closed unexpectedly, but not why. And because we run the build agents as EC2 Spot Instances, the instance is terminated after it fails to start, which leaves no server to run further diagnostics on.
We identified the issue by spinning up a regular EC2 instance and manually connecting it to the main Jenkins server as a build agent. We then triggered a new build to run on this agent, which hit exactly the same issue. Because this instance wasn’t a spot instance, it wasn’t auto-terminated, and we were able to investigate and troubleshoot the issue.
Root Cause
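The root cause was a Java version mismatch: our Jenkins build agent was running an outdated Java version, while our Jenkins server was supplying a build agent jar (remoting.jar) that had been compiled for a newer version of Java. A JVM cannot load classes compiled for a newer release than itself, so the agent process most likely exited as soon as it was launched, which would explain why the server reported only an unexpected stream termination.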
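One way to confirm a mismatch like this is to read the class-file version recorded in the jar and translate it into the Java release it requires. The following is a minimal Python sketch under stated assumptions, not part of Jenkins or of our original investigation: the function names are made up for illustration, and it assumes you have a copy of the agent jar to inspect (the log above shows it being copied to /tmp/remoting.jar).

```python
import struct
import zipfile

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def class_major_version(header: bytes) -> int:
    """Return the class-file major version from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("too short to be a class file")
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def java_release_for(major: int) -> int:
    """Map a class-file major version to a Java release (52 -> 8, 55 -> 11, 61 -> 17)."""
    return major - 44


def jar_required_java(jar) -> int:
    """Return the newest Java release required by the regular classes in a jar."""
    majors = []
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if not name.endswith(".class"):
                continue
            # Skip multi-release variants and module descriptors, which
            # older JVMs ignore rather than load.
            if name.startswith("META-INF/versions/") or name.endswith("module-info.class"):
                continue
            with zf.open(name) as fh:
                majors.append(class_major_version(fh.read(8)))
    if not majors:
        raise ValueError("no .class files found")
    return java_release_for(max(majors))
```

Calling `jar_required_java("/tmp/remoting.jar")` on the agent and comparing the result with the Java release installed there shows whether the agent's JVM is too old to run the jar.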
After we updated the version of Java in our Jenkins build agent AMI, the issue disappeared, and we were able to run builds again!
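To catch this kind of drift before an agent is launched, for example while building the AMI, a small check could compare the installed Java release against the minimum you expect. This is only a sketch: `parse_java_major`, `installed_java_major` and `check_agent_java` are hypothetical helper names, and the required release is something you would supply yourself based on your Jenkins version.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the Java release from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x, for example "1.8.0_392".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` and return the installed Java release."""
    result = subprocess.run(
        [java_cmd, "-version"], capture_output=True, text=True, check=True
    )
    # `java -version` prints to stderr rather than stdout.
    return parse_java_major(result.stderr + result.stdout)


def check_agent_java(required_major: int, java_cmd: str = "java") -> None:
    """Raise if the installed Java is older than the required release."""
    installed = installed_java_major(java_cmd)
    if installed < required_major:
        raise RuntimeError(
            f"Java {installed} is installed, but Java {required_major} or newer is required"
        )
```

Running `check_agent_java(17)` as the last step of an image build, with 17 standing in for whatever your Jenkins server needs, would fail the build instead of producing an agent that cannot start.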
If you run into the same issue, we hope you find this post useful: when we hit it, we couldn’t find a clearly documented solution.