Hi
I have this crazy plan that may just work. The way I understand it, Bitbucket says dynamic runners are not possible, and there has to be at least one runner available for any tag combination.
But here's what I'm thinking: dynamic pipelines, and some stuff that the Kubernetes autoscaler does with its use of undocumented APIs.
When the dynamic pipeline runs, there's no need for a runner to exist for any labels it will attach. So there's an opportunity to launch an AWS EC2 instance that's been set up just right to start a runner on boot, with the right UUID.
The most pressing problem is that I have only 25 seconds to do everything and return a pipeline, and that is non-negotiable as it's a hard Bitbucket limit.
So far, my tests show that I can start an unmodified Linux instance in about 20 seconds (sometimes it's 17). That leaves me with little wiggle room to do supporting operations.
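To make the wiggle room concrete, here is a minimal sketch of a deadline tracker built on the numbers above (25 second limit, roughly 17 to 20 seconds of boot time). The `Deadline` class and its `clock` parameter are hypothetical names for illustration, not part of any Bitbucket or AWS API.

```python
import time

HARD_LIMIT_S = 25.0  # Bitbucket's limit for returning the dynamic pipeline (from the post)


class Deadline:
    """Tracks how much of a fixed time budget is left (hypothetical helper)."""

    def __init__(self, limit_s=HARD_LIMIT_S, clock=time.monotonic):
        self._clock = clock
        self._end = clock() + limit_s

    def remaining(self):
        # Never report a negative budget once the limit has passed.
        return max(0.0, self._end - self._clock())
```

With a 20 second boot, `remaining()` would report about 5 seconds for everything else; with a 17 second boot, about 8.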
I have 25 seconds to do this:
And once the runner completes its work (whether successfully or not), I have to shut down and terminate the instance and remove the runner registration from Bitbucket. Reusing runner registrations doesn't seem like a good idea in a concurrent environment, as I don't see any locking mechanism that I can use to prevent double allocation. I don't know right now if there are any events, hooks, or stable log messages I can use to trigger the shutdown.
In my dynamic pipeline, I would generate UUID labels based on some scheduling logic. I can share the biggest `size` instance requested by all the sequential steps, but I have to start one instance per parallel step if I want to keep things running smoothly.
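A rough sketch of that scheduling logic, under some assumptions: the input shape (a `Step` for a sequential step, a list of `Step`s for a parallel group), the `Launch` record, the `dyn-` label prefix, and the `SIZE_ORDER` list are all made up for illustration and are not Bitbucket's actual pipeline schema.

```python
import uuid
from dataclasses import dataclass

# Assumed ordering of step sizes, smallest to largest.
SIZE_ORDER = ["1x", "2x", "4x", "8x"]


@dataclass
class Step:
    name: str
    size: str = "1x"


@dataclass
class Launch:
    label: str   # unique label the runner and the steps will share
    size: str    # instance size to start
    steps: list  # names of the steps routed to this instance


def plan_launches(stages):
    """stages: each item is a Step (sequential) or a list of Steps (parallel group)."""
    launches = []
    sequential = [s for s in stages if isinstance(s, Step)]
    if sequential:
        # All sequential steps share one instance, sized for the biggest request.
        biggest = max((s.size for s in sequential), key=SIZE_ORDER.index)
        launches.append(
            Launch(f"dyn-{uuid.uuid4()}", biggest, [s.name for s in sequential])
        )
    for stage in stages:
        if isinstance(stage, list):
            # Parallel steps each get their own instance so none has to wait.
            for s in stage:
                launches.append(Launch(f"dyn-{uuid.uuid4()}", s.size, [s.name]))
    return launches
```

For a pipeline with two sequential steps and one parallel group of two, this plans three instances: one shared, two dedicated.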
Please let me know if this looks workable or if it's terminally insane and I should look for other drugs :)
I realised I can make self-registering runners using the API endpoints used by the Kubernetes autoscaler, so I don't have to pre-register UUIDs.
A wrapper script would handle the management for the runner: registration, starting up, state polling, shutting down, deregistration. And as a safety against garbage registrations (if the management script doesn't exit through the cleanup path), a cron job could delete any runners that have been unregistered or offline for more than 5 minutes.
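The selection part of that cron job could be sketched like this. The record shape (`uuid`, `status`, `updated_on`) and the status names are assumptions for illustration; the real fields would have to be checked against whatever the runner listing actually returns, and the delete call itself is left out.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)
# Assumed status names; the real values may differ.
STALE_STATES = {"UNREGISTERED", "OFFLINE"}


def stale_runners(runners, now, max_age=STALE_AFTER):
    """Return the UUIDs of runners stuck in a stale state for longer than max_age.

    runners: list of dicts with hypothetical keys 'uuid', 'status', 'updated_on'.
    """
    return [
        r["uuid"]
        for r in runners
        if r["status"] in STALE_STATES and now - r["updated_on"] > max_age
    ]
```

Keeping the selection separate from the deletion makes it easy to dry-run the job and see what it would remove before letting it loose.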
The wrapper just needs to know the authentication credentials, what labels to apply, and possibly a name (but it could autogenerate it).
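As a sketch of the wrapper's lifecycle, assuming a hypothetical `backend` object that hides the actual API calls behind `register`, `start`, `is_finished`, `stop`, and `deregister` (none of these are real Bitbucket function names):

```python
import time
import uuid


def run_managed_runner(backend, labels, name=None, poll_s=5.0, sleep=time.sleep):
    """Register, start, poll, then always stop and deregister the runner.

    backend is a hypothetical object wrapping the real API calls.
    """
    # Autogenerate a name if the caller did not supply one.
    name = name or f"dyn-runner-{uuid.uuid4().hex[:8]}"
    runner_id = backend.register(name, labels)
    try:
        backend.start(runner_id)
        while not backend.is_finished(runner_id):
            sleep(poll_s)
    finally:
        # Cleanup path: runs on success, failure, or an exception while polling.
        try:
            backend.stop(runner_id)
        finally:
            backend.deregister(runner_id)
    return name
```

The nested `try`/`finally` means deregistration is still attempted if stopping the runner fails. A hard kill of the wrapper would still skip it, which is what the cron job is for.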