This is a bit of a "note to self", but if you find it interesting, let me know.
This isn't intended to be a deep dive into any particular technologies, but it might touch on some situations familiar to developers with a few years of experience in managing microservices.
Setting the scene
The main Java-based microservices had some common needs:
- authorization
- logging
- service discovery
- API documentation generation
- accessing objects from the cloud provider
Naturally, it made sense for all of this common functionality to be bundled together in one common framework and used everywhere, which is fine.
Unfortunately, some awkward shortcuts had been taken to implement some of that functionality, and they made it impossible to upgrade the underlying open source framework without introducing breaking changes.
Before I joined the organisation, a couple of attempts had already been made to get things back into shape. Both were abandoned once the developers realised that they could not make the necessary updates without breaking the build for most of the existing services.
I helped persuade the management team that this inability to upgrade had to be addressed so that we could avoid security issues and take advantage of performance improvements. A team was formed to do the work.
My main coding contributions were:
- migrating between JAX-RS versions
- updating logging so that correlation IDs were still preserved, because the underlying framework had changed significantly
- migrating away from the very out-of-date S3 client
- applying repetitive small changes that it didn't make sense for every team to learn about and apply themselves
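The post doesn't say how correlation IDs were carried through the services. A common approach keeps the current request's ID in a per-thread context, such as a logging library's MDC, and copies it onto work handed to other threads. The sketch below is a minimal, dependency-free version of that idea. `CorrelationContext` and its methods are hypothetical, and a plain `ThreadLocal` stands in for whatever the real framework used.

```java
import java.util.UUID;

// Hypothetical per-thread holder for the current request's correlation id.
// A plain ThreadLocal stands in for a logging MDC so the sketch needs no dependencies.
final class CorrelationContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CorrelationContext() {}

    static String get() {
        return CURRENT.get();
    }

    // Store the id, or clear the slot entirely when there is none.
    static void set(String id) {
        if (id == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(id);
        }
    }

    // Reuse the id that arrived with the request, or mint a new one if it is missing.
    static String getOrCreate(String incoming) {
        String id = (incoming == null || incoming.isEmpty())
                ? UUID.randomUUID().toString()
                : incoming;
        set(id);
        return id;
    }

    // Capture the caller's id now and run the task later with that id in place,
    // then restore whatever the executing thread had before.
    static Runnable wrap(Runnable task) {
        String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                set(previous);
            }
        };
    }
}
```

In a JAX-RS service, the incoming ID would typically be read from a request header in a filter and attached to outgoing calls. That wiring depends on the framework version and is not shown here.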
Dependencies for tests should be scoped as "test"
Something that looked like a minor oversight turned into a couple of weeks of work for me. Some dependencies that were only needed for running unit tests had been declared with the wrong scope. Instead of being available only while the framework itself was built and tested, they were being bundled into most of our microservices.
I changed the scope of a handful of dependencies and then discovered that a dozen or more projects had been relying on them to pull in test support libraries for their builds. This made me a little unpopular with developers who had just finished new features, only to find that their builds were broken and they could not deploy to production.
This became the first of my time sinks on the project. Most of the broken builds belonged to services that were not under active development, so the teams that owned them could not be expected to down tools on their current work to fix those services' dependencies. Fortunately, migrating one of the test dependencies was a recommended step in preparing to upgrade the underlying framework. That meant I could get the change out of the way more quickly than if each team had done it themselves.
Service discovery client changes
The existing service discovery mechanism had service instances register themselves with ZooKeeper during deployment. ZooKeeper therefore knew which instances were available, and services could stay up to date with the available instances of the services they needed to call.
Some aspect of this setup was not working properly. Each time a new version of a service was deployed, we would see a spike of errors caused by clients sending requests to instances that were not yet ready.
We already had another way for services to reach each other: fully qualified domain names pointing at the relevant load balancers. Removing the ZooKeeper dependency and giving the various HTTP clients some new configuration was therefore in scope for this upgrade project.
A colleague from another team contributed by
using his team's services as a proof of concept for migrating to the
fully qualified domain names.
The initial approach worked for those services because all of their clients could be updated at the same time. When we came to apply the same type of change to other services, we hit a problem: not all client libraries could be updated at once. I had to introduce a bridging interface for the internal API client so that old-style clients and fully qualified domain name clients could coexist. This became my second time sink on the project, because once again we could not rely on every team's product owner making time available for the migration work.
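The post doesn't show the bridging interface itself. The sketch below illustrates the general idea under stated assumptions. The internal API client depends only on a small resolver interface, with one implementation backed by service discovery and one backed by a fully qualified domain name. `ServiceEndpointResolver`, `DiscoveryResolver`, `FqdnResolver` and `OrdersClient` are hypothetical names. The discovery lookup is stubbed with a `Supplier` rather than a real ZooKeeper client, and the round-robin choice and `https` scheme are assumptions.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical bridging interface: the API client only asks where to send a request.
interface ServiceEndpointResolver {
    URI baseUri();
}

// Old style: choose among instances reported by service discovery.
// The Supplier stands in for the ZooKeeper-backed lookup; round-robin is an assumption.
final class DiscoveryResolver implements ServiceEndpointResolver {
    private final Supplier<List<URI>> instances;
    private int next = 0;

    DiscoveryResolver(Supplier<List<URI>> instances) {
        this.instances = instances;
    }

    @Override
    public synchronized URI baseUri() {
        List<URI> available = instances.get();
        if (available.isEmpty()) {
            throw new IllegalStateException("no registered instances");
        }
        return available.get(Math.floorMod(next++, available.size()));
    }
}

// New style: a single fully qualified domain name that points at the load balancer.
// Using https here is an assumption.
final class FqdnResolver implements ServiceEndpointResolver {
    private final URI base;

    FqdnResolver(String fqdn) {
        this.base = URI.create("https://" + fqdn);
    }

    @Override
    public URI baseUri() {
        return base;
    }
}

// Hypothetical internal API client, written against the bridge so either resolver can be plugged in.
final class OrdersClient {
    private final ServiceEndpointResolver resolver;

    OrdersClient(ServiceEndpointResolver resolver) {
        this.resolver = resolver;
    }

    URI orderUri(String orderId) {
        return resolver.baseUri().resolve("/orders/" + orderId);
    }
}
```

Because the client only sees the interface, a consuming service can switch from the discovery-backed resolver to the domain-name resolver on its own schedule, while other services still use the old mechanism.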
I saw the client service discovery migration as an area where specialisation can speed things up. Having one or two people apply the same mechanism to make the necessary changes in dozens of places is far more time-efficient than having dozens of people learn what is required and apply the change in two or three places each. A couple of teams that did attempt the changes themselves forgot to update the bundled default configuration for their client libraries. As a result, extra configuration had to be set up in every service that used those libraries.
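To see why the bundled defaults matter, consider a client library that ships its own default configuration, with each consuming service supplying only overrides. If the library's defaults aren't updated, every consuming service has to add the new setting itself. The post doesn't show the libraries' configuration format. This sketch assumes plain `java.util.Properties` with its built-in defaults chaining. `OrdersClientConfig`, the property names and the host name are all hypothetical.

```java
import java.util.Properties;

// Hypothetical client library configuration: defaults ship with the library,
// and each consuming service supplies only the values it wants to change.
final class OrdersClientConfig {
    private OrdersClientConfig() {}

    // Defaults bundled inside the client library (in practice perhaps a properties file on its classpath).
    static Properties bundledDefaults() {
        Properties defaults = new Properties();
        defaults.setProperty("orders.client.fqdn", "orders.internal.example.com");
        defaults.setProperty("orders.client.connect-timeout-ms", "2000");
        return defaults;
    }

    // Service-specific settings layered over the library defaults;
    // getProperty falls back to the defaults for any key the service did not set.
    static Properties forService(Properties serviceOverrides) {
        Properties effective = new Properties(bundledDefaults());
        effective.putAll(serviceOverrides);
        return effective;
    }
}
```

If `orders.client.fqdn` were missing from `bundledDefaults()`, every service using the library would have to set it explicitly. The teams that skipped the bundled defaults ended up with that kind of extra per-service configuration.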
Not all services were created equal
Some services had more stable builds than others. One particularly complex system had not been deployed for a couple of months when we came to apply the service discovery client changes. Because some of its integration tests were flaky, we were left in the dark about genuinely broken builds for a few weeks. It didn't help that the test suite took over half an hour to run. Eventually I realised that a client I had configured in a hurry was missing one crucial property, and we were able to unblock that service and its associated client for release.
Libraries and versioning
Several services interacted with a common type of subsystem, the obvious example being a relational database. Over the years, some shared additional functionality had been added to a supporting library. Because the underlying framework lacked modularity, these support libraries ended up tied to the framework, even for something as simple as specifying the database's connection properties as a configurable object.
I decided to treat the new version of the framework as a completely new artifact. That meant each library also had to diverge from its existing versioning. Otherwise a service could automatically pick up a new library version and bring in a completely incompatible framework as a transitive dependency.
This got a bit of push-back from some developers as they started to look into what had been involved in migrating services so far. "Are we going to end up with a new artifact for the next upgrade as well?" was a very fair question. Like most things in technology, the short answer is "It depends." I hope that the current stable version of the underlying open source framework will only have minor upgrades for the next year or two. I also expect many of the existing microservices to be migrated to a different type of environment. There, some of the functionality provided by the in-house framework will be handled outside the service, for example by sidecars in systems such as Kubernetes.