Friday, 8 November 2019

Upgrading the in-house microservice framework


This is a bit of a "note to self", but if you find it interesting, let me know.

This isn't intended to be a deep dive into any particular technologies, but it might touch on some situations familiar to developers with a few years of experience in managing microservices.

Setting the scene
The main Java-based microservices had some common needs:
 - authorization
 - logging
 - service discovery
 - API documentation generation
 - accessing objects from the cloud provider

Naturally it made sense for all of this common functionality to be bundled together in one common framework and used everywhere - which is fine.

Unfortunately, some awkward shortcuts were taken to deliver parts of that functionality, which made it impossible to upgrade the underlying open source framework without introducing breaking changes.

Before I joined the organisation, a couple of attempts had already been made to get things back into shape, but they were aborted when the developers realised that they could not make the necessary updates without breaking the builds of most of the existing services.

I helped persuade the management team that this inability to upgrade had to be addressed so that we could avoid security issues and take advantage of performance improvements, and a team was formed to do the work.

My main coding contributions were:
 - migrating between JAX-RS versions
 - updating logging to continue to preserve correlation IDs, since the underlying framework had changed significantly (a sketch of the idea follows this list)
 - migrating away from using the very out of date S3 client
 - repetitive small changes that didn't make sense for every team to learn about and apply themselves
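
For the correlation ID work listed above, the shape of the change was roughly the following. This is only a sketch, with an assumed header name and MDC key rather than the framework's real wiring, but the idea of copying the ID into the logging context at the edge of each request is the same.

```java
import java.io.IOException;
import java.util.UUID;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;

import org.slf4j.MDC;

// Sketch: read the correlation ID from an incoming header (or create one if it is absent)
// and put it into SLF4J's MDC so the log pattern can include it on every line written
// while this request is being handled.
@Provider
public class CorrelationIdFilter implements ContainerRequestFilter {

    // The header name and MDC key are assumptions for illustration only.
    private static final String CORRELATION_ID_HEADER = "X-Correlation-Id";
    private static final String MDC_KEY = "correlationId";

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String correlationId = requestContext.getHeaderString(CORRELATION_ID_HEADER);
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }
        MDC.put(MDC_KEY, correlationId);
        // A matching response filter should call MDC.remove(MDC_KEY) so the value does not
        // leak onto the next request handled by the same thread.
    }
}
```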

Dependencies for tests should be scoped as "test"
Something that looked like a minor oversight turned into a couple of weeks of work for me.  Some dependencies that were only needed for running unit tests had been specified with an incorrect scope, so instead of just being available to the framework at build time they were being bundled up and included in most of our microservices.
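
As a reference point, the fix itself is a one-line change per dependency in the framework's build file. This assumes a Maven build, and junit below is just a stand-in for whichever test support libraries were affected:

```xml
<!-- Before: default (compile) scope, so the artifact leaks into every consuming service -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
</dependency>

<!-- After: test scope, so it is only on the framework's own test classpath and is
     neither packaged nor passed on as a transitive dependency -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
```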

Changing the scope of a handful of dependencies, only to realise that a dozen or more projects had been relying on them to bring in test support libraries for their own builds, made me a little unpopular with developers who had just completed work on new features and found that their builds were broken, so they could not deploy to production.

This led to the first of my time sinks in the project.  The majority of the broken builds were for services that were not under active development, so the teams that owned them could not be expected to down tools on their current work to fix their services' dependencies.  Fortunately, migrating one of the test dependencies was a recommended part of preparing to upgrade the underlying framework anyway, so I was able to get that change out of the way more quickly than if the individual teams had done it themselves.

Service discovery client changes
The existing service discovery mechanism involved service instances registering themselves with Zookeeper during deployment.  This allowed Zookeeper to know which instances were available, and allowed services to keep up to date about the available instances of services that they needed to call.
Some aspect of this setup was not working properly, so each time a new version of a service was deployed we would see a spike of errors caused by clients sending requests to instances that were not yet ready.

We had an alternative mechanism for services to reach each other, introduced via fully qualified domain names pointing at the relevant load balancers, so removing the Zookeeper dependency and updating the various HTTP clients with some new configuration was in scope for this upgrade project.

A colleague from another team contributed by using his team's services as a proof of concept for migrating to the fully qualified domain names.

The initial approach was fine for those services, as their clients could all be updated at the same time.  When it came time to apply the same type of change to other services we struck an issue whereby not all client libraries could be updated at the same time, so I had to introduce a bridging interface for the internal API client to allow old-style clients and fully qualified domain name clients to co-exist.  This became my second time sink of the project, as once more we could not rely on every team having time made available by their product owners to address this migration work.
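
The general idea behind the bridging interface was along these lines - all names here are hypothetical, and the Zookeeper lookup is deliberately left out of the sketch:

```java
import java.net.URI;

// Sketch: callers depend on one small interface for resolving a service's base URI,
// and each discovery mechanism gets its own implementation, so old-style clients and
// fully qualified domain name clients can co-exist during the migration.
public interface ServiceEndpointResolver {

    URI baseUriFor(String serviceName);
}

// Old style: ask Zookeeper for a currently registered instance of the service.
class ZookeeperEndpointResolver implements ServiceEndpointResolver {
    @Override
    public URI baseUriFor(String serviceName) {
        throw new UnsupportedOperationException("Zookeeper lookup elided from this sketch");
    }
}

// New style: map the service name straight onto its load balancer's fully qualified domain name.
class FqdnEndpointResolver implements ServiceEndpointResolver {
    @Override
    public URI baseUriFor(String serviceName) {
        // The domain is illustrative only.
        return URI.create("https://" + serviceName + ".internal.example.com");
    }
}
```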

I saw the client service discovery migration work as an area where specialization can speed things up.  Having one or two individuals apply the same mechanisms to make the necessary changes in dozens of places is much more time efficient than having dozens of individuals each learn what is required and apply the change in two or three places.  A couple of teams that did attempt to apply the changes themselves missed the bundled default configuration for their client libraries, meaning additional configuration would then need to be set up in each of the services that consumed those libraries.

Not all services were created equal
Some services' build phases were more stable than others.  One particularly complex system had not been deployed for a couple of months by the time we came to apply the service discovery client changes.  The flaky nature of some of its integration tests left us in the dark about genuinely broken builds for a few weeks, and it didn't help that the test suite took over half an hour to run.  Eventually I realised that a client I had configured in a hurry was missing one crucial property, and we were able to unblock that service and its associated client for release.

Libraries and versioning
Several services had a common type of sub-system to interact with - the obvious example being a relational database - so over the years some common additional functionality had been introduced into supporting libraries.  Due to the lack of modularity in the underlying framework, these support libraries ended up tied to the framework - even for something as simple as specifying the database's connection properties as a configurable object.
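
To illustrate the kind of coupling I mean, the configuration side of such a library looked something like the sketch below. The names are hypothetical, and FrameworkConfiguredObject stands in for whatever framework type the real code depended on:

```java
// Stand-in for a base type from the in-house framework - the sort of dependency that
// ties the support library to the framework and drags it in transitively.
abstract class FrameworkConfiguredObject {
}

// Even a trivial holder for connection properties depended on a framework type, so the
// library could not be used without the framework on the classpath.
public class DatabaseConnectionConfig extends FrameworkConfiguredObject {

    private String jdbcUrl;
    private String username;
    private String password;

    public String getJdbcUrl() { return jdbcUrl; }
    public void setJdbcUrl(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }

    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}
```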

I took the decision to treat the new version of the framework as a completely new artifact, which meant that each supporting library also had to diverge from its existing versioning, so that a service could never automatically pick up a new library version and bring along a completely incompatible framework as a transitive dependency.
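
In dependency terms the intent was that moving onto the new framework line is always an explicit choice. With purely illustrative coordinates, the difference looks like this:

```xml
<!-- Existing line: services that keep these coordinates stay on the old framework -->
<dependency>
  <groupId>com.example.platform</groupId>
  <artifactId>database-support</artifactId>
  <version>3.8.1</version>
</dependency>

<!-- New line: a separate artifact, so a routine version bump can never silently pull in
     the incompatible new framework as a transitive dependency -->
<dependency>
  <groupId>com.example.platform</groupId>
  <artifactId>database-support-v2</artifactId>
  <version>1.0.0</version>
</dependency>
```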

This got a bit of push-back from some developers as they started to look into what had been involved in successfully migrating services so far.  "Are we going to end up with a new artifact for the next upgrade as well?" was a very fair question.  Like most things in technology, the short answer is, "It depends."  My hope is that the current stable version of the underlying open source framework will only need minor upgrades for the next year or two.  Alongside this, my expectation is that many of the existing microservices will be migrated to a different type of environment, where some of the functionality provided by the in-house framework will be handled outside the service itself - e.g. by sidecars in systems such as Kubernetes.
