How Service Meshes May Save Continuous Delivery

The Truth

There is a truth about continuous delivery that gets lost behind all the buzz. And the truth is that it is a lot like sex when you’re a teen – everybody talks about it, everybody thinks about it, everybody is preparing for it, but very few actually do it.

A repeating scenario we see with many teams is this: they build out a pipeline, but when the time comes to roll out each change to production they suddenly say: “Wait, wait, wait, not so fast! We can’t allow all changes to go live! We need the ability to stop the pipeline! We don’t have enough test coverage! You can deploy to staging, but production deployment will be on-demand. This is too dangerous!” When I tell them that this breaks the whole idea of CD they nod their heads and say: “Yes, you’re right. This will be the next step, but right now we’re not ready yet.” And you know what? In most cases, the next step never happens.

At the human level this is understandable. It’s hard to let go of the illusion of control that manual gating provides. Moreover, most complex modern systems still lack a great deal of observability. Analyzing an issue often feels like searching for lost objects in deep, muddy water. Add to this a stream of continuous change and the search becomes an extreme rescue operation in a storm.

In order for real continuous delivery to occur, we need to make releasing software as stress-free as possible. Not a new idea, but one frequently overlooked. It has to be safe first, before it can become fast.

Making Releases Safe

A number of techniques for making software releases stress-free have emerged over the years. The DevOps Handbook separates them into two main categories:

  • Application-based – with things like feature toggling and configuration-based dark launches.
  • Environment-based – with blue-green deployments, ramped rollouts and canary releases.

All great ideas, each one with its advantages (and downsides). All quite non-trivial to implement and maintain. Moreover, this mixture of application- and environment-based concerns requires a true DevOps organization in order to work: an organization where devs and ops truly think together and where system operability and deliverability are architectural concerns. And we all know that in spite of all the talk, such organizations are few and far between.

But as more and more folks start dreaming about true continuous delivery, it becomes obvious that these techniques are a must-have, not a luxury reserved for the chosen ones. So much so that the approach now has a name of its own: progressive delivery, so called because it provides the ability to release changes to our end-users gradually, progressively.

The Network Tissue

Things are changing fast. In the last couple of years microservice architectures and container orchestration have become the defining powers of the zeitgeist. They’ve made many things simpler, and many other things extra complex. Moreover, they’ve made some of the problems we were already aware of increasingly acute, by taking the service-oriented approach, horizontal scaling and distributed computing to the extreme, and by making the network the defining integration layer that all our systems depend upon.

One of the questions that we had to answer was: how do we make our systems resilient to failure now that there’s this unreliable, ephemeral network tissue standing in all our data paths?

At first we started solving this at the application level, making our services resilient with libraries like Netflix Hystrix or Twitter’s Finagle. Arming our brave sailors before sending them into the stormy waters of production environments. But with time the amount of needed ammunition kept growing and our lightweight ships felt heavier and heavier. Libraries made services uncomfortably interdependent and thus ruined the vision of granular, restriction-free delivery.
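
To make that "ammunition" concrete, here is a minimal sketch of a circuit breaker, one of the resilience patterns such libraries are known for. It is not the API of Hystrix or Finagle; the class name, parameters and thresholds are all hypothetical and chosen only for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    """Hypothetical, simplified breaker: opens after N consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: allow one trial call. A single failure
            # re-opens the circuit because we are one short of the limit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Every service that talks over the network ends up carrying logic like this, in its own language and at its own library version, which is exactly the weight the paragraph above complains about.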

Enter Service Mesh

That’s when the idea of service meshes arrived. The idea of creating a network of feather-light proxies that will manage all inter-service communication, protect our services in this turbulent world and take the burden of resilience off developers’ shoulders.

And then we suddenly realised that these centrally managed proxies are also a great fit for providing all the environment-based progressive delivery techniques that we’ve dreamt about. Come to think of it, the mesh is already in charge of all the traffic! Moreover, even the application-based approaches can in large part now be outsourced to it. Why not put our feature flags inside our requests and redirect traffic to corresponding application versions based on just that?
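
The idea of routing on a flag carried in the request can be sketched in a few lines. This is a toy model of the decision a proxy would make, not the configuration format of any real mesh; the header name x-feature-new-checkout and the service names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """A destination plus the header values a request must carry to reach it."""
    destination: str
    match_headers: dict = field(default_factory=dict)


def pick_destination(headers, routes, default):
    """Return the first route whose required headers all match, else the default.

    A route with no required headers matches every request, so put such
    catch-all routes last.
    """
    for route in routes:
        if all(headers.get(name) == value
               for name, value in route.match_headers.items()):
            return route.destination
    return default


# Hypothetical flag header and service versions, for illustration only.
routes = [Route("checkout-v2", {"x-feature-new-checkout": "on"})]

print(pick_destination({"x-feature-new-checkout": "on"}, routes, "checkout-v1"))
print(pick_destination({}, routes, "checkout-v1"))
```

The point of the design is that the application no longer checks the flag itself: both versions run side by side, and the flag only decides which one a given request reaches.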

Add to that the increased observability of the network fabric. Anybody who’s installed the Istio demo is amazed by how easy it suddenly becomes to understand who talks to whom, trace requests and analyze their durations and statuses.

Can We Do CD Now?

Service meshes make things like dark launches, A/B testing and canary releases so much easier to implement and automate. And that’s why I think they may just save continuous delivery. If I can send stuff to production without being scared of setting all systems on fire, and if I know that there’s a protection layer watching over me and slowly, gently and intelligently releasing functionality while giving me a lot of visibility into how and where the data is flowing, then I’ll be ok with saying: “Yes, sure, send every commit to prod. The mesh will let me know if anything bad happens.”
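
What “slowly, gently and intelligently” might look like as a control loop is sketched below. It assumes the mesh can split traffic by percentage and report an error rate for the canary; the step size, the 1% error threshold and all function names are hypothetical, not taken from any particular mesh or rollout tool.

```python
import random


def choose_version(canary_weight, rng=random):
    """Send roughly canary_weight percent of requests to the canary."""
    return "canary" if rng.random() * 100 < canary_weight else "stable"


def next_weight(current_weight, canary_error_rate, max_error_rate=0.01, step=10):
    """Advance the canary by one step, or roll back to 0 if it looks unhealthy."""
    if canary_error_rate > max_error_rate:
        return 0
    return min(100, current_weight + step)


def run_rollout(observed_error_rates, step=10, max_error_rate=0.01):
    """Replay per-interval canary error rates and return the weight history.

    The rollout stops as soon as the canary is fully promoted (100)
    or rolled back (0).
    """
    weight = step
    history = [weight]
    for rate in observed_error_rates:
        weight = next_weight(weight, rate, max_error_rate, step)
        history.append(weight)
        if weight in (0, 100):
            break
    return history


# A healthy start followed by a spike in errors triggers a rollback.
print(run_rollout([0.0, 0.0, 0.05]))
```

In a real setup the error rates would come from the mesh’s telemetry rather than from a list, and the weight would be written back to the mesh’s traffic-splitting configuration on each interval.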

No wonder everyone is so excited about service meshes. The tech has been growing like a weed. Only yesterday the Service Mesh Interface specification was announced at KubeCon in Barcelona, taking the industry one step closer to having a common language for talking about what it is that a mesh does and to realizing its evident potential. The plan is quite clear; it is now up to us to execute on it.

And then maybe, just maybe – continuous delivery will become the boring reality for most, if not all of us.

P.S. If you’re interested in service meshes and how they are evolving, take a look at the following projects: Istio, Linkerd, Consul Connect, Service Mesh Hub, SMI Spec

P.P.S. I’m giving a workshop on doing progressive delivery with Istio at DevOpsCon Berlin. Not sure if there are any spots left, but we’re also planning to give the same workshop in Israel, Russia and the UK. Reach out if interested.