The System Of Continuous Migration


Introduction

We live in a world where a commercial organization has to be in a state of constant flux. That is – if it wants to survive and prosper.

This statement is even more accurate for IT companies. (And – as the popular saying goes – every company is an IT company today.)

One could of course argue that I’m suffering from a consultant’s worldview bias. After all – consultants are mostly brought in to help with organizational and technological changes. In the last couple of years we at Otomato have been involved in dozens of projects that all had ‘migration’ or ‘transformation’ in their title. So yes, definitely – change is all we see.

But I’ve spent more than 15 years in IT companies small and large prior to becoming a consultant – and it’s always been like this. With ever accelerating speed. We’ve been changing languages, frameworks, architectural patterns and of course tools. Always migrating, rewriting, adapting and rethinking. Because that’s the business we’re in – the business of innovation. Because the value we provide is the promise of a brighter future. And that means we can never stand still – as yesterday’s future is tomorrow’s past.

The practical side of this exciting (and somewhat frightening) reality is that we are always on the lookout for new tools and technologies. Moreover – at any given moment we have at least one migration project being planned, executed or failing. And it is stressful. Because these migrations and POCs are always full of uncertainty and risk. And because our performance is often measured by migration success. We are expected either to triumph or to fail fast – to minimize the cost of failure. And the larger the migration project – the harder this becomes. The benefits of the new approach aren’t always immediately measurable. The true costs of migration only become visible after we’re neck deep. And we can’t really stop the daily grind to think it all through to the last bit.

So migrations are inevitable but stressful. And how do we make something less stressful? We practice it daily, we learn all the pitfalls and then develop a system to mitigate failures and risks. In other words – we do it continuously! And it certainly feels like we as an industry can benefit from a systemic definition of continuous migration. So let us look at various existing approaches, try to understand what works best and attempt to define a system.

The Two Approaches

In general we can say there are 2 leading approaches to migration. We can even label them as ‘the old way’ and ‘the new way’. The old way is the grand cutover approach and the new way is the start small approach. Yes, I know – this old vs. new dichotomy is over-simplistic. Each approach has its own history, its own benefits and disadvantages. Moreover, different systems require different approaches. Still, there are certain trends in the industry that we can’t ignore. Sometimes these trends influence our decisions. And our goal here is to provide a system to base our decisions upon. A system that cuts through the mist of personal preferences and industry trends and provides a clearer view of the subject at hand.

But before that – let’s review the 2 approaches and what each one of them entails. To make this more interesting we’ll start with what we previously labeled ‘new’ and then look at the ‘old’.

I must admit – I have my own biases that I’ve developed over the years. I’ll do my best to keep them out of the text when describing the existing approaches. Still – if I were perfectly sure that one of the approaches was superior – I wouldn’t be writing this. What we’re trying to do here is to develop a superset of concepts and criteria. Something that will allow us to enjoy the best of both worlds while escaping most pitfalls on the way.

At this stage an attentive reader might object that I’m not discussing anything new here. This is just the old, beaten dichotomy of product development – waterfall vs. agile, planned ahead vs. iterative. I do realize there are similarities. But migration projects aren’t the same as application development projects. One could argue that in migration there is no such thing as an MVP. Showing that a migration is viable isn’t enough to prove that it’s cost-effective. Moreover, many existing business-wide systems don’t lend themselves easily to iterative migration. In a way they can be seen as life-critical systems which require meticulous testing and extensive proof of meeting the requirements prior to going live. The kind of proof that is very hard to obtain in a playground environment.

[Diagrams: The Grand Cutover; Start Small]

So let us start:

Start Small (the iterative approach)

This approach stems from the idea that it’s either impossible or too expensive to create a real staging environment for verifying the changes. As a matter of fact it’s not only about creating an environment. It is mainly about generating sufficient load of real-life use cases in order to verify system readiness. The investment in such testing is seen as too high, especially if we think of migration as a one-time process. Migrate and forget. Which – as we already said in the introduction – is not the case in our modern world.

So if preparing everything on the side in one stride doesn’t look feasible, what do we do? We start small. We take a greenfield project, a small service on the side, a specific system module. Or a separate team. The innovators. The test pilots. The Kamikazes. The Shaheeds. We migrate (or start from scratch, if it’s a new project) that part of our system to the new framework. This is an experiment, an evaluation. No obligations, no commitments. Only good intentions and some bravery. In fact I think we need a new word for such migration projects – migrevaluations.

As a side note – from a small survey we’ve done – most engineers and managers today prefer to start small. Many of them don’t even see any other option. That’s why I called this ‘the new way’ – this is how many of us today feel things should be done. And it’s quite understandable. Psychologically it’s much easier and less intimidating to start something small than to try and think through all the implications of a months-long system-wide change. Additionally – most of us have had our brains so cleanly washed with Agile soap that we don’t see any alternatives. Scrum, Kanban et al. offer some great project management techniques – but they’re not necessarily the best framework for reasoning about a problem.

But the big question with the ‘start small’ approach is always: how (and when) do we verify that the migration is worthwhile? “Define KPIs!” – the smarter folks will say. E.g.: the migration to the new tool should shorten the build time by 30%. Or: the migration to the new orchestration framework will allow us to release twice as often with 25% fewer bugs. I certainly believe that defining these goals is important and even vital when starting a new migrevaluation. So let’s say we’ve determined the KPI. And our small kamikaze project has consistently achieved it across a defined state matrix. Now – how do we know if this achievement will scale all across our system? After all – it’s evident that large systems require different approaches. You can’t manage a large company the same way you manage a startup. The performance and stability of a large multi-component system depend on the interactions between its many components. Testing in isolation doesn’t really prove anything.
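To keep such KPIs checkable rather than rhetorical, it can help to write them down as data. What follows is only a minimal Python sketch, not a prescribed tool: the `Kpi` class, the metric name and all the numbers are hypothetical, and the ‘state matrix’ is assumed to be nothing more than a set of scenarios in which both the old and the new setup were measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    """A target relative improvement for one metric (hypothetical structure)."""
    name: str
    min_improvement: float  # e.g. 0.30 means "at least 30% better"
    lower_is_better: bool = True

    def improvement(self, baseline: float, candidate: float) -> float:
        """Relative improvement of candidate over baseline, as a fraction."""
        if baseline <= 0:
            raise ValueError("baseline must be positive")
        if self.lower_is_better:
            delta = baseline - candidate
        else:
            delta = candidate - baseline
        return delta / baseline

    def met(self, baseline: float, candidate: float) -> bool:
        return self.improvement(baseline, candidate) >= self.min_improvement


def met_everywhere(kpi, measurements):
    """True only if the KPI holds in every scenario of the state matrix.

    `measurements` maps a scenario label to a (baseline, candidate) pair.
    """
    return all(kpi.met(b, c) for b, c in measurements.values())


# Illustrative numbers only: build time in seconds per scenario.
build_time = Kpi("build_time_seconds", min_improvement=0.30)
matrix = {
    "clean build": (600.0, 390.0),
    "incremental build": (120.0, 80.0),
    "build under CI load": (900.0, 700.0),  # roughly 22% faster: misses the target
}

print(met_everywhere(build_time, matrix))
```

With these made-up numbers the check fails because of the third scenario. And even a passing check only covers the scenarios we thought to measure.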

The preachers of iteration will say: “OK then. If the sample is too small – we’ll add another component, team, service. And we’ll continue adding more – until we prove our point. Or find that the solution doesn’t scale well.” Which is a perfectly valid approach. In the world of science and experimentation. But not in the world of business and heartless financial calculations. Because if we prove ourselves wrong – we’ve already spent a lot of time and money.

In many cases what happens in such a situation is that the migration is carried through to completion anyway. With some KPI mangling to make it look more like a success than a wasted effort. This happens because we’re all human and we all have loss aversion hard-coded into our system. It’s much harder for us to admit failure after we’ve already envisioned success.

As we’ve seen – the ‘start small’ approach definitely has some very attractive sides, but isn’t without pitfalls. Let’s see what the alternative is.

The Grand Cutover

This approach entails an exhaustive preparation stage. First – all the migration costs are carefully evaluated. The KPIs are defined. Then – a testing or staging environment is prepared. And only after all the tests have proven that the new platform is fully functional – we perform the grand migration!

We’ve already seen the main issues with this approach. It has high upfront costs, is perceived as hard to pull off and still – gives no promise that the migration will provide the expected benefits. The demon of loss aversion is raising its head in our psyche.

But I would argue that there are situations where investing in preparation is actually much more cost-effective than starting small and planning as we roll.

First there’s the case of life-critical systems – those systems where the cost of disruption is too high.

And second – it’s important to remember that not all migrations we perform are migrevaluations. Some of them aren’t done to improve any business metrics. Instead they are required because:

  • The old system isn’t supported anymore
  • There’s been a company-wide decision we have no influence upon
  • The migration is required by another change in a related system
  • Add your own reason here.

When this is the case – there’s no real reason to start small. Instead we want the transition to be as fast and painless as possible. With minimal downtime and no hidden hope for a rollback. And that means – we need to do everything in our power to get properly prepared for the shift. The steps are:

  1. Define all the players and stakeholders influenced by the change
  2. Gather their inputs and expectations from the new framework
  3. Based on step 2 – define the functional requirements that the new framework must implement
  4. Define the test data set
  5. Define and allocate the necessary resources (human, compute, storage and network)
  6. Plan and implement company-wide training
  7. Define the target time for restoring system functionality
  8. Rehearse the migration until the defined KPIs are consistently achieved.
  9. Set the date for migration.
  10. Cut over!

This is easier said than done, of course. Anyone who’s been through such a project should realize how much detail is hidden behind each of these steps. How much virtual blood, sweat and tears have to be shed in order to bring this to completion.
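One small piece of that detail: the rehearsal step in the list above only works if ‘consistently achieved’ means something concrete. The Python sketch below is one possible reading, not a standard: it assumes ‘consistently’ means an unbroken streak of recent dry runs that both met the KPIs and restored functionality within the agreed time. The `Rehearsal` record, the function name and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rehearsal:
    """Outcome of one migration dry run (hypothetical record)."""
    kpis_met: bool
    restore_minutes: float  # time it took to restore system functionality


def ready_to_cut_over(history, max_restore_minutes, required_streak=3):
    """Return True if the most recent `required_streak` rehearsals all passed.

    A rehearsal passes when its KPIs were met and functionality was restored
    within the agreed limit. Treating "consistently" as an unbroken streak
    of passes is an assumption of this sketch.
    """
    if required_streak < 1:
        raise ValueError("required_streak must be at least 1")
    if len(history) < required_streak:
        return False
    recent = history[-required_streak:]
    return all(
        r.kpis_met and r.restore_minutes <= max_restore_minutes for r in recent
    )


# Illustrative history: early rehearsals fail, later ones stabilize.
history = [
    Rehearsal(kpis_met=False, restore_minutes=95),
    Rehearsal(kpis_met=True, restore_minutes=70),
    Rehearsal(kpis_met=True, restore_minutes=40),
    Rehearsal(kpis_met=True, restore_minutes=35),
    Rehearsal(kpis_met=True, restore_minutes=30),
]

print(ready_to_cut_over(history[:4], max_restore_minutes=45))
print(ready_to_cut_over(history, max_restore_minutes=45))
```

The first call looks at a streak that still contains the 70-minute restore, so it says no. Only after one more clean rehearsal does the gate open, and that is the moment to set the date.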

But on the brighter side – this is a much better planned-out process. With defined start and end criteria, with a decisive direction. As long as we’re on track – we don’t need to re-evaluate as we go. And even if obstacles prevent us from delivering on time – we can always move the dates without compromising the content of the original plan.

Note that with all the grandeur of the task at hand – this planned-out, monolithic (I know, I know – this is a curse word) process involves much less heroics (and consequently – less burnout) than the guerrilla mode of iterative innovation.

With all that said – we all realize why this approach is out of favour nowadays. Exactly for the same reason we need to be continuously migrating. The technological world is changing fast, the deadlines are pressing. Companies usually go into across-the-board migrations only when they find themselves in a near-death condition. The infamous Operation InVersion at LinkedIn required the infrastructure team to freeze all changes in existing systems for a few months. Only then were they able to focus on rebuilding everything for the move to microservices they had planned. And it’s not easy to convince ourselves that we need to put everything on hold for the promise of a brighter future. It requires either trust or desperation.

Let’s Try To Define a System

So, with all that said – how do we define a global system for continuous migration?

  1. Embrace Continuous Migration

    • The first thing to do here is to accept the fact that migration is a continuous process. No matter if we start small or go all in – this is work that’s never done. We’ll always have more stuff to migrate even before the current migration is over.
  2. Define Migration Strategy

    • Be very clear about why you’re entering a migration project, what type of system you are migrating, what the success and failure criteria will be, and whether failure is even an option.
    • Some questions to ask at that stage:
      • Is this a ‘life-critical’ system?
      • What can be considered a representative sample?
      • Is this a migrevaluation or a migration?
      • Are there alternative frameworks you’ll want to evaluate before deciding?
  3. Involve the Stakeholders

    We’ve outlined this when describing the end-to-end migration steps. But we believe this to be a very important stage when starting small, too. A lot of migrevaluations or side-project migrations either fail or become too costly because this stage is skipped. Take for example an infrastructure team tasked with evaluating a migration for a codebase that they have no deep understanding of. We always see much better results when developers and testers are involved from the very beginning. They have intimate knowledge of the code, its quirks and caveats, and of all the reasons for ugly hacks that are hidden all across the system. So please make sure you:

    • Define all the players and stakeholders influenced by the change
    • Gather their inputs and expectations from the new framework
  4. Define the KPIs and Exit Criteria

    • The intensity of this stage very much depends on the type of migration we’ve defined this to be in step 2. Still – no matter if we start small or go all-in – we need to have a defined concept of where we want to arrive. Or at least of the next milestone we want to reach. And of how we decide whether this is a go or a no-go.
  5. Define the Verification Strategy

    • How do we measure the KPIs and criteria we’ve defined? Options include:
        • Defining a testing data set
        • Using A/B testing
        • Using dark launching
        • Manual verification in a sandbox environment.
        • Any combination of the above.
  6. Allocate resources

    • Who is tasked with migration? Do we assign a special team? (Generally an anti-pattern, in our experience.) Or do we reserve some capacity of the existing teams for continuous migration activity? (The recommended approach.) What non-human resources are needed for the migration effort? How scalable do we want these resources to be?
  7. Define the Knowledge Accumulation and Distribution Patterns

    This definitely depends on the migration strategy we’ve chosen. For all-in, grand cutover migrations – we want our teams to be ready when the big day arrives. Therefore this is the time to organize training, assign change agents and start preparing a corporate knowledge base for the new framework.

    If we’re starting small, evaluating and learning as we go – this is where we define best practices for progress documentation and create a migration project Wiki. Needless to say – in evaluation projects the accumulation of knowledge should be our foremost goal.

  8. Start the process.

    We’re done with all the thinking – time to start doing. It’s important to note that our migration strategy shouldn’t directly impact our project management methods. We can perfectly well manage grand cutover projects using Kanban for splitting the work into manageable tasks, limiting WIP and verifying our progress all along the road.

  9. Plan for the next migration

We’ve already embraced the fact that this is a continuous process, haven’t we?
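Of the verification options listed in step 5, dark launching is probably the least self-explanatory, so here is a small illustration. The idea: the old system keeps serving every request, the new one quietly receives the same input, and we only record where the two disagree. This is a simplified Python sketch under stated assumptions. The `dark_launch` wrapper and both handlers are hypothetical, the candidate is called synchronously for brevity, and the handlers are assumed to have no side effects.

```python
import logging

logger = logging.getLogger("dark_launch")


def dark_launch(legacy_handler, candidate_handler, mismatches):
    """Wrap two handlers so that callers only ever see the legacy result.

    The candidate runs on the same input; any difference or exception is
    recorded in `mismatches` and never reaches the caller.
    """
    def handle(request):
        expected = legacy_handler(request)
        try:
            actual = candidate_handler(request)
            if actual != expected:
                mismatches.append((request, expected, actual))
        except Exception as exc:  # the candidate must not break production
            logger.warning("candidate failed for %r: %s", request, exc)
            mismatches.append((request, expected, exc))
        return expected

    return handle


# Hypothetical handlers: the candidate silently drops negative line items.
def legacy_total(order):
    return sum(order)


def candidate_total(order):
    return sum(x for x in order if x > 0)


mismatches = []
handler = dark_launch(legacy_total, candidate_total, mismatches)
results = [handler(order) for order in ([1, 2], [5, -2], [3])]

print(results)     # callers always get the legacy answers
print(mismatches)  # the order with a refund exposes the difference
```

In a real system the shadow call would more likely run asynchronously and against a copy of the data, but the principle is the same: real traffic does the testing, and users never see the new system’s mistakes.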

 

Conclusion:

Migrations are an everyday part of our tech life. The stacks will continue to change and we’ll never want to be left behind. Migrations are inevitable but not easy. Different strategies and approaches can be applied. In this post we’ve presented an attempt at creating a sequence of steps to base our continuous migration effort upon. This sequence is a result of our combined four decades of industry experience. Things we’ve seen working better and worse. Following these steps won’t guarantee a successful migration (as there are a lot of other factors involved) but can definitely make your effort less stressful and more effective.

 
