Is your engineering team overworked and underperforming? Maybe you’re allocating its capacity wrong. Read on to see why.
Yossi can be called an industry veteran. He’s been in various engineering roles for over a decade and in the last 5 years moved on to leading engineering teams. He’s smart and experienced. He knows how work is done. He has lived through hundreds of ‘unexpected’ production incidents and is very pragmatic in his effort estimations.
And still – his team is overworked, underpowered and late on deadlines time after time.
As I sit with Yossi and his colleagues, brainstorming the ways of optimizing their flow, I ask 3 simple questions:
- How much of your team’s capacity is spent on support?
Yossi scratches his head and says, “About 20%, I guess.” “Is this measured?” – I ask.
“No,” he says, “but I think in general this is true.”
- How much of your team’s capacity is reserved for learning?
Yossi laughs uneasily and says, “We learn on the job.” “That’s great!” I say. “But how do you account for it when evaluating your tasks?” He hesitates. “We don’t. To be honest – I learn most things in my spare time. We’ve been so busy lately.”
- How much of your team’s capacity is dedicated to research?
He hesitates for a moment. “How is this different from learning?” “Well,” I say, “research is when you look for that next thing you should learn. It is an exploratory activity, it’s about asking the question: what should I learn? Learning is taking that thing you found and actually understanding it deeply.”
“Oh,” he says, “that’s too advanced. We don’t really have methods around that here. As I said – we learn when we need something done.”
Yossi is not an exception. In fact his team is probably even better off than most IT engineering departments out there. While we gloriously talk about DevOps, Agile and Lean, the general reality is much bleaker, with teams continuously overloaded, deadlines blown and engineers left burnt out and unmotivated.
And all this comes down to a severe misunderstanding of engineering teams’ capacity. Most managers out there fall into one of the following traps:
We Just Go As Fast As We Can
No capacity measurement is performed. Effort estimation relies on previous experience and sincerely good intentions to be fast and efficient. But we all know where this road leads. Sometimes this is combined with a distorted version of kanban: “Our engineers are responsible for pulling work from the backlog when they are ready”. But the truth is that the backlog is looming, business realities interfere and we find ourselves impatiently nudging the engineers, or asking them to switch to another task because “it’s an emergency”. The cost of context switching is, of course, usually ignored. That’s when WIP (Work In Progress) becomes an actual horse race whip to ensure everybody is always busy and not taking too long to complete a task.
We Measure Burndown
Many folks do take measurement more seriously, realizing the difference between mindless and mindful collaborative work (more on this in a separate post). It’s usually the same folks who take Agile seriously, really trying to make some sense of all those rituals and divine terminology. So they arm themselves with the burndown chart and try to make sense of team capacity and velocity. Then they either try to adapt sprint content to whatever they find, or try to improve velocity by removing bottlenecks and introducing automation, but mostly by decorating the walls with motivational slogans.
What you Measure Is …
In general – if you’re serious about improving your process – any measurement is better than none at all. But when it comes to measurement, the usual warning is “what you measure is all you’ll get”. So it’s not how we measure, and not even whether we measure at all. It’s rather how we define the work that is being measured: what we do and especially what we don’t take into account.
The three questions I asked Yossi point at exactly those things we don’t account for. Those vital, necessary things that distinguish between great, productive engineering teams and all the rest.
In general I see engineering work as a table that has four legs: development, support, research and learning. The trouble is that when we measure our team’s capacity, we usually only measure the development capacity, sometimes allowing some time for support. We largely disregard research and learning, assuming they will somehow happen on their own, without assigning them any resources.
And that’s exactly how we get those overloaded, stressed out, under-performing engineering teams.
So how do we realize our teams’ true capacity?
Let’s answer the questions:
- How much of your team’s capacity is spent on support?
In most organizations – support is something that special teams are assigned to do. For years we’ve been trying to barricade developers from support work by building tiers upon tiers of protection. But this practice has long been proven ineffective. All it leads to is customer dissatisfaction and hostility between teams. Part of the DevOps mindset is developers owning their service all the way to production and doing on-call rotation. Forward-looking organizations are now switching to modern collaborative support models such as intelligent swarming.
On the other hand – the whole idea of DevOps and SRE models is that ops start thinking more like devs, focusing less on support and more on building and running the platform.
All this leads to a lot of confusion regarding the share of support work a modern engineering team should do. But here are the facts: if you’ve built anything that has active users – be it a service, a product, a process, or a platform – you’ll need to provide support. You’ll need to answer questions, update documentation, explain and listen. In fact, support is probably the most important part of engineering work – that’s how we learn what users want and need. Otherwise – we aren’t really doing our work, there is no ownership, and eventually – there will be no users.
So no matter if your team is defined as dev, ops or devops – you have to account for support work. And the faster you move, the more active and lively your product is – the more support you need to provide. My rule of thumb for any engineering team starting with healthy capacity planning is allocating at least 30% of their time to ongoing support. High performers with well-defined support katas can reduce this to 20%, but starting with anything less than 30% is just tricking yourself into overload and stress.
- How much of your team’s capacity is reserved for learning?
The concept of a “learning organization” was initially described by Peter Senge in his influential book “The Fifth Discipline”. In modern times any organization must become a learning organization in order to survive. Even more so – an IT organization. The pace of innovation is exhilarating and if there’s one thing no engineer can afford – it is to stop learning. Learning can take many forms: online courses, organized classroom training sessions, post-mortem meetings or team learning exercises such as DiRT or game days. But while we all know we must always be learning, we seem to forget all about it when planning our work. Somehow courses and knowledge-sharing sessions always seem to interfere with deliveries and deadlines. Because they aren’t accounted for! As training providers we at Otomato often get requests from our customers to break the workshops we facilitate into half-days, sometimes spread out across a whole month. Because “we can’t let them take a whole day off!”
Instead of seeing learning as a necessary evil that you somehow must squeeze in between “the real work”, we must allocate a defined percentage of our team’s time to it. Because it is learning that makes a great engineering force! It is learning that gives our team a competitive edge. How much time? 20%, as popularized by Google, is a great rate – so that you know that your engineers get at least one day of learning each week.
- How much of your team’s capacity is dedicated to research?
As I explained to Yossi – research and learning are not the same. Research is when you look for that next thing you should learn. We traditionally call our programming departments R&D – Research and Development (evidently to differentiate from support), but then we call programmers developers, we don’t call them researchers. As Avishay Ish-Shalom put it in his keynote at DevOpsDays TLV last year: “What happened to R in R&D?!” In some organizations research is outsourced to system architects – those smart folks who haven’t written a line of production code for years. That’s exactly how we get dozens of microservices where a monolith would do a much better job.
Instead the best engineering teams out there do their own research – they put research tasks on their backlog and prioritize them no lower than building new features or fixing non-critical bugs.
Because research is an exploratory activity, it is harder to measure and quantify than learning. Still, starting with at least 10% of your team’s capacity for research projects is a healthy recommendation. With time this should and will grow.
It is important to note that in modern reality we’re not talking about traditional development teams only. Ops and devs grow closer together – now we’re all in engineering – and research becomes a necessity for us all.
And finally
- How much of your team’s capacity is left for development?
The math here is simple: 100 – (30+20+10) = 40! The actual development capacity of a great engineering team is only 40%. That is the amount of time we can invest in building new stuff, in fixing bugs, in running tests and attending meetings where we discuss interfaces, algorithms and conventions.
Trouble is – in many places only this development part is thought of as work. So we plan for 100% but we only get 40%.
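To make the arithmetic concrete, here is a minimal Python sketch that splits a team's weekly hours using the percentages above. The team size, the 40-hour week and the function name are my own illustrative assumptions, not measurements from any real team.

```python
# Rule-of-thumb shares from this post, in percent. Adjust them for your team.
ALLOCATION = {"support": 30, "learning": 20, "research": 10}


def split_capacity(total_hours, allocation=ALLOCATION):
    """Split total_hours per activity; whatever is left goes to development."""
    reserved = sum(allocation.values())
    if not 0 <= reserved <= 100:
        raise ValueError("allocation percentages must add up to 0..100")
    hours = {name: total_hours * pct / 100 for name, pct in allocation.items()}
    hours["development"] = total_hours * (100 - reserved) / 100
    return hours


# Hypothetical team: 5 engineers working 40 hours a week.
weekly = split_capacity(5 * 40)
for name, value in weekly.items():
    print(f"{name}: {value:.0f} h")
```

Out of 200 weekly hours, only 80 are left for development, and that is the number sprint planning should start from.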
No wonder we are stressed out and late on deliveries. No wonder our migration projects fail, and by the time we get to adopt a new technology it is already obsolete.
The Path of True Capacity
Now, if you recognized your team in this grim picture you are probably asking yourself – “how do I start planning my team’s work according to its true capacity?”
It’s far from simple: how do you find 60% more time when you’re already constantly overloaded?
The first thing to do is of course to start the dialogue. Learning, research and support must be brought out into the bright light as first-class citizens. Each time effort estimations are required, remind everyone that the silent 60% have to be accounted for. This won’t be taken lightly: initially you’ll get massive pushback. But with time the annoyance will pass, and the load on the team will gradually adapt more and more to its true capacity. And that’s when the team will start showing great results.
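One way to bring the silent 60% into an estimation discussion is to translate a development estimate into calendar time. The sketch below is only an illustration: the 320-hour estimate, the team of four and the helper name are hypothetical, and the 40% development share is the rule of thumb from this post.

```python
import math


def calendar_weeks(dev_hours, engineers, hours_per_week=40, dev_share_pct=40):
    """Whole weeks needed for dev_hours of development work when only
    dev_share_pct percent of the team's time is available for development."""
    if not 0 < dev_share_pct <= 100:
        raise ValueError("dev_share_pct must be in the range 1..100")
    dev_hours_per_week = engineers * hours_per_week * dev_share_pct / 100
    return math.ceil(dev_hours / dev_hours_per_week)


# Hypothetical feature estimated at 320 development hours, team of 4.
naive = calendar_weeks(320, engineers=4, dev_share_pct=100)
realistic = calendar_weeks(320, engineers=4)
print(f"planned at 100%: {naive} weeks, planned at 40%: {realistic} weeks")
```

The same estimate that looks like two weeks on paper takes five weeks once support, learning and research get their share.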
Here’s a diagram to remind us all of what our work really consists of. (You’ll notice I pumped up research to 15%, because I’m a curious soul, and I really want to see more research everywhere I go:)):
Reminds you of a peace sign, right? In fact – this is the correct balance that will give your team inner peace and the real feeling of flow: the necessary conditions for productive, enjoyable and creative work.