DevOps for Blockchain Systems

Blockchain has become the enfant terrible of the tech world. Like any conceptually new technology, it poses more questions than it answers. But the buzz around it is more than justified. Besides the crypto gold rush, which has definitely been the main hype driver, this is a technology that promises a different, decentralized future: a future of distributed, global trust based on science and technology, not on military force, geographical proximity or national identity. It remains to be proven whether such trust is possible, but if it is, the world economy is in for a total paradigm shift.

No wonder so many engineers all around the world are starting to build blockchain-based distributed applications (AKA Dapps). Blockchain startups are getting solid funding, and there’s of course Hyperledger — the umbrella project for a number of enterprise-oriented blockchain platforms curated by none other than the Linux Foundation.

We at Otomato are very enthusiastic about blockchain and have started building a number of our own solutions based on Hyperledger and Exonum.

In parallel, we’ve also been helping a couple of startups bring up their own blockchain-based solutions.

As consultants, we are first and foremost focused on optimizing software delivery processes. For the last decade we’ve been applying DevOps principles at all kinds of companies: chip development, IoT, web, Big Data and heavyweight enterprise software. So it was very interesting to see whether and how these same principles can be applied to the development and operation of blockchain systems.

In addition, over the last year I had the privilege of watching talks by Vivek Ganesan (of SolutionsIQ-Accenture India) and Inal Kardanov (of the Waves platform) on applying DevOps practices to blockchain.

This post aggregates their thoughts on the matter together with our own findings.

What is Blockchain?

I won’t be explaining that one here. There are numerous explanations, posts and videos online.

But it’s important to outline the underlying principles of the technology as they influence the software delivery practices we’ll be discussing further.

And the principles are:

A decentralized network of autonomous compute and data nodes
Fully shared and transparent data (and data model)
Fully shared transaction log
Cryptographic identity and transaction validation
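
To make the "fully shared transaction log" principle a bit more concrete, here is a minimal Python sketch of a hash-chained log. It assumes SHA-256 over canonical JSON and leaves out consensus, signatures and Merkle trees, which real platforms add on top. `Block`, `compute_hash`, `append` and `is_valid` are hypothetical names used only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def compute_hash(index: int, prev_hash: str, transactions: list) -> str:
    # Canonical JSON (sorted keys) so every node derives the same digest.
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "txs": transactions}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: list
    hash: str = field(init=False)

    def __post_init__(self) -> None:
        # The block's hash commits to its content and to its predecessor.
        self.hash = compute_hash(self.index, self.prev_hash, self.transactions)


def append(chain: list, transactions: list) -> None:
    prev_hash = chain[-1].hash if chain else GENESIS_PREV
    chain.append(Block(len(chain), prev_hash, transactions))


def is_valid(chain: list) -> bool:
    # Every node can re-run this check independently on its own copy.
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].hash if i else GENESIS_PREV
        if block.prev_hash != expected_prev:
            return False
        if block.hash != compute_hash(block.index, block.prev_hash, block.transactions):
            return False
    return True
```

Because each block's hash covers the previous block's hash, quietly editing an old transaction makes `is_valid` fail on every node that re-checks its copy.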

How is Blockchain Different?

As Vivek correctly notes, there are some fundamental differences:

1. Environment

The basic principle of a blockchain platform is ultimate decentralization. Our platform, i.e. our production environment, consists of geographically dispersed, potentially unreliable devices we don’t own. When running applications on a blockchain we are effectively giving up control over the infrastructure. This is an environment that cannot be controlled or configured, we can’t force it to be updated, and it’s very challenging to collect any information about its behaviour.

If you think about it, this per se is not new: it’s exactly the world of desktop software and mobile applications, with a multitude of OS versions, patch levels and hardware resources, and no real control over updates and configuration. But as Vivek points out, this is very different from cloud platforms, in which “the app owner has either actual or contractual control over the infra or platform. In blockchain, the app owner has zero control.”

2. Data

Now this is where it really gets interesting. A blockchain is all about data decentralization. Each instance of our app has its own copy of the data, but all of this data is shared between all instances and has to be eventually consistent. This is the basic idea of a blockchain, and it’s very unlike anything we’ve seen before. It’s not the private data model of the desktop applications of the past, and it’s not the centralized data sharing of modern, cloud-backed desktop and mobile apps. And of course, it is very different from a SaaS or enterprise information system with controlled data access.
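
One common way replicas converge on a single history is a fork-choice rule along the lines of "prefer the longest chain"; other platforms use different consensus mechanisms. The minimal Python sketch below assumes that rule, represents each chain as a plain list of transaction IDs and skips validation entirely. `Replica` and `gossip_round` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    chain: list = field(default_factory=list)

    def receive(self, peer_chain: list) -> None:
        # Assumed longest-chain rule: adopt the peer's history only if it is
        # strictly longer than ours. Ties keep the local history.
        if len(peer_chain) > len(self.chain):
            self.chain = list(peer_chain)


def gossip_round(replicas: list) -> None:
    # Every replica pushes its current chain to every other replica.
    for sender in replicas:
        for receiver in replicas:
            if receiver is not sender:
                receiver.receive(sender.chain)
```

Each replica starts with its own view; after enough exchanges they all hold the same history, which is what "eventually consistent" means here. A real node would also validate every block before adopting a peer's chain.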

3. Access

Each instance of a Dapp can access any other instance; that’s how transactions are communicated. All access is cryptographically authenticated and as such is secure (as long as whoever holds the private keys is not an attacker or a crook). Once we have access, all data is fully accessible. That said, enterprise-oriented blockchains are introducing the concept of channels, which separate fully shared data from data shared only among a limited number of participants.

Moreover, all the logs of all the apps are world-accessible and transparent, yet there is no centralized log or metrics storage.

4. Reliability

We’ve already given up on assuming reliability in the distributed systems we run in the modern cloud. We realize that the network is unreliable, that compute nodes are ephemeral and that even storage has become virtualized, so we’ve adapted our expectations regarding latency, throughput and consistency. Once a basic system requirement, reliability has now become an engineering discipline based on an extensive, ever-growing body of knowledge. Google’s SRE (Site Reliability Engineering) model has become the way many organizations see the Dev-to-Ops-to-Production flow these days. And it certainly looks like blockchain systems will pose new challenges for reliability professionals.

In blockchain nothing is reliable: not the nodes, not the network, not the code deployments. And not only are they unreliable, they are also beyond the control of us, the builders. Try ensuring the reliability of your software on such an unreliable platform.

5. Deployment

The lack of control also affects the code we deploy. In SaaS systems, if there’s a bug, we simply replace the bad version with a good one. In enterprise on-prem systems, we announce the need to update but sometimes have to wait months or even years for the update window. This means we have to maintain quite a number of old versions with their bugs and known pains. Sometimes this has to do with the customer’s approval procedures. At one of our clients, at least two years pass from the moment a version is released and a deal for it is signed until it gets deployed to the customer’s production environment.

Sometimes the only way to force a customer to upgrade is by announcing end of support for the version they are running. And even that doesn’t help.

With blockchain it’s even more acute. We have no control over which version of an app each node is running and, most of the time, no way to force nodes to upgrade.

Moreover, if there are still transactions on the chain that depend on the buggy code, that code has to stay even in the new versions, or else the old transactions will be deemed invalid. The only way to actually get rid of it is by forking and rewriting the whole chain. All this multiplies the cost of a bug and makes testing more important than ever.
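
One way to keep old transactions valid while shipping a fix is to select validation rules by block height, so historical blocks are always checked against the rules that were in force when they were created. The Python sketch below assumes that approach; the rules, the activation height of 1000 and all function names are hypothetical.

```python
from typing import Callable

Tx = dict
Rule = Callable[[Tx], bool]


def validate_v1(tx: Tx) -> bool:
    # Original (buggy) rule: zero-amount transfers slip through.
    return tx["amount"] >= 0


def validate_v2(tx: Tx) -> bool:
    # Fixed rule: amounts must be strictly positive.
    return tx["amount"] > 0


# Hypothetical activation schedule: v2 applies from block height 1000 onwards.
RULES: list[tuple[int, Rule]] = [(0, validate_v1), (1000, validate_v2)]


def rule_for_height(height: int) -> Rule:
    # Pick the latest rule whose activation height is <= the block's height.
    active = RULES[0][1]
    for activation, rule in RULES:
        if height >= activation:
            active = rule
    return active


def validate(tx: Tx, height: int) -> bool:
    return rule_for_height(height)(tx)
```

Note that `validate_v1`, bug included, has to ship in every future release; removing it would make the historical blocks it accepted fail validation.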

Software Delivery Practices for Blockchain Development

So how can or should we adapt our delivery practices to accommodate all these new challenges?

Separation of Concerns

When talking about software for the blockchain, it’s important to clearly separate two types of deliverables. On one side there is the chain code: the code that runs the platform itself and the code that runs on the platform. On the other side there are client applications that run elsewhere and interact with the blockchain. Client applications can be delivered following standard software delivery practices; the practices described below mainly apply to the chain code. To maintain this separation, we must make sure the client code is architecturally decoupled from the chain code, at both the data model and the delivery lifecycle levels.
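
A common way to get this decoupling is to let the client depend only on a narrow interface and to keep everything chain-specific behind an adapter. The Python sketch below assumes such a design; `Ledger`, `InMemoryLedger` and `WalletApp` are hypothetical, and a real adapter would wrap the platform's own SDK.

```python
from typing import Protocol


class Ledger(Protocol):
    """The only surface the client app sees (hypothetical interface)."""

    def submit(self, payload: dict) -> str: ...

    def balance(self, account: str) -> int: ...


class InMemoryLedger:
    """Stand-in for client-side tests; no blockchain involved."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}
        self._count = 0

    def submit(self, payload: dict) -> str:
        self._count += 1
        account = payload["account"]
        self._balances[account] = self._balances.get(account, 0) + payload["amount"]
        return f"tx-{self._count}"

    def balance(self, account: str) -> int:
        return self._balances.get(account, 0)


class WalletApp:
    """Client code: depends on the Ledger protocol, never on chain code."""

    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger

    def deposit(self, account: str, amount: int) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.ledger.submit({"account": account, "amount": amount})
```

With this split, the client can follow an ordinary release cadence and be tested against the in-memory stand-in, while the chain code evolves on its own lifecycle.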

Testing is the Key

TDD (Test Driven Development) should be the default practice. Only by writing tests first can we gain some assurance that our smart contracts behave the way we expect and that our application doesn’t crash in the face of eventual consistency.
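
As an illustration of test-first development, here is a toy token "contract" in plain Python together with the pytest-style tests that would be written before it. This is a sketch under obvious assumptions: real smart contracts run on the chain in a platform-specific language or runtime, and `TokenContract` is a hypothetical name.

```python
class TokenContract:
    """Toy token contract (hypothetical); real contracts execute on-chain."""

    def __init__(self, supply: dict[str, int]) -> None:
        self.balances = dict(supply)

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        # Check everything before mutating state, so failures leave no trace.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


# Tests written first: they pin down the behaviour we expect.
def test_transfer_moves_funds() -> None:
    c = TokenContract({"alice": 10})
    c.transfer("alice", "bob", 4)
    assert c.balances == {"alice": 6, "bob": 4}


def test_overdraft_is_rejected_and_state_unchanged() -> None:
    c = TokenContract({"alice": 1})
    try:
        c.transfer("alice", "bob", 5)
    except ValueError:
        pass
    else:
        raise AssertionError("overdraft should fail")
    assert c.balances == {"alice": 1}


def test_total_supply_is_conserved() -> None:
    c = TokenContract({"alice": 7, "bob": 3})
    c.transfer("bob", "alice", 3)
    assert sum(c.balances.values()) == 10
```

Invariant-style tests such as "total supply is conserved" are especially valuable here, since a contract bug that reaches the chain is so expensive to undo.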

Containers Can Help with Functional and Load Tests

Integration and performance testing become ever more challenging because there is no way to replicate the actual production network in our testing environment. Remember, it’s unreliable and not under our control. But we are still responsible for doing our best to predict all possible network and node configurations and to test across them. Containerization can definitely help with modeling all kinds of network topologies, but the matrix can quickly become unmanageable, and defining the minimal viable set of permutations becomes more of an art than a science.
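
One well-known technique for taming such a matrix is pairwise (all-pairs) testing: instead of every combination, pick a set of configurations in which every pair of parameter values appears at least once. The Python sketch below uses a simple greedy selection; the parameter names and values in `PARAMS` are hypothetical, and each resulting row would then map to, say, one containerized test network.

```python
from itertools import combinations, product

# Hypothetical test dimensions: 3 * 2 * 3 * 2 = 36 full combinations.
PARAMS = {
    "node_count": [4, 7, 10],
    "latency": ["lan", "wan"],
    "node_version": ["v1", "v2", "v3"],
    "partition": [False, True],
}


def pairs_of(row: tuple) -> set:
    # All (param_i, value_i, param_j, value_j) pairs one configuration covers.
    return {(i, row[i], j, row[j]) for i, j in combinations(range(len(row)), 2)}


def pairwise_matrix(params: dict[str, list]) -> list[dict]:
    names = list(params)
    candidates = list(product(*params.values()))
    uncovered = set().union(*(pairs_of(row) for row in candidates))
    chosen = []
    while uncovered:
        # Greedy: take the configuration covering the most uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return [dict(zip(names, row)) for row in chosen]
```

Pairwise coverage will not catch faults that need three or more specific values together, so it is a starting point for choosing permutations, not a guarantee.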

As an example, Hyperledger Fabric comes with a number of docker-compose files for bringing up a minimal network configuration to get started.

Chaos Testing

Chaos engineering, an approach to building and operating distributed systems on unreliable infrastructure, can and should be applied to model node failures, network disruptions and malicious behaviour.
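
The core idea can be tried out before touching real infrastructure: inject faults into a model of the network and check an invariant. The Python sketch below is an in-memory simulation under stated assumptions: some nodes are crashed for the whole run, every message is dropped with probability `drop_rate`, and the invariant is that all live nodes end up holding all transactions. Malicious behaviour is not modeled, and `simulate` and `converged` are hypothetical names.

```python
import random


def simulate(n_nodes: int, txs: list[str], drop_rate: float, crashed: set[int],
             rounds: int, seed: int = 42) -> list[set[str]]:
    rng = random.Random(seed)  # seeded, so a failing scenario can be replayed
    state = [set() for _ in range(n_nodes)]
    live = [i for i in range(n_nodes) if i not in crashed]
    # Each transaction is first seen by one random live node.
    for tx in txs:
        state[rng.choice(live)].add(tx)
    for _ in range(rounds):
        for sender in live:
            for receiver in live:
                # Fault injection: each message is lost with probability drop_rate.
                if receiver != sender and rng.random() >= drop_rate:
                    state[receiver] |= state[sender]
    return [state[i] for i in live]


def converged(states: list[set[str]], txs: list[str]) -> bool:
    return all(s == set(txs) for s in states)
```

With moderate message loss the live nodes still converge, while a total partition (`drop_rate=1.0`) never does; in a containerized setup the same experiment would kill containers and degrade the network between them instead.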

Logging

We don’t control the nodes in the network, but we do have access to the logs. In many cases the logs become the only way to understand what our application is doing. As a result, the importance of meaningful, structured logs that can be machine-processed and used for monitoring is multiplied further.
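
A simple way to make logs machine-processable is to emit one JSON object per line. The sketch below uses Python's standard `logging` module with a custom formatter; the field names (`tx_id`, `height`) and the `fields` key passed through `extra` are our own conventions, assumed for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (our convention).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `log.info("tx validated", extra={"fields": {"tx_id": "abc", "height": 1042}})` then yields a line that log shippers and monitoring tools can parse without regular expressions.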

Operation

We cannot control the network, but we can participate in it, just like anyone else. According to Inal Kardanov, the Waves platform developers operate a number of nodes themselves and use them to analyze application behaviour and extract metrics from other nodes. They even use these nodes to fix faulty transactions and help other nodes reach consensus faster.
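
As one example of a metric our own node could derive about its peers, the Python sketch below flags peers whose reported chain height trails the network median. How heights are collected depends on the platform's node API and is not shown; the input dictionary, the `max_lag` threshold and the function name are all hypothetical.

```python
from statistics import median


def lagging_peers(heights: dict[str, int], max_lag: int) -> dict[str, int]:
    """Return peers whose reported height trails the median by more than max_lag."""
    if not heights:
        return {}
    reference = median(heights.values())
    return {
        peer: int(reference - h)
        for peer, h in sorted(heights.items())
        if reference - h > max_lag
    }
```

Using the median rather than the maximum keeps a single peer that reports an inflated height from making everyone else look out of sync.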

Security

Running our own nodes or spinning up test networks requires us to manage cryptographic identities for our testing processes. We need them to create and verify transactions and to roll out chain code updates. But what if the nodes holding these identities are compromised? That can threaten the whole network, which places heavy security restrictions on accessing such nodes. At one startup we’re working with, the nodes run in a hermetic network (no inbound access from outside), with automated processes inside the network polling outside resources for updates and test instructions.
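
In such a pull-based setup, one safeguard worth considering is to apply an update only if it matches a digest pinned in advance. The Python sketch below assumes that approach; the source does not describe how that startup verifies updates, so the polling loop, where the pinned digest comes from (for example a separately signed release manifest) and the `install` callback are all hypothetical.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Accept a downloaded update only if it matches a digest pinned in advance."""
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, pinned_sha256.lower())


def apply_update(data: bytes, pinned_sha256: str, install) -> bool:
    if not verify_artifact(data, pinned_sha256):
        return False  # refuse: the artifact was altered or the pin is wrong
    install(data)
    return True
```

Because the hermetic nodes only ever pull, a compromised outside resource can at worst offer a bad artifact, which this check rejects instead of installing.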

Summary

Blockchain technology takes distributed systems to the edge and as such poses new challenges to software delivery practices. Existing practices such as TDD and chaos testing, along with containerization technologies, can help in the development phase. Meaningful, structured logs and running our own nodes can help with the operational aspects. But many questions remain unanswered and will have to be dealt with if and when blockchain-based systems become more widespread and take their place in the financial and overall information system landscape.

Thanks a lot to Vivek Ganesan and SolutionsIQ-Accenture India for providing guidance and valuable comments when writing this post.

And thanks to Inal Kardanov and Waves platform for sharing their experience running an actual world-wide blockchain platform.