Wednesday, May 16, 2018

DevOps Release Management


The purpose of release management is to ensure that the risks associated with deploying software releases are identified and controlled.
 
Waterfall software development is a phased and gated process, and so waterfall release management is also phased and implemented as a gate. Waterfall processes also seldom use much automation for testing and deployment, and so test reports tend to be manually assembled. In the end, the release management gate consists of the various stakeholders attesting that their test processes have completed satisfactorily and that all governance artifacts have been completed.
These methods work well for releases that occur a few times per year. However, organizations today are moving toward “continuous delivery”, in which releases are frequent: possibly monthly, possibly many times per day. (Amazon releases software to production every ten seconds.) Manual processes governed by attestation meetings are too cumbersome for continuous delivery. A new approach is needed.

Managing Risk in a Continuous Manner

Continuous delivery requires that nearly all software and system tests are automated, and that all deployment processes are also automated. As such, it is expected that all of those processes are fully tested and exercised prior to production deployment. The primary means of managing risk for automated processes is that those processes are tested, and the primary metric for risk management is test coverage. This applies to all areas of risk that are amenable to automation, including functional testing, performance testing, failure mode testing, and security scanning.
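To make this concrete, here is a minimal sketch, in Python, of how the results of the automated checks described above might be rolled up into a single releasability signal. The check names, coverage figures, and threshold are illustrative assumptions, not a prescription for any particular toolchain; in a real pipeline this logic would live in the build orchestration tool, with each check produced by its own automated stage.

    # Minimal sketch: aggregate automated risk checks into one releasability signal.
    # The check names, coverage values, and threshold below are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class CheckResult:
        name: str          # e.g. "functional tests", "security scan"
        passed: bool       # did the automated check succeed?
        coverage: float    # fraction of the relevant risk area exercised (0.0 - 1.0)

    def releasable(results: list[CheckResult], min_coverage: float = 0.8) -> bool:
        """A build is considered releasable only if every automated check passed
        and each risk area meets the minimum coverage threshold."""
        return all(r.passed and r.coverage >= min_coverage for r in results)

    if __name__ == "__main__":
        build_checks = [
            CheckResult("functional tests", passed=True, coverage=0.92),
            CheckResult("performance tests", passed=True, coverage=0.85),
            CheckResult("failure mode tests", passed=True, coverage=0.81),
            CheckResult("security scan", passed=True, coverage=0.88),
        ]
        print("Releasable:", releasable(build_checks))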
The goal of continuous delivery is that release to production should be a “business decision” – that is, any software build that passes all of its tests should be considered releasable, and the decision whether to release (and deploy) it is therefore based only on whether stakeholders feel that the user community and other business stakeholders are ready for the new release. Risk management has been automated!
For the above process to work, the software tests must be trustworthy: that is, there must be confidence that the tests are accurate and that they are adequate. Adequacy is normally expressed as a “coverage” metric. Accuracy is typically ensured by code review of the tests, or by spot-checking them, and by ensuring a separation of duties so that acceptance tests for a given code feature are not written by the same people who write the application code for that feature. For very high-risk code, additional methods can be used to ensure accuracy. In the end, however, tangible metrics should be used, preferably metrics that can be measured automatically. (See the article series, Real Agile Testing in Large Organizations.)
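As one illustration of a metric that can be measured automatically, the following sketch fails a pipeline step when line coverage drops below an agreed threshold. It assumes a JSON coverage report of the kind produced by a tool such as coverage.py (the “totals”/“percent_covered” fields); treat the path, fields, and threshold as assumptions and adjust them for your own tooling.

    # Sketch: enforce a minimum line-coverage threshold as a pipeline step.
    # The report path and JSON layout are assumed (coverage.py-style "coverage json").

    import json
    import sys

    THRESHOLD = 80.0  # agreed minimum line-coverage percentage (illustrative value)

    def main(report_path: str = "coverage.json") -> int:
        with open(report_path) as f:
            report = json.load(f)
        percent = report["totals"]["percent_covered"]
        print(f"Line coverage: {percent:.1f}% (threshold {THRESHOLD}%)")
        return 0 if percent >= THRESHOLD else 1

    if __name__ == "__main__":
        sys.exit(main(*sys.argv[1:]))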

Is Attestation Still Needed?

In a continuous delivery process, attestation is still needed, but the attestation should be on the testing process – not on the result. Specifically, risk management attestation focuses on whether the process for creating and assessing tests ensures that the tests are accurate and that they have sufficient coverage. Attestation does not occur for the releases themselves, because they occur too frequently. Instead, attestation is done at the process level.

Are Gates Still Needed?

Since release to production is a business decision, humans make the decision about whether to release a given software build. In addition, sometimes tests fail, or quality criteria are not fully met, yet release to production might still be justified. Therefore, for most businesses, release to production will still be governed by a gated process, even when all tests have been automated. Release to production can only be fully automated and gateless if one automates all of the production release decision criteria and places quality controls on those automated criteria.
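The sketch below illustrates this separation: the automated release criteria are evaluated by code, while a human still records the business decision to release. The specific criteria and the approval flag are hypothetical placeholders for whatever a given organization codifies.

    # Sketch: a gated release decision with codified criteria plus a business "go".
    # The criteria and approval mechanism are hypothetical, not any product's API.

    from dataclasses import dataclass

    @dataclass
    class ReleaseCandidate:
        build_id: str
        all_tests_passed: bool
        open_critical_defects: int

    def meets_automated_criteria(rc: ReleaseCandidate) -> bool:
        # Quality-controlled, codified criteria; any waivers for known exceptions
        # would be recorded and reviewed rather than decided in a meeting.
        return rc.all_tests_passed and rc.open_critical_defects == 0

    def release(rc: ReleaseCandidate, business_approval: bool) -> bool:
        """Deploy only if the codified criteria pass AND the business says go."""
        if not meets_automated_criteria(rc):
            print(f"{rc.build_id}: blocked by automated release criteria")
            return False
        if not business_approval:
            print(f"{rc.build_id}: releasable, awaiting business decision")
            return False
        print(f"{rc.build_id}: releasing to production")
        return True

    if __name__ == "__main__":
        candidate = ReleaseCandidate("build-1234", all_tests_passed=True, open_critical_defects=0)
        release(candidate, business_approval=True)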

What About Documents?

Some things cannot be automated. For example, design documentation must be created by hand. Design documentation is important for managing the risks associated with maintainability.
The continuous delivery approach to such things is to shift assessment into the Agile sprint. As such, updating artifacts such as a design document is part of the “definition of done” for each team’s Agile stories. To manage the risk that these documents might not be updated, one embeds risk mitigation practices into the Agile team’s workflow. For example, one way to ensure that design documents are updated is to include a design review in the workflow for the team’s Agile stories. Thus, overall assessment and attestation of the risk should occur on the process – not on the set of documents that are produced. If the process is shown to be robust, then when the code passes its tests, it should be “good to go” – ready to release – assuming that releasing it makes business sense.
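One possible way to embed such a practice into the workflow, sketched here with assumed repository paths: a pipeline step that flags a story for design review when application code has changed but no design document has. This is merely an illustrative reminder check, not a substitute for the review itself.

    # Sketch: flag a story when code changed but no design document was updated.
    # The directory layout and the base branch are assumptions.

    import subprocess
    import sys

    DESIGN_DOC_PREFIX = "docs/design/"   # hypothetical location of design documents
    SOURCE_PREFIX = "src/"               # hypothetical location of application code

    def changed_files(base_ref: str = "origin/main") -> list[str]:
        out = subprocess.run(
            ["git", "diff", "--name-only", base_ref, "HEAD"],
            check=True, capture_output=True, text=True,
        )
        return [line for line in out.stdout.splitlines() if line]

    def main() -> int:
        files = changed_files()
        code_changed = any(f.startswith(SOURCE_PREFIX) for f in files)
        docs_changed = any(f.startswith(DESIGN_DOC_PREFIX) for f in files)
        if code_changed and not docs_changed:
            print("Code changed but no design document was updated; "
                  "confirm this in the story's design review.")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())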

What About “Lower Environments”?

In many organizations that are early in their adoption of DevOps methods and therefore use static test environments, teams push code to the various test environments. That is a legacy practice that we plan to eliminate. In a DevOps approach, build artifacts (not code) are pulled into a test environment. Further, each test process is performed by a single build orchestration task (a VSTS task or a Jenkins job), and only that task should have the security permissions required to pull artifacts into that environment. Thus, it should not be possible to push into an environment. This eliminates the need for any kind of release management for the lower environments.
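A rough sketch of what such a pull-based step might look like when run by the single orchestration job: it pulls an immutable, versioned build artifact into the target environment using credentials that only the job holds. The artifact store URL, environment path, and token variable here are hypothetical.

    # Sketch: a pull-based promotion step run only by the orchestration job.
    # The artifact store URL, target directory, and token variable are hypothetical.

    import os
    import shutil
    import urllib.request

    ARTIFACT_STORE = "https://artifacts.example.com/myapp"   # hypothetical store
    ENVIRONMENT_DIR = "/opt/deployments/qa"                   # hypothetical target

    def pull_artifact(version: str) -> str:
        """Download the named build artifact; only the orchestration job's
        credentials (supplied via an environment variable) allow this."""
        url = f"{ARTIFACT_STORE}/myapp-{version}.tar.gz"
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {os.environ['ORCHESTRATOR_TOKEN']}"}
        )
        local_path = f"/tmp/myapp-{version}.tar.gz"
        with urllib.request.urlopen(request) as resp, open(local_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        return local_path

    def deploy(version: str) -> None:
        artifact = pull_artifact(version)
        shutil.unpack_archive(artifact, ENVIRONMENT_DIR)   # unpack into the environment
        print(f"Deployed myapp {version} to {ENVIRONMENT_DIR}")

    if __name__ == "__main__":
        deploy(os.environ.get("BUILD_VERSION", "1.0.0"))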
Many of these issues go away once one starts to use dynamically provisioned environments. Until then, it is absolutely critical that one control updates to the various test environments, using an orchestrated process as described here. The surest path to unreliability is to provide direct access to static test environments.