I have become increasingly interested in learning about Systems Thinking tools in order to be able to apply a transdisciplinary skill set to my DevOps Engineering Management and Lean IT Service Management practice.
I have decided to share these insights from my study of the subject as I think they may be useful, especially to the DevOps community.
Specifically, I want to show how making use of a Systems Thinking tool at my client helped me to better understand a failing application migration project.
Something to keep in mind as you read through the post: Systems Thinking encourages holistic feedback thinking rather than linear, event-based thinking, and warns against the traps of dogmatism (the disregard of other perspectives when exploring a problem situation) and reductionism (the tendency to take a silo view on the problem).
There is a third thinking trap, dealing with holism and pluralism. This is a trap whereby the systems thinker may become convinced that the system, or the systemic intervention being designed to effect a strategic change, is whole: that it has considered all interrelationships and all interdependencies, that it is in no way partial, and that it has absorbed every single perspective and can be all things to all people in its system boundary judgements.
This is of course a trap, as no design can make a claim to complete holism and complete pluralism.
I have been working with a client who has been doing a number of application migrations from external DC (DataCentre) providers to internally managed data centres. The business drivers for this activity are application security remediation and OpEx cost reduction.
My involvement has been in providing DevOps Engineering Management to influence the approach to, and the effectiveness of, the migration activities I have worked on.
Recently I was involved with an unsuccessful migration project. I want to show you how a Systems Thinking tool can be effective in describing the mental model of what constitutes a successful migration, why this one was not successful, and how the model can be used to identify leverage points that can be changed (strategic changes) in order to be more successful on the next attempt (and with other similar, ‘messy’ migrations).
A ‘messy’ (or wicked) problem is a systems thinking term to denote a problem situation where there are multiple perspectives, multiple variables and a large degree of uncertainty. This is juxtaposed against ‘tame’ problems where there are fewer variables and a single overarching perspective. Messy problems typically need Systems Thinking to improve on them.
I want to show how causal loop diagrams in System Dynamics, a Systems Thinking tool, can model the types of interactions and resulting feedback loops which I believe contribute to the successful migration of applications out of external DCs and into either on-premise or internally managed Cloud DCs. The modelling process is a collaborative tool to illustrate the understanding of the current problem situation to the stakeholders and to assist with strategy making for an intervention. The problem is framed as one where insufficient or incorrect boundary judgements are being made by the engineering group on the system to migrate the application. To me, the word ‘boundary’ in this context has a practical, utilitarian component: the boundary should be continually renegotiated in order to increase the likelihood of a successful application migration.
Having the two groups, DC / Security Operations and Application Engineering, collaborate effectively and take collective responsibility for boundary judgements increases the likelihood of achieving a higher level of ‘requisite variety’, a concept formulated by William Ross Ashby, an English psychiatrist and pioneer of cybernetics (which is the study of complex systems). More on ‘requisite variety’ later.
Before I show and explain the System Dynamics causal loop diagram illustrating the Application DC migration dynamics, a quick reference to Jay Wright Forrester, the founder of System Dynamics.
Application Migration System Dynamics Diagram :
When looking at the feedback loops, the flow is clockwise.
On the diagram, each phrase, for example ‘App Migration Backlog’ denotes a variable that can be increasing or decreasing. The increase or decrease in the variable is also the cause for an increase or decrease in the downstream linked variable based on the polarity of the causal link (+ or -).
A ‘+’ denotes that if the ‘cause’ variable increases the linked variable will also increase.
Also, if the ‘cause’ variable decreases the linked variable will also decrease.
So a ‘+’ indicates a change IN THE SAME DIRECTION between the cause and the effect.
A ‘-’ indicates a reversed polarity. So if the ‘cause’ variable increases then the linked variable will decrease.
Also if the ‘cause’ variable decreases then the linked variable will increase.
So a ‘-’ indicates a change IN THE OPPOSITE DIRECTION between the cause and the effect.
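The polarity rules above can be written down as a very small sketch. The function name `propagate` and the +1 / -1 encoding of "increase" and "decrease" are my own assumptions for illustration; they are not part of any System Dynamics tool.

```python
def propagate(cause_change: int, polarity: str) -> int:
    """Return the direction of change in the linked (effect) variable.

    cause_change: +1 if the 'cause' variable increases, -1 if it decreases.
    polarity: '+' for a same-direction link, '-' for an opposite-direction link.
    """
    if cause_change not in (1, -1):
        raise ValueError("cause_change must be +1 or -1")
    if polarity not in ("+", "-"):
        raise ValueError("polarity must be '+' or '-'")
    sign = 1 if polarity == "+" else -1
    return cause_change * sign


if __name__ == "__main__":
    # Walk through all four cases described in the text.
    for polarity in ("+", "-"):
        for change in (1, -1):
            effect = propagate(change, polarity)
            print(f"link {polarity}: cause {change:+d} -> effect {effect:+d}")
```

A ‘+’ link passes the direction of change through unchanged, while a ‘-’ link flips it, which is all the diagram notation is saying.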
Explanation of the dynamics in the diagrams:
These diagrams in my opinion demonstrate the dynamics required for a successful application migration out of an external DC provider and into an internal on-premise or internal Cloud DC.
There are two stakeholders working together and two perspectives.
- A DC Operations infrastructure and security group.
- An Application engineering group.
There are other stakeholders involved, such as product/service management and suppliers but for the purposes of simplification I have excluded these and focussed only on the two primary collaborating groups.
The two perspectives can be described as follows:
- The DC Operations group has a mandate to exit the external DC by a certain date (to achieve the OpEx cost reduction) and to make sure the application is secure in its new location. By secure I mean that the application data is in a correctly classified zone, the application is tiered into secure VLAN segments and so on.
- The Application engineering group views the application as fragile and is of the opinion that the entire application pipeline needs to be redesigned: the application needs a build server and packaging with a suitable taxonomy, proper configuration management practices need to be designed and implemented, automated deployment mechanisms need to be implemented, environment entropy needs to be removed and the relationships with the suppliers need to be reevaluated.
The current problem situation :
The negotiation and collaboration around the different perspectives is what I refer to in the diagram as the balancing feedback loop of ‘Engineering Sophistication’. It has a direct impact on defining the boundary judgements on the system for migration: it balances them with elements of both perspectives (DC Ops/Security and Application Engineering) and therefore raises the level of ‘requisite variety’ towards what is required for the migration to be successful.
It is either effective in achieving this purpose – or it is not.
At this stage it is useful to explain what ‘requisite variety’ is. This is a great description that I have taken directly from the linked website of a consultancy company called ‘Requisite Variety’. I quote directly from their page, “In order to deal properly with the diversity of problems the world throws at you, you need to have a repertoire of responses which is (at least) as nuanced as the problems you face”. I also think the diagram on that page is particularly useful in understanding how sufficient responses to problems need to be present for requisite variety to be of a high enough level to be able to manage the problem situation. Take a look at it.
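As a rough illustration of that quote, requisite variety can be pictured as a coverage check: every type of problem needs at least one matching response in the repertoire. This is a deliberately crude sketch of Ashby's idea, and the problem names and the two repertoires below are hypothetical, loosely drawn from the two perspectives described earlier.

```python
# Hypothetical problem types for one migration (illustrative only).
PROBLEMS = {
    "unclassified data zone",
    "flat network segments",
    "no build server",
    "manual deployment",
    "environment entropy",
}

# Hypothetical repertoires: the problems each group knows how to respond to.
DC_OPS_RESPONSES = {"unclassified data zone", "flat network segments"}
APP_ENG_RESPONSES = {"no build server", "manual deployment", "environment entropy"}


def unmatched(problems, repertoire):
    """Return the problems that have no response in the repertoire."""
    return sorted(set(problems) - set(repertoire))


def has_requisite_variety(problems, repertoire):
    """True when the repertoire is at least as varied as the problems faced."""
    return not unmatched(problems, repertoire)


if __name__ == "__main__":
    print("DC Ops alone leaves:", unmatched(PROBLEMS, DC_OPS_RESPONSES))
    combined = DC_OPS_RESPONSES | APP_ENG_RESPONSES
    print("Combined is sufficient:", has_requisite_variety(PROBLEMS, combined))
```

In this toy setup neither group covers the problem set alone, but the combined repertoire does, which is the point of having both perspectives shape the boundary judgements.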
My first (top) System Dynamics diagram shows an unhealthy dynamic that exists in the ‘Engineering Sophistication’ balancing loop. There is a long delay before the DC Ops / Security Engineering group's perspective (architect a secure infrastructure design, then lift and shift into this design) is balanced and adjusted, as it needs to be, by the Application Engineering group's perspective (refactor the application, rebuild the environment pipeline, promote down the pipeline). This collaboration delay is front loaded in the DC Ops / Security group because, as shown on the right hand side of the diagram, this is where the project management stakeholder engagement starts. The project manager engages this group first, and the initial migration system design is partial to this group until the partiality is challenged by the Application Engineering group.
Unfortunately the challenge suffers from an ongoing and cyclical delay, which prevents the boundary judgements on the migration system from being adjusted to be less partial to the DC Ops / Security perspective and more inclusive of an Application Engineering perspective.
These delays ultimately result in migration project failure.
The strategic intervention :
Let's have a look at the two lower loops in my lower (bottom) diagram, where a strategic intervention is now illustrated :
- Engineering Sophistication (A balancing feedback loop)
- Engineering Management Effectiveness (A reinforcing feedback loop)
The success of balancing the perspectives that influence the migration system's boundary judgements is significantly influenced by the engineering management (both the DC/Security engineering management and the Application engineering management) being able to collaborate in such a way that the perspective of either side is understood, contextualised, reflected on, critically challenged if there is the need and then absorbed.
This is what I refer to in the diagram as the virtuous reinforcing loop of ‘Engineering Management Effectiveness’.
The ‘Engineering Management Effectiveness’ loop runs through the ‘Engineering Sophistication’ loop, influencing the balancing dynamics of the migration system's boundary judgement focus. It increases the level of requisite variety until the engineering is sophisticated enough to successfully complete the migration.
The boundary judgement focus can be balanced informally by the management team, or it can be critiqued and adjusted in a more structured way by using, for example, Werner Ulrich’s Critical Systems Heuristics.
Let's have a look at the top loop in the diagram, which shows the flow of a successful migration project, a loop which will complete if the strategic change is in fact effective.
- Successful Migrations (A balancing feedback loop)
This loop is a balancing loop that attempts to reduce the backlog of migration projects that the business is instructing it to complete. That is to say, it wants to successfully process the migrations so that the backlog does not keep increasing. This is the loop that the business is most concerned with, as it tracks the number of successfully migrated applications. However, it is only successful if the loops mentioned above are effective.
The successful migrations loop is balancing because, as the business OpEx reduction mandate increases, the App Migration and remediation backlog increases and the Migration Project Management activity increases. Following the loop around in a clockwise manner, the stakeholder engagement increases, the requisite variety increases, the migration is successful and it comes off the backlog (so the App Migration and remediation backlog decreases).
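A common rule of thumb in causal loop diagrams is that a loop with an odd number of ‘-’ links is balancing and a loop with an even number (including zero) is reinforcing. The sketch below applies that rule to the loop just described. The link list is my own reading of the paragraph above, and the variable names and the helper `classify_loop` are assumptions for illustration rather than an export from the actual diagram.

```python
from math import prod

# Assumed links for the 'Successful Migrations' loop: (cause, effect, polarity).
SUCCESSFUL_MIGRATIONS_LOOP = [
    ("App Migration Backlog", "Migration Project Management", "+"),
    ("Migration Project Management", "Stakeholder Engagement", "+"),
    ("Stakeholder Engagement", "Requisite Variety", "+"),
    ("Requisite Variety", "Successful Migrations", "+"),
    ("Successful Migrations", "App Migration Backlog", "-"),
]


def classify_loop(links):
    """Classify a closed loop by multiplying the signs of its links.

    A positive product means reinforcing; a negative product means balancing.
    """
    sign = prod(1 if polarity == "+" else -1 for _, _, polarity in links)
    return "reinforcing" if sign > 0 else "balancing"


if __name__ == "__main__":
    print(classify_loop(SUCCESSFUL_MIGRATIONS_LOOP))
```

With a single ‘-’ link (successful migrations reduce the backlog), the product of the signs is negative, which agrees with the loop being described as balancing.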
This model shows that for migrations to be successful there needs to be collaboration between the DC Ops / Security and Application Engineering teams that is at least sufficient to negotiate migration system boundary judgements that will achieve the required level of requisite variety.
In the case of the failed migration project, a sufficient level of requisite variety was not achieved and a different approach will need to be taken on the second attempt. This second attempt involves putting in place the engineering management loop not only as an adjunct to the project management co-ordination but as an essential and critical component. The purpose of the strategic change is to allow both the DC Ops / Security and Application teams, more efficiently and with less partiality, to properly absorb each other's perspectives and to challenge the boundary judgements, changing them where necessary to design a successful system for migration.
Achieving requisite variety in a ‘messy’ problem-situation takes time and the perspectives and boundary judgement outputs must be properly negotiated to be successful.
Effective Engineering Management can reinforce the quality of that collaboration, helping it along.
Patrick Hyland, Co Founder at OpsWorks Group. DevOps Engineering Management Consultant