Monday, January 30, 2017

Revamping SAFe's Program Level PI Metrics Part 4/6 - Quality

“The Systems Science Institute at IBM has reported that the cost to fix an error after product release was four to five times as much as one uncovered during design, and up to 100 times more than one identified in the maintenance phase” - iSixSigma magazine

Series Context


Given the central nature of the “build quality in” mindset to Lean and Agile, my early drafts of the metrics dashboard devoted 3 full categories to quality:
  • Technical Health 
  • Quality 
  • Deployment Health 
The “quality” aspect of the original cut took a lean lens on the traditional “defect/incident” quality metrics, whilst the other two focused on technical quality and “devops” type quality respectively.

I was fortunate enough to both get some great review feedback from Dean Leffingwell on the drafts and spend some time at a whiteboard brainstorming with him. He dwelt on the fact that I had “too many metrics and too many quadrants” :) As we brainstormed, we came to two conclusions. Firstly, the 3 concepts listed above were just different perspectives on quality – and secondly, we could separate my individual metrics into “the basics everyone should have” and “the advanced things people should have but might take time to incorporate”. The result is the set of basic and advanced definitions below.

One might question the incorporation of highly technical metrics in an executive dashboard; however, there are three very good reasons to do so:
  • If our technical practices are not improving, no amount of process improvement will deliver sustainable change. 
  • If our teams are taking lots of shortcuts to deliver value fast, there is no sustainability to the results being achieved and we will wind up “doing fragile not agile”. 
  • If the executives don’t care, the teams are unlikely to. 
The only non-subjective way I know to approach this is through static code analysis. Given the dominance of SonarQube in this space, I have referenced explicit SonarQube measures in the definitions. Additionally, effective adoption of Continuous Integration (CI) amongst the developers is not only a critical foundation for DevOps but also an excellent way to validate progress in the “build quality in” mindset space.

On the “traditional quality measurement” front, my core focus is “are we finding defects early or late”? Thus, I look to both evaluate the timing of our validation activities and the level of quality issues escaping the early life-cycle. For deployment health, all inspiration was sourced from DevOps materials and as we re-structured the overall model it became apparent that many of these measures really belonged in the “Speed” quadrant – all that remained in the quality quadrant was clarity on production incidents.

Basic Definitions

Basic Metrics Rationale

Unit Test Coverage %

As I regularly inform participants in the training room, "if you do not aggressively pursue automated testing your agile implementation will fail!"  It is impossible to sustainably employ an iterative and incremental approach to software development without it.

Static analysis tools will not tell you the quality of the unit tests or the meaningfulness of the coverage, but simply having coverage will give the developers confidence to refactor - the key to increasing maintainability.  It should also increase the ratio of first fix resolution, giving confidence that defects can be resolved fast and minor enhancements made without causing unintended side effects.
Further, even if automated functional tests are still on the to-do list, testers who can read unit tests will be able to more effectively adopt risk-based manual testing and thus reduce manual test effort.

Mean Time Between Green Builds (mins)

Note that many ARTs will implement multiple CI cycles – local ones executing on branches and a central master cycle on the mainline.   Whilst branch-level CI cycles might be of interest at the team level, the only one we are interested in at the ART level is the master on the mainline.

Red CI builds are of course an indicator of poor developer quality practices (failure to locally validate code prior to check-in), and most believe the full CI cycle should complete in under 10 minutes to give developers timely feedback. Failure on either of these fronts will naturally extend the time between green builds, so neither needs to be measured separately on the dashboard.
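As a rough illustration of how this might be derived, the sketch below averages the gaps between consecutive green builds on the mainline. The `(finished_at, status)` record shape and the "green"/"red" labels are assumptions made for the example, not the export format of any particular CI server.

```python
from datetime import datetime


def mean_minutes_between_green_builds(builds):
    """Average gap, in minutes, between consecutive green mainline builds.

    `builds` is a list of (finished_at, status) tuples (an assumed shape).
    """
    greens = sorted(t for t, status in builds if status == "green")
    if len(greens) < 2:
        return None  # not enough green builds to measure a gap
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(greens, greens[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical mainline history: a red build sits between two greens.
builds = [
    (datetime(2017, 1, 30, 9, 0), "green"),
    (datetime(2017, 1, 30, 9, 20), "red"),
    (datetime(2017, 1, 30, 9, 50), "green"),
    (datetime(2017, 1, 30, 10, 0), "green"),
]
print(mean_minutes_between_green_builds(builds))  # 30.0
```

Note how the red build never appears in the calculation directly; it simply stretches the first gap from what could have been 20 minutes to 50.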

Mean Time to Recover from Red build (mins)

Two things will cause this metric to trend in the wrong direction. One is lack of the Andon mindset ("it's someone else's fault", or even worse, "it's always red, just ignore it"). The second is failure to commit regularly, resulting in complex change-sets and difficult debugging. Since the second is easily identified through the Mean Time Between Green Builds, this metric isolates how well the Andon mindset is established among developers.
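One way to picture the calculation: start a clock when the mainline first goes red, stop it at the next green build, and average the results. The sketch below does that over an assumed list of `(finished_at, status)` build records; the record shape and status labels are illustrative only.

```python
from datetime import datetime


def mean_minutes_to_recover(builds):
    """Average minutes from the first red build of an outage to the next green."""
    red_since = None
    recoveries = []
    for finished_at, status in sorted(builds):
        if status == "red" and red_since is None:
            red_since = finished_at  # the outage starts at the first red build
        elif status == "green" and red_since is not None:
            recoveries.append((finished_at - red_since).total_seconds() / 60)
            red_since = None
    return sum(recoveries) / len(recoveries) if recoveries else None


# Hypothetical history with two outages (30 minutes and 10 minutes).
history = [
    (datetime(2017, 1, 30, 9, 0), "green"),
    (datetime(2017, 1, 30, 9, 20), "red"),
    (datetime(2017, 1, 30, 9, 30), "red"),
    (datetime(2017, 1, 30, 9, 50), "green"),
    (datetime(2017, 1, 30, 10, 0), "red"),
    (datetime(2017, 1, 30, 10, 10), "green"),
]
print(mean_minutes_to_recover(history))  # 20.0
```

Consecutive red builds are treated as one outage, so a team that keeps committing on top of a broken build does not get credit for "recovering" several times.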

Late Phase Defects #

The identification and resolution of defects during the execution of a story is evidence of good team quality practices, and should be excluded from any strategic treatment of defect trends.  However, defects identified in functionality associated with a story after its acceptance or in late-phase (integration, performance, security, UAT etc) testing are indicators of a failure to "build quality in".   
Whilst many teams do not formally log defects identified during story development, where this is done there will be a need for classification in the defect management system to separate late phase defects for reporting purposes.

Validation Capacity %

Great agile means a story is accepted once it is in production.  Good agile means it is accepted once it is ready for production.  For most enterprises in the early years of their agile adoption, this seems like a fairy-tale - the DevOps definition of "Unicorns" such as Amazon and Netflix resonates strongly!   
The reality is for some time there will be testing and packaging activities which get batched up and executed late in development.  Typical examples include:
  • User Acceptance Testing - of course, the Product Owner as the embedded customer is meant to do this in good agile but for many they are neither sufficiently knowledgeable nor sufficiently empowered.
  • Integration Testing - in theory redundant if the team is practicing good full-stack continuous integration.  But for all too many, environment constraints prohibit this and lead to extensive use of stubs until late phase.
  • Performance Testing - for many organisations, the performance test environments are congested, hard to book, and take days if not weeks to configure for a performance test run.  
  • Penetration Testing - a highly specialised job with many organisations possessing a handful of skilled penetration testers spread across thousands of developers.
  • Release Documentation
  • Mandated Enterprise Level Integration and Deployment preparation cycles for all changes impacting strategic technology assets.
Given that the backlog "represents the collection of all the things a team needs to do", all of these activities should appear in backlogs, estimated and prioritized to occur in the appropriate iterations. It is a simple matter to introduce a categorization to the backlog management tool to flag these items as hardening activities.
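Once hardening items are flagged, the percentage falls out of a simple sum. The sketch below assumes the metric is the share of estimated points carried by flagged items; the `points` and `hardening` field names are hypothetical stand-ins for whatever the backlog tool exports.

```python
def validation_capacity_pct(backlog_items):
    """Share of estimated points spent on items flagged as hardening."""
    total = sum(item["points"] for item in backlog_items)
    hardening = sum(item["points"] for item in backlog_items if item["hardening"])
    return 100.0 * hardening / total if total else 0.0


# Hypothetical PI backlog extract.
pi_backlog = [
    {"title": "Parcel tracking story", "points": 5, "hardening": False},
    {"title": "Performance test run", "points": 3, "hardening": True},
    {"title": "Self-service booking story", "points": 8, "hardening": False},
    {"title": "Release documentation", "points": 4, "hardening": True},
]
print(validation_capacity_pct(pi_backlog))  # 35.0
```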

Average Severity 1 and 2 Incidents per Deploy

High severity incidents associated with deployments are a critical quality indicator.  Measurement is generally fairly trivial with the appropriate flagging in incident management systems.  However, some debate may exist as to whether an incident is associated with a deployment or simply the exposition of a preexisting condition.  An organisation will need to agree on clear classification standards in order to produce meaningful measures.

Advanced Definitions

Advanced Metrics Rationale

Duplication %

Duplicate code is bad code. It's simple. One line of duplicated business logic is a time-bomb waiting to explode. If this number is trending down, it's an indicator developers are starting to refactor, the use of error-prone copy/paste techniques is falling and the maintainability of the source code is going up. It's potentially debatable whether one measures duplicate blocks or duplicate lines, but given the amount of logic possible to embed in a single line of code I prefer the straight-up measurement of duplicated lines.
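To make the blocks-versus-lines distinction concrete, the sketch below takes duplicated ranges as already reported by an analysis tool and turns them into a line-based percentage. The `(start_line, end_line)` range format is an assumption for illustration; detecting the duplicates is left to the static analysis tool.

```python
def duplication_pct(total_lines, duplicated_ranges):
    """Percentage of lines that fall inside at least one duplicated range.

    `duplicated_ranges` holds inclusive (start_line, end_line) pairs.
    Overlapping ranges are only counted once.
    """
    duplicated = set()
    for start, end in duplicated_ranges:
        duplicated.update(range(start, end + 1))
    return 100.0 * len(duplicated) / total_lines if total_lines else 0.0


# Hypothetical 200-line file with three reported blocks, two of which overlap.
ranges = [(10, 19), (15, 24), (100, 109)]
print(len(ranges))                    # 3 duplicated blocks
print(duplication_pct(200, ranges))   # 12.5
```

The block count (3) says nothing about how much code is affected, whereas the line-based figure shows that an eighth of the file is duplicated.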

Average Cyclomatic Complexity

Cyclomatic complexity is used to measure the complexity of a program by analyzing the number of linearly independent paths through a program's code.    More complexity leads to more difficulty in maintaining or extending functionality and greater reliance on documentation to understand intent.  It can be measured at multiple levels, however from a dash-boarding perspective my interest is in function or method level complexity.  
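For readers who have not seen the measure computed, here is a simplified sketch for Python source using the standard `ast` module. It approximates function-level complexity as one plus the number of decision points; real analysers apply their own, more detailed counting rules, so treat the numbers as indicative only. The `SAMPLE` code is invented for the example.

```python
import ast

# Node types treated as a single decision point in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def function_complexities(source):
    """Map each function name to an approximate cyclomatic complexity."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # one path through a function with no branches
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # each extra operand of and/or adds a branch
                    score += len(child.values) - 1
            result[node.name] = score
    return result


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"

def identity(x):
    return x
'''

scores = function_complexities(SAMPLE)
print(scores)                              # {'classify': 5, 'identity': 1}
print(sum(scores.values()) / len(scores))  # 3.0
```

The average hides the spread: a dashboard figure of 3.0 here is made up of one trivial function and one that already deserves a second look. Nested functions are counted within their parent in this sketch.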

Average Branch Age at Merge (days)

This metric may take a little more work to capture, but it is well worth the effort. The modern ideal is of course not to branch at all (branching by abstraction); however, it takes developers some time to acquire the technical sophistication this requires.
Code living in a branch is code that has not been integrated, and thus code that carries risk.  The longer the code lives in a branch, the more effort it takes to merge it back into the mainline and the greater the chance that the merge process will create high levels of late-phase defects.
Whiteboard spotted at Pivotal Labs by @testobsessed
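The capture work is mostly about getting two dates per merged branch. Assuming those can be extracted from version control (the `(branched_on, merged_on)` pairs below are hypothetical), the metric itself is a plain average:

```python
from datetime import date


def average_branch_age_days(merges):
    """Mean age in days of branches at the moment they were merged."""
    ages = [(merged_on - branched_on).days for branched_on, merged_on in merges]
    return sum(ages) / len(ages) if ages else None


# Hypothetical merges during a PI: (branched_on, merged_on).
merges = [
    (date(2017, 1, 2), date(2017, 1, 5)),    # 3 days
    (date(2017, 1, 3), date(2017, 1, 13)),   # 10 days
    (date(2017, 1, 9), date(2017, 1, 11)),   # 2 days
]
print(average_branch_age_days(merges))  # 5.0
```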

Fault Feedback Ratio (FFR) %

When it comes to defects, we are interested in not just when we find them but how we respond to them. In his book "Quality Software Management vol 2: First-Order Measurement", Gerry Weinberg introduced me to the concept (along with many other fascinating quality metrics). Our goal is to determine what happens when we address a defect. Do we resolve it completely? Do we introduce other new defects in resolving the first one? A rising FFR value can indicate poor communication between testers and developers, hacked-in fixes, and deterioration in the maintainability of the application among other things. According to an article by Johanna Rothman, a value of <= 10% is a good sign.
Measuring it should be trivial with appropriate classifications of defect sources and resolution verification activities in the defect management system.
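As a sketch of what that classification enables, the example below treats FFR as the share of fixes that either failed verification or introduced a new defect. That reading of the ratio, and the `reopened` and `caused_new_defect` field names, are my assumptions for illustration rather than a quotation of Weinberg's definition.

```python
def fault_feedback_ratio_pct(fixes):
    """Share of fixes that did not stick or that caused a new defect."""
    bad = sum(1 for fix in fixes if fix["reopened"] or fix["caused_new_defect"])
    return 100.0 * bad / len(fixes) if fixes else 0.0


# Hypothetical PI: 20 fixes, one reopened and one that introduced a new defect.
fixes = [{"reopened": False, "caused_new_defect": False} for _ in range(18)]
fixes.append({"reopened": True, "caused_new_defect": False})
fixes.append({"reopened": False, "caused_new_defect": True})
print(fault_feedback_ratio_pct(fixes))  # 10.0
```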

Average Open Defects #

When it comes to open defects, one needs to make a number of local decisions.  Firstly, what severity are we interested in?  Restricting it to high severity defects can hide all kinds of quality risk, but at the same time many low severity defects tend to be more matters of interpretation and often represent minor enhancement requests masquerading as defects.
Further, we need to determine whether we are interested in the open count at the end of the PI or the average throughout the PI.  A Lean focus on building quality in leads me to be more interested in our every-day quality position rather than what we've cleaned up in our end-of-PI rush.
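The difference between the two views is easy to see with a small example. The sketch below counts open defects for each day of a (deliberately tiny, hypothetical) five-day PI; the `(opened_on, closed_on)` pairs stand in for whatever the defect management system records.

```python
from datetime import date, timedelta


def daily_open_counts(defects, pi_start, pi_end):
    """Number of defects open on each day from pi_start to pi_end inclusive.

    A defect counts as open from the day it is raised until the day before it
    is closed; `closed_on` is None while it remains open.
    """
    counts = []
    day = pi_start
    while day <= pi_end:
        counts.append(sum(
            1 for opened_on, closed_on in defects
            if opened_on <= day and (closed_on is None or closed_on > day)
        ))
        day += timedelta(days=1)
    return counts


# Hypothetical defects, all closed in a rush on the final day.
defects = [
    (date(2017, 1, 1), date(2017, 1, 6)),
    (date(2017, 1, 2), date(2017, 1, 6)),
    (date(2017, 1, 3), date(2017, 1, 6)),
]
counts = daily_open_counts(defects, date(2017, 1, 2), date(2017, 1, 6))
print(counts)                     # [2, 3, 3, 3, 0]
print(sum(counts) / len(counts))  # 2.2 on average through the PI
print(counts[-1])                 # 0 at the end of the PI
```

The end-of-PI count reports a clean slate, while the average shows the train carried two to three open defects for almost the whole period.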


More than for any other quadrant, I wrestled to find a set of quality metrics small enough not to be overwhelming yet comprehensive enough to provide meaningful insight. At the team level, I would expect significantly more static code analysis metrics (such as “Code Smells”, “Comment Density” and “Afferent Coupling”) to be hugely valuable. Kelley Horton of Net Objectives suggested a Defect Density measure based on “# of production defects per 100 story points released”, and “% capacity allocated to technical debt reduction”. For further inspiration, I can recommend nothing so much as the “Quality Software Management” series by Gerald Weinberg.

“You should name a variable with the same care with which you name a first-born child” – Robert C. Martin, Clean Code

Wednesday, January 25, 2017

Revamping SAFe's Program Level PI Metrics Part 3/6: Culture

"Organizational culture can be a major asset or a damaging liability that hinders all efforts to grow and become more successful. Measuring and managing it is something few companies do well." - Mark Graham Brown, Business Finance Magazine


After exploring the Business Impact quadrant in Part 2 of this series, our focus now moves to Culture. I have been involved with over 30 release trains since I started working with SAFe in early 2012, and I have come to the passionate belief over that time that positive movement in culture is the most accurate predictor of sustained success.

While most agree that it is impossible to truly measure culture, there are certainly indicators that can be measured which help us in steering our path.

In selecting the mix of measures proposed, I was looking for a number of elements:
  • Are our people happy?
  • Are our stakeholders happy?
  • Are we becoming more self-organizing?
  • Are we breaking down silos?

The basic metrics address the first 2 elements, while the advanced metrics tackle self-organization and silos.

Basic Definitions

Basic Metrics Rationale

Team Net Promoter Score (NPS) - "Are our people happy?"

In his book The Ultimate Question 2.0, Fred Reichheld describes the fashion in which many companies also apply NPS surveys to their employees - altering the question from "how likely are you to recommend [Company Name]" to "how likely are you to recommend working for [Company Name]".

My recommendation is that the question is framed as "how likely are you to recommend being a member of [Release Train name]?". Survey Monkey provides a very easy mechanism for running the surveys.

For a more detailed treatment, see this post by my colleague Em Campbell-Pretty. Pay particular attention to the value of the verbatims and the inclusion of vendor staff in the survey – they’re team members too!

As a coach, I often ponder what “mission success” looks like. What is the moment when the ART I’ve been nurturing is set for greatness and my job is done? Whilst not enough of my ARTs have adopted the team NPS discipline to give me great data, I have developed a belief based on the data I do have that the signal is Team NPS moving above +20.
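For anyone new to NPS, the score is the percentage of promoters (answers of 9 or 10 on the 0 to 10 scale) minus the percentage of detractors (answers of 0 to 6). A minimal sketch, with made-up survey answers:

```python
def net_promoter_score(scores):
    """NPS from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return None
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten train members.
answers = [10, 9, 9, 8, 7, 7, 6, 5, 10, 9]
print(net_promoter_score(answers))  # 30.0
```

Passives (7 and 8) count towards the total but towards neither group, which is why a team full of "it's fine" answers scores zero.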

Business Owner Net Promoter Score (NPS) - "Are our stakeholders happy?"

This is a more traditional treatment of NPS based on the notion that business owners are effectively internal customers of the ART. The question is framed as "how likely are you to recommend the services of [Release Train Name] to a friend or colleague?"

If you’re truly serious about the Lean mindset, you will be considering your vendors when you identify the relevant Business Owners for this metric. There is vendor involvement in virtually every ART I work with, team-members sourced from vendors are a key part of our culture, and vendor management need to be satisfied the model is working for their people and their organization.

Staff Turnover %

In one sense, this metric could be focused on "Are our people happy"; however, I believe it is more holistic in nature. Staff turnover can be triggered either by people being unhappy and leaving, or by lack of organizational commitment to maintaining long-lived train membership. Either will have negative impacts.

Advanced Definitions

Advanced Metrics Rationale

Developer % (IT) - "Are we becoming more self-organizing?"

When an ART is first formed it classically finds “a role in SAFe” for all relevant existing IT staff (often a criticism of SAFe from the "anti-SAFe crowd"). However, as it matures and evolves the people might stay but their activities change. People who have spent years doing nothing but design start writing code again. Great business analysts move from the IT organisation to the business organisation. Project managers either return to a practical skill they had prior to becoming project managers or roll off the train. Put bluntly, the only people who directly create value in software development are software developers. All other IT roles are useful only in so far as they enable alignment (and the greater our self-organisation maturity the less the need for dedicated alignment functions). In short, if we seek true productivity gains we seek a greater proportion of doers.

One of my customers started using this metric to measure progress on this front and I loved it. One of the early cost-saving aspects of agile is reduction in management overhead, whether it be the instant win of preventing duplication of management functions between the implementing organization and their vendors or the conversion of supervision roles (designers, project managers) to contribution roles.

Obviously, this is a very software-centric view of the ART. As the “Business %” metric will articulate, maturing ARTs will tend to deliberately incorporate more people with skills unrelated to software development. Thus, this measure focuses on IT-sourced Train members (including leadership) who are developers.

As a benchmark, the (Federal Government) organization who inspired the incorporation of this metric had achieved a ratio of 70%.

Business % - "Are we breaking down silos?"

While most ARTs begin life heavily staffed by IT roles, as the mission shifts towards global optimization of the “Idea to Value” life-cycle they discover the need for more business related roles. This might be the move from “proxy Product Owners” to real ones, but it equally and powerfully includes the incorporation of business readiness skill-sets such as business process engineering, learning and development, and marketing.

Whilst the starting blueprint for an ART incorporates only 1 mandatory business role (the Product Manager) and a number of recommended business roles (Product Owners), evolution should see this mix change drastically.

The purpose of this measure could easily have been written as "Are we achieving system-level optimization?"; however, my personal bent for the mission of eliminating the terms "business" and "IT" led to the silo focus in the question.


When it comes to culture, I have a particular belief in the power of a change in language employed to provide acceleration. A number of ARTs I coach are working hard to eliminate the terms “Business” and “IT” from their vocabulary, but the most powerful language change you can make is to substitute the word “person” for “resource”!

Series Context

Part 1 – Introduction and Overview
Part 2 – Business Impact Metrics
Part 3 – Culture Metrics (You are here)
Part 4 – Quality Metrics
Part 5 – Speed Metrics 
Part 6 – Conclusion and Implementation

“Instead of trying to change mindsets and then change the way we acted, we would start acting differently and the new thinking would follow.” – David Marquet, Turn the Ship Around

Thursday, January 19, 2017

Revamping SAFe's Program Level PI Metrics Part 2/6: Business Impact

“Managers shape networks’ behavior by emphasizing indicators that they believe will ultimately lead to long term profitability” – Philip Anderson, Seven Levers for Guiding the Evolving Enterprise


In Part 1 of this series, we introduced the Agile Release Train (ART) PI metrics dashboard and gave an overview of the 4 quadrants of measurement. This post explores the first and arguably most important quadrant – Business Impact.

As you may have guessed from the rather short list, there can be no useful generic set of metrics. They must be context driven for each ART based on both the mission of the ART and the organisational strategic themes that gave birth to it. As Mark Schwartz put it so elegantly in The Art of Business Value, “Business value is a hypothesis held by the organization’s leadership as to what will best accomplish the organization’s ultimate goals or desired outcomes”.



Fitness Function

When reading The Everything Store: Jeff Bezos and the Age of Amazon, I was particularly taken by the concept of the fitness function. Each team was required to propose ".. A linear equation that it could use to measure its own impact without ambiguity. … A group writing software code for the fulfillment centers might home in on decreasing the cost of shipping each type of product and reducing the time that elapsed between a customer's making a purchase and the item leaving the FC in a truck". Amazon has since moved to more discrete measures rather than equations (I suspect in large part due to the bottlenecks caused by Bezos' insistence on personally signing off each team's fitness function equation), but I believe the “fitness function mindset” has great merit in identifying the set of business impact metrics which best measure the performance of an ART.

To illustrate based on three ARTs I work with:
  • An ART at an organisation which ships parcels uses "First time delivery %". They implement numerous digital features enabling pre-communication with customers to avoid delivery vans arriving at empty houses. Moving this a percentage point has easily quantifiable bottom-line ROI impacts.
  • An ART focused on Payment Assurance at an organisation which leverages "Delivery Partners" to execute field installation and service work. Claims for payment submitted by these partners are complex and require payment within tight SLAs. A fitness function based on payment lead time and cost savings from successful claim disputes could again easily be mapped to quantifiable ROI.
  • A telco ART focused on self-service diagnostics for customers. The fitness function in this case would reference “reduced quantity of fault-related calls to call centers” (due to the customer having self-diagnosed and used the tool to make their own service booking if required), “reduced quantity of no-fault-found truck rolls” (due to the tool having aided the customer in identifying ‘user error’), “increased first call resolution rates for truck rolls” (due to the detailed diagnostic information available to service technicians).
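To show what a "linear equation" style fitness function might look like for the telco example, here is a minimal sketch. Every component name, quantity and dollar value below is invented purely for illustration; a real ART would agree on its own components and valuations with its Business Owners.

```python
def fitness(movements, value_per_unit):
    """Linear fitness function: sum of (movement in a measure x value per unit)."""
    return sum(value_per_unit[name] * amount for name, amount in movements.items())


# Hypothetical movements achieved over one PI.
movements = {
    "fault_calls_avoided": 1000,
    "no_fault_found_truck_rolls_avoided": 50,
}

# Hypothetical monetary value of moving each measure by one unit.
value_per_unit = {
    "fault_calls_avoided": 12.0,
    "no_fault_found_truck_rolls_avoided": 150.0,
}

print(fitness(movements, value_per_unit))  # 19500.0
```

The appeal of the linear form is that each term can be challenged and tuned independently while the overall result stays a single, comparable figure.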
Considerations when selecting fitness function components
Obviously, the foremost consideration is identifying a number of components from which one can model a monetary impact. However, I believe two other factors should be considered in the identification process:
  • Impact on the Customer
  • Ensuring a mix of both Leading and Lagging Measures
Net Promoter Score (NPS) is rapidly becoming the default customer loyalty metric, and whilst Reichheld argues in The Ultimate Question 2.0 that mature NPS implementations gain the ability to quantify the value of a movement in a specific NPS demographic I have yet to actually work with an organization that has reached this maturity. However, most have access to reasonably granular NPS metrics. The trick is identifying the NPS segments impacted by the customer interactions associated with the ART’s mission and incorporating those measures.

When it comes to identifying useful leading metrics, there can be no better inspiration than the Innovation Accounting concepts explained by Eric Ries in The Lean Startup. In some cases (particularly Digital), it can also be as simple as taking the popular Pirate Metrics for inspiration. For many trains with digital products, I also believe abandonment rate is an extremely valuable metric given that an abandoned transaction tends to equate to either a lost sale or added load on a call center.

Program Predictability

This is the standard proxy result measure defined in SAFe. It is a great way of ensuring focus on objectives whilst leaving space for Product Owners and Managers to make trade-off calls during PI execution. In short, I paraphrase it as "a measure of how closely the outcomes from PI execution live up to the expectations established with Business Owners at PI planning and how clear those expectations were".
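As a simplified sketch of the arithmetic, the measure compares the business value Business Owners judged to have been achieved against the business value they assigned to each objective at PI planning. The numbers below are hypothetical, and the sketch ignores refinements such as the treatment of stretch objectives.

```python
def program_predictability_pct(objectives):
    """Actual business value achieved as a percentage of planned business value.

    `objectives` is a list of (planned_value, actual_value) pairs, one per
    PI objective.
    """
    planned = sum(p for p, _ in objectives)
    actual = sum(a for _, a in objectives)
    return 100.0 * actual / planned if planned else None


# Hypothetical PI objectives scored by Business Owners.
pi_objectives = [(10, 8), (7, 7), (5, 3)]
print(round(program_predictability_pct(pi_objectives), 1))  # 81.8
```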

But wait, there's more!

A good train will use far more granular results metrics than those listed above. Each feature should come with specific success measures that teams, product owners and managers should be using to steer their strategy and tactics (fuel for another post), but I am seeking here a PI level snapshot that can be utilized consistently at portfolio levels to understand the success or otherwise of investment strategy.

A closing note on the Fitness Function

I believe the fitness function definition should be identified and agreed at the ART Launch Workshop. Well-launched ARTs will have all the key Business Owners present at this workshop, and I strongly believe that agreement on how the business impact of the ART will be measured is a critical component of mission alignment.

Series Context

Part 1 – Introduction and Overview
Part 2 – Business Impact Metrics (You are here)
Part 3 – Culture Metrics
Part 4 – Quality Metrics
Part 5 – Speed Metrics 
Part 6 – Conclusion and Implementation

“The gold standard of metrics: Actionable, Accessible and Auditable ... For a report to be considered actionable it must demonstrate clear cause and effect … Make the reports as simple as possible, so everyone understands them … We must ensure that the data is credible” – Eric Ries, The Lean Startup

Saturday, January 14, 2017

Revamping SAFe's Program Level PI Metrics Part 1/6 - Overview

"Performance of management should be measured by potential to stay in business, to protect investment, to ensure future dividends and jobs through improvement of product and service for the future, not by the quarterly dividend" - Deming

Whilst the Scaled Agile Framework (SAFe) has evolved significantly over the years since inception, one area that has lagged is that of metrics. Since the Agile Release Train (ART) is the key value-producing vehicle in SAFe, I have a particular interest in Program Metrics - especially those produced on the PI boundaries.

In tackling this topic, I have numerous motivations. Firstly, the desire to acknowledge that it is easier to critique than create. I have often harassed Dean Leffingwell over the need to revamp the PI metrics, but not until recently have I developed a set of thoughts which I believe meaningfully contribute to progress. Further, I wish to help organisations avoid falling into the all-too-common traps of mistaking velocity for productivity or simply adopting the default “on time, on budget, on scope” and phase gate inheritance. It is one thing to tout Principle 5 – Base milestones on objective evaluation of working systems, and quite another to provide a sample set of measures which provide a convincing alternative to traditional milestones and measures.

Scorecard Design

It is not enough to look at value alone. One must take a balanced view not just of the results being achieved but of the sustainability of those results. In defining the PI scorecard represented here, I was in pursuit of a set of metrics which answered the following question:

"Is the ART sustainably improving in its ability to generate value through the creation of a passionate, results-oriented culture relentlessly improving both its engineering and product management capabilities?"

After significant debate, I settled on 4 quadrants, each focused on a specific aspect of the question above:

For each quadrant, I have defined both a basic and advanced set of metrics.  The basics represent “the essentials”, the bare minimum that should be measured for a train.  However, if one desires to truly use metrics to both measure and identify opportunities for improvement some additional granularity is vital – and this is the focus of the additional advanced metrics.

Business Impact

Whilst at first glance this quadrant might look sparse, the trick is in the “Fitness Function”. Wikipedia defines it as “a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims”. Jeff Bezos at Amazon quite famously applied it, insisting that every team in the organization developed a fitness function to measure how effectively they were impacting the customer. It will be different for every ART, but should at minimum identify the key business performance measures that will be impacted as the ART fulfils its mission.


Culture

The focus in culture is advocacy. Do our people advocate working here? Do our stakeholders advocate our services? Are we managing to maintain a stable ART?


Quality

For quality, our primary question is “are we building quality in?” Unit Test coverage demonstrates progress with unit test automation, while “Mean time between Green Builds” and “MTTR from Red Build” provide good clues as to the establishment of an effective Continuous Integration mindset. From there we look at late phase defect counts and validation capacity to understand the extent to which our quality practices are “backloaded” – in short, how much is deferred to “end-to-end” feature validation and pre-release validation activities. And finally, we are looking to see incidents associated with deployments dropping.


Speed

This quadrant is focused on responsiveness - how rapidly can our ART respond to a newly identified opportunity or threat? Thus, we start with Feature Lead Time - "how fast can we realise value after identifying a priority feature?". Additionally, as our DevOps work pays dividends, we are looking for downward trends in time spent “on the path to production” and mean time to recover from incidents, along with an upward trend in frequency of deployments.


In parts 2 through 5 of this series, I will delve into each quadrant in turn, exploring the definitions of and rationale for each measure and in part 6 wrap it all up with a look at usage of the complete dashboard.

Series Context

Part 1 – Introduction and Overview (You are here)
Part 2 – Business Impact Metrics
Part 3 – Culture Metrics
Part 4 – Quality Metrics
Part 5 – Speed Metrics 
Part 6 – Conclusion and Implementation 

"Short term profits are not a reliable indicator of performance of management. Anybody can pay dividends by deferring maintenance, cutting out research, or acquiring another company" – Deming

Thursday, January 5, 2017

Tips for Designing and Leveraging Great Kanban Boards


I’ve been working on an article about the SAFe Program Kanban, and found myself mixing in a number of basic Kanban techniques. As I read through the (overly lengthy) first draft and realised the fuzzy focus being caused by a mix of “Kanban 101” and “Program Kanban”, I found myself reflecting on the fact that a lot of people kind of “fall into Kanban”. The two most common cases I encounter are the dev team that evolves their Scrum implementation to the point that the arbitrary batching mechanism of the Sprint Boundary seems suboptimal and the Agile Release Train (ART) Product Management team taking their first crack at a Program Kanban. For whatever reason, many start to use it without ever understanding any of the basic tools available other than “use WIP limits”.

In this article, I’m going to cover two of the basic Kanban concepts every team should take advantage of and a third which tends to be more applicable for strategic Kanban systems than those operated at the dev team level.

Doing and Done

One of the simplest improvements you can make to a Kanban is the separation of states into “Doing” and “Done”. This separation enables far more accurate visualization of the true state of a piece of work, adds immensely to the granularity of the metrics that can be derived, and most importantly is critical to enabling the move from "push" to "pull" mindset.
Consider the simple yet very common 2-state Team Kanban below:

When the developer completes story S1, they will signal this by pushing it into Test. However, the system is now lying. The fact that the developer has placed it in Test does not mean testing has commenced (the clue lies in the use of the word "pushing").

Consider an alternative:

Now, when the developer completes story S1, they place it in "Dev Done". This sends a signal to the testers that when they are ready, they can "pull" story S1 into Test. If we see a big queue building in Dev Done, we can see a bottleneck emerging in front of Test. If (over time), we discover that stories are spending significant amounts of time in "Dev Done" it should trigger some root cause analysis.
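The extra granularity in the metrics comes from being able to time the queue separately from the work. As a sketch, assuming each card keeps a chronological list of `(entered_at, state)` transitions (a hypothetical format, with state names taken from the board described above):

```python
from datetime import datetime


def hours_in_state(transitions, state):
    """Total hours a card spent in `state`, given ordered (entered_at, state) pairs."""
    total = 0.0
    # Pair each transition with the next one to find when the card left the state.
    for (entered_at, name), (left_at, _) in zip(transitions, transitions[1:]):
        if name == state:
            total += (left_at - entered_at).total_seconds() / 3600
    return total


# Hypothetical history for story S1.
s1 = [
    (datetime(2017, 1, 9, 9, 0), "Dev Doing"),
    (datetime(2017, 1, 9, 15, 0), "Dev Done"),
    (datetime(2017, 1, 10, 11, 0), "Test Doing"),
    (datetime(2017, 1, 10, 16, 0), "Test Done"),
]
print(hours_in_state(s1, "Dev Doing"))  # 6.0
print(hours_in_state(s1, "Dev Done"))   # 20.0
```

Here the story waited in "Dev Done" more than three times as long as it took to develop, which is exactly the kind of signal that should trigger root cause analysis. With a single "Dev" column, those 26 hours would be indistinguishable.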

You could also achieve the same effect by making a 4 state Kanban as follows:

  • Dev
  • Ready for Test
  • Test
  • Ready for Acceptance
To be brutally honest, the difference is intellectual. Aesthetically, I tend to prefer the “Doing|Done” approach, partially because it leaves me with fewer apparent states in the Kanban and mainly because I tend to assign WIP limits spanning “Doing and Done”. In fact, when designing complex Kanbans I will often use a mix of “Single State” and “Multi-State (Doing|Done)” columns from a clarity perspective. The “Single State” columns tend to be those in which no activity is occurring – they’re just a queue (eg “Backlog”).
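A rough sketch of what a WIP limit spanning “Doing and Done” means in practice follows. The `Column` class, the `pull` function and the limits are hypothetical names and values chosen for illustration, not taken from any Kanban tool.

```python
class Column:
    """A Kanban column with Doing and Done sub-states sharing one WIP limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.doing = []
        self.done = []

    def wip(self):
        # Cards parked in Done still count against the limit.
        return len(self.doing) + len(self.done)

    def can_pull(self):
        return self.wip() < self.wip_limit

    def finish(self, card):
        self.doing.remove(card)
        self.done.append(card)


def pull(card, source, target):
    """Pull a card from source's Done into target's Doing; return False if blocked."""
    if card not in source.done or not target.can_pull():
        return False
    source.done.remove(card)
    target.doing.append(card)
    return True


dev = Column("Dev", wip_limit=2)
test = Column("Test", wip_limit=1)
dev.doing = ["S1", "S2"]
dev.finish("S1")                    # S1 is now in Dev Done
dev_is_full = not dev.can_pull()    # Done cards still occupy Dev's WIP
pulled_s1 = pull("S1", dev, test)   # Test has room, so the pull succeeds
dev.finish("S2")
pulled_s2 = pull("S2", dev, test)   # Test is at its limit, so S2 waits in Dev Done
```

Because the limit covers both sub-states, a developer cannot start new work just by parking finished cards in Done; capacity is only freed when the downstream state pulls.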

Exit Policies

The creator of the Kanban Method (David Anderson) identified 5 core properties of successful Kanban implementations, one of which was “Make Process Policies Explicit”. In designing the states of your Kanban system, you are beginning to fulfill this need by making the key stages of your workflow explicit (and supporting another of the key properties – “Manage Flow”). For the evolving Scrum Team, this is often sufficient as it will be supported by their “Definition of Done” (another explicit policy).

However, at the strategic level (or for a Kanban system that crosses team boundaries) we benefit by taking it to another level and defining “Exit Policies”. An Exit Policy is effectively the “Definition of Done” for a single state. Whilst it is up to the team member(s) (or Teams) exactly how they do the work for that state, it is not considered “Done” until it meets the exit policies for the state. These policies should be made visible as part of the Kanban design, and should be subject to review and evolution as part of continuous improvement activities. In the words of Taiichi Ohno – “Without standards there can be no Kaizen”.
Note the explicit exit policies below each state heading in this Portfolio Kanban
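If you think of an exit policy as a per-state checklist, the rule "not Done until every policy is met" can be sketched in a few lines. The state names and policy wording below are invented for illustration; yours should come from your own Kanban design.

```python
# Hypothetical exit policies per state; real ones belong to the team's own board.
EXIT_POLICIES = {
    "Analysis": ["Benefit hypothesis written", "Cost estimate agreed"],
    "Implementation": ["All features accepted"],
}


def unmet_policies(state, satisfied):
    """Return the exit policies for `state` that the card has not yet satisfied."""
    return [p for p in EXIT_POLICIES.get(state, []) if p not in satisfied]


def is_done(state, satisfied):
    """A card is Done for a state only when no exit policy is outstanding."""
    return not unmet_policies(state, satisfied)


card_checks = {"Benefit hypothesis written"}
missing = unmet_policies("Analysis", card_checks)
```

The list of unmet policies is the useful part: it is what gets discussed at the board, rather than a bare yes or no.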


Every piece of information you can add to a Kanban board is valuable in supporting conversation and insight. Most teams are familiar with the practice of creating a set of laminated “avatars” for each team member. When a team member is participating in the work on a card, they add their avatar to the card as a signal. Thus, anyone observing a Kanban and wanting to know who is working on a card gets instant gratification. Incidentally, this is for me one of the biggest failure areas of digital Kanban boards. To my knowledge, the only digital Kanban tool that supports multiple avatars on a single card is LeanKit – a very strange condition in a world centred on collaboration :)

Now to extend the concept. There is no reason to restrict avatars to the representation of individuals. If we create avatars for Dev Teams, we can (for example) understand which dev teams are involved in the implementation of a feature on a feature Kanban. Take it up a layer, and we can create avatars for ARTs and other delivery organizations. Suddenly, we can look at a portfolio Kanban and understand which delivery organizations are involved in the implementation of an Epic.

The cards above are Epics on a Portfolio Kanban. The "Standard Avatars" (with pictures) represent individual BAs, whilst the smaller solid color avatars represent the impacted delivery organisations (an ART in one case, Business Units in others).


There are many more tips and tricks to creating powerful Kanban visualisations, but these are the three I find myself introducing again and again as I help Scrum teams leverage Kanban techniques and ART Leadership teams implement strategic flow Kanban systems.

Always remember, as +Inbar Oren put it so well, a good Kanban board TALKS:
  • Tells
  • Always Visible
  • Lives
  • Keeps it Simple
  • Self-explanatory

Tuesday, January 3, 2017

Bringing your SAFe PI Plan to life during execution


The SAFe PI planning event leaves every team on the Release Train with a beautifully organized backlog identifying everything they need to do together to succeed in meeting their PI objectives.

However, as we’ve learnt from military strategist Helmuth von Moltke, “no plan survives contact with the enemy!” The scale and energy of the event make it easy to fall into the trap of believing we’ve created a plan that will survive, and an inherited mindset drives the belief that we “now have a plan to manage compliance to”. It’s easy to miss the fact that what teams are committing to is their objectives. The carefully identified backlogs and dependencies are simply things we have produced to help us understand the outcomes we believe are possible, and a point-in-time representation of the best approach to achieving them.

Successful execution relies on us remembering that backlogs must live. Estimates will be wrong, stories will have been missed, and we will discover more innovative ways of achieving the objectives. In short, circumstances will change. PI planning provides a great start by launching the PI with 5 to 10 teams having beautifully coordinated and well-groomed backlogs, but success rests on what we do with them once we leave the room.

All too often, I see new teams and trains treat the PI plan as a static artifact. Your plan should always reflect your current best understanding of the situation, and the primary representation of your plan in SAFe is your backlog. Having previously covered the ongoing lifecycle of the individual backlog item, I will focus here on the principles and practices I have found most useful in helping SAFe implementations manage the plan as a whole.

The importance of visual management

Any scaled agile implementation will inevitably wind up using a digital tool. But PI planning itself gives us the clue – it’s an amazing collaborative planning event enabled by physical visualization of the plan. If you want your plan to live, physical visualization is essential. By all means, be responsible and keep your digital tool up-to-date – but recognize it for the information refrigerator it is. I’ve seen all the tools used both well and poorly, but I’ve never seen a tool enable effective ongoing collaborative planning at anywhere near the level and efficiency possible with physical visualization.

The Team PI Plan – from Planning Room to Team Room (PI level visualisation)

I have a starter template for team level plan visualisation that I suggest to all the teams I coach. It works best when a large horizontal space is available and it can be organised in one continuous run, but most teams can find space for it with a little ingenuity. As a template, it looks a little like the following:

Every area except the Inbox and Overflow (covered later in the article) will contain an indication of planned velocity and load (as represented by current contents). Additionally, PI objectives should be somewhere prominent (preferably next to current sprint).

Team Master Builders: Mobile wall, top row is current sprint, middle is next sprint, bottom future sprints

Team Olympus: Current to future from right to left

Team Nintendo: Current sprint on right window, next sprint on middle, future on left

Additional details for each area follow:

Current Sprint 

This is the visualization every team should have for their current sprint – often known as the Scrum board. Not the focus of this article, but it’s always good to bring it to life.
Olympus Current Sprint: Take every opportunity to build team identity through visuals

Next Sprint

The upcoming sprint gets slightly more focus than other future sprints, as this is where we are getting our stories to “Definition of Ready”. Typical swimlanes might be “Prioritised”, “Specifying” and “Ready”. Operation of this wall is covered in my last post on story lifecycle.

Team Pixar: Next sprint helping manage movement of stories to definition of ready

Future Sprints

These areas basically mirror the content from the PI plan. The backlog items and their intended sprints are carried out of PI planning and placed straight into their spot in the team area. As each sprint concludes, the cards move to the right to indicate the shrinking number of future sprints left in the PI.

Extended Backlog Tools

Given that the backlog represents “all the work” for the team, it is a great place to capture other activities which are not necessarily either classical “user stories” or “tech stories”. Working with ARTs, we regularly devise “specialty card types” to help them articulate and manage their plans. The two I use most frequently are “Commitment Cards” to help with team-level dependency visualisation and “Capacity Cards” to represent “work that’s not necessarily very agile but definitely necessary and often responsible for killing a team if they fail to plan for it”.

Commitment Cards

I was introduced to this concept by Craig Larman and Bas Vodde’s Scaling Lean and Agile Development, and have found it invaluable in SAFe. The identification of dependencies on the Program Board during PI planning (and fun with red wool) is one of the most visually powerful tools in SAFe, but little formal coverage is given to reflecting them at the team level.

The notion of a commitment card is that when one team makes a commitment to another they place a (0 point) story in their backlog describing the commitment made. The recipient also records the commitment in their own backlog as a 0 point story.

This assists on a number of fronts. Firstly, a dependency is rarely on a single story – it typically operates at a higher level of abstraction. Secondly, it becomes a first class citizen in the team’s backlog refinement activities.

Finally, it assists teams populating the program board with achieving an appropriate level of granularity when reflecting their dependencies. The commitment should already be written at the appropriate summary level of granularity, and can simply be duplicated onto the program board. If the commitment card shifts in the course of planning, its twin on the program board needs to shift.

Capacity Reservation Cards

Many teams have work which cannot specifically be planned for but is nonetheless reasonably predictable in size and timing and must be catered to. Whilst team-level capacity allocation is one option for dealing with this, it’s a reasonably blunt instrument which does not cater to situations where the scale of the reservation required varies from sprint to sprint.

The most common cause of this comes in the form of hardening activities. We all know good agile doesn’t rely on hardening, but many enterprises face the harsh reality that the elimination of extended hardening periods will require dedicated work over the course of years. For example, I routinely work with organisations which have some form of enterprise-wide release process that spans 6-12 weeks. We tend to find that the capacity of a train is divided between implementing the features for the current PI, supporting the progress of features from the previous PI through the enterprise release process, and providing warranty support for the previous release. This is particularly prevalent where mainframes, ERP systems and other major COTS implementations form a major component of the technology stack.

The bulk of work for teams arising out of these processes will be unplanned (eg defect fixes, incident resolution), but the intensity of activity and likely volume of work is more predictable. In these situations, we will place “Capacity Reservation Cards” in the backlog setting aside a certain number of points in the team’s capacity to be available.

An additional use we make of this is setting aside capacity for the team to participate in ideation or “discovery workshops” on upcoming features as they are prepared for the next PI. Whilst the IP sprint in theory provides capacity for this work, it has the effect of creating a late-stage batching effect on team involvement in feature preparation. Inserting “Feature Discovery” items in the backlog allows time to be set aside for the team sprint by sprint and spread these activities over the whole PI.

On a closing note, the usage of this technique enables one of my favourite PI metrics – “% of capacity required for hardening support”. The trend in this metric indicates the effectiveness (or otherwise) of efforts to bring forward quality and minimise the need for downstream hardening.
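As a worked example of the metric, here is one way it could be calculated. I am assuming the simplest definition (points on Capacity Reservation Cards for hardening divided by total planned velocity for the PI); the function name and the sample numbers are hypothetical.

```python
def hardening_capacity_pct(reserved_points, planned_velocity):
    """Percentage of planned PI capacity set aside for hardening support.

    Both arguments are per-sprint lists of points (assumed definition).
    """
    total = sum(planned_velocity)
    if total == 0:
        raise ValueError("planned velocity must be greater than zero")
    return 100.0 * sum(reserved_points) / total


# Hypothetical PI of five sprints: the reservation shrinks as the
# previous release works its way through the enterprise release process.
reserved = [12, 8, 4, 0, 0]
velocity = [30, 30, 30, 30, 30]
pct = hardening_capacity_pct(reserved, velocity)  # 24 of 150 points
```

Captured PI after PI, this gives a single number per team (or per train) to trend.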

Managing Change

Whilst as agilists we “welcome changing requirements”, there is a reality that at scale these changes will have impacts and often those impacts will extend beyond the team directly dealing with the change. A chaotic world where the backlog is constantly changing and the impact of these changes is not discovered until late in the PI is not a great thing to have.

As such, I have found two tools to be very useful – the “Inbox” and the “Overflow”. The Inbox provides us with a mechanism for managing the introduction of new items to the backlog, and the Overflow helps provide visibility of backlog items at risk of not being completed within the PI.


As new stories are identified by the product owner (or others), they are placed in the Inbox as a holding area. The team can then establish ceremonies with the Product Owner for incorporation of new items into the backlog. Common agreements would include the team (and PO) achieving shared understanding of the backlog item, converting it to user voice form, estimating it and deliberately considering the prioritisation as it is inserted into the backlog.
Team Pixar Inbox and Overflow
A little humor from Team Aliens: Guess which is the overflow :)


As reality unfolds (new backlog items are discovered, actual velocity is realised and planned velocity is updated), the Overflow comes into play. Any backlog items for the PI which will not fit based on the planned velocity are moved to the Overflow to highlight the priority decisions being made by the Product Owner.

Backlog Refinement

Whilst I support the sensitivity that led to this practice acquiring its new name, the unfortunate reality is that the name most commonly seems to lead people to miss the key activity. Given that story elaboration is taken care of through specification workshops, the primary focus for refinement becomes managing the content and priority of the team’s PI backlog.

I generally place this on cadence either once per week or once per sprint as a one hour session. It is the responsibility of the product owner, but best performed as a shared activity with the scrum master. As this is an alignment activity, I also look for all teams on the train to schedule their backlog refinement session on the same day (leaving it open to the team what time of day). Motivations for this will become clear when we proceed to Program Backlog Refinement.

Key activities will include:

  • Update velocity forecasts: This is obviously data provided by the scrum master, but may shift either based on yesterday’s weather or as team leave comes into focus. 
  • Process the inbox: All stories from the inbox which have passed through the established entry conventions should be inserted into the backlog in the relevant sprint based on the product owner’s priorities. 
  • Adjust scheduling to match capacity: Based on the updated velocity forecasts (and any newly incorporated stories), any overloaded sprints should have stories shuffled back down to future sprints until the load level falls below forecast velocity (or other agreed load constraints). This will often result in stories from the final sprint in the PI being moved into the overflow. 
  • Validate commitment cards: Cast an eye over all the commitment cards in the backlog, asking the following questions: 
    • “Will we still be able to honour this commitment?” 
    • “Will this reprioritisation affect a commitment we’ve made?” 
    • “Do our changed circumstances mean we no longer require another team to meet the commitment they have made to us?” 
  • Adjust priorities: Now that the backlog has incorporated the inbox and been adjusted based on capacity and commitments, review for priority updates. At a bare minimum, take a good hard look at the overflow area and see whether any of those cards need to be traded back in and others demoted. 
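
The "adjust scheduling to match capacity" step is mechanical enough to sketch. This is an illustration under my own assumptions rather than a prescribed algorithm: each sprint is a priority-ordered list, the lowest-priority story is the one shuffled out, and a displaced story lands at the top of the following sprint. The function name and the sample data are hypothetical.

```python
def rebalance(sprints, forecast):
    """Shuffle lowest-priority stories out of overloaded sprints.

    sprints:  list of sprints, each a priority-ordered list of (story, points)
    forecast: forecast velocity per sprint, same length as sprints
    Returns (rebalanced sprints, overflow).
    """
    plan = [list(s) for s in sprints]  # copy so the input is left untouched
    overflow = []
    for i, limit in enumerate(forecast):
        while plan[i] and sum(points for _, points in plan[i]) > limit:
            story = plan[i].pop()  # lowest-priority item in this sprint
            if i + 1 < len(plan):
                plan[i + 1].insert(0, story)  # top of the next sprint
            else:
                overflow.insert(0, story)  # final sprint spills into Overflow
    return plan, overflow


sprints = [
    [("A", 5), ("B", 5), ("C", 5)],
    [("D", 8), ("E", 5)],
    [("F", 8)],
]
plan, overflow = rebalance(sprints, forecast=[10, 10, 10])
```

In this example the overload in the first sprint ripples down the PI until stories E and F fall into the overflow, which is exactly the conversation you want the Product Owner to have while there is still time to act.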

On a final note, I suggest using a flagging system during refinement. Place a flag (and accompanying note) on any card that moves. This supports the following:
  • Updating the digital tool to reflect any changes made to the physical backlog. 
  • Updating the team at standup the next day on the outcomes of the refinement session. 
  • Informing program backlog refinement. 

Program Backlog Refinement

Whilst not formally specified in SAFe, I have found this to be an essential extension as a synchronization event. You will recall that all teams put their team level backlog refinement session on cadence to occur on the same day. We leverage this by placing the program backlog refinement ceremony on the following day. Quite often, we settle on Thursdays for teams and Fridays for the program. This is based on the fact that teams commonly adopt “Wednesday to Tuesday” sprints. You’re then refining the backlog the day after Sprint Planning (to reflect any updates coming out of the planning session) and on day 7 of the sprint (by which time you’ve discovered some new surprises and have a good clue of any stories which might slip the sprint boundary).

The ceremony has two phases – often with slightly different participation:

Team Updates (<30 minutes)

Attended by the RTE, Product Manager, ScrumMasters and Product Owners.

All teams provide an update of any outcomes from their backlog refinement session which may affect other teams. The focus is on moving or at-risk commitments. Typically held at the program level visualization (whether it be program board or something else).

Overflow Review (30-45 minutes)

Attended by the Product Manager, Product Owners, RTE and potentially other stakeholders.

This is a “management by walking around” exercise. The group moves from team area to team area, with a particular focus on the contents of the overflow areas. Using the flags applied during team backlog refinement, the product owner walks through the outcomes of the previous day’s refinement and validates their prioritization calls.

It provides the group with insight into areas where the overflow is mounting or critical commitments are entering high-risk status. This information can then be fed into the release management meeting to support program level decision making on moving scope to other teams or establishing effective mitigation strategies.


An agile plan lives. In the spirit of transparency, it always reflects the current reality and beliefs of the train. It is critical on many fronts:
  • Supporting a team maintaining ownership of their larger commitments, avoiding the trap of living only for the current sprint and losing sight of their commitment to a broader mission. 
  • Supporting ongoing synchronisation of plans across all the teams in a train 
  • Supporting effective trade-off decision making by the Release Management Team 
  • Providing early warning to support effective mitigation activities as the plan fails to survive contact with the enemy. 

The PI planning event provides an amazing springboard, but effective planning must be continuous and based on a transparent reflection of reality. Utilising cadence and synchronisation as enablers, the practices described above are simple to implement and will provide astonishing contributions to maintaining the alignment, transparency and focus of a Release Train.