Friday, February 15, 2019

Dealing with Unplanned and/or BAU work in SAFe

In the Agile world in general, we have long preached the move from “Project mindset” to “Product mindset”.   “Wouldn’t the world be simpler if we just talked about work instead of projects and BAU?” is a mantra on many an agilist’s lips. 

Whilst the notion of forming teams and trains that “just do the most important work regardless of its nature” is a great aspiration, it comes with a number of caveats:

  • Funding and capitalization are generally significantly different for the two
  • Planning and commitment are difficult when some (or much) of the team’s work is unplanned

Enterprises have typically solved this problem through structural separation.  The first step into Agile is often to move from separate “Plan”, “Build” and “Run” structures to separate “Plan and Build” and “Run” structures.  Projects are fed through “Plan and Build”, then after some warranty period transitioned to “Run”.  Funding is separate, and “Run” is driven more by SLAs than plans.

A truly product-oriented mindset requires the establishment of teams and ARTs that can “Plan, Build and Run”.  This post tackles the issue of planning and commitment in depth and introduces some tools for tackling the funding side of the equation.

Funding

I’ll tackle the topic of funding in greater detail in a future post, but the short version follows.  If a backlog item is categorized, the categories can be mapped to funding constructs.  We can then take the burn-rate for a team, the percentage of its capacity dedicated to each funding construct, and allocate funding accordingly.
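The arithmetic described above can be sketched in a few lines of Python.  This is only an illustration under assumed inputs: the category names, the funding constructs ("capex", "opex"), the burn rate and the point values are all hypothetical, and a real mapping would come from your own finance rules.

```python
from collections import defaultdict


def allocate_funding(burn_rate, backlog_items, category_to_funding):
    """Split a team's burn rate across funding constructs by share of effort.

    backlog_items: list of (category, points) pairs for completed work.
    category_to_funding: maps each backlog category to a funding construct.
    """
    points_by_construct = defaultdict(float)
    for category, points in backlog_items:
        points_by_construct[category_to_funding[category]] += points

    total = sum(points_by_construct.values())
    if total == 0:
        return {}
    return {
        construct: burn_rate * points / total
        for construct, points in points_by_construct.items()
    }


# Hypothetical sprint: categories and their funding constructs are assumptions.
items = [("feature", 24), ("production_defect", 4), ("bau", 12)]
mapping = {"feature": "capex", "production_defect": "opex", "bau": "opex"}
funding = allocate_funding(100_000, items, mapping)
```

With these made-up numbers, 24 of the 40 points map to one construct and 16 to the other, so the burn rate is split 60/40.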

Planning and Commitment

Both the PI cadence of SAFe and the Sprint cadence of Scrum seem to invalidate the incorporation of BAU.  After all, if we fix our Feature priorities for 8-12 weeks in SAFe and our Story priorities for 2 weeks in Scrum how do we deal with the unplanned?

Known BAU work can be represented by planned backlog items, but the answer to unplanned work lies in the effective utilization of Capacity Allocation.  We can reserve a given percentage of the team’s (or train’s) capacity for unplanned work, and plan and commit based on the remaining capacity.

Team-level Illustration: Production Defects

One of the first benefits we find with persistent teams is that we can feed production defects back to the team responsible for introducing them.  This provides them with valuable feedback, typically dramatically improving quality. 

We might reserve 10% of team capacity to cater for this.  Thus, if the team’s velocity is 40 they would only plan to a velocity of 36 and reserve 4 points for production defects. 

Mechanically, the following occurs:

  • If fewer than 4 points of production defects arrive, the team pulls forward work from the following sprint.
  • If more than 4 points of defects arrive, the Product Owner makes an informed decision: defer new feature work or defer low-priority defects.
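The mechanics above can be sketched as follows, using the velocity of 40 and the 10% reserve from the example.  The function names are hypothetical, and the case where exactly the reserved amount arrives is my own assumption, since the text only describes the "less" and "more" cases.

```python
def plan_sprint(velocity: int, reserve_percent: int) -> tuple[int, int]:
    """Return (points to plan, points reserved for unplanned work)."""
    reserved = round(velocity * reserve_percent / 100)
    return velocity - reserved, reserved


def sprint_response(reserved: int, arrived: int) -> str:
    """Describe the response once the sprint's production defects are known."""
    if arrived < reserved:
        spare = reserved - arrived
        return f"pull forward {spare} points from the following sprint"
    if arrived > reserved:
        excess = arrived - reserved
        return (f"Product Owner decides: defer {excess} points of "
                "feature work or defer low-priority defects")
    # Assumption: an exact match needs no adjustment.
    return "reserve fully used; plan unchanged"


planned, reserved = plan_sprint(velocity=40, reserve_percent=10)
light_sprint = sprint_response(reserved, arrived=1)
heavy_sprint = sprint_response(reserved, arrived=7)
```

The point of keeping the reserve explicit is that the Product Owner sees the overrun as a number to decide on, rather than discovering it at the end of the sprint.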

My preferred implementation of this technique is slightly different.  A number of times, we have reserved the 10% for a combination of Production Defects and Innovation.  If the team has shipped clean code, they get to work on their innovation ideas rather than pulling forward work!

ART-level illustration: BAU work

When staffing ARTs, we often find that some (or many) of the key staff are only available “if they bring their BAU work with them”.  In these cases, we plan the known BAU work as usual.  For the expected “unplanned BAU” load, we apply PI-level capacity allocation: we estimate the percentage of their capacity it will need and withhold that capacity when planning out the PI.

Dealing with fluctuations in unplanned work levels at the ART/PI level is a little more consequential.  Whilst the sprint-to-sprint mechanism of the production defect illustration still applies, we need to be monitoring for potential impact on PI objectives.

  • If less than the expected amount of unplanned work arrives for the team, we have the option to either use the spare capacity to absorb work from other teams struggling with their PI objectives or pull forward Features from future PIs.
  • If more than the expected amount arrives, we monitor the impact on committed objectives.  We can absorb a certain amount by sacrificing capacity allocated to stretch objectives.  If committed objectives are at risk, however, this should trigger a management decision: defer or deflect the unplanned work, or compromise a committed objective because the unplanned work is significant enough to warrant it.
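The two cases above amount to a simple three-way check, sketched here.  The function name, the returned labels and the idea of measuring everything in points are assumptions for illustration; the actual decision in the last case is a management call, not something code can make.

```python
def assess_pi_unplanned(expected: int, actual: int, stretch_capacity: int) -> str:
    """Classify a team's unplanned-work load against its PI allocation.

    expected: points reserved for unplanned work in the PI.
    actual: points of unplanned work that have arrived.
    stretch_capacity: points currently allocated to stretch objectives.
    """
    overrun = actual - expected
    if overrun <= 0:
        # Spare capacity: help other teams or pull forward Features.
        return "use_spare_capacity"
    if overrun <= stretch_capacity:
        # Overrun can be absorbed by giving up stretch objectives.
        return "sacrifice_stretch"
    # Committed objectives are at risk: escalate for a management decision.
    return "escalate"


quiet_pi = assess_pi_unplanned(expected=10, actual=6, stretch_capacity=5)
busy_pi = assess_pi_unplanned(expected=10, actual=13, stretch_capacity=5)
overloaded_pi = assess_pi_unplanned(expected=10, actual=20, stretch_capacity=5)
```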

Discipline is a must

Applying these techniques will quickly run into a challenge.  Teams are often sloppy with BAU/unplanned work.  They “just do it”, viewing the effort of creating, sizing and running backlog items for it as unnecessary overhead.  This leaves us without the visibility required for the deliberate, proactive decision making illustrated above.  Often it also leaves us, somewhat embarrassingly, apologizing at the end of the Sprint or PI for missing a commitment “because BAU was more than expected” without any hard data to back it up.  Even more importantly, the Product Owner/Product Manager/Business Owners never had the opportunity to intervene and deflect the unplanned work so that we could keep the commitment.

Further, I find most teams dramatically underestimate the capacity consumed by BAU work.  We’ve routinely worked with teams who set aside 30% of capacity for BAU and then, once they have missed enough objectives to buy into actually tracking it, find the real figure to be 50-60%.
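Once BAU work is actually running through the backlog, checking the reserve against reality is trivial.  A minimal sketch, assuming each completed item is tagged either "planned" or "bau" (hypothetical labels) and using made-up numbers rather than data from any real team:

```python
def bau_share(completed):
    """Fraction of completed points that were BAU/unplanned work.

    completed: list of (kind, points) pairs, where kind is "planned" or "bau".
    """
    total = sum(points for _, points in completed)
    bau = sum(points for kind, points in completed if kind == "bau")
    return bau / total if total else 0.0


# Illustrative sprint log; the values are invented for the example.
sprint_log = [("planned", 10), ("bau", 12), ("planned", 8), ("bau", 10)]
reserved_share = 0.30
actual_share = bau_share(sprint_log)
under_reserved = actual_share > reserved_share
```

Here 22 of 40 points were BAU, so a 30% reserve would have been well short of what the work actually consumed.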

However, the true benefit of discipline goes further – the data generated is a goldmine.


Reaping the Benefit of Discipline

Whilst the first benefit of discipline is obviously that of gaining an accurate understanding of your capacity and being able to more confidently make and keep commitments, much larger gains can be realized once you start to analyze the data generated.  A key first step is developing an awareness of failure demand and value demand.

Failure Demand vs Value Demand

“Failure demand is demand caused by a failure to do something or do something right for the customer” – John Seddon

The first illustration that was given to me for failure demand many years ago was in the context of call centers.  It’s the 2nd and 3rd phone call you have to make because your issue wasn’t fully resolved on the first call.   If we take a typical agile team or ART, we can find many examples:

  • A late-phase defect is caused by failure to “build quality in”.
  • A production defect is caused by failure to deploy a quality product.
  • A request for information is caused by failure to have provided that information previously or failure to have made the requester aware of where the information is published.
  • An issue is often caused by failure to effectively mitigate a risk.
  • Time spent issuing reminders or nagging is failure demand, as more effectively establishing the awareness of the “why” and clearly setting the expectation would have avoided it.
  • Managing the politics of a missed commitment results from both failure to meet the commitment and failure to effectively manage the possibility that the commitment would be compromised.

“Value demands are demands from customers that we ‘want’, the reason we are in business” – John Seddon

Value demand for teams and ARTs should be obvious – the features and stories the teams are working on!  However, this can become a little more nuanced very quickly:

  • Is work done on an improvement initiative value demand?  Our customer probably didn’t directly ask for it.  In fact, many improvement initiatives are effectively failure demand as they are driven by addressing previous failures.
  • A great deal of BAU/Unplanned work is falsely perceived as value demand.  “I run this script or extract every morning” and “We produce and consolidate this report every month” are both great examples.  In theory someone values the result of the script or extract, and values the report – but the need to dedicate capacity to it results from a failure to automate it, or failure to fix a broken process.

Applying the Insights from Demand Patterns

Assuming we’ve had the discipline to channel all demand on a team through their backlog, and the further discipline to categorize it appropriately as failure or value demand, we can now start to drive significant improvement on the following basis:

  • If I reduce failure demand, I have more capacity to devote to value demand
  • If I find a more effective way to respond to value demand, I have more capacity to devote to value demand
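To make that basis concrete, here is a rough sketch of the kind of analysis the categorized backlog enables: total capacity by demand type, and the failure-demand causes ranked by how much capacity they consume.  The tuple layout, the cause labels and the numbers are all assumptions for illustration.

```python
from collections import Counter

# Each completed item: (demand type, cause or source, points). Invented data.
completed = [
    ("value", "feature", 20),
    ("failure", "production defect", 6),
    ("failure", "manual report", 8),
    ("failure", "production defect", 4),
    ("value", "feature", 2),
]

points_by_type = Counter()
failure_points_by_cause = Counter()
for demand_type, cause, points in completed:
    points_by_type[demand_type] += points
    if demand_type == "failure":
        failure_points_by_cause[cause] += points

# Share of capacity going to failure demand, and its biggest single source.
failure_share = points_by_type["failure"] / sum(points_by_type.values())
top_cause, top_points = failure_points_by_cause.most_common(1)[0]
```

The ranked causes are the natural candidates for root-cause work, since removing the largest one frees the most capacity for value demand.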

In “Four Types of Problems: from reactive troubleshooting to creative innovation”, Lean expert Art Smalley defines a hierarchy of problem types and accompanying resolution strategies.  Three of these are pertinent to this situation:

  • Type 1: Troubleshooting – “Reactive problem solving based upon quick responses to immediate symptoms”.
  • Type 2: Gap from Standard – “Structured problem solving focused on problem definition, goal setting, root causes analysis, counter-measures, checks, standards and follow-up activities”.
  • Type 3: Target Condition – “Continuous improvement that goes beyond existing performance of a stable process or value stream.  It seeks to eliminate waste, overburden, unevenness, and other concerns systemically,  rather than responding to one specific problem”.

When you form a good Agile team, their ability to jump to each other’s aid, rally around problems and move from individual work to teamwork tends to exhibit a lot of troubleshooting – particularly in the case of unplanned work.  Good troubleshooting skills are fundamental to any team.  As Smalley comments, “to address each [issue] with a deeper root cause problem-solving approach would require tracking and managing a problem list that runs, literally, hundreds of miles long.  No organization can hold that many problem-solving meetings … in an efficient manner”.

Our response to most failure demand is to apply troubleshooting techniques.  However, while these will help us survive the prevailing conditions they won’t help us change them.  Change requires the use of Type 2 problem solving techniques.  We need to leverage our data to identify recurring trends, and act to remove the root cause of the failure demand.  Smalley devotes great attention to problem definition, and opens with two pieces of critical advice when framing the problem for attention:

  • “The first step is to clarify the initial problem background using facts and data to depict the gap between how things should be (current standard) versus how they actually are (current state).”
  • “Why does this problem deserve time and resources?  How does it relate to organizational priorities?  Strive to show why the problem matters or else people might not pay attention or might question the problem-solving effort.”

As we are successful with the reduction of failure demand with our Type 2 activities, we can move on to Type 3 problem solving, driving activity to establish new target conditions.  If we accurately understand the capacity being devoted to various types of value demand we can more accurately assess whether the value being generated justifies the capacity being consumed – triggering informed continuous improvement.   An enterprise PMO we have been working with provided a wonderful example recently:

They had historically applied a QA process to every project the organization ran.  This, of course, was characterized as “BAU” work.  It had to be done every time a project passed through a particular phase in its lifecycle.  As they gathered data on how much of their capacity it actually consumed, they started to question the value proposition.  How regularly did the QA check actually expose an issue?  What were the typical consequences of the issues exposed?  What other high-value discretionary activities were unable to proceed due to capacity constraints?  Eventually, they were able to make an informed decision to move to a sampling approach, freeing up capacity for high-value initiatives they had previously been unable to pursue.

Conclusion

Capacity allocation allows us to deal with BAU/Unplanned work, but my experience has been that it never works well without the accompanying discipline of actually channeling that work formally through your backlog.  It might require some creativity to make it meaningful (e.g. a single backlog item for the sprint representing the capacity devoted to a daily BAU activity).  Beginning with the reduction of failure demand in BAU/Unplanned work will both improve performance and free capacity which can then be devoted to true continuous improvement initiatives.

However, the usefulness of the Sprint or PI cadence-driven cycle seems to fall apart at the point where more than 30-40% of capacity is being reserved for unplanned work.  Some form of cadence-driven alignment cycle will always be valuable, but adaptation from the standard events and agendas will be necessary to make them meaningful and Kanban is far more likely to provide a useful lifecycle model.  The ARTs I have worked with in this situation have tended to wind up with shortened planning events far more focused on “priority alignment” than detailed planning.

Above all, the benefit comes from the mindfulness generated in the presence of data reflecting “where you really spend your time” as opposed to “what your value priorities are”, and the accompanying discipline of acting on that data to achieve better alignment.
