The ART of SAFe: March 2013

What you see is all there is

I've just finished reading a book called "Thinking, Fast and Slow" by Daniel Kahneman. Set in the world of psychology and behavioural economics, it delves deeply into the role of cognitive bias in decision making. The driving theme in the book is the division and interplay between intuitive response and deliberate reasoning, with a particular focus on the fact that the influence of intuitive response on our reasoning is far more pervasive than we realise.

A concept Kahneman returns to repeatedly is WYSIATI - "what you see is all there is". In short, the vast majority of our reasoning is based on what is immediately apparent - unless something signals a warning sign that we should look deeper.

As I've been thinking through how to wrap this series up, one of the things I've been looking for is a "key takeaway" from my SAFe journey to date - and as I finished the book a few days ago it struck me.

For me, the single most powerful aspect of SAFe is visibility. Of course, lean and kanban practitioners will quite rightly suggest that this is not something SAFe contributes as the power of visualisation of work in progress (WIP) and flow has been central to lean for many years. What I feel SAFe gives us, however, is a structure and an approach to providing both strategic and tactical visibility:

At the portfolio level, I can visualise the WIP and flow of multi-million dollar initiatives, the application and ratification of my investment prioritisation and the distribution of these initiatives (or parts thereof) across my programs of work.
At the program level, I can visualise the WIP and flow of significant features as they travel through my gating and funding cycles and are eventually implemented and deployed.
At the team level, I can understand my flow in the context of the teams surrounding me.

At each level, I'm looking for different insights to support different decisions and generate different feedback. What I need is an ability to provide a simple and clear enough depiction of the state of my world to enable solid intuitive decision making and ring the warning bells that will trigger deliberate reasoning at the right times.

Achieving this has without doubt been the most significant contribution of SAFe to the BI COE.

Does structure kill agility?

When I speak to agilists about SAFe (or read their blogs), there is a prevailing concern that it looks far too structured and formal and will kill off the empowerment, collaboration and innovation that Agile seeks.

The simplest response to this is to resort to complexity theory. To quote Jurgen Appelo's Management 3.0 - "Without a boundary a system lacks the drive and constraints to organise itself". Or in Glenda Eoyang's Facilitating Organization Change ... "Just as a person needs time and space to incubate thoughts before a new idea can emerge, a system needs a bounded space for the emergence of new patterns."

For a single Scrum or Kanban team, the principle is "just enough structure to enable innovation and feedback". When you start to consider hundreds or thousands of people in hundreds of teams, "just enough structure" is a little more complex.

I would contend that SAFe provides a structure and a set of constraints and boundaries that facilitates growth and change. As the growth occurs, the formality and structure can be gradually peeled back to the bare essentials necessary to support effective participation in the enterprise lifecycle.

The challenge for the SAFe coach, of course, is guiding the tuning of the formality - recognising when the group is either over- or under-constrained, assisting it in selecting the right adjustments and collecting feedback on their impact.

What about the results?

At this point, it is too early to provide quantitative assessment on the impact of SAFe at the portfolio level for the BI COE. It has triggered many changes which have removed waste and improved flow for the demand management group. Two months in, the manager of the group felt he had recouped roughly two full-time employees' work by eliminating administrative waste. This was supporting a shift of focus from "keeping up with admin" to "pursuing insight and improving strategic foresight".

For the Strategic Delivery release train, however, there are some hard numbers. When she presents on their SAFe journey, one of the opening lines the general manager regularly uses is a quote an executive director issued about their group a few years ago - "they're the laughing stock of [enterprise] IT".

Following are her current metrics on improvements achieved in the past 12 months:

Average delivery cycle time down from 12 months to 3 months
Frequency of delivery increased from quarterly to fortnightly
Cost to deliver down 50%
100% of projects delivered on time and on budget
Happy project sponsors (NPS 29)
Happy teams (Team NPS 43)

My favourites are the last two. For those not familiar with NPS, it stands for "Net Promoter Score" - a system achieving growing popularity as a measure of customer loyalty. Customers are asked a question along the lines of "On a scale of 0 to 10, how likely are you to recommend us to a friend or colleague?" Respondents are classified as either a promoter (9 or 10), passive (7 or 8) or detractor (0 to 6). The score is then calculated by subtracting the percentage of detractors from the percentage of promoters (eg 20% promoters, 70% passives and 10% detractors would yield a score of 20 - 10 = 10). In the employee context, the question becomes "How likely are you to recommend working as part of Strategic Delivery to a friend or colleague?"
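For anyone who wants to run the numbers on their own survey, the calculation above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample responses are hypothetical, chosen to reproduce the 20% / 70% / 10% example rather than taken from the Strategic Delivery survey data.

```python
def classify(score: int) -> str:
    """Bucket a 0-10 survey answer using the standard NPS bands."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Percentage of promoters minus percentage of detractors."""
    if not scores:
        raise ValueError("at least one response is required")
    buckets = [classify(s) for s in scores]
    promoters = buckets.count("promoter")
    detractors = buckets.count("detractor")
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical responses: 2 promoters, 7 passives, 1 detractor out of 10.
responses = [9, 10, 7, 7, 7, 7, 8, 8, 8, 5]
nps = net_promoter_score(responses)
print(nps)
```

Note that passives count towards the total number of respondents but towards neither side of the subtraction, which is why a group full of "7s and 8s" scores zero.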

I believe that whether or not you are cheaper or faster is important but secondary - whether or not you are delighting your customers/stakeholders whilst building happy teams is the true measure of your success. Whilst there is no baseline measure on the "Happy sponsors" front, as she characterises it the baseline probably would have been somewhere close to -100. When it comes to happy teams the baseline was -20, showing a massive shift for the positive in just one year.

Conclusion

My time coaching the BI COE came to an end a couple of months ago. I believe any good coach has two missions - enable your customer's success and make yourself redundant. As 2012 came to an end it became apparent my time was done and I moved on to fresh challenges closer to the heart of the enterprise. It was hard to let go, and in many ways writing this series has been cathartic for me as I relived the journey.

Of course, it continues without me. The GM of Strategic Delivery has become such a passionate believer in scaled agile that she flew out to the USA in February to join the ranks of Dean's certified SAFe consultants, and she's decided that since I'm no longer writing about her world she'd better start. To stay in touch with the continuing journey of the EDW Agile Release Train, please visit her blog.

In Part 3, we covered the program backlog lifecycle. This post will focus on implementation life and feature level visualisation. We have found that the key ingredients are:

Visualisation
Communication
Cadence
Continuous Improvement

Visualisation

Since visualisation is the enabler of so much else, it's where we'll start. Finding the right way to visualise 'in play features' involved a series of failed experiments.

The first of these failures began with us saying "well, everything else follows a kanban system for visualisation, why don't we build a feature kanban wall?" So, we identified a set of lifecycle states for the feature and built the wall. It looked great, but achieved nothing. No conversations triggered, no insights generated, simply maintenance overhead. We learnt two things: firstly, that at this level our interest was more in a sprint-based view than a lifecycle view and, secondly, that we needed a finer grain.

The third incarnation delivered the answer. In large part, it was a logical extension of the 'PSI planning board' utilised to construct the overall view of the PSI during PSI planning in the standard guidance materials.

The wall is sprint/iteration based, and represents a rolling 10-sprint view of committed work in the teams. Whilst the full 10 sprints are rarely populated, it is necessary to cover the 'long tail' on enterprise deployments. The enterprise release process takes 5 sprints to conduct enterprise level shakeout and integration testing, during which time the team which built the work must maintain a typical 5-10 point per sprint capacity reservation to support testing and deployment preparation activities.

This is representative of a recurring theme throughout the lifecycle. The ideal is, of course, to minimize the number of features in flight and run a lifecycle of "start feature, build it fast, acceptance test it and leave it in a deployment-ready state before starting the next". The practical reality is, however, that dependencies outside the release train drive the pace at which any given feature can be developed. In particular, negotiation and implementation of interface contracts and provisioning of sample data is a key timing driver - particularly when the external dependency is to a part of the organisation that's running waterfall.

Overall wall structure

The columns represent sprints/iterations. They convey dates for the iteration and also denote any deployment or other significant dates which fall within it. On the image above you'll notice there's a public holiday in iteration 32 (the pink post-it) and a gateway checkpoint for enterprise release "1304" on the 28th of Jan. Iteration 33, on the other hand, has an independent deployment window (on the pink post-it for 17th Feb), a gateway checkpoint for enterprise release 1303, and a code-drop into enterprise release 1302.
The rows represent feature teams. The scrum-master of the team is responsible for keeping the content of the 'team row' up to date.
The "cells" represent a given team for a given iteration. In the top left corner of each, you'll see the "Planned Velocity" and "Committed Capacity" for the iteration for that team.

A note on capacity planning and budgeting

The train runs on a "cost per point" model, derived by summing the run cost per iteration of the combination of APMS, Deployment Services and the feature teams and dividing by the combined velocity of the feature teams.

Whilst this greatly simplifies the division of costs amongst active funding sources, it is reliant on confidence in velocity projections. Thus, a particular velocity point is used for calculation. As teams start to routinely exceed this velocity, a review cycle kicks in to determine whether the "planning velocity" can be uplifted. When the train commenced operation, this velocity was 40. 3 months later, it was raised to 45, and most recently revised up to 55. Shortfalls in individual teams are generally balanced out by overachievement in others, and by and large it works out with most epics coming in 10-20% under budget.
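As a rough illustration of the "cost per point" arithmetic, here is a minimal sketch. All of the dollar figures, the number of teams and the function name are hypothetical, and it assumes the planning velocity of 55 applies to each feature team individually, which is an assumption made for the example.

```python
def cost_per_point(run_costs: dict[str, float],
                   planning_velocities: dict[str, int]) -> float:
    """Total run cost per iteration divided by combined planning velocity."""
    combined_velocity = sum(planning_velocities.values())
    if combined_velocity <= 0:
        raise ValueError("combined planning velocity must be positive")
    return sum(run_costs.values()) / combined_velocity


# Hypothetical run costs per iteration for each part of the train.
run_costs = {
    "APMS": 30_000,
    "Deployment Services": 20_000,
    "Feature teams": 170_000,
}

# Assumed: four feature teams, each planned at a velocity of 55.
planning_velocities = {"Team A": 55, "Team B": 55, "Team C": 55, "Team D": 55}

rate = cost_per_point(run_costs, planning_velocities)  # 220,000 / 220 points
feature_estimate = 34 * rate  # cost charged to a hypothetical 34-point feature
print(rate, feature_estimate)
```

Because the divisor is the planning velocity rather than the actual velocity, teams that routinely beat the plan deliver points for less than the charged rate, which is consistent with epics coming in under budget.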

The other factor in planned velocity, of course, is planned leave and public holidays - which you'll see reflected in the planned velocities on the wall (quite varied with lots of annual leave during January in Australia).

Committed capacity (shown as "planned"), on the other hand, represents the "in-play" stories scheduled for the iteration. Where this exceeds planned velocity, it is either a "red-flag" for risk or an indication that the team is expecting a good iteration.
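The comparison of the two numbers in each cell is simple enough to sketch in code. The wall itself is physical, so the `Cell` structure, team names and figures below are purely hypothetical stand-ins for what is written in the top left corner of each cell.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One team/iteration cell on the wall (hypothetical representation)."""
    team: str
    iteration: int
    planned_velocity: int      # adjusted for leave and public holidays
    committed_capacity: int    # points of "in-play" stories scheduled

    @property
    def headroom(self) -> int:
        """Spare points; negative means the cell is overcommitted."""
        return self.planned_velocity - self.committed_capacity


def overcommitted(cells: list[Cell]) -> list[Cell]:
    """Cells worth a conversation: risk flag or a confident team."""
    return [c for c in cells if c.headroom < 0]


# Hypothetical figures for iteration 32.
wall = [
    Cell("Team A", 32, planned_velocity=38, committed_capacity=45),
    Cell("Team B", 32, planned_velocity=50, committed_capacity=42),
    Cell("Team C", 32, planned_velocity=27, committed_capacity=27),
]

flagged = [c.team for c in overcommitted(wall)]
spare = {c.team: c.headroom for c in wall if c.headroom > 0}
print(flagged, spare)
```

The check only tells you where to look. Whether a negative headroom is a risk or a sign of a team expecting a good iteration is still a conversation to have at the wall.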

What goes on the wall?

A card on the wall can be one of 4 things:

A (green) "Discover card" representing discovery work on an epic (as described in Part 3).
A (white) "Implementation card" representing implementation (or Evolve) work on a feature
A (pink) "Defect card" representing a production defect
A (blue) "Improvement card" representing implementation of an improvement

The Team/Iteration cell

The cards inside the cell represent the work the team has planned for the iteration. They run at the feature level, and are tagged with a couple of extra pieces of information:

How many points of work will be done on the feature that iteration (either on a post-it or in the top right corner on the card)
A "completion flag" if that is the iteration when the feature will complete.

We experimented with numerous grains for this representation - both more detailed (ie what will be happening for the feature rather than just how many points) and less detailed (feature cards only go in the iteration where they will complete). In the end, it was a tradeoff between how rich the information, how much maintenance overhead it required and how visually cluttered the space was (too much information obscuring what you really wanted to see).

Strategic Insights from the Feature Wall

Any wall is measured by the conversations it facilitates and the insights it generates. We'll talk a little more about conversations in the next section, but some of the key strategic insights are:

How far out is a team committed? Where do they have capacity available and how much? Very useful when looking at new demand and understanding the best team to take it on.
What features are active in a given iteration and how much effort is planned against the feature? One of the key uses of this is ensuring working agreements for availability of feature owners can be managed with good forewarning of the periods when they will be needed.
When is a feature due to complete? Very helpful again for ensuring feature level acceptance testing commitments have been established with feature owners.
Where are we overcommitted? Are teams confident or should we be looking at finding some stories from the feature that can be taken on by another team with capacity to make sure we hit our commitments?

Tactical Insights from the Feature Wall

The grain of the current iteration (shown in the photo above) is naturally more detailed than future iterations. Expected information is:

Iteration goal for the team (written on A4 at iteration planning, stuck on part of wall not depicted)
Health-check for each feature (red/green/amber dots)
Features where all planned work for the iteration has been completed (spanked tags)
Features at risk ("Risky business" tag)
Blocked Features ("Blocked by something" tag)
Features where the feature owner is not living up to engagement expectations ("AWOL Feature Owner" tag)

Communication and Cadence

This may seem a strange combination, but in our experience it is a very valid one. If you want a "self-organising program" rather than a group of teams, constant and effective communication is vital. The trick is making it happen, and in particular helping people recognise the times when it's needed and the value of it. What we have found is that the more we invest in "cadenced communication" the more we enable "constant communication".

At the time of writing, 2 primary forms of cadenced communication are well-established and 3 are in their formative stages:

"Unity Day" - Train level sprint kickoff session described in Part 3.
The "Daily Cocktail party" - extension of Scrum of Scrums
Discipline Chapters
Cadenced Backlog grooming
Cadenced Retrospectives

Discipline Chapters

Inspired by this Spotify article, the chapters meet weekly with a mission of growing the maturity and consistency of practices in a particular discipline (eg ScrumMaster, data movement, testing etc). For more detail, the referenced article is a great read.

Cadenced Backlog Grooming

I mentioned the trials and tribulations of backlog grooming maturity in part 3, and this is the most recent concept in growing maturity in the space. The concept is to schedule synchronised backlog grooming sessions either once per week or once per iteration for each team followed by a review and update session with the train leadership group on the outputs.

Cadenced Retrospectives

This is targeted at improving the "Inspect and Adapt" feedback cycle. A constant theme for the train is beating "siloed learning" and finding ways for teams to learn from each other. In brief, all teams hold their retrospectives at the same time, then the iteration is closed out in a follow-up session facilitated by the "Release Train Engineer" where the scrum-masters bring the learnings generated from their team retrospectives and share with each other.

The Daily Cocktail Party

This is without doubt the key communication vehicle for the train. On every morning other than the first day of the iteration, the first hour works like this:

8:45am - Leadership group standup at "Release Train continuous improvement wall"
9:00am - Tech Lead standup (at A0 model of warehouse with tags indicating areas of activity). Tech leads share on focus areas for the day, key technical challenges and inter-team dependencies
9:15am - All feature teams hold standups
9:30am - Scrum of Scrums at the Feature Wall. Attended by Scrum Masters, entire leadership team, APMS, Deployment Services and other team members as required. Scrum Masters speak to their current iteration cell on the feature wall and address the 3 questions for their team. Leaders and project managers listen for blockers, issues and risks.
9:45am - APMS and Deployment Services standups. Deployment Services include coverage of deployment related issues they heard at scrum of scrums while APMS balance their priorities for the day between moving cards on the Program kanban and providing support for issues raised at the scrum of scrums.

There was considerable debate over the time commitment involved in this, in particular whether it should occur daily. The investment has, however, reaped untold dividends. Not only does it provide superb visibility for senior leadership, but it triggers an immense amount of cross-team communication - "we've encountered that, we'll come visit and help" is a common catchcry.

Continuous Improvement

From a continuous improvement perspective, you want three things:

Teams figure out how to become better teams
The release train figures out how to become a better release train
Teams benefit from other teams' learning and innovation.

The first is, of course, covered off by the team retrospective. The other two, however, need attention. Built into each team's capacity planning is a 10% reservation for "innovation and contingency". Likewise, the leadership team builds into their time "10% for driving train-level improvement" through the function of team loco (introduced previously).

One of the keys here is treating improvement/innovation initiatives as first class citizens. The leadership runs an entire wall dedicated to their initiatives, and the innovations that feature teams commit to appear as innovation features on the feature wall.

Examples of leadership team improvements might be:

Introduce discipline chapters
Introduce cadenced backlog grooming
Engage with operations to negotiate simplification of the handover process and consolidation/simplification of support documentation

The most recent team level innovation initiative related to testing. Being a data warehouse, a significant proportion of the tests the teams wrote involved validating data integrity. One of the teams looked at it and said "we build all those rules into our data model documentation, I wonder if we can automate it". The result was to eliminate the need to implement the basic tests that were consuming 30-40% of a team's test automation effort, freeing them to put more effort into feature level test automation and providing a roughly 3 point per team per iteration uplift across the entire train.

The trick, of course, is getting an innovation from one team in use by all the others - it takes time for the team that created it to educate/support others in implementing it. So, we run an "innovation cup". Inspired in part by looking at the trophy the Rally dev teams win for hackathons, we got Rally to sponsor a trophy to be held by the team with the most recent winning innovation. To capture the trophy, a team not only needs to implement a good innovation but they need to have at least one other team who has implemented it successfully.

Conclusion

Prior to the introduction of the Release Train, the primary management vehicle for the program was a weekly 3 hour management meeting attended by program management, release management and project managers. It was supported by a 40+ page status report, and was often entirely disconnected from what was actually happening in the teams. It was followed 2 days later by a "senior management" program status meeting which ran another 2 hours dealing with escalations from the main meeting.

Both meetings are now entirely gone. For archiving purposes, the status report is still produced (as a Rally extract) but the knowledge of "what is really happening" comes from the "cocktail party" on a daily basis and the standard sprint ceremonies. Most importantly, the leadership group no longer have "managing the teams" as their primary mandate - instead they focus on finding the right way to support the teams in delighting their stakeholders. Status and planning discussions simply look at the question to be answered and pick the wall with the right grain to support the discussion.

In Part 5, I'll wrap the series up with a look at some of the quantitative results and key learnings from this group's journey into SAFe.

Friday, March 29, 2013

Scaled Agile Framework Applied 5/5 - Conclusion

Saturday, March 16, 2013

Scaled Agile Framework Applied 4/5 - In-play work and the program level Feature wall
