The ART of SAFe: Scaled Agile Framework Applied 4/5 - In-play work and the program level Feature wall

In Part 3, we covered the program backlog lifecycle. This post will focus on implementation life and feature level visualisation. We have found that the key ingredients are:

Visualisation
Communication
Cadence
Continuous Improvement

Visualisation

Since visualisation is the enabler of so much else, it's where we'll start. Finding the right way to visualise 'in play features' involved a series of failed experiments.

The first of these failures began with us saying "well, everything else follows a kanban system for visualisation, so why don't we build a feature kanban wall?" So, we identified a set of lifecycle states for the feature and built the wall. It looked great, but achieved nothing. No conversations triggered, no insights generated, simply maintenance overhead. We learnt two things: firstly, that at this level our interest was more in a sprint-based view than a lifecycle view, and secondly, that we needed a finer grain.

The third incarnation delivered the answer. In large part, it was a logical extension of the 'PSI planning board' utilised to construct the overall view of the PSI during PSI planning in the standard guidance materials.

The wall is sprint/iteration based, and represents a rolling 10-sprint view of committed work in the teams. Whilst the full 10 sprints are rarely populated, it is necessary to cover the 'long tail' on enterprise deployments. The enterprise release process takes 5 sprints to conduct enterprise level shakeout and integration testing, during which time the team which built the work must maintain a typical 5-10 point per sprint capacity reservation to support testing and deployment preparation activities.

This is representative of a recurring theme throughout the lifecycle. The ideal is, of course, to minimize the number of features in flight and run a lifecycle of "start feature, build it fast, acceptance test it and leave it in a deployment-ready state before starting the next". The practical reality is, however, that dependencies outside the release train drive the pace at which any given feature can be developed. In particular, negotiation and implementation of interface contracts and provisioning of sample data is a key timing driver - particularly when the external dependency is to a part of the organisation that's running waterfall.

Overall wall structure

The columns represent sprints/iterations. They convey dates for the iteration and also denote any deployment or other significant dates which fall within it. On the image above you'll notice there's a public holiday in iteration 32 (the pink post-it) and a gateway checkpoint for enterprise release "1304" on the 28th of Jan. Iteration 33, on the other hand, has an independent deployment window (on the pink post-it for 17th Feb), a gateway checkpoint for enterprise release 1303, and a code-drop into enterprise release 1302.
The rows represent feature teams. The scrum-master of the team is responsible for keeping the content of the 'team row' up to date.
The "cells" represent a given team for a given iteration. In the top left corner of each, you'll see the "Planned Velocity" and "Committed Capacity" for the iteration for that team.

A note on capacity planning and budgeting

The train runs on a "cost per point" model, derived by summing the run cost per iteration of the combination of APMS, Deployment Services and the feature teams and dividing by the combined velocity of the feature teams.
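As a rough illustration of the arithmetic, here is a minimal sketch of the cost per point calculation. The function name, team names and dollar figures are all hypothetical and chosen only to make the sum easy to follow; they are not the train's real numbers.

```python
def cost_per_point(run_costs_per_iteration, team_velocities):
    """Divide the total run cost of one iteration by the combined team velocity."""
    total_cost = sum(run_costs_per_iteration.values())
    combined_velocity = sum(team_velocities.values())
    if combined_velocity <= 0:
        raise ValueError("combined velocity must be positive")
    return total_cost / combined_velocity


# Hypothetical run costs per iteration (illustrative figures only).
run_costs = {
    "APMS": 20_000,
    "Deployment Services": 10_000,
    "Feature teams": 90_000,
}

# Hypothetical velocities used for the calculation, one per feature team.
velocities = {"Team A": 40, "Team B": 45, "Team C": 35}

rate = cost_per_point(run_costs, velocities)
print(f"Cost per point: {rate:.2f}")
```

A funding source that consumes a given number of points in an iteration would then be charged that number multiplied by the rate.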

Whilst this greatly simplifies the division of costs amongst active funding sources, it is reliant on confidence in velocity projections. Thus, a particular velocity point is used for calculation. As teams start to routinely exceed this velocity, a review cycle kicks in to determine whether the "planning velocity" can be uplifted. When the train commenced operation, this velocity was 40. Three months later, it was raised to 45, and it was most recently revised up to 55. Shortfalls in individual teams are generally balanced out by overachievement in others, and by and large it works out with most epics coming in 10-20% under budget.

The other factor in planned velocity, of course, is planned leave and public holidays - which you'll see reflected in the planned velocities on the wall (quite varied with lots of annual leave during January in Australia).

Committed capacity (shown as "planned"), on the other hand, represents the "in-play" stories scheduled for the iteration. Where this exceeds planned velocity, it is either a "red-flag" for risk or an indication that the team is expecting a good iteration.
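The comparison between the two numbers in the corner of each cell can be sketched as a small check. This is only an illustration of the rule described above; the function name, the labels it returns and the sample numbers are hypothetical.

```python
def cell_status(planned_velocity, committed_capacity):
    """Compare the two numbers in the corner of a team/iteration cell.

    Returns a (label, points) pair, where points is the size of the gap.
    """
    headroom = planned_velocity - committed_capacity
    if headroom < 0:
        # Either a risk red flag or a team expecting a good iteration.
        return ("over-committed", -headroom)
    if headroom == 0:
        return ("fully committed", 0)
    return ("capacity available", headroom)


# Hypothetical cells: (planned velocity, committed capacity).
print(cell_status(32, 38))
print(cell_status(40, 32))
print(cell_status(40, 40))
```

The check cannot tell risk from confidence; that still takes a conversation with the team at the wall.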

What goes on the wall?

A card on the wall can be one of 4 things:

A (green) "Discover card" representing discovery work on an epic (as described in Part 3).
A (white) "Implementation card" representing implementation (or Evolve) work on a feature
A (pink) "Defect card" representing a production defect
A (blue) "Improvement card" representing implementation of an improvement

The Team/Iteration cell

The cards inside the cell represent the work the team has planned for the iteration. They run at the feature level, and are tagged with a couple of extra pieces of information:

How many points of work will be done on the feature that iteration (either on a post-it or in the top right corner on the card)
A "completion flag" if that is the iteration when the feature will complete.

We experimented with numerous grains for this representation - both more detailed (i.e. what will be happening for the feature rather than just how many points) and less detailed (feature cards only go in the iteration where they will complete). In the end, it was a trade-off between how rich the information was, how much maintenance overhead it required and how visually cluttered the space was (too much information obscuring what you really wanted to see).

Strategic Insights from the Feature Wall

Any wall is measured by the conversations it facilitates and the insights it generates. We'll talk a little more about conversations in the next section, but some of the key strategic insights are:

How far out is a team committed? Where do they have capacity available and how much? Very useful when looking at new demand and understanding the best team to take it on.
What features are active in a given iteration and how much effort is planned against the feature? One of the key uses of this is ensuring working agreements for availability of feature owners can be managed with good forewarning of the periods when they will be needed.
When is a feature due to complete? Very helpful again for ensuring feature level acceptance testing commitments have been established with feature owners.
Where are we overcommitted? Are teams confident or should we be looking at finding some stories from the feature that can be taken on by another team with capacity to make sure we hit our commitments?
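For readers who keep a digital mirror of the wall, the questions above map onto simple queries. The sketch below is one hypothetical way to model it: the `Card` and `Cell` classes, the function names, and the teams, features and numbers in the example are all assumptions for illustration, not a description of any tool we used.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    feature: str
    points: int              # points of work planned on the feature this iteration
    completes: bool = False  # the "completion flag"


@dataclass
class Cell:
    team: str
    iteration: int
    planned_velocity: int
    cards: list = field(default_factory=list)

    @property
    def committed(self):
        return sum(card.points for card in self.cards)


def spare_capacity(cells):
    """Where do teams have capacity available, and how much?"""
    return {
        (c.team, c.iteration): c.planned_velocity - c.committed
        for c in cells
        if c.committed < c.planned_velocity
    }


def active_features(cells, iteration):
    """Which features are active in an iteration, and how much effort is planned?"""
    effort = {}
    for c in cells:
        if c.iteration == iteration:
            for card in c.cards:
                effort[card.feature] = effort.get(card.feature, 0) + card.points
    return effort


def completion_iteration(cells, feature):
    """When is a feature due to complete? None if no completion flag is set."""
    for c in sorted(cells, key=lambda cell: cell.iteration):
        if any(card.feature == feature and card.completes for card in c.cards):
            return c.iteration
    return None


def overcommitted(cells):
    """Where are we overcommitted?"""
    return [(c.team, c.iteration) for c in cells if c.committed > c.planned_velocity]


# Hypothetical wall content.
wall = [
    Cell("Team A", 32, 30, [Card("Feature X", 20), Card("Feature Y", 15)]),
    Cell("Team A", 33, 40, [Card("Feature X", 10, completes=True)]),
    Cell("Team B", 32, 35, [Card("Feature Z", 35, completes=True)]),
]

print(spare_capacity(wall))
print(active_features(wall, 32))
print(completion_iteration(wall, "Feature X"))
print(overcommitted(wall))
```

None of this replaces the physical wall; the value is in the conversations that happen in front of it.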

Tactical Insights from the Feature Wall

The grain of the current iteration (shown in the photo above) is naturally more detailed than future iterations. Expected information is:

Iteration goal for the team (written on A4 at iteration planning, stuck on part of wall not depicted)
Health-check for each feature (red/green/amber dots)
Features where all planned work for the iteration has been completed (spanked tags)
Features at risk ("Risky business" tag)
Blocked Features ("Blocked by something" tag)
Features where the feature owner is not living up to engagement expectations ("AWOL Feature Owner" tag)

Communication and Cadence

This may seem a strange combination, but in our experience it is a very valid one. If you want a "self-organising program" rather than a group of teams, constant and effective communication is vital. The trick is making it happen, and in particular helping people recognise the times when it's needed and the value of it. What we have found is that the more we invest in "cadenced communication" the more we enable "constant communication".

At the time of writing, 2 primary forms of cadenced communication are well-established and 3 are in their formative stages:

"Unity Day" - Train level sprint kickoff session described in Part 3.
The "Daily Cocktail party" - extension of Scrum of Scrums
Discipline Chapters
Cadenced Backlog grooming
Cadenced Retrospectives

Discipline Chapters

Inspired by this Spotify article, the chapters meet weekly with a mission of growing the maturity and consistency of practices in a particular discipline (eg ScrumMaster, data movement, testing etc). For more detail, the referenced article is a great read.

Cadenced Backlog Grooming

I mentioned the trials and tribulations of backlog grooming maturity in Part 3, and this is the most recent concept in growing maturity in the space. The concept is to schedule synchronised backlog grooming sessions either once per week or once per iteration for each team, followed by a review and update session with the train leadership group on the outputs.

Cadenced Retrospectives

This is targeted at improving the "Inspect and Adapt" feedback cycle. A constant theme for the train is beating "siloed learning" and finding ways for teams to learn from each other. In brief, all teams hold their retrospectives at the same time, then the iteration is closed out in a follow-up session facilitated by the "Release Train Engineer" where the scrum-masters bring the learnings generated from their team retrospectives and share with each other.

The Daily Cocktail Party

This is without doubt the key communication vehicle for the train. On every morning other than the first day of the iteration, the first hour works like this:

8:45am - Leadership group standup at "Release Train continuous improvement wall"
9:00am - Tech Lead standup (at A0 model of warehouse with tags indicating areas of activity). Tech leads share on focus areas for the day, key technical challenges and inter-team dependencies
9:15am - All feature teams hold standups
9:30am - Scrum of Scrums at the Feature Wall. Attended by Scrum Masters, entire leadership team, APMS, Deployment Services and other team members as required. Scrum Masters speak to their current iteration cell on the feature wall and address the 3 questions for their team. Leaders and project managers listen for blockers, issues and risks.
9:45am - APMS and Deployment Services standups. Deployment Services include coverage of deployment related issues they heard at scrum of scrums, while APMS balance their priorities for the day between moving cards on the Program kanban and providing support for issues raised at the scrum of scrums.

There was considerable debate over the time commitment involved in this, in particular whether it should occur daily. The investment has, however, reaped untold dividends. Not only does it provide superb visibility for senior leadership, but it triggers an immense amount of cross-team communication - "we've encountered that, we'll come visit and help" is a common catchcry.

Continuous Improvement

From a continuous improvement perspective, you want three things:

Teams figure out how to become better teams
The release train figures out how to become a better release train
Teams benefit from other teams' learning and innovation.

The first is, of course, covered off by the team retrospective. The other two, however, need attention. Built into each team's capacity planning is a 10% reservation for "innovation and contingency". Likewise, the leadership team builds into their time "10% for driving train-level improvement" through the function of team loco (introduced previously).

One of the keys here is treating improvement/innovation initiatives as first class citizens. The leadership runs an entire wall dedicated to their initiatives, and the innovations that feature teams commit to appear as innovation features on the feature wall.

Examples of leadership team improvements might be:

Introduce discipline chapters
Introduce cadenced backlog grooming
Engage with operations to negotiate simplification of the handover process and consolidation/simplification of support documentation

The most recent team level innovation initiative related to testing. Being a data warehouse, a significant proportion of the tests the teams wrote involved validating data integrity. One of the teams looked at it and said "we build all those rules into our data model documentation, I wonder if we can automate it". The result was to eliminate the need to implement basic tests that were consuming 30-40% of a team's test automation effort, freeing them to put more effort into feature level test automation and providing a roughly 3 point per team per iteration uplift across the entire train.
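To give a flavour of the idea, here is a minimal sketch of rule-driven integrity checking. It is not the team's actual implementation: the rule format, the two rule types (`not_null`, `unique`), the function names and the sample data are all assumptions made for illustration. The point is simply that once rules are captured as data, the basic tests no longer need to be hand-written.

```python
def not_null(rows, column):
    """Return the indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def unique(rows, column):
    """Return the indexes of rows whose column value repeats an earlier row."""
    seen, duplicates = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


# Hypothetical registry mapping a rule name to the function that checks it.
CHECKS = {"not_null": not_null, "unique": unique}


def run_rules(rules, tables):
    """Apply every declared rule and collect (table, column, rule, bad rows)."""
    failures = []
    for rule in rules:
        check = CHECKS[rule["rule"]]
        bad_rows = check(tables[rule["table"]], rule["column"])
        if bad_rows:
            failures.append((rule["table"], rule["column"], rule["rule"], bad_rows))
    return failures


# Hypothetical rules, as they might be extracted from data model documentation.
rules = [
    {"table": "customer", "column": "id", "rule": "not_null"},
    {"table": "customer", "column": "id", "rule": "unique"},
    {"table": "customer", "column": "name", "rule": "not_null"},
]

# Hypothetical sample data standing in for a warehouse table.
tables = {
    "customer": [
        {"id": 1, "name": "Ann"},
        {"id": 2, "name": None},
        {"id": 2, "name": "Bob"},
    ]
}

failures = run_rules(rules, tables)
for failure in failures:
    print(failure)
```

In a real warehouse the checks would more likely be generated as queries against the database rather than run over in-memory rows, but the shape of the approach is the same.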

The trick, of course, is getting an innovation from one team in use by all the others - it takes time for the team that created it to educate/support others in implementing it. So, we run an "innovation cup". Inspired in part by looking at the trophy the Rally dev teams win for hackathons, we got Rally to sponsor a trophy to be held by the team with the most recent winning innovation. To capture the trophy, a team not only needs to implement a good innovation but they need to have at least one other team who has implemented it successfully.

Conclusion

Prior to the introduction of the Release Train, the primary management vehicle for the program was a weekly 3 hour management meeting attended by program management, release management and project managers. It was supported by a 40+ page status report, and was often entirely disconnected from what was actually happening in the teams. It was followed 2 days later by a "senior management" program status meeting which ran another 2 hours dealing with escalations from the main meeting.

Both meetings are now entirely gone. For archiving purposes, the status report is still produced (as a Rally extract) but the knowledge of "what is really happening" comes from the "cocktail party" on a daily basis and the standard sprint ceremonies. Most importantly, the leadership group no longer have "managing the teams" as their primary mandate - instead they focus on finding the right way to support the teams in delighting their stakeholders. Status and planning discussions simply look at the question to be answered and pick the wall with the right grain to support the discussion.

In Part 5, I'll wrap the series up with a look at some of the quantitative results and key learnings from this group's journey into SAFe.

Saturday, March 16, 2013
