The ART of SAFe

Friday, March 29, 2013

Scaled Agile Framework Applied 5/5 - Conclusion

What you see is all there is

I've just finished reading a book called "Thinking Fast and Slow" by Daniel Kahneman. Set in the world of psychology and behavioural economics, it delves deeply into the role of cognitive bias in decision making. The driving theme in the book is the division and interplay between intuitive response and deliberate reasoning, with a particular focus on the fact that the influence of intuitive response on our reasoning is far more pervasive than we realise.

A concept Kahneman returns to repeatedly is WYSIATI - "what you see is all there is". In short, the vast majority of our reasoning is based on what is immediately apparent - unless something signals a warning sign that we should look deeper.

As I've been thinking through how to wrap this series up, one of the things I've been looking for is a "key takeaway" from my SAFe journey to date - and as I finished the book a few days ago it struck me.

For me, the single most powerful aspect of SAFe is visibility. Of course, lean and kanban practitioners will quite rightly suggest that this is not something SAFe contributes as the power of visualisation of work in progress (WIP) and flow has been central to lean for many years. What I feel SAFe gives us, however, is a structure and an approach to providing both strategic and tactical visibility:

At the portfolio level, I can visualise the WIP and flow of multi-million dollar initiatives, the application and ratification of my investment prioritisation and the distribution of these initiatives (or parts thereof) across my programs of work.
At the program level, I can visualise the WIP and flow of significant features as they travel through my gating and funding cycles and are eventually implemented and deployed.
At the team level, I can understand my flow in the context of the teams surrounding me.

At each level, I'm looking for different insights to support different decisions and generate different feedback. What I need is an ability to provide a simple and clear enough depiction of the state of my world to enable solid intuitive decision making and ring the warning bells that will trigger deliberate reasoning at the right times.

Achieving this has without doubt been the most significant contribution of SAFe to the BI COE.

Does structure kill agility?

When I speak to agilists about SAFe (or read their blogs), there is a prevailing concern that it looks far too structured and formal and will kill off the empowerment, collaboration and innovation that Agile seeks.

The simplest response to this is to resort to complexity theory. To quote Jurgen Appelo's Management 3.0 - "Without a boundary a system lacks the drive and constraints to organise itself". Or in Glenda Eoyang's Facilitating Organization Change ... "Just as a person needs time and space to incubate thoughts before a new idea can emerge, a system needs a bounded space for the emergence of new patterns."

For a single Scrum or Kanban team, the principle is "just enough structure to enable innovation and feedback". When you start to consider hundreds or thousands of people in hundreds of teams, "just enough structure" is a little more complex.

I would contend that SAFe provides a structure and a set of constraints and boundaries that facilitates growth and change. As the growth occurs, the formality and structure can be gradually peeled back to the bare essentials necessary to support effective participation in the enterprise lifecycle.

The challenge for the SAFe coach, of course, is guiding the tuning of the formality - recognising when it is both over and under constrained and assisting the group in selecting the right adjustments and collecting feedback on their impact.

What about the results?

At this point, it is too early to provide quantitative assessment on the impact of SAFe at the portfolio level for the BI COE. It has triggered many changes which have removed waste and improved flow for the demand management group. Two months in, the manager of the group felt he had recouped roughly two full-time employees' work by eliminating administrative waste. This was supporting a shift of focus from "keeping up with admin" to "pursuing insight and improving strategic foresight".

For the Strategic Delivery release train, however, there are some hard numbers. When she presents on their SAFe journey, one of the opening lines the general manager regularly uses is a quote an executive director issued about their group a few years ago - "they're the laughing stock of [enterprise] IT".

Following are her current metrics on improvements achieved in the past 12 months:

Average delivery cycle time down from 12 months to 3 months
Frequency of delivery increased from quarterly to fortnightly
Cost to deliver down 50%
100% of projects delivered on time and on budget
Happy project sponsors (NPS 29)
Happy teams (Team NPS 43)

My favourites are the last two. For those not familiar with NPS, it stands for "Net Promoter Score" - a system achieving growing popularity as a measure of customer loyalty. Customers are asked a question along the lines of "On a scale of 0 to 10, how likely are you to recommend us to a friend or colleague?" Respondents are classified as either a promoter (9 or 10), passive (7 or 8) or detractor (0 to 6). The NPS score is then calculated by subtracting the percentage of detractors from the percentage of promoters (eg 20% promoters, 70% passive, 10% detractor would yield a score of 20% - 10% = 10). In the employee context, the question becomes "How likely are you to recommend working as part of Strategic Delivery to a friend or colleague?"
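For readers who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the NPS calculation described above. The function names and the sample responses are purely illustrative assumptions; they are not taken from any survey tooling used by the group.

```python
def classify(score: int) -> str:
    """Classify a 0-10 survey response using the bands described above."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(responses: list[int]) -> float:
    """Return the percentage of promoters minus the percentage of detractors."""
    if not responses:
        raise ValueError("at least one response is required")
    labels = [classify(r) for r in responses]
    promoters = labels.count("promoter")
    detractors = labels.count("detractor")
    return 100.0 * (promoters - detractors) / len(responses)


# Illustrative sample only: 2 promoters, 7 passives and 1 detractor,
# mirroring the 20% / 70% / 10% example in the text.
sample = [9, 10, 7, 7, 8, 8, 7, 8, 7, 3]
print(net_promoter_score(sample))
```

Note that passives count towards the total number of respondents but never add to or subtract from the score, which is why a group full of "7 or 8" answers still scores zero.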

I believe that whether or not you are cheaper or faster is important but secondary - whether or not you are delighting your customers/stakeholders whilst building happy teams is the true measure of your success. Whilst there is no baseline measure on the "Happy sponsors" front, as she characterises it the baseline probably would have been somewhere close to -100. When it comes to happy teams the baseline was -20, showing a massive shift for the positive in just one year.

Conclusion

My time coaching the BI COE came to an end a couple of months ago. I believe any good coach has two missions - enable your customer's success and make yourself redundant. As 2012 came to an end it became apparent my time was done and I moved on to fresh challenges closer to the heart of the enterprise. It was hard to let go, and in many ways writing this series has been cathartic for me as I relived the journey.

Of course, it continues without me. The GM of Strategic Delivery has become such a passionate believer in scaled agile that she flew out to the USA in February to join the ranks of Dean's certified SAFe consultants, and she's decided that since I'm no longer writing about her world she'd better start. To stay in touch with the continuing journey of the EDW Agile Release Train, please visit her blog.

Saturday, March 16, 2013

Scaled Agile Framework Applied 4/5 - In-play work and the program level Feature wall

In Part 3, we covered the program backlog lifecycle. This post will focus on implementation life and feature level visualisation. We have found that the key ingredients are:

Visualisation
Communication
Cadence
Continuous Improvement

Visualisation

Since visualisation is the enabler of so much else, it's where we'll start. Finding the right way to visualise 'in play features' involved a series of failed experiments.

The first of these failures began with us saying "well, everything else follows a kanban system for visualisation, so why don't we build a feature kanban wall?" So, we identified a set of lifecycle states for the feature and built the wall. It looked great, but achieved nothing. No conversations triggered, no insights generated, simply maintenance overhead. We learnt two things: firstly that at this level our interest was more in a sprint based view than a lifecycle view, and secondly that we needed a finer grain.

The third incarnation delivered the answer. In large part, it was a logical extension of the 'PSI planning board' utilised to construct the overall view of the PSI during PSI planning in the standard guidance materials.

The wall is sprint/iteration based, and represents a rolling 10-sprint view of committed work in the teams. Whilst the full 10 sprints are rarely populated, it is necessary to cover the 'long tail' on enterprise deployments. The enterprise release process takes 5 sprints to conduct enterprise level shakeout and integration testing, during which time the team which built the work must maintain a typical 5-10 point per sprint capacity reservation to support testing and deployment preparation activities.

This is representative of a recurring theme throughout the lifecycle. The ideal is, of course, to minimize the number of features in flight and run a lifecycle of "start feature, build it fast, acceptance test it and leave it in a deployment-ready state before starting the next". The practical reality is, however, that dependencies outside the release train drive the pace at which any given feature can be developed. In particular, negotiation and implementation of interface contracts and provisioning of sample data is a key timing driver - particularly when the external dependency is to a part of the organisation that's running waterfall.

Overall wall structure

The columns represent sprints/iterations. They convey dates for the iteration and also denote any deployment or other significant dates which fall within it. On the image above you'll notice there's a public holiday in iteration 32 (the pink post-it) and a gateway checkpoint for enterprise release "1304" on the 28th of Jan. Iteration 33, on the other hand, has an independent deployment window (on the pink post-it for 17th Feb), a gateway checkpoint for enterprise release 1303, and a code-drop into enterprise release 1302.
The rows represent feature teams. The scrum-master of the team is responsible for keeping the content of the 'team row' up to date.
The "cells" represent a given team for a given iteration. In the top left corner of each, you'll see the "Planned Velocity" and "Committed Capacity" for the iteration for that team.

A note on capacity planning and budgeting

The train runs on a "cost per point" model, derived by summing the run cost per iteration of the combination of APMS, Deployment Services and the feature teams and dividing by the combined velocity of the feature teams.
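As a rough illustration of the "cost per point" arithmetic, the sketch below sums the per-iteration run costs and divides by the combined velocity of the feature teams. All dollar figures, team labels and function names are hypothetical, chosen only to make the division easy to follow; the real figures for the train are not given here.

```python
def cost_per_point(run_costs: dict[str, float], team_velocities: dict[str, float]) -> float:
    """Total run cost per iteration divided by the combined feature team velocity."""
    total_velocity = sum(team_velocities.values())
    if total_velocity <= 0:
        raise ValueError("combined velocity must be positive")
    return sum(run_costs.values()) / total_velocity


def charge_for(points_delivered: float, rate: float) -> float:
    """Amount attributed to a funding source for the points delivered against it."""
    return points_delivered * rate


# Hypothetical per-iteration run costs for the three groups named above.
run_costs = {
    "APMS": 60_000.0,
    "Deployment Services": 40_000.0,
    "Feature teams": 200_000.0,
}

# Hypothetical planning velocities for five feature teams.
team_velocities = {f"Team {n}": 40.0 for n in range(1, 6)}

rate = cost_per_point(run_costs, team_velocities)
print(rate)                  # cost of a single story point
print(charge_for(25, rate))  # cost attributed to an epic that consumed 25 points
```

The appeal of the model is visible in `charge_for`: once the rate is agreed, dividing an iteration's cost amongst any number of funding sources is a single multiplication per source.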

Whilst this greatly simplifies the division of costs amongst active funding sources, it is reliant on confidence in velocity projections. Thus, a particular velocity point is used for calculation. As teams start to routinely exceed this velocity, a review cycle kicks in to determine whether the "planning velocity" can be uplifted. When the train commenced operation, this velocity was 40. 3 months later, it was raised to 45, and most recently revised up to 55. Shortfalls in individual teams are generally balanced out by overachievement in others, and by and large it works out with most epics coming in 10-20% under budget.

The other factor in planned velocity, of course, is planned leave and public holidays - which you'll see reflected in the planned velocities on the wall (quite varied with lots of annual leave during January in Australia).

Committed capacity (shown as "planned"), on the other hand, represents the "in-play" stories scheduled for the iteration. Where this exceeds planned velocity, it is either a "red-flag" for risk or an indication that the team is expecting a good iteration.

What goes on the wall?

A card on the wall can be one of 4 things:

A (green) "Discover card" representing discovery work on an epic (as described in Part 3).
A (white) "Implementation card" representing implementation (or Evolve) work on a feature
A (pink) "Defect card" representing a production defect
A (blue) "Improvement card" representing implementation of an improvement

The Team/Iteration cell

The cards inside the cell represent the work the team has planned for the iteration. They run at the feature level, and are tagged with a couple of extra pieces of information:

How many points of work will be done on the feature that iteration (either on a post-it or in the top right corner on the card)
A "completion flag" if that is the iteration when the feature will complete.

We experimented with numerous grains for this representation - both more detailed (ie what will be happening for the feature rather than just how many points) and less detailed (feature cards only go in the iteration where they will complete). In the end, it was a tradeoff between how rich the information, how much maintenance overhead it required and how visually cluttered the space was (too much information obscuring what you really wanted to see).

Strategic Insights from the Feature Wall

Any wall is measured by the conversations it facilitates and the insights it generates. We'll talk a little more about conversations in the next section, but some of the key strategic insights are:

How far out is a team committed? Where do they have capacity available and how much? Very useful when looking at new demand and understanding the best team to take it on.
What features are active in a given iteration and how much effort is planned against the feature? One of the key uses of this is ensuring working agreements for availability of feature owners can be managed with good forewarning of the periods when they will be needed.
When is a feature due to complete? Very helpful again for ensuring feature level acceptance testing commitments have been established with feature owners.
Where are we overcommitted? Are teams confident or should we be looking at finding some stories from the feature that can be taken on by another team with capacity to make sure we hit our commitments?
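The questions above can be made concrete by treating the wall as a simple data structure. The following Python sketch is an assumption-laden model, not a description of any tool the train used (the wall was physical): the class names, function names, feature labels and point values are all hypothetical. Only the team names and iteration numbers are borrowed from elsewhere in this series.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Card:
    feature: str
    points: int              # points of work planned on the feature this iteration
    completes: bool = False  # the "completion flag"


@dataclass
class Cell:
    planned_velocity: int
    cards: list[Card] = field(default_factory=list)

    @property
    def committed(self) -> int:
        """Committed capacity: the sum of points on the cards in the cell."""
        return sum(card.points for card in self.cards)


# wall[team][iteration] -> Cell
Wall = dict[str, dict[int, Cell]]


def spare_capacity(wall: Wall, team: str) -> dict[int, int]:
    """Where does a team have capacity available, and how much?"""
    return {it: cell.planned_velocity - cell.committed
            for it, cell in sorted(wall[team].items())}


def active_features(wall: Wall, iteration: int) -> dict[str, int]:
    """Which features are active in an iteration, and how many points are planned?"""
    totals: dict[str, int] = {}
    for cells in wall.values():
        cell = cells.get(iteration)
        if cell is None:
            continue
        for card in cell.cards:
            totals[card.feature] = totals.get(card.feature, 0) + card.points
    return totals


def completion_iteration(wall: Wall, feature: str) -> Optional[int]:
    """When is a feature due to complete? None if no card carries the flag."""
    for cells in wall.values():
        for it, cell in cells.items():
            if any(c.feature == feature and c.completes for c in cell.cards):
                return it
    return None


def overcommitted(wall: Wall) -> list[tuple[str, int]]:
    """Cells where committed capacity exceeds planned velocity."""
    return sorted((team, it)
                  for team, cells in wall.items()
                  for it, cell in cells.items()
                  if cell.committed > cell.planned_velocity)


# Hypothetical wall content for two teams over two iterations.
wall: Wall = {
    "Maglev": {
        32: Cell(30, [Card("Feature A", 20), Card("Feature B", 15)]),
        33: Cell(40, [Card("Feature A", 10, completes=True)]),
    },
    "Astrotrain": {
        32: Cell(35, [Card("Feature B", 20)]),
        33: Cell(40),
    },
}

print(spare_capacity(wall, "Maglev"))
print(active_features(wall, 32))
print(completion_iteration(wall, "Feature A"))
print(overcommitted(wall))
```

An overcommitted cell is only a prompt for a conversation. As noted earlier, it may be a risk flag or simply a team expecting a good iteration, and the model cannot tell the two apart.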

Tactical Insights from the Feature Wall

The grain of the current iteration (shown in the photo above) is naturally more detailed than future iterations. Expected information is:

Iteration goal for the team (written on A4 at iteration planning, stuck on part of wall not depicted)
Health-check for each feature (red/green/amber dots)
Features where all planned work for the iteration has been completed (spanked tags)
Features at risk ("Risky business" tag)
Blocked Features ("Blocked by something" tag)
Features where the feature owner is not living up to engagement expectations ("AWOL Feature Owner" tag)

Communication and Cadence

This may seem a strange combination, but in our experience it is a very valid one. If you want a "self-organising program" rather than a group of teams, constant and effective communication is vital. The trick is making it happen, and in particular helping people recognise the times when it's needed and the value of it. What we have found is that the more we invest in "cadenced communication" the more we enable "constant communication".

At the time of writing, 2 primary forms of cadenced communication are well-established and 3 are in their formative stages:

"Unity Day" - Train level sprint kickoff session described in Part 3.
The "Daily Cocktail party" - extension of Scrum of Scrums
Discipline Chapters
Cadenced Backlog grooming
Cadenced Retrospectives

Discipline Chapters

Inspired by this Spotify article, the chapters meet weekly with a mission of growing the maturity and consistency of practices in a particular discipline (eg ScrumMaster, data movement, testing etc). For more detail, the referenced article is a great read.

Cadenced Backlog Grooming

I mentioned the trials and tribulations of backlog grooming maturity in part 3, and this is the most recent concept in growing maturity in the space. The concept is to schedule synchronised backlog grooming sessions either once per week or once per iteration for each team followed by a review and update session with the train leadership group on the outputs.

Cadenced Retrospectives

This is targeted at improving the "Inspect and Adapt" feedback cycle. A constant theme for the train is beating "siloed learning" and finding ways for teams to learn from each other. In brief, all teams hold their retrospectives at the same time, then the iteration is closed out in a follow-up session facilitated by the "Release Train Engineer" where the scrum-masters bring the learnings generated from their team retrospectives and share with each other.

The Daily Cocktail Party

This is without doubt the key communication vehicle for the train. On every morning other than the first day of the iteration, the first hour works like this:

8:45am - Leadership group standup at "release Train continuous improvement wall"
9:00am - Tech Lead standup (at A0 model of warehouse with tags indicating areas of activity). Tech leads share on focus areas for the day, key technical challenges and inter-team dependencies
9:15am - All feature teams hold standups
9:30am - Scrum of Scrums at the Feature Wall. Attended by Scrum Masters, entire leadership team, APMS, Deployment Services and other team members as required. Scrum Masters speak to their current iteration cell on the feature wall and address the 3 questions for their team. Leaders and project managers listen for blockers, issues and risks.
9:45am - APMS and Deployment Services standups. Deployment Services include coverage of deployment related issues they heard at scrum of scrums while APMS balance their priorities for the day between moving cards on the Program kanban and providing support for issues raised at the scrum of scrums.

There was considerable debate over the time commitment involved in this, in particular whether it should occur daily. The investment has, however, reaped untold dividends. Not only does it provide superb visibility for senior leadership, but it triggers an immense amount of cross-team communication - "we've encountered that, we'll come visit and help" is a common catchcry.

Continuous Improvement

From a continuous improvement perspective, you want three things:

Teams figure out how to become better teams
The release train figures out how to become a better release train
Teams benefit from other teams' learning and innovation.

The first is, of course, covered off by the team retrospective. The other two, however, need attention. Built into each team's capacity planning is a 10% reservation for "innovation and contingency". Likewise, the leadership team builds into their time "10% for driving train-level improvement" through the function of team loco (introduced previously).

One of the keys here is treating improvement/innovation initiatives as first class citizens. The leadership runs an entire wall dedicated to their initiatives, and the innovations that feature teams commit to appear as innovation features on the feature wall.

Examples of leadership team improvements might be:

Introduce discipline chapters
Introduce cadenced backlog grooming
Engage with operations to negotiate simplification of the handover process and consolidation/simplification of support documentation

The most recent team level innovation initiative related to testing. Being a data warehouse, a significant proportion of the tests the teams wrote involved validating data integrity. One of the teams looked at it and said "we build all those rules into our data model documentation, I wonder if we can automate it". The result was to eliminate the need to implement basic tests that were consuming 30-40% of a team's test automation effort, freeing them to put more effort into feature level test automation and providing a roughly 3 point per team per iteration uplift across the entire train.

The trick, of course, is getting an innovation from one team in use by all the others - it takes time for the team that created it to educate/support others in implementing it. So, we run an "innovation cup". Inspired in part by looking at the trophy the Rally dev teams win for hackathons, we got Rally to sponsor a trophy to be held by the team with the most recent winning innovation. To capture the trophy, a team not only needs to implement a good innovation but they need to have at least one other team who has implemented it successfully.

Conclusion

Prior to the introduction of the Release Train, the primary management vehicle for the program was a weekly 3 hour management meeting attended by program management, release management and project managers. It was supported by a 40+ page status report, and was often entirely disconnected from what was actually happening in the teams. It was followed 2 days later by a "senior management" program status meeting which ran another 2 hours dealing with escalations from the main meeting.

Both meetings are now entirely gone. For archiving purposes, the status report is still produced (as a Rally extract) but the knowledge of "what is really happening" comes from the "cocktail party" on a daily basis and the standard sprint ceremonies. Most importantly, the leadership group no longer have "managing the teams" as their primary mandate - instead they focus on finding the right way to support the teams in delighting their stakeholders. Status and planning discussions simply look at the question to be answered and pick the wall with the right grain to support the discussion.

In Part 5, I'll wrap the series up with a look at some of the quantitative results and key learnings from this group's journey into SAFe.

Sunday, February 24, 2013

Scaled Agile Framework 3/5 - Program level pipeline management and the program kanban

Introduction

Part 2 of the series concluded with the final stage of the demand management process – 1 or more Epics generated from the COE level initiative and handed off to the relevant delivery groups. In this post we will explore the lifecycle of an Epic as it travels through the delivery group functioning as an Agile Release Train – Strategic Delivery.

As described in Part 1, the adoption of the Agile Release Train in Strategic Delivery commenced in early 2012 (April). Whilst it is being continually refined, the implementation has gone through a number of step changes in approach.

After describing the initial adoption context and approach and covering some of the early experiences, the post will conclude by illustrating the current operating model.

Initial adoption context

At the time of adoption, the group was modelled roughly as follows:

5 project-based Scrum teams all with the same team-shape, following a reasonably similar model and in the final stages of delivery for their projects
1 project-based ‘pseudo-Kanban’ team with 3-4 months remaining on delivery of their project
2 stakeholder-aligned ‘pseudo-Kanban’ teams with very different team shapes and 4-5 months of work remaining in their pipeline
A newly formed ‘System Team’ working on implementing a continuous integration capability
10-15 projects running under an outsourced/offshored waterfall model
A group of system analysts fulfilling requirements and design elaboration for the waterfall projects
18 project managers

The initial approach was defined in a series of workshops between the general manager of the delivery group and her extended leadership team. The objectives of the workshops were:

Establish a shared understanding of SAFe fundamentals
Determine the initial implementation model and approach to transitioning/maturing beyond initial implementation
Determine the most effective organisational structure for the group

Upon conclusion, the group had reached agreement on the following plan:

Immediately establish an Agile Release Train involving the 5 Scrum teams, converting them from project based to stable feature teams. All new work entering the group would be fed to the train.
Manage the outsourced waterfall projects through to conclusion.
Allow the ‘pseudo-Kanban’ teams to conclude their current commitments, whilst gradually reshaping them to be ready to join the train upon completion.

The organisation to support this had the following structure:

Agile Pipeline Management Services (APMS) – composed of the group of project managers from the Scrum teams and the system analysts. Primary responsibility: Manage preparation of the pipeline of work for PSI’s and make sure the teams had the support they needed in delivery.
Development Services – run by the equivalent of the “Release Train Engineer” and composed of the feature teams. Primary responsibility: Deliver PSI outcomes
Deployment Services – run by the deployment manager and incorporating the System Team. Primary responsibility: Liaise with enterprise release management and operations and ensure smooth deployment and ongoing continuous integration capability uplift.
Transition Services – composed of the ‘pseudo-Kanban’ teams and the outsourced waterfall projects/PM’s. Primary responsibility: Ensure graceful conclusion of inflight work whilst preparing the teams for transition onto the train.

Early Days

Some insights into the early experiences of the leadership group in the implementation can be found in this earlier post, but I’ll delve here into a number of key activities.

For the first month or so, it was ‘business as usual’ for the feature teams. They were already operating on the same iteration cadence, and all had a number of sprints committed in order to deliver on existing commitments. The primary focus area was the formation of the APMS group, but before delving into that I’ll briefly cover the feature teams.

Early days for the feature teams

The feature teams basically had two things to do:

Become ‘stable feature teams’ instead of project teams
Normalise sizing.

The first was great fun and fairly trivial. Formerly known by the reference number for the project they were working on (ie the ‘5531 team’), they were asked to select a name for themselves. The name had to be ‘train related’ and ‘safe for work’ – so we wound up with teams such as Maglev and Astrotrain. A simple change but immensely powerful when it comes to sense of identity.

The second was painful. The concept of story point normalisation is perhaps one of the most controversial in SAFe. Having been reluctantly convinced of the need by Dean, I still found that no matter how carefully you approach it you will still be impacted by most of the evils used to argue against the concept. We began by using the past 8 iterations’ metrics for every team to construct a normalisation formula. Simplistically, some teams downshifted their sizing one notch on the Fibonacci scale and others upshifted by one. It was deceptively simple, and one of the key lessons learnt in the time since is that we should have far more actively coached and retrospected on sizing practices.
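To show what "shifting one notch on the Fibonacci scale" means mechanically, here is a small Python sketch. It covers only the shift itself; how the normalisation formula was derived from the 8 iterations of metrics is not described here, so that part is not modelled. The scale values, team labels and per-team shifts are hypothetical assumptions.

```python
# A commonly used modified Fibonacci sizing scale (an assumption; the teams'
# actual scale is not stated).
FIBONACCI = [1, 2, 3, 5, 8, 13, 21]


def shift_size(points: int, notches: int) -> int:
    """Move a story size up or down the scale, clamping at either end."""
    index = FIBONACCI.index(points)  # raises ValueError for off-scale sizes
    new_index = min(max(index + notches, 0), len(FIBONACCI) - 1)
    return FIBONACCI[new_index]


# Hypothetical adjustments: one team sizes "high" and shifts down a notch,
# another sizes "low" and shifts up a notch.
TEAM_SHIFT = {"Team A": -1, "Team B": +1}


def normalise(team: str, points: int) -> int:
    """Apply the team's agreed shift; teams without an entry are left unchanged."""
    return shift_size(points, TEAM_SHIFT.get(team, 0))


print(normalise("Team A", 8))
print(normalise("Team B", 8))
print(normalise("Team C", 8))
```

The sketch also hints at why the approach was deceptively simple: a one-notch shift is a different proportional change at every point on the scale, so it cannot substitute for coaching teams towards genuinely shared sizing practices.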

Early days for APMS

The APMS team had 4 initial objectives:

Establish the Epic Kanban system reflecting both the active work in the feature teams and work in the pipeline
Prepare the ‘work in pipe’ and logistics for the first PSI
Work with the rest of the COE (in particular the PMO) to agree governance, financial and other details of how the release train would mesh with the broader enterprise.
Continue to provide project management support to the feature teams

APMS Challenge 1 – Team Formation

Whilst APMS was composed of PM’s, System Analysts and an architect, there was a ‘no exception’ stance on the release train being ‘Agile top to bottom’. So, the team had a scrum-master and an aspiration to work as a cross-functional team with shared commitments, standups, retrospectives, transparency and accountability to the feature teams. Today, it is a high-performance team with an exceptional team ethic – but the journey was more challenging than for any other agile team I’ve ever worked with. In the end, it required a far larger change of mindset for the team members than for a typical dev team and involved significant tuning of team balance and membership.

APMS Challenge 2 – Feature Grooming: How much is enough?

The second challenge lay in the approach to grooming features for the PSI. The reality of the funding model and the dependency complexity of the work required a fair amount of analysis and estimation work to create ‘ready to play’ features. The best way to summarise this issue lies in the fundamental paradox of SAFe. Part of the reason the model is so powerful is because when you present it to program/project managers and architects they can readily map it to their existing mental models. On the other hand, the most common criticism I hear from agilists is that it ‘looks too formal and RUP-like’. What we experienced was that the intent of the model is too easily interpreted in practice into waterfall-like behaviours.

The original feature grooming model involved the APMS group identifying the business intent and features for the Epic, resolving the high level architecture and feeding it through a feature team for ‘enough discovery work to size it to +-30% confidence estimates’. The teams would thus hold a portion of their velocity available to be dedicated to these activities, which would be done in parallel with fulfilling their delivery commitments. What eventuated was the analysts/architects in the APMS group identifying and specifying the features and handing fully articulated designs to the teams who then simply estimated the design they’d been handed and handed their estimates back to the PM. This caused no end of problems, and was eventually heavily course corrected as you’ll see once we reach the current-day model.

APMS Challenge 3 – The Product Owner construct

Prior to the adoption of the release train, finding the right product owners for agile initiatives had been a serious challenge for the group. Given the nature of a data warehouse, there are extremely diverse groups of stakeholders for any given initiative and finding someone with sufficient diversity of domain knowledge, sufficient availability and sufficient empowerment had been entirely unsuccessful. Dean’s “Product Manager/Product Owner” separation was the key to unlocking the puzzle. Whilst we couldn’t employ it exactly as specified in the framework, the underlying principle of separation of concerns was a huge enabler. In standard SAFe, the Product Manager is basically the overall decision maker on prioritisation and scope for the release train whilst the product owner travels with the team. In our context, there could be 15 different funding sources active at once and prioritisation and scope had to belong to the sponsors providing the money.

The solution adopted was to employ “Epic Owners” and “Feature Owners”. The Epic Owner was the person providing the funding for an Epic, and was looked to for engagement at key points and strategic prioritisation and scope calls for their Epic. The Feature Owner, on the other hand, was to have deep domain expertise on a particular feature, have delegated authority within the scope of the feature and be far more actively engaged with the team whilst their feature was in play. The construct resolved a vast number of issues, particularly in addressing the contrast between availability and authority.

APMS Challenge 4: Can we really do PSI’s?

I’ve left the biggest challenge for last. SAFe deliberately separates the deployment cycle from the PSI cycle. Whilst the PSI produces a ‘potentially shippable increment’, the framework allows for either deployment on the PSI boundary, multiple PSI’s before a deployment or multiple deployments within a PSI. Our intent at launch was to run 8-week PSI’s and use the PSI as a planning boundary with multiple deployments happening in any given PSI.

The first difficulty we encountered was feature shaping. Given that the standard guidance for a feature is ‘small enough to be delivered in a single PSI but large enough to represent a significant outcome for the business’, there was a clash between good feature shaping and deployment window requirements. Available deployment windows were specified by enterprise release management, with 1 per month for minor deployments without significant integration requirements and a varying cadence for ‘enterprise deployments’ with major integration involved. Further, the deployment window for a feature was generally dictated by external dependencies. It was impossible to find a PSI cadence which aligned well with deployment boundaries, and shaping features to PSI’s was also largely impossible as we would regularly encounter features which would need to be deployed 1 iteration into a PSI but needed 2-3 iterations to implement.

Feature shaping was difficult but not insurmountable – in the end, we could have resolved it by allowing ‘cross-PSI’ features whilst still maintaining strong size limitations. However, the ‘bridge-too-far’ lay in establishment of sufficient funded pipeline. Once again driven by enterprise level gating cycles and funding processes, an 8 week PSI that fully funded and committed the entire release train was just not emerging.

By this point, however, we had established significant momentum with every aspect of the release train other than the PSI launch. The pipe was operating, the work was flowing into the teams, deployments were going out the door and momentum was tangible. One large leadership workshop later, we made the call: “know that PSIs will start eventually, but in the meantime develop compensating mechanisms and keep maturing the release train”.

Our focus then turned to compensating mechanisms – what were the key things we lost by not having the PSI, and how could we utilise our knowledge of the principles to achieve the purpose? To my mind, we identified the first 3 of the 5 key ingredients initially:

The power of having the whole release train in a planning workshop together from the perspective of developing a sense of ‘release train as one large team with shared outcomes’ as well as ‘release train as team of teams’
Employing a deliberate set of disciplines to groom out 4 iterations worth of backlogs at the release train level and ensure that visibility, dependency, risks and issues were fully articulated at the story level for all active features.
Release train level ‘Inspect and Adapt’ cycles
Maintenance of a strategic view of capability buildout on the platform that effectively capitalises on synergies between features.
Strategic showcasing and communication of PSI level outcomes to draw together the ‘whole of product’ view of progress for broad stakeholder groups and avoid fragmented and isolated demonstration and communication feature by feature/epic by epic to narrow stakeholder groups.

I’d like to write that we rapidly implemented effective compensation strategies, but I’d be lying. Being brutally honest, I’d suggest that we initiated effective strategies for the first and third items immediately. Time and painful lessons triggered focus on the second and fourth, and the fifth is currently in the spotlight.

The first strategy was the concept of ‘PSI-lite’, better known as ‘Unity Day’: a 1-hour ‘whole of train’ session at the start of each sprint aimed at building a sense of ‘train as team’ and ensuring everyone started the sprint on the same page. These have been incredibly powerful from a cultural transformation perspective. They are always attended by numerous visitors, and repeat attendees regularly comment on the tangible change in atmosphere over time. (For a deeper treatment, see this post.)

The second strategy involved the establishment of ‘Team Loco 131’, composed of the extended leadership group and functioning as a ‘program level continuous improvement team’. Their stories are generated through learning generated by various forms of retrospective, and the leaders are fully bought into the lean management model of taking responsibility for improving the system of work.

Whilst the story of addressing backlog grooming challenges is too long and torturous to describe here, I will offer the following. Many organisations find the investment of putting a whole program and set of stakeholders in a room for 2 days planning every 8-10 weeks unpalatable. For Strategic Delivery, this commitment was one of the major sticking points in management workshops – although the general manager bought in and was willing to support it, most of her leadership group felt it was unjustifiable. I’ve heard similar stories from many other SAFe Program Consultants. Our journey, painful lessons and ongoing investment required to compensate in the area of release train level backlog grooming/planning have utterly convinced me of the vital importance of this activity. To put my systems thinking hat on, failure to implement effective PSI planning creates an insane amount of failure demand across the entire release train. Weighing the cost of the failure demand against the investment required for the PSI planning days provides a scale very heavily tipped towards the investment.

The Program Kanban

Before digging into the specifics of how the Program (or Epic) kanban operates today, there are two final points to cover off. The first involves a holistic view across both the initiative level kanban system and the epic level. Astute readers will notice a number of glaring inefficiencies between the two. Whilst I’ll cover this in a little more detail in the concluding post for the series, I will offer a reminder at this time that the purpose of explicit visualisation in a kanban is to bring to light waste and opportunity.

The second point is that whilst this blog series focuses on wall visualisations, every wall depicted is replicated in Rally with far more underlying detail. The Rally Portfolio Management capability is a key enabler in effective management and measurement of the system, whilst the walls are vital for tactile insight and facilitation of communication.

The Epic Kanban is, obviously, dedicated to the flow of an Epic through the release train value chain. Every card on the wall is an Epic. The value chain is divided into 5 phases, each with a number of states:

Initiate Phase: Transition from demand management and connection with stakeholders.
Discovery Phase: Feature elaboration and estimation refinement
Launch Phase: Approvals, funding and preparation to drop into teams for implementation
Evolve Phase: Implementation
Operate Phase: Deployment and follow-through to ensure happy users and successful transition to operations.

One of the primary aspects of the phases is to indicate the flow of responsibility between APMS, Development Services and Deployment Services. For any given phase, one group will be ‘moving the Epic’ and the others will be either supporting or being kept informed. You'll notice on the photo there are phases which have "rows" as well as "columns". The rows indicate teams, whilst the columns indicate states. Any state which has rows is one where a feature team has the ball - you'll notice photos in the start of each swimlane in the "Prepare Discover" state indicating which team is which.

On the day PSI’s become viable for this train, the phases and states will change very little. Initiate & Discover effectively involve the identification, sizing and prioritisation of features in preparation for PSI injection and maintenance of roadmap, Launch equates to locking in PSI content and executing PSI planning, and Evolve is delivery of the PSI.

The Initiate Phase

This phase is led by APMS and basically loads the Epic onto the train. The primary goals are as follows:

Execute handover with the demand management group (‘PMO’ and ‘Validate Entry’ states)
Connect with the Epic Owner, ensure their value drivers for the epic are clearly articulated and introduce them to the operating model (‘Envision’ state)
Identify candidate feature owners (‘Envision’ state)
Determine the feature team which will execute discovery, the sprint when it will occur, and ensure logistics are organised for discovery to proceed smoothly. (‘Prepare discovery’ state)

This phase is the primary evidence of the response to ‘APMS Challenge 2’. It explicitly limits the amount and nature of the work done by APMS before connecting the Epic and Feature owners with the teams for further analysis. In effect, it eliminated roughly 70% of the preparation work that was initially being conducted.

Looking at the wall, you'll see one epic that's just landed from demand management (with the smurf avatar on it indicating the project manager who's pulled the card). One is in the envisioning state, one has completed envisioning but is blocked from being pulled into discovery preparation (the big red "X" on it). Finally, there is one epic being prepared to drop into Discovery with the Astrotrain team.

The Discovery Phase

Discovery is led by the feature teams, and involves working with the Epic and Feature owners to ‘discover the features’ and explore the feature level requirements sufficiently to articulate and size the outcomes. A feature team will typically have roughly 10% of their sprint capacity dedicated to discovery work, with discovery for most epics concluding within a single sprint. The flow of states is as follows:

Kickoff Discovery: Workshop with Epic Owner and all Feature Owners to gain understanding of the vision for the epic, determine the exploration workshops required, identify key risks/issues and ensure the stakeholders understand the discovery process.
Explore Features: Detailed workshops with feature owners to validate the feature composition, high level stories and feature level acceptance criteria.
Consolidation: Consolidation of the outputs from feature exploration workshops, estimation and preparation of a high level release plan.
Discovery Outcome: Presentation back to Epic and Feature Owners of consolidated discovery outputs for validation and agreement of outcomes

The key outcome of Discovery is a set of prioritised, well-groomed features which have been estimated to a ±30% level of confidence and are ‘ready to play’ for implementation. The other potential outcome is an ‘Epic split’. Once an Epic starts to look larger than half a million dollars, the preferred approach is to identify the highest priority features for initial implementation and carve off the remainder into a new Epic for a future release. In PSI parlance, it’s making sure no more than roughly a PSI worth of features are progressed at once on the Epic.

Looking at the wall, you'll see Astrotrain finalising Discovery for one Epic, Kaizen (3rd row) finalising one and kicking off another and Maglev consolidating one set of outputs whilst kicking off another. It's unusual for a team to have more than one Epic in Discovery at a time; in this case it's likely that they are small Epics with minimal feature elaboration required.
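To make the ‘Epic split’ rule described above a little more concrete, here is a minimal Python sketch. It is an illustration only, not how the train actually does it: the Feature fields, the example features and their dollar figures are all invented, and the only number taken from this post is the half-million-dollar threshold. It also assumes the split is a simple walk down the priority order, which in reality involves far more conversation.

```python
from dataclasses import dataclass

# Threshold taken from the post: roughly half a million dollars per Epic.
EPIC_SPLIT_THRESHOLD = 500_000


@dataclass
class Feature:
    name: str      # hypothetical feature name
    priority: int  # 1 = highest priority
    estimate: int  # discovery estimate in dollars (invented figures)


def split_epic(features, threshold=EPIC_SPLIT_THRESHOLD):
    """Return (initial, deferred) lists of features.

    Features are taken in priority order until the next one would push the
    running total over the threshold. That feature, and everything below it
    in priority, is carved off for a follow-on Epic. The highest priority
    feature is always kept, even if it exceeds the threshold on its own.
    """
    ordered = sorted(features, key=lambda f: f.priority)
    initial, deferred = [], []
    total = 0
    for feature in ordered:
        fits = not initial or total + feature.estimate <= threshold
        if not deferred and fits:
            initial.append(feature)
            total += feature.estimate
        else:
            deferred.append(feature)
    return initial, deferred


features = [
    Feature("Churn report", priority=2, estimate=250_000),
    Feature("Sales dashboard", priority=1, estimate=200_000),
    Feature("Data quality alerts", priority=3, estimate=150_000),
    Feature("Export tweak", priority=4, estimate=10_000),
]
initial, deferred = split_epic(features)
print([f.name for f in initial])
print([f.name for f in deferred])
```

Note the deliberate choice not to ‘back-fill’ with small low-priority features once the threshold is hit; the point of the split is to protect priority order, not to pack the Epic as full as possible.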

Launch Phase

This phase is reasonably self-explanatory. The focus returns to APMS, who look after gaining the appropriate approvals of the discovery outcomes and funding to proceed. Whilst in an ideal world this phase would be traversed rapidly to achieve a smooth flow from discovery to evolve, the reality is epics regularly sit here for significant periods whilst business cases gain approval and enterprise gating cycles complete. It has generated one simple and obvious insight – the better the Initiate and Discovery phases are executed, the shorter and smoother the launch phase process.

Launch concludes with evolve preparation. The feature team or teams to execute evolve are identified, logistics are organised for the release planning session at the start of evolve, the commencement sprint is locked in and the feature teams re-familiarise themselves with the discovery outcomes.

The key aspect to evolve preparation is feature team selection. Obviously, the ideal scenario is for the entire epic to land with a single feature team and for the evolve feature team to be the same one which executed discovery. However, reality intervenes. Whilst evolve capacity availability is utilised in selecting the discovery team, when the launch phase is protracted that team is often not available for evolve. In these scenarios, some capacity is reserved in the backlog of the discovery team to ensure they are available to support the kickoff and rapid knowledge uptake of the evolve team(s). Likewise, where the Epic is too large to be executed by a single feature team, work will be spread across feature teams on a feature by feature basis.

Looking at the photo, you'll notice a few things. Firstly, queues are very visible :) Secondly, although the APMS team runs a kanban system for their work (as hinted at by the epic level cumulative flow diagram and control chart on the wall), they also track the sprint (or iteration) cadence. Their priorities on any given day are driven by where in the sprint cadence the feature teams are. Finally you'll see the cards with post-its on them ('1', '2' and '4'). These would have been identified at the standup that day as needing urgent attention.

Evolve Phase

Implementation and deployment of the epic will be covered in the next post, so I will keep this section intentionally light. The states are as follows:

Plan Release: Execute the release planning which would normally occur in a PSI planning session. The stories identified in discovery are broken down into ‘sprint-size’ and scheduled across sprints based on available capacity to create an initial release plan. One of the key outcomes of this workshop is establishment of working agreements with the feature owners. They know which sprints their feature will be active in, when acceptance testing for their feature will occur and can plan their availability accordingly.
Implement: Build and acceptance testing of features.
Integrate and Prep Deployment: This phase has a split personality depending on deployment type. For an ‘independent deployment’, it takes roughly a week and involves standard integration, deployment hardening and operational handover preparation activities. For deployments associated with an enterprise release (75-80%), it lasts 4-5 sprints. The work cannot be accelerated, as the timing and nature of involvement is dependent on enterprise level co-ordination. Teams thus wind up with a trickle-feed capacity reservation during this time.

Looking at the wall, you'll see Astrotrain juggling 3 Epics in implementation, with a 'stakeholder concern flag' on one probably giving hints as to why another Epic is being prepared to drop into the team. Moving down the board, Jacobite are implementing a single (large) Epic, Kaizen have one in implementation and are most likely getting close to done there based on the fact they're in release planning on a second. Maglev have a more even spread, with 5 epics at various stages of Evolve.

Fans of limiting WIP (presumably most readers of this blog) are probably looking at this picture and saying "why aren't they finishing one Epic before starting the next". My first, somewhat glib, response is "isn't it great that it's so obvious". On a more serious note, in a highly integrated world you often can't go at full pace on a single initiative. It has fast times and slow times as you synchronise with your interface partners and resolve other such dependencies. Having the flexibility to deal with this has been a very important aspect of the framework success.

Operate Phase

Under the leadership of deployment services, the epic is deployed and in operation. Whilst numerous ‘tidy-up’ activities around production verification testing and governance closure occur here, the key component introduced has been the Epic retrospective. This closes a significant gap in the PSI Inspect and Adapt cycle. Whilst numerous ‘release train’ feedback cycles existed, what was missing was the stakeholder learning. The feature team holds a retrospective with their epic and feature owners looking at the ‘whole of lifecycle experience’, and an invaluable source of strategic learnings is introduced.

The second state is ‘Spanked’. It is the cause of much hilarity amongst visitors, but has become part of the cultural identity of the release train. Originally introduced for some personality on a team kanban wall by an English scrum-master, it stands for ‘done done’. The Epic’s in production, the customers are happy, and the team has ‘spanked it’.

Looking at the wall, it's fairly easy to conclude that it's not long since a major deployment window.

Conclusion

The daily rhythm of all the groups will be covered in the next post, but it’s worth making a few concluding comments here. The first is to play back to Dean’s concept of the release train as a ‘self organising program’. Effective visualisation of the epic lifecycle is a vital ingredient, as it generates so much ‘knowledge at a glance’ of the state of play of the entire value chain. Within seconds, the group can identify what epics are where in the value chain, who has carriage, where the queues are and what needs focus. And of course, it enables metric analysis through cumulative flow and cycle time charting at the Epic level. All of this across a system of work that at any given time generally has 30-40 epics somewhere in the value chain.
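For readers who like to see the arithmetic behind ‘where are the queues’ and cycle time, the following is a rough Python sketch. The card data is entirely invented and the field names are my own; in practice these numbers come out of the tooling rather than a hand-rolled script. It simply counts unfinished cards per phase and measures elapsed days for finished ones.

```python
from collections import Counter
from datetime import date

# Hypothetical snapshot of Epic cards; names, phases and dates are invented.
cards = [
    {"epic": "Epic A", "phase": "Operate", "started": date(2013, 1, 7), "finished": date(2013, 3, 4)},
    {"epic": "Epic B", "phase": "Launch", "started": date(2013, 1, 21), "finished": None},
    {"epic": "Epic C", "phase": "Launch", "started": date(2013, 2, 4), "finished": None},
    {"epic": "Epic D", "phase": "Evolve", "started": date(2013, 2, 4), "finished": None},
]


def wip_by_phase(cards):
    """Count unfinished cards sitting in each phase (the 'queues' view)."""
    return Counter(c["phase"] for c in cards if c["finished"] is None)


def cycle_times(cards):
    """Elapsed days from start to finish, for finished cards only."""
    return {
        c["epic"]: (c["finished"] - c["started"]).days
        for c in cards
        if c["finished"] is not None
    }


print(wip_by_phase(cards))
print(cycle_times(cards))
```

Taking the same snapshot each day and stacking the per-phase counts over time is all a cumulative flow diagram really is.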

I'll close with a hint of things to come when we talk about results in the final post for the series. You might remember a mention of 18 project managers in the initial context at the beginning of the post - there are now 3.

PS Part 4 in the series is now posted and deals with the program level feature wall and in-play work.

Sunday, February 10, 2013

Scaled Agile Framework Applied 2/5 - Demand Management and the Portfolio Kanban

Introduction

As described in the introductory post, the implementation being described is not managing an ‘enterprise level’ portfolio. Before delving into the operation of the portfolio kanban, it’s important to understand the functional design of the group.

They operate a “Business Intelligence Centre of Excellence (COE)”, and the relevant operational units within the COE are as follows:

PMO - Demand management and financial governance
Strategic Delivery - Delivery of functionality against the enterprise data warehouse (EDW) and the strategic technology stack
Legacy Delivery - Delivery of functionality against a group of legacy warehouses being managed through end of life
Express Delivery - Tactical solutions

Demand comes from three primary sources, with requests ranging in size from less than $50K to multiple million:

Projects conceived elsewhere in the enterprise which impact applications under management by the COE. Impact may include a requirement for new functionality or prevention of breakage to existing functionality
Investment in pursuit of a strategic roadmap based on an annual funding process
Smaller ad hoc requests for reporting and analytic capability initiated by business users

The SAFe framework is designed to have a single portfolio management layer with multiple programs or ‘release trains’ operating beneath it. In this context, the PMO operates the portfolio layer and each delivery group functions as a program below it.

The Strategic Delivery group (covered in Parts 3 & 4 of the series) is a fully operational release train. Legacy operates as an outsourced/offshored waterfall delivery program, and Express delivery is in transition from a waterfall lifecycle to kanban.

From a maturity/timeline perspective, SAFe was introduced in early 2012 at the program level in Strategic Delivery. Adoption at the portfolio layer was commenced in late 2012, and the Legacy and Express groups are now re-conceiving their program layers as lean value chains.

The Requirement Hierarchy

We need one last piece of context (and some key learnings) before delving into how it works – nomenclature. One of the things that resonates for a lot of people is the fact that Dean puts some hard and fast names on the framework. You have Epics at the portfolio level which are then divided into and prioritised within Investment Themes. Epics are decomposed into Features at the program level, which are further broken down into stories at the team level. For many who are constantly confused with local hierarchies between themes, features, minimal marketable features (MMF’s), epics and stories the idea of having a clear-cut enterprise-wide hierarchy is great. Dean tends to recommend that if you have additional layers you introduce ‘sub-Epics’ and ‘sub-Features’ but it all still makes sense.

So far so good, but now for the learning part. Our initial vision was too small and too localised. We viewed Strategic Delivery as the portfolio layer, and also lacked co-ordination between the various groups applying SAFe across the enterprise, winding up with 4 different hierarchies. The coming months will see some hot debates about which naming hierarchy wins :)

In this context, the hierarchy is as follows:

An Initiative represents any demand which enters the COE demand management funnel (correlating to an Epic in the standard SAFe framework)
Initiatives are decomposed into 1 or more Epics which align to the Delivery group boundaries (correlating to Features in the standard framework)
Delivery groups then decompose Epics into Features for Delivery (which are realistically sub-Features from the perspective of standard SAFe)
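Since the local names are offset by one level from the standard framework, a small lookup can help when reading the rest of this series. This is only a sketch of the mapping listed above; the function name is my own.

```python
# Local COE term -> closest term in the standard SAFe hierarchy,
# as described in the list above.
LOCAL_TO_SAFE = {
    "Initiative": "Epic",
    "Epic": "Feature",
    "Feature": "sub-Feature",
}


def to_safe_term(local_term):
    """Translate a local COE requirement level to its standard SAFe name."""
    return LOCAL_TO_SAFE[local_term]


for local in ("Initiative", "Epic", "Feature"):
    print(f"{local} (local) ~ {to_safe_term(local)} (standard SAFe)")
```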

The role of the Portfolio Kanban

SAFe Portfolio Kanban

At a high level, the concept is as follows:

Ideas are dropped into the funnel
An initial assessment takes place to determine the rough size and value proposition of each idea
Ideas which pass the ROI criteria for investment versus value proposition are approved and go into a queue for more detailed assessment
Further refinement of the idea takes place to provide greater confidence on the estimate and value proposition and decompose it into smaller chunks for distribution to the programs required for delivery
These smaller chunks (Features) then compete in a more fine-grained prioritisation queue for capacity in the delivery programs.

The primary role of the portfolio layer can thus be summarised as:

Manage the prioritisation of investment ideas
Elaborate and decompose ideas into smaller pieces of work aligned to delivery program capabilities and manage the distribution of these to delivery programs

The Portfolio Kanban Applied

We took a fairly classic approach to implementing this with the PMO demand management group - composed of a number of senior BA’s with architecture support. We spent a couple of workshops mapping their existing value-stream, introduced a lean flow visualisation and daily standups at the demand wall and began to tune from the generated learnings.

The demand management kanban is divided into 4 main phases:

Impact Determination
Solution & Costing
Communicate & Engage
End-States (Delivery or Termination)

Impact Determination Phase

This phase is effectively the ‘idea funnel’ management section of Dean’s diagram. It is where potential demand is first assessed to determine whether it qualifies for further investigation. Demand arrives in ‘Initial Assessment’, which is typically processed at the standup. The team discusses the Initiative, and will either decide that it can be discarded, is a definite need, or requires further investigation. Definite needs proceed immediately to the entry state of the Solution & Costing phase, discarded initiatives are moved to ‘No Impact’, and those which need further information move to the Validation state for clarification.
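The routing out of ‘Initial Assessment’ is effectively a three-way decision, which can be sketched in a few lines of Python. The enum and function names are invented for illustration, and I am assuming that ‘Validate Entry’ is the entry state of the Solution & Costing phase.

```python
from enum import Enum


class Triage(Enum):
    """Possible outcomes of discussing an Initiative at the standup."""
    DISCARD = "discard"
    DEFINITE_NEED = "definite need"
    NEEDS_INVESTIGATION = "needs investigation"


# Outcome -> next state on the wall. 'Validate Entry' as the entry state of
# Solution & Costing is an assumption made for this sketch.
NEXT_STATE = {
    Triage.DISCARD: "No Impact",
    Triage.DEFINITE_NEED: "Validate Entry",
    Triage.NEEDS_INVESTIGATION: "Validation",
}


def route_from_initial_assessment(decision):
    """Return the state an Initiative moves to after Initial Assessment."""
    return NEXT_STATE[decision]


print(route_from_initial_assessment(Triage.NEEDS_INVESTIGATION))
```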

Solution & Costing Phase

Mapping fairly closely to the ‘Backlog’ and ‘Analysis’ phases in Dean’s diagram, the goal of the Solution & Costing phase is to determine a solution direction and a rough cost (±75%) for the initiative.

The ‘Validate Entry’ state is basically used as a queue to control when an analyst will pick the initiative up. Qualification for exit from this state will be a combination of timing alignment with other enterprise groups involved in the initiative and analyst capacity. On a complex initiative, there may be as many as 10 other delivery groups involved and analyst WIP is managed by holding the gate here until there is enough alignment to effectively move forward.

‘Understand Need’ involves the analyst gaining enough information about the initiative to determine a solution direction and rough costing. Whether this takes the form of workshops, business requirements review, or some other means will vary based on the source and nature of the demand. By the conclusion of the phase, the high-level architecture will be understood as will the COE delivery groups involved and the nature of the functionality to be delivered by each.

‘Cost’ is typically very quickly transitioned. Simpler initiatives may be estimated by a pair of analysts, whilst more complex ones may involve escalation to a solution direction forum for costing. One of the current focus areas is the introduction of ‘T-shirt sizing’ to further simplify this state.

Communicate & Engage Phase

This is basically the boundary line between Dean’s ‘Evaluation’ and ‘Implementation’ phases. It is here that the first significant investment decision is made. Whilst we now have a rough cost estimate, this must be further refined (to ±30%) before funding approval is obtained for implementation. Given that the cost to obtain this confidence refinement is more significant, the preliminary costs/benefits are evaluated to ratify this further investment. If the initiative does not stack up, it will be withdrawn. If it does, the first funding increment will be supplied and the required Epics will be created and handed over to the delivery programs for further refinement.
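To show why the refinement matters to the investment decision, here is a small worked example in Python. The $200K point estimate is invented; the 75% and 30% tolerances are the ones quoted in this post. I am assuming the tolerance is applied symmetrically around the point estimate.

```python
def estimate_range(point_estimate, tolerance_pct):
    """Return the (low, high) dollar range for a symmetric +/- tolerance.

    Integer arithmetic is used so whole-dollar figures stay exact.
    """
    low = point_estimate * (100 - tolerance_pct) // 100
    high = point_estimate * (100 + tolerance_pct) // 100
    return low, high


rough = estimate_range(200_000, 75)    # Solution & Costing confidence
refined = estimate_range(200_000, 30)  # confidence required before funding
print(rough)    # (50000, 350000)
print(refined)  # (140000, 260000)
```

On these made-up numbers the uncertainty band shrinks from $300K wide to $120K wide, which is the confidence the further investment in refinement is buying.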

Operating Rhythm of the Demand Management Team

The team is spread geographically across 3 states, and collaborates through both a physical kanban wall and a deeper level of detail captured in the Rally portfolio management functionality. Remote team members dial into the daily standup, and take care of updating the electronic wall whilst central members take care of updates to the physical wall.

There is then a twice-weekly standup attended by members of the delivery programs to share information on the state of the pipe and smooth the flow of demand understanding. All have access to the electronic initiative portfolio, and this standup provides a rich opportunity to capture concerns and convey additional insights.

Additional formal solution direction and steering committee forums operate to provide an escalation and strategic guidance overlay.

What's on a card?

Each initiative in the portfolio becomes a card on the wall. They are initially printed from Rally with whatever information is known at the time, then as significant information emerges it appears on the cards. In the image above, you'll notice the following:

Who owns the card (standard kanban avatars)
Which delivery programs have a part to play in the initiative (the SD, LL and EXP tags)
The current estimate (on a post-it for easy updating)

Various other post-its will decorate the wall indicating blocked initiatives, information on delays and the like. This is of course more richly backed in Rally through a combination of detail, discussion and attachments.

Portfolio Prioritisation

SAFe specifies the use of “Weighted shortest job first” (WSJF) for prioritisation at all layers of the framework. Full details can be found at http://scaledagileframework.com/wsjf/, but in brief this utilises a ratio between the value proposition (expressed as ‘Cost of Delay’) and the size of the piece of work. The cost of delay is a combination of ‘Business Value’, ‘Timing Value’ and ‘Risk Reduction/Opportunity Enablement’.

In my experience, this has been one of the critical enablers at the enterprise scale. Traditionally, agile delivery is utterly focussed on the delivery of business value – preferably quantified, but at minimum in the eye of the product owner. At scale, you need more levers. In particular, timing value is crucial. In classical product development, it focuses on such things as the value of releasing a feature in time for an industry event such as a tradeshow or Gartner review cycle. However, it is also extremely useful once you start to consider dependencies. When pieces of functionality need to be co-ordinated for simultaneous release across multiple delivery programs, the timing and opportunity enablement value allows you to start to visualise the chance you will hold up the delivery of significant business value in other dependent initiatives.

In the COE, timing value is the dominant influence on prioritisation. In effect, the vast bulk of prioritisation is driven by compliance to enterprise release schedules and dependent pieces of work. Initiatives with high localised value are then fed through the system as capacity is available above and beyond that required to meet external timing pressures.
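For those who prefer to see the WSJF ratio as code, here is a minimal Python sketch. It assumes the ‘combination’ that makes up cost of delay is a simple sum of the three components, each scored on a relative scale. The initiative names and scores are invented purely to illustrate how a high timing value can pull an item to the front of the queue.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str              # hypothetical initiative name
    business_value: int    # relative score
    timing_value: int      # relative score
    risk_opportunity: int  # Risk Reduction/Opportunity Enablement score
    size: int              # relative job size

    @property
    def cost_of_delay(self):
        # Assumption: cost of delay is the plain sum of the three components.
        return self.business_value + self.timing_value + self.risk_opportunity

    @property
    def wsjf(self):
        return self.cost_of_delay / self.size


def prioritise(initiatives):
    """Order initiatives by WSJF, highest first."""
    return sorted(initiatives, key=lambda i: i.wsjf, reverse=True)


backlog = [
    Initiative("Local sales report", business_value=8, timing_value=1, risk_opportunity=1, size=5),
    Initiative("Enterprise release feed", business_value=3, timing_value=13, risk_opportunity=5, size=5),
    Initiative("Platform upgrade", business_value=5, timing_value=8, risk_opportunity=8, size=13),
]

for item in prioritise(backlog):
    print(f"{item.name}: CoD={item.cost_of_delay}, WSJF={item.wsjf:.2f}")
```

In this made-up backlog the enterprise release feed comes out on top despite having the lowest business value, because its timing value dominates its cost of delay.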

Benefits to-date

In an idealised implementation of SAFe for commercial product development, one assumes a centralised pool of investment funding. Effective application of the portfolio layer provides you with the following:

The use of investment themes to provide a structure to divide your overall funding into key investment areas. For further information on this, see either Investment Themes or Baghai's investment horizon model as described in a previous post
The WSJF model to prioritise investment initiatives within these themes
A flow-based framework for progressing elaboration and delivery increment identification which minimises work in process and effectively feeds work to delivery programs "just in time"

The application being described, however, faces different opportunities and challenges. The most significant of these challenges is the lack of a centralised pool of funding. The COE faces the daunting task of dealing with an extremely fragmented funding model and knitting together demand from dozens of funding sources into a strategic build-out of the core information platform whilst progressively decommissioning legacy applications.

Whilst this level of the implementation is still in its infancy, it is already yielding significant benefit – primarily through the power of visualisation and communication. Week by week the demand management team finds itself able to lift focus from a “project by project” view to a living “whole of portfolio” view. Patterns of demand are becoming visible which will lead to more effective strategic decision-making and far more synergistic prioritisation and implementation once exploited.

Further, as waste is eliminated from the value chain, more time becomes available to focus on exploiting these new insights. Less than 3 months into operation, the team has already eliminated close to 2 FTE’s time in administration and dramatically improved communication and co-ordination both between team members and with the delivery groups.

The next instalment

In the next instalment, we will pick up where the demand management team leaves us, with the entry of an Epic to the Strategic Delivery release train.