Sunday, February 24, 2013

Scaled Agile Framework 3/5 - Program level pipeline management and the program kanban


Part 2 of the series concluded with the final stage of the demand management process – 1 or more Epics generated from the COE level initiative and handed off to the relevant delivery groups.  In this post we will explore the lifecycle of an Epic as it travels through the delivery group functioning as an Agile Release Train – Strategic Delivery.

As described in Part 1, the adoption of the Agile Release Train in Strategic Delivery commenced in early 2012 (April).  Whilst it is being continually refined, the implementation has gone through a number of step changes in approach. 

After describing the initial adoption context and approach and covering some of the early experiences, the post will conclude by illustrating the current operating model.

Initial adoption context

At the time of adoption, the group was modelled roughly as follows:
  • 5 project-based Scrum teams all with the same team-shape, following a reasonably similar model and in the final stages of delivery for their projects
  • 1 project-based ‘pseudo-Kanban’ team with 3-4 months remaining on delivery of their project
  • 2 stakeholder-aligned ‘pseudo-Kanban’ teams with very different team shapes and 4-5 months of work remaining in their pipeline
  • A newly formed ‘System Team’ working on implementing a continuous integration capability
  • 10-15 projects running under an outsourced/offshored waterfall model
  • A group of system analysts fulfilling requirements and design elaboration for the waterfall projects
  • 18 project managers
The initial approach was defined in a series of workshops between the general manager of the delivery group and her extended leadership team. The objectives of the workshops were:
  • Establish a shared understanding of SAFe fundamentals
  • Determine the initial implementation model and approach to transitioning/maturing beyond initial implementation
  • Determine the most effective organisational structure for the group
Upon conclusion, the group had reached agreement on the following plan:
  • Immediately establish an Agile Release Train involving the 5 Scrum teams, converting them from project based to stable feature teams. All new work entering the group would be fed to the train.
  • Manage the outsourced waterfall projects through to conclusion.
  • Allow the ‘pseudo-Kanban’ teams to conclude their current commitments, whilst gradually reshaping them to be ready to join the train upon completion.
The organisation to support this had the following structure:
  • Agile Pipeline Management Services (APMS) – composed of the group of project managers from the Scrum teams and the system analysts. Primary responsibility: Manage preparation of the pipeline of work for PSI’s and make sure the teams had the support they needed in delivery.
  • Development Services – run by the equivalent of the “Release Train Engineer” and composed of the feature teams. Primary responsibility: Deliver PSI outcomes
  • Deployment Services – run by the deployment manager and incorporating the System Team. Primary responsibility: Liaise with enterprise release management and operations and ensure smooth deployment and ongoing continuous integration capability uplift.
  • Transition Services – composed of the ‘pseudo-Kanban’ teams and the outsourced waterfall projects/PM’s. Primary responsibility: Ensure graceful conclusion of inflight work whilst preparing the teams for transition onto the train.

Early Days

Some insights into the early experiences of the leadership group in the implementation can be found in this earlier post, but I’ll delve here into a number of key activities.

For the first month or so, it was ‘business as usual’ for the feature teams. They were already operating on the same iteration cadence, and all had a number of sprints committed in order to deliver on existing commitments. The primary focus area was the formation of the APMS group, but before delving into that I’ll briefly cover the feature teams.

Early days for the feature teams

The feature teams basically had two things to do:
  • Become ‘stable feature teams’ instead of project teams 
  • Normalise sizing. 
The first was great fun and fairly trivial. Formerly known by the reference number for the project they were working on (e.g. the ‘5531 team’), they were asked to select a name for themselves. The name had to be ‘train related’ and ‘safe for work’ – so we wound up with teams such as Maglev and Astrotrain. A simple change but immensely powerful when it comes to sense of identity.

The second was painful. The concept of story point normalisation is perhaps one of the most controversial in SAFe. Having been reluctantly convinced of the need by Dean, I still found that no matter how carefully you approach it you will still be impacted by most of the evils used to argue against the concept. We began by using the past 8 iterations’ metrics for every team to construct a normalisation formula. Simplistically, some teams downshifted their sizing one notch on the Fibonacci scale and others upshifted by one. It was deceptively simple, and one of the key lessons learnt in the time since is that we should have far more actively coached and retrospected on sizing practices.
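To make the ‘one notch’ shift concrete, here is a minimal sketch. It is illustrative only: the actual normalisation formula we derived from the 8 iterations of metrics is not reproduced here, and the scale values and the function names (shift_size, normalise_backlog) are assumptions made for the example.

```python
# Illustrative sketch only: not the formula actually used on the train.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]  # assumed story point scale


def shift_size(points: int, notches: int) -> int:
    """Move a story size up (+) or down (-) the scale by `notches` steps."""
    index = FIBONACCI_SCALE.index(points)  # raises ValueError if off-scale
    shifted = index + notches
    # Clamp so sizes at either end of the scale stay on the scale.
    shifted = max(0, min(shifted, len(FIBONACCI_SCALE) - 1))
    return FIBONACCI_SCALE[shifted]


def normalise_backlog(sizes: list[int], notches: int) -> list[int]:
    """Apply the same shift to every story size in a team's backlog."""
    return [shift_size(size, notches) for size in sizes]
```

A team asked to downshift would call normalise_backlog(sizes, -1), and one asked to upshift would pass +1. The mechanical part really is this simple, which is exactly why the coaching around sizing practices matters more than the formula.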

Early days for APMS

The APMS team had 4 initial objectives:
  • Establish the Epic Kanban system reflecting both the active work in the feature teams and work in the pipeline 
  • Prepare the ‘work in pipe’ and logistics for the first PSI 
  • Work with the rest of the COE (in particular the PMO) to agree governance, financial and other details of how the release train would mesh with the broader enterprise. 
  • Continue to provide project management support to the feature teams
APMS Challenge 1 – Team Formation 

Whilst APMS was composed of PM’s, System Analysts and an architect, there was a ‘no exception’ stance on the release train being ‘Agile top to bottom’. So, the team had a scrum-master and an aspiration to work as a cross-functional team with shared commitments, standups, retrospectives, transparency and accountability to the feature teams. Today, it is a high-performance team with an exceptional team ethic – but the journey was more challenging than for any other agile team I’ve ever worked with. In the end, it required a far larger change of mindset for the team members than for a typical dev team and involved significant tuning of team balance and membership.

APMS Challenge 2 – Feature Grooming: How much is enough? 

The second challenge lay in the approach to grooming features for the PSI. The reality of the funding model and the dependency complexity of the work required a fair amount of analysis and estimation work to create ‘ready to play’ features. The best way to summarise this issue lies in the fundamental paradox of SAFe. Part of the reason the model is so powerful is because when you present it to program/project managers and architects they can readily map it to their existing mental models. On the other hand, the most common criticism I hear from agilists is that it ‘looks too formal and RUP-like’. What we experienced was that the intent of the model is too easily interpreted in practice into waterfall-like behaviours.

The original feature grooming model involved the APMS group identifying the business intent and features for the Epic, resolving the high level architecture and feeding it through a feature team for ‘enough discovery work to size it to +-30% confidence estimates’. The teams would thus hold a portion of their velocity available to be dedicated to these activities, which would be done in parallel with fulfilling their delivery commitments. What eventuated was the analysts/architects in the APMS group identifying and specifying the features and handing fully articulated designs to the teams who then simply estimated the design they’d been handed and handed their estimates back to the PM. This caused no end of problems, and was eventually heavily course corrected as you’ll see once we reach the current-day model.

APMS Challenge 3 – The Product Owner construct 

Prior to the adoption of the release train, finding the right product owners for agile initiatives had been a serious challenge for the group. Given the nature of a data warehouse, there are extremely diverse groups of stakeholders for any given initiative and finding someone with sufficient diversity of domain knowledge, sufficient availability and sufficient empowerment had been entirely unsuccessful. Dean’s “Product Manager/Product Owner” separation was the key to unlocking the puzzle. Whilst we couldn’t employ it exactly as specified in the framework, the underlying principle of separation of concerns was a huge enabler. In standard SAFe, the Product Manager is basically the overall decision maker on prioritisation and scope for the release train whilst the product owner travels with the team. In our context, there could be 15 different funding sources active at once and prioritisation and scope had to belong to the sponsors providing the money.

The solution adopted was to employ “Epic Owners” and “Feature Owners”. The Epic Owner was the person providing the funding for an Epic, and was looked to for engagement at key points and strategic prioritisation and scope calls for their Epic. The Feature Owner, on the other hand, was to have deep domain expertise on a particular feature, have delegated authority within the scope of the feature and be far more actively engaged with the team whilst their feature was in play. The construct resolved a vast number of issues, particularly in addressing the contrast between availability and authority.

APMS Challenge 4: Can we really do PSI’s? 

I’ve left the biggest challenge for last. SAFe deliberately separates the deployment cycle from the PSI cycle. Whilst the PSI produces a ‘potentially shippable increment’, the framework allows for either deployment on the PSI boundary, multiple PSI’s before a deployment or multiple deployments within a PSI. Our intent at launch was to run 8-week PSI’s and use the PSI as a planning boundary with multiple deployments happening in any given PSI.

The first difficulty we encountered was feature shaping. Given that the standard guidance for a feature is ‘small enough to be delivered in a single PSI but large enough to represent a significant outcome for the business’, there was a clash between good feature shaping and deployment window requirements. Available deployment windows were specified by enterprise release management, with 1 per month for minor deployments without significant integration requirements and a varying cadence for ‘enterprise deployments’ with major integration involved. Further, the deployment window for a feature was generally dictated by external dependencies. It was impossible to find a PSI cadence which aligned well with deployment boundaries, and shaping features to PSI’s was also largely impossible as we would regularly encounter features which would need to be deployed 1 iteration into a PSI but needed 2-3 iterations to implement.

Feature shaping was difficult but not insurmountable – in the end, we could have resolved it by allowing ‘cross-PSI’ features whilst still maintaining strong size limitations. However, the ‘bridge-too-far’ lay in establishment of sufficient funded pipeline. Once again driven by enterprise level gating cycles and funding processes, an 8 week PSI that fully funded and committed the entire release train was just not emerging.

By this point, however, we had established significant momentum with every aspect of the release train other than the PSI launch. The pipe was operating, the work was flowing into the teams, deployments were going out the door and momentum was tangible. One large leadership workshop later, we made the call: “know that PSIs will start eventually, but in the meantime develop compensating mechanisms and keep maturing the release train”.

Our focus then turned to compensating mechanisms – what were the key things we lost by not having the PSI, and how could we utilise our knowledge of the principles to achieve the purpose? To my mind, we identified the first 3 of the 5 key ingredients initially:
  • The power of having the whole release train in a planning workshop together from the perspective of developing a sense of ‘release train as one large team with shared outcomes’ as well as ‘release train as team of teams’ 
  • Employing a deliberate set of disciplines to groom out 4 iterations worth of backlogs at the release train level and ensure that visibility, dependency, risks and issues were fully articulated at the story level for all active features. 
  • Release train level ‘Inspect and Adapt’ cycles 
  • Maintenance of a strategic view of capability buildout on the platform that effectively capitalises on synergies between features. 
  • Strategic showcasing and communication of PSI level outcomes to draw together the ‘whole of product’ view of progress for broad stakeholder groups and avoid fragmented and isolated demonstration and communication feature by feature/epic by epic to narrow stakeholder groups. 
I’d like to write that we rapidly implemented effective compensation strategies, but I’d be lying. Being brutally honest, I’d suggest that we initiated effective strategies for the first and third items immediately. Time and painful lessons triggered focus on the second and fourth, and the fifth is currently in the spotlight.

The first strategy was the concept of ‘PSI-lite’, better known as ‘Unity Day’. It is a 1-hour ‘whole of train’ session at the start of each sprint aimed at building a sense of ‘train as team’ and ensuring everyone starts the sprint on the same page. These have been incredibly powerful from a cultural transformation perspective. They are always attended by numerous visitors, and repeat attendees regularly comment on the tangible change in atmosphere over time. (For a deeper treatment, see this post.)

The second strategy involved the establishment of ‘Team Loco 131’, composed of the extended leadership group and functioning as a ‘program level continuous improvement team’. Their stories are generated through learning generated by various forms of retrospective, and the leaders are fully bought into the lean management model of taking responsibility for improving the system of work.

Whilst the story of addressing backlog grooming challenges is too long and tortuous to describe here, I will offer the following. Many organisations find the investment of putting a whole program and set of stakeholders in a room for 2 days planning every 8-10 weeks unpalatable. For Strategic Delivery, this commitment was one of the major sticking points in management workshops – although the general manager bought in and was willing to support it, most of her leadership group felt it was unjustifiable. I’ve heard similar stories from many other SAFe Program Consultants. Our journey, painful lessons and ongoing investment required to compensate in the area of release train level backlog grooming/planning have utterly convinced me of the vital importance of this activity. To put my systems thinking hat on, failure to implement effective PSI planning creates an insane amount of failure demand across the entire release train. Weighing the cost of the failure demand against the investment required for the PSI planning days provides a scale very heavily tipped towards the investment.

The Program Kanban

Before digging into the specifics of how the Program (or Epic) kanban operates today, there are two final points to cover off. The first involves a holistic view across both the initiative level kanban system and the epic level. Astute readers will notice a number of glaring inefficiencies between the two. Whilst I’ll cover this in a little more detail in the concluding post for the series, I will offer a reminder at this time that the purpose of explicit visualisation in a kanban is to bring to light waste and opportunity.

The second point is that whilst this blog series focuses on wall visualisations, every wall depicted is replicated in Rally with far more underlying detail. The Rally Portfolio Management capability is a key enabler in effective management and measurement of the system, whilst the walls are vital for tactile insight and facilitation of communication.

The Epic Kanban is, obviously, dedicated to the flow of an Epic through the release train value chain.  Every card on the wall is an Epic.   The value chain is divided into 5 phases, each with a number of states:
  • Initiate Phase: Transition from demand management and connection with stakeholders.   
  • Discovery Phase: Feature elaboration and estimation refinement
  • Launch Phase: Approvals, funding and preparation to drop into teams for implementation
  • Evolve Phase: Implementation
  • Operate Phase: Deployment and follow-through to ensure happy users and successful transition to operations.
One of the primary aspects of the phases is to indicate the flow of responsibility between APMS, Development Services and Deployment Services. For any given phase, one group will be ‘moving the Epic’ and the others will be either supporting or being kept informed.  You'll notice on the photo there are phases which have "rows" as well as "columns".  The rows indicate teams, whilst the columns indicate states.  Any state which has rows is one where a feature team has the ball - you'll notice photos in the start of each swimlane in the "Prepare Discover" state indicating which team is which.

On the day PSI’s become viable for this train, the phases and states will change very little. Initiate & Discover effectively involve the identification, sizing and prioritisation of features in preparation for PSI injection and maintenance of roadmap, Launch equates to locking in PSI content and executing PSI planning, and Evolve is delivery of the PSI.

The Initiate Phase

This phase is led by APMS and basically loads the Epic onto the train. The primary goals are as follows: 
  • Execute handover with the demand management group (‘PMO’ and ‘Validate Entry’ states) 
  • Connect with the Epic Owner, ensure their value drivers for the epic are clearly articulated and introduce them to the operating model (‘Envision’ state) 
  • Identify candidate feature owners (‘Envision’ state) 
  • Determine the feature team which will execute discovery, the sprint when it will occur, and ensure logistics are organised for discovery to proceed smoothly. (‘Prepare discovery’ state) 
This phase is the primary evidence of the response to ‘APMS Challenge 2’. It explicitly limits the amount and nature of the work done by APMS before connecting the Epic and Feature owners with the teams for further analysis. In effect, it eliminated roughly 70% of the preparation work that was initially being conducted.

Looking at the wall, you'll see one epic that's just landed from demand management (with the smurf avatar on it indicating the project manager who's pulled the card).  One is in the envisioning state, one has completed envisioning but is blocked from being pulled into discovery preparation (the big red "X" on it).  Finally, there is one epic being prepared to drop into Discovery with the Astrotrain team.

The Discovery Phase

Discovery is led by the feature teams, and involves working with the Epic and Feature owners to ‘discover the features’ and explore the feature level requirements sufficiently to articulate and size the outcomes. A feature team will typically have roughly 10% of their sprint capacity dedicated to discovery work, with discovery for most epics concluding within a single sprint. The flow of states is as follows:
  • Kickoff Discovery: Workshop with Epic Owner and all Feature Owners to gain understanding of the vision for the epic, determine the exploration workshops required, identify key risks/issues and ensure the stakeholders understand the discovery process.
  • Explore Features: Detailed workshops with feature owners to validate the feature composition, high level stories and feature level acceptance criteria.
  • Consolidation: Consolidation of the outputs from feature exploration workshops, estimation and preparation of a high level release plan.
  • Discovery Outcome: Presentation back to Epic and Feature Owners of consolidated discovery outputs for validation and agreement of outcomes
The key outcome of Discovery is a set of prioritised, well-groomed features which have been estimated to a +-30% confidence level and are ‘ready to play’ for implementation. The other potential outcome is an ‘Epic split’. Once an Epic starts to look larger than half a million dollars, the preferred approach is to identify the highest priority features for initial implementation and carve off the remainder into a new Epic for a future release. In PSI parlance, it’s making sure no more than roughly a PSI worth of features are progressed at once on the Epic.

Looking at the wall, you'll see Astrotrain finalising Discovery for one Epic, Kaizen (3rd row) finalising one and kicking off another and Maglev consolidating one set of outputs whilst kicking off another. It's unusual for a team to have more than one Epic in Discovery at a time; in this case it's likely that they are small Epics with minimal feature elaboration required.
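
The two Discovery outcomes described above, the +-30% estimate band and the ‘Epic split’, can be sketched roughly as follows. This is a sketch under stated assumptions rather than the rule actually applied: the Feature fields, the ‘1 = highest priority’ convention and the simple cut-off logic in split_epic are hypothetical. Only the 30% band and the half-million-dollar trigger come from the text.

```python
from dataclasses import dataclass

CONFIDENCE = 0.30          # the +-30% band used for Discovery estimates
SPLIT_THRESHOLD = 500_000  # "half a million dollars" trigger for a split


@dataclass
class Feature:
    name: str        # hypothetical feature name
    priority: int    # assumed convention: 1 = highest priority
    estimate: float  # point estimate in dollars


def estimate_range(estimate: float) -> tuple[float, float]:
    """Return the (low, high) band implied by a +-30% estimate."""
    return (estimate * (1 - CONFIDENCE), estimate * (1 + CONFIDENCE))


def split_epic(features: list[Feature], threshold: float = SPLIT_THRESHOLD):
    """Keep the highest priority features that fit under the threshold and
    carve everything after the first overflow into a follow-on Epic."""
    keep: list[Feature] = []
    carve_off: list[Feature] = []
    total = 0.0
    for feature in sorted(features, key=lambda f: f.priority):
        fits = not keep or total + feature.estimate <= threshold
        if not carve_off and fits:
            keep.append(feature)
            total += feature.estimate
        else:
            # Once one feature is carved off, all lower priorities follow,
            # so the follow-on Epic preserves the priority order.
            carve_off.append(feature)
    return keep, carve_off
```

In practice the split is a conversation with the Epic and Feature owners rather than a calculation, but the sketch shows the intent: the initial Epic carries the highest priority features, and the remainder waits for a future release.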

Launch Phase

This phase is reasonably self-explanatory. The focus returns to APMS, who look after gaining the appropriate approvals of the discovery outcomes and funding to proceed. Whilst in an ideal world this phase would be traversed rapidly to achieve a smooth flow from discovery to evolve, the reality is epics regularly sit here for significant periods whilst business cases gain approval and enterprise gating cycles complete. It has generated one simple and obvious insight – the better the Initiate and Discovery phases are executed, the shorter and smoother the launch phase process.

Launch concludes with evolve preparation. The feature team or teams to execute evolve are identified, logistics are organised for the release planning session at the start of evolve, the commencement sprint is locked in and the feature teams re-familiarise themselves with the discovery outcomes.

The key aspect to evolve preparation is feature team selection. Obviously, the ideal scenario is for the entire epic to land with a single feature team and for the evolve feature team to be the same one which executed discovery. However, reality intervenes. Whilst evolve capacity availability is utilised in selecting the discovery team, when the launch phase is protracted that team is often not available for evolve. In these scenarios, some capacity is reserved in the backlog of the discovery team to ensure they are available to support the kickoff and rapid knowledge uptake of the evolve team(s). Likewise, where the Epic is too large to be executed by a single feature team, work will be spread across feature teams on a feature by feature basis.

Looking at the photo, you'll notice a few things.  Firstly, queues are very visible :)  Secondly, although the APMS team runs a kanban system for their work (as hinted at by the epic level cumulative flow diagram and control chart on the wall), they also track the sprint (or iteration) cadence.  Their priorities on any given day are driven by where in the sprint cadence the feature teams are.   Finally you'll see the cards with post-its on them ('1', '2' and '4').  These would have been identified at the standup that day as needing urgent attention.

Evolve Phase

Implementation and deployment of the epic will be covered in the next post, so I will keep this section intentionally light. The states are as follows:
  • Plan Release: Execute the release planning which would normally occur in a PSI planning session. The stories identified in discovery are broken down into ‘sprint-size’ and scheduled across sprints based on available capacity to create an initial release plan. One of the key outcomes of this workshop is establishment of working agreements with the feature owners. They know which sprints their feature will be active in, when acceptance testing for their feature will occur and can plan their availability accordingly. 
  • Implement: Build and acceptance testing of features. 
  • Integrate and Prep Deployment: This state has a split personality depending on deployment type. For an ‘independent deployment’, it takes roughly a week and involves standard integration, deployment hardening and operational handover preparation activities. For deployments associated with an enterprise release (75-80%), it lasts 4-5 sprints. The work cannot be accelerated, as the timing and nature of involvement is dependent on enterprise level co-ordination. Teams thus wind up with a trickle-feed capacity reservation during this time. 

Looking at the wall, you'll see Astrotrain juggling 3 Epics in implementation, with a 'stakeholder concern flag' on one probably giving hints as to why another Epic is being prepared to drop into the team.  Moving down the board, Jacobite are implementing a single (large) Epic, Kaizen have one in implementation and are most likely getting close to done there based on the fact they're in release planning on a second.  Maglev have a more even spread, with 5 epics at various stages of Evolve.

Fans of limiting WIP (presumably most readers of this blog) are probably looking at this picture and saying "why aren't they finishing one Epic before starting the next".  My first, somewhat glib, response is "isn't it great that it's so obvious".  On a more serious note, in a highly integrated world you often can't go at full pace on a single initiative.  It has fast times and slow times as you synchronise with your interface partners and resolve other such dependencies.  Having the flexibility to deal with this has been a very important aspect of the framework success.

Operate Phase

Under the leadership of deployment services, the epic is deployed and in operation.  Whilst numerous ‘tidy-up’ activities around production verification testing and governance closure occur here, the key component introduced has been the Epic retrospective.  This closes a significant gap in the PSI Inspect and Adapt cycle.  Whilst numerous ‘release train’ feedback cycles existed, what was missing was the stakeholder learning.  The feature team holds a retrospective with their epic and feature owners looking at the ‘whole of lifecycle experience’, and an invaluable source of strategic learnings is introduced.

The second state is ‘Spanked’. It is the cause of much hilarity amongst visitors, but has become part of the cultural identity of the release train. Originally introduced for some personality on a team kanban wall by an English scrum-master, it stands for ‘done done’. The Epic’s in production, the customers are happy, and the team has ‘spanked it’.

Looking at the wall, it's fairly easy to conclude that it's not long since a major deployment window.


The daily rhythm of all the groups will be covered in the next post, but it’s worth making a few concluding comments here. The first is to play back to Dean’s concept of the release train as a ‘self organising program’. Effective visualisation of the epic lifecycle is a vital ingredient, as it generates so much ‘knowledge at a glance’ of the state of play of the entire value chain. Within seconds, the group can identify what epics are where in the value chain, who has carriage, where the queues are and what needs focus. And of course, it supports metric analysis through cumulative flow and cycle time charting at the Epic level. All of this across a system of work that at any given time generally has 30-40 epics somewhere in the value chain.
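
For readers who think in data structures, the ‘knowledge at a glance’ questions can be sketched as a few small queries over the cards on the wall. This is a rough illustration, not how the walls or Rally are actually configured: the EpicCard fields and function names are hypothetical, and the Evolve entry in PHASE_LEAD is inferred from the feature team swimlanes rather than stated outright.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Phase order, and the group "moving the Epic" in each phase.
# The Evolve lead is inferred from the feature team swimlanes on the wall.
PHASE_LEAD = {
    "Initiate": "APMS",
    "Discovery": "Feature teams",
    "Launch": "APMS",
    "Evolve": "Feature teams",
    "Operate": "Deployment Services",
}


@dataclass
class EpicCard:
    title: str                  # hypothetical epic title
    phase: str                  # one of the keys in PHASE_LEAD
    team: Optional[str] = None  # swimlane, set when a feature team has the ball
    blocked: bool = False       # e.g. the big red "X" on a card


def board_summary(cards: list[EpicCard]) -> dict[str, int]:
    """Count epics per phase, in value chain order, to show where queues sit."""
    for card in cards:
        if card.phase not in PHASE_LEAD:
            raise ValueError(f"Unknown phase: {card.phase}")
    counts = Counter(card.phase for card in cards)
    return {phase: counts.get(phase, 0) for phase in PHASE_LEAD}


def who_has_carriage(card: EpicCard) -> str:
    """Name the specific team if the card sits in a swimlane, else the lead group."""
    return card.team if card.team is not None else PHASE_LEAD[card.phase]


def needs_focus(cards: list[EpicCard]) -> list[str]:
    """List the titles of blocked epics."""
    return [card.title for card in cards if card.blocked]
```

The point of the wall is that nobody needs to run a query: a large count in Launch is a visible queue, and a red "X" is visible from across the room. The same questions asked of the Rally data look much like the functions above.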

I'll close with a hint of things to come when we talk about results in the final post for the series. You might remember a mention of 18 project managers in the initial context at the beginning of the post - there are now 3.

PS Part 4 in the series is now posted and deals with the program level feature wall and in-play work.
