David Anderson

Lessons Learned from Eli #3

Wednesday, Jun 30, 2004
 

Don't Assign Blame

Don't assign blame or point fingers when complaining about (ahem, explaining) your delivery problems. It's all too easy to point the finger at someone elsewhere in the value chain and say, "I can't get my job done because _____ doesn't deliver _____ for me."

Eli Goldratt would prefer that we teach managers to express this in a less confrontational style using the language of variation and conformant quality. "I can't deliver to expectations because my inputs suffer from [this] excessive common cause variance and [these] specific special cause variances."

What Goldratt is effectively asking us to do is to place each element in the value chain on Wheeler's States of Control matrix. Hence, we might get explanations which look like, "I can't deliver an architecture of conformant quality because my input, the requirements, is in the Brink of Chaos State."

This concept asks us to define the notion of "conformant quality" at each step in the value chain. Remember, we get to choose the definition of "conformant quality". If we want to conform, we can always lower our standards. The agile movement requires us to do this. The industry standard for "success", when used in the context of "How many IT projects are successful?", is defined as "on-time, on-budget, with the required scope". The agile movement argues that there is always too much uncertainty in the scope for it to be brought under control. Hence, we should accept the notion of "on-time and on-budget, with most of the scope" as the new standard for conformant quality.

How might it be possible to tighten up our definition of conformant quality and maintain the "on-time, on-budget, with agreed scope" definition? Simple: as I pointed out on Monday, reduce the batch size. With a smaller batch size, it is more likely that the requirements will not change during the processing time; hence, the system will remain under control and continue to deliver conformant quality. If we increase the batch size, the system moves out of control, forcing us to lower our standards.
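To make the batch-size argument concrete, here is a minimal sketch. It assumes, purely for illustration, that requirements changes arrive at random at a constant average rate (a Poisson process), so the chance that a batch completes with no change is exp(-rate * duration). The function name and the rate of one change every four weeks are hypothetical.

```python
import math

def prob_no_change(change_rate_per_week: float, batch_weeks: float) -> float:
    """Probability that no requirements change arrives while a batch is in process.

    Assumes (for illustration only) that changes follow a Poisson process.
    """
    return math.exp(-change_rate_per_week * batch_weeks)

# Hypothetical rate: on average one requirements change every four weeks.
rate = 0.25
for weeks in (1, 2, 4, 12):
    print(f"{weeks:>2}-week batch: P(no change) = {prob_no_change(rate, weeks):.2f}")
```

With these assumed numbers, a one-week batch escapes change about 78% of the time, while a twelve-week batch does so only about 5% of the time.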

 
 

Lessons Learned from Eli #2

Tuesday, Jun 29, 2004
 

Resistance to Change

Eli Goldratt described some of the reasons why people resist change and what it is about the culture of an organization that creates an environment that molds such people. I realized that he was talking about what Jerry Weinberg has described as Level 1 and Level 2 organizations - the hero developer level and the hero manager level. The hero is cast in the role of firefighter, and they are the hero because they deliver. They deliver by putting out fires. As a result, they are rewarded for putting out fires and are praised and admired by their colleagues as champion firefighters. The more fires there are, the better practiced they become at putting them out and the more admired they become for doing so. Eventually, they measure their self-esteem by their prowess at putting out fires.

Hence, the hero firefighter learns to thrive on chaos. Chaos is the norm in the organization and the hero is its master: the one who parts the seas and delivers the team from the perils of chaos and non-conformant quality.

A hero does not want to move to a controlled state because their self-esteem will drop as they are no longer praised for being a hero. A controlled organization is one that no longer needs heroes!

What is the injection which solves the core conflict here? Senior management must start to reward people for behavior which is congruent with controlled performance, and they must build self-esteem around that behavior. The heroes must be coached and assisted to adapt to a new pattern of behavior - one which anticipates and absorbs uncertainty rather than one which heroically reacts to it.

 
 

Lessons Learned from Eli #1

Monday, Jun 28, 2004
 

Last week, on June 24th, Eli Goldratt gave one of his Viable Vision Tour seminars here in Seattle. I picked up a number of little insights from some of his more subtle comments, and I'll be documenting these over the next few days.

Small Batch Sizes

It was Donald Reinertsen who taught me that small batch sizes (coupled with a focus on quality) are often enough to bring big results. In other words, forget gathering data, identifying the constraint and so forth; simply reduce the batch sizes and focus on quality assurance (not quality control), and things will improve immensely - and, ergo, your client (if you are a consultant) will be happy.

Eli Goldratt put it this way: "often reducing batch size is all it takes to bring a system back into control". This ought to have been obvious to me - a trained control systems engineer - because I learned it in college. To bring something back inside the control envelope, simply reduce the amplitude of the signal. In this case, the amplitude is the batch size in the process. In layman's terms, when you are going too fast, take your foot off the gas and slow down.

In most human endeavors we reward control under high amplitude conditions. We reward the fastest drivers, the fastest runners, the fastest mountain bikers, skiers, speed skaters, and the list goes on and on. We intuitively know that control at high speed is hard. Control under any high amplitude signal is hard, so in graceful, subjectively judged sports such as ice skating, diving and most X-Sports such as BMX biking, we score style in relation to the height of the jump and the speed of the movement or rotations. In industry and project management, it is the size of the process or project which represents the amplitude of the control signal. Hence, maintaining control with large batch sizes is hard. When something is out of control, it is, by definition, failing to deliver conformant quality, and the customer isn't happy.

Small batch sizes are the first step on the journey towards a TOC solution. [I've mentioned this idea before in "Is DBR the transition step to CCPM?"]

 
 

FDD Six Sigma #3: Global 8D

Wednesday, Jun 23, 2004
 

In the past two days' posts, I've looked at the DMAIC and DMADV processes in Six Sigma, which deal essentially with common cause (systemic) variation. Six Sigma has adopted a number of root cause or problem solving methods for dealing with special cause variation. Perhaps the most popular of these is known as Global 8D (or Ford 8D). The method was first documented in 1987 by Ford Motor Company in a document titled "Team Oriented Problem Solving". The idea is not new, and advocates of Lean Manufacturing will recognize that similar concepts exist in the Toyota approach to management. The 8Ds are as follows:

  1. Use a team approach
  2. Describe the problem
  3. Contain the problem
  4. Identify/Define and Verify Root Cause
  5. Choose Corrective Action
  6. Implement Corrective Action
  7. Prevent Recurrence
  8. Reward (or Congratulate) the Team

There is a strong argument that 8D can also be used as a method for reducing common cause variation where it exceeds desirable limits - when a process is in the right-hand column of the Wheeler matrix, in either the Threshold or Chaos states. Excessive common cause variance also has a root cause, and root cause analysis and correction are at the heart of the 8D method.

Global 8D with FDD and Agile Management

I've written about why I introduced daily standups to the FDD process in "Morning Roll Call". This daily meeting was first introduced as a mechanism for recovering a project from a special cause variance. It's a team meeting where each member of the team is encouraged to surface issues impeding them. An experienced and knowledgeable chief programmer or project manager will be assigned to the issue. So that covers the first D!

"If help was needed, a more senior and experienced developer was assigned to help out for that day."

The problem is described briefly in the standup. A subsequent meeting will be scheduled if a more detailed description is needed. That covers the 2nd D!

The extent of the problem can be assessed against the Feature List: the affected Feature Sets and Chief Programmer Work Packages can be identified and marked as "blocked" in the Knowledge Management System, and an issue raised in the log with the assigned leader listed against it. Blocked developers can, if possible, be assigned new work, though this grows the DIP inventory. The problem has been corralled and contained. The 3rd D!
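The containment step can be sketched as a small data structure. This is a hypothetical illustration, not the schema of any particular Knowledge Management System: the classes WorkPackage and Issue, the function contain() and the example data are all invented names. It marks every Chief Programmer Work Package that includes an affected Feature as blocked, records the affected Feature Sets, and raises one issue with its assigned leader in the log.

```python
from dataclasses import dataclass, field

@dataclass
class WorkPackage:
    """A hypothetical Chief Programmer Work Package."""
    name: str
    feature_set: str
    features: list[str]
    blocked: bool = False

@dataclass
class Issue:
    """A hypothetical issue log entry with its assigned leader."""
    description: str
    leader: str
    blocked_packages: list[str] = field(default_factory=list)
    blocked_feature_sets: set[str] = field(default_factory=set)

def contain(packages, affected_features, description, leader, issue_log):
    """Block every work package touching an affected Feature and log one issue."""
    affected = set(affected_features)
    issue = Issue(description, leader)
    for wp in packages:
        if affected & set(wp.features):
            wp.blocked = True
            issue.blocked_packages.append(wp.name)
            issue.blocked_feature_sets.add(wp.feature_set)
    issue_log.append(issue)
    return issue

# Hypothetical example data.
packages = [
    WorkPackage("CPWP-12", "Sales", ["Calculate the total of a sale"]),
    WorkPackage("CPWP-13", "Shipping", ["Schedule the delivery of an order"]),
]
issue_log = []
issue = contain(packages, ["Calculate the total of a sale"],
                "Sales tax rules unclear", "Chief Programmer A", issue_log)
print(issue.blocked_packages, issue.blocked_feature_sets)
```

Keeping the blocked flag on the work package and the list of affected packages on the issue means the issue log alone shows how far the problem has spread.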

There isn't a prescribed root cause analysis mechanism in FDD, but there are tools in the toolkit: the TOC Thinking Processes and the use of the Domain Model as a communication tool. The 4th D!

Corrective action will be chosen by consensus amongst the Chief Programmers, project manager and other senior team members e.g. chief architect and development manager. Implementation will be worked into the Plan-by-Feature plan and corrective work allocated at a subsequent daily meeting. The 5th and 6th Ds!

What's really critical to understand about the value of daily meetings is their preventative ability. If the team can see an issue ahead, they can raise it before it creates a special cause variation. This allows the project manager to prevent it before it occurs. I've made this observation before, in "Getting to Linear", in response to Philippe Kruchten's observation that my CFD chart for a project earlier this year was near linear. This was possible because the team prevented special cause variation from affecting the schedule through early warning at daily meetings. The 7th D!

FDD has other ways to prevent recurrence of problems: the checklists for the ETVX templates for each of the 5 processes. For example, if tight coupling in the architecture incurred refactoring late in a release, causing a drop-off in throughput on the CFD and consumption of the Critical Chain project buffer, then that can be addressed by introducing new guidelines into the design and code review standards. These guideline review meetings are typically held once per month with all chief programmers and architects present. Someone is assigned to own the process templates and the guideline documents. Typically, I give this person a special role on the team - the process guru or lawyer. Everyone on the team knows who this person is and goes to them for guidance. The process guru is expected to schedule and facilitate the regular review meetings. The built-in progressive quality assurance in the FDD processes helps to ensure that special cause variation doesn't recur.

As for the 8th D, why do we need that in a process definition? My process is something I learned from IBM - the recognition lunch! :-)

 
 

FDD Six Sigma #2: DMADV

Tuesday, Jun 22, 2004
 

A few more thoughts on FDD and Six Sigma process...

DMADV

The Design for Six Sigma (DFSS) process is known as DMADV - Define, Measure, Analyze, Design, Verify. This process is generally used for product design before manufacturing. The DMAIC process is then used to refine the manufacturing process through reduction of common cause variation. The DMADV process is designed to remove variation between what the customer wants and what actually got designed. It's a different kind of variation, perhaps, but it is still variation. We get variation from uncertainty and change. There is potential conflict between DMADV and the agile movement. DMADV seems to take the traditional approach of eliminating variation by "getting it right first time" rather than "responding to change". However, we could take a more liberal view of DMADV and say that we are still DMADV compliant if we can accept change late in our design. We need to be able to accept change late in order that our design matches what the customer wanted. Hence, you cannot achieve good DMADV results without accepting change in the agile fashion.

DMADV in Brief

  1. Define: Initiate, scope and plan the project.
  2. Measure: Understand customer needs and specify client-valued functionality
  3. Analyze: Develop design concepts and high level design
  4. Design: Develop detailed design and control/test plan
  5. Verify: Test design and validate it meets customer requirements

DMADV with FDD and Agile Management

Now it ought to be somewhat obvious that FDD includes many of the features required for a DMADV compliant process with its analysis and modeling, design and verification steps. But it is also clear that FDD puts a twist on it. DMA becomes more like AMD - Analyze, Measure and Define. Perhaps DMADV truly needs to become AMDDV in order to embrace the uncertainty and acknowledge the variability inherent in design problems.

Define:
In FDD we define the overall scope as a set of Subject Areas in which we'd like to develop a system. However, the scope is not agreed until after process 1 - Modeling - and process 2 - Build a Feature List - are completed. Both of these processes are defined using an ETVX (Entry, Task, Verification and Exit) template. Hence, there is quality assurance built into the method which is compliant with the spirit of Six Sigma and DMADV. After process 3 - Plan by Feature - there is an agreed scope and plan for the project. Hence, FDD achieves a DMADV compliant Define step as an exit criterion of process 3 - Plan by Feature.
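Because each FDD process is wrapped in an ETVX template, the quality gate can be sketched as a process that refuses to start unless its entry criteria hold and only reports a successful exit once its verification checks pass. This is a hypothetical illustration of the idea; the class, the criteria and the toy Plan by Feature example are invented for the sketch and are not part of the FDD process descriptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ETVXProcess:
    """Hypothetical model of an FDD process defined by an ETVX template."""
    name: str
    entry_criteria: list[Callable[[dict], bool]]  # checked before any task starts
    tasks: list[Callable[[dict], None]]           # update the shared project state
    verification: list[Callable[[dict], bool]]    # inspections and reviews of results
    exit_criteria: list[Callable[[dict], bool]]   # must hold for the process to exit

    def run(self, state: dict) -> bool:
        """Run the process; return True only if verification and exit both pass."""
        if not all(check(state) for check in self.entry_criteria):
            raise ValueError(f"{self.name}: entry criteria not met")
        for task in self.tasks:
            task(state)
        if not all(check(state) for check in self.verification):
            return False  # verification failed: rework is needed before exit
        return all(check(state) for check in self.exit_criteria)

# Toy example: a drastically simplified Plan by Feature process.
plan_by_feature = ETVXProcess(
    name="Plan by Feature",
    entry_criteria=[lambda s: bool(s.get("feature_list"))],
    tasks=[lambda s: s.update(plan={f: "CP-1" for f in s["feature_list"]})],
    verification=[lambda s: set(s["plan"]) == set(s["feature_list"])],
    exit_criteria=[lambda s: "plan" in s],
)
state = {"feature_list": ["Calculate the total of a sale"]}
print(plan_by_feature.run(state))
```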

Measure:
In FDD we measure Features as the basic unit of design-in-process (DIP) inventory and the basic unit of client-valued function. A Feature List - a definitive list of what will be measured in the project - is an exit criterion of process 2 - Build a Feature List. Process 2 has an ETVX template and has quality assurance in the spirit of Six Sigma designed into the method.
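Since Features are the unit of DIP inventory, DIP at any point can be read off a Cumulative Flow Diagram as the vertical gap between the cumulative count of Features started and the cumulative count completed. A minimal sketch, using made-up weekly counts; the function name is hypothetical.

```python
from itertools import accumulate

def cumulative_flow(started_per_week, completed_per_week):
    """Return the cumulative started and completed series and the DIP inventory.

    DIP in each week is the gap between the two CFD lines.
    """
    started = list(accumulate(started_per_week))
    completed = list(accumulate(completed_per_week))
    dip = [s - c for s, c in zip(started, completed)]
    return started, completed, dip

# Hypothetical weekly counts of Features entering design and being completed.
started, completed, dip = cumulative_flow([10, 12, 8, 10], [0, 6, 9, 11])
print(dip)
```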

Analyze:
FDD process 1 - Develop an Overall Model - is the analysis step which enables the precise definition of the scope and the identification of Features for tracking and measuring the project. The domain model provides a high level design, and the equivalent interaction architecture modeling for the UI completes the design concepts. The exit criteria for process 1 provide the Analyze step for Six Sigma.

Design:
Processes 4 and 5 in FDD - Design by Feature and Build by Feature - represent the Design step for DMADV. Again, both processes have ETVX templates, and quality assurance is built into the process.

Verify:
All 5 FDD processes have verification steps in their ETVX templates. Inspections and cross-functional review are present at each stage in the process. Ultimately quality control testing is applied to validate that the agreed scope was delivered in the working software.

My overall feeling is that FDD is very compatible with DMADV, though it can't be mapped to it precisely. I feel that FDD's 5 processes are more attuned to the agile goals of responding to change and delivering frequent, tangible, valuable working software. DMADV seems like a traditional "big requirements up front" process which denies the underlying variability and uncertainty in design problems. AMDDV - Analyze, Measure, Define, Design and Verify - might be a better Design for Six Sigma (DFSS) process because it better reflects the roots of Six Sigma in the fundamental understanding of variation.

 
 

FDD Six Sigma #1: DMAIC

Monday, Jun 21, 2004
 

I want to put some thoughts down on how we might go about explaining or relating the FDD process to Six Sigma. I want to stress that this is work in progress and just thoughts at this time. Not all blog entries (in fact very few) represent anything definitive.

DMAIC

The DMAIC process in Six Sigma is used to reduce variation, usually in repeating processes. DMAIC is an acronym for Define-Measure-Analyze-Improve-Control. People have a tendency to jump on it and state that it is only for manufacturing and only for reproducing the same thing again and again. The immediate reaction is to suggest that DMAIC cannot be used with software engineering and that DMADV (see tomorrow's entry) is the right Six Sigma process for software.

I have a problem with this assumption. Firstly, DMADV is certainly about conformant quality but it isn't really about improvement. DMAIC is the process for controlling and measuring improvement in the system. In the agile community, we are definitely interested in delivering high quality software regularly but we are also interested in creating a culture of continuous improvement. DMAIC is a process which helps us to move from right to left on the Wheeler matrix into the conformant quality column. I, personally, don't see why you can't use DMAIC with knowledge work. You simply have to treat it as something which has a wider degree of variation (than you would find in manufacturing). By accepting and understanding that wider degree of variation and defining the notion of conformant quality accordingly, you can make progress.

DMAIC with FDD and Agile Management

I see the use of DMAIC with FDD and Agile Management as primarily for measuring variance in estimation, productivity and quality. Using my 5 point feature complexity point scale, we can use DMAIC to refine the estimation technique and to monitor both productivity and quality.

Define:
Firstly, let's expand the definition of a Feature beyond Coad's usual template, <action> a|the <result> of|from|to|by|with|for a(n) <object>, to include business rules (using the Ross and Von Halle template; see my entry of Oct 16th 2003) and task flows (using my Statechart driven approach or Larry Constantine's Task Cases approach). Now we can define what we want to track for quality purposes as Features, including Business Rules and Task Flow Definitions. Further, we can track Feature Complexity Points (FCP) - the inventory to be tracked - and our estimating technique, which converts FCPs to man hours.
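The conversion from Feature Complexity Points to man hours is essentially a table lookup. The sketch below assumes a hypothetical level-of-effort table for the 5-point scale, and it assumes the FCP inventory is simply the sum of per-Feature ratings. The hour values are placeholders, not calibrated figures; in practice they would be derived from the team's own history and adjusted during the Improve step.

```python
# Hypothetical level-of-effort table: man hours per Feature for each rating
# on a 5-point Feature Complexity Point (FCP) scale. Placeholder values only.
HOURS_PER_FCP_RATING = {1: 4, 2: 8, 3: 16, 4: 24, 5: 40}

def fcp_inventory(ratings):
    """Total Feature Complexity Points for a list of per-Feature ratings."""
    return sum(ratings)

def estimate_hours(ratings, table=HOURS_PER_FCP_RATING):
    """Convert per-Feature complexity ratings (1-5) into estimated man hours."""
    hours = 0
    for rating in ratings:
        if rating not in table:
            raise ValueError(f"rating {rating} is outside the 5-point scale")
        hours += table[rating]
    return hours

ratings = [1, 3, 3, 5]
print(fcp_inventory(ratings), estimate_hours(ratings))
```

For Features rated 1, 3, 3 and 5, this gives 12 FCP of inventory and, with the placeholder table, an estimate of 76 hours. Keeping the table as a parameter makes it easy to swap in a recalibrated version.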

Measure:
We will measure Features completed per developer week, defects per Feature, Critical Chain buffer usage (variance in actual versus plan), and variance outside the control limits for all three measures.*

Analyze:
We will analyze the Cumulative Flow Diagram, the Control Charts* derived from it, the Critical Chain buffer usage against a temperature chart rating, and the Issue Log - both the growth/decline in issues and the status of issues.
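The "temperature chart" rating for Critical Chain buffer usage compares how much of the buffer has been consumed with how far along the chain is. A minimal sketch follows. The zone boundaries (buffer consumption exceeding chain completion by more than 20 and 40 percentage points) are hypothetical placeholders, since each organization sets its own, and the function name is invented.

```python
def buffer_status(chain_complete_pct: float, buffer_used_pct: float,
                  yellow_margin: float = 20.0, red_margin: float = 40.0) -> str:
    """Rate a Critical Chain buffer reading on a temperature (fever) chart.

    The reading is compared with the diagonal where buffer consumption keeps
    pace with chain completion. The margins are hypothetical placeholders.
    """
    excess = buffer_used_pct - chain_complete_pct
    if excess > red_margin:
        return "red"     # buffer burning much faster than work completes
    if excess > yellow_margin:
        return "yellow"  # watch closely and prepare a recovery plan
    return "green"       # consumption roughly in line with progress

for complete, used in [(50, 40), (50, 75), (30, 80)]:
    print(complete, used, buffer_status(complete, used))
```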

Improve:
At any time, we may choose to make changes to the system - these could be development method changes or more simply changes to codification for Feature Complexity Points or adjustments to control limits or the level of effort conversion table.

Control:
For control, we will use daily standup meetings, drawing on analysis data from the CFD, Control Charts* and Critical Chain plan versus actual, and the Monthly Operations Review and/or Project Retrospectives to analyze longer term trends.

[*It occurs to me that I haven't published the work I'm doing with Control Charts at this site yet. It gets its first airing at the Motorola S3S next month.]

 
 

Six Sigma as the Agile Future?

Saturday, Jun 19, 2004
 

My recent posts discussing the importance of understanding variation can help us to explain and relate agile development in terms of Six Sigma - a process of continuous improvement used mostly in very big companies such as General Electric and Motorola.

Defining Six Sigma

Six Sigma is a method of management for continuous improvement which understands variation. Most people associate Six Sigma with quality because its name is rooted in the notion of no more than 3.4 defects per million opportunities. However, the practice of Six Sigma requires a deep understanding of variation, the steady elimination of special cause variation and the reduction of common cause variation in a process or system. Quality improves as variation is eliminated and reduced. The Wheeler Matrix helps us to understand that. Conformant Quality is defined in the left-hand column, and moving from the right-hand to the left-hand column requires the reduction of common cause (or systemic) variation.
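The 3.4 figure comes from the conventional Six Sigma assumption that a process mean drifts by about 1.5 standard deviations over the long term, leaving 4.5 sigma between the mean and the nearest specification limit. The sketch below reproduces that arithmetic with the standard library's NormalDist; the one-sided limit and the 1.5 shift are textbook conventions, not properties of any particular process, and the function name is hypothetical.

```python
from statistics import NormalDist

def dpmo_for_sigma_level(sigma_level: float, shift: float = 1.5) -> float:
    """Long-term defects per million opportunities for a one-sided spec limit,
    assuming the conventional 1.5-sigma long-term shift of the process mean."""
    tail = 1.0 - NormalDist().cdf(sigma_level - shift)
    return tail * 1_000_000

for level in (3, 4, 5, 6):
    print(level, round(dpmo_for_sigma_level(level), 1))
```

Running it gives roughly 66,807 DPMO at 3 sigma, 6,210 at 4, 233 at 5 and 3.4 at 6.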

Six Sigma is rooted in Shewhart's work on variation and in the work on quality by his successors, such as Deming and Juran. There is another management method rooted in the work of Shewhart, Deming and Juran which also strives to achieve continuous improvement - Lean, or the Toyota Production System. There is now ongoing work to consolidate these two branches of management science into Lean Six Sigma, and there is even a comparison of Lean Six Sigma with CMM.

We are seeing more members of the agile community being influenced by the work of Deming and talking about very low defect levels. Kent Beck has started talking about goals for TDD such as 1 defect per quarter. Martin Fowler has also talked about a Very Low Defect Project and observed that this is a trend amongst good agile teams.

If, on the one hand, we have the agile development crowd moving towards Deming quality assurance methods and very low defect counts and, on the other hand, the agile project management crowd moving towards probabilistic methods such as Critical Chain which embrace and understand uncertainty, then is the agile movement ultimately moving towards a definitive Six Sigma solution for software engineering? Is anyone shocked or surprised by this trend? Comments please...

 
 

Microsoft and Six Sigma

Friday, Jun 18, 2004
 

There was some chat in my Yahoo! group recently about Six Sigma applied to software engineering, and one specific question about Microsoft and what, if anything, they may be doing with Six Sigma. Microsoft isn't so much adopting Six Sigma for software development - this would truly have surprised me - as offering a product to help their customers implement Six Sigma. Here are the details. It seems that Microsoft is adopting Six Sigma on its operations and fulfillment side, i.e. the work they need to do to ship products rather than software development. [Updated: May 5th 2005]

Now, if only I could get them interested in some of my recent work on the underlying theory of variation and how it relates to agile development, then that might really be interesting. Hmmm...

 
 

Drive Out Fear!

Thursday, Jun 17, 2004
 

In Deming's Theory of Profound Knowledge and his 14 Points for Management, he emphasizes the importance of driving out fear from an organization. Driving out fear is so important to the functional (as opposed to dysfunctional) effectiveness of an organization. Deming underpinned his Theory of Profound Knowledge with the statistical methods of process control. He observed that "some of the greatest contributions from control charts lie in areas that are only partially explored so far, such as applications to supervision, management, and systems of measurement..." [Shewhart 1986] In other words, Deming liked the idea that someone would come along at a later date and apply his theories to areas like software engineering.

Wheeler's 4 States of Control (see chart from yesterday), and in particular the Threshold State, help us to understand how it is possible to reduce fear in an organization. The Threshold State says that the system (of software engineering) delivers non-conformant quality, i.e. the project is late, or over budget, or has dropped scope, or has a higher than acceptable defect count, or perhaps all of the above, but that there was no assignable cause variation. We all know that nonconformance is the norm in the software engineering world. In fact, it's dominant in about 4 out of 5 documented cases. So there is reason to be fearful. How can you drive out fear in a world where non-conformant quality is the norm?

Understanding variation - Deming's second element in his Theory of Profound Knowledge - is the vital ingredient in driving out fear. Management must understand variation and know how to separate common (chance or systemic) cause variation from special (assignable) cause variation. Management must also be responsible for educating staff on variation and helping them to identify and report it. Let there be fear at the staff level only of assignable cause variation, and then only of assignable cause variation to which they made an inadequate response. As I stated back in September, in "Special Cause Truck Grounding", there is no point in assigning blame for special cause variation which was beyond someone's control. And there is never a cause for assigning blame for excessive common cause variation as seen in the Threshold State.

Management, on the other hand, must carry the burden for common cause variation beyond the limits of control in the Threshold State. It is all too easy for management to deflect blame from themselves and falsely claim an assignable cause for variation which exceeded the bounds of the prediction in the project plan. How many staff live in fear that their manager will blame them for something over which they had no control? Most current software development methods that root their definition of client value in use cases, stories or loosely worded requirements documents suffer from wide, high tolerance variation. This means that buffers in plans have to be large or the plan is at risk. Even if these projects are profoundly successful at eliminating special cause variation through the use of techniques like those described in the Scrum method, at best they exist in the Threshold State.

Management can drive out fear by accepting responsibility for the system of software engineering and responsibility for non-conformant quality. They can reduce their own personal risk by gathering data and reporting it transparently - don't give someone else the opportunity to claim false assignable cause for non-conformant quality. By learning to recognize and report when the system of software engineering is operating in the Threshold State, the Brink of Chaos State or in Chaos, a manager can eliminate fear from the staff and increase the likelihood that they, as a team, can bring the process to the Ideal State over time. Only then can they start to use Quality as a Competitive Weapon.

 
 

From Change to Variation Part 2

Thursday, Jun 17, 2004
 

Here is the final text extracted from my forthcoming article in the Cutter IT Journal. This section deals with why understanding variation ultimately allows us to embrace change. Comments welcome...

Common Versus Special Cause Variation

Walter Shewhart first classified two types of variation from his work at Bell Labs in the 1920s. He called them "controlled variation" from chance causes and "uncontrolled variation" from assignable causes [Wheeler 1992]. W. Edwards Deming later modified this terminology to "common cause variation" and "special cause variation," and it is these terms that are most commonly used today [Wheeler 1992]. The teachings of Shewhart, Deming, and others in the field of statistical process control are at the foundation of the management theory called Six Sigma, which seeks to create a system of continuous improvement through the reduction of variation. Another disciple of Shewhart, Donald Wheeler, classified what he called the "four states of control," as shown here.

The four states form a 2x2 matrix, with the rows representing common (or chance) cause variation and special (or assignable) cause variation. The columns represent conformant quality and nonconformant quality. For project management, we might decide to define conformant quality as all functionality delivered on time with fewer than two Severity 3 (or lower) bugs per 100 function points of scope.
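The matrix reduces to two yes/no questions: is the process free of assignable cause variation (in statistical control), and does its output conform? The sketch below encodes that mapping along with the example project-level conformance definition above. The function names are hypothetical, and in practice the "in statistical control" answer would come from a control chart.

```python
def is_conformant(on_time: bool, all_scope_delivered: bool,
                  sev3_or_lower_bugs: int, function_points: float) -> bool:
    """Example conformance definition: all functionality delivered on time with
    fewer than two Severity 3 (or lower) bugs per 100 function points."""
    bugs_per_100_fp = 100.0 * sev3_or_lower_bugs / function_points
    return on_time and all_scope_delivered and bugs_per_100_fp < 2.0

def wheeler_state(in_statistical_control: bool, conformant: bool) -> str:
    """Map the two questions onto Wheeler's four states of control."""
    if in_statistical_control:
        return "Ideal State" if conformant else "Threshold State"
    return "Brink of Chaos State" if conformant else "State of Chaos"

# A project in control, on time, with all scope, but 5 bugs in 200 function points.
print(wheeler_state(True, is_conformant(True, True, 5, 200)))
```

Reducing common cause variation moves a project from the nonconformant to the conformant answer on the second question; eliminating assignable causes changes the answer to the first.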

Embrace Change - Embrace Uncertainty - Understand Variation

It may not be immediately obvious why understanding variation is important to being agile. Kent Beck asked us to "embrace change" in the subtitle of his book Extreme Programming Explained [Beck 2000]. The Agile Manifesto asks us to "respond to change over following a plan". This seems to place an emphasis on reacting (to change) rather than controlling against a plan. Traditional critical path plans have a deterministic basis, but project task durations cannot be calculated deterministically - they exhibit probabilistic behavior. In other words, project task durations are uncertain and, over a sample set, will exhibit variation. Shewhart and his followers, Chambers, Deming and Wheeler, have helped us to understand variation. By understanding it, we can use it to embrace uncertainty and, consequently, embrace change through anticipation.
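To illustrate why a deterministic plan understates a chain of uncertain tasks, the following sketch runs a small Monte Carlo simulation. It assumes, for illustration only, that each task duration follows a right-skewed triangular distribution (a task can overrun by more than it can underrun); the task figures and the function name are made up. Summing the most likely durations gives the deterministic estimate, while the simulated median and 90th percentile show how much buffer a plan would need.

```python
import random

def simulate_chain(tasks, trials=10_000, seed=1):
    """Monte Carlo estimate of the total duration of a chain of sequential tasks.

    Each task is (optimistic, most_likely, pessimistic) in hours and is sampled
    from a triangular distribution (an illustrative assumption).
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        for _ in range(trials)
    )
    p50 = totals[trials // 2]
    p90 = totals[int(trials * 0.9)]
    return p50, p90

# Hypothetical three-task chain.
chain = [(4, 6, 12), (8, 10, 20), (2, 3, 8)]
deterministic = sum(mode for _, mode, _ in chain)  # sum of most likely durations
p50, p90 = simulate_chain(chain)
print(deterministic, round(p50, 1), round(p90, 1))
```

The gap between the deterministic total and the 90th percentile is the kind of variation a probabilistic, buffered plan anticipates instead of reacting to.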

Understanding Variation

It is worth considering very carefully the applicability and meaning of Shewhart's original terms, chance and assignable cause, to software engineering project management. Assignable cause variation is, by definition, identifiable. Assignable cause variation is the stuff of issue logs and risk management plans. If you can point at it or give it a name or describe it, then it is probably assignable (special) cause variation in your project. Chance cause, on the other hand, cannot be identified. Chance cause is endemic to the process or system of software engineering. Chance cause is the idea that it took 1 hour 20 minutes to design Feature 167 whilst it took 2 hours and 10 minutes to design Feature 168, which was estimated as being of similar complexity. Chance cause relates to how the work is done - the mechanism, the system dynamics.
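Separating chance cause from assignable cause is what a control chart does. A common choice for individual measurements is the XmR (individuals and moving range) chart, the approach Wheeler advocates, whose natural process limits are the mean plus or minus 2.66 times the average moving range. The sketch below applies it to hypothetical design times in minutes; the first two values echo the 80 and 130 minutes in the example above, and the rest are invented.

```python
def xmr_limits(values):
    """Natural process limits for an XmR chart: mean +/- 2.66 * average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

# Hypothetical design times (minutes) for Features of similar estimated complexity.
design_minutes = [80, 130, 95, 110, 70, 120, 100, 90]
lower, upper = xmr_limits(design_minutes)

for minutes in (130, 240):
    if lower <= minutes <= upper:
        print(minutes, "within limits: treat as chance cause")
    else:
        print(minutes, "outside limits: look for an assignable cause")
```

With these numbers the limits come out at roughly 16 to 183 minutes, so the 130-minute Feature is ordinary chance cause variation, while a 240-minute design would merit an entry in the issue log.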

Recalling the definition of the responsibilities of the engineering manager (text omitted in this extract), it is clear that chance cause variation is rightly the problem of the engineering manager. Chance cause variation is caused by the system dynamics and the engineering manager is responsible for the system - the team of engineers and their methods. As shown in the figure above, chance cause variation is reduced by changing the system, resulting in a movement of the system from right to left on the diagram. Assignable cause variation must be eliminated (not merely reduced) in order for the system to move vertically from bottom to top on the diagram.

References

[Beck 2000] Beck, Kent, Extreme Programming Explained - Embrace Change, Addison Wesley, New York NY, 2000
[Wheeler 1992] Wheeler, Donald J., and David S. Chambers, Understanding Statistical Process Control, SPC Press, Knoxville, Tennessee, 1992

 
 

From Change to Variation

Wednesday, Jun 16, 2004
 

Embrace Change - Embrace Uncertainty - Understand Variation

I realize from the review comments for my Cutter IT Journal article (the draft is still available from my Yahoo! group) that it is difficult to see the linkage in the sub-heading above. So I'm having to make some changes to the article.

First off, I was asked to justify how a paper about variation addresses the Agile Manifesto's choice of "respond to change over following a plan". The argument went that understanding (and buffering for) variation seemed to be a big up-front planning approach when the manifesto was asking us to manage by reacting to events. So I turned to the members of the Yahoo! group, and they collectively educated me that I was being asked to answer the wrong question. The manifesto is not asking us to abandon planning but to prefer responding to change over rigorous adherence to a plan. There is probably also an implication that the plan is a traditional deterministic critical path plan. Hence, a probabilistic, variation-aware plan is perfectly agile.

Being agile does not mean that planning is no longer important but that plans must be able to accommodate change. I've talked before about Marvin Patterson's [Patterson 1993] model of design as a process of information discovery (see my Overview of Agile Management slides). A completed design represents perfect information. I've also suggested the inverse: design is a process of uncertainty reduction. Brad Appleton brought Philip Armour's work [Armour 2003] to my attention, in particular "Software is not a Product" and "The Five Orders of Ignorance". He uses the word "ignorance" instead of "uncertainty", but it seems to amount to the same thing. In Armour's model, software development is the gradual elimination of ignorance until working code is delivered.

More on this topic tomorrow...

Thanks to all of those who contributed in the group and helped clarify my thinking on this - Brad Appleton, Bill Walton, Ron Jeffries, Jim Highsmith, Norbert Winklareth, Dean Schulze and Lowell Lindstrom (I hope I didn't miss anyone).

References

[Armour 2003] Armour, Philip G., The Laws of Software Process: A New Model for the Production and Management of Software, Auerbach, 2003
[Patterson 1993] Patterson, Marvin, Accelerating Innovation: Improving Process and Product Development, Van Nostrand Reinhold, New York NY, 1993

 
 

Correction Page 145

Monday, Jun 14, 2004
 

Thanks to Bill Ramos for spotting a confusing typographical error in the market segmentation tables from Chapter 16 - Agile Product Management. You will find these on page 145 of the book. The ordering of the numbers was transposed in Table 16-3, which made the Throughput Accounting product mix example rather difficult to follow. Bill also observed that the table style chosen for the book doesn't make the rows and columns obvious. These two revised tables correct the error and make the market segmentation matrix more obvious.

 
 

More Thoughts on Scale

Thursday, Jun 10, 2004
 

In Sunday's entry, Economy of Scale, I discussed the advantage that big outsourcing shops have over smaller ones because (assuming process maturity) they can buffer against special cause variation more cost-effectively. Smaller businesses need to either buy insurance against it, negotiate the risk away in their contracts (unlikely), or simply take the risk and suffer the consequences.

I've been giving this some more thought this week as I consider the plan for scaling my new business. I care deeply about staff welfare and development, and I like to see people keep their lives in balance and in proper perspective. This is not totally selfless. I genuinely believe that, over the long term, you get better work and higher productivity from people who have a well-balanced life. So I would like to offer benefits such as 4 weeks' paid vacation from the start - the Americans reading this will be envious whilst the Europeans will be scoffing - up to 6 months' maternity leave, 1 month's paternity leave within the first 6 months after the birth of a child, up to 1 month's bereavement leave for immediate family members - spouse, parent, child - and so forth. At times of personal stress and personal special cause events, members of staff should not need to worry about their job or their income.

However, consider how costly this is and the potential risk. For a small staff - say 10 people - doing one or two projects at a time, the project plan needs to identify many special cause events and estimate the probability of each. Then there is the question of whether the business buffers for these - effectively passing on the cost to the customer - or takes the risk. Not to mention the potential financial burden. With larger corporations these concerns simply go away.

In fact, it is not only the quality assurance / Six Sigma way of thinking about variation that shows larger businesses have an advantage with staffing-related variation. The Theory of Constraints suggests that only personnel working in the constraint (and theoretically fully loaded) can affect the throughput of the organization. Hence, variation, whether special or common cause, in other areas of the business will not affect throughput. If throughput is not affected, then the cost of special cause variation, such as maternity leave, can be absorbed by the business.

This still leaves the problem of what to do about the constraint. Buy insurance? Have a flexible workforce? Have a flexible supply chain with a vendor who can step in at short notice and absorb the capacity shortfall caused by such variation? As most of you will know, this is not easy in knowledge work. What's the ramp-up time for a new contract developer to become effective? And what about more specialist skills such as architect or user experience designer? How, for example, do you provide backup for an expert consultant in the application of the Theory of Constraints to software engineering? Those who suffered calendar adjustments last week when I called to say, "I have a back injury and can't walk.... Oh it's a long story, I injured it almost 10 years ago... yadda yadda... Can we re-arrange for next week or the week after?" will be only too aware of where the constraint is in my business.
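The Theory of Constraints point can be illustrated with a minimal sketch, assuming a simple serial value chain with fixed, deterministic capacities. The step names and numbers are hypothetical, and the model deliberately ignores queueing and dependent-event effects; it only shows why a capacity loss away from the constraint can leave throughput unchanged while the same loss at the constraint cannot.

```python
# Hypothetical serial value chain: capacities in features per week.
# These step names and numbers are illustrative, not from a real business.
BASELINE = {"analysis": 30.0, "design": 20.0, "code": 12.0, "test": 25.0}


def throughput(capacities: dict[str, float]) -> float:
    """In this simplified model, a serial chain delivers no faster than its slowest step."""
    return min(capacities.values())


def constraint(capacities: dict[str, float]) -> str:
    """Name of the step with the least capacity (the constraint)."""
    return min(capacities, key=capacities.get)


def after_absence(capacities: dict[str, float], step: str, loss: float = 0.25) -> dict[str, float]:
    """Return a copy of the chain in which `step` loses a fraction of its capacity,
    e.g. through a special cause event such as extended leave."""
    reduced = dict(capacities)
    reduced[step] *= 1 - loss
    return reduced


if __name__ == "__main__":
    print("constraint:", constraint(BASELINE))
    for step in BASELINE:
        print(f"25% loss at {step:>8}: throughput {throughput(after_absence(BASELINE, step)):.1f}")
```

Losing a quarter of the capacity at any non-constraint step leaves throughput at 12 features per week, because each reduced step still exceeds the constraint; the same loss at the code step drops throughput to 9. The argument holds only while non-constraint steps keep enough spare capacity not to become the new constraint.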

 
 

Goldratt in Seattle June 24th

Wednesday, Jun 09, 2004
 

Eli Goldratt's Viable Vision Tour reaches Seattle on June 24th. If you are going to be there look for me and say "Hi!".

Meanwhile, in other TOC news, John Sambrook has started another constraints-related weblog.

 
 

There is No "I" in FDD

Tuesday, Jun 08, 2004
 

Last Friday night, I was having a beer in a local pub with a former colleague who was relating a debate about project estimation he'd been having with another friend, an avid XPer. His friend had been asking, "What do you mean you don't make the developers responsible for their estimates?", and arguing that it was key to XP that developers accept individual responsibility for their estimates. And indeed, I know this to be true because I have read it and heard it from the great and good in the XP community. However, it occurred to me in response that there is no "I" in Feature Dryven Development ;-) - it's a team sport and responsibility is held at the team level for everything. There is no such concept as doing something on your own in FDD. You don't code features on your own, you don't design on your own, you don't review on your own and, most of all, you don't analyze or plan on your own. Like everything else in FDD, estimation is done by a team (not necessarily everyone on the project) on behalf of the project. There is a shared responsibility.

The second aspect of planning in FDD is that it isn't done on a "How long do you think this Feature will take?" basis. Instead, it's done by estimating the complexity of the Feature against a codification scheme. Together, these two systemic approaches move estimation into the top row of the Wheeler matrix: they make it a problem of chance cause variation rather than assigned cause variation.
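Estimating against a codification scheme, rather than asking "How long do you think this Feature will take?", can be shown as a minimal sketch. The complexity scale and the hours per level are hypothetical assumptions, not figures from FDD or from this blog; a real team would calibrate such a table from its own history.

```python
# Hypothetical codification scheme: complexity code -> mean hours per feature,
# as a team might calibrate it from its own history. Not figures from FDD itself.
HOURS_BY_COMPLEXITY = {1: 2.0, 2: 4.0, 3: 8.0, 4: 12.0, 5: 16.0}


def estimate_hours(complexity_codes: list[int]) -> float:
    """Estimate a batch from the complexity codes the team assigned to each
    feature, rather than from one developer's "how long will this take?" guess."""
    unknown = [c for c in complexity_codes if c not in HOURS_BY_COMPLEXITY]
    if unknown:
        raise ValueError(f"complexity codes outside the scheme: {unknown}")
    return sum(HOURS_BY_COMPLEXITY[c] for c in complexity_codes)


# Complexity codes agreed by the team for a hypothetical batch of six features.
batch = [2, 2, 3, 1, 5, 2]
print(f"estimated effort: {estimate_hours(batch):.1f} hours")
```

Nothing in the calculation depends on who is doing the estimating, so a change of personnel does not by itself change the estimate. The variation that remains is the spread of actual times around each complexity level's mean, which is the chance cause variation the scheme is meant to leave behind.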

When Shewhart originally defined chance and assigned cause variation, he deliberately classified variance across individuals as assigned cause variation. For example, on an assembly line, if you could identify that defects came from a particular worker, then that was an assigned cause - by definition - and the cure was to give the worker training to improve their quality. Hence, by asking individual developers to estimate stories, you are introducing assigned cause variation into your system. By definition, you must eliminate all assigned cause variation in order to move to the upper row of the Wheeler matrix, and hence to have any chance of achieving the Ideal State.

The argument from the XP community is that by forcing individuals to make and recognize mistakes - failure to estimate accurately being one such mistake - the individual will learn. Over time, the standard of knowledge in an XP team will normalize and assignable cause variation across the set of individuals will disappear. However, this assumes that the team remains stable. Any change of personnel is a change in the system. Any change in the system risks the (re-)introduction of assignable cause variation.

Agile methods avoid the problem of failing to achieve the Ideal State by relaxing their standards. The scope of the project is allowed to vary in order that dates and budgets can be met. In other words, scope is used as the buffer against assigned cause variation such as "Joe underestimated all his stories and we had to drop 3 of them in order to finish the iteration on time."

Using quality as a competitive weapon, I could compete against such a system by offering the customer a guaranteed delivery date, guaranteed scope and a guaranteed price. In other words, I would re-define the meaning of conformant quality. All that I need to do is gain the customer's agreement to use some non-essential scope as an insurance premium against assigned cause variation. In this case, I'll be able to provide the guarantee because I'm buffering the chance cause variation within known control limits. My price will be more competitive because my buffers will be smaller, or my promise will be stronger, than those of a team that needs to buffer for assignable cause variation.

So, in summary, is it better to take the Mike Cohn approach and codify the method of estimating stories, or is it better to use the standard prescription and make developers individually responsible? I think the answer will vary with the learning and analytical ability of the individuals. There is an argument which says that the bee sting approach of individual responsibility and correction for error may work faster than trying to devise and teach a codification mechanism. On the other hand, you only need one or two people on a team to be good at codification to get results in the upper row of the Wheeler matrix - to eliminate the assigned cause variation. My personal preference is to go for codification and to develop the skill as widely across the team as possible, but to rely on the best expertise to keep the estimate a common cause system variable. If you do decide to stick with individual estimates and individually accepted responsibility, then you are accepting that your system operates, at best, in the Brink of Chaos State.

Agilists such as Ken Schwaber are on record as saying that agile methods manage at the "edge of chaos" (Wheeler's "brink of chaos"), and in many cases that may be good enough. However, I know that if I can push a team to the Ideal State and turn the screws on the definition of conformant quality, I can always outperform a team operating at the edge of chaos. The last word on this belongs to Eli Goldratt...

"Don't let inertia stop you from seeking out further improvement".

 
 

More Thoughts on Conformant Quality

Monday, Jun 07, 2004
 

Back in October, I discussed the passing of the Concorde supersonic airliner, and briefly mentioned its Russian rival the TU-144. The Wheeler matrix gives us a way to understand the competitive advantage that the Anglo-French consortium had over the Russians. As I explained before, the Russians acquired the blueprints for the Concorde through clever espionage. In other words, all they had to do was build the plane. However, what they built did not fly and design modifications were required. What went wrong?

The Concorde's design had a narrow envelope of control and its wings needed to be manufactured very precisely. The Russian equipment lacked the ability to manufacture it with sufficient precision. The system of production produced non-conformant quality. The chance cause variation in manufacture was too large to ensure the stability of the aircraft in flight. The result was that it didn't fly without the addition of little winglets for added stability. It never did fly satisfactorily and, for an aircraft, this is a basic definition of non-conformant quality.

To have made the TU-144 successful, the Russians would have needed to change their system. They would have needed better equipment which could manufacture the wing with higher accuracy. This is a very clear example of control over chance cause variation providing a competitive advantage. As it happened, because the competition lacked the systemic ability to compete in the Ideal State, the Anglo-French consortium could have posted the plans on the Internet (had it existed in the 1960s) and their competitors would still have been unable to enter the market against them.

 
 

Economy of Scale

Sunday, Jun 06, 2004
 

Our 4 classifications of uncertainty can help us to understand some of the economics of software projects. Quite simply, the more uncertainty there is, the more we must buffer if we want to deliver conformant quality, i.e. on-time, with the agreed function, at an acceptable defect level.

If we have two organizations of different sizes, with similar software development teams using similar methods, then the common cause variation is similar across both and there is no leverage from quality as a competitive advantage. However, the larger organization still has a cost advantage if it can execute well enough. Why is that?

Assuming the larger organization has a functioning program management office, it can leverage its scale to provide the insurance for special cause variation (Foreseen Uncertainty) across the organization. Independent sources of variation aggregate as the square root of the sum of the squares. Hence, the larger the aggregation, the smaller the buffer needs to be in proportion, and the lower the cost of allocating it.
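The aggregation effect can be checked with a minimal sketch, assuming each project's special cause exposure can be summarized by a standard deviation and that exposures are independent (correlated risks, such as several projects depending on the same specialist, would pool less well). Each separate buffer is z standard deviations of one project's variation; the pooled buffer is z standard deviations of the aggregated variation. The 10-day figure and z = 2 are hypothetical.

```python
import math


def pooled_sigma(sigmas: list[float]) -> float:
    """Aggregate independent variations: square root of the sum of the squares."""
    return math.sqrt(sum(s * s for s in sigmas))


def buffer_sizes(sigmas: list[float], z: float = 2.0) -> tuple[float, float]:
    """Return (total of separate per-project buffers, one pooled buffer).

    z is a hypothetical safety factor, in standard deviations, applied to both.
    """
    separate = sum(z * s for s in sigmas)
    pooled = z * pooled_sigma(sigmas)
    return separate, pooled


# Hypothetical portfolio: every project carries the same special cause
# exposure, a standard deviation of 10 days.
for n in (1, 4, 16, 100):
    separate, pooled = buffer_sizes([10.0] * n)
    print(f"{n:>3} projects: separate {separate:7.1f} days, "
          f"pooled {pooled:6.1f} days, ratio {pooled / separate:.2f}")
```

With identical projects the pooled buffer shrinks relative to the sum of separate buffers as 1/sqrt(n): half the size across 4 projects and a tenth across 100.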

What this means is that big is beautiful in mature outsourcing organizations. The IBM Global Services of this world should be able to outperform smaller competitors because they have an economy of scale. Naturally, they need to show the maturity to bring their development methods up to the Ideal State and demonstrate that they can meet the bar for conformant quality - metaphorically speaking, hit the ball as hard and with similar accuracy to Tiger Woods. If they can level the playing field against smaller competitors on common cause variation, then they can win on the insurance cost of buffering special cause variation.

[Most of the literature on software engineering says that there is no economy of scale. The Mythical Man-Month predicts that as you add people, the costs grow non-linearly. There are few, if any, documented cases of the cost per Function Point falling as size increases. I do remember reading of one, but I can't find the reference. If anyone remembers it, please post a comment. I believe the documented case is from the Jet Propulsion Laboratory or somewhere else within NASA. Despite the lack of literature on economy of scale in practice, I believe that we have the framework to understand how to achieve it: build a world-class agile development organization that can deliver projects in the Ideal State, then mature into a world-class program management organization using the Critical Chain multi-project solution. Finally, scale up to take advantage of the aggregation effect of buffering special cause variation across the portfolio.]

 
 

Classifying Uncertainty

Saturday, Jun 05, 2004
 

[Sometimes I just get it wrong. This is one of these times. The 2x2 grid below is nonsense. See my updated diagram in a more recent blog entry representing my more recent thinking on how to map De Meyer et al to Deming. I'm leaving the original as a historical record. DON'T READ THIS ;-)]

I didn't use this diagram in my Cutter IT Journal article, although I created it expecting to need it. It supplements the content of Chapter 4 - Dealing with Uncertainty. The diagram reflects the four types of uncertainty identified by De Meyer et al. in their article in the Winter 2002 edition of MIT Sloan Management Review, "Managing Project Uncertainty: From Variation to Chaos".

We can use this model to better understand a system for software engineering in a given market. This matrix can be mapped against the Wheeler matrix more or less quadrant for quadrant. Variation is where assigned cause has been eliminated and only chance cause exists within a known and understood system. Foreseen Uncertainty is where there are identifiable risks and understood issues which affect the delivery of the project, but the basic market for the deliverable is understood, as is the business model or go-to-market strategy. Unforeseen Uncertainty is a land where the system is not understood well enough to be under control. It will feel out of control, and what gets delivered won't be exactly what the customer wanted or when they wanted it. This could be because the software development is happening with a new paradigm of tools or method - when teams started using 4GLs, for example - but it may also occur in new markets where the model is not understood and the degree of variation cannot be predicted. Finally, there is Chaos, the land where we don't know what we don't know and we are trying to find out - none of the market, the business model, the customer base, the product features or the technology is understood.

From a project management perspective, knowing where our project lies in this matrix is important for setting expectations. With Variation, we can simply create a common cause buffer based on existing process control limits. With Foreseen Uncertainty, we create a risk management plan which may contain some mitigation approaches that amount to further buffer. In some industries this is called insurance. Insurance works just like common cause buffering, but the aggregation is at a higher level. Say, for example, a risk mitigation was to bring more staff onto a project; those staff must come from a pool of people elsewhere in the organization. In other words, the wider organization is underwriting the insurance for the project.

Unforeseen Uncertainty is a land of high risk - we don't know what we know, or our system is not understood. Playing in this quadrant could also be called gambling. As every gambler will tell you, you win some and you lose some. Many VCs will tell you that they lose up to 19 for every one they win. The Unforeseen Uncertainty quadrant is the land of venture capital. Finally, Chaos is a land where only the research budget should venture. It's extreme gambling. It's adventure rather than merely venture. It's a world where you assume you lost your money and nothing got delivered. Delivery is a bonus. It's like winning the lottery. The project management objective for Unforeseen Uncertainty is simple - try to move the project into the Foreseen Uncertainty quadrant before the money runs out, then ask for more. With Chaos it is similar - try to move the project into either the Unforeseen Uncertainty or Foreseen Uncertainty quadrant before the money runs out. It is unreasonable to buffer a plan in the right-hand column. It simply costs too much. Better not to buffer, to get as far as you can with what you've got, and to demonstrate that you've moved to a world of greater certainty before asking for more resources.

It's easy to see from this matrix that the more uncertain the project, the more agility is needed to react to change. In the right-hand column, the iteration cycle must be short and the feedback loop very tight. By understanding where our project lies in this matrix, we can decide what represents the best iteration cycle and make informed guesses about how much we want to invest in requirements and analysis versus code, test and refactor.

 
 

World Class Velocity

Friday, Jun 04, 2004
 

The American media is winding itself up for the feeding frenzy that will be Lance Armstrong's attempt to win a record-breaking sixth Tour de France this July. Meanwhile, one of his rivals, the somewhat Scottish David Millar, spent a month on a voluntary ban from cycling whilst his team was investigated for drug taking. Millar gave this interview to Scotland on Sunday to express his feelings about it. The important words for me are...

It upsets me to think that people assume every pro is on drugs. Just because people can't comprehend the level of fitness and ability that some riders have, they now assume they all do it on drugs.

It really is difficult for ordinary folks to comprehend how guys like Millar - the World Time Trial Champion - or Armstrong, can ride a bike for long distances at average speeds exceeding 27 miles per hour. In fact in time trials, both Millar and Armstrong regularly top 34 miles per hour for a distance of 30 miles.

Now I'm no slouch on a bicycle on my day, and back at the time of the Singapore project I was riding mountain bike races in Indonesia, Malaysia and Singapore. In one race, I came in (a distant) 15th, behind a winner who just the previous month had picked up the gold medal for the road race at the Commonwealth Games. Another regular racer, an Australian, had an Olympic gold medal (for speed skating). These guys were about 15% faster than me - and, believe me, that is a lot; it's huge when you have to chase after it. In turn, neither of them could have lived with the European pros, who are perhaps 15% faster again, and most of those can't stay with a Millar or an Armstrong, who are maybe 5% faster when it suits them to be.

These days I get around Seattle on my bike at about 22 mph. Armstrong can happily scoot around his home town of Gerona, Spain at 34 mph - a better than 50% productivity improvement over me - and people think he is doing drugs.

So is it any wonder, when I tell people (indeed, show them with metrics) that agile development team A is 5 times more productive than team B who in turn are twice as fast as team C at another location, that people don't believe me? After all, these programmers are only human - right? How can it be possible to go 10 times faster? Just what kind of drugs do you take to make that possible? Is it time we banned these prescriptions with names like FDD or TDD or XP in order to level the playing field for everyone else? Perhaps some police raids to dig out the story cards and some spy equipment to detect morning stand up meetings? It is disgusting that some developers can go so quickly. It has to be stopped!

[Update: July 7th, 2004 - In rather sad reports recently, it turns out that David Millar has admitted to taking drugs on at least 3 occasions over the past 3 years, and he is likely to be suspended for at least 2 years as a result of his admission. There is speculation that his career as a top cyclist is over.]

 
 

Quality as a Competitive Weapon

Thursday, Jun 03, 2004
 

Recently, I was putting together an article for the Cutter IT Journal which will appear over the summer. The article is essentially a re-write and update of Chapter 8 - The Agile Manager's New Work. If you want a sneak preview then you can pull the draft down from the Yahoo! group files section. It will be there for another week or so.

Without giving the story away, the article contains this diagram - Donald Wheeler's States of Control matrix derived out of the work of Walter Shewhart.

This chart can tell us almost everything we need to know about the Six Sigma approach to continuous improvement. The rows show the difference between assignable cause and chance cause variation (better known by Deming's terms - special and common cause variation). However, Shewhart's original terms are perhaps more useful to us when considering agile management.

Assignable cause. Something which has an assignable cause is identifiable. You can point to it. As such, an assignable cause problem which has not happened yet is a risk and should appear on a risk management plan. An assignable cause problem which has happened should be on an issue log.

Chance cause. Variation which is endemic to a process and cannot generally be traced to an identifiable root cause is chance cause variation. Chance cause is the idea that Feature 167 with Complexity 2 took 1 hour and 40 minutes to design, whilst Feature 168 with Complexity 2 took 2 hours and 10 minutes to design. The difference is chance variation in the interaction of the team discussing the design problems at hand.
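One common way to tell the two kinds of variation apart in practice is an individuals and moving range (XmR) chart, a standard Shewhart-style control chart. This is a minimal sketch of that technique, not a method taken from the article. Natural process limits are the mean plus or minus 2.66 times the average moving range; points inside are treated as chance cause variation, and points outside are signals worth investigating for an assignable cause. Only the 100 and 130 minute figures come from the Feature 167/168 example; the other design times, and the choice of the first eight as the baseline period, are hypothetical.

```python
from statistics import mean

# Standard XmR scaling factor: 3 / d2, with d2 = 1.128 for moving ranges
# of two consecutive points.
XMR_FACTOR = 2.66


def xmr_limits(values: list[float]) -> tuple[float, float, float]:
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    spread = XMR_FACTOR * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def signals(values: list[float], limits: tuple[float, float, float]) -> list[int]:
    """Indices of points outside the limits: candidates for an assignable cause."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Design times (minutes) for Complexity 2 features. 100 and 130 are Features
# 167 and 168 from the text; every other value is hypothetical.
design_minutes = [100, 130, 115, 95, 120, 110, 105, 125, 240, 112]

# Hypothetically treat the first eight features as the baseline period.
limits = xmr_limits(design_minutes[:8])
print("limits: {:.1f} to {:.1f} minutes".format(limits[0], limits[2]))
print("investigate features at positions:", signals(design_minutes, limits))
```

Here the limits come out at roughly 65 to 160 minutes. The 100 and 130 minute designs sit comfortably inside them as chance cause variation, while the hypothetical 240-minute design session falls outside and would be worth a conversation to find its assignable cause.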

The columns are named after quality but the meaning is more general than most associate with the term. Conformant quality from a project management perspective might mean "all functionality, delivered on-time, with a defect level below 20 bugs per hundred features".

So now let's consider the 4 states of control. Chaos says that we have assignable cause variation and we have chance cause variation which is producing non-conformant quality (e.g. our estimation isn't accurate enough). Brink of Chaos says that we have our chance cause variation under control but our assigned causes mean that anything can happen. In other words, we aren't running down issues fast enough before they hit the critical path, or our risk mitigation policies were insufficient. The Threshold State says that we have eliminated assigned cause variation but our chance cause variation is still too large to deliver conformant quality. Finally, an Ideal project has eliminated assigned cause variation, whilst demonstrating that chance cause variation is within tolerance and the product is conformant with our measure of ultimate quality.
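The four states reduce to two yes/no questions, which a minimal sketch can make explicit. The two boolean inputs are a simplification assumed for illustration: in practice, deciding whether assignable causes remain (for example from points outside control limits, or a growing issue log) and whether the output is conformant is where the real work lies. The enum and function names are hypothetical.

```python
from enum import Enum


class StateOfControl(Enum):
    IDEAL = "Ideal State"
    THRESHOLD = "Threshold State"
    BRINK_OF_CHAOS = "Brink of Chaos"
    CHAOS = "Chaos"


def classify(assignable_causes_present: bool, conformant: bool) -> StateOfControl:
    """Place a process in the 2x2 States of Control matrix.

    Row: are assignable (special) causes still present?
    Column: does the output meet our chosen definition of conformant quality?
    """
    if assignable_causes_present:
        return StateOfControl.BRINK_OF_CHAOS if conformant else StateOfControl.CHAOS
    return StateOfControl.IDEAL if conformant else StateOfControl.THRESHOLD


# Hypothetical project: issues still surfacing unexpectedly, yet this
# release met its date, scope and defect target.
print(classify(assignable_causes_present=True, conformant=True).value)
```

Because "conformant" is an input rather than a constant, tightening the definition of conformant quality can move a project from the Ideal State to the Threshold State without anything in the process changing.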

Consider Philippe Kruchten's observation, from Facilitating Near Linear Progress, of the near-linear progress in my cumulative flow diagram (CFD) from a recent project.

This project would be considered to be in the Ideal State. As I stated in Facilitating Near Linear Progress, the project used the morning standup meetings to surface issues early and the team leads and project manager did a good job of running them down before they became critical. So assigned cause variation was eliminated. By focusing on the basics of the FDD method and doing the design, design review, code review and unit tests properly, the code quality was high, and the granularity of analysis and estimation was accurate. The Critical Chain probabilistic buffer based on uncertainty meant that the project was delivered on-time, with the agreed function with almost perfect quality.

So how do you use this framework for competitive advantage?

Well, it all depends on what you define as "conformant quality". Have you ever considered what it means when a world-class golfer says that he is "rebuilding" his swing? At that level of golf, a player can go and hit balls all day with regularity. Assigned cause problems rarely affect play. Clubs don't break, bees don't sting in mid-swing, and low flying aircraft pass by very seldom. Most problems in golf are chance cause problems involving the system of the swing. The horizontal barrier between the Threshold State and the Ideal State is defined competitively. For example, it used to be good enough to drive 275 yards and land the ball in the fairway 75% of the time. Since the arrival of Tiger Woods, a top player now has to drive 290 to 300 yards and land the ball in the fairway around 80% of the time or better. If they can't do that then they can't be competitive. Players who can't do it are in the Threshold State - they don't deliver conformant q