The waste of variation

I have been asked to write a blog about how Lean deals with variation and resilience. I tried to write it all day yesterday but couldn’t find a narrative I was happy with, partly because there are so many ways of looking at variation, and so many perspectives on it, that I was struggling to fit it all into one blog without overcomplicating things. So this blog looks at one small aspect of Lean thinking and quality improvement, with a focus on variation, rather than all aspects. (I would also add that I’m out of practice on this and my memory might be a little rusty, so let me know if I inadvertently make an error here.)

First, Lean has two values: 1) respect for people and 2) continuous improvement through the removal of waste. In a previous blog I described the three broad categories of waste: Muda, Muri and Mura, with Muda breaking down further into the 7 (or 8) wastes. Reducing Mura tends to also help in reducing Muda. In this blog, I focus on Mura – which kind of translates into ‘excess variation’ or unevenness (Muri, by contrast, is overburden). I use the word ‘excess’ deliberately: humans are all different, and recognising that with respect also means recognising that we will do things slightly differently, that ‘normal variation’ is ok, and that it can even help us learn about new ways of doing things. There is no goal to reduce people to identikit automatons (that is not to say that there is benefit in everyone doing entirely different things that exacerbate variation either).

Variation is important, and it is generally viewed as a waste because it influences the effectiveness, efficiency and quality of products, services and processes, and can in some cases lead to harm. Further, some evidence (Hopp and Spearman, 2000, Factory Physics) suggests that variation will always degrade a process over time. Buffers, inventories and time are needed to smooth out the variation, ensure flow and add in choices linked to risk appetite.^ The equivalent of a buffer (sometimes an unintended one) in healthcare would most typically be a waiting room, a waiting list, ambulances queueing outside EDs, corridor care, supplies stock rooms, medication cupboards and linen cupboards. It follows that the more variation there is in a system, the more inventory, or the longer the waiting list or the bigger the waiting room, is likely to be needed for the system to function, to smooth the variation out and to hide the problems and obstacles that it represents. Variation can be measured and presented in lots of ways; commonly this is done with a measure of standard deviation and the use of statistical process control (SPC) charts or histograms.
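One way to see why more variation tends to mean bigger buffers is the queueing approximation usually credited to Kingman, which Factory Physics presents as the VUT equation. The sketch below is illustrative only: it assumes a single server seeing one stream of arrivals, and the function name and the triage-style numbers are made up for the example.

```python
def expected_queue_time(cv_arrivals, cv_service, utilisation, mean_service_time):
    """Kingman (VUT) approximation of the average wait in a single-server queue.

    cv_arrivals and cv_service are coefficients of variation
    (standard deviation divided by mean) of the time between arrivals
    and of the service time.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be at least 0 and below 1")
    variability = (cv_arrivals ** 2 + cv_service ** 2) / 2   # V
    utilisation_term = utilisation / (1 - utilisation)       # U
    return variability * utilisation_term * mean_service_time  # V x U x T


# Hypothetical numbers: 15-minute service, 85% utilisation.
low_variation_wait = expected_queue_time(0.5, 0.5, 0.85, 15)
high_variation_wait = expected_queue_time(1.5, 1.5, 0.85, 15)
print(round(low_variation_wait, 2), round(high_variation_wait, 2))
```

With the same demand and the same capacity, the only thing that changes between the two calls is the variability, and under these assumptions the approximate wait goes from roughly 21 minutes to roughly 191 minutes. That extra wait is the buffer the system ends up needing.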

In most systems, there are several sources of possible variation, so it is worth thinking about which variation we need to reduce (or try to increase if customers want more):

  • Case mix or product mix variation
  • Variation in demand
  • Process variation (e.g. variation in practices and working methods, together with capacity variation)
  • Outcome variation

There are different potential next steps to reduce these different forms of variation, contingent on your goals and on which sort of variation is being reduced. I can’t write about all of these in one blog, so here is a short overview. I haven’t covered outcome variation, as there are masses of analysis on this elsewhere.

To deal with case mix/product variation, start by understanding the mix pattern, e.g. with Glenday’s sieve, and also look for any seasonal or day/time of the week patterns. This may give clues as to any potential rationalisation to reduce the variation (the long tail, for example). It can also be helpful to look at ways of separating out the different products or services into different flows, so that more obscure or capacity-consuming cases can be streamed in different ways.
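As a rough sketch of what a Glenday sieve analysis does, the code below sorts items by volume and bands them by cumulative share of the total. The band boundaries (50%, 95% and 99% of volume) are the ones commonly quoted for the sieve; the function name, the rule of banding an item by where its share starts, and the case-type volumes are assumptions for illustration.

```python
def glenday_sieve(volumes):
    """Band each item by the cumulative share of volume reached before it.

    volumes: dict mapping item name -> volume over the period analysed.
    Returns a dict mapping item name -> 'green', 'yellow', 'blue' or 'red'.
    """
    total = sum(volumes.values())
    bands = {}
    cumulative = 0.0
    # Highest-volume items first.
    for item, volume in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        share_before = cumulative / total
        if share_before < 0.50:
            bands[item] = "green"    # the few items making up the first half of volume
        elif share_before < 0.95:
            bands[item] = "yellow"
        elif share_before < 0.99:
            bands[item] = "blue"
        else:
            bands[item] = "red"      # the long tail: the last 1% of volume
        cumulative += volume
    return bands


# Hypothetical annual volumes for eight case types.
case_volumes = {"A": 300, "B": 220, "C": 200, "D": 150,
                "E": 90, "F": 20, "G": 12, "H": 8}
bands = glenday_sieve(case_volumes)
print(bands)
```

Here two of the eight case types account for about half the volume (green), which is where a dedicated, levelled flow is most likely to pay off, while the single red item is a candidate for rationalisation or a separate stream.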

To deal with demand variation, analysis of the current demand pattern over time is again helpful: what patterns are there, and what pattern would be more desirable? Would it be helpful to have some spare capacity and redundancy? (I once worked on a plant that had 2 of everything but only 1 ever in use, so if 1 compressor, for example, went down, there was another one that could be used.) How many steps are in the process? The ‘bullwhip’ effect can make it seem that demand is highly variable when much of the variability is created inside the chain: many different step ‘owners’ each add more and more ‘just in case’ to the demand at their step, amplifying the variability and increasing demand, and making it harder for the supplier to service the resulting peaks and troughs. The more steps in the chain, the greater the amplification. This is one of the reasons it can be helpful to try to reduce the number of steps in a pathway and move things closer together. Both solutions conveniently also help to reduce safety errors. In addition, the shorter the pathway, the better the communication, the fewer the handoffs and errors, and the easier it is to change things quickly and responsively. The waste and delay of transportation are also reduced, which improves resilience in a supply chain (pathway etc.). Offshoring and long transportation requirements reduce resilience and do not easily enable the rapid deliveries needed for a Lean and resilient just-in-time delivery system.
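The amplification can be shown with a toy model. This is a deliberately simple sketch, not a validated supply chain model: it assumes each step owner passes on what was asked for plus a bit extra whenever the request has just gone up (and a bit less when it has just gone down). The function names, the overreaction factor and the demand figures are all hypothetical.

```python
from statistics import pstdev


def pass_upstream(orders, overreaction):
    """One step owner: add 'just in case' in proportion to the latest change."""
    passed = []
    previous = orders[0]
    for current in orders:
        padded = current + overreaction * (current - previous)
        passed.append(max(0.0, padded))  # an order cannot be negative
        previous = current
    return passed


def simulate_chain(demand, steps, overreaction=0.5):
    """Return the order stream seen at each step, starting with real demand."""
    streams = [list(demand)]
    for _ in range(steps):
        streams.append(pass_upstream(streams[-1], overreaction))
    return streams


# Hypothetical daily demand that really only wobbles by a few units.
demand = [100, 104, 98, 102, 97, 105, 99, 101]
streams = simulate_chain(demand, steps=3)
spreads = [pstdev(stream) for stream in streams]
for step, spread in enumerate(spreads):
    print(f"step {step}: standard deviation {spread:.1f}")
```

In this toy model the standard deviation grows at every step, even though the real demand at step 0 has not changed, which is why removing steps from the chain removes some of the apparent demand variation.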

Process variation can be reduced in several ways; again, what is best will depend on the process, the goals and the context. Some of the ways include thinking about the case mix and demand: how can the flows be separated and waste removed from them, and/or how can the flow be levelled? Methods and tools such as combination charts, yamazumi boards, Heijunka, takt time and SMED (changeover reduction), amongst other things, can be helpful here. Knowing how long it takes to do something in most cases is pretty important here too. It has always surprised me that this type of data (cycle times) is rarely routinely collected in healthcare.
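To show why cycle times matter, here is a minimal sketch of the usual takt time calculation (available working time divided by demand) and the rough staffing figure that follows from it. The function names and the clinic numbers are assumptions for illustration, and the staffing estimate ignores variation, breaks and skill mix.

```python
import math


def takt_time(available_minutes, demand_units):
    """Pace needed to meet demand: available time per unit demanded."""
    if demand_units <= 0:
        raise ValueError("demand_units must be positive")
    return available_minutes / demand_units


def staff_needed(work_content_minutes, takt_minutes):
    """Rough minimum number of people: work content per unit divided by takt."""
    return math.ceil(work_content_minutes / takt_minutes)


# Hypothetical clinic: 480 minutes available, 40 patients expected,
# 30 minutes of hands-on work (summed cycle times) per patient.
takt = takt_time(480, 40)
people = staff_needed(30, takt)
print(takt, people)  # 12.0 3
```

Without measured cycle times the second calculation cannot be done at all, which is one reason the lack of this data in healthcare is such a handicap.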

Another way to reduce process variation involves using standard work, 6S and visual management: three of the essential elements of a base Lean building block, the Lean cell, which build in control of the process to start to reduce the variation. This isn’t about ‘control’ in the way Lean is often critiqued; that is, these elements are not designed to be coercive. In a way these elements help to operationalise the ‘control’ lines on a statistical process control (SPC) chart into practice in a visual, tangible way (I am assuming, reader, that you are already familiar with SPC and the way variation is defined and plotted in those charts). Suppose, as an example, that the designed and agreed safe way of operating a process in ED triage is 2 rooms, with 2 nurses with the appropriate professional skills and knowledge, and that the staff who work there know from measuring themselves and from their own experience that they can safely triage 99% of cases in 15 mins with a particular operating pattern and protocol, and with a set-up of functioning and available equipment etc. Then this is the current standard, the current pattern of work, set by the people that do the work. (Note how this is different from a target set by others.) The current standard work is the best currently known safe way of doing the work in the current circumstances, developed and agreed by the people that do the work. There can be different standard work on a Saturday pm or during winter if there are time or seasonal changes in demand patterns; there is no need for a one-size-fits-all standard if demand patterns warrant several.

The principle of standard work, 6S and visual management is that when standards are out of place or missed, it is easy to see ‘at a glance’, which signals variation in practice, i.e. they act as a trigger to indicate that there is something a little out of the ordinary going on here; it may be nothing, but just check what and why (like the dot outside the line on a control chart). It might be that an element of standard work took longer than standard, that some equipment is missing, or that a team member is absent, etc. As with an SPC chart, the question is whether that deviation is a common cause or a special cause. This then helps to 1) understand if it is still safe to proceed, 2) decide what needs to be done immediately about it (not in a week, when someone in informatics finally draws the control chart in retrospect), 3) identify if a change is an improvement, i.e. we intended the trigger, and 4) collect real-time data for more substantial analysis. An SPC chart can also be used in real time as part of visual management.
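For readers who want to see the ‘dot outside the line’ logic written down, here is a minimal sketch of the limits for an individuals (XmR) chart, using the conventional 2.66 multiplier on the average moving range. It only applies the single rule of a point beyond the limits; the other run rules are left out, and the function names and triage times are made up for illustration.

```python
def xmr_limits(baseline):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    centre = sum(baseline) / len(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * mean_moving_range
    return centre - spread, centre, centre + spread


def points_outside(values, lower, upper):
    """Indexes of observations beyond the limits (possible special causes)."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical triage times in minutes, measured by the team itself.
baseline = [12, 14, 13, 15, 12, 14, 13, 15, 14, 13]
lower, centre, upper = xmr_limits(baseline)

todays_times = [14, 19, 13]
flagged = points_outside(todays_times, lower, upper)
print(round(lower, 1), centre, round(upper, 1), flagged)
```

In this example the 19-minute triage is flagged, which is the prompt to go and ask what was different about that case while it can still be remembered, rather than a week later.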

More proactively, when the standard best known safe way of working is known, the teams that do the work can do small experiments every day to see what happens if we destabilise the process in different ways, e.g. what happens if we try to see more patients, use fewer people, have less equipment, try a different technique, introduce a new check, have more people, use different rooms etc.? These enquiries are a source of learning about vulnerabilities that can be used both to improve and to build in resilience, and to change and adapt the standard way of doing things as learning is generated by the team.

All of these ideas will need some PDSA to test out different streaming, case mixes, pathways, processes, standard work etc., and reviews to see what is learnt. This is very difficult in unstable processes, and reducing variation needs a lot of commitment, persistence and effort. Over time, more sophisticated measures can be used, such as process capability (Cpk), which measures the ability of a process to meet its requirements robustly and reliably by comparing the actual variation against the specified (or required) variation. (There are also lots of other things that can help, but as I said this is just an initial outline. There are other things that are important with variation, such as process stability, as hinted at, but I can’t fit it all in one blog.)
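As a hedged illustration of process capability, the sketch below uses the usual textbook form: the distance from the process mean to the nearest specification limit, divided by three standard deviations. For simplicity it assumes a stable process and uses the overall sample standard deviation (strictly, that makes it closer to what is sometimes called Ppk); the function name, data and 15-minute limit are hypothetical.

```python
from statistics import mean, stdev


def process_capability(values, lower_spec=None, upper_spec=None):
    """Cpk-style index: room to the nearest spec limit in units of 3 sigma."""
    mu = mean(values)
    sigma = stdev(values)  # simplification: overall sample standard deviation
    ratios = []
    if upper_spec is not None:
        ratios.append((upper_spec - mu) / (3 * sigma))
    if lower_spec is not None:
        ratios.append((mu - lower_spec) / (3 * sigma))
    if not ratios:
        raise ValueError("give at least one specification limit")
    return min(ratios)


# Hypothetical triage times (minutes) against a 15-minute requirement.
times = [10, 12, 11, 13, 12, 11, 12, 13, 11, 12]
capability = process_capability(times, upper_spec=15)
print(round(capability, 2))  # 1.16
```

A value above 1 means the process, as measured, sits inside the requirement with some room to spare; the less variation there is, the bigger that number gets for the same average.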

The reduction of Mura is a very important aspect of waste removal and quality improvement, yet it can remain unaddressed. The more a system is able to lower the impact of the waste of variation on its processes, the higher its flexibility and resilience to respond to changes in customer demand and/or the environment, and the safer its processes and services.

^ There are inventory size equations that can be used to work out optimum inventory sizes. If you want a bigger just-in-case safety factor, e.g. for resilience and redundancy, then you will probably need a bigger buffer and must be willing to pay for that larger capacity, and you calculate the inventory size accordingly. As an undergraduate I was taught an engineer’s rule of thumb of a safety factor of 2, but I’m sure this has moved on and is more sophisticated now, and the size, in the end, is a leadership choice.
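One widely taught example of such an equation is the basic safety stock formula: a service-level factor multiplied by the standard deviation of daily demand and by the square root of the lead time. The sketch below assumes normally distributed, independent daily demand and a fixed lead time, which is a simplification; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(service_level, daily_demand_sd, lead_time_days):
    """Buffer needed to cover demand variation during the lead time."""
    if not 0 < service_level < 1:
        raise ValueError("service_level must be between 0 and 1")
    # z is how many standard deviations of cover the chosen service level implies.
    z = NormalDist().inv_cdf(service_level)
    return z * daily_demand_sd * sqrt(lead_time_days)


# Hypothetical stock item: daily demand varies with sd 20 units, 4-day lead time.
buffer_95 = safety_stock(0.95, 20, 4)
buffer_99 = safety_stock(0.99, 20, 4)
print(round(buffer_95), round(buffer_99))
```

The leadership choice shows up as the service level: asking for 99% rather than 95% cover means holding a noticeably bigger buffer, and halving the demand variation halves the buffer for the same level of cover.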



  1. Hi, very interesting blog!
    I’m familiar with Deming’s work and Factory Physics and I really appreciate the discussion of reducing variability. I agree that some degree of variability is appropriate. In some cases it is also necessary to increase this level (for example when you need to elevate your product / service mix).
    Sandro Rizzoli
    Lean&Quality Manager from Bologna (Italy)
