The HPI Equation

(Editor’s intro) From time to time we will have guest bloggers. For the first I am pleased to introduce Bill Roege, Director, Corporate Safety Analysis, DOE.  Bill is a retired U.S. Air Force officer and fighter pilot.  He is also a mathematician and an Ops Researcher.  He is a graduate of the United States Air Force Academy and the Massachusetts Institute of Technology.  Before joining DOE Bill was involved with leading edge performance analysis for the Pentagon.  After joining DOE he became an advocate of HPI and High Reliability since he was familiar with much of the science behind these concepts and personally knowledgeable of the high reliability practices necessary to lead in the modern Air Force. Thanks to Bill for this blog!

The HPI “Performance Improvement Formula” and Mathematics

By William H. Roege

An important concept in Human Performance Improvement (HPI) literature is captured in the performance improvement formula:  Re + Mc → ØE.  In plain English, reducing errors (Re) coupled with managing controls (Mc) leads to zero significant events (ØE) (DOE Human Performance Improvement Handbook, Vol. 1, pg. 1-16).  This simple, common sense formula is very appealing and useful to most people.  However, to the mathematically inclined using even simple mathematical symbols implies a precision that does not exist and is therefore confusing.  Instead of facilitating learning as intended, the formula becomes a barrier.

I believe the formula has two characteristics that give the mathematician trouble.  First, adding two terms implies they contribute independently to the result.  In this case, if one succeeded in driving errors to zero, there would still be many significant events as long as Mc were some positive number.  Instead, these terms intuitively interact; that is, errors combine with inadequate controls to produce events.  In mathematics, interacting terms are multiplied (or divided).  They give each other leverage or act as catalysts.

Second, I find the term “managing controls” counterintuitive.  My intuition says that more or better managing is good, but the current formula implies that would lead to more events, not fewer!  How should one think about Mc if it is small?  I propose slightly altering the meaning of the Mc term to “maximizing control effectiveness,” thereby making a large Mc very good.

Combining the two changes, I propose an alternative performance improvement formula that may appeal to the mathematician and non-mathematician alike:  Lim Re/Mc → ØE. That is, one approaches zero events by reducing errors and maximizing control effectiveness.  This recognizes that errors will never be zero and that control effectiveness can never be infinite, but by making one small and the other very large one can get very close to zero (many “9’s” in quality-speak).
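For the mathematically inclined, a tiny numerical sketch can make the limit behavior concrete.  All of the values below are invented purely for illustration; they are not measured performance data:

```python
# A toy numerical sketch of Lim Re/Mc -> 0E.  Every value below is
# invented for illustration; none is measured performance data.
def event_ratio(re: float, mc: float) -> float:
    """Re/Mc: residual error rate scaled by control effectiveness (mc > 0)."""
    return re / mc

# Reducing errors while raising control effectiveness drives the
# ratio toward zero without ever reaching it exactly.
for re, mc in [(10, 2), (5, 10), (1, 100), (0.1, 1000)]:
    print(f"Re={re:>5}  Mc={mc:>5}  Re/Mc={event_ratio(re, mc):.5f}")
```

The ratio never reaches zero, but each step adds another “9” of reliability, which is exactly the point of the limit form.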

Finally, I think it is important to communicate that Re and Mc are actually complex, non-linear functions with multiple variables.  Indeed, humans are in the loop and there are no known closed form formulas that describe human behavior.  The best one can do is use heuristic tools designed to help identify and correct undesirable situations.

Extending the Concept to High Reliability Organizational (HRO) Theory

Rick Hartley at Babcock & Wilcox Pantex is working to extend the performance improvement formula using High Reliability Organization (HRO) concepts.  Pantex uses a notion they call a “work perception gap”—basically the difference between work as imagined as opposed to how work is actually done.1 When the two are not aligned the resulting gap can further complicate error reduction and control management efforts and lead to more significant events.  This is a simple, but very powerful concept.

For this discussion, define Wi = work as imagined and Wd = work as done (each > 0); assume Wd can never be better than Wi (Wd ≤ Wi), so that the ratio Wd/Wi is always less than or equal to 1.   Ideally, Wd/Wi would equal 1, but in the real world the ratio is going to be less than 1.

In order to be a good scaling factor in the performance improvement formula, the new parameter must approach zero as it improves, so that it contributes to achieving zero events.  A suitable parameter is then (1 – Wd/Wi).  This parameter approaches zero as the work as done approaches the work as imagined.  Some may recognize an equivalent form, (Wi – Wd)/Wi, or the difference between work as imagined and work as done scaled by the larger of the two, Wi.

I propose this new work gap scaling parameter and use the form “delta W” for the work perception gap:

ΔW = (1 – Wd/Wi) = (Wi – Wd)/Wi.

The new HRO form of the performance improvement formula then becomes:

Lim (Re/Mc) ΔW → ØE.

Minimizing errors and the work perception gap while maximizing control effectiveness will significantly lower the probability of significant events.
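A short, self-contained Python sketch (with every number invented purely to illustrate the scaling, not drawn from any real facility data) shows how the gap term works:

```python
# Toy sketch of the HRO form (Re/Mc) * delta_w; every number here is
# invented purely to illustrate the scaling, not drawn from real data.
def delta_w(wi: float, wd: float) -> float:
    """Work perception gap: (1 - Wd/Wi) = (Wi - Wd)/Wi, with 0 < Wd <= Wi."""
    return 1.0 - wd / wi

def hro_ratio(re: float, mc: float, wi: float, wd: float) -> float:
    """(Re/Mc) * delta_w: the extended performance improvement ratio."""
    return (re / mc) * delta_w(wi, wd)

# Closing the gap between work-as-done and work-as-imagined scales
# the whole expression toward zero, even with Re and Mc held fixed.
print(hro_ratio(2, 50, 100, 60))  # large gap (delta_w ≈ 0.4)
print(hro_ratio(2, 50, 100, 99))  # small gap (delta_w ≈ 0.01)
```

Note how ΔW multiplies, rather than adds to, the error/control ratio: a small work perception gap shrinks the whole expression even when Re and Mc are unchanged.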

(Editor’s concluding comments) Jens Rasmussen, James Reason, Erik Hollnagel and others discuss “error” in cognitive terms relating to human information processing.  It is in that sense that we use the term Re.  Bill mentions the value of heuristics, and indeed there are a large number of tools that have been developed to support human performance.  (see for example the DOE Human Performance Handbook Volume II)

1 (ed. note:  the terms Work as Imagined and Work as Done were introduced into the lexicon by Sidney Dekker in the first Resilience Engineering book.)

8 Responses to The HPI Equation

  1. Dear Mr. Roege,

    Fake Equations

    I appreciate your comments on fake equations. As a technically trained person I have great respect for equations and don’t support fake ones.

    Work as Imagined vs. Work as done

    Dr. Hartley’s emphasis on work as imagined vs. work as done is a great one and deserves further exploration. My following response is a start. Consider it as “pump priming” or as an invitation to “draw fire.”

    The proposition that the work as done can never be better than the work as imagined is patently false. I would be surprised if anyone did not have an example of improving the work as done as compared to the work as imagined. I frequently get into a task and find that my imagination had been improvable, expandable, thin, poor, misinformed…

    Also, in many cases that I have investigated, one of the harmful factors is that the task team did not take the time to imagine what the work should be like. This is sometimes called “The Nike Approach” (Just do it!).

    Among the deep harmful factors of most adverse events are:
    1. The failure to think ahead
    2. The failure to plan for known hazards
    3. The failure to notice and attend to anomalies
    4. The failure to have learned from experience

    The work as imagined needs to include countermeasures for those four harmful factors. The DOE’s own Integrated Safety Management System (ISMS) is a good start.

    The above is not a criticism of Dr. Hartley’s basic idea of comparing the work as imagined with the work as done. In our book, “Causal Factor Analysis: An Approach for Organizational Learning”, Dr. Hartley and I promote many tools to facilitate that comparison.

    They include:
    1. The Comparative TimeLine,
    2. The Barrier Analysis matrix,
    3. The Missed Opportunity Matrix,
    4. Change/Difference Analysis, and
    5. Re-enactment.

    All the best.

    Take care,

    Bill Corcoran
    Mission: Saving lives, pain, assets, and careers through thoughtful inquiry.
    Motto: If you want safety, peace, or justice, then work for competency, integrity, and transparency.

    W. R. Corcoran, Ph.D., P.E.
    Nuclear Safety Review Concepts Corporation

  2. Todd Conklin says:

    Wow! I like this idea of a blogged conversation in this area. I also am honored to put a couple cents worth of my thought with such a great collection of thinkers in this area. Thanks for that guys.

    From the LANL audience perspective, I could not agree more with the comments on the use of the quasi-mathematical equations… our audience does not think the equations are terribly meaningful. In fact, I take quite a bit of heat in this area.

    And both Bill and Bill make some excellent points – but I almost wonder if we are a bit trapped in the old thinking. One of the best parts of Rick’s (Dr. Hartley – or is that Livingstone – I presume) journey into the HRO world is that the Pantex journey is allowing all of us to not give preferential treatment to some of the, dare I say it… older ways to see the world.

    Let me add that moving through the HPI world for me at least has been a progression. I think it is important to start the thought process at the beginning. James Reason was a great resource to start this mind-shift and to begin the journey — INPO gave us much and we have much to be thankful for that came from their shop – Dekker challenged us in many ways to be very aware of the biases that our organizations drive into our ability to understand failure – and Hollnagel has (in my humble opinion) given us the keys to the kingdom (apologies to the many I left out, and there are many).

    I have recently been wondering if, as Bill R. is telling us, we have tried to apply a linear failure model to a complex adaptive system and are worried that the application of the linear failure model does not work because we are not doing it correctly – when in fact the linear failure model does not work because we don’t have linear failure.

    A simple (or kind of simple) linear model (i.e.: Re + Mc → ØE) probably does not reflect the type of failure we see in our facilities. I should own that statement better. A simple linear model does not reflect the type of failure we see here in this facility.

    In fact what we think we are seeing is unusual combinations of normal performance variability.

    If we assume that work can never be done better than how we imagine work should be done… we are implying that all deviations from procedures are evil. I just have not found that to be true. The vast majority actually represent process improvements. Think about it: whenever discrepancies are identified in our annual review of procedures, it is almost always the procedure that is revised to match the work. That alone is a pretty interesting bit of information.

    Humans err. Humans optimize. Humans deviate. Humans create efficiencies. Humans respond to pressures. Humans want to do the job. Near as I can tell that is ALWAYS true.

    I would caution our community to not look so desperately for a symbolic or linear solution to our problems. I don’t think we can create a pictogram that could even begin to explain what we need.

    There is a difference between work as imagined and work as done. It is an important difference. A difference that we all care about. But we must look deeper for the context around that difference in order to better understand how these differences may exist.

    I can’t wait to talk more about the FOUR HARMFUL FACTORS. I think that idea may be the best method ever for creating a management wish list… but it tells us nothing about how the event transpired. Of course the harmful factors exist – that is why we had the event… but it misses the bigger question: Using only work as planned, could the work have even been done?

    Here is the question I struggle with the most:

    Do we write procedures and plan work to:
    A. Achieve mission success
    B. Avoid compliance failure

    Both are really important. But these themes by pure definition will create different forms of work as imagined.

    As always, I really learn a lot from you guys and love the chance to add some counter points to this discussion. You know, I could always be wrong on all of this —

    Cheers My Brothers and Sisters on this Journey,

    Todd Conklin, PhD
    Senior Advisor
    Los Alamos National Laboratory

  3. wroege says:

    Great conversation and fair comments. I like both comments a lot.

    I’ll start by saying the “delta W” construct was an attempt to depict what I interpreted the Pantex concept to be. I did contemplate whether it is necessarily so that the imagined work is better than the work actually done. While I very much agree that the actual work planners have significant shortfalls when preparing work packages, I was thinking more at the senior management level, where I believe their imagined work is generally perfection (another issue to discuss?). Then the work as done would include faults in the work planning process.

    I like Todd’s contemplation about not thinking about deviations as good or evil (reinforced by some of Bill’s comments). In a learning organization we know that workers often have better ways of doing business. As an example, as a pilot we were encouraged to propose changes in procedures to make them better. I found the good deviations were generally from purposeful experiments rather than chance errors though. This concept would certainly argue for a different construct in the proposed “equation”.

    I’d also like to add to Todd’s thoughts on linear thinking. I think the realization that complex systems are rarely linear is very important as it changes expectations on observed performance. In general, life and work are non-linear and have many interactive factors and variables. This can cause widely variable results for seemingly similar inputs. I believe all dynamic, complex systems exhibit this chaotic behavior to some extent. It makes precise analysis very difficult.

    Again great comments and hope we can continue to explore the topics further.

  4. Roger Kruse, LANL says:

    Clearly a fascinating discussion of the fundamental tenets of how to model and understand accidents. I have become a firm advocate of Hollnagel and his thoughts on accident modeling and causality, which I will try to summarize. Accident models have evolved over time and can be characterized by the three models below:

    Simple, linear cause – effect model (e.g., Domino)

    Accidents are seen as the natural culmination of a series of events or circumstances, which occur in a specific and recognizable order. In this model, accidents are caused by unsafe acts or conditions, and accidents are prevented by finding and eliminating possible causes.

    Complex, linear cause – effect model (e.g., Swiss cheese)

    Accidents are seen as the result of a series of active failures (unsafe acts) and latent conditions (hazards). These are often referred to as epidemiological models, using a medical metaphor that likens the latent conditions to pathogens in the human body that lay dormant until triggered by the unsafe act. In this model, accidents are prevented by strengthening barriers and defenses.

    Complex, non-linear accident model (e.g., Functional Resonance)

    Both accidents and successes are seen to emerge from unexpected combinations of normal variability. In this model, accidents are triggered by unexpected combinations of normal actions, rather than action failures, which combine, or resonate, with other normal variability in the process to produce the conditions that are necessary and jointly sufficient for failure.

    The Jenga™ game is an excellent metaphor for describing the complex, non-linear accident model. The missing blocks represent the sources of variability in the process and are typically described as organizational weaknesses or latent conditions. Eventually, the worker makes an error or takes an action that seems appropriate, but when combined with the other variability, brings the stack crashing down. The first response is to blame the worker because his action demonstrably led to failure, but it must be recognized that without the other missing blocks, there would have been no consequence.

    The use of the complex, non-linear models requires a shift in how we view causality. Because these models view accidents as resulting from unexpected combinations of normal variability, we cannot find the cause of failures in the normal actions since they, by definition, are not wrong. Rather than a search for cause, we need to seek an understanding of how normal variability combined to create the accident. From this, latent conditions or organizational weaknesses can be identified and strengthened.

    Although generally accepted as the overarching purpose of the investigation, the identification of causes is problematic. Causal analysis gives the appearance of rigor and the strenuous application of time-tested methodologies, but the problem is that causality (i.e., a cause-effect relationship) is constructed where it does not really exist. To understand how this happens, we need to take a hard look at how we investigate accidents, how cause – effect relationships are determined, and the requirements for a true cause – effect relationship.

    Using a maze metaphor, accident investigations look backwards and, as a result, oversimplify the search for causality. What was uncertain for the people working forward through the maze becomes clear for the investigator looking backwards. Investigators look backwards, with the undesired outcome (effect) preceded by actions, which is the opposite of how the people experienced it (actions followed by effects). Because they are looking for cause – effect relationships and there are many actions taking place along the timeline, there are usually one or more actions or conditions before the effect (accident) that seem to be plausible candidates for the cause(s).

    There are some common and mostly unavoidable problems when looking backwards to find causality. As humans, we have a strong tendency to draw conclusions that are not logically valid and are based on educated guesses, intuitive judgment, “common sense”, or other heuristics, instead of valid rules of logic. The use of event timelines, while beneficial for understanding the event, creates sequential relationships that seem to imply causal relationships.

    A quick primer on cause and effect may help to clarify:

    Cause and effect relationships are normally inferred from observation, but are generally not something that can be observed directly.

    Normally, we repeatedly observe Action A followed by Effect B and conclude that B was caused by A. It is the consistent and unwavering repeatability of the cause followed by the effect that actually establishes a true cause – effect relationship.

    Accident investigations, however, involve the notion of backward causality, i.e., reasoning backward from Effect to Action.

    We observe Effect B, assume that it was caused by something and then try to find out which preceding Action was the cause of it. We lack repeatability and only assume a causal relationship because it seems plausible.

    Plausible, however, is not certainty. A true cause and effect relationship must meet three requirements:
    1. The cause must precede the effect (in time)
    2. The cause and effect must be contiguous (close) in time and space
    3. The cause and effect must have a necessary and constant connection between them, such that the same cause always has the same effect

    The third requirement is the one that invalidates most of the proposed causes identified in accident investigations. As an example, a cause statement such as “the accident was due to inadequate supervision” cannot be valid because the inadequate supervision does not cause accidents all the time. This type of cause statement is generally based on the simple “fact” that the supervisor failed to prevent (counterfactual) the accident. There are generally some examples, such as not spending enough time observing workers, to support the conclusion, but these examples are cherry-picked to support the conclusion and are typically value judgments made after the fact.

    With the exception of physical causes, such as a shorted electrical wire as the ignition source for a fire, causes are not found; they are constructed in the mind of the investigator. Since accidents do happen, there are obviously many factors that contribute to the undesired outcome and these factors need to be eliminated or controlled. Although true, repeatable cause and effect relationships are almost impossible to find, many factors that seemed to have contributed to the outcome can be identified. Because it is really opinion, sufficient information needs to be provided so that others can review the information and draw the same conclusion. In other words, understand and explain.

    Wow! That was long-winded, but I hope you found it thought provoking. You can blame Todd since he pointed me to the discussion 😉

    In closing, I have become a firm believer that there needs to be a Hippocratic Oath (Above all, do no harm) for accident investigation. This is the foundation of the accident investigation class we teach at LANL.

  5. Bill Rigot says:

    This has been a most interesting conversation, for which I feel unworthy to even comment.

    To get back to Bill Roege’s premise (BTW there are too many Bills on this blog), as I understand it, reducing errors and managing defenses is another way of managing risk. Risk is generally defined by the following equation:

    R = C × F

    Risk equals Consequence times Frequency. Consequence equates to managing defenses, and Frequency equates to reducing errors in this context. Ergo, improving defenses and reducing errors will drive risk asymptotically toward zero. In summary, I agree with Bill R’s premise.
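The multiplicative point can be sketched in a few lines of Python (the consequence and frequency numbers below are made up purely for illustration, not real facility data):

```python
# R = C x F with invented example numbers -- not real facility data.
def risk(consequence: float, frequency: float) -> float:
    """Risk as the consequence of an event times its frequency of occurrence."""
    return consequence * frequency

# Improving defenses lowers consequence; reducing errors lowers
# frequency; improving both drives risk toward zero multiplicatively.
baseline = risk(consequence=100.0, frequency=0.1)   # 10.0
improved = risk(consequence=50.0, frequency=0.02)   # 1.0
print(baseline, improved)
```

Because the terms multiply, halving the consequence and cutting the frequency by a factor of five together cut the risk by a factor of ten.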

    In terms of Delta Work, my heuristic is that any variability from work as imagined is evil (and should be killed). This comes from my Six Sigma background. In my discussions with my INPO colleagues, their premise is that what happens when workers physically “touch” the plant should be exactly as planned; nothing more and nothing less. As with anything, this is a goal. While I recognize that you can always get more work than you imagine (with the right rewards system), in this context it’s still evil, and you should learn from the delta. While I appear to be at variance with Todd, I’m really not. Workers are going to figure out better ways to get things done. Leaders in the organization need to learn from those experiences to narrow the delta W in the future.

    As this blog discussion has been raging, I’ve been working with a large team on our HPI self-assessment, looking at what we imagined we were doing with HPI and measuring it against what we actually are doing in the field in 6 nuclear facilities. The gaps have been intellectually stimulating. The team will be joined by Earl and George Mortensen from INPO next week as we wrap it up. When I read through the comments (thanks, Todd, for alerting me to the blog posts), it was interesting to compare this academic discussion to some of the field results we’re seeing. To paraphrase Dekker, we’re seeing a lot of non-Newtonian complexity. We hope to be able to describe our learnings at the June EFCOG ISM meeting in Washington.

    Finally, to take a page from Bill Corcoran’s book, “All tools are flawed, some are useful”. Every heuristic discussed in this blog is flawed in some way. But if you understand the context of where they fit in our non-Newtonian complex structures, they can be used successfully to drive change in the direction the leaders desire.

    • wecarnes says:

      To learn more about the Savannah River effort that Bill Rigot mentioned, see his guest blog to be posted next week.

      Thanks for joining the discussion Bill!

  6. Thanks to all who posted to this thread.

    There was a request above for more on “The Four Harmful Factors.” The request followed my posted observation:
    “Among the deep harmful factors of most adverse events are:
    1. The failure to think ahead
    2. The failure to plan for known hazards
    3. The failure to notice and attend to anomalies
    4. The failure to have learned from experience

    The work as imagined needs to include countermeasures for those four harmful factors. The DOE’s own Integrated Safety Management System (ISMS) is a good start.”

    I recently released an article on Hazard Recognition that gives more detail.

    Different (not new) topic.

    In a posting above there was a tutorial on causation. There’s more to it in our business. We are usually concerned with how harm comes about so that we can avert the harm in the future. The following two sets of necessary factors always apply.

    First set of Necessary Factors (for Any Harm):
    1. There must have been a hazard.
    2. There must have been a victim or item subject to harm (called a “target”).
    3. There must have been a channel for the harm to be delivered to the target, and
    4. The hazard must have been released to the channel (and hence to the target).

    Second Set of Necessary Factors (for Specific Harm):
    1. There must have been set-up factors.
    2. There must have been a triggering factor that activated the set-up.
    3. There must have been exacerbating factors that made the harm as bad as it was, and
    4. There must have been mitigating factors that kept the harm from being worse than it was.

    The above applies when the target is an end victim, e.g., the Gulf Coast Shoreline AND when the target is a barrier, e.g., the blow-out preventer.

    There is more detail in the article mentioned above.

    Take care,

    Bill Corcoran
    Mission: Saving lives, pain, assets, and careers through thoughtful inquiry.
    Motto: If you want safety, peace, or justice, then work for competency, integrity, and transparency.

    W. R. Corcoran, Ph.D., P.E.
