Things I learned preventing nuclear meltdowns

Let’s discuss mission critical design, like when I had to design a way to prevent nuclear power catastrophes.

Everything we design affects people, but a nuclear disaster can kill millions of people. To paraphrase Mark Twain, I learned things about human factors by designing to prevent nuclear meltdowns that you can learn in no other way. The responsibility of preventing disasters dialed the pressure up to 11. I enjoy hard problems, so my design skills quickly dialed up as well.

Design for how people really act

For most of us, the phrase “nuclear reactor safety” conjures images of dangerous radiation and the threat of Chernobyl-like meltdowns. Human error is the #1 source of traffic accidents, nuclear meltdowns and airplane crashes. It’s not that humans suck. It’s simply that most things are designed for how people are supposed to act instead of how they actually act.

Why there are so many driving deaths

Let’s discuss a problem that seems smaller; car accidents. A lot of people (1.2 million annually) are killed every year in car accidents. Driving a car isn’t particularly hard, but millions die because cars are designed for how people are supposed to act, not what they really do.

Millions die because cars are designed for idealized human drivers, not the real thing.

Cars are designed with the assumptions that: a) people are always paying full attention to driving, b) never fiddle with their phones, c) never drive too fast, and d) never drive impaired. Most humans try to pay attention, but occasionally fail. People have distracting thoughts, are short on time, stressed, and have long, boring commutes. Add in a little alcohol, drugs or sleep medication and serious accidents are inevitable. Laws don’t change who people are and the fact that driving is boring and monotonous. Millions die because of faulty assumptions; the gap between ideal and actual human behavior. Perhaps you can see why self-driving cars will make sense someday? To be safe, cars require a driver (robot or human) who is never distracted and single-tasks exclusively on good driving 100% of the time.

Designing for idealized human behavior never works, whether it’s for e-commerce, dating or nuclear safety. For example, Tinder works because people really do judge based on appearances, not on numerical compatibility. People attempt to adapt themselves to systems of rule and law, but if you really want things to work, systems need to adapt to people.

Nuclear meltdown is a spooky design problem

Designing for nuclear reactors seemed spooky. Life and death takes design to another level. As part of my work at the government ADAPT group (Advanced Decision Aids and Productivity Tools), it was my job to invent, envision and prototype designs to help operators prevent a meltdown. I learned as much as I could before getting creative.

Nuclear power is usually super safe & efficient, but everyone could die in an emergency.

Most reactors are well engineered, and nuclear power is the natural source that keeps the sun shining. Reactors rarely have operational problems, but those rare failures can kill a small city if they aren’t handled right.

We’d like to think that the tech is so advanced and so secure that our fate is not in the hands of a Homer Simpson running a massive control room. It isn’t that simple. I learned that nuclear operators were smart and well-trained. Almost 100% of the time, operators monitored normal operation and made small adjustments now and then. Emergencies were so rare that nobody ever got much experience at handling them. During an emergency, hesitating could be catastrophic, but the choice of actions was so complex that it required an emergency operations manual (EOP) over 300 pages long. It was insane to expect, under the pressure of an impending meltdown, to read a long book before acting, but doing the wrong thing was even worse. These guys knew NORMAL operations inside out, but you can’t drill people constantly on 300 pages of complex stuff that never happens.

Normal operations were very complex. Operators were expected to read 300 pages AND act quickly in emergencies.

Designing for things that never happen

Generally, I love problems; the harder, the better. This one was tough. I had to find a way to help stressed out humans follow complex procedures quick enough to prevent disaster. Think about paramedics. Paramedics deal with a dozen emergencies every day, so medical emergencies are just standard procedure for them. We are all good at things we do all the time, but being perfect at things we never encounter isn’t possible. To face reality, I needed to design for smart guys tempted to panic and screw up.

Visualizing the data isn’t enough

Don’t tell me how awesome your design looks. It either solves the problem or is wrong.

In PHASE 1, we assembled an on-screen simulation of the reactor that was updated with live feeds from sensors. Back then, the control panel was a poorly designed ~~shit show~~ of meters, dials, lights and buttons. The idea was that a color coded visual representation of the parts of the reactor would help operators understand what was going on. They could even try adjustments on the simulated reactor before doing it on the real thing. At the time, this was a bleeding edge use of graphical user interfaces. To do this, I learned visual programming for the Apple Macintosh with colleagues from NASA, JPL and MIT. PHASE 1 was very impressive, but didn’t solve anything. Like the space program, every component in nuclear reactors has to be MIL-SPEC, which means 20-year old tech, so that every failure mode is known, no surprises. Ergo, our new displays couldn’t replace control panels. It just added two MORE screens to the visual noise for operators.

Reframing the problem to help operators in a crisis

Studying the people and their context brought the problem into focus. Nobody is going to read a 300 page book in an emergency. In a crisis, people act predictably. We rush to act, freeze, panic, miss details and skip steps. Operators are humans, so I had to account for these things. My redefined task was to design “something” that helped reactor operators get a grip, keep their cool and do the right thing in a crisis.

Designing backwards

Flowcharting the end result was very helpful in PHASE 2. For every possible nuclear emergency, the correct procedures to follow existed in disconnected parts of the emergency manual. The problem was how to help operators calmly and correctly proceed to diagnose and resolve the emergency. In studying the manual, the reactor designs, a bit of nuclear physics, and talking to domain experts, I iteratively moved towards design solutions. For any emergency, there was a path through the manual that contained every diagnostic step. I reverse engineered this into a diagnostic expert system design.

Streamlining the flow

Part of the problem was that the manual had too much text. Every page was like a text book on a relevant part of nuclear physics, the mechanism, and how to diagnose what to do. All those definitions and notes were simply noise when read by operators who knew the fundamentals by heart. When I told the operators that I couldn’t fit it all on the screens, they told me how much they hated the manual. Most of the text on every page was chaff they already knew. I didn’t have room for it, so I created hyperlinks to a glossary to save room.

I iteratively stripped each expert screen to a clean visual expression of a) the step they were on b) the things they needed to check c) pictures of the gauges and d) a series of choices based on the data readings. Since we already had a separate app built to collect sensor data from throughout the reactor, I piped the sensor data into each diagnostic step. With that information, each screen showed them the values to check right there, but simply asked them to confirm the values by checking the real gauges.

This idea came out a team discussion where the operators worried that the process would take too long in some cases. Incorporating this into the design allowed them to fix problems very quickly if they double checked sensor data with the physical gauges, but to go twice as fast if they just trusted the sensor data.

Less is more

Continual iterative testing revealed opportunities to optimize speed and usability. When the operator’s computers were upgraded with larger monitors, I was tempted to put more advice and notes on each page. Testing showed that this extra information simply slowed operators instead of helping them make the best and quickest choices. The pages were presenting choice overload about where to look, which wasn’t helpful. The copy and microcopy was subject to intensive re-edits for conciseness, clarity and consistency. In each test round, we were able to improve results until the final result.

A short time later, the system was approved for deployment on real reactors on the East Coast, and submitted as world-class innovation in safety to the IAEA (International Atomic Energy Agency).

What did I learn?

I spent a lot of time making wrong-headed moves before pulling out a win. My design ended up being a simple interface into a deep expert system network of knowledge, backed by live data to guide operators to quick and calm resolution of any emergency. In most cases, the correct procedure could be completed in 5-7 steps without rushing. It was nice when the chain of management above me liked my work. I was more impressed when the International Atomic Energy Agency saw a presentation and endorsed it for worldwide reactor use.

1. If we don’t understand users and context, we don’t know how to design.

In this case, everything designed and built before modelling the persona of the operator and the context of the problem was pointless. This may seem obvious in retrospect, but it wasn’t at the time. Even now, I sometimes have to argue for user research. I wasted time attempting to design before studying the most important tasks for the most important users.

2. To design something useful, first redesign the problem itself to what’s essential.

Hack away at the unessential. My biggest takeaway was how simple and obvious the solution became when I reduced the problem to “what needs to be on this screen to help someone decide what to do next“. Most of the text in the emergency manual was a distracting repetition of facts or non-actionable policy platitudes. Operators didn’t need to be told what they already knew. They just needed to know the next corrective action.

3. Flow is the most important aspect of design.

The value of this particular system depended on good flow, an essential component of a design’s x-Factor. No printed manual could achieve what our system did. The hyper-linked screens allowed operators to FLOW through the steps of diagnosing and solving problems in a natural, quick and friction-less way. It was also a concrete lesson in the power of SGML/HTML and the importance of flow design.

Everything is Apollo 13 today.

Working on extreme problems at a secure government facility involved with space shuttles and particle physics was pretty cool. It’s funny when clients tell me “It’s not like we’re putting people on the moon“, as justification for scrimping on research, analysis and usability testing, as if it won’t matter. The only time it doesn’t matter is when you don’t care about the result. The question isn’t whether you can afford to follow good design process, but whether you can afford NOT to, whether it’s lives or money on the line.

Every time a customer struggles to use your site or app, your business at risk. Unless users are forced to use your stuff, you’re a few bad experiences away from being chucked. In this day and age, everything is Apollo 13.

If you like this article, share it.
If you’re interested in my work, check out my Portfolio & Case Studies.

Contact me if you need a great UX Design Director/Manager

William Stewart

SENIOR UX DIRECTOR

Things I learned preventing nuclear meltdowns

Let’s discuss mission critical design, like when I had to design a way to prevent nuclear power catastrophes.

Design for how people really act

Why there are so many driving deaths

Nuclear meltdown is a spooky design problem

Normal operations were very complex. Operators were expected to read 300 pages AND act quickly in emergencies.

Designing for things that never happen

Visualizing the data isn’t enough

Reframing the problem to help operators in a crisis

Designing backwards

Streamlining the flow

Less is more

What did I learn?

1. If we don’t understand users and context, we don’t know how to design.

2. To design something useful, first redesign the problem itself to what’s essential.

3. Flow is the most important aspect of design.

Everything is Apollo 13 today.

About William Stewart

SENIOR UX DIRECTOR

Let’s discuss mission critical design, like when I had to design a way to prevent nuclear power catastrophes.

Design for how people really act

Why there are so many driving deaths

Nuclear meltdown is a spooky design problem

Normal operations were very complex. Operators were expected to read 300 pages AND act quickly in emergencies.

Designing for things that never happen

Visualizing the data isn’t enough

Reframing the problem to help operators in a crisis

Designing backwards

Streamlining the flow

Less is more

What did I learn?

1. If we don’t understand users and context, we don’t know how to design.

2. To design something useful, first redesign the problem itself to what’s essential.

3. Flow is the most important aspect of design.

Everything is Apollo 13 today.

Share this:

About William Stewart