Root Cause of an Electrical Problem, Did You Find the Systematic Problem to Solve?
Root cause failure analysis is a common term in reliability and maintenance. Working in this field, we see organizations perform root cause analyses, but very few lead to corrective action and improvements. So why don’t we solve problems?
A good root cause elimination program needs a process, practical training in critical thinking, and coordination and follow-up of actions. IDCON has developed a process called Root Cause Problem Elimination (RCPE) where we emphasize solving the problem, documenting and executing tangible actions, and showing visible results.
I’ll give you a real-world example of a facilitated Root Cause Problem Elimination event.
On Wednesday at 08:42 AM, Paper Machine #8 shut down due to power loss in the press section. The paper machine was down for more than 9 hours before the circuit breaker was replaced and the machine was started up again.
The suggested approach is to use the same tools as in a murder investigation, because the circuit breaker was murdered! We started by collecting the data for the investigation. The data included:
- Understanding how this equipment works
- What happened, where, when, and similar objects
- Changes in time: before, during and after the paper machine shut down
- Physical evidence - gathered all the parts of the breaker, took pictures, and did a forensic analysis
- Conducted interviews with personnel involved
As mentioned earlier, we train people to eliminate problems using critical thinking tools and a structured method. Much like RCA, the process starts with the trigger. In this case, the trigger was defined as “machine downtime event >30-minutes then initiate formal root cause analysis.”
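A trigger like this is easy to express as a simple rule. The sketch below is only an illustration: the function name `needs_formal_rca` and the constant `RCA_TRIGGER_THRESHOLD` are hypothetical, and the 9-hour figure is used as a lower bound for the Paper Machine #8 downtime described above.

```python
from datetime import timedelta

# Threshold taken from the trigger quoted above; the name is hypothetical.
RCA_TRIGGER_THRESHOLD = timedelta(minutes=30)


def needs_formal_rca(downtime: timedelta,
                     threshold: timedelta = RCA_TRIGGER_THRESHOLD) -> bool:
    """Return True when a downtime event is longer than the trigger threshold."""
    return downtime > threshold


# Paper Machine #8 was down for more than 9 hours; 9 hours is a lower bound.
pm8_downtime = timedelta(hours=9)
print(needs_formal_rca(pm8_downtime))            # True
print(needs_formal_rca(timedelta(minutes=20)))   # False
```

The value of a written trigger is that the decision to investigate no longer depends on who happens to be on shift.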
The next step is to clearly state the problem. The problem statement should follow the rule “one object and one problem”; in this case, “circuit breaker to press section shorted.”
The problem was now identified. The next step was to determine “How Can” the circuit breaker short. To find alternatives that may have caused the problem, we use a tool called the How-Can Diagram (Figure 3).
Start from the right: spell out the trigger and the problem statement, then ask the question “How can the circuit breaker short?” The answers are “phase to phase” or “phase to ground.”
The next question is “How can this short happen?” Possible answers are “dust on the terminal connections (stabs)” or “corrosion on the terminal connections (stabs).” Keep going until you have exhausted the alternatives. Most important, list every alternative that makes sense, but don’t jump to conclusions.
The best way to start the creative process is not to use a spreadsheet or software but post-it notes on the wall. The post-it notes engage the problem-solving team and keep them actively involved. Trust us on this one!
The next step is to check the facts and find the most likely cause; in this way you can eliminate or confirm parts of the How-Can Diagram. State each fact for each of the boxes in the How-Can Diagram and ask the question “This is a fact because?” For example, the dust and dirt on the terminal connections (stabs) is a fact because we could see it and confirmed that it was present on other circuit breakers too. We could not confirm any mechanical damage or alignment problem of the terminal connections (stabs) of the circuit breaker. The electrician and the electrical engineer confirmed that the arcing occurred between the breaker phase terminal connections (stabs) and not to ground. The investigation confirmed that the Technical Cause of the breaker short was the missing dust and chemical filters.
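For readers who want to record the post-it wall afterwards, the fact check can be sketched as a small tree in Python. This is a minimal illustration, not part of the RCPE process itself: the class `HowCanBox`, the function `confirmed_chains`, and the exact nesting of the boxes are assumptions made for the example, while the statements and evidence come from the investigation described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class HowCanBox:
    """One box (post-it note) in a How-Can Diagram. Hypothetical structure."""
    statement: str
    # True = confirmed as fact, False = checked but not confirmed, None = not yet checked.
    confirmed: Optional[bool] = None
    evidence: str = ""  # the answer to "This is a fact because?"
    causes: List["HowCanBox"] = field(default_factory=list)


def confirmed_chains(box: HowCanBox,
                     chain: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield cause chains in which every box has been confirmed as fact."""
    if box.confirmed is not True:
        return
    chain = chain + (box.statement,)
    confirmed_causes = [c for c in box.causes if c.confirmed is True]
    if not confirmed_causes:
        yield chain
        return
    for cause in confirmed_causes:
        yield from confirmed_chains(cause, chain)


# The nesting below is an assumption; the article does not show the full diagram.
diagram = HowCanBox(
    "circuit breaker to press section shorted",
    confirmed=True,
    evidence="breaker failed and was replaced",
    causes=[
        HowCanBox(
            "phase to phase",
            confirmed=True,
            evidence="electrician and electrical engineer confirmed arcing between phase stabs",
            causes=[
                HowCanBox(
                    "dust on the terminal connections (stabs)",
                    confirmed=True,
                    evidence="visible, and also present on other circuit breakers",
                    causes=[
                        HowCanBox(
                            "missing dust and chemical filters",
                            confirmed=True,
                            evidence="confirmed by the investigation",
                        )
                    ],
                ),
                # Not yet checked in this sketch, so it stays out of the result.
                HowCanBox("corrosion on the terminal connections (stabs)"),
                HowCanBox(
                    "mechanical damage or misalignment of the stabs",
                    confirmed=False,
                    evidence="could not be confirmed",
                ),
            ],
        ),
        HowCanBox("phase to ground", confirmed=False, evidence="arcing was not to ground"),
    ],
)

chains = list(confirmed_chains(diagram))
for chain in chains:
    print(" -> ".join(chain))
```

Keeping the evidence next to each box makes it harder to promote a guess to a cause without answering “This is a fact because?” first.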
There were other causes we needed to figure out: How can a filter not be in place? What we found was that the filters weren’t in stock because they weren’t approved for purchase! The cost of the filters was $50K each, which was deemed too expensive for the maintenance budget. The Human Cause is that the maintenance team should have known the impact of not replacing the filters to save $100K. The decision to save $100K on the filters ended up costing them a lot more to repair the “murdered” circuit breaker.
The Systematic Cause is the lack of training on how the filters impact the long-term reliability of the MCC room electrical equipment. The other systematic issue is that changes to the equipment should be reviewed and approved by a qualified technical expert through the Management of Change process before being implemented.
Eliminating the problem
The RCPE team presented the investigation to the mill leadership and put a plan in place that would eliminate the systematic cause, the human cause and the technical cause. A training plan was put in place so that everyone understood how important the filters were to the reliability of the plant, and a Management of Change (MOC) process was put in place to ensure that no changes were made without proper approval.