RCM is NOT a Repeatable Process
Having practiced Reliability Engineering for 17 years, I am still continually excited and encouraged by the new lessons I learn. I would like to share with you a recent lesson learned while working at a Fortune 100 company that is re-engineering its asset management plans to ramp up production by more than 50 percent over the next calendar year; a wonderful challenge for this Reliability Rhino. The type of challenge I can put my head down and charge into.
Like all good Moubray trained RCM practitioners, we began our analysis of the current asset management plans by looking at the “Bad Actor” offenders of production downtime, using Pareto Analysis to quantify and identify the “critical to production” systems of assets that posed the greatest threat to our 1.5-day ramp-to-rate Takt time target. Then, as you might imagine, we validated the installed asset hierarchies within each system, finding that many of the newer or reconfigured assets were not up to date in our Maximo enterprise asset management system. But this is not the revelation I want to share with you today.
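The "Bad Actor" screen described above is a standard Pareto exercise: rank systems by downtime and keep the vital few that account for most of it. Here is a minimal sketch of that logic; the system names, downtime hours, and the 80 percent cutoff are hypothetical illustrations, not data from the project.

```python
# Hypothetical downtime history by system, in hours (illustrative only).
downtime_hours = {
    "Conveyor A": 420.0,
    "Press 3": 310.0,
    "Robot Cell 7": 95.0,
    "Chiller 2": 40.0,
    "Labeler 1": 15.0,
}

def pareto_bad_actors(downtime, cutoff=0.80):
    """Return the systems that together account for `cutoff` of total downtime."""
    total = sum(downtime.values())
    ranked = sorted(downtime.items(), key=lambda kv: kv[1], reverse=True)
    bad_actors, running = [], 0.0
    for system, hours in ranked:
        if running / total >= cutoff:
            break
        bad_actors.append(system)
        running += hours
    return bad_actors

print(pareto_bad_actors(downtime_hours))  # → ['Conveyor A', 'Press 3']
```

In this invented example, two of five systems drive roughly 80 percent of the downtime, which is the shortlist that would then feed the hierarchy validation and failure mode work described next.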
What came next was almost physically difficult for this concrete-sequential Reliability Engineer to comprehend. You see, I like a repeatable process of 1-2-3 and A-B-C, and once I find the best possible sequence to follow, it is inherently jarring for this hardened Rhino to abruptly change direction. What came next was not a formal Asset Criticality Analysis – qualifying the consequence risk to each unique business driver, like Safety Performance, Maintainability, Life Cycle Cost, and of course Production Impact – but instead we immediately began to analyze the failure modes and effects associated with each Bad Actor system, with no regard to the relative importance of each system, taking a narrow view of just the impact to Takt time. Crazy, right?
Everything about me thrashed about, struggling with the reality that I, and the hardworking Engineers, Managers and Maintenance Technicians assigned to this project, might all be wasting our time endlessly chasing failure modes that have no immediate consequence to the business. The sheer thought of designing an Equipment Maintenance Plan, MRO Spare Parts Strategy or even a Condition Monitoring Program without first ranking the importance of systems, machine assemblies and maintainable components was overwhelming. “Over tinkering” and overstocking are not the desired outcomes of a reliability centred maintenance program. But, alas, we pressed on. This is what I want to share with you today: the value of Failure Modes and Effects Analysis as a possible prerequisite for meaningful Asset Criticality Analysis.
Over the years I have written many articles on the topic of Asset Criticality Analysis, a very popular topic I might add. I advocate for criticality as a collective framework for making risk-based asset management decisions within the Engineering, Maintenance, Operations and Procurement business systems and workflow processes. I believe that the ranking or “score” is only part of the value we gain from the analysis, and that it may be more valuable to understand what makes an asset “critical” and how we must change our asset management plans to mitigate the risk. Today I look at Asset Criticality Analysis in the same manner, but the way I use the analysis has changed because of my continued practice and learning.
Coming back to our Fortune 100 company and its desire to re-engineer asset management plans to support a shrinking Takt time target: through a cross-functional team, we analyzed the specific manner of failure for each machine assembly associated with our “Bad Actor” systems. One outcome of the analysis was a set of common Failure Codes – a unique set of codes that can be used to evaluate the frequency of problems by asset class, failure class, or specific cause – that were then implemented in Maximo to evaluate the effectiveness of engineered Equipment Maintenance Plans. A second outcome was a ranked, prioritized list of consequence risks. Using a traditional set of criteria to define the Risk Priority Number (RPN) for each functional failure and failure mode, the team was able to determine which functional failures represented the greatest risk to Takt time, the leading business objective linked to our project, and the ranked order of failure codes that specifically contribute to this elevated level of risk. In short, if we began with 250 failure modes for a system, we prioritized our way down to 12 likely, consequential failure modes that have a direct, significant impact on Takt time. Twelve failure modes that would be used to construct a new Equipment Maintenance Plan, and 238 failure modes that would be used to evaluate maintenance effectiveness, by way of Failure Codes, along with the effectiveness of standard operating procedures, MRO strategies, and other asset management plans.
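The RPN screening above can be sketched in a few lines. In the traditional scheme, RPN is the product of severity, occurrence, and detection, each scored 1-10, and only modes above a threshold survive the cut. The failure modes, scores, and threshold below are invented for illustration; they are not the project's actual data or the team's actual criteria.

```python
# Hypothetical failure modes as (name, severity, occurrence, detection),
# each factor scored 1-10 (illustrative values only).
failure_modes = [
    ("Bearing seizure",   9, 4, 3),
    ("Belt misalignment", 5, 6, 2),
    ("Sensor drift",      3, 7, 8),
    ("Seal leak",         4, 2, 4),
]

def rank_by_rpn(modes, threshold=100):
    """Return (name, RPN) pairs at or above the threshold, highest risk first."""
    scored = [(name, sev * occ * det) for name, sev, occ, det in modes]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored if pair[1] >= threshold]

print(rank_by_rpn(failure_modes))
# → [('Sensor drift', 168), ('Bearing seizure', 108)]
```

Applied to a real list of 250 modes, this is the mechanism that narrows the field to the dozen consequential modes worth building a maintenance plan around, while the remainder still earn their keep as Failure Codes.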
Full circle. Feeling good about the FMEA outcomes, and relieved that we had prioritized failure modes for re-engineering maintenance plans, I returned the team’s focus to Asset Criticality Analysis. We still needed to evaluate and rank each manufacturing system in terms of the business, and we wanted to define the actions required by Engineering, Maintenance, Operations and Procurement to mitigate business risks. Admittedly, I was trying to check my box. A box on my mental “RCM Checklist” that I had made and have used for the past 17 years. A list that delivered confident results over and over again. Excitedly, however, I learned a new lesson. The items on the list, like the FMEA and Asset Criticality Analysis, each answered a specific question, and each taught us something new about our assets that we would not have known without it. But it was just a list! It was not a sequence or process! It does not matter where the list begins or ends! RCM is not intended to be a repeatable process!
Using the prioritized, ranked list of high-risk failure modes as the basis for evaluating the impact to the business, the team and I launched into asset criticality. System by system, we used each “predominant functional failure” and its unique set of failure modes to qualify risk in terms of:
• Downtime Impact to Production,
• Potential Impact to Personnel Safety,
• Potential Impact to the Environment, and
• Cost of Corrective Maintenance after one of our unique failure modes occurs.
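A criticality model built on those four criteria can be sketched as a weighted consequence score. To be clear, the weights, the 1-5 scales, and the risk-band thresholds below are my assumptions for illustration, not the actual model the team used.

```python
def criticality_score(production, safety, environment, repair_cost,
                      weights=(0.4, 0.3, 0.2, 0.1)):
    """Combine 1-5 consequence scores for the four criteria (production impact,
    personnel safety, environment, corrective-maintenance cost) into a weighted
    total and a coarse risk band. Weights and bands are illustrative assumptions."""
    scores = (production, safety, environment, repair_cost)
    total = sum(w * s for w, s in zip(weights, scores))
    if total >= 4.0:
        band = "High Risk"
    elif total >= 2.5:
        band = "Medium Risk"
    else:
        band = "Low Risk"
    return round(total, 2), band

print(criticality_score(production=5, safety=2, environment=1, repair_cost=3))
# → (3.1, 'Medium Risk')
```

The point of anchoring each score to an FMEA-identified failure mode, rather than to a gut feel about the asset, is what made the exercise objective in the project described here.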
My next statement will likely cause some traditionalists to erupt with emotion, shouting for all to hear that the team’s analysis is corrupt based on the modal distribution of rankings. But I am going to say it anyway. The Asset Criticality Analysis results were eye-opening! Sixty-four percent of the “Bad Actor” systems analyzed, along with their subsystems and major machine assemblies, fell into a “Low Risk” profile. Twelve percent, oddly enough, landed squarely in the “High Risk” ranking profile, indicating insufficient risk controls.
What I learned in this odd sequence of reliability centred maintenance is that the FMEA enabled us to look deeper into the definition of risk, and to normalize our collective assumptions using real-world scenarios that provided relevant pain for all parties involved in the analysis. Asset Criticality Analysis was no longer a subjective exercise to confirm our suspicions about what was “critical”. It had real meaning, and it told the team and me whether our current risk controls were protecting the Company’s investments or exposing it to unnecessary risks. The analysis also helped the team recognize where asset management plans could be optimized, saving time and money, without impacting the current level of risk mitigation. As I said, 64 percent of the assets analyzed fell into this ranking profile. Seasoned practitioners may recognize this pattern as the “over tinker” realm, in which production downtime and maintenance costs are elevated because of excessive volumes of time-based, fixed-frequency, or even non-value-adding preventive maintenance.
If you are still not convinced, but are moderately intrigued, let me tell you what we did next. I too was sceptical, due to my preconceived notions of how the RCM process was meant to flow and my cognitive bias about how a criticality analysis result was supposed to look. Next, the team, with help from equipment representatives, conducted a 9-part evaluation of the condition of both the 12 percent “High Risk” and 64 percent “Low Risk” assets. They peered over the physical health of each major component, looking for obvious signs of neglect. They scrutinized preventive maintenance procedures to conclude whether each PM was effective as written and scheduled. They dug into the availability of spare parts, including recent stockout history and replenishment cycles. They even examined maintenance training records as a possible indication of Maintenance Quality and Defect Elimination. This “Equipment Condition Assessment” was meant to prove or disprove our findings from the Asset Criticality Analysis. Was, in fact, the Company’s risk lower for the 64 percent because they were doing a better job of managing the known, FMEA-identified failure modes? The result was a resounding “YES”. Although the numbers were not identical, and maybe a second article will follow explaining why, 88 percent of the assets analyzed were deemed “Maintainable” with the current controls. Five percent, on the other hand, were not maintainable, leaving the Company exposed to unnecessary, preventable risks.
My paradigm, and maybe yours as well, told me that Asset Criticality Analysis must be performed first to prioritize our assets for failure mode analysis, ensuring that time spent within the FMEA process was focused on those assets that are most important to the business. In reflection, and I have facilitated my share of criticality analyses, the “critical” assets by the numbers were always, in fact, the most important assets because of how they impact the business when they fail. However, and what I mean to say is “Ya, but…”, in the past we were only confirming what we already knew to be true. We did not learn anything new about our asset management plans. Facilitating, or teaching others to facilitate, criticality analysis using FMEA results allowed the team and the Company to learn what was working within the asset management system, where they needed to shore up controls, and how best to continually improve without exposing the Company to additional risk.
I will leave you with this to ponder. Did the great Stanley Nowlan, Howard Heap and John Moubray intend for the RCM methodologies to be a sequential, hard-and-fast process? Or did these pioneers of asset management simply document, in their works, one possible means of deploying the methodologies because of the types of problems that needed to be solved in their day? If our pioneers were embarking on a new era of Maintenance, focused on crossing the threshold into proactive maintenance, then the process they followed may be the right path. But it may not be the only path. And each RCM methodology may be designed to provide a specific answer, in a sequence relevant only to the order of the questions being asked. (That’s deep… I know.)
I would enjoy hearing your paradigm-shifting moments as a maintenance and reliability practitioner. Send your story to dwikoff@eruditio.com and we will share your lesson learned on reliabilitynow.com and blog.eruditiollc.com. Remember, if you are not learning, you are dead. Stay centred. Stay inspired.
Darrin J. Wikoff biography
Darrin is a U.S. Navy veteran who lives in Dallas, TX. Over the past 17 years, he has coached and mentored business leaders around the world through improvement initiatives aimed at increasing value, reducing cost and extending asset life. He is a passionate and engaging teacher who brings real-world experience and practicality to the classroom.
Darrin is the author of "Centered On Excellence", co-author of Eruditio's "Leadership for Asset Management Excellence" book series, and associate editor and co-author of the “Maintenance Engineering Handbook”, 7th Edition. As a member of the U.S. Technical Advisory Group, he participated in the construction of ISO 55000 and continues to educate industry and government leaders as they develop and administer modern asset management systems.
Darrin also serves as an adjunct instructor at the University of Tennessee, College of Engineering, and the Mississippi State Center for Advanced Vehicular Systems, providing exciting and interactive courses in maintenance and reliability engineering, asset management, and organizational leadership.