Physics of Failure and its Role in Maintenance
The majority of the challenges in the field of maintenance are associated with failures and their prevention. Understanding why, when and how components fail should therefore play an important role in many aspects of maintenance management and engineering.
The key issue of maintenance is finding the optimal balance between minimizing the effects of failing (sub)systems on the one hand, and minimizing the maintenance costs on the other. For critical systems, where failures have serious consequences in terms of costs, safety, environmental effects or consequential damage, measures are generally taken to prevent such failures from occurring, and spending a considerable budget on preventive maintenance is then justified.
However, for many critical applications in the military, aerospace and nuclear power sectors, this risk avoidance has resulted in a very conservative maintenance approach. Many components are replaced long before they reach the end of their actual service life. Maintenance programmes are then effective (in preventing failures), but not very efficient.
Predictive Maintenance and Prognostics
The main reason for the conservatism in the maintenance interval determination is the uncertainty in the expected service life of subsystems and components. Traditionally, one tries to capture this uncertainty in statistical distributions of failure times, e.g. using the Weibull distribution function. However, this experience-based approach, where historical failure data is utilized to make predictions for future failures, has some important drawbacks.
Firstly, a sufficiently large set of failure data is required to accurately determine the parameters of the distribution function. For critical systems, failures are prevented as much as possible, so failure data sets are by definition small. Secondly, the distribution functions are based on historical data and are thus tied to the usage profile in that period of time. When the present usage is significantly different, the distribution is no longer representative and can therefore not be used for predictive maintenance.
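To make the statistical approach concrete, the sketch below evaluates a two-parameter Weibull reliability function and the replacement interval that follows from a required reliability level. The shape and scale values are hypothetical and not taken from any of the cited studies; the point is only that demanding a high reliability pushes the replacement interval far below the characteristic life, which is one source of the conservatism discussed above.

```python
import math

def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t."""
    return math.exp(-((t / scale) ** shape))

def replacement_interval(target_reliability, shape, scale):
    """Time at which the reliability has dropped to the target value."""
    return scale * (-math.log(target_reliability)) ** (1.0 / shape)

# Hypothetical parameters: shape 2.5 (wear-out behaviour),
# scale (characteristic life) 10 000 operating hours.
shape, scale = 2.5, 10_000.0
for target in (0.90, 0.99, 0.999):
    interval = replacement_interval(target, shape, scale)
    print(f"R = {target}: replace after {interval:.0f} h")
```

Both parameters would normally be fitted to historical failure times, which is exactly where the small data sets and changing usage profiles mentioned above become a problem.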
An alternative approach that circumvents these drawbacks is to use physical models for the prediction of failures [1], as illustrated in Figure 1. The purpose is to find out how the remaining life depends on the usage of the system, as indicated in the upper half of the figure. The uncertainty in this relation, and the associated conservatism, can be reduced by zooming in to the level of the physical failure mechanisms. By defining a failure model, the quantitative relation between the internal load (e.g. stress), which is directly related to the specific usage (e.g. rotational speed) of the system, and the resulting degradation rate is established. This gives the ability to accurately predict the expected time to failure, provided that the usage of the system is monitored appropriately.
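The chain from usage to internal load to life consumption can be sketched in a few lines. The example below is an illustration under stated assumptions, not the model used in [1]: it assumes that the governing load is a centrifugal stress proportional to the square of the rotational speed, that fatigue is the critical failure mechanism, and that damage accumulates linearly (Miner's rule) along a Basquin-type S-N curve. All constants (`k`, `s_ref`, `n_ref`, `m`) and the usage log are hypothetical.

```python
def stress_from_speed(rpm, k=1.0e-5):
    """Usage -> internal load: stress in MPa, assumed proportional to rpm squared."""
    return k * rpm ** 2

def cycles_to_failure(stress, s_ref=400.0, n_ref=1.0e5, m=5.0):
    """Load -> life: Basquin-type S-N curve through the point (s_ref, n_ref)."""
    return n_ref * (s_ref / stress) ** m

def consumed_life(usage):
    """Miner's rule: sum of n_i / N_i over (rpm, number_of_cycles) usage blocks."""
    return sum(n / cycles_to_failure(stress_from_speed(rpm)) for rpm, n in usage)

# Hypothetical monitored usage: (rotational speed in rpm, number of load cycles)
usage_log = [(5000, 2000), (6000, 500), (6300, 100)]
damage = consumed_life(usage_log)
print(f"Fraction of fatigue life consumed: {damage:.2%}")
```

Because the life consumption is computed from the monitored usage itself, a change in the usage profile is reflected directly in the prediction, rather than invalidating it.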
Figure 1. Model-based approach showing the relation between usage, loads, condition and life consumption.
Figure 2. Navy frigate with several of its subsystems, as modelled in [3].
The big challenge in this approach is firstly to assess the critical failure mechanism(s) (e.g. fatigue, corrosion, wear) and its governing load. Secondly, a suitable model for the identified failure mechanism must be defined or developed. An overview of the most common failure mechanisms and related failure models is provided in a recent publication by the author [2].
The approach has been applied to several military systems in the past years, like gas turbines, frigates [3] (Figure 2), helicopters and military vehicles. These case studies have demonstrated that the application of the approach to real systems is feasible and provides benefits relative to the traditional (statistical) approach.
Condition Based Maintenance
The concept of Condition Based Maintenance (CBM) has been known for decades now, but recent developments have considerably widened the applicability of this methodology. On the one hand a large variety of new (reliable) sensors have become available, enabling the monitoring of a wide range of load and condition parameters. Sensors detecting fatigue cracks, disbonded joints, erosion, impact damage, composite delamination and corrosion are now available and enable structural health monitoring (SHM) of complex systems like aircraft or wind turbines. On the other hand, the increased computational power of modern computers has made the analysis of all collected data feasible.
Due to the boost in performance and availability of sensors and other hardware, condition monitoring systems are offered by OEMs in several industries as the way to increase maintenance efficiency. However, many operators, and even some manufacturers, are only now starting to realize that condition-based maintenance is not automatically possible once the condition monitoring system (sensors) is in place.
An additional requirement is the analysis and interpretation of the collected data that is needed to translate the raw data into useful maintenance information. In this analysis step the knowledge on the physics of failure is again crucial, as the challenge of condition monitoring is to find features in the monitored signal that can be related to failure or degradation processes in the system.
For example, understanding the failure mechanisms in gearboxes helps to interpret the different features in the vibration monitoring signal obtained from the monitoring system. Similarly, features in the electrochemical noise signals of corrosion processes can be attributed to certain corrosion mechanisms [4].
Knowledge of failure mechanisms is also essential when developing new condition monitoring systems. Only when the critical failure mechanisms for a certain system are known can the suitable type of sensor be selected and the appropriate location to monitor be determined. This process has recently been formalized in a guideline for CBM [5] (Figure 3). In this decision diagram a number of questions must be answered to identify whether a system is suitable for CBM. CBM becomes an appropriate maintenance concept for the considered system only when all questions can be answered positively.
Finally, prognostics for CBM is also important. The sensor data only provides information about the current state of the component. Simply waiting until a monitored condition parameter exceeds a critical value means that immediate action is required, which is difficult to plan (e.g. personnel, spare parts) and may have serious consequences for the system availability. Therefore, a prognostic method is required to determine when future maintenance activities are necessary.
Figure 3. Decision diagram for condition-based maintenance.
In traditional condition monitoring systems, like vibration analysis, trending methods and growth models are used to extrapolate trends in monitored condition parameters (e.g. vibration levels) to determine component replacement or repair intervals. However, these are again experience-based approaches with the same drawbacks as mentioned before. Also in this case, knowledge of the physical failure mechanisms can assist in improving the maintenance efficiency.
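As an illustration of such a trending method, the sketch below fits a straight line to a monitored condition parameter and extrapolates it to a critical value. The readings and the threshold are hypothetical, and a linear trend is only one of several possible growth models. Note that the extrapolation implicitly assumes that future usage resembles past usage, which is the drawback described above.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

def time_to_threshold(times, values, threshold):
    """Extrapolate the fitted trend to the moment the threshold is reached."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    return (threshold - intercept) / slope

# Hypothetical readings: operating hours and vibration level (mm/s RMS)
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.4, 2.9, 3.5, 3.9]
t_limit = time_to_threshold(hours, vibration, threshold=7.0)
print(f"Threshold expected after about {t_limit:.0f} operating hours")
```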
By applying physical model-based prognostic methods, the effects of changes in usage can be taken into account [6]. The main difference from the prognostic methods used for fixed maintenance intervals, as discussed in the previous section, is the availability of the monitoring data. Firstly, this means that the prognosis is not done for the complete service life, but only for the fraction remaining after the last condition assessment. This limited scope makes the prediction generally more reliable and accurate. Secondly, the monitoring data can be used to validate the physical model used for the prediction. The consequence is that the model improves during operation, as more and more data can be used as feedback.
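A minimal sketch of a prognosis that starts from the last condition assessment is given below. It assumes, purely for illustration, that the monitored condition is a fatigue crack length and that crack growth follows Paris' law, da/dN = C (ΔK)^m with ΔK = Y Δσ √(πa); the source does not prescribe this particular model. The material constants `C` and `m`, the geometry factor `Y`, the crack sizes and the stress ranges are all hypothetical.

```python
import math

def remaining_cycles(a_measured, a_critical, delta_stress,
                     C=1.0e-11, m=3.0, Y=1.0):
    """Cycles needed to grow a crack from a_measured to a_critical (both in m).

    Closed-form integration of Paris' law, valid for m != 2.
    delta_stress is the stress range in MPa (so delta-K is in MPa*sqrt(m)).
    """
    if a_measured >= a_critical:
        return 0.0
    p = 1.0 - m / 2.0
    factor = C * (Y * delta_stress * math.sqrt(math.pi)) ** m
    return (a_critical ** p - a_measured ** p) / (factor * p)

# Prognosis from two successive (hypothetical) inspections
for crack_mm in (1.0, 2.0):
    n_left = remaining_cycles(crack_mm / 1000.0, 0.020, delta_stress=100.0)
    print(f"Measured crack {crack_mm} mm: about {n_left:.0f} cycles remaining")

# A change in usage enters through the stress range
n_heavy = remaining_cycles(0.002, 0.020, delta_stress=120.0)
```

Each new measurement replaces the predicted crack length with an observed one, so only the remaining growth has to be predicted, and a mismatch between predicted and measured growth can be used to adjust `C` and `m`.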
Root Cause Analysis
Despite the range of maintenance activities performed within industry, unexpected failures are unavoidable in practice. If the failure of a system has serious consequences, measures are generally taken to prevent such a failure from occurring again in the future. But less critical failures can also be extremely troublesome when they occur on a regular basis. In such cases it is essential to identify the root cause of the failure, as one can then find a solution for the problem, either by reducing the loads on the system or by increasing the load-carrying capacity.
In a recent research project [7] it was demonstrated that the structured analysis of real failures in industry can benefit from the knowledge on failure mechanisms. Root cause analyses on four case studies have been executed, where the essence was to go down to the level of the physical failure mechanisms.
At that level the cause of a failure is rather straightforward. Either the load on the system was too high, or the load-carrying capacity of the system was too low. The former could be caused either by misuse of the system or by using the system in a different way than it was designed for. The low capacity may be due to the application of wrong materials or to a design error. The procedure followed consists of these steps:
- Set up a fault tree to identify possible failure modes
- Prioritize failures (e.g. based on CMMS data)
- For critical failures: assess failure mechanism and governing load
- Solve problem: either increase capacity or reduce load
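The prioritization step can be sketched as a simple ranking of failure modes by their accumulated downtime. The record layout, the failure mode names and the downtime figures below are hypothetical, and real CMMS exports will look different; other ranking criteria, such as failure count or repair cost, could be substituted.

```python
from collections import defaultdict

def prioritize(records):
    """Rank failure modes by total downtime in hours, highest first."""
    totals = defaultdict(float)
    for failure_mode, downtime_h in records:
        totals[failure_mode] += downtime_h
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical CMMS extract: (failure mode, downtime in hours)
cmms_records = [
    ("seal leakage", 4.0),
    ("bearing failure", 12.0),
    ("seal leakage", 3.5),
    ("impeller wear", 6.0),
    ("seal leakage", 5.0),
]
ranking = prioritize(cmms_records)
for failure_mode, downtime_h in ranking:
    print(f"{failure_mode}: {downtime_h} h")
```

The failure modes at the top of the ranking are the ones for which the failure mechanism and governing load are then assessed.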
An example of a fault tree for a centrifugal pump is shown in Figure 4, where the colours of the basic events indicate the type of cause: capacity, human error or load (avoidable or unavoidable).
Once the type of cause is known, the solution to the problem is generally rather straightforward. If the load was too high, the usage of the system should be changed in order to prevent the failure from occurring again. If this is not possible, the failure could be made predictable by monitoring the usage. If, on the other hand, the capacity is too low, a redesign or modification could solve the problem.
Figure 4. Fault tree for a centrifugal pump, indicating the different types of causes.
Finally, human errors can be prevented by better training and instruction of the operators. The project showed that this structured approach, which considers the failure mechanisms of the systems, gave the industrial partners considerable insight into the causes of the failures, enabling them to solve many of the frequent failures.
Although the maintenance world is rather conservative and relies heavily on past experience, this article has illustrated that understanding the physics of failure provides many opportunities to improve the efficiency of maintenance processes.
References
1. Tinga, T., Application of physical failure models to enable usage and load based maintenance. Reliability Engineering and System Safety, 2010. 95(10): p. 1061-1075.
2. Tinga, T., Principles of loads and failure mechanisms; Applications in maintenance, reliability and design. Springer Series in Reliability Engineering, ed. H. Pham, 2013, London: Springer Verlag.
3. Tinga, T. and R.H.P. Janssen. The interplay between deployment and optimal maintenance intervals for a navy frigate. in: European Safety and Reliability Conference. 2012. Helsinki.
4. Homborg, A.M., et al., Time-frequency methods for trend removal in electrochemical noise data. Electrochimica Acta, 2012. 70: p. 199-209.
5. Tinga, T., D. Soute, and H.J.H. Roeterink, Guidelines for Condition Based Maintenance, 2009, World Class Maintenance Consortium: Breda.
6. Tinga, T., Physical model based component prognostics, in: Maintenance Modelling and Applications, J. Andrews, C. Bérenguer, and L. Jackson, Editors. 2011, DNV: Hovik. p. 166-184.
7. Tinga, T., Mechanism based failure analysis. Improving maintenance by understanding the failure mechanisms, 2013, Den Helder: Netherlands Defence Academy.