# Fault analysis

Ensuring that the probability of failure is neither systematically underestimated nor overestimated, and that the model is not overfitted by adding too many explanatory variables, is of particular concern for regulators, who must ensure that revenue caps are neither too high nor too low. Data scientists and researchers also need to estimate parameters for new variables that might relate to only a few climates or geographical areas. Parameter estimation is also important for defining new models for asset types not currently in the CNAIM standard.

The fault statistics dataset, transformer_11kv_faults, is provided when the main package is loaded.

When identifying new parameters, it is reasonable to assume that C = 1, since it is constant for all parameters in the 2017 CNAIM specification. The shape of the curve is, in fact, exclusively defined by H.

In its general form, the Weibull distribution is a lifetime distribution with two parameters: the scale parameter $$\alpha$$ and the shape parameter $$\gamma$$. With these two parameters, the Weibull distribution has pdf

$$f(t, \gamma, \alpha) = \frac{\gamma}{t}\left(\frac{t}{\alpha}\right)^\gamma e^{-\left(\frac{t}{\alpha}\right)^\gamma},$$

mean

$$\alpha \Gamma\left(1+\frac{1}{\gamma}\right),$$

and variance

$$\alpha^2 \Gamma\left(1+\frac{2}{\gamma}\right) - \left[ \alpha\Gamma\left(1+\frac{1}{\gamma}\right)\right]^2.$$
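As a quick sanity check, the closed-form mean and variance can be compared with moments obtained by numerically integrating the pdf. The following is a minimal base R sketch. The function names `weibull_pdf`, `weibull_mean` and `weibull_variance`, and the chosen scale and shape values, are assumptions made for illustration and are not part of the CNAIM package.

```r
# Closed-form mean and variance of a Weibull distribution with
# scale alpha and shape gamma_shape (names chosen for this sketch).
weibull_mean <- function(alpha, gamma_shape) {
  alpha * gamma(1 + 1 / gamma_shape)
}

weibull_variance <- function(alpha, gamma_shape) {
  alpha^2 * gamma(1 + 2 / gamma_shape) - weibull_mean(alpha, gamma_shape)^2
}

# The pdf in the form given in the text, valid for t > 0.
weibull_pdf <- function(t, gamma_shape, alpha) {
  (gamma_shape / t) * (t / alpha)^gamma_shape * exp(-(t / alpha)^gamma_shape)
}

alpha <- 60         # illustrative scale
gamma_shape <- 2.5  # illustrative shape

# First and second moments obtained by numerical integration of the pdf.
m1 <- integrate(function(t) t * weibull_pdf(t, gamma_shape, alpha), 0, Inf)$value
m2 <- integrate(function(t) t^2 * weibull_pdf(t, gamma_shape, alpha), 0, Inf)$value

# Closed-form values next to their numerical counterparts.
c(mean_closed = weibull_mean(alpha, gamma_shape), mean_numeric = m1)
c(var_closed = weibull_variance(alpha, gamma_shape), var_numeric = m2 - m1^2)
```

The two values in each pair should agree up to the tolerance of the numerical integration.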

For a first analysis, we set $$\gamma = 1$$, which reduces the Weibull distribution to an exponential distribution with pdf $$f(t, \alpha) = \frac{1}{\alpha}e^{-\frac{t}{\alpha}}$$, mean lifetime $$\alpha$$, and variance $$\alpha^2$$. First, the parameter $$\alpha$$ is fitted to the data using multilinear regression. Then the variance of the resulting distribution is compared with the variance estimated from the data. If the exponential distribution turns out to be a poor fit, the analysis can be extended to include the shape parameter $$\gamma$$.
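The variance comparison can be illustrated on synthetic data. The sketch below is an assumption-laden example rather than the procedure used by the package: it draws lifetimes from a Weibull distribution with a shape different from 1, ignores explanatory variables (so the estimate of $$\alpha$$ is simply the sample mean), and compares the variance implied by the exponential model with the sample variance.

```r
set.seed(1)

# Synthetic lifetimes drawn from a Weibull distribution with shape 2.5
# (an assumption for this sketch, not the package data).
lifetimes <- rweibull(n = 5000, shape = 2.5, scale = 60)

# With gamma = 1 and no explanatory variables, the maximum-likelihood
# estimate of alpha is the sample mean.
alpha_hat <- mean(lifetimes)

# Variance implied by the exponential model versus the sample variance.
var_exponential <- alpha_hat^2
var_observed <- var(lifetimes)

c(alpha_hat = alpha_hat,
  var_exponential = var_exponential,
  var_observed = var_observed)

# A ratio far from 1 suggests that the exponential model is a poor fit
# and that the shape parameter should be estimated as well.
variance_ratio <- var_observed / var_exponential
variance_ratio
```

Because the synthetic data were generated with a shape well above 1, the observed variance is much smaller than the exponential model implies, which is the kind of mismatch that would motivate fitting $$\gamma$$.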

In our use case, the parameter $$\alpha$$ denotes the mean (or expected) lifetime of a transformer, which depends on several variables describing location and environmental factors. This dependence is represented by the following multilinear model for $$\alpha$$: $$\alpha = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n,$$ where $$x_1,\dots,x_n$$ are the values of the explanatory variables and $$\alpha_0,\alpha_1,\dots,\alpha_n$$ are the coefficients to be determined by regression. The coefficients are then used to identify the shapes and the scales for the Weibull distribution.
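A multilinear model of this form can be fitted in base R with `lm()`. The sketch below uses synthetic data with two hypothetical explanatory variables, `x1` and `x2`; these names, the coefficient values and the data are assumptions for illustration and do not correspond to the columns of transformer_11kv_faults or to the internals of the package's training function.

```r
set.seed(42)
n <- 500

# Hypothetical explanatory variables (not columns of the package dataset).
x1 <- runif(n, 0, 20)
x2 <- runif(n, 0, 500)

# Synthetic exponential lifetimes whose mean follows
# alpha = 70 - 0.8 * x1 - 0.02 * x2 (always positive on these ranges).
alpha_true <- 70 - 0.8 * x1 - 0.02 * x2
lifetime <- rexp(n, rate = 1 / alpha_true)

# Ordinary least squares estimates of alpha_0, alpha_1 and alpha_2.
fit <- lm(lifetime ~ x1 + x2)
coef(fit)

# Estimated mean lifetime alpha for a new asset with x1 = 5 and x2 = 100.
alpha_new <- predict(fit, newdata = data.frame(x1 = 5, x2 = 100))
alpha_new
```

The fitted coefficients play the role of $$\alpha_0,\alpha_1,\dots,\alpha_n$$, and the prediction is the estimated $$\alpha$$ for the given values of the explanatory variables.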

model_params <- train_weibull_model(transformer_faults_data = transformer_11kv_faults)
model_params
    shapes scales.intercept      scales.1  scales.2     scales.3     scales.4
1 3.597272        100.17922  0.0028536801 -8.202209 -0.003023546 -0.040016081
2 2.528015         45.54622  0.0014449054 -3.856043 -0.001602048 -0.028129483
3 2.273607         73.63507  0.0011716558 -2.818854 -0.001348340 -0.017586604
4 2.101450         29.99655 -0.0003356626 -2.388243 -0.001988660 -0.009426902
5 2.048909         31.19306 -0.0017302242 -2.940468 -0.003149921 -0.021783120
    scales.5     scales.6   scales.7   scales.8   scales.9
1 -1.4776137 -0.811395564 -4.4776511 -1.5861982 -0.7914404
2 -0.6794045  0.015705206 -0.3677058  0.0000000 -0.2632199
3 -0.6000869 -9.815935489  0.4590218 -0.1398528 -1.1882148
4 -0.3839049 -0.002548827 -0.6364809 -0.1721091  0.0000000
5 -0.4445468 -0.085903822 -0.3314029  0.0000000  0.0000000

The saved model parameters can then be used to predict the probability of failure for an asset of a given age:

predict_weibull_model(age = 50, weibull_model_parameters = model_params)
 0.005440838