ReliaWiki: Warranty Data Analysis, revised 2017-08-16 by Miklos Szidarovszky (/* Example */)
<hr />
<div>{{template:LDABOOK|19|Warranty Data Analysis}}<br />
The Weibull++ warranty analysis folio provides four different data entry formats for warranty claims data. It allows the user to automatically perform life data analysis, predict future failures (through the use of conditional probability analysis) and detect outliers. The four data entry formats for storing sales and returns information are: <br />
<br />
:1) Nevada Chart Format<br />
:2) Time-to-Failure Format<br />
:3) Dates of Failure Format<br />
:4) Usage Format<br />
<br />
These formats are explained in the next sections. We will also discuss some specific warranty analysis calculations, including warranty predictions, the analysis of non-homogeneous warranty data and the use of statistical process control (SPC) to monitor warranty returns.<br />
<br />
==Nevada Chart Format==<br />
The Nevada format allows the user to convert shipping and warranty return data into the standard reliability data form of failures and suspensions so that it can easily be analyzed with traditional life data analysis methods. For each time period in which a number of products are shipped, there will be a certain number of returns or failures in subsequent time periods, while the rest of the population that was shipped will continue to operate in the following time periods. For example, if 500 units are shipped in May, and 10 of those units are warranty returns in June, that is equivalent to 10 failures at a time of one month. The other 490 units will go on to operate and possibly fail in the months that follow. This information can be arranged in a diagonal chart, as shown in the following figure.<br />
<br />
[[Image:Nevada-Chart-Illustration.png|center|450px| ]]<br />
<br />
At the end of the analysis period, all of the units that were shipped and have not failed in the time since shipment are considered to be suspensions. This process is repeated for each shipment and the results tabulated for each particular failure and suspension time prior to reliability analysis. This process may sound confusing, but it is actually just a matter of careful bookkeeping. The following example illustrates this process.<br />
<br />
===Example===<br />
'''Nevada Chart Format Calculations Example'''<br />
<br />
A company keeps track of its shipments and warranty returns on a month-by-month basis. The following table records the shipments in June, July and August, and the warranty returns through September:<br />
<br />
<br />
{|border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-align="center"<br />
|colspan="2"| ||colspan="3" style="text-align:center;"|RETURNS<br />
|-align="center"<br />
|colspan="2" style="text-align:right;"|SHIP||Jul. 2010||Aug. 2010||Sep. 2010<br />
|-align="center"<br />
|Jun. 2010||100||3||3||5<br />
|-align="center"<br />
|Jul. 2010||140||-||2||4<br />
|-align="center"<br />
|Aug. 2010||150||-||-||4<br />
|}<br />
<br />
<br />
We will examine the data month by month. In June 100 units were sold, and in July 3 of these units were returned. This gives 3 failures at one month for the June shipment, which we will denote as <math>{{F}_{JUN,1}}=3\,\!</math>. Likewise, 3 failures occurred in August and 5 occurred in September for this shipment, or <math>{{F}_{JUN,2}}=3\,\!</math> and <math>{{F}_{JUN,3}}=5\,\!</math>. Consequently, at the end of our three-month analysis period, there were a total of 11 failures for the 100 units shipped in June. This means that 89 units are presumably still operating, and can be considered suspensions at three months, or <math>{{S}_{JUN,3}}=89\,\!</math>. For the shipment of 140 in July, 2 were returned the following month, or <math>{{F}_{JUL,1}}=2\,\!</math>, and 4 more were returned the month after that, or <math>{{F}_{JUL,2}}=4\,\!</math>. After two months, there are 134 ( <math>140-2-4=134\,\!</math> ) units from the July shipment still operating, or <math>{{S}_{JUL,2}}=134\,\!</math>. For the final shipment of 150 in August, 4 fail in September, or <math>{{F}_{AUG,1}}=4\,\!</math>, with the remaining 146 units being suspensions at one month, or <math>{{S}_{AUG,1}}=146\,\!</math>.<br />
<br />
It is now a simple matter to add up the number of failures for 1, 2, and 3 months, then add the suspensions to get our reliability data set:<br />
<br />
<br />
<center><math>\begin{matrix}<br />
\text{Failures at 1 month:} & {{F}_{1}}={{F}_{JUN,1}}+{{F}_{JUL,1}}+{{F}_{AUG,1}}=3+2+4=9 \\<br />
\text{Suspensions at 1 month:} & {{S}_{1}}={{S}_{AUG,1}}=146 \\<br />
\text{Failures at 2 months:} & {{F}_{2}}={{F}_{JUN,2}}+{{F}_{JUL,2}}=3+4=7 \\<br />
\text{Suspensions at 2 months:} & {{S}_{2}}={{S}_{JUL,2}}=134 \\<br />
\text{Failures at 3 months:} & {{F}_{3}}={{F}_{JUN,3}}=5 \\<br />
\text{Suspensions at 3 months:} & {{S}_{3}}={{S}_{JUN,3}}=89 \\<br />
\end{matrix}\,\!</math></center><br />
<br />
<br />
These calculations can be performed automatically in Weibull++. <br />
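For readers who want to follow the bookkeeping outside the software, the Nevada chart conversion can be sketched in a few lines of Python. The function name and data layout are illustrative, not part of Weibull++; the numbers are the shipment and return quantities from the example above.

```python
# Sketch: convert a Nevada chart (one shipment per period, observed through
# the last return period) into failure and suspension counts.

def nevada_to_life_data(shipments, returns):
    """shipments: quantity shipped in each period.
    returns[i][k]: returns from shipment i during its (k+1)-th month in service."""
    n = len(shipments)
    failures = {}      # months in service -> number of failures
    suspensions = {}   # months in service -> number of suspensions
    for i, shipped in enumerate(shipments):
        months_observed = n - i            # how long this shipment was in the field
        failed = returns[i]
        for k, f in enumerate(failed):
            failures[k + 1] = failures.get(k + 1, 0) + f
        surviving = shipped - sum(failed)  # the rest are suspensions
        suspensions[months_observed] = suspensions.get(months_observed, 0) + surviving
    return failures, suspensions

# June, July and August shipments with returns through September
failures, suspensions = nevada_to_life_data(
    [100, 140, 150],
    [[3, 3, 5], [2, 4], [4]])

print(failures)     # {1: 9, 2: 7, 3: 5}
print(suspensions)  # {3: 89, 2: 134, 1: 146}
```

The output reproduces the data set derived above: 9 failures and 146 suspensions at one month, 7 failures and 134 suspensions at two months, and 5 failures and 89 suspensions at three months.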
<br />
<div class="noprint"><br />
{{Examples Box|Weibull++_Examples|<p>More Nevada chart format warranty analysis examples are available! See also:</p> <br />
{{Examples Both|http://www.reliasoft.com/Weibull/examples/rc5/index.htm|Warranty Analysis Example|http://www.reliasoft.tv/weibull/appexamples/weibull_app_ex_5.html|Watch the video...}}<nowiki/><br />
}}<br />
</div><br />
<br />
==Time-to-Failure Format==<br />
This format is similar to the standard folio data entry format (the number of units, failure times and suspension times are entered by the user). The difference is that when the data is used within the context of warranty analysis, the user also gains the ability to generate forecasts.<br />
<br />
===Example===<br />
{{:Warranty_Data_Analysis_Times-to-Failure_Format_with_Plot_Example}}<br />
<br />
==Dates of Failure Format==<br />
Another common way of reporting field information is to enter the date and quantity of sales or shipments (Quantity In-Service data) and the date and quantity of returns (Quantity Returned data). In this format, a failure is identified by its return date together with the date when the unit was put in service, which identifies the lot the unit came from. The date that the unit went into service is then associated with the lot going into service during that time period. You can use the optional Subset ID column in the data sheet to record any information that identifies the lots.<br />
<br />
===Example===<br />
{{:Warranty_Data_Analysis_Dates_Format_Example}}<br />
<br />
==Usage Format==<br />
Often, the driving factor for reliability is usage rather than time. For example, in the automotive industry, the failure behavior of most products is mileage-dependent rather than time-dependent. The usage format allows the user to convert shipping and warranty return data into the standard reliability data form of failures and suspensions when the return information is based on usage rather than return dates or periods. Similar to the dates of failure format, a failure is identified by its return record and the date when the unit was put in service, which associates the returned unit with the lot it belonged to when it started operation. However, the return data is in terms of usage, not date of return. Therefore, the usage of the units needs to be specified, either as a constant usage per unit time or as a distribution. This allows for determining the expected usage of the surviving units.<br />
<br />
Suppose that you have been collecting sales (units in service) and returns data. For the returns data, you can determine the number of failures and their usage by reading the odometer value, for example. Determining the number of surviving units (suspensions) and their ages is a straightforward step. By taking the difference between the analysis date and the date when a unit was put in service, you can determine the age of the surviving units.<br />
<br />
What is unknown, however, is the exact usage accumulated by each surviving unit. The key part of the usage-based warranty analysis is the determination of the usage of the surviving units based on their age. Therefore, the analyst needs to have an idea about the usage of the product. This can be obtained, for example, from customer surveys or by designing the products to collect usage data. For example, in automotive applications, engineers often use 12,000 miles/year as the average usage. Based on this average, the usage of an item that has been in the field for 6 months and has not yet failed would be 6,000 miles. So to obtain the usage of a suspension based on an average usage, one could take the time of each suspension and multiply it by this average usage. In this situation, the analysis becomes straightforward. With the usage values and the quantities of the returned units, a failure distribution can be constructed and subsequent warranty analysis becomes possible.<br />
<br />
Alternatively, and more realistically, instead of using an average usage, an actual distribution that reflects the variation in usage and customer behavior can be used. This distribution describes the usage of a unit over a certain time period (e.g., 1 year, 1 month, etc). This probabilistic model can be used to estimate the usage for all surviving components in service and the percentage of users running the product at different usage rates. In the automotive example, for instance, such a distribution can be used to calculate the percentage of customers that drive 0-200 miles/month, 200-400 miles/month, etc. We can take these percentages and multiply them by the number of suspensions to find the number of items that have been accumulating usage values in these ranges.<br />
<br />
To proceed with applying a usage distribution, the usage distribution is divided into increments based on a specified interval width, denoted as <math>Z\,\!</math>. The usage distribution, <math>Q\,\!</math>, is divided into intervals of <math>(0,Z]\,\!</math>, <math>(Z,2Z]\,\!</math>, <math>(2Z,3Z]\,\!</math>, etc., or <math>{{x}_{i}}={{x}_{i-1}}+Z\,\!</math>, as shown in the next figure.<br />
<br />
[[Image:Usage pdf Plot.png|center|250px| ]] <br />
<br />
The interval width should be selected such that it creates segments that are large enough to contain adequate numbers of suspensions within the intervals. The percentage of suspensions that belong to each usage interval is calculated as follows:<br />
<br />
::<math>\begin{align}<br />
F({{x}_{i}})=Q({{x}_{i}})-Q({{x}_{i-1}})<br />
\end{align}\,\!</math><br />
<br />
where:<br />
<br />
::<math>Q()\,\!</math> is the cumulative distribution function, ''cdf'', of the usage distribution.<br />
<br />
::<math>x\,\!</math> represents the intervals used in apportioning the suspended population.<br />
<br />
A suspension group is a collection of suspensions that have the same age. The percentage of suspensions can be translated to numbers of suspensions within each interval, <math>{{x}_{i}}\,\!</math>. This is done by taking each group of suspensions and multiplying it by each <math>F({{x}_{i}})\,\!</math>, or:<br />
<br />
::<math>\begin{align}<br />
& {{N}_{1,j}}= & F({{x}_{1}})\times N{{S}_{j}} \\ <br />
& {{N}_{2,j}}= & F({{x}_{2}})\times N{{S}_{j}} \\ <br />
& & ... \\ <br />
& {{N}_{n,j}}= & F({{x}_{n}})\times N{{S}_{j}} <br />
\end{align}\,\!</math><br />
<br />
where:<br />
<br />
::<math>{{N}_{n,j}}\,\!</math> is the number of suspensions that belong to each interval.<br />
<br />
::<math>N{{S}_{j}}\,\!</math> is the jth group of suspensions from the data set.<br />
<br />
This is repeated for all the groups of suspensions.<br />
<br />
The age of the suspensions is calculated by subtracting the Date In-Service ( <math>DIS\,\!</math> ), which is the date at which the unit started operation, from the end of observation period date or End Date ( <math>ED\,\!</math> ). This is the Time In-Service ( <math>TIS\,\!</math> ) value that describes the age of the surviving unit.<br />
<br />
::<math>\begin{align}<br />
TIS=ED-DIS<br />
\end{align}\,\!</math><br />
<br />
Note: <math>TIS\,\!</math> is in the same time units as the period in which the usage distribution is defined.<br />
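The age calculation above is simple date arithmetic. As a small illustration (the dates are hypothetical, chosen to match a 12-month in-service window, and the month-counting helper is not part of any software):

```python
# Minimal sketch of TIS = ED - DIS, counting whole months in service.
from datetime import date

def time_in_service_months(dis, ed):
    """Approximate age in whole months between Date In-Service and End Date."""
    return (ed.year - dis.year) * 12 + (ed.month - dis.month)

tis = time_in_service_months(date(2009, 12, 1), date(2010, 12, 1))
print(tis)  # 12 months
```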
<br />
For each <math>{{N}_{k,j}}\,\!</math>, the usage is calculated as:<br />
<br />
::<math>{{U}_{k,j}}={{x}_{k}}\times {{TIS}_{j}}\,\!</math><br />
<br />
After this step, the usage of each suspension group is estimated. This data can be combined with the failures data set, and a failure distribution can be fitted to the combined data.<br />
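The apportionment steps above can be sketched in Python. Everything numeric here is an assumption for illustration only: a normal usage distribution with a mean of 1,000 miles/month and a standard deviation of 300 miles, an interval width of 200 miles, and one suspension group of 8 units aged 12 months.

```python
# Sketch of usage apportionment: split a suspension group across usage
# intervals of width Z using an assumed monthly-usage distribution.
from math import erf, sqrt

def Q(x, mu=1000.0, sigma=300.0):
    """cdf of the assumed normal monthly usage distribution (hypothetical)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

Z = 200.0      # interval width (miles) -- assumed
NS_j = 8       # size of the suspension group -- assumed
TIS_j = 12     # age of the group, in months -- assumed

rows = []
x_prev = 0.0
while Q(x_prev) < 0.999:             # walk intervals until the tail is negligible
    x_i = x_prev + Z
    F_i = Q(x_i) - Q(x_prev)         # fraction of users in (x_prev, x_i]
    N_ij = F_i * NS_j                # suspensions allocated to this interval
    U_ij = x_i * TIS_j               # estimated usage for those suspensions
    rows.append((x_i, N_ij, U_ij))
    x_prev = x_i

total = sum(n for _, n, _ in rows)
print(f"allocated {total:.2f} of {NS_j} suspensions")  # ~7.99 of 8
```

The allocated counts sum to (almost) the size of the group; the small shortfall is the truncated tail of the distribution, which shrinks as more intervals are included.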
<br />
===Example===<br />
{{:Warranty_Analysis_Usage_Format_Example}}<br />
<br />
To illustrate the calculations behind the results of this example, consider the 9 units that went in service in December 2009. One unit from that group failed; therefore, 8 suspensions have survived from December 2009 until the beginning of December 2010, a total of 12 months. The calculations are summarized as follows.<br />
<br />
[[Image:Usage Suspension Allocation.PNG|center|500px| ]] <br />
<br />
The two columns on the right constitute the calculated suspension data (number of suspensions and their usage) for the group. The calculation is then repeated for each of the remaining groups in the data set. These data are then combined with the data about the failures to form the life data set that is used to estimate the failure distribution model.<br />
<br />
==Warranty Prediction==<br />
Once a life data analysis has been performed on warranty data, this information can be used to predict how many warranty returns there will be in subsequent time periods. This methodology uses the concept of conditional reliability (see [[Basic Statistical Background]]) to calculate the probability of failure for the remaining units for each shipment time period. This conditional probability of failure is then multiplied by the number of units at risk from that particular shipment period that are still in the field (i.e., the suspensions) in order to predict the number of failures or warranty returns expected for this time period. The next example illustrates this.<br />
<br />
===Example===<br />
<br />
Using the data in the following table, predict the number of warranty returns for October for each of the three shipment periods. Use the following Weibull parameters: beta = 2.4928 and eta = 6.6951. <br />
<br />
{|border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-align="center"<br />
|colspan="2"| ||colspan="3" style="text-align:center;"|RETURNS<br />
|-align="center"<br />
|colspan="2" style="text-align:right;"|SHIP||Jul. 2010||Aug. 2010||Sep. 2010<br />
|-align="center"<br />
|Jun. 2010||100||3||3||5<br />
|-align="center"<br />
|Jul. 2010||140||-||2||4<br />
|-align="center"<br />
|Aug. 2010||150||-||-||4<br />
|}<br />
<br />
'''Solution'''<br />
<br />
Use the Weibull parameter estimates to determine the conditional probability of failure for each shipment time period, and then multiply that probability with the number of units that are at risk for that period as follows. The equation for the conditional probability of failure is given by: <br />
<br />
::<math>Q(t|T)=1-R(t|T)=1-\frac{R(T+t)}{R(T)}\,\!</math><br />
<br />
For the June shipment, there are 89 units that have successfully operated until the end of September ( <math>T=3\,\!</math> months). The probability of one of these units failing in the next month ( <math>t=1\,\!</math> month) is then given by: <br />
<br />
::<math>Q(1|3)=1-\frac{R(4)}{R(3)}=1-\frac{{{e}^{-{{\left( \tfrac{4}{6.70} \right)}^{2.49}}}}}{{{e}^{-{{\left( \tfrac{3}{6.70} \right)}^{2.49}}}}}=1-\frac{0.7582}{0.8735}=0.132\,\!</math><br />
<br />
Once the probability of failure for an additional month of operation is determined, the expected number of failed units during the next month, from the June shipment, is the product of this probability and the number of units at risk ( <math>{{S}_{JUN,3}}=89)\,\!</math> or: <br />
<br />
::<math>{{\widehat{F}}_{JUN,4}}=89\cdot 0.132=11.748\text{, or 12 units}\,\!</math><br />
<br />
This is then repeated for the July shipment, where there were 134 units operating at the end of September, with an exposure time of two months. The probability of failure in the next month is: <br />
<br />
::<math>Q(1|2)=1-\frac{R(3)}{R(2)}=1-\frac{0.8735}{0.9519}=0.0824\,\!</math><br />
<br />
This value is multiplied by <math>{{S}_{JUL,2}}=134\,\!</math> to determine the number of failures, or: <br />
<br />
::<math>{{\widehat{F}}_{JUL,3}}=134\cdot 0.0824=11.035\text{, or 11 units}\,\!</math><br />
<br />
For the August shipment, there were 146 units operating at the end of September, with an exposure time of one month. The probability of failure in the next month is: <br />
<br />
::<math>Q(1|1)=1-\frac{R(2)}{R(1)}=1-\frac{0.9519}{0.9913}=0.0397\,\!</math><br />
<br />
This value is multiplied by <math>{{S}_{AUG,1}}=146\,\!</math> to determine the number of failures, or: <br />
<br />
::<math>{{\widehat{F}}_{AUG,2}}=146\cdot 0.0397=5.796\text{, or 6 units}\,\!</math><br />
<br />
Thus, the total expected returns from all shipments for the next month is the sum of the above, or 29 units. This method can easily be repeated for different future sales periods, utilizing projected shipments. If the user lists the number of units that are expected to be sold or shipped during future periods, then these units are added to the number of units at risk whenever they are introduced into the field. The '''Generate Forecast''' functionality in the Weibull++ warranty analysis folio can automate this process for you.<br />
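The three forecast calculations above can be reproduced with a short script. The Weibull parameters and the units at risk come from the example; the function names are illustrative, not Weibull++ functions. (Small differences from the hand calculations arise because the text rounds the parameters to 2.49 and 6.70.)

```python
# Sketch of the conditional-probability forecast for the October returns.
from math import exp

beta, eta = 2.4928, 6.6951   # estimated Weibull parameters from the example

def R(t):
    """Weibull reliability function."""
    return exp(-((t / eta) ** beta))

def cond_prob_failure(t, T):
    """Probability that a unit aged T fails within the next t time units."""
    return 1.0 - R(T + t) / R(T)

# (units at risk, exposure time in months) for the June, July, August shipments
at_risk = [(89, 3), (134, 2), (146, 1)]

forecast = [S * cond_prob_failure(1, T) for S, T in at_risk]
print([round(f, 2) for f in forecast])   # roughly 11.8, 11.0 and 5.8 units
print(round(sum(forecast)))              # ~29 expected returns in total
```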
<br />
==Non-Homogeneous Warranty Data==<br />
In the previous sections and examples, the underlying assumption was that the population was homogeneous. In other words, all sold and returned units were exactly the same (i.e., the same population with no design changes and/or modifications). In many situations, as the product matures, design changes are made to enhance and/or improve the reliability of the product. Obviously, an improved product will exhibit different failure characteristics than its predecessor. To analyze such cases, where the population is non-homogeneous, one needs to extract each homogeneous group, fit a life model to each group and then project the expected returns for each group based on the number of units at risk for that specific group.<br />
<br />
<br />
'''Using Subset IDs in Weibull++'''<br />
<br />
Weibull++ includes an optional Subset ID column that allows you to differentiate between product versions or different designs (lots). Based on the entries, the software will separately analyze (i.e., obtain parameters and failure projections for) each subset of data. Note that the same limitations regarding the number of failures needed are also applicable here. In other words, distributions can be automatically fitted to lots that have return (failure) data, whereas if no returns have been experienced yet (either because the units will be introduced in the future or because no failures have occurred yet), the user will be asked to specify the parameters, since they cannot be computed. Consequently, subsequent estimations/predictions related to these lots will be based on the user-specified parameters. The following example illustrates the use of Subset IDs.<br />
<br />
===Example===<br />
{{:Warranty Analysis Non-Homogeneous Data Example}}<br />
<br />
==Monitoring Warranty Returns Using Statistical Process Control (SPC)==<br />
By monitoring and analyzing warranty return data, one can detect specific return periods and/or batches of sales or shipments that may deviate (differ) from the assumed model. This provides the analyst (and the organization) the advantage of early notification of possible deviations in manufacturing, use conditions and/or any other factor that may adversely affect the reliability of the fielded product. Obviously, the motivation for performing such analysis is to allow for faster intervention to avoid increased costs due to increased warranty returns or more serious repercussions. Additionally, this analysis can also be used to uncover different sub-populations that may exist within the population.<br />
<br />
===Basic Analysis Method===<br />
<br />
For each sales period <math>i\,\!</math> and return period <math>j\,\!</math>, the prediction error can be calculated as follows:<br />
<br />
::<math>{{e}_{i,j}}={{\hat{F}}_{i,j}}-{{F}_{i,j}}\,\!</math><br />
<br />
where <math>{{\hat{F}}_{i,j}}\,\!</math> is the estimated number of failures based on the estimated distribution parameters for the sales period <math>i\,\!</math> and the return period <math>j\,\!</math>, which is calculated using the equation for the conditional probability, and <math>{{F}_{i,j}}\,\!</math> is the actual number of failures for the sales period <math>i\,\!</math> and the return period <math>j\,\!</math>.<br />
<br />
Since we are assuming that the model is accurate, <math>{{e}_{i,j}}\,\!</math> should follow a normal distribution with mean value of zero and a standard deviation <math>s\,\!</math>, where:<br />
<br />
::<math>{{\bar{e}}_{i,j}}=\frac{\underset{i}{\mathop{\sum }}\,\underset{j}{\mathop{\sum }}\,{{e}_{i,j}}}{n}=0\,\!</math><br />
<br />
and <math>n\,\!</math> is the total number of return data points (the total number of residuals). <br />
<br />
The estimated standard deviation of the prediction errors can then be calculated by:<br />
<br />
::<math>s=\sqrt{\frac{1}{n-1}\underset{i}{\mathop \sum }\,\underset{j}{\mathop \sum }\,e_{i,j}^{2}}\,\!</math><br />
<br />
and <math>{{e}_{i,j}}\,\!</math> can be normalized as follows:<br />
<br />
::<math>{{z}_{i,j}}=\frac{{{e}_{i,j}}}{s}\,\!</math><br />
<br />
where <math>{{z}_{i,j}}\,\!</math> is the standardized error. <math>{{z}_{i,j}}\,\!</math> follows a normal distribution with <math>\mu =0\,\!</math> and <math>\sigma =1\,\!</math>.<br />
<br />
It is known that the square of a random variable with standard normal distribution follows the <math>{{\chi }^{2}}\,\!</math> (Chi Square) distribution with 1 degree of freedom and that the sum of the squares of <math>m\,\!</math> random variables with standard normal distribution follows the <math>{{\chi }^{2}}\,\!</math> distribution with <math>m\,\!</math> degrees of freedom. This then can be used to help detect the abnormal returns for a given sales period, return period or just a specific cell (combination of a return and a sales period).<br />
<br />
:* For a cell, abnormality is detected if <math>z_{i,j}^{2}=\chi _{1}^{2}\ge \chi _{1,\alpha }^{2}.\,\!</math> <br />
:* For an entire sales period <math>i\,\!</math>, abnormality is detected if <math>\underset{j}{\mathop{\sum }}\,z_{i,j}^{2}=\chi _{J}^{2}\ge \chi _{\alpha ,J}^{2},\,\!</math> where <math>J\,\!</math> is the total number of return periods for the sales period <math>i\,\!</math>.<br />
:* For an entire return period <math>j\,\!</math>, abnormality is detected if <math>\underset{i}{\mathop{\sum }}\,z_{i,j}^{2}=\chi _{I}^{2}\ge \chi _{\alpha ,I}^{2},\,\!</math> where <math>I\,\!</math> is the total number of sales periods for the return period <math>j\,\!</math>.<br />
Here <math>\alpha \,\!</math> is the significance level of the <math>{{\chi }^{2}}\,\!</math> distribution, which can be set at a critical value or a caution value. It describes the level of sensitivity to outliers (returns that deviate significantly from the predictions based on the fitted model). Increasing the value of <math>\alpha \,\!</math> increases the power of detection, but it may also lead to more false alarms.<br />
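The standardization and detection steps can be sketched directly from the prediction errors computed in the example that follows. The chi-squared thresholds are hard-coded from the quantile table quoted later in this section (they are standard chi-squared quantiles, not computed here).

```python
# Sketch of the SPC calculations: estimate s, standardize the errors, and
# compare per-sales-period sums of z^2 against fixed chi-squared thresholds.
from math import sqrt

# prediction errors e_{i,j} by sales period (rows: Jun, Jul, Aug shipments)
errors = [[-2.1297, 0.8462, 2.7447],
          [-0.7816, 1.4719],
          [-2.6946]]

flat = [e for row in errors for e in row]
n = len(flat)
s = sqrt(sum(e * e for e in flat) / (n - 1))   # estimated std. dev. of errors

z2_rows = [sum((e / s) ** 2 for e in row) for row in errors]

# critical (alpha = 0.01) chi-squared quantiles for 1, 2, 3 degrees of freedom
critical = {1: 6.6349, 2: 9.2103, 3: 11.3449}

for z2, row in zip(z2_rows, errors):
    df = len(row)                              # J return periods for this row
    flag = "ABNORMAL" if z2 >= critical[df] else "normal"
    print(f"sum z^2 = {z2:.4f}, df = {df}: {flag}")   # all normal here
```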
<br />
====Example====<br />
'''Example Using SPC for Warranty Analysis Data'''<br />
<br />
Using the data from the following table, the expected returns for each sales period can be obtained using conditional reliability concepts, as given in the conditional probability equation. <br />
<br />
{|border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-align="center"<br />
|colspan="2"| ||colspan="3" style="text-align:center;"|RETURNS<br />
|-align="center"<br />
|colspan="2" style="text-align:right;"|SHIP||Jul. 2010||Aug. 2010||Sep. 2010<br />
|-align="center"<br />
|Jun. 2010||100||3||3||5<br />
|-align="center"<br />
|Jul. 2010||140||-||2||4<br />
|-align="center"<br />
|Aug. 2010||150||-||-||4<br />
|}<br />
<br />
For example, for the month of September, the expected return number from the June shipment is given by:<br />
<br />
::<math>{{\hat{F}}_{Jun,3}}=(100-6)\cdot \left( 1-\frac{R(3)}{R(2)} \right)=94\cdot 0.08239=7.7447\,\!</math><br />
<br />
The actual number of returns during this period is five; thus, the prediction error for this period is: <br />
<br />
::<math>{{e}_{Jun,3}}={{\hat{F}}_{Jun,3}}-{{F}_{Jun,3}}=7.7447-5=2.7447.\,\!</math><br />
<br />
This can then be repeated for each cell, yielding the following table for <math>{{e}_{i,j}}\,\!</math> : <br />
<br />
<center><math>\begin{matrix}<br />
{} & {} & \text{RETURNS} & {} & {} \\<br />
{} & \text{SHIP} & \text{Jul. 2010} & \text{Aug. 2010} & \text{Sep. 2010} \\<br />
\text{Jun. 2010} & \text{100} & \text{-2.1297} & \text{0.8462} & \text{2.7447} \\<br />
\text{Jul. 2010} & \text{140} & \text{-} & \text{-0.7816} & \text{1.4719} \\<br />
\text{Aug. 2010} & \text{150} & \text{-} & \text{-} & \text{-2.6946} \\<br />
\end{matrix}\,\!</math></center><br />
<br />
Now, for this example, <math>n=6\,\!</math>, <math>{{\bar{e}}_{i,j}}=-0.0904\,\!</math> and <math>s=2.1366.\,\!</math> <br />
<br />
Thus the <math>z_{i,j}\,\!</math> values are: <br />
<br />
<center><math>\begin{matrix}<br />
{} & {} & \text{RETURNS} & {} & {} \\<br />
{} & \text{SHIP} & \text{Jul. 2010} & \text{Aug. 2010} & \text{Sep. 2010} \\<br />
\text{Jun. 2010} & \text{100} & \text{-0.9968} & \text{0.3960} & \text{1.2846} \\<br />
\text{Jul. 2010} & \text{140} & \text{-} & \text{-0.3658} & \text{0.6889} \\<br />
\text{Aug. 2010} & \text{150} & \text{-} & \text{-} & \text{-1.2612} \\<br />
\end{matrix}\,\!</math></center><br />
<br />
The <math>z_{i,j}^{2}\,\!</math> values, for each cell, are given in the following table. <br />
<br />
<center><math>\begin{matrix}<br />
{} & {} & \text{RETURNS} & {} & {} & {} \\<br />
{} & \text{SHIP} & \text{Jul. 2010} & \text{Aug. 2010} & \text{Sep. 2010} & \text{Sum} \\<br />
\text{Jun. 2010} & \text{100} & \text{0.9936} & \text{0.1569} & \text{1.6505} & 2.8010 \\<br />
\text{Jul. 2010} & \text{140} & \text{-} & \text{0.1338} & \text{0.4747} & 0.6085 \\<br />
\text{Aug. 2010} & \text{150} & \text{-} & \text{-} & \text{1.5905} & 1.5905 \\<br />
\text{Sum} & {} & 0.9936 & 0.2907 & 3.7157 & {} \\<br />
\end{matrix}\,\!</math></center><br />
<br />
If the critical value is set at <math>\alpha = 0.01\,\!</math> and the caution value is set at <math>\alpha = 0.1\,\!</math>, then the critical and caution <math>{{\chi }^{2}}\,\!</math> values will be: <br />
<br />
<center><math>\begin{matrix}<br />
{} & {} & \text{Degrees of Freedom} & {} \\<br />
{} & \text{1} & \text{2} & \text{3} \\<br />
{{\chi }^{2}}\text{ Critical} & \text{6.6349} & \text{9.2103} & \text{11.3449} \\<br />
{{\chi }^{2}}\text{ Caution} & \text{2.7055} & \text{4.6052} & \text{6.2514} \\<br />
\end{matrix}\,\!</math></center><br />
<br />
If we consider the sales periods as the basis for outlier detection, then comparing the sum of <math>z_{i,j}^{2}\,\!</math> values for each sales period against the above table shows that none of the sales periods exceed the critical or caution limits. For example, the total <math>{{\chi }^{2}}\,\!</math> value for the July sales period is 0.6085. It has 2 degrees of freedom, so the corresponding caution and critical values are 4.6052 and 9.2103, respectively. Both values are larger than 0.6085, so the return numbers of the July sales period do not deviate (based on the chosen significance) from the model's predictions.<br />
<br />
If we consider the return periods as the basis for outlier detection, then comparing the sum of <math>z_{i,j}^{2}\,\!</math> values for each return period against the above table shows that none of the return periods exceed the critical or caution limits. For example, the total <math>{{\chi }^{2}}\,\!</math> value for the September return period is 3.7157. It has 3 degrees of freedom, so the corresponding caution and critical values are 6.2514 and 11.3449, respectively. Both values are larger than 3.7157, so the return numbers for the September return period do not deviate from the model's predictions.<br />
<br />
This analysis can be automatically performed in Weibull++ by entering the alpha values in the Statistical Process Control page of the control panel and selecting which period to color code, as shown next.<br />
<br />
[[Image:Warranty Example 5 SPC settings.png|center|250px| ]] <br />
<br />
To view the table of chi-squared values ( <math>z_{i,j}^{2}\,\!</math> or <math>\chi _{1}^{2}\,\!</math> values), click the '''Show Results (...)''' button. <br />
<br />
[[Image:Warranty Example 5 Chi-square.png|center|450px| ]] <br />
<br />
Weibull++ automatically color codes SPC results for easy visualization in the returns data sheet. By default, green means that the return number is normal; yellow indicates that the return number is larger than the caution threshold but smaller than the critical value; red indicates an abnormal return, i.e., the return number is either too large or too small compared to the predicted value.<br />
<br />
In this example, all the cells are coded in green for both analyses (i.e., by sales periods or by return periods), indicating that all returns fall within the caution and critical limits (i.e., nothing abnormal). Another way to visualize this is by using a Chi-Squared plot for the sales period and return period, as shown next.<br />
<br />
[[Image:Warranty Example 5 SPC Sales.png|center|450px| ]] <br />
<br />
<br />
[[Image:Warranty Example 5 SPC Return.png|center|450px| ]]<br />
<br />
===Using Subset IDs with SPC for Warranty Data===<br />
The warranty monitoring methodology explained in this section can also be used to detect different subpopulations in a data set. The different subpopulations can reflect different use conditions, different material, etc. In this methodology, one can use different subset IDs to differentiate between subpopulations, and obtain models that are distinct to each subpopulation. The following example illustrates this concept.<br />
<br />
====Example====<br />
{{:Non-Homogeneous Data with Subset IDs Example}}</div>

Repairable Systems Analysis Through Simulation, revised 2017-01-03 by Miklos Szidarovszky (/* Additional Rules and Assumptions for Standby Containers */)
<hr />
<div>{{Template:bsbook|7}}<br />
{{TU}}<br />
<br />
Having introduced some of the basic theory and terminology for repairable systems in [[Introduction to Repairable Systems]], we will now examine the steps involved in the analysis of such complex systems. We will begin by examining system behavior through a sequence of discrete deterministic events and expand the analysis using discrete event simulation.<br />
<br />
=Simple Repairs=<br />
==Deterministic View, Simple Series==<br />
To first understand how component failures and simple repairs affect the system and to visualize the steps involved, let's begin with a very simple deterministic example with two components, <math>A\,\!</math> and <math>B\,\!</math>, in series.<br />
<br />
[[Image:i8.1.png|center|200px|link=]]<br />
<br />
Component <math>A\,\!</math> fails every 100 hours and component <math>B\,\!</math> fails every 120 hours. Both require 10 hours to get repaired. Furthermore, assume that the surviving component stops operating when the system fails (thus not aging). <br />
'''NOTE''': When a failure occurs in certain systems, some or all of the system's components may or may not continue to accumulate operating time while the system is down. For example, consider a transmitter-satellite-receiver system. This is a series system, so the system fails if any of the subsystems fails. If the receiver fails, the satellite continues to operate even though the receiver (and therefore the system) is down. In this case, the continued aging of the components during the system inoperation '''must''' be taken into consideration, since this will affect their failure characteristics and have an impact on the overall system downtime and availability.<br />
<br />
The system behavior during an operation from 0 to 300 hours would be as shown in the figure below.<br />
<br />
[[Image:BS8.1.png|center|500px|Overview of system and components for a simple series system with two components. Component A fails every 100 hours and component B fails every 120 hours. Both require 10 hours to get repaired and do not age (i.e., do not operate through failure) when the system is in a failed state.|link=]]<br />
<br />
Specifically, component <math>A\,\!</math> would fail at 100 hours, causing the system to fail. After 10 hours, component <math>A\,\!</math> would be restored and so would the system. The next event would be the failure of component <math>B\,\!</math>. We know that component <math>B\,\!</math> fails every 120 hours (or after an age of 120 hours). Since a component does not age while the system is down, component <math>B\,\!</math> would have reached an age of 120 when the clock reaches 130 hours. Thus, component <math>B\,\!</math> would fail at 130 hours and be repaired by 140 and so forth. Overall in this scenario, the system would be failed for a total of 40 hours due to four downing events (two due to <math>A\,\!</math> and two due to <math>B\,\!</math> ). The overall system availability (average or mean availability) would be <math>260/300=0.86667\,\!</math>. Point availability is the availability at a specific point in time. In this deterministic case, the point availability would always be equal to 1 if the system is up at that time and equal to zero if the system is down at that time.<br />
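This deterministic walk-through can be sketched in a few lines of code. The sketch below is purely illustrative (it is not BlockSim's engine), and the function and unit names are hypothetical:<br />

```python
def simulate_series(end_time=300.0, ttf=None, ttr=10.0):
    """Deterministic series system; components do not age while the system is down."""
    ttf = ttf or {"A": 100.0, "B": 120.0}   # fixed failure ages from the example
    age = {k: 0.0 for k in ttf}             # accumulated operating age of each unit
    clock, uptime, downing_events = 0.0, 0.0, 0
    while clock < end_time:
        remaining = {k: ttf[k] - age[k] for k in ttf}
        unit = min(remaining, key=remaining.get)    # next unit to reach its failure age
        step = min(remaining[unit], end_time - clock)
        clock += step
        uptime += step
        for k in age:                       # all units age while the system is up
            age[k] += step
        if clock >= end_time:
            break
        downing_events += 1                 # `unit` fails and downs the series system
        clock = min(clock + ttr, end_time)  # repair; no unit ages during this time
        age[unit] = 0.0                     # repaired unit is as good as new
    return uptime, downing_events

up, events = simulate_series()
# four downing events, 260 hours of uptime -> mean availability 260/300
```

Stepping through the loop reproduces the behavior described above: four downing events, 40 hours of total downtime and a mean availability of 260/300 ≈ 0.8667.<br />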
<br />
====Operating Through System Failure====<br />
<br />
In the prior section we made the assumption that components do not age when the system is down. This assumption applies to most systems. However, under special circumstances, a unit may age even while the system is down. In such cases, the operating profile will be different from the one presented in the prior section. The figure below illustrates the case where the components operate continuously, regardless of the system status.<br />
<br />
[[Image:BS8.2.png|center|500px|Overview of up and down states for a simple series system with two components. Component ''A'' fails every 100 hours and component ''B'' fails every 120 hours. Both require 10 hours to get repaired and age when the system is in a failed state (i.e., operate through failure).|link=]]<br />
<br />
====Effects of Operating Through Failure====<br />
<br />
Consider a component with an increasing failure rate, as shown in the figure below. If the component continues to operate through system failure, then when the system fails at <math>{{t}_{1}}\,\!</math> the surviving component's failure rate will be <math>{{\lambda }_{1}}\,\!</math>, as illustrated in the figure below. When the system is restored at <math>{{t}_{2}}\,\!</math>, the component would have aged by <math>{{t}_{2}}-{{t}_{1}}\,\!</math> and its failure rate would now be <math>{{\lambda }_{2}}\,\!</math>. <br />
<br />
In the case of a component that does not operate through failure, then the surviving component would be at the same failure rate, <math>{{\lambda }_{1}},\,\!</math> when the system resumes operation.<br />
<br />
[[Image:BS8.3.png|center|400px|Illustration of a component with a linearly increasing failure rate and the effect of operation through system failure.|link=]]<br />
<br />
==Deterministic View, Simple Parallel==<br />
Consider the following system where <math>A\,\!</math> fails every 100, <math>B\,\!</math> every 120, <math>C\,\!</math> every 140 and <math>D\,\!</math> every 160 time units. Each takes 10 time units to restore. Furthermore, assume that components do not age when the system is down.<br />
<br />
[[Image:i8.2.png|center|300px|link=]]<br />
<br />
A deterministic system view is shown in the figure below. The sequence of events is as follows:<br><br />
<br />
#At 100, <math>A\,\!</math> fails and is repaired by 110. The system is failed. <br />
#At 130, <math>B\,\!</math> fails and is repaired by 140. The system continues to operate.<br />
#At 150, <math>C\,\!</math> fails and is repaired by 160. The system continues to operate.<br />
#At 170, <math>D\,\!</math> fails and is repaired by 180. The system is failed. <br />
#At 220, <math>A\,\!</math> fails and is repaired by 230. The system is failed. <br />
#At 280, <math>B\,\!</math> fails and is repaired by 290. The system continues to operate.<br />
#End at 300.<br />
<br />
[[Image:BS8.4.png|center|500px|Overview of simple redundant system with four components.|link=]]<br />
<br />
====Additional Notes====<br />
<br />
It should be noted that we are dealing with these events deterministically in order to better illustrate the methodology. When dealing with deterministic events, it is possible to create a sequence of events that one would not expect to encounter probabilistically. One such example consists of two units in series that do not operate through failure but both fail at exactly 100, which is highly unlikely in a real-world scenario. In this case, the assumption is that one of the events must occur at least an infinitesimal amount of time ( <math>dt)\,\!</math> before the other. Probabilistically, this event is extremely rare, since both randomly generated times would have to be exactly equal to each other, to 15 decimal places. In the rare event that this happens, BlockSim would pick the unit with the lowest ID value as the first failure. BlockSim assigns a unique numerical ID when each component is created. These can be viewed by selecting the '''Show Block ID''' option in the Diagram Options window.<br />
<br />
==Deterministic Views of More Complex Systems==<br />
<br />
Even though the examples presented are fairly simplistic, the same approach can be repeated for larger and more complex systems. The reader can easily observe/visualize the behavior of more complex systems in BlockSim using the Up/Down plots. These are the same plots used in this chapter. It should be noted that BlockSim makes these plots available only when a single simulation run has been performed for the analysis (i.e., Number of Simulations = 1). These plots are meaningless when doing multiple simulations because each run will yield a different plot.<br />
<br />
==Probabilistic View, Simple Series==<br />
<br />
In a probabilistic case, the failures and repairs do not happen at a fixed time and for a fixed duration, but rather occur randomly and based on an underlying distribution, as shown in the following figures.<br />
<br />
[[Image:8.5.png|center|600px| A single component with a probabilistic failure time and repair duration.|link=]]<br />
[[Image:BS8.6.png|center|500px|A system up/down plot illustrating a probabilistic failure time and repair duration for component B.|link=]]<br />
<br />
We use discrete event simulation in order to analyze (understand) the system behavior. Discrete event simulation looks at each system/component event very similarly to the way we looked at these events in the deterministic example. However, instead of using deterministic (fixed) times for each event occurrence or duration, random times are used. These random times are obtained from the underlying distribution for each event. As an example, consider an event following a 2-parameter Weibull distribution. The ''cdf'' of the 2-parameter Weibull distribution is given by: <br />
<br />
::<math>F(T)=1-{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}}\,\!</math><br />
<br />
The Weibull reliability function is given by: <br />
<br />
::<math>\begin{align}<br />
R(T)= & 1-F(T) \\ <br />
= & {{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}} <br />
\end{align}\,\!</math><br />
<br />
Then, to generate a random time from a Weibull distribution with a given <math>\eta \,\!</math> and <math>\beta \,\!</math>, a uniform random number from 0 to 1, <math>{{U}_{R}}[0,1]\,\!</math>, is first obtained. The random time from a Weibull distribution is then obtained from:<br />
<br />
::<math>{{T}_{R}}=\eta \cdot {{\left\{ -\ln \left[ {{U}_{R}}[0,1] \right] \right\}}^{\tfrac{1}{\beta }}}\,\!</math><br />
<br />
To obtain a conditional time, the Weibull conditional reliability function is given by:<br />
<br />
::<math>R(T,t)=\frac{R(T+t)}{R(T)}=\frac{{{e}^{-{{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}}}}{{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}}}\,\!</math><br />
<br />
Or: <br />
<br />
::<math>R(T,t)={{e}^{-\left[ {{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}-{{\left( \tfrac{T}{\eta } \right)}^{\beta }} \right]}}\,\!</math><br />
<br />
The random time would be the solution for <math>t\,\!</math> for <math>R(T,t)={{U}_{R}}[0,1]\,\!</math>.<br />
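The sampling equations above translate directly to code. The sketch below is an illustrative implementation of inverse-CDF sampling, not BlockSim's internal routine:<br />

```python
import math
import random

def weibull_random(eta, beta, u=None):
    """Unconditional random time from a Weibull(eta, beta) distribution."""
    u = random.random() if u is None else u
    # 1 - u is also uniform on (0, 1], which avoids taking log(0)
    return eta * (-math.log(1.0 - u)) ** (1.0 / beta)

def weibull_conditional_random(eta, beta, T, u=None):
    """Random additional time t, given survival to age T: solves R(T, t) = u."""
    u = random.random() if u is None else u
    # ((T + t)/eta)^beta = (T/eta)^beta - ln(u), then solve for t
    return eta * ((T / eta) ** beta - math.log(1.0 - u)) ** (1.0 / beta) - T
```

For <math>\beta =1\,\!</math> the distribution is exponential, so the conditional draw has the same distribution regardless of the current age <math>T\,\!</math> (the memoryless property).<br />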
<br />
To illustrate the sequence of events, assume a single block with a failure and a repair distribution. The first event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, would be the failure of the component. Its first time-to-failure would be a random number drawn from its failure distribution, <math>{{T}_{{{F}_{1}}}}\,\!</math>. Thus, the first failure event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, would be at <math>{{T}_{{{F}_{1}}}}\,\!</math>. Once failed, the next event would be the repair of the component, <math>{{E}_{{{R}_{1}}}}\,\!</math>. The time to repair the component would now be drawn from its repair distribution, <math>{{T}_{{{R}_{1}}}}\,\!</math>. The component would be restored by time <math>{{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}}\,\!</math>. The next event would now be the second failure of the component after the repair, <math>{{E}_{{{F}_{2}}}}\,\!</math>. This event would occur after a component operating time of <math>{{T}_{{{F}_{2}}}}\,\!</math> after the item is restored (again drawn from the failure distribution), or at <math>{{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}}+{{T}_{{{F}_{2}}}}\,\!</math>. This process is repeated until the end time. It is important to note that each run will yield a different sequence of events due to the probabilistic nature of the times. To arrive at the desired result, this process is repeated many times and the results from each run (simulation) are recorded. In other words, if we were to repeat this 1,000 times, we would obtain 1,000 different values for <math>{{E}_{{{F}_{1}}}}\,\!</math>, or <math>\left[ {{E}_{{{F}_{{{1}_{1}}}}}},{{E}_{{{F}_{{{1}_{2}}}}}},...,{{E}_{{{F}_{{{1}_{1,000}}}}}} \right]\,\!</math>.<br />
The average of these values, <math>\left( \tfrac{1}{1000}\underset{i=1}{\overset{1,000}{\mathop{\sum }}}\,{{E}_{{{F}_{{{1}_{i}}}}}} \right)\,\!</math>, would then be the average time to the first event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, or the mean time to first failure (MTTFF) for the component. Obviously, if the component were to be 100% renewed after each repair, then this value would also be the same for the second failure, etc.<br />
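The event sequence described above can be sketched as a minimal event loop. The Weibull failure (eta = 100, beta = 1.5) and repair (eta = 10, beta = 2) distributions and the 300-hour end time below are illustrative assumptions, not values from the text:<br />

```python
import math
import random

def weibull_draw(rng, eta, beta):
    return eta * (-math.log(1.0 - rng.random())) ** (1.0 / beta)

def one_run(rng, end_time=300.0):
    """One simulation: alternate failure and repair draws until end_time.
    Returns (uptime, time of first failure or None)."""
    clock, uptime, first_failure = 0.0, 0.0, None
    while True:
        ttf = weibull_draw(rng, 100.0, 1.5)        # next failure event E_F
        if clock + ttf >= end_time:
            return uptime + (end_time - clock), first_failure
        clock += ttf
        uptime += ttf
        if first_failure is None:
            first_failure = clock
        clock = min(clock + weibull_draw(rng, 10.0, 2.0), end_time)  # repair E_R
        if clock >= end_time:
            return uptime, first_failure

rng = random.Random(7)
runs = [one_run(rng) for _ in range(1000)]
mean_avail = sum(u for u, _ in runs) / (1000 * 300.0)
first = [f for _, f in runs if f is not None]
mttff = sum(first) / len(first)        # mean time to first failure (MTTFF)
```

Each run yields a different event sequence; averaging the first-failure times over the 1,000 runs gives the MTTFF, exactly as described above.<br />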
<br />
=General Simulation Results=<br />
To further illustrate this, assume that components A and B in the prior example had normal failure and repair distributions with their means equal to the deterministic values used in the prior example and standard deviations of 10 and 1 respectively. That is, <math>{{F}_{A}}\tilde{\ }N(100,10),\,\!</math> <math>{{F}_{B}}\tilde{\ }N(120,10),\,\!</math> <math>{{R}_{A}}={{R}_{B}}\tilde{\ }N(10,1)\,\!</math>. The settings for components C and D are not changed. Obviously, given the probabilistic nature of the example, the times to each event will vary. If one were to repeat this <math>X\,\!</math> number of times, one would arrive at the results of interest for the system and its components. Some of the results for this system and this example, over 1,000 simulations, are provided in the figure below and explained in the next sections. <br />
[[Image:r2.png|center|600px|Summary of system results for 1,000 simulations.|link=]]<br />
<br />
The simulation settings are shown in the figure below.<br />
[[Image:8.7.gif|center|600px|BlockSim simulation window.|link=]]<br />
<br />
===General===<br />
====Mean Availability (All Events), <math>{{\overline{A}}_{ALL}}\,\!</math>====<br />
This is the mean availability due to all downing events, which can be thought of as the operational availability. It is the ratio of the system uptime divided by the total simulation time (total time). For this example: <br />
<br />
::<math>\begin{align}<br />
{{\overline{A}}_{ALL}}= & \frac{Uptime}{TotalTime} \\ <br />
= & \frac{269.137}{300} \\ <br />
= & 0.8971 <br />
\end{align}\,\!</math><br />
<br />
====Std Deviation (Mean Availability)====<br />
This is the standard deviation of the mean availability of all downing events for the system during the simulation.<br />
<br />
====Mean Availability (w/o PM, OC & Inspection), <math>{{\overline{A}}_{CM}}\,\!</math>====<br />
This is the mean availability due to failure events only and it is 0.8971 for this example. Note that for this case, the mean availability without preventive maintenance, on condition maintenance and inspection is identical to the mean availability for all events. This is because no preventive maintenance actions or inspections were defined for this system. We will discuss the inclusion of these actions in later sections.<br />
<br />
Downtimes caused by PM and inspections are not included. However, if the PM or inspection action results in the discovery of a failure, then these times are included. As an example, consider a component that has failed but its failure is not discovered until the component is inspected. Then the downtime from the time failed to the time restored after the inspection is counted as failure downtime, since the original event that caused this was the component's failure. <br />
====Point Availability (All Events), <math>A\left( t \right)\,\!</math>====<br />
<br />
This is the probability that the system is up at time <math>t\,\!</math>. As an example, to obtain this value at <math>t\,\!</math> = 300, a special counter would need to be used during the simulation. This counter is increased by one every time the system is up at 300 hours. Thus, the point availability at 300 would be the times the system was up at 300 divided by the number of simulations. For this example, this is 0.930, or 930 times out of the 1000 simulations the system was up at 300 hours.<br />
<br />
====Reliability (Fail Events), <math>R(t)\,\!</math>====<br />
<br />
This is the probability that the system has not failed by time <math>t\,\!</math>. This is similar to point availability with the major exception that it only looks at the probability that the system did not have a single failure. Other (non-failure) downing events are ignored. During the simulation, a special counter again must be used. This counter is increased by one (once in each simulation) if the system has had at least one failure up to 300 hours. Thus, the reliability at 300 would be the number of times the system did not fail up to 300 divided by the number of simulations. For this example, this is 0 because the system failed prior to 300 hours 1000 times out of the 1000 simulations.<br />
<br />
It is very important to note that this value is not always the same as the reliability computed using the analytical methods, depending on the redundancy present. The reason that it may differ is best explained by the following scenario:<br />
<br />
Assume two units in parallel. The analytical system reliability, which does not account for repairs, is the probability that both units fail. In this case, when one unit goes down, it does not get repaired and the system fails after the second unit fails. In the case of repairs, however, it is possible for one of the two units to fail and get repaired before the second unit fails. Thus, when the second unit fails, the system will still be up due to the fact that the first unit was repaired.<br />
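This scenario is easy to verify by simulation. The sketch below assumes, purely for illustration, two parallel units with exponential times to failure (mean 100 hours), fixed 10-hour repairs, a 300-hour mission, and units that operate independently (i.e., through system failure):<br />

```python
import random

def down_intervals(rng, t_end, mttf, ttr, repair):
    """Down intervals of one unit; without repair the unit stays down."""
    t, intervals = 0.0, []
    while t < t_end:
        t += rng.expovariate(1.0 / mttf)           # unit operates, then fails
        if t >= t_end:
            break
        if not repair:
            intervals.append((t, t_end))           # never restored
            break
        intervals.append((t, min(t + ttr, t_end)))
        t += ttr                                   # unit is repaired
    return intervals

def parallel_failed(rng, t_end=300.0, repair=True):
    a = down_intervals(rng, t_end, 100.0, 10.0, repair)
    b = down_intervals(rng, t_end, 100.0, 10.0, repair)
    # the system fails if both units are ever down at the same time
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)

rng, N = random.Random(3), 5000
r_with = 1.0 - sum(parallel_failed(rng, repair=True) for _ in range(N)) / N
r_without = 1.0 - sum(parallel_failed(rng, repair=False) for _ in range(N)) / N
```

With no repairs, the simulated value approaches the analytical reliability 1 − (1 − e^(−3))² ≈ 0.097; with repairs it is substantially higher, which is exactly the discrepancy discussed above.<br />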
<br />
====Expected Number of Failures, <math>{{N}_{F}}\,\!</math>====<br />
This is the average number of system failures. The system failures (not downing events) for all simulations are counted and then averaged. For this case, this is 3.188, which implies that a total of 3,188 system failure events occurred over 1000 simulations. Thus, the expected number of system failures for one run is 3.188. This number includes all failures, even those that may have a duration of zero.<br />
<br />
====Std Deviation (Number of Failures)====<br />
This is the standard deviation of the number of failures for the system during the simulation.<br />
<br />
====MTTFF====<br />
MTTFF is the mean time to first failure for the system. This is computed by keeping track of the time at which the first system failure occurred for each simulation. MTTFF is then the average of these times. This may or may not be identical to the MTTF obtained in the analytical solution for the same reasons as those discussed in the Point Reliability section. For this case, this is 100.2511. This is fairly obvious for this case since the mean of one of the components in series was 100 hours.<br />
<br />
It is important to note that for each simulation run, if a first failure time is observed, then this is recorded as the system time to first failure. If no failure is observed in the system, then the simulation end time is used as a right censored (suspended) data point. MTTFF is then computed using the total operating time until the first failure divided by the number of observed failures (constant failure rate assumption). Furthermore, if the simulation end time is much less than the time to first failure for the system, it is also possible that all data points are right censored (i.e., no system failures were observed). In this case, the MTTFF is again computed using a constant failure rate assumption, or:<br />
<br />
::<math>MTTFF=\frac{2\cdot ({{T}_{S}})\cdot N}{\chi _{0.50;2}^{2}}\,\!</math><br />
<br />
where <math>{{T}_{S}}\,\!</math> is the simulation end time and <math>N\,\!</math> is the number of simulations. One should be aware that this formulation may yield unrealistic (or erroneous) results if the system does not have a constant failure rate. If you are trying to obtain an accurate (realistic) estimate of this value, then your simulation end time should be set to a value that is well beyond the MTTF of the system (as computed analytically). As a general rule, the simulation end time should be at least three times larger than the MTTF of the system.<br />
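Since the chi-squared median with 2 degrees of freedom is simply 2·ln(2), the all-censored fallback above reduces to a one-liner (an illustrative sketch; the function name is hypothetical):<br />

```python
import math

def mttff_all_censored(t_sim_end, n_simulations):
    """MTTFF estimate when no failures are observed in any simulation run."""
    chi2_median_2dof = 2.0 * math.log(2.0)     # chi^2(0.50; 2) ~= 1.3863
    return 2.0 * t_sim_end * n_simulations / chi2_median_2dof

estimate = mttff_all_censored(300.0, 1000)
```

The resulting estimate (here on the order of 430,000 hours) is far larger than the simulation end time, which is why it is only meaningful under the stated constant failure rate assumption.<br />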
<br />
====MTBF (Total Time)====<br />
This is the mean time between failures for the system based on the total simulation time and the expected number of system failures. For this example:<br />
<br />
::<math>\begin{align}<br />
MTBF (Total Time)= & \frac{TotalTime}{{N}_{F}} \\ <br />
= & \frac{300}{3.188} \\ <br />
= & 94.102886 <br />
\end{align}\,\!</math><br />
<br />
====MTBF (Uptime)====<br />
This is the mean time between failures for the system, considering only the time that the system was up. This is calculated by dividing system uptime by the expected number of system failures. You can also think of this as the mean uptime. For this example:<br />
<br />
::<math>\begin{align}<br />
MTBF (Uptime)= & \frac{Uptime}{{N}_{F}} \\ <br />
= & \frac{269.136952}{3.188} \\ <br />
= & 84.42188 <br />
\end{align}\,\!</math><br />
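Both MTBF figures follow directly from the summary values reported for this example:<br />

```python
# Values taken from the simulation summary for this example.
total_time, uptime, n_failures = 300.0, 269.136952, 3.188

mtbf_total = total_time / n_failures    # MTBF (Total Time), ~94.10
mtbf_up = uptime / n_failures           # MTBF (Uptime), i.e., the mean uptime, ~84.42
```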
<br />
====MTBE (Total Time)====<br />
This is the mean time between all downing events for the system, based on the total simulation time and including all system downing events. This is calculated by dividing the simulation run time by the number of downing events (<math>{{N}_{AL{{L}_{Down}}}}\,\!</math>).<br />
<br />
====MTBE (Uptime)====<br />
This is the mean time between all downing events for the system, considering only the time that the system was up. This is calculated by dividing system uptime by the number of downing events (<math>{{N}_{AL{{L}_{Down}}}}\,\!</math>).<br />
<br />
===System Uptime/Downtime===<br />
<br />
====Uptime, <math>{{T}_{UP}}\,\!</math> ====<br />
<br />
This is the average time the system was up and operating. This is obtained by taking the sum of the uptimes for each simulation and dividing it by the number of simulations. For this example, the uptime is 269.137. To compute the Operational Availability, <math>{{A}_{o}},\,\!</math> for this system, then:<br />
<br />
::<math>{{A}_{o}}=\frac{{{T}_{UP}}}{{{T}_{S}}}\,\!</math><br />
<br />
====CM Downtime, <math>{{T}_{C{{M}_{Down}}}}\,\!</math> ====<br />
This is the average time the system was down for corrective maintenance actions (CM) only. This is obtained by taking the sum of the CM downtimes for each simulation and dividing it by the number of simulations. For this example, this is 30.863.<br />
To compute the Inherent Availability, <math>{{A}_{I}},\,\!</math> for this system over the observed time (which may or may not be steady state, depending on the length of the simulation), then:<br />
<br />
::<math>{{A}_{I}}=\frac{{{T}_{S}}-{{T}_{C{{M}_{Down}}}}}{{{T}_{S}}}\,\!</math><br />
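Both availability measures can be checked from the example's downtime figures; because corrective maintenance is the only source of downtime in this example, the two coincide:<br />

```python
t_sim, t_up, t_cm_down = 300.0, 269.137, 30.863   # values from this example

a_operational = t_up / t_sim                      # A_o, based on all downtime
a_inherent = (t_sim - t_cm_down) / t_sim          # A_I, based on CM downtime only
```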
<br />
====Inspection Downtime ====<br />
<br />
This is the average time the system was down due to inspections. This is obtained by taking the sum of the inspection downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no inspections were defined.<br />
<br />
====PM Downtime, <math>{{T}_{P{{M}_{Down}}}}\,\!</math>====<br />
<br />
This is the average time the system was down due to preventive maintenance (PM) actions. This is obtained by taking the sum of the PM downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no PM actions were defined.<br />
<br />
====OC Downtime, <math>{{T}_{O{{C}_{Down}}}}\,\!</math>====<br />
<br />
This is the average time the system was down due to on-condition maintenance (OC) actions. This is obtained by taking the sum of the OC downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no OC actions were defined.<br />
<br />
====Waiting Downtime, <math>{{T}_{W{{ait}_{Down}}}}\,\!</math>====<br />
<br />
This is the amount of time that the system was down due to crew and spare part wait times or crew conflict times. For this example, this is zero because no crews or spare part pools were defined.<br />
<br />
====Total Downtime, <math>{{T}_{Down}}\,\!</math>====<br />
<br />
This is the downtime due to all events. In general, one may look at this as the sum of the above downtimes. However, this is not always the case. It is possible to have actions that overlap each other, depending on the options and settings for the simulation. Furthermore, there are other events that can cause the system to go down that do not get counted in any of the above categories. As an example, in the case of standby redundancy with a switch delay, if the settings are to reactivate the failed component after repair, the system may be down during the switch-back action. This downtime does not fall into any of the above categories but it is counted in the total downtime.<br />
<br />
For this example, this is identical to <math>{{T}_{C{{M}_{Down}}}}\,\!</math>.<br />
<br />
===System Downing Events===<br />
System downing events are events associated with downtime. Note that events with zero duration will appear in this section only if the task properties specify that the task brings the system down or if the task properties specify that the task brings the item down and the item’s failure brings the system down.<br />
<br />
====Number of Failures, <math>{{N}_{{{F}_{Down}}}}\,\!</math>====<br />
This is the average number of system downing failures. Unlike the Expected Number of Failures, <math>{{N}_{F}},\,\!</math> this number does not include failures with zero duration. For this example, this is 3.188. <br />
<br />
====Number of CMs, <math>{{N}_{C{{M}_{Down}}}}\,\!</math>====<br />
This is the number of corrective maintenance actions that caused the system to fail. It is obtained by taking the sum of all CM actions that caused the system to fail divided by the number of simulations. It does not include CM events of zero duration. For this example, this is 3.188. Note that this may differ from the Number of Failures, <math>{{N}_{{{F}_{Down}}}}\,\!</math>. An example would be a case where the system has failed, but due to other settings for the simulation, a CM is not initiated (e.g., an inspection is needed to initiate a CM).<br />
<br />
====Number of Inspections, <math>{{N}_{{{I}_{Down}}}}\,\!</math>====<br />
This is the number of inspection actions that caused the system to fail. It is obtained by taking the sum of all inspection actions that caused the system to fail divided by the number of simulations. It does not include inspection events of zero duration. For this example, this is zero.<br />
<br />
====Number of PMs, <math>{{N}_{P{{M}_{Down}}}}\,\!</math>====<br />
This is the number of PM actions that caused the system to fail. It is obtained by taking the sum of all PM actions that caused the system to fail divided by the number of simulations. It does not include PM events of zero duration. For this example, this is zero.<br />
<br />
====Number of OCs, <math>{{N}_{O{{C}_{Down}}}}\,\!</math>====<br />
This is the number of OC actions that caused the system to fail. It is obtained by taking the sum of all OC actions that caused the system to fail divided by the number of simulations. It does not include OC events of zero duration. For this example, this is zero.<br />
<br />
====Number of OFF Events by Trigger, <math>{{N}_{O{{FF}_{Down}}}}\,\!</math>====<br />
This is the total number of events where the system is turned off by state change triggers. An OFF event is not a system failure but it may be included in system reliability calculations. For this example, this is zero.<br />
<br />
====Total Events, <math>{{N}_{AL{{L}_{Down}}}}\,\!</math>====<br />
This is the total number of system downing events. It also does not include events of zero duration. It is possible that this number may differ from the sum of the other listed events. As an example, consider the case where a failure does not get repaired until an inspection, but the inspection occurs after the simulation end time. In this case, the number of inspections, CMs and PMs will be zero while the number of total events will be one.<br />
<br />
===Costs and Throughput===<br />
Cost and throughput results are discussed in later sections.<br />
<br />
===Note About Overlapping Downing Events===<br />
<br />
It is important to note that two identical system downing events (that are continuous or overlapping) may be counted and viewed differently. As shown in Case 1 of the following figure, two overlapping failure events are counted as only one event from the system perspective because the system was never restored and remained in the same down state, even though that state was caused by two different components. Thus, the number of downing events in this case is one and the duration is as shown in CM system. In the case that the events are different, as shown in Case 2 of the figure below, two events are counted, the CM and the PM. However, the downtime attributed to each event is different from the actual time of each event. In this case, the system was first down due to a CM and remained in a down state due to the CM until that action was over. However, immediately upon completion of that action, the system remained down but now due to a PM action. In this case, only the PM action portion that kept the system down is counted.<br />
<br />
[[Image:8.9.png|center|350px|Duration and count of different overlapping events.|link=]]<br />
<br />
===System Point Results===<br />
<br />
The system point results, as shown in the figure below, show the Point Availability (All Events), <math>A\left( t \right)\,\!</math>, and Point Reliability, <math>R(t)\,\!</math>, as defined in the previous section. These are computed and returned at different points in time, based on the number of intervals selected by the user. Additionally, this window shows <math>(1-A(t))\,\!</math>, <math>(1-R(t))\,\!</math>, <math>\text{Labor Cost(t)}\,\!</math>, <math>\text{Part Cost(t)}\,\!</math>, <math>Cost(t)\,\!</math>, <math>Mean\,\!</math> <math>A(t)\,\!</math>, <math>Mean\,\!</math> <math>A({{t}_{i}}-{{t}_{i-1}})\,\!</math>, <math>\text{System Failures(t)}\,\!</math>, <math>\text{System Off Events by Trigger(t)}\,\!</math> and <math>Throughput(t)\,\!</math>.<br />
<br />
[[Image:BS8.10.png|center|750px|link=]]<br />
The number of intervals shown is based on the increments set. In this figure, the number of increments set was 300, which implies that the results should be shown every hour. The results shown in this figure are for 10 increments, or shown every 30 hours.<br />
<br />
=Results by Component=<br />
Simulation results for each component can also be viewed. The figure below shows the results for component A. These results are explained in the sections that follow.<br />
<br />
[[Image:8.11.gif|center|600px|The Block Details results for component A.|link=]]<br />
<br />
===General Information===<br />
====Number of Block Downing Events, <math>Componen{{t}_{NDE}}\,\!</math>====<br />
This is the number of times the component went down (failed). It includes all downing events.<br />
<br />
====Number of System Downing Events, <math>Componen{{t}_{NSDE}}\,\!</math>====<br />
<br />
This is the number of times that this component's downing caused the system to be down. For component <math>A\,\!</math>, this is 2.038. Note that in this case this value is the same as the number of component failures, since component A is reliability-wise in series with component D and with the parallel pair of components B and C. If this were not the case (e.g., for a component in a parallel configuration, like B or C), this value would be different.<br />
<br />
====Number of Failures, <math>Componen{{t}_{NF}}\,\!</math>====<br />
<br />
This is the number of times the component failed and does not include other downing events. Note that this could also be interpreted as the number of spare parts required for CM actions for this component. For component <math>A\,\!</math>, this is 2.038.<br />
<br />
====Number of System Downing Failures, <math>Componen{{t}_{NSDF}}\,\!</math>====<br />
This is the number of times that this component's failure caused the system to be down. Note that this may be different from the Number of System Downing Events. It only counts the failure events that downed the system and does not include zero duration system failures.<br />
<br />
====Number of OFF events by Trigger, <math>Componen{{t}_{OFF}}\,\!</math>====<br />
The total number of events where the block is turned off by state change triggers. An OFF event is not a failure but it may be included in system reliability calculations.<br />
<br />
====Mean Availability (All Events), <math>{{\overline{A}}_{AL{{L}_{Component}}}}\,\!</math>====<br />
<br />
This has the same definition as for the system with the exception that this accounts only for the component.<br />
<br />
====Mean Availability (w/o PM, OC & Inspection), <math>{{\overline{A}}_{C{{M}_{Component}}}}\,\!</math>====<br />
<br />
The mean availability of all downing events for the block, not including preventive, on condition or inspection tasks, during the simulation.<br />
<br />
====Block Uptime, <math>{{T}_{Componen{{t}_{UP}}}}\,\!</math>====<br />
<br />
This is the total amount of time that the block was up (i.e., operational) during the simulation. For component <math>A\,\!</math>, this is 279.8212.<br />
<br />
====Block Downtime, <math>{{T}_{Componen{{t}_{Down}}}}\,\!</math>====<br />
<br />
This is the total amount of time that the block was down (i.e., not operational) for any reason during the simulation. For component <math>A\,\!</math>, this is 20.1788.<br />
<br />
===Metrics===<br />
====RS DECI====<br />
<br />
The ReliaSoft Downing Event Criticality Index for the block. This is a relative index showing the percentage of times that a downing event of the block caused the system to go down (i.e., the number of system downing events caused by the block divided by the total number of system downing events). For component <math>A\,\!</math>, this is 63.93%. This implies that 63.93% of the times that the system went down, the system failure was due to the fact that component <math>A\,\!</math> went down. This is obtained from:<br />
<br />
::<math>\begin{align}<br />
RSDECI=\frac{Componen{{t}_{NSDE}}}{{{N}_{AL{{L}_{Down}}}}} <br />
\end{align}\,\!</math><br />
<br />
====Mean Time Between Downing Events====<br />
This is the mean time between downing events of the component, which is computed from:<br />
<br />
::<math>MTBDE=\frac{{{T}_{Componen{{t}_{UP}}}}}{Componen{{t}_{NDE}}}\,\!</math><br />
<br />
For component <math>A\,\!</math>, this is 137.3019.<br />
<br />
====RS FCI====<br />
ReliaSoft's Failure Criticality Index (RS FCI) is a relative index showing the percentage of times that a failure of this component caused a system failure. For component <math>A\,\!</math>, this is 63.93%. This implies that 63.93% of the times that the system failed, it was due to the fact that component <math>A\,\!</math> failed. This is obtained from:<br />
<br />
::<math>\begin{align}<br />
RSFCI=\frac{Componen{{t}_{NSDF}}+{{F}_{ZD}}}{{{N}_{F}}} <br />
\end{align}\,\!</math><br />
<br />
<math>{{F}_{ZD}}\,\!</math> is a special counter of system failures not included in <math>Componen{{t}_{NSDF}}\,\!</math>. This counter is not explicitly shown in the results but is maintained by the software. The reason for this counter is the fact that zero duration failures are not counted in <math>Componen{{t}_{NSDF}}\,\!</math> since they really did not down the system. However, these zero duration failures need to be included when computing RS FCI.<br />
<br />
It is important to note that for both RS DECI and RS FCI, if overlapping events are present, the component that caused the system event gets credited with the system event. Subsequent component events that do not bring the system down (since the system is already down) are not counted in these metrics.<br />
<br />
====MTBF, <math>MTB{{F}_{C}}\,\!</math>====<br />
<br />
Mean time between failures is the mean (average) time between failures of this component, in real clock time. This is computed from:<br />
<br />
::<math>MTB{{F}_{C}}=\frac{{{T}_{S}}-CFDowntime}{Componen{{t}_{NF}}}\,\!</math><br />
<br />
<math>CFDowntime\,\!</math> is the downtime of the component due to failures only (without PM, OC and inspection). The discussion of what constitutes failure downtime in the section explaining Mean Availability (w/o PM, OC & Inspection) also applies here.<br />
For component <math>A\,\!</math>, this is 137.3019. Note that this value could fluctuate for the same component depending on the simulation end time. As an example, consider the deterministic scenario for this component. It fails every 100 hours and takes 10 hours to repair. Thus, it would be failed at 100, repaired by 110, failed at 210 and repaired by 220. Therefore, its uptime is 280 with two failure events, MTBF = 280/2 = 140. Repeating the same scenario with an end time of 330 would yield failures at 100, 210 and 320. Thus, the uptime would be 300 with three failures, or MTBF = 300/3 = 100. Note that this is not the same as the MTTF (mean time to failure), commonly referred to as MTBF by many practitioners. <br />
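The deterministic scenario described above is easy to verify with a short sketch. This is purely an illustration of the arithmetic (the function and its parameter names are hypothetical, not part of BlockSim):<br />

```python
def deterministic_mtbf(end_time, ttf=100.0, ttr=10.0):
    """Component fails every `ttf` operating hours and takes `ttr` hours
    to repair; MTBF = (end_time - failure downtime) / number of failures."""
    t, failures, downtime = 0.0, 0, 0.0
    while True:
        t += ttf                             # next failure
        if t > end_time:
            break
        failures += 1
        downtime += min(ttr, end_time - t)   # repair may be cut off at end time
        t += ttr                             # component is back up after repair
    return (end_time - downtime) / failures

print(deterministic_mtbf(300))  # uptime 280 with 2 failures -> 140.0
print(deterministic_mtbf(330))  # uptime 300 with 3 failures -> 100.0
```

This reproduces the fluctuation noted in the text: the same deterministic component yields MTBF = 140 for an end time of 300 but MTBF = 100 for an end time of 330.<br />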
<br />
====Mean Downtime per Event, <math>MDPE\,\!</math>====<br />
Mean downtime per event is the average downtime for a component event. This is computed from:<br />
<br />
::<math>MDPE=\frac{{{T}_{Componen{{t}_{Down}}}}}{Componen{{t}_{NDE}}}\,\!</math><br />
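Both MTBDE and MDPE follow the same pattern: a time total divided by the number of downing events. As an illustrative check (hypothetical Python helpers, not BlockSim code), plugging in component A's reported counters reproduces the MTBDE value of 137.3019:<br />

```python
def mtbde(uptime, n_downing_events):
    """Mean time between downing events: total uptime / downing events."""
    return uptime / n_downing_events

def mdpe(downtime, n_downing_events):
    """Mean downtime per event: total downtime / downing events."""
    return downtime / n_downing_events

# Counters reported for component A in the simulation results:
uptime, downtime, nde = 279.8212, 20.1788, 2.038

print(round(mtbde(uptime, nde), 4))  # 137.3019, as reported
print(round(mdpe(downtime, nde), 4))
```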
<br />
====RS DTCI====<br />
The ReliaSoft Downtime Criticality Index for the block. This is a relative index showing the contribution of the block to the system’s downtime (i.e., the system downtime caused by the block divided by the total system downtime).<br />
<br />
====RS BCCI====<br />
The ReliaSoft Block Cost Criticality Index for the block. This is a relative index showing the contribution of the block to the total costs (i.e., the total block costs divided by the total costs).<br />
<br />
====Non-Waiting Time CI====<br />
A relative index showing the contribution of repair times to the block’s total downtime. (The ratio of the time that the crew is actively working on the item to the total down time). <br />
<br />
====Total Waiting Time CI====<br />
A relative index showing the contribution of wait factor times to the block’s total downtime. Wait factors include crew conflict times, crew wait times and spare part wait times. (The ratio of the downtime not including active repair time to the total downtime.) <br />
<br />
====Waiting for Opportunity/Maximum Wait Time Ratio====<br />
A relative index showing the contribution of crew conflict times. This is the ratio of the time spent waiting for the crew to respond (not including crew logistic delays) to the total wait time (not including the active repair time). <br />
<br />
====Crew/Part Wait Ratio====<br />
The ratio of the crew and part delays. A value of 100% means that both waits are equal. A value greater than 100% indicates that the crew delay was in excess of the part delay. For example, a value of 200% would indicate that the wait for the crew is two times greater than the wait for the part.<br />
<br />
====Part/Crew Wait Ratio====<br />
The ratio of the part and crew delays. A value of 100% means that both waits are equal. A value greater than 100% indicates that the part delay was in excess of the crew delay. For example, a value of 200% would indicate that the wait for the part is two times greater than the wait for the crew.<br />
<br />
===Downtime Summary===<br />
====Non-Waiting Time====<br />
Time that the block was undergoing active maintenance/inspection by a crew. If no crew is defined, then this will return zero.<br />
<br />
====Waiting for Opportunity====<br />
The total downtime for the block due to crew conflicts (i.e., time spent waiting for a crew while the crew is busy with another task). If no crew is defined, then this will return zero. <br />
<br />
====Waiting for Crew====<br />
The total downtime for the block due to crew wait times (i.e., time spent waiting for a crew due to logistical delay). If no crew is defined, then this will return zero. <br />
<br />
====Waiting for Parts====<br />
The total downtime for the block due to spare part wait times. If no spare part pool is defined then this will return zero. <br />
<br />
====Other Results of Interest====<br />
The remaining component (block) results are similar to those defined for the system with the exception that now they apply only to the component.<br />
<br />
=Imperfect Repairs= <!-- THIS SECTION HEADER IS LINKED TO: http://help.synthesis8.com/rcm8/tasks.htm. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK. --><br />
{{:Imperfect Repairs}}<br />
<br />
=Using Resources: Pools and Crews=<br />
In order to make the analysis more realistic, one may wish to consider additional sources of delay times in the analysis or study the effect of limited resources. In the prior examples, we used a repair distribution to identify how long it takes to restore a component. The factors that one chooses to consider in this time may include the time it takes to do the repair and/or the time it takes to get a crew, a spare part, etc. While all of these factors may be included in the repair duration, optimized usage of these resources can only be achieved if the resources are studied individually and their dependencies are identified.<br />
<br />
As an example, consider the situation where two components in parallel fail at the same time and only a single repair person is available. Because this person would not be able to execute the repair on both components simultaneously, an additional delay will be encountered that also needs to be included in the modeling. One way to accomplish this is to assign a specific repair crew to each component.<br />
<br />
===Including Crews===<br />
<br />
BlockSim allows you to assign one or more maintenance crews to each component from the Maintenance Task Properties window. Note that there may be different crews for each action (i.e., corrective, preventive, on condition and inspection).<br />
<br />
A crew record needs to be defined for each named crew, as shown in the picture below. The basic properties for each crew include factors such as:<br />
<br><br />
* Logistic delays. How long does it take for the crew to arrive?<br />
* Is there a limit to the number of tasks this crew can perform at the same time? If yes, how many simultaneous tasks can the crew perform?<br />
* What is the cost per hour for the crew?<br />
* What is the cost per incident for the crew?<br />
<br />
[[Image:8.16.png|center|518px|link=]]<br />
<br />
===Illustrating Crew Use===<br />
To illustrate the use of crews in BlockSim, consider the deterministic scenario described by the following RBD and properties.<br />
<br />
[[Image:r12.png|center|350px|link=]]<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
! Unit<br />
! Failure<br />
! Repair<br />
! Crew<br />
|-<br />
| <math>A\,\!</math><br />
| <math>100\,\!</math><br />
| <math>10\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>B\,\!</math><br />
| <math>120\,\!</math><br />
| <math>20\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>C\,\!</math><br />
| <math>140\,\!</math><br />
| <math>20\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>D\,\!</math><br />
| <math>160\,\!</math><br />
| <math>10\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|}<br />
<br />
<br />
[[Image:BS8.17.png|center|600px|link=]]<br />
<br />
As shown in the figure above, the System Up/Down plot illustrates the sequence of events, which are:<br />
<br />
::#At 100, <math>A\,\!</math> fails. It takes 20 to get the crew and 10 to repair, thus the component is repaired by 130. The system is failed/down during this time. <br />
::#At 150, <math>B\,\!</math> fails since it would have accumulated an operating age of 120 by this time. It again has to wait for the crew and is repaired by 190. <br />
::#At 170, <math>C\,\!</math> fails. Upon this failure, <math>C\,\!</math> requests the only available crew. However, this crew is currently engaged by <math>B\,\!</math> and, since the crew can only perform one task at a time, it cannot respond immediately to the request by <math>C\,\!</math>. Thus, <math>C\,\!</math> will remain failed until the crew becomes available. The crew will finish with unit <math>B\,\!</math> at 190 and will then be dispatched to <math>C\,\!</math>. Upon dispatch, the logistic delay will again be considered and <math>C\,\!</math> will be repaired by 230. The system continues to operate until the failures of <math>B\,\!</math> and <math>C\,\!</math> overlap (i.e., the system is down from 170 to 190).<br />
::#At 210, <math>D\,\!</math> fails. It again has to wait for the crew and repair.<br />
::#<math>D\,\!</math> is up at 260.<br />
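The single-crew sequence above can be reproduced with a simple queue calculation. The failure times are taken from the walkthrough, and the logistic delay of 20 is incurred on each dispatch, even after waiting in the queue (a sketch for illustration, not BlockSim's simulation engine):<br />

```python
def crew_schedule(events, logistic_delay=20):
    """events: list of (failure_time, repair_duration) in order of occurrence.
    A single crew serves them first come, first served; the logistic delay
    applies each time the crew is dispatched."""
    crew_free, completions = 0, []
    for fail_time, repair in events:
        dispatch = max(fail_time, crew_free)       # wait if the crew is busy
        done = dispatch + logistic_delay + repair  # travel, then repair
        completions.append(done)
        crew_free = done
    return completions

# Failure times from the walkthrough: A at 100, B at 150, C at 170, D at 210.
print(crew_schedule([(100, 10), (150, 20), (170, 20), (210, 10)]))
# -> [130, 190, 230, 260], matching the repair completion times above
```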
The following figure shows an example of some of the possible crew results (details), which are presented next. <br />
<br />
[[Image:BS8.18.png|thumb|center|500px|Crew results shown in the BlockSim's Simulation Results Explorer.|link=]]<br />
<br />
====Explanation of the Crew Details====<br />
::#Each request made to a crew is logged. <br />
::#If a request is successful (i.e., the crew is available), the call is logged once in the Calls Received counter and once in the Accepted Calls counter. <br />
::#If a request is not accepted (i.e., the crew is busy), the call is logged once in the Calls Received counter and once in the Rejected Calls counter. When the crew is free and can be called upon again, the call is logged once in the Calls Received counter and once in the Accepted Calls counter.<br />
::#In this scenario, there were two instances when the crew was not available, Rejected Calls = 2, and there were four instances when the crew performed an action, Calls Accepted = 4, for a total of six calls, Calls Received = 6.<br />
::#Percent Accepted and Percent Rejected are the ratios of calls accepted and calls rejected with respect to the total calls received.<br />
::#Total Utilization is the total time that the crew was used. It includes both the time required to complete the repair action and the logistic time. In this case, this is 140, or: <br />
<br />
::<math>\begin{align}<br />
{{T}_{{{R}_{A}}}}= & 10,{{T}_{{{L}_{A}}}}=20 \\ <br />
{{T}_{{{R}_{B}}}}= & 20,{{T}_{{{L}_{B}}}}=20 \\ <br />
{{T}_{{{R}_{C}}}}= & 20,{{T}_{{{L}_{C}}}}=20 \\ <br />
{{T}_{{{R}_{D}}}}= & 10,{{T}_{{{L}_{D}}}}=20 \\ <br />
{{T}_{U}}= & \left( {{T}_{{{R}_{A}}}}+{{T}_{{{L}_{A}}}} \right)+\left( {{T}_{{{R}_{B}}}}+{{T}_{{{L}_{B}}}} \right) \\ <br />
& +\left( {{T}_{{{R}_{C}}}}+{{T}_{{{L}_{C}}}} \right)+\left( {{T}_{{{R}_{D}}}}+{{T}_{{{L}_{D}}}} \right) \\ <br />
{{T}_{U}}= & 140 <br />
\end{align}\,\!</math><br />
<br />
:::7. Average Call Duration is the average duration of each crew usage; it also includes both logistic and repair time. It is the total usage divided by the number of accepted calls. In this case, this is 35.<br />
:::8. Total Wait Time is the time that blocks in need of a repair waited for this crew. In this case, it is 40 ( <math>C\,\!</math> and <math>D\,\!</math> each waited 20). <br />
:::9. Total Crew Costs are the total costs for this crew. It includes the per incident charge as well as the per unit time costs. In this case, this is 180. There were four incidents at 10 each for a total of 40, as well as 140 time units of usage at 1 cost unit per time unit.<br />
:::10. Average Cost per Call is the total cost divided by the number of accepted calls. In this case, this is 45.<br />
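The crew details above reduce to a few sums over the four accepted calls. A minimal sketch (the cost figures of 10 per incident and 1 per time unit are taken from the text; the variable names are illustrative):<br />

```python
# Per-call (logistic, repair) times for the four accepted calls A, B, C, D:
calls = [(20, 10), (20, 20), (20, 20), (20, 10)]
waits = [20, 20]                           # C and D each waited 20 for the crew
cost_per_incident, cost_per_hour = 10, 1   # cost values given in the text

total_utilization = sum(logistic + repair for logistic, repair in calls)
avg_call_duration = total_utilization / len(calls)
total_wait_time = sum(waits)
total_crew_cost = len(calls) * cost_per_incident + total_utilization * cost_per_hour
avg_cost_per_call = total_crew_cost / len(calls)

print(total_utilization, avg_call_duration, total_wait_time,
      total_crew_cost, avg_cost_per_call)  # 140 35.0 40 180 45.0
```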
<br />
Note that crew costs that are attributed to individual blocks can be obtained from the Blocks reports, as shown in the figure below. <br />
<br />
[[Image:BS8.19.png|thumb|center|650px|Allocation of crew costs.|link=]]<br />
<br />
====How BlockSim Handles Crews====<br />
::#Crew logistic time is added to each repair time. <br />
::#The logistic time is always present, and the same, regardless of where the crew was called from (i.e., whether the crew was at another job or idle at the time of the request).<br />
::#For any given simulation, each crew's logistic time is constant (taken from the distribution) across that single simulation run regardless of the task (CM, PM or inspection).<br />
::#A crew can perform either a finite number of simultaneous tasks or an infinite number. <br />
::#If the finite limit of tasks is reached, the crew will not respond to any additional request until the number of tasks the crew is performing is less than its finite limit.<br />
::#If a crew is not available to respond, the component will "wait" until a crew becomes available.<br />
::#BlockSim maintains the queue of rejected calls and will dispatch the crew to the next repair on a "first come, first served" basis.<br />
::#Multiple crews can be assigned to a single block (see overview in the next section).<br />
::#If no crew has been assigned for a block, it is assumed that no crew restrictions exist and a default crew is used. The default crew can perform an infinite number of simultaneous tasks and has no delays or costs.<br />
<br />
====Looking at Multiple Crews====<br />
Multiple crews may be available to perform maintenance for a particular component. When multiple crews have been assigned to a block in BlockSim, the crews are assigned to perform maintenance based on their order in the crew list, as shown in the figure below.<br />
<br />
[[Image:r23.png|thumb|center|500px|A single component with two corrective maintenance crews assigned to it.|link=]]<br />
<br />
In the case where more than one crew is assigned to a block, and if the first crew is unavailable, then the next crew is called upon and so forth. As an example, consider the prior case but with the following modifications (i.e., Crews <math>A\,\!</math> and <math>B\,\!</math> are assigned to all blocks):<br />
<br />
[[Image:r8.png|center|400px|link=]]<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
! Unit<br />
! Failure<br />
! Repair<br />
! Crew<br />
|-<br />
| <math>A\,\!</math><br />
| <math>100\,\!</math> <br />
| <math>10\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>B\,\!</math><br />
| <math>120\,\!</math> <br />
| <math>20\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>C\,\!</math><br />
| <math>140\,\!</math><br />
| <math>20\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>D\,\!</math><br />
| <math>160\,\!</math><br />
| <math>10\,\!</math><br />
| <math>A,B\,\!</math><br />
|}<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
| Crew <math>A\,\!</math> ; Delay = 20, Single Task<br />
|-<br />
| Crew <math>B\,\!</math> ; Delay = 30, Single Task<br />
|}<br />
<br />
<br />
The system would behave as shown in the figure below.<br />
<br />
[[Image:r13.png|center|550px|link=]]<br />
<br />
In this case, Crew <math>B\,\!</math> was used for the <math>C\,\!</math> repair since Crew <math>A\,\!</math> was busy. On all others, Crew <math>A\,\!</math> was used. It is very important to note that once a crew has been assigned to a task it will complete the task. For example, if we were to change the delay time for Crew <math>B\,\!</math> to 100, the system behavior would be as shown in the figure below.<br />
<br />
[[Image:r14.png|center|550px|System up/down plot with the delay time for Crew B changed to 100.|link=]]<br />
<br />
In other words, even though Crew <math>A\,\!</math> would have finished the repair on <math>C\,\!</math> more quickly if it had been available when originally called, <math>B\,\!</math> was assigned the task because <math>A\,\!</math> was not available at the instant that the crew was needed.<br />
<br />
===Additional Rules on Crews===<br />
<br />
::1. If all assigned crews are engaged, the next crew that will be chosen is the crew that can get there first. <br />
:::a) This accounts for the time it would take a particular crew to complete its current task (or all tasks in its queue) and its logistic time.<br />
::2. If a crew is available, it gets used regardless of what its logistic delay time is. <br />
:::a) In other words, if a crew with a shorter logistic time is busy, but almost done, and another crew with a much higher logistic time is currently free, the free one will get assigned to the task.<br />
::3. For each simulation each crew's logistic time is computed (taken randomly from its distribution or its fixed time) at the beginning of the simulation and remains constant across that one simulation for all actions (CM, PM and inspection).<br />
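Rules 1 and 2 above amount to a simple selection policy. In this hypothetical sketch (not BlockSim code), each crew is a (busy_until, logistic_delay) pair listed in assignment order:<br />

```python
def pick_crew(crews, now):
    """crews: list of (busy_until, logistic_delay) in assignment order.
    Rule 2: a free crew is used regardless of its logistic delay (earliest
    in the list wins). Rule 1: if all are busy, pick the crew that can
    arrive first, counting remaining work plus its logistic time."""
    for i, (busy_until, _delay) in enumerate(crews):
        if busy_until <= now:        # rule 2: any free crew gets the job
            return i
    return min(range(len(crews)),    # rule 1: earliest possible arrival
               key=lambda i: max(crews[i][0], now) + crews[i][1])

# Crew 0 is busy until 105 with delay 2; crew 1 is free but has delay 100.
print(pick_crew([(105, 2), (0, 100)], now=100))  # -> 1 (free crew is used)
```

Note how the free crew is chosen even though the busy crew would actually have finished the job sooner, exactly as described in rule 2.<br />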
<br />
===Using Spare Part Pools===<br />
<br />
BlockSim also allows you to specify spare part pools (or depots). Spare part pools allow you to model and manage spare part inventory and study the effects associated with limited inventories. Each component can have a spare part pool associated with it. If a spare part pool has not been defined for a block, BlockSim's analysis assumes a default pool of infinite spare parts. To speed up the simulation, no details on pool actions are kept during the simulation if the default pool is used.<br />
<br />
Pools allow you to define multiple aspects of the spare part process, including stock levels, logistic delays and restock options. Every time a part is to be repaired under a CM or scheduled action (PM, OC and inspection), a spare part is requested from the pool. If a part is available in the pool, it is then used for the repair. Spare part pools perform their actions based on the simulation clock time. <br />
<br />
====Spare Properties====<br />
<br />
A spare part pool is identified by a name. The general properties of the pool are its stock level (must be greater than zero), cost properties and logistic delay time. If a part is available (in stock), the pool will dispense that part to the requesting block after the specified logistic time has elapsed. One needs to think of a pool as an independent entity. It accepts requests for parts from blocks and dispenses them to the requesting blocks after a given logistic time. Requests for spares are handled on a first come, first served basis. In other words, if two blocks request a part and only one part is in stock, the first block that made the request will receive the part. Blocks request parts from the pool immediately upon the initiation of a CM or scheduled event (PM, OC and Inspection).<br />
<br />
====Restocking the Pool====<br />
<br />
If the pool has a finite number of spares, restock actions may be incorporated. The figure below shows the restock properties. Specifically, a pool can restock itself either through a scheduled restock action or based on specified conditions.<br />
<br />
[[Image:BS8.24.png|center|500px|link=]]<br />
<br />
A scheduled restock action adds a set number of parts to the pool on a predefined scheduled part arrival time. For the settings in the figure above, one spare part would be added to the pool every 100 hours, based on the system (simulation) time. In other words, for a simulation of 1,000 hours, a spare part would arrive at 100 hours, 200 hours, etc. The part is available to the pool immediately after the restock action and without any logistic delays. <br />
<br />
In an on-condition restock, a restock action is initiated when the stock level reaches (or is below) a specified value. In the figure above, five parts are ordered when the stock level reaches 0. Note that unlike the scheduled restock, parts added through an on-condition restock become available after a specified logistic delay time. In other words, in a scheduled restock the parts are pre-ordered and arrive when needed, whereas in an on-condition restock the parts are ordered when the condition occurs and thus arrive after a specified time. For on-condition restocks, the condition is triggered if and only if the stock level drops to or below the specified stock level, regardless of how the spares arrived to the pool or were distributed by the pool. In addition, the restock trigger value must be less than the initial stock.<br />
<br />
Lastly, a maximum capacity can be assigned to the pool; this maximum capacity must be equal to or greater than the initial stock. When this limit is reached, no more items are added to the pool. For example, if the pool has a maximum capacity of ten and a current stock level of eight, and if a restock action is set to add five items to the pool, then only two will be accepted.<br />
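The maximum-capacity behavior described above amounts to clamping each restock quantity (a minimal sketch; the function name is hypothetical):<br />

```python
def restock(stock_level, order_quantity, max_capacity):
    """Add at most `order_quantity` parts, never exceeding `max_capacity`.
    Returns the new stock level and the number of parts actually accepted."""
    accepted = min(order_quantity, max_capacity - stock_level)
    return stock_level + accepted, accepted

# Pool at 8 of 10 capacity; a restock of 5 only gets 2 accepted:
print(restock(8, 5, 10))  # -> (10, 2)
```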
<br />
====Obtaining Emergency Spares====<br />
<br />
Emergency restock actions can also be defined. The figure below illustrates BlockSim's Emergency Spare Provisions options. An emergency action is triggered only when a block requests a spare and the part is not currently in stock. This is the only trigger condition. It does not account for whether a part has been ordered or if one is scheduled to arrive. Emergency spares are ordered when the condition is triggered and arrive after a time equal to the required time to obtain emergency spare(s).<br />
<br />
[[Image:BS8.25.png|center|500px|link=]]<br />
<br />
===Summary of Rules for Spare Part Pools===<br />
<br />
The following rules summarize some of the logic when dealing with spare part pools. <br />
<br />
====Basic Logic Rules====<br />
<br />
::1. '''Queue Based''': Requests for spare parts from blocks are queued and executed on a "first come, first served" basis.<br />
::2. '''Emergency''': Emergency restock actions are performed only when a part is not available.<br />
::3. '''Scheduled Restocks''': Scheduled restocks are added instantaneously to the pool at the scheduled time.<br />
::4. '''On-Condition Restock''': On-condition restock happens when the specified condition is reached (e.g., when the stock drops to two or if a request is received for a part and the stock is below the restock level).<br />
:::a) For example, if a pool has three items in stock and it dispenses one, an on-condition restock is initiated the instant that the request is received (without regard to the logistic delay time). The restocked items will be available after the required time for stock arrival has elapsed.<br />
:::b) The way that this is defined allows for the possibility of multiple restocks. Specifically, every time a part needs to be dispensed and the stock is lower than the specified quantity, parts are ordered. In the case of a long logistic delay time, it is possible to have multiple re-orders in the queue.<br />
::5. '''Parts Become Available after Spare Acquisition Logistic Delay''': If there is a spare acquisition logistic time delay, the requesting block will get the part after that delay. <br />
:::a) For example, if a block with a repair duration of 10 fails at 100 and requests a part from a pool with a logistic delay time of 10, that block will not be up until 120.<br />
::6. '''Compound Delays''': If a part is not available and an emergency part (or another part) can be obtained, then the total wait time for the part is the sum of both the logistic time and the required time to obtain a spare.<br />
::7. '''First Available Part is Dispensed to the First Block in the Queue''': The pool will dispense a requested part if it has one in stock or when it becomes available, regardless of what action (i.e., as needed restock or emergency restock) that request may have initiated. <br />
:::a) For example, if Block A requests a part from a pool and that triggers an emergency restock action, but a part arrives before the emergency restock through another action (e.g., scheduled restock), then the pool will dispense the newly arrived part to Block A (if Block A is next in the queue to receive a part).<br />
::8. '''Blocks that Trigger an Action Get Charged with the Action''': A block that triggers an emergency restock is charged for the additional cost to obtain the emergency part, even if it does not use an emergency part (i.e., even if another part becomes available first).<br />
::9. '''Triggered Action Cannot be Canceled.''' If a block triggers a restock action but then receives a part from another source, the action that the block triggered is not canceled.<br />
:::a) For example, if Block A initiates an emergency restock action but was then able to use a part that became available through other actions, the emergency request is not canceled and an emergency spare part will be added to the pool's stock level. <br />
:::b) Another way to explain this is by looking at the part acquisition logistic times as transit times. Because an ordered part is en-route to you after you order it, you will receive it regardless of whether the conditions have changed and you no longer need it.<br />
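Rules 5 and 6 combine into a simple timing calculation for when a failed block comes back up (a sketch for illustration only; it does not account for crews or queue position):<br />

```python
def block_up_time(fail_time, repair_duration, pool_logistic,
                  emergency_time=0.0, in_stock=True):
    """Time at which a failed block is operational again.
    Rule 5: if the part is in stock, only the pool's logistic delay applies.
    Rule 6: otherwise the emergency acquisition time is added on top."""
    part_wait = pool_logistic + (0.0 if in_stock else emergency_time)
    return fail_time + part_wait + repair_duration

# Rule 5 example from the text: fails at 100, repair 10, logistic delay 10:
print(block_up_time(100, 10, 10))  # -> 120.0
```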
<br />
===Simultaneous Dispatch of Crews and Parts Logic===<br />
<br />
Some special rules apply when a block has both logistic delays in acquiring parts from a pool and when waiting for crews. BlockSim dispatches requests for crews and spare parts simultaneously. The repair action does not start until both crew and part arrive, as shown next.<br />
<br />
[[Image:r18.png|center|400px|link=]]<br />
<br />
If a crew arrives and it has to wait for a part, then this time (and cost) is added to the crew usage time.<br />
<br />
===Example Using Both Crews and Pools===<br />
<br />
Consider the following example, using both crews and pools.<br />
<br />
[[Image:r19.png|center|300px|link=]]<br />
<br />
where:<br />
<br />
[[Image:r20.png|center|400px|link=]]<br />
<br />
And the crews are:<br />
<br />
[[Image:r21.png|center|400px|link=]]<br />
<br />
While the spare pool is: <br />
<br />
[[Image:r22.png|center|500px|link=]]<br />
<br />
The behavior of this system from 0 to 300 is shown graphically in the figure below.<br />
<br />
[[Image:8.26.png|center|600px|link=]]<br />
<br />
The discrete system events during that time are as follows:<br />
<br />
::1. Component <math>A\,\!</math> fails at 100 and Crew <math>A\,\!</math> is engaged. <br />
<br />
:::a) At 110, Crew <math>A\,\!</math> arrives and completes the repair by 120. <br />
:::b) This repair uses the only spare part in inventory and triggers an on-condition restock. A part is ordered and is scheduled to arrive at 160.<br />
:::c) A scheduled restock part is also set to arrive at 150.<br />
:::d) Pool [on-hand = 0, pending: 150, 160].<br />
::2. Component <math>B\,\!</math> fails at 121. Crew <math>A\,\!</math> is available and it is engaged. <br />
:::a) Crew <math>A\,\!</math> arrives by 131 but no part is available. <br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 181.<br />
:::c) Pool [on-hand = 0, pending: 150, 160, 181].<br />
:::d) At 150, the first part arrives and is used by Component <math>B\,\!</math>.<br />
:::e) Repair on Component <math>B\,\!</math> is completed 20 time units later, at 170.<br />
:::f) Pool [on-hand=0, pending: 160, 181].<br />
::3. Component <math>C\,\!</math> fails at 122. Crew <math>A\,\!</math> is already engaged by Component <math>B\,\!</math>, thus Crew <math>B\,\!</math> is engaged. <br />
:::a) Crew <math>B\,\!</math> arrives at 137 but no part is available.<br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 182.<br />
:::c) Pool [on-hand = 0, pending: 160, 181,182].<br />
:::d) At 160, the part arrives and Component <math>C\,\!</math> is repaired by 180. <br />
:::e) Pool [on-hand = 0, pending: 181,182].<br />
::4. Component <math>F\,\!</math> fails at 123. No crews are available until 170 when Crew <math>A\,\!</math> becomes available.<br />
:::a) Crew <math>A\,\!</math> arrives by 180 and has to wait for a part.<br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 183.<br />
:::c) Pool [on-hand = 0, pending: 181,182, 183].<br />
:::d) At 181, a part is obtained.<br />
:::e) By 201, the repair is completed.<br />
:::f) Pool [on-hand = 0, pending: 182, 183]<br />
::5. Component <math>D\,\!</math> fails at 171 with no crew available. <br />
:::a) Crew <math>B\,\!</math> becomes available at 180 and arrives by 195. <br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 231.<br />
:::c) The next part becomes available at 182 and the repair is completed by 205.<br />
:::d) Pool [on-hand = 0, pending: 183, 231]<br />
::6. End time is at 300. The last scheduled part arrives at the pool at 300.<br />
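The pool bookkeeping in the walkthrough above can be expressed as a small event-driven sketch. The class and method names below are hypothetical (this is not BlockSim's API); the behavior modeled is the one described: a request that empties the pool, or finds it empty, triggers an on-condition restock order, and a waiting block consumes the earliest part on order.

```python
import heapq

class SparePool:
    """Minimal sketch of the spare part pool logic described above
    (illustrative only; not BlockSim's API)."""

    def __init__(self, on_hand, restock_delay):
        self.on_hand = on_hand              # parts currently in stock
        self.restock_delay = restock_delay  # lead time of an on-condition order
        self.pending = []                   # min-heap of arrival times of parts on order

    def schedule_restock(self, arrival_time):
        """Register a restock that was planned in advance."""
        heapq.heappush(self.pending, arrival_time)

    def request(self, now):
        """A block asks for a part at time `now`; return the time a
        part is actually available to start the repair."""
        # Move any orders that have already arrived into stock.
        while self.pending and self.pending[0] <= now:
            heapq.heappop(self.pending)
            self.on_hand += 1
        if self.on_hand > 0:
            self.on_hand -= 1
            if self.on_hand == 0:
                # Using the last part triggers an on-condition restock.
                heapq.heappush(self.pending, now + self.restock_delay)
            return now
        # Empty pool: order on condition, then wait for the earliest arrival.
        heapq.heappush(self.pending, now + self.restock_delay)
        return heapq.heappop(self.pending)
```

With one part on hand, a 60 <math>tu\,\!</math> order lead time and the scheduled restock at 150, this sketch reproduces the part availability times in steps 1 through 5 (100, 150, 160, 181 and 182).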
<br />
=Using Maintenance Tasks=<br />
One of the most important benefits of simulation is the ability to define how and when actions are performed. In our case, the actions of interest are part repairs/replacements. This is accomplished in BlockSim through the use of maintenance tasks. Specifically, four different types of tasks can be defined for maintenance actions: corrective maintenance, preventive maintenance, on condition maintenance and inspection.<br />
<br><br />
<br />
===Corrective Maintenance Tasks===<br />
A corrective maintenance task defines when a corrective maintenance (CM) action is performed. The figure below shows a corrective maintenance task assigned to a block in BlockSim. Corrective actions will be performed either immediately upon failure of the item or upon finding that the item has failed (for hidden failures that are not detected until an inspection). BlockSim allows the selection of either category. <br />
*'''Upon item failure''': The CM action is initiated immediately upon failure. If the user doesn't specify the choice for a CM, then this is the default option. All prior examples were based on the instruction to perform a CM upon failure. <br />
*'''When found failed during an Inspection''': The CM action will only be initiated after an inspection is done on the failed component. How and when the inspections are performed is defined by the block's inspection properties. This has the effect of defining a dependency between the corrective maintenance task and the inspection task.<br />
<br />
<br />
[[Image:r23.png|center|500px|link=]]<br />
<br />
<div class="noprint"><br />
{{Examples Box|BlockSim Examples|<p>More application examples are available! See also:</p> {{Examples Link|BlockSim_Example:_CM_Triggered_by_Subsystem_Down|CM Triggered by Subsystem Down}}}}<br />
</div><br />
<br />
===Scheduled Tasks===<br />
Scheduled tasks can be performed on a known schedule, which can be based on any of the following:<br />
* A time interval, either fixed or dynamic, based on the item's age (item clock) or on calendar time (system clock). See [[#Item and System Ages|Item and System Ages]].<br />
* The occurrence of certain events, including:<br />
**The system goes down. <br />
**Certain events happen in a maintenance group. The events and groups are user-specified, and the item that the task is assigned to does not need to be part of the selected maintenance group(s).<br />
<br />
The types of scheduled tasks include:<br />
*Inspection tasks<br />
*Preventive maintenance tasks<br />
*On condition tasks<br />
<br />
====Item and System Ages====<br />
It is important to keep in mind that the system and each component of the system maintain separate clocks within the simulation. When setting intervals to perform a scheduled task, the intervals can be based on either type of clock. Specifically:<br />
*Item age refers to the accumulated age of the block, which gets adjusted each time the block is repaired (i.e., restored). If the block is repaired at least once during the simulation, this will be different from the elapsed simulation time. For example, if the restoration factor is 1 (i.e., “as good as new”) and the assigned interval is 100 days based on item age, then the task will be scheduled to be performed for the first time at 100 days of elapsed simulation time. However, if the block fails at 85 days and it takes 5 days to complete the repair, then the block will be fully restored at 90 days and its accumulated age will be reset to 0 at that point. Therefore, if another failure does not occur in the meantime, the task will be performed for the first time 100 days later at 190 days of elapsed simulation time.<br />
<br />
[[Image:Updown_item_age.png|center|450px|link=]]<br />
<br />
*Calendar time refers to the elapsed simulation time. If the assigned interval is 100 days based on calendar time, then the task will be performed for the first time at 100 days of elapsed simulation time, for the second time at 200 days of elapsed simulation time and so on, regardless of whether the block fails and gets repaired correctively between those times.<br />
<br />
[[Image:Updown_system_age.png|center|450px|link=]]<br />
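The difference between the two clocks can be sketched with a small helper (hypothetical function name; a simplification that assumes the block operates continuously from the current simulation time onward):

```python
def next_task_time(interval, clock, sim_time, item_age):
    """Next time a scheduled task fires (illustrative sketch).

    clock='calendar': the task fires at fixed multiples of `interval`
    in elapsed simulation time, regardless of failures and repairs.
    clock='item': the task fires when the block's accumulated age
    reaches `interval`, assuming uninterrupted operation from now on."""
    if clock == 'calendar':
        return sim_time + interval - sim_time % interval
    return sim_time + (interval - item_age)
```

Using the example above, a block fully restored at 90 days (age reset to 0) next sees its item-age task at 190 days, while a calendar-based task would still fire at 100 days.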
<br />
====Inspection Tasks====<br />
Like all scheduled tasks, inspections can be performed based on a time interval or upon certain events. Inspections can be specified to bring the item or system down or not.<br />
<br />
====Preventive Maintenance Tasks====<br />
The figure below shows the options available in a preventive maintenance (PM) task within BlockSim. PMs can be performed based on a time interval or upon certain events. Because PM tasks always bring the item down, one can also specify whether preventive maintenance will be performed if the task brings the system down.<br />
<br />
[[Image:r25.png|center|556px|link=]]<br />
<br />
====On Condition Tasks====<br />
On condition maintenance relies on the capability to detect failures before they happen so that preventive maintenance can be initiated. If, during an inspection, maintenance personnel find evidence that the equipment is approaching the end of its life, then it may be possible to delay the failure, prevent it from happening or replace the equipment at the earliest convenience rather than allowing the failure to occur and possibly cause severe consequences. In BlockSim, on condition tasks consist of an inspection task that triggers a preventive task when an impending failure is detected during inspection. <br />
=====Failure Detection=====<br />
Inspection tasks can be used to check for indications of an approaching failure. BlockSim models when an approaching failure becomes detectable upon inspection using either a failure detection threshold or a P-F interval. The failure detection threshold is a number between 0 and 1 indicating the fraction of an item's life that must elapse before an approaching failure can be detected. For instance, if the failure detection threshold is set to 0.8, then the failure of a component can be detected only during the last 20% of its life. If an inspection occurs during this time, the approaching failure is detected and the inspection triggers a preventive maintenance task to delay or prevent the failure by either repairing or replacing the component.<br />
<br />
The P-F interval is the amount of time before the failure of a component during which the approaching failure can be detected by an inspection. It represents the warning period that spans from P (when a potential failure becomes detectable) to F (when the failure occurs). If the P-F interval is set to 200 hours, then the approaching failure of the component can be detected only within the 200 hours preceding the failure. Thus, if a component has a fixed life of 1,000 hours and the P-F interval is set to 200 hours, then any inspection that occurs at or beyond 800 hours detects the approaching failure at 1,000 hours and triggers a preventive maintenance task to take action against this failure.<br />
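Both detection criteria reduce to a window check on the component's accumulated age at the moment of inspection. A minimal sketch (hypothetical function name; this is not BlockSim's API):

```python
def failure_detected(inspection_age, life, threshold=None, pf_interval=None):
    """Does an inspection at accumulated age `inspection_age` detect an
    approaching failure due at age `life`? The detectable window is
    defined either by a failure detection threshold (fraction of life)
    or by a P-F interval (time before failure). Illustrative sketch."""
    if threshold is not None:
        return inspection_age >= threshold * life
    if pf_interval is not None:
        return inspection_age >= life - pf_interval
    return False
```

For a fixed life of 1,000 hours, a threshold of 0.8 and a P-F interval of 200 hours both make the failure detectable at or beyond an age of 800 hours.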
<br />
=====Rules for On Condition Tasks=====<br />
<br />
*An inspection that finds a block at or beyond the failure detection threshold or within the range of the P-F interval will trigger the associated preventive task as long as preventive maintenance can be performed on that block.<br />
<br />
*If a non-downing inspection triggers a preventive maintenance action because the failure detection threshold or P-F interval range was reached, no other maintenance task will be performed between the inspection and the triggered preventive task; tasks that would otherwise have happened at that time due to system age, system down or group maintenance will be ignored.<br />
<br />
*A preventive task that would have been triggered by a non-downing inspection will not happen if the block fails during the inspection, as corrective maintenance will take place instead.<br />
<br />
*If a failure will occur within the failure detection threshold or P-F interval set for the inspection, but the preventive task is only supposed to be performed when the system is down, the simulation waits until the requirements of the preventive task are met to perform the preventive maintenance.<br />
<br />
*If the on condition inspection triggers the preventive maintenance part of the task, the simulation assumes that the maintenance crew will forego any routine servicing associated with the inspection part of the task. In other words, the restoration will come from the preventive maintenance, so any restoration factor defined for the inspection will be ignored in these circumstances.<br />
<br />
=====Example Using P-F Interval=====<br />
<br />
To illustrate the use of the P-F interval in BlockSim, consider a component <math>A\,\!</math> that fails every 700 <math>tu\,\!</math>. The corrective maintenance on this equipment takes 100 <math>tu\,\!</math> to complete, while the preventive maintenance takes 50 <math>tu\,\!</math> to complete. Both the corrective and preventive maintenance actions have a type II restoration factor of 1. Inspection tasks of 10 <math>tu\,\!</math> duration are performed on the component every 300 <math>tu\,\!</math>. There is no restoration of the component during the inspections. The P-F interval for this component is 100 <math>tu\,\!</math>.<br />
<br />
The component behavior from 0 to 2000 <math>tu\,\!</math> is shown in the figure below and described next.<br />
<br />
::#At 300 <math>tu\,\!</math> the first scheduled inspection of 10 <math>tu\,\!</math> duration occurs. At this time the age of the component is 300 <math>tu\,\!</math>. This inspection does not lie in the P-F interval of 100 <math>tu\,\!</math> (which begins at the age of 600 <math>tu\,\!</math> and ends at the age of 700 <math>tu\,\!</math>). Thus, no approaching failure is detected during this inspection.<br />
::#At 600 <math>tu\,\!</math> the second scheduled inspection of 10 <math>tu\,\!</math> duration occurs. At this time the age of the component is 590 <math>tu\,\!</math> (no age is accumulated during the first inspection from 300 <math>tu\,\!</math> to 310 <math>tu\,\!</math>, as the component does not operate during this inspection). Again, this inspection does not lie in the P-F interval. Thus, no approaching failure is detected during this inspection.<br />
::#At 720 <math>tu\,\!</math> the component fails after having accumulated an age of 700 <math>tu\,\!</math>. A corrective maintenance task of 100 <math>tu\,\!</math> duration occurs to restore the component to as-good-as-new condition.<br />
::#At 900 <math>tu\,\!</math> the third scheduled inspection occurs. At this time the age of the component is 80 <math>tu\,\!</math>. This inspection does not lie in the P-F interval (from age 600 <math>tu\,\!</math> to 700 <math>tu\,\!</math>). Thus, no approaching failure is detected during this inspection.<br />
::#At 1200 <math>tu\,\!</math> the fourth scheduled inspection occurs. At this time the age of the component is 370 <math>tu\,\!</math>. Again, this inspection does not lie in the P-F interval and no approaching failure is detected.<br />
::#At 1500 <math>tu\,\!</math> the fifth scheduled inspection occurs. At this time the age of the component is 660 <math>tu\,\!</math>, which lies in the P-F interval. As a result, an approaching failure is detected and the inspection triggers a preventive maintenance task. A preventive maintenance task of 50 <math>tu\,\!</math> duration occurs at 1510 <math>tu\,\!</math> to restore the component to as-good-as-new condition.<br />
::#At 1800 <math>tu\,\!</math> the sixth scheduled inspection occurs. At this time the age of the component is 240 <math>tu\,\!</math>. This inspection does not lie in the P-F interval (from age 600 <math>tu\,\!</math> to 700 <math>tu\,\!</math>) and no approaching failure is detected.<br />
<br />
[[Image:BS8.32.png|center|600px|link=]]<br />
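The detection logic in the timeline above reduces to checking whether the component's age at each inspection falls within the window that runs from (life − P-F interval) up to the failure age. A quick check over the six inspections (variable names are assumed for illustration):

```python
life, pf = 700, 100   # fixed life and P-F interval from the example

# (inspection time, component age at that inspection) from the timeline
inspections = [(300, 300), (600, 590), (900, 80),
               (1200, 370), (1500, 660), (1800, 240)]

# An inspection detects the approaching failure if the age lies in
# the detectable window [life - pf, life).
detected = [t for t, age in inspections if life - pf <= age < life]
```

Only the fifth inspection, at 1500 <math>tu\,\!</math> with an age of 660 <math>tu\,\!</math>, lands in the window and triggers the preventive task.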
<br />
====Rules for PMs and Inspections====<br />
<br />
All the options available in the Maintenance task window were designed to maximize the modeling flexibility within BlockSim. However, this flexibility introduces issues that you need to be aware of, and requires you to select options carefully in order to ensure that the selections do not contradict one another. One obvious case would be to define a PM action on a component in series (which will always bring the system down) and then assign a PM policy to the block with the "Do not perform maintenance if the action brings the system down" option set. With these settings, no PMs will ever be performed on the component during the BlockSim simulation. The following sections summarize some issues and special cases to consider when defining maintenance properties in BlockSim.<br />
<br />
::#Inspections do not consume spare parts. However, an inspection can have a renewal effect on the component if the restoration factor is set to a number other than the default of 0.<br />
::#On the inspection tab, if Inspection brings system down is selected, this also implies that the inspection brings the item down.<br />
::#If a PM or an inspection is scheduled based on the item's age, then it will occur exactly when the item reaches that age. However, it is important to note that failed items do not age. Thus, if an item fails before it reaches that age, the action will not be performed. This means that if the item fails before the scheduled inspection (based on item age) and the CM is set to be performed upon inspection, the CM will never take place. This option is allowed in BlockSim for the flexibility of specifying renewing inspections.<br />
::#Downtime due to a failure discovered during a non-downing inspection is included when computing results "w/o PM, OC & Inspections."<br />
::#If a PM upon item age is scheduled and is not performed because it brings the system down (based on the option in the PM task) the PM will not happen unless the item reaches that age again (after restoration by CM, inspection or another type of PM).<br />
::#If the CM task is upon inspection and a failed component is scheduled for PM prior to the inspection, the PM action will restore the component and the CM will not take place.<br />
::#In the case of simultaneous events, only one event is executed (except during a maintenance phase, in which all simultaneous events are executed in order). The following precedence order is used: 1) tasks based on intervals or upon the start of a maintenance phase; 2) tasks based on events in a maintenance group, where the triggering event applies to a block; 3) tasks based on system down; 4) tasks based on events in a maintenance group, where the triggering event applies to a subdiagram. Within these categories, order is determined according to the priorities specified in the URD (i.e., the higher the task is on the list, the higher its priority).<br />
::#The PM option of Do not perform if it brings the system down is only considered at the time that the PM needs to be initiated. If the system is down at that time, due to another item, then the PM will be performed regardless of any future consequences to the system up state. In other words, when the other item is fixed, it is possible that the system will remain down due to this PM action. In this case, the PM time difference is added to the system PM downtime. <br />
::#Downing events cannot overlap. If a component is down due to a PM and another PM is suggested based on another trigger, the second call is ignored.<br />
::#A non-downing inspection with a restoration factor restores the block based on the age of the block at the beginning of the inspection (i.e., duration is not restored). <br />
::#Non-downing events can overlap with downing events. If a non-downing inspection and a downing event happen concurrently, the non-downing event will be managed in parallel with the downing event.<br />
::#If a failure or PM occurs during a non-downing inspection and the CM or PM has a restoration factor and the inspection action has a restoration factor, then both restoration factors are used (compounded).<br />
::#A PM or inspection on system down is triggered only if the system was up at the time that the event brought the system down.<br />
::#A non-downing inspection with restoration factor of 0 does not affect the block.<br />
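The compounding behavior of rule 12 can be sketched under an assumed type II (age-reduction) restoration model, in which a restoration factor RF rewinds the block's accumulated age to age × (1 − RF). Treat the formula as illustrative; the intent is only to show what "both restoration factors are used (compounded)" means:

```python
def restore(age, rf):
    """Type II (age-reduction) restoration: rf=1 is as good as new,
    rf=0 has no renewal effect (rules 1 and 13). Assumed model."""
    return age * (1.0 - rf)

def restore_compounded(age, rf_maintenance, rf_inspection):
    """Rule 12: a CM/PM overlapping a non-downing inspection applies
    both restoration factors, one after the other."""
    return restore(restore(age, rf_maintenance), rf_inspection)
```

For example, a block aged 500 undergoing a CM with RF = 0.5 during a non-downing inspection that also has RF = 0.5 would end up with an age of 125 rather than 250.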
<br />
===Example===<br />
<br />
To illustrate the use of maintenance tasks in BlockSim, we will use the same example from [[Repairable_Systems_Analysis_Through_Simulation#Example_Using_Both_Crews_and_Pools|Example Using Both Crews and Pools]] with the following modifications (the figures below also show these settings): <br />
<br />
Blocks A and D: <br />
#Belong to the same group (Group 1).<br />
#Corrective maintenance actions are upon inspection (not upon failure) and the inspections are performed every 30 hours, based on system time. Inspections have a duration of 1 hour. Furthermore, unlimited free crews are available to perform the inspections.<br />
#Whenever either item gets a CM, the other one gets a PM.<br />
#The PM has a fixed duration of 10 hours.<br />
#The same crews are used for both corrective and preventive maintenance actions.<br />
<br />
[[Image:r29.png|center|650px| CM and Inspection settings for blocks A and D | link= ]]<br />
<br />
<br />
[[Image:r29b.png|center|650px| CM and Inspection settings for blocks A and D | link= ]]<br />
<br />
<br />
[[Image:r30.png|center|650px| PM settings for blocks A and D | link= ]]<br />
<br />
====System Overview====<br />
<br />
The item and system behavior from 0 to 300 hours is shown in the figure below and described next. <br />
<br />
[[Image:BS8.35.png|center|600px|link=]]<br />
<br />
::1. At 100, block <math>A\,\!</math> goes down and brings the system down. <br />
:::a) No maintenance action is performed since an upon inspection policy was used.<br />
:::b) The next scheduled inspection is at 120, thus Crew <math>A\,\!</math> is called to perform the maintenance by 121 (end of the inspection).<br />
::2. Crew <math>A\,\!</math> arrives and initiates the repair on <math>A\,\!</math> at 131.<br />
:::a) The only part in the pool is used and an on-condition restock is triggered.<br />
:::b) Pool [on-hand = 0, pending: 150 <math>_{s}\,\!</math>, 181].<br />
:::c) Block <math>A\,\!</math> is repaired by 141.<br />
::3. At the same time (121), a PM is initiated for block <math>D\,\!</math> because the PM task called for "PM upon the start of corrective maintenance on another group item."<br />
:::a) Crew <math>B\,\!</math> is called for block <math>D\,\!</math> and arrives at 136.<br />
:::b) No part is available until 150. An on-condition restock is triggered for 181.<br />
:::c) Pool [on-hand = 0, pending: 150 <math>_{s}\,\!</math>, 181, 181].<br />
:::d) At 150, a part becomes available and the PM is completed by 160.<br />
:::e) Pool [on-hand = 0, pending: 181, 181].<br />
::4. At 161, block <math>B\,\!</math> fails (corrective maintenance upon failure).<br />
:::a) Block <math>B\,\!</math> gets Crew <math>A\,\!</math>, which arrives at 171.<br />
:::b) No part is available until 181. An on-condition restock is triggered for 221.<br />
:::c) Pool [on-hand = 0, pending: 181, 181, 221].<br />
:::d) A part arrives at 181.<br />
:::e) The repair is completed by 201.<br />
:::f) Pool [on-hand = 0, pending: 181, 221].<br />
::5. At 162, block <math>C\,\!</math> fails.<br />
:::a) Block <math>C\,\!</math> gets Crew <math>B\,\!</math>, which arrives at 177.<br />
:::b) No part is available until 181. An on-condition restock is triggered for 222.<br />
:::c) Pool [on-hand = 0, pending: 181, 221, 222].<br />
:::d) A part arrives at 181.<br />
:::e) The repair is completed by 201.<br />
:::f) Pool [on-hand = 0, pending: 221, 222]. <br />
::6. At 163, block <math>F\,\!</math> fails and brings the system down.<br />
:::a) Block <math>F\,\!</math> calls Crew <math>A\,\!</math> then <math>B\,\!</math>. Both are busy.<br />
:::b) Crew <math>A\,\!</math> will be the first available, so block <math>F\,\!</math> waits and calls Crew <math>A\,\!</math> again when it becomes available.<br />
:::c) No part is available until 221. An on-condition restock is triggered for 223.<br />
:::d) Pool [on-hand = 0, pending: 221, 222, 223].<br />
:::e) Crew <math>A\,\!</math> arrives at 211.<br />
:::f) Repair begins at 221.<br />
:::g) Repair is completed by 241.<br />
:::h) Pool [on-hand = 0, pending: 222, 223]. <br />
::7. At 298, block <math>A\,\!</math> goes down and brings the system down.<br />
<br />
====System Uptimes/Downtimes====<br />
::1. Uptime: This is 200 hours. <br />
:::a) This can be obtained by observing the following system up durations: 0 to 100, 160 to 163 and 201 to 298.<br />
::2. CM Downtime: This is 58 hours.<br />
:::a) Observe that even though the system failed at 100, the CM action (on block <math>A\,\!</math> ) was initiated at 121 and lasted until 141, thus only 20 hours of this downtime are attributed to the CM action.<br />
:::b) The next CM action started at 163 when block <math>F\,\!</math> failed and lasted until 201 when blocks <math>B\,\!</math> and <math>C\,\!</math> were restored, thus adding another 38 hours of CM downtime.<br />
::3. Inspection Downtime: This is 1 hour. <br />
:::a) The only time the system was under inspection was from 120 to 121, during the inspection of block <math>A\,\!</math>.<br />
::4. PM Downtime: This is 19 hours. <br />
:::a) Note that the entire PM action duration on block <math>D\,\!</math> was from 121 to 160.<br />
:::b) Until 141, from the system perspective, the CM on block <math>A\,\!</math> was the cause of the downing. Once block <math>A\,\!</math> was restored (at 141), the reason for the system being down became the PM on block <math>D\,\!</math>.<br />
:::c) Thus, the PM on block <math>D\,\!</math> was only responsible for the downtime after block <math>A\,\!</math> was restored, or from 141 to 160.<br />
::5. OC Downtime: This is 0. There is no on condition task in this example. <br />
::6. Total Downtime: This is 100 hours. <br />
:::a) This includes all of the above downtimes plus the 20 hours (100 to 120) and the 2 hours (298 to 300) that the system was down due to the undiscovered failure of block <math>A\,\!</math>.<br />
<br />
[[Image:R32.png|center|600px|link=]]<br />
<br />
====System Metrics====<br />
::1. Mean Availability (All Events): <br />
::<math>\frac{300-100}{300}=0.6667\,\!</math><br />
::2. Mean Availability (w/o PM & Inspection):<br />
:::a) This is due to the CM downtime of 58, the undiscovered downtime of 22 and the inspection downtime of 1, or: <br />
::<math>\frac{300-(58+22+1)}{300}=0.7333\,\!</math><br />
:::b) It should be noted that the inspection downtime was included even though the definition was "w/o PM & Inspection." The reason for this is that the inspection did not cause the downtime in this case. Only downtimes caused by the PM or inspections are excluded. <br />
::3. Point Availability and Reliability at 300 are zero because the system was down at 300.<br />
::4. Expected Number of Failures is 3. <br />
:::a) The system failed at 100, 163 and 298.<br />
::5. The standard deviation of the number of failures is 0.<br />
::6. The MTTFF is 100 because the example is deterministic.<br />
<br />
====The System Downing Events====<br />
::1. Number of Failures is 3.<br />
:::a) The first is the failure of block <math>A\,\!</math>, the second is the failure of block <math>F\,\!</math> and the third is the failure of block <math>A\,\!</math>.<br />
::2. Number of CMs is 2. <br />
:::a) The first is the CM on block <math>A\,\!</math> and the second is the CM on block <math>F\,\!</math>.<br />
::3. Number of Inspections is 1.<br />
::4. Number of PMs is 1.<br />
::5. Total Events are 6. These are events that the downtime can be attributed to. Specifically, the following events were observed:<br />
:::a) The failure of block <math>A\,\!</math> at 100. <br />
:::b) Inspection on block <math>A\,\!</math> at 120.<br />
:::c) The CM action on block <math>A\,\!</math>.<br />
:::d) The PM action on block <math>D\,\!</math> (after <math>A\,\!</math> was fixed).<br />
:::e) The failure of block <math>F\,\!</math> at 163.<br />
:::f) The failure of block <math>A\,\!</math> at 298.<br />
<br />
====Block Details====<br />
The details for blocks <math>A,B,C,D\,\!</math> and <math>F\,\!</math> are shown below.<br />
<br />
[[Image:r33.png|center|600px| Block details for this example.|link=]]<br />
<br />
We will discuss some of these results. First note that there are four downing events on block <math>A\,\!</math> : initial failure, inspection and CM, plus the last failure at 298. All others have just one. Also, block <math>A\,\!</math> had a total downtime of <math>41+2\,\!</math>, giving it a mean availability of 0.8567. The first time-to-failure for block <math>A\,\!</math> occurred at 100 while the second occurred after <math>298-141=157\,\!</math> hours of operation, yielding an average time between failures (MTBF) of <math>257/2=128.5\,\!</math>. (Note that this is the same as uptime/failures.) Block <math>D\,\!</math> never failed, so its MTBF cannot be determined. Furthermore, MTBDE for each item is determined by dividing the block's uptime by the number of events. The RS FCI and RS DECI metrics are obtained by looking at the SD Failures and SD Events of the item and the number of system failures and events. Specifically, the only items that caused system failure are blocks <math>A\,\!</math> and <math>F\,\!</math> ; <math>A\,\!</math> at 100 and 298 and <math>F\,\!</math> at 163. It is important to note that even though one could argue that block <math>F\,\!</math> alone did not cause the failure ( <math>B\,\!</math> and <math>C\,\!</math> were also failed), the downing was attributed to <math>F\,\!</math> because the system reached a failed state only when block <math>F\,\!</math> failed. <br />
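The block <math>A\,\!</math> metrics quoted above follow from simple arithmetic on its down intervals (a sketch; variable names are assumed):

```python
sim_end = 300
# Block A was down from 100 to 141 (failure through repair) and from
# 298 to the end of the simulation (undiscovered failure).
down_A = (141 - 100) + (300 - 298)           # 41 + 2 = 43 hours
mean_avail_A = (sim_end - down_A) / sim_end  # 257/300
mtbf_A = (sim_end - down_A) / 2              # uptime / number of failures
```

This gives the stated mean availability of 0.8567 and MTBF of 128.5 for block <math>A\,\!</math>.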
<br />
Regarding the number of inspections, which were scheduled every 30 hours: nine occurred for block <math>A\,\!</math> [30, 60, 90, 120, 150, 180, 210, 240, 270] and eight for block <math>D\,\!</math>. Block <math>D\,\!</math> was not inspected at 150 because it was undergoing a PM action at that time.<br />
<br />
====Crew Details====<br />
<br />
The figure below shows the crew results.<br />
<br />
[[Image:r34.png|center|400px| Crew details for this example.|link=]]<br />
<br />
Crew <math>A\,\!</math> received a total of six calls and accepted three. Specifically,<br />
<br />
::#At 121, the crew was called by block <math>A\,\!</math> and the call was accepted.<br />
::#At 121, block <math>D\,\!</math> also called for its PM action and was rejected. Block <math>D\,\!</math> then called crew <math>B\,\!</math>, which accepted the call.<br />
::#At 161, block <math>B\,\!</math> called crew <math>A\,\!</math>. Crew <math>A\,\!</math> accepted.<br />
::#At 162, block <math>C\,\!</math> called crew <math>A\,\!</math>. Crew <math>A\,\!</math> rejected and block <math>C\,\!</math> called crew <math>B\,\!</math>, which accepted the call.<br />
::#At 163, block <math>F\,\!</math> called crew <math>A\,\!</math> and then crew <math>B\,\!</math> and both rejected. Block <math>F\,\!</math> then waited until a crew became available at 201 and called that crew again. This was crew <math>A\,\!</math>, which accepted.<br />
<br />
The total wait time is the time that blocks had to wait for the maintenance crew. Block <math>F\,\!</math> is the only component that waited, waiting 38 hours for crew <math>A\,\!</math>.<br />
<br />
Also, the costs for crew <math>A\,\!</math> were 1 per unit time and 10 per incident, thus the total costs were 100 + 30. The costs for Crew <math>B\,\!</math> were 2 per unit time and 20 per incident, thus the total costs were 156 + 40.<br />
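The crew cost structure above can be captured in a one-line helper. The busy times are inferred from the stated charges (100 hours for Crew <math>A\,\!</math> at 1 per unit time; 78 hours for Crew <math>B\,\!</math>, since 156 = 2 × 78); the function name is hypothetical:

```python
def crew_cost(busy_time, incidents, rate_per_time, rate_per_incident):
    """Total crew cost = usage charge plus per-incident charge (sketch)."""
    return busy_time * rate_per_time + incidents * rate_per_incident
```

Crew <math>A\,\!</math> (3 accepted calls) thus costs 130 in total, and Crew <math>B\,\!</math> (2 accepted calls) costs 196.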
<br />
====Pool Details====<br />
The figure below shows the spare part pool results.<br />
<br />
[[Image:r35.png|center|300px| Pool details for this example.|link=]]<br />
<br />
The pool started with a stock level of 1 and ended up with 2. Specifically,<br />
<br />
::#At 121, the pool dispensed a part to block <math>A\,\!</math> and ordered another to arrive at 181.<br />
::#At 121, it dispensed a part to block <math>D\,\!</math> and ordered another to arrive at 181.<br />
::#At 150, a scheduled part arrived to restock the pool.<br />
::#At 161 the pool dispensed a part to block <math>B\,\!</math> and ordered another to arrive at 221.<br />
::#At 181, it dispensed a part to block <math>C\,\!</math> and ordered another to arrive at 222.<br />
::#At 221, it dispensed a part to block <math>F\,\!</math> and ordered another to arrive at 223.<br />
::#The 222 and 223 arrivals remained in stock until the end of the simulation.<br />
<br />
Overall, five parts were dispensed. Blocks had to wait a total of 126 hours to receive parts (B: 181-161=20, C: 181-162=19, D: 150-121=29 and F: 221-163=58).<br />
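The total wait for parts is just the sum of the gaps between each request and the corresponding part arrival (a sketch with assumed variable names):

```python
# block: (time the part was requested, time a part became available)
part_waits = {'B': (161, 181),   # 20 hours
              'C': (162, 181),   # 19 hours
              'D': (121, 150),   # 29 hours
              'F': (163, 221)}   # 58 hours
total_wait = sum(avail - req for req, avail in part_waits.values())
```

Summing the four gaps reproduces the 126 hours quoted above.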
<br />
=Subdiagrams and Multi Blocks in Simulation=<br />
<br />
Any subdiagrams and multi blocks that may be present in the BlockSim RBD are expanded and/or merged into a single diagram before the system is simulated. As an example, consider the system shown in the figure below.<br />
<br />
[[Image:r38.png|center|350px| A system made up of three subsystems, A, B, and C.|link=]]<br />
<br />
BlockSim will internally merge the system into a single diagram before the simulation, as shown in the figure below. This means that all the failure and repair properties of the items in the subdiagrams are also considered.<br />
<br />
[[Image:r39.png|center|350px| The simulation engine view of the system and subdiagrams|link=]]<br />
<br />
In the case of multi blocks, the blocks are also fully expanded before simulation. This means that unlike the analytical solution, the execution speed (and memory requirements) for a multi block representing ten blocks in series is identical to the representation of ten individual blocks in series.<br />
<br />
=Containers in Simulation=<br />
===Standby Containers===<br />
When you simulate a diagram that contains a standby container, the container acts as the switch mechanism (as shown below), in addition to defining the standby relationships and the number of active units that are required. The container's failure and repair properties are really those of the switch itself. The switch can fail with a distribution, either while waiting to switch or during the switch action. Repair properties restore the switch regardless of how the switch failed. Failure of the switch itself does not bring the container down, because the switch is not needed unless called upon to switch. The container will go down if the units within the container fail or if the switch is failed when a switch action is needed. The restoration time for this is based on the repair distributions of the contained units and the switch. Furthermore, the container is down during a switch process that has a delay. <br />
<br />
[[Image:8.43.png|center|500px| The standby container acts as the switch, thus the failure distribution of the container is the failure distribution of the switch. The container can also fail when called upon to switch.|link=]]<br />
<br />
[[Image:8_43_1_new.png|center|150px|link=]]<br />
<br />
To better illustrate this, consider the following deterministic case.<br />
<br />
::#Units <math>A\,\!</math> and <math>B\,\!</math> are contained in a standby container.<br />
::#The standby container is the only item in the diagram, thus failure of the container is the same as failure of the system. <br />
::#<math>A\,\!</math> is the active unit and <math>B\,\!</math> is the standby unit. <br />
::#Unit <math>A\,\!</math> fails every 100 <math>tu\,\!</math> (active) and takes 10 <math>tu\,\!</math> to repair. <br />
::#<math>B\,\!</math> fails every 3 <math>tu\,\!</math> (active) and also takes 10 <math>tu\,\!</math> to repair. <br />
::#The units cannot fail while in quiescent (standby) mode. <br />
::#Furthermore, assume that the container (acting as the switch) fails every 30 <math>tu\,\!</math> while waiting to switch and takes 4 <math>tu\,\!</math> to repair. If not failed, the container switches with 100% probability. <br />
::#The switch action takes 7 <math>tu\,\!</math> to complete.<br />
::#After repair, unit <math>A\,\!</math> is always reactivated. <br />
::#The container does not operate through system failure and thus the components do not either. <br />
<br />
Keep in mind that we are tracking two kinds of events for the container: the container going down and the container switch going down.<br />
<br />
The system event log is shown in the figure below and is as follows:<br />
<br />
[[Image:BS8.44.png|center|600px| The system behavior using a standby container.|link=]]<br />
<br />
::#At 30, the switch fails and gets repaired by 34. The container switch is failed and being repaired; however, the container is up during this time.<br />
::#At 64, the switch fails and gets repaired by 68. The container is up during this time.<br />
::#At 98, the switch fails. It will be repaired by 102.<br />
::#At 100, unit <math>A\,\!</math> fails. Unit <math>A\,\!</math> attempts to activate the switch to go to <math>B\,\!</math> ; however, the switch is failed.<br />
::#At 102, the switch is operational.<br />
::#From 102 to 109, the switch is in the process of switching from unit <math>A\,\!</math> to unit <math>B\,\!</math>. The container and system are down from 100 to 109.<br />
::#By 110, unit <math>A\,\!</math> is fixed and the system is switched back to <math>A\,\!</math> from <math>B\,\!</math>. The return switch action brings the container down for 7 <math>tu\,\!</math>, from 110 to 117. During this time, note that unit <math>B\,\!</math> has only functioned for 1 <math>tu\,\!</math>, 109 to 110.<br />
::#At 146, the switch fails and gets repaired by 150. The container is up during this time.<br />
::#At 180, the switch fails and gets repaired by 184. The container is up during this time.<br />
::#At 214, the switch fails and gets repaired by 218. <br />
::#At 217, unit <math>A\,\!</math> fails. The switch is failed at this time.<br />
::#At 218, the switch is operational and the system is switched to unit <math>B\,\!</math> within 7 <math>tu\,\!</math>. The container is down from 218 to 225.<br />
::#At 225, unit <math>B\,\!</math> takes over. After 2 <math>tu\,\!</math> of operation at 227, unit <math>B\,\!</math> fails. It will be restored by 237. <br />
::#At 227, unit <math>A\,\!</math> is repaired and the switchback action to unit <math>A\,\!</math> is initiated. By 234, the system is up.<br />
::#At 262, the switch fails and gets repaired by 266. The container is up during this time.<br />
::#At 296, the switch fails and gets repaired by 300. The container is up during this time.<br />
<br />
The system results are shown in the figure below and discussed next.<br />
[[Image:BS8.45.png|center|600px| System overview results.|link=]]<br />
<br />
::1. System CM Downtime is 24. <br />
:::a) CM downtime includes all downtime due to failures as well as the delay in switching from a failed active unit to a standby unit. It does not include the switchback time from the standby to the restored active unit. Thus, the times from 100 to 109, 217 to 225 and 227 to 234 are included. The time to switchback, 110 to 117, is not included.<br />
::2. System Total Downtime is 31. <br />
:::a) It includes the CM downtime and the switchback downtime.<br />
::3. Number of System Failures is 3. <br />
:::a) It includes the failures at 100, 217 and 227. <br />
:::b) This is the same as the number of CM downing events. <br />
::4. The Total Downing Events are 4. <br />
:::a) This includes the switchback downing event at 110.<br />
::5. The Mean Availability (w/o PM and Inspection) does not include the downtime due to the switchback event.<br />
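The tallies above follow directly from the event log. The following Python sketch (our own transcription of the downing intervals listed in the log, not BlockSim output) reproduces the reported metrics:<br />

```python
# Downing intervals transcribed from the event log above.
# "cm" = corrective (failure-driven) downtime, including the delay when
# switching from a failed active unit to the standby;
# "switchback" = the return switch from the standby to the restored unit.
events = [
    (100, 109, "cm"),          # unit A failed while the switch was down
    (110, 117, "switchback"),  # return switch from B back to A
    (217, 225, "cm"),          # unit A failed; switch repair plus switch delay
    (227, 234, "cm"),          # unit B failed; switchback to repaired unit A
]

t_end = 300
cm_downtime = sum(end - start for start, end, kind in events if kind == "cm")
total_downtime = sum(end - start for start, end, _ in events)
n_failures = sum(1 for *_, kind in events if kind == "cm")
n_downing_events = len(events)

print(cm_downtime)       # 24, the System CM Downtime
print(total_downtime)    # 31, the System Total Downtime
print(n_failures)        # 3, the Number of System Failures
print(n_downing_events)  # 4, the Total Downing Events
# Mean availability excluding the switchback downtime (per item 5 above):
print((t_end - cm_downtime) / t_end)  # 0.92
```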
<br />
====Additional Rules and Assumptions for Standby Containers====<br />
<br />
::1) A container will only attempt to switch if there is an available non-failed item to switch to. If there is no such item, it will switch if and when an item becomes available. The switch action is canceled if the failed unit is restored before a standby item becomes available. <br />
:::a) As an example, consider the case of unit <math>A\,\!</math> failing active while unit <math>B\,\!</math> failed in a quiescent mode. If unit <math>B\,\!</math> gets restored before unit <math>A\,\!</math>, then the switch will be initiated. If unit <math>A\,\!</math> is restored before unit <math>B\,\!</math>, the switch action will not occur.<br />
::2) In cases where not all active units are required, a switch will only occur if the failed combination causes the container to fail. <br />
:::a) For example, if <math>A\,\!</math>, <math>B\,\!</math>, and <math>C\,\!</math> are in a container for which one unit is required to be operating and <math>A\,\!</math> and <math>B\,\!</math> are active with <math>C\,\!</math> on standby, then the failure of either <math>A\,\!</math> or <math>B\,\!</math> will not cause a switching action. The container will switch to <math>C\,\!</math> only if both <math>A\,\!</math> and <math>B\,\!</math> are failed.<br />
::3) If the container switch is failed and a switching action is required, the switching action will occur after the switch has been restored if it is still required (i.e., if the active unit is still failed).<br />
::4) If a switch fails during the delay time of the switching action based on the reliability distribution (quiescent failure mode), the action is still carried out unless a failure based on the switch probability/restarts occurs when attempting to switch. <br />
::5) During switching events, the change from the operating to quiescent distribution (and vice versa) occurs at the end of the delay time.<br />
::6) The option of whether components operate while the system is down is now defined at the component level. (This is different from BlockSim 7, in which the contained items inherited this option from the container.) Two rules apply here: <br />
:::a) If a path inside the container is down, blocks inside the container that are in that path do not continue to operate.<br />
:::b) Blocks that are up do not continue to operate while the container is down.<br />
::7) A switch can have a repair distribution and maintenance properties without having a reliability distribution. <br />
:::a) This is because maintenance actions are performed regardless of whether the switch failed while waiting to switch (reliability distribution) or during the actual switching process (fixed probability).<br />
::8) A switch fails during switching when the restarts are exhausted.<br />
::9) A restart is executed every time the switch fails to switch (based on its fixed probability of switching).<br />
::10) If a delay is specified, restarts happen after the delay.<br />
::11) If a container brings the system down, the container is responsible for the system going down (not the blocks inside the container).<br />
<br />
===Load Sharing Containers===<br />
<br />
When you simulate a diagram that contains a load sharing container, the container defines the load that is shared. A load sharing container has no failure or repair distributions. The container itself is considered failed if all the blocks inside the container have failed (or <math>k\,\!</math> blocks in a <math>k\,\!</math> -out-of- <math>n\,\!</math> configuration).<br />
<br />
To illustrate this, consider the following container with items <math>A\,\!</math> and <math>B\,\!</math> in a load sharing redundancy.<br />
<br />
Assume that <math>A\,\!</math> fails every 100 <math>tu\,\!</math> and <math>B\,\!</math> every 120 <math>tu\,\!</math> if both items are operating and they fail in half that time if either is operating alone (i.e., the items age twice as fast when operating alone). They both get repaired in 5 <math>tu\,\!</math>.<br />
<br />
[[Image:8.46.png|center|600px| Behavior of a simple load sharing system.|link=]]<br />
<br />
The system event log is shown in the figure above and is as follows:<br />
<br />
::1. At 100, <math>A\,\!</math> fails. It takes 5 <math>tu\,\!</math> to restore <math>A\,\!</math>. <br />
::2. From 100 to 105, <math>B\,\!</math> is operating alone and is experiencing a higher load.<br />
::3. At 115, <math>B\,\!</math> fails. <math>B\,\!</math> would normally be expected to fail at 120; however: <br />
:::a) From 0 to 100, it accumulated the equivalent of 100 <math>tu\,\!</math> of damage.<br />
:::b) From 100 to 105, it accumulated 10 <math>tu\,\!</math> of damage, which is twice the damage since it was operating alone. Put another way, <math>B\,\!</math> aged by 10 <math>tu\,\!</math> over a period of 5 <math>tu\,\!</math>.<br />
:::c) At 105, <math>A\,\!</math> is restored but <math>B\,\!</math> has only 10 <math>tu\,\!</math> of life remaining at this point.<br />
:::d) <math>B\,\!</math> fails at 115.<br />
::4. At 120, <math>B\,\!</math> is repaired.<br />
::5. At 200, <math>A\,\!</math> fails again. <math>A\,\!</math> would normally be expected to fail at 205; however, the failure of <math>B\,\!</math> at 115 to 120 added additional damage to <math>A\,\!</math>. In other words, the age of <math>A\,\!</math> at 115 was 10; by 120 it was 20. Thus it reached an age of 100 (95 <math>tu\,\!</math> after its restoration at 105) at 200. <br />
::6. <math>A\,\!</math> is restored by 205.<br />
::7. At 235, <math>B\,\!</math> fails. <math>B\,\!</math> would normally be expected to fail at 240; however, the failure of <math>A\,\!</math> at 200 caused the reduction.<br />
:::a) At 200, <math>B\,\!</math> had an age of 80.<br />
:::b) By 205, <math>B\,\!</math> had an age of 90.<br />
:::c) <math>B\,\!</math> fails 30 <math>tu\,\!</math> later at 235.<br />
::8. The system itself never failed.<br />
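The accelerated-aging arithmetic in the log can be checked with a few lines of Python. The helper below (`time_to_failure` is our own illustrative function, not a BlockSim API) accrues age over clock-time segments: rate 1 while both units share the load, rate 2 while a unit operates alone:<br />

```python
import math

def time_to_failure(life, segments):
    """Return the clock time at which accumulated age reaches `life`.

    segments: list of (start, end, rate) clock intervals; age accrues at
    `rate` per time unit within each interval. Use math.inf for an
    open-ended final segment.
    """
    age = 0.0
    for start, end, rate in segments:
        if age + rate * (end - start) >= life:
            return start + (life - age) / rate
        age += rate * (end - start)
    raise ValueError("life not reached within the given segments")

# Unit B (life 120): shares load until A fails at 100, carries the full
# load alone (rate 2) while A is repaired from 100 to 105, then shares again.
t_B = time_to_failure(120, [(0, 100, 1), (100, 105, 2), (105, math.inf, 1)])
print(t_B)  # 115.0, matching event 3 in the log

# Unit A's second failure (life 100): renewed at 105, alone while B is
# repaired from 115 to 120, then sharing again.
t_A = time_to_failure(100, [(105, 115, 1), (115, 120, 2), (120, math.inf, 1)])
print(t_A)  # 200.0, matching event 5 in the log
```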
<br />
====Additional Rules and Assumptions for Load Sharing Containers====<br />
<br />
::1. The option of whether components operate while the system is down is now defined at the component level. (This is different from BlockSim 7, in which the contained items inherited this option from the container.) Two rules apply here: <br />
:::a) If a path inside the container is down, blocks inside the container that are in that path do not continue to operate.<br />
:::b) Blocks that are up do not continue to operate while the container is down.<br />
::2. If a container brings the system down, the block that brought the container down is responsible for the system going down. (This is the opposite of standby containers.)<br />
<br />
=State Change Triggers=<br />
{{:State Change Triggers}}<br />
<br />
=Discussion=<br />
<br />
Even though the examples and explanations presented here are deterministic, the sequence of events and logic used to view the system is the same as the one that would be used during simulation. The difference is that the process would be repeated multiple times during simulation and the results presented would be the average results over the multiple runs.<br />
<br />
Additionally, multiple metrics and results are presented and defined in this chapter. Many of these results can also be used to obtain additional metrics not explicitly given in BlockSim's Simulation Results Explorer. As an example, to compute mean availability with inspections but without PMs, the explicit downtimes given for each event could be used. Furthermore, all of the results given are for operating times starting at zero to a specified end time (although the components themselves could have been defined with a non-zero starting age). Results for a starting time other than zero could be obtained by running two simulations and looking at the difference in the detailed results where applicable. As an example, the difference in uptimes and downtimes can be used to determine availabilities for a specific time window.</div>
<hr />
<div>{{Template:bsbook|7}}<br />
{{TU}}<br />
<br />
Having introduced some of the basic theory and terminology for repairable systems in [[Introduction to Repairable Systems]], we will now examine the steps involved in the analysis of such complex systems. We will begin by examining system behavior through a sequence of discrete deterministic events and expand the analysis using discrete event simulation.<br />
<br />
=Simple Repairs=<br />
==Deterministic View, Simple Series==<br />
To first understand how component failures and simple repairs affect the system and to visualize the steps involved, let's begin with a very simple deterministic example with two components, <math>A\,\!</math> and <math>B\,\!</math>, in series.<br />
<br />
[[Image:i8.1.png|center|200px|link=]]<br />
<br />
Component <math>A\,\!</math> fails every 100 hours and component <math>B\,\!</math> fails every 120 hours. Both require 10 hours to get repaired. Furthermore, assume that the surviving component stops operating when the system fails (thus not aging). <br />
'''NOTE''': When a failure occurs in certain systems, some or all of the system's components<br />
may or may not continue to accumulate operating time while the system is down. For example,<br />
consider a transmitter-satellite-receiver system. This is a series system and the probability<br />
of failure for this system is the probability that any of the subsystems fail. If the receiver<br />
fails, the satellite continues to operate even though the receiver is down. In this case, the<br />
continued aging of the components during the system inoperation '''must''' be taken into<br />
consideration, since this will affect their failure characteristics and have an impact on the<br />
overall system downtime and availability.<br />
<br />
The system behavior during an operation from 0 to 300 hours would be as shown in the figure below.<br />
<br />
[[Image:BS8.1.png|center|500px|Overview of system and components for a simple series system with two components. Component A fails every 100 hours and component B fails every 120 hours. Both require 10 hours to get repaired and do not age (i.e., do not operate through failure) when the system is in a failed state.|link=]]<br />
<br />
Specifically, component <math>A\,\!</math> would fail at 100 hours, causing the system to fail. After 10 hours, component <math>A\,\!</math> would be restored and so would the system. The next event would be the failure of component <math>B\,\!</math>. We know that component <math>B\,\!</math> fails every 120 hours (or after an age of 120 hours). Since a component does not age while the system is down, component <math>B\,\!</math> would have reached an age of 120 when the clock reaches 130 hours. Thus, component <math>B\,\!</math> would fail at 130 hours and be repaired by 140 and so forth. Overall in this scenario, the system would be failed for a total of 40 hours due to four downing events (two due to <math>A\,\!</math> and two due to <math>B\,\!</math> ). The overall system availability (average or mean availability) would be <math>260/300=0.86667\,\!</math>. Point availability is the availability at a specific point time. In this deterministic case, the point availability would always be equal to 1 if the system is up at that time and equal to zero if the system is down at that time.<br />
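This bookkeeping (fixed lives, fixed repair times, components frozen while the system is down) is easy to mechanize. The sketch below is a minimal Python rendition of the walk-through above, not BlockSim's simulation engine:<br />

```python
def simulate_series(lives, repair_time, t_end):
    """Deterministic series system: each component fails at a fixed age,
    is as-good-as-new after repair, and does not age while the system
    (i.e., any component) is down."""
    clock, downtime = 0.0, 0.0
    age = {c: 0.0 for c in lives}
    events = []
    while True:
        # The next failure is the component with the least remaining life.
        c = min(lives, key=lambda k: lives[k] - age[k])
        remaining = lives[c] - age[c]
        if clock + remaining > t_end:
            break
        clock += remaining
        for k in age:            # every component aged while the system was up
            age[k] += remaining
        events.append((c, clock))
        clock += repair_time     # system down; other components are frozen
        downtime += repair_time
        age[c] = 0.0             # the repaired component is renewed
    return events, downtime, (t_end - downtime) / t_end

events, downtime, availability = simulate_series({"A": 100, "B": 120}, 10, 300)
print(events)        # [('A', 100.0), ('B', 130.0), ('A', 220.0), ('B', 270.0)]
print(downtime)      # 40.0, from the four downing events
print(availability)  # 0.8666..., i.e., 260/300
```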
<br />
====Operating Through System Failure====<br />
<br />
In the prior section we made the assumption that components do not age when the system is down. This assumption applies to most systems. However, under special circumstances, a unit may age even while the system is down. In such cases, the operating profile will be different from the one presented in the prior section. The figure below illustrates the case where the components operate continuously, regardless of the system status.<br />
<br />
[[Image:BS8.2.png|center|500px|Overview of up and down states for a simple series system with two components. Component ''A'' fails every 100 hours and component ''B'' fails every 120 hours. Both require 10 hours to get repaired and age when the system is in a failed state (i.e., operate through failure).|link=]]<br />
<br />
====Effects of Operating Through Failure====<br />
<br />
Consider a component with an increasing failure rate, as shown in the figure below. In the case that the component continues to operate through system failure, then when the system fails at <math>{{t}_{1}}\,\!</math> the surviving component's failure rate will be <math>{{\lambda }_{1}}\,\!</math>, as illustrated in figure below. When the system is restored at <math>{{t}_{2}}\,\!</math>, the component would have aged by <math>{{t}_{2}}-{{t}_{1}}\,\!</math> and its failure rate would now be <math>{{\lambda }_{2}}\,\!</math>. <br />
<br />
In the case of a component that does not operate through failure, then the surviving component would be at the same failure rate, <math>{{\lambda }_{1}},\,\!</math> when the system resumes operation.<br />
<br />
[[Image:BS8.3.png|center|400px|Illustration of a component with a linearly increasing failure rate and the effect of operation through system failure.|link=]]<br />
<br />
==Deterministic View, Simple Parallel==<br />
Consider the following system where <math>A\,\!</math> fails every 100, <math>B\,\!</math> every 120, <math>C\,\!</math> every 140 and <math>D\,\!</math> every 160 time units. Each takes 10 time units to restore. Furthermore, assume that components do not age when the system is down.<br />
<br />
[[Image:i8.2.png|center|300px|link=]]<br />
<br />
A deterministic system view is shown in the figure below. The sequence of events is as follows:<br><br />
<br />
#At 100, <math>A\,\!</math> fails and is repaired by 110. The system is failed. <br />
#At 130, <math>B\,\!</math> fails and is repaired by 140. The system continues to operate.<br />
#At 150, <math>C\,\!</math> fails and is repaired by 160. The system continues to operate.<br />
#At 170, <math>D\,\!</math> fails and is repaired by 180. The system is failed. <br />
#At 220, <math>A\,\!</math> fails and is repaired by 230. The system is failed. <br />
#At 280, <math>B\,\!</math> fails and is repaired by 290. The system continues to operate.<br />
#End at 300.<br />
<br />
[[Image:BS8.4.png|center|500px|Overview of simple redundant system with four components.|link=]]<br />
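The overall availability for this run is not stated above, but it follows directly from the three system-downing events in the log (the failures of <math>B\,\!</math> and <math>C\,\!</math> do not down the system). A quick tally in Python:<br />

```python
# System-downing intervals taken from the sequence of events above.
downing = [(100, 110), (170, 180), (220, 230)]  # caused by A, D, A

t_end = 300
downtime = sum(end - start for start, end in downing)
print(downtime)                    # 30
print((t_end - downtime) / t_end)  # 0.90 mean availability for this run
```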
<br />
====Additional Notes====<br />
<br />
It should be noted that we are dealing with these events deterministically in order to better illustrate the methodology. When dealing with deterministic events, it is possible to create a sequence of events that one would not expect to encounter probabilistically. One such example consists of two units in series that do not operate through failure but both fail at exactly 100, which is highly unlikely in a real-world scenario. In this case, the assumption is that one of the events must occur at least an infinitesimal amount of time ( <math>dt)\,\!</math> before the other. Probabilistically, this event is extremely rare, since both randomly generated times would have to be exactly equal to each other, to 15 decimal places. In the rare event that this happens, BlockSim would pick the unit with the lowest ID value as the first failure. BlockSim assigns a unique numerical ID when each component is created. These can be viewed by selecting the '''Show Block ID''' option in the Diagram Options window.<br />
<br />
==Deterministic Views of More Complex Systems==<br />
<br />
Even though the examples presented are fairly simplistic, the same approach can be repeated for larger and more complex systems. The reader can easily observe/visualize the behavior of more complex systems in BlockSim using the Up/Down plots. These are the same plots used in this chapter. It should be noted that BlockSim makes these plots available only when a single simulation run has been performed for the analysis (i.e., Number of Simulations = 1). These plots are meaningless when doing multiple simulations because each run will yield a different plot.<br />
<br />
==Probabilistic View, Simple Series==<br />
<br />
In a probabilistic case, the failures and repairs do not happen at a fixed time and for a fixed duration, but rather occur randomly and based on an underlying distribution, as shown in the following figures.<br />
<br />
[[Image:8.5.png|center|600px| A single component with a probabilistic failure time and repair duration.|link=]]<br />
[[Image:BS8.6.png|center|500px|A system up/down plot illustrating a probabilistic failure time and repair duration for component B.|link=]]<br />
<br />
We use discrete event simulation in order to analyze (understand) the system behavior. Discrete event simulation looks at each system/component event very similarly to the way we looked at these events in the deterministic example. However, instead of using deterministic (fixed) times for each event occurrence or duration, random times are used. These random times are obtained from the underlying distribution for each event. As an example, consider an event following a 2-parameter Weibull distribution. The ''cdf'' of the 2-parameter Weibull distribution is given by: <br />
<br />
::<math>F(T)=1-{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}}\,\!</math><br />
<br />
The Weibull reliability function is given by: <br />
<br />
::<math>\begin{align}<br />
R(T)= & 1-F(t) \\ <br />
= & {{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}} <br />
\end{align}\,\!</math><br />
<br />
Then, to generate a random time from a Weibull distribution with a given <math>\eta \,\!</math> and <math>\beta \,\!</math>, a uniform random number from 0 to 1, <math>{{U}_{R}}[0,1]\,\!</math>, is first obtained. The random time from a Weibull distribution is then obtained from:<br />
<br />
::<math>{{T}_{R}}=\eta \cdot {{\left\{ -\ln \left[ {{U}_{R}}[0,1] \right] \right\}}^{\tfrac{1}{\beta }}}\,\!</math><br />
<br />
To obtain a conditional time, the Weibull conditional reliability function is given by:<br />
<br />
::<math>R(T,t)=\frac{R(T+t)}{R(T)}=\frac{{{e}^{-{{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}}}}{{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}}}\,\!</math><br />
<br />
Or: <br />
<br />
::<math>R(T,t)={{e}^{-\left[ {{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}-{{\left( \tfrac{T}{\eta } \right)}^{\beta }} \right]}}\,\!</math><br />
<br />
The random time would be the solution for <math>t\,\!</math> for <math>R(T,t)={{U}_{R}}[0,1]\,\!</math>.<br />
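Both draws are one-liners once the reliability functions are inverted. The sketch below uses Python's standard `random` module; the function names are our own, not BlockSim's:<br />

```python
import math
import random

def weibull_random(eta, beta, rng=random):
    # Inverse-CDF draw: solve R(T) = U for T, giving
    # T = eta * (-ln U)^(1/beta) with U ~ Uniform(0, 1).
    u = 1.0 - rng.random()  # in (0, 1]; avoids log(0)
    return eta * (-math.log(u)) ** (1.0 / beta)

def weibull_conditional_random(eta, beta, age, rng=random):
    # Solve R(T, t) = U for t: the additional operating time for a unit
    # that has already accumulated `age` (T in the text) without failing.
    u = 1.0 - rng.random()
    return eta * ((age / eta) ** beta - math.log(u)) ** (1.0 / beta) - age

# With no accumulated age, the conditional draw reduces to the
# unconditional one (same seed, hence the same uniform number):
r1, r2 = random.Random(42), random.Random(42)
print(abs(weibull_random(100, 2, r1)
          - weibull_conditional_random(100, 2, 0.0, r2)) < 1e-9)  # True
```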
<br />
To illustrate the sequence of events, assume a single block with a failure and a repair distribution. The first event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, would be the failure of the component. Its first time-to-failure would be a random number drawn from its failure distribution, <math>{{T}_{{{F}_{1}}}}\,\!</math>. Thus, the first failure event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, would be at <math>{{T}_{{{F}_{1}}}}\,\!</math>. Once failed, the next event would be the repair of the component, <math>{{E}_{{{R}_{1}}}}\,\!</math>. The time to repair the component would now be drawn from its repair distribution, <math>{{T}_{{{R}_{1}}}}\,\!</math>. The component would be restored by time <math>{{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}}\,\!</math>. The next event would now be the second failure of the component after the repair, <math>{{E}_{{{F}_{2}}}}\,\!</math>. This event would occur after a component operating time of <math>{{T}_{{{F}_{2}}}}\,\!</math> after the item is restored (again drawn from the failure distribution), or at <math>{{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}}+{{T}_{{{F}_{2}}}}\,\!</math>. This process is repeated until the end time. It is important to note that each run will yield a different sequence of events due to the probabilistic nature of the times. To arrive at the desired result, this process is repeated many times and the results from each run (simulation) are recorded. In other words, if we were to repeat this 1,000 times, we would obtain 1,000 different values for <math>{{E}_{{{F}_{1}}}}\,\!</math>, or <math>\left[ {{E}_{{{F}_{{{1}_{1}}}}}},{{E}_{{{F}_{{{1}_{2}}}}}},...,{{E}_{{{F}_{{{1}_{1,000}}}}}} \right]\,\!</math>.<br />
The average of these values, <math>\left( \tfrac{1}{1000}\underset{i=1}{\overset{1,000}{\mathop{\sum }}}\,{{E}_{{{F}_{{{1}_{i}}}}}} \right)\,\!</math>, would then be the average time to the first event, <math>{{E}_{{{F}_{1}}}}\,\!</math>, or the mean time to first failure (MTTFF) for the component. Obviously, if the component were to be 100% renewed after each repair, then this value would also be the same for the second failure, etc.<br />
<br />
=General Simulation Results=<br />
To further illustrate this, assume that components A and B in the prior example had normal failure and repair distributions with their means equal to the deterministic values used in the prior example and standard deviations of 10 and 1 respectively. That is, <math>{{F}_{A}}\tilde{\ }N(100,10),\,\!</math> <math>{{F}_{B}}\tilde{\ }N(120,10),\,\!</math> <math>{{R}_{A}}={{R}_{B}}\tilde{\ }N(10,1)\,\!</math>. The settings for components C and D are not changed. Obviously, given the probabilistic nature of the example, the times to each event will vary. If one were to repeat this <math>X\,\!</math> number of times, one would arrive at the results of interest for the system and its components. Some of the results for this system and this example, over 1,000 simulations, are provided in the figure below and explained in the next sections. <br />
[[Image:r2.png|center|600px|Summary of system results for 1,000 simulations.|link=]]<br />
<br />
The simulation settings are shown in the figure below.<br />
[[Image:8.7.gif|center|600px|BlockSim simulation window.|link=]]<br />
<br />
===General===<br />
====Mean Availability (All Events), <math>{{\overline{A}}_{ALL}}\,\!</math>====<br />
This is the mean availability due to all downing events, which can be thought of as the operational availability. It is the ratio of the system uptime divided by the total simulation time (total time). For this example: <br />
<br />
::<math>\begin{align}<br />
{{\overline{A}}_{ALL}}= & \frac{Uptime}{TotalTime} \\ <br />
= & \frac{269.137}{300} \\ <br />
= & 0.8971 <br />
\end{align}\,\!</math><br />
<br />
====Std Deviation (Mean Availability)====<br />
This is the standard deviation of the mean availability of all downing events for the system during the simulation.<br />
<br />
====Mean Availability (w/o PM, OC & Inspection), <math>{{\overline{A}}_{CM}}\,\!</math>====<br />
This is the mean availability due to failure events only and it is 0.8971 for this example. Note that for this case, the mean availability without preventive maintenance, on condition maintenance and inspection is identical to the mean availability for all events. This is because no preventive maintenance actions or inspections were defined for this system. We will discuss the inclusion of these actions in later sections.<br />
<br />
Downtimes caused by PM and inspections are not included. However, if the PM or inspection action results in the discovery of a failure, then these times are included. As an example, consider a component that has failed but its failure is not discovered until the component is inspected. Then the downtime from the time failed to the time restored after the inspection is counted as failure downtime, since the original event that caused this was the component's failure. <br />
====Point Availability (All Events), <math>A\left( t \right)\,\!</math>====<br />
<br />
This is the probability that the system is up at time <math>t\,\!</math>. As an example, to obtain this value at <math>t\,\!</math> = 300, a special counter would need to be used during the simulation. This counter is increased by one every time the system is up at 300 hours. Thus, the point availability at 300 would be the number of times the system was up at 300 divided by the number of simulations. For this example, this is 0.930; that is, the system was up at 300 hours in 930 of the 1,000 simulations.<br />
<br />
====Reliability (Fail Events), <math>R(t)\,\!</math>====<br />
<br />
This is the probability that the system has not failed by time <math>t\,\!</math>. This is similar to point availability with the major exception that it only looks at the probability that the system did not have a single failure. Other (non-failure) downing events are ignored. During the simulation, a special counter again must be used. This counter is increased by one (at most once in each simulation) if the system has had at least one failure up to 300 hours. Thus, the reliability at 300 would be the number of simulations in which the system did not fail up to 300 (i.e., the total number of simulations minus the counter value) divided by the number of simulations. For this example, this is 0 because the system failed prior to 300 hours in all 1,000 simulations.<br />
<br />
It is very important to note that this value is not always the same as the reliability computed using the analytical methods, depending on the redundancy present. The reason that it may differ is best explained by the following scenario:<br />
<br />
Assume two units in parallel. The analytical system reliability, which does not account for repairs, is the probability that both units fail. In this case, when one unit goes down, it does not get repaired and the system fails after the second unit fails. In the case of repairs, however, it is possible for one of the two units to fail and get repaired before the second unit fails. Thus, when the second unit fails, the system will still be up due to the fact that the first unit was repaired.<br />
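The two counters described above can be sketched as follows. The three run records are hypothetical, hand-made data just to make the counting concrete; a real simulation would produce one such record per run:<br />

```python
# Each record holds one run's system down intervals and failure times
# (hypothetical minimal structure for illustration).
runs = [
    {"down": [(100, 110), (205, 215)], "failures": [100, 205]},
    {"down": [(90, 100)],              "failures": [90]},
    {"down": [(295, 305)],             "failures": [295]},
]

def is_up(run, t):
    return not any(start <= t < end for start, end in run["down"])

t = 300
n = len(runs)
# Point availability: fraction of runs in which the system is up at t.
point_availability = sum(is_up(r, t) for r in runs) / n
# Reliability: fraction of runs with no failure at all up to t.
reliability = sum(all(f > t for f in r["failures"]) for r in runs) / n
print(point_availability)  # 2/3: only the third run is down at t = 300
print(reliability)         # 0.0: every run failed before t = 300
```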
<br />
====Expected Number of Failures, <math>{{N}_{F}}\,\!</math>====<br />
This is the average number of system failures. The system failures (not downing events) for all simulations are counted and then averaged. For this case, this is 3.188, which implies that a total of 3,188 system failure events occurred over 1000 simulations. Thus, the expected number of system failures for one run is 3.188. This number includes all failures, even those that may have a duration of zero.<br />
<br />
====Std Deviation (Number of Failures)====<br />
This is the standard deviation of the number of failures for the system during the simulation.<br />
<br />
====MTTFF====<br />
MTTFF is the mean time to first failure for the system. This is computed by keeping track of the time at which the first system failure occurred for each simulation. MTTFF is then the average of these times. This may or may not be identical to the MTTF obtained in the analytical solution, for the same reasons as those discussed in the Reliability (Fail Events) section. For this example, the MTTFF is 100.2511, which is to be expected since the mean time to failure of one of the components in series was 100 hours.<br />
<br />
It is important to note that for each simulation run, if a first failure time is observed, then this is recorded as the system time to first failure. If no failure is observed in the system, then the simulation end time is used as a right censored (suspended) data point. MTTFF is then computed using the total operating time until the first failure divided by the number of observed failures (constant failure rate assumption). Furthermore, if the simulation end time is much less than the time to first failure for the system, it is also possible that all data points are right censored (i.e., no system failures were observed). In this case, the MTTFF is again computed using a constant failure rate assumption, or:<br />
<br />
::<math>MTTFF=\frac{2\cdot ({{T}_{S}})\cdot N}{\chi _{0.50;2}^{2}}\,\!</math><br />
<br />
where <math>{{T}_{S}}\,\!</math> is the simulation end time and <math>N\,\!</math> is the number of simulations. One should be aware that this formulation may yield unrealistic (or erroneous) results if the system does not have a constant failure rate. If you are trying to obtain an accurate (realistic) estimate of this value, then your simulation end time should be set to a value that is well beyond the MTTF of the system (as computed analytically). As a general rule, the simulation end time should be at least three times larger than the MTTF of the system.<br />
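A minimal sketch of this estimation logic (the function name and data layout are illustrative, not BlockSim's internals) might look like:<br />

```python
import math

def mttff(first_failure_times, sim_end_time):
    """Estimate mean time to first failure from per-simulation results.

    first_failure_times: one entry per simulation run, either the observed
    first system failure time or None for runs with no observed failure
    (right censored at the simulation end time). A constant failure rate
    is assumed, as in the text.
    """
    n_failed = sum(1 for t in first_failure_times if t is not None)
    # Total operating time until first failure; censored runs contribute
    # the full simulation end time.
    total_time = sum(t if t is not None else sim_end_time
                     for t in first_failure_times)
    if n_failed > 0:
        return total_time / n_failed
    # All runs censored: MTTFF = 2*T_S*N / chi^2(0.50; 2), where the
    # chi-square median with 2 degrees of freedom is 2*ln(2).
    n_runs = len(first_failure_times)
    return 2.0 * sim_end_time * n_runs / (2.0 * math.log(2))

# Three runs with observed first failures, one censored at T_S = 300:
print(mttff([100.0, 95.0, 105.0, None], 300.0))  # (100+95+105+300)/3 = 200.0
```

Note that the all-censored branch simply substitutes the chi-squared median, <math>\chi _{0.50;2}^{2}=2\ln 2\,\!</math>, into the formula above.<br />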
<br />
====MTBF (Total Time)====<br />
This is the mean time between failures for the system based on the total simulation time and the expected number of system failures. For this example:<br />
<br />
::<math>\begin{align}<br />
MTBF (Total Time)= & \frac{TotalTime}{{N}_{F}} \\ <br />
= & \frac{300}{3.188} \\ <br />
= & 94.102886 <br />
\end{align}\,\!</math><br />
<br />
====MTBF (Uptime)====<br />
This is the mean time between failures for the system, considering only the time that the system was up. This is calculated by dividing system uptime by the expected number of system failures. You can also think of this as the mean uptime. For this example:<br />
<br />
::<math>\begin{align}<br />
MTBF (Uptime)= & \frac{Uptime}{{N}_{F}} \\ <br />
= & \frac{269.136952}{3.188} \\ <br />
= & 84.42188 <br />
\end{align}\,\!</math><br />
<br />
====MTBE (Total Time)====<br />
This is the mean time between all downing events for the system, based on the total simulation time and including all system downing events. This is calculated by dividing the simulation run time by the number of downing events (<math>{{N}_{AL{{L}_{Down}}}}\,\!</math>).<br />
<br />
====MTBE (Uptime)====<br />
This is the mean time between all downing events for the system, considering only the time that the system was up. This is calculated by dividing system uptime by the number of downing events (<math>{{N}_{AL{{L}_{Down}}}}\,\!</math>).<br />
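The four mean-time-between metrics above reduce to simple ratios of the simulation totals. A sketch using the example's values (assuming, as in this example, that every downing event is a failure, so <math>{{N}_{F}}={{N}_{AL{{L}_{Down}}}}\,\!</math>):<br />

```python
def time_between_metrics(total_time, uptime, n_failures, n_downing_events):
    """Mean-time-between metrics as defined in the text."""
    return {
        "MTBF (Total Time)": total_time / n_failures,
        "MTBF (Uptime)": uptime / n_failures,
        "MTBE (Total Time)": total_time / n_downing_events,
        "MTBE (Uptime)": uptime / n_downing_events,
    }

# Values from the example:
m = time_between_metrics(300.0, 269.136952, 3.188, 3.188)
print(round(m["MTBF (Total Time)"], 6))  # 94.102886
print(round(m["MTBF (Uptime)"], 5))      # 84.42188
```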
<br />
===System Uptime/Downtime===<br />
<br />
====Uptime, <math>{{T}_{UP}}\,\!</math> ====<br />
<br />
This is the average time the system was up and operating. This is obtained by taking the sum of the uptimes for each simulation and dividing it by the number of simulations. For this example, the uptime is 269.137. The Operational Availability, <math>{{A}_{o}},\,\!</math> for this system is then computed as:<br />
<br />
::<math>{{A}_{o}}=\frac{{{T}_{UP}}}{{{T}_{S}}}\,\!</math><br />
<br />
====CM Downtime, <math>{{T}_{C{{M}_{Down}}}}\,\!</math> ====<br />
This is the average time the system was down for corrective maintenance actions (CM) only. This is obtained by taking the sum of the CM downtimes for each simulation and dividing it by the number of simulations. For this example, this is 30.863.<br />
The Inherent Availability, <math>{{A}_{I}},\,\!</math> for this system over the observed time (which may or may not be steady state, depending on the length of the simulation) is then computed as:<br />
<br />
::<math>{{A}_{I}}=\frac{{{T}_{S}}-{{T}_{C{{M}_{Down}}}}}{{{T}_{S}}}\,\!</math><br />
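Both availability formulas are one-line ratios of the simulation totals. A sketch using the example's values (note that the two results coincide here because all of the system downtime is CM downtime):<br />

```python
def operational_availability(uptime, sim_end_time):
    # A_o = T_UP / T_S
    return uptime / sim_end_time

def inherent_availability(sim_end_time, cm_downtime):
    # A_I = (T_S - T_CM_Down) / T_S
    return (sim_end_time - cm_downtime) / sim_end_time

# Example values: T_UP = 269.137, T_CM_Down = 30.863, T_S = 300
print(round(operational_availability(269.137, 300.0), 6))  # 0.897123
print(round(inherent_availability(300.0, 30.863), 6))      # 0.897123
```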
<br />
====Inspection Downtime ====<br />
<br />
This is the average time the system was down due to inspections. This is obtained by taking the sum of the inspection downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no inspections were defined.<br />
<br />
====PM Downtime, <math>{{T}_{P{{M}_{Down}}}}\,\!</math>====<br />
<br />
This is the average time the system was down due to preventive maintenance (PM) actions. This is obtained by taking the sum of the PM downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no PM actions were defined.<br />
<br />
====OC Downtime, <math>{{T}_{O{{C}_{Down}}}}\,\!</math>====<br />
<br />
This is the average time the system was down due to on-condition maintenance (OC) actions. This is obtained by taking the sum of the OC downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no OC actions were defined.<br />
<br />
====Waiting Downtime, <math>{{T}_{W{{ait}_{Down}}}}\,\!</math>====<br />
<br />
This is the amount of time that the system was down due to crew and spare part wait times or crew conflict times. For this example, this is zero because no crews or spare part pools were defined.<br />
<br />
====Total Downtime, <math>{{T}_{Down}}\,\!</math>====<br />
<br />
This is the downtime due to all events. In general, one may look at this as the sum of the above downtimes. However, this is not always the case. It is possible to have actions that overlap each other, depending on the options and settings for the simulation. Furthermore, there are other events that can cause the system to go down that do not get counted in any of the above categories. As an example, in the case of standby redundancy with a switch delay, if the settings are to reactivate the failed component after repair, the system may be down during the switch-back action. This downtime does not fall into any of the above categories but it is counted in the total downtime.<br />
<br />
For this example, this is identical to <math>{{T}_{C{{M}_{Down}}}}\,\!</math>.<br />
<br />
===System Downing Events===<br />
System downing events are events associated with downtime. Note that events with zero duration will appear in this section only if the task properties specify that the task brings the system down or if the task properties specify that the task brings the item down and the item’s failure brings the system down.<br />
<br />
====Number of Failures, <math>{{N}_{{{F}_{Down}}}}\,\!</math>====<br />
This is the average number of system downing failures. Unlike the Expected Number of Failures, <math>{{N}_{F}},\,\!</math> this number does not include failures with zero duration. For this example, this is 3.188. <br />
<br />
====Number of CMs, <math>{{N}_{C{{M}_{Down}}}}\,\!</math>====<br />
This is the number of corrective maintenance actions that caused the system to fail. It is obtained by taking the sum of all CM actions that caused the system to fail divided by the number of simulations. It does not include CM events of zero duration. For this example, this is 3.188. Note that this may differ from the Number of Failures, <math>{{N}_{{{F}_{Down}}}}\,\!</math>. An example would be a case where the system has failed, but due to other settings for the simulation, a CM is not initiated (e.g., an inspection is needed to initiate a CM).<br />
<br />
====Number of Inspections, <math>{{N}_{{{I}_{Down}}}}\,\!</math>====<br />
This is the number of inspection actions that caused the system to fail. It is obtained by taking the sum of all inspection actions that caused the system to fail divided by the number of simulations. It does not include inspection events of zero duration. For this example, this is zero.<br />
<br />
====Number of PMs, <math>{{N}_{P{{M}_{Down}}}}\,\!</math>====<br />
This is the number of PM actions that caused the system to fail. It is obtained by taking the sum of all PM actions that caused the system to fail divided by the number of simulations. It does not include PM events of zero duration. For this example, this is zero.<br />
<br />
====Number of OCs, <math>{{N}_{O{{C}_{Down}}}}\,\!</math>====<br />
This is the number of OC actions that caused the system to fail. It is obtained by taking the sum of all OC actions that caused the system to fail divided by the number of simulations. It does not include OC events of zero duration. For this example, this is zero.<br />
<br />
====Number of OFF Events by Trigger, <math>{{N}_{O{{FF}_{Down}}}}\,\!</math>====<br />
This is the total number of events where the system is turned off by state change triggers. An OFF event is not a system failure but it may be included in system reliability calculations. For this example, this is zero.<br />
<br />
====Total Events, <math>{{N}_{AL{{L}_{Down}}}}\,\!</math>====<br />
This is the total number of system downing events. It also does not include events of zero duration. It is possible that this number may differ from the sum of the other listed events. As an example, consider the case where a failure does not get repaired until an inspection, but the inspection occurs after the simulation end time. In this case, the number of inspections, CMs and PMs will be zero while the number of total events will be one.<br />
<br />
===Costs and Throughput===<br />
Cost and throughput results are discussed in later sections.<br />
<br />
===Note About Overlapping Downing Events===<br />
<br />
It is important to note that two identical system downing events (that are continuous or overlapping) may be counted and viewed differently. As shown in Case 1 of the following figure, two overlapping failure events are counted as only one event from the system perspective because the system was never restored and remained in the same down state, even though that state was caused by two different components. Thus, the number of downing events in this case is one and the duration is as shown in CM system. In the case that the events are different, as shown in Case 2 of the figure below, two events are counted, the CM and the PM. However, the downtime attributed to each event is different from the actual time of each event. In this case, the system was first down due to a CM and remained in a down state due to the CM until that action was over. However, immediately upon completion of that action, the system remained down but now due to a PM action. In this case, only the PM action portion that kept the system down is counted.<br />
<br />
[[Image:8.9.png|center|350px|Duration and count of different overlapping events.|link=]]<br />
<br />
===System Point Results===<br />
<br />
The system point results, as shown in the figure below, shows the Point Availability (All Events), <math>A\left( t \right)\,\!</math>, and Point Reliability, <math>R(t)\,\!</math>, as defined in the previous section. These are computed and returned at different points in time, based on the number of intervals selected by the user. Additionally, this window shows <math>(1-A(t))\,\!</math>, <math>(1-R(t))\,\!</math>, <math>\text{Labor Cost(t)}\,\!</math>, <math>\text{Part Cost(t)}\,\!</math>, <math>Cost(t)\,\!</math>, <math>Mean\,\!</math> <math>A(t)\,\!</math>, <math>Mean\,\!</math> <math>A({{t}_{i}}-{{t}_{i-1}})\,\!</math>, <math>\text{System Failures(t)}\,\!</math>, <math>\text{System Off Events by Trigger(t)}\,\!</math> and <math>Throughput(t)\,\!</math>.<br />
<br />
[[Image:BS8.10.png|center|750px|link=]]<br />
The number of intervals shown is based on the increments set. In this figure, the number of increments set was 300, which implies that the results should be shown every hour. The results shown in this figure are for 10 increments, or shown every 30 hours.<br />
<br />
=Results by Component=<br />
Simulation results for each component can also be viewed. The figure below shows the results for component A. These results are explained in the sections that follow.<br />
<br />
[[Image:8.11.gif|center|600px|The Block Details results for component A.|link=]]<br />
<br />
===General Information===<br />
====Number of Block Downing Events, <math>Componen{{t}_{NDE}}\,\!</math>====<br />
This is the number of times the component went down (failed). It includes all downing events.<br />
<br />
====Number of System Downing Events, <math>Componen{{t}_{NSDE}}\,\!</math>====<br />
<br />
This is the number of times that this component's downing caused the system to be down. For component <math>A\,\!</math>, this is 2.038. Note that in this case this value is the same as the number of component failures, since component <math>A\,\!</math> is reliability-wise in series with component D and with the parallel combination of components B and C. If this were not the case (e.g., if <math>A\,\!</math> were in a parallel configuration, like B and C), this value would be different.<br />
<br />
====Number of Failures, <math>Componen{{t}_{NF}}\,\!</math>====<br />
<br />
This is the number of times the component failed and does not include other downing events. Note that this could also be interpreted as the number of spare parts required for CM actions for this component. For component <math>A\,\!</math>, this is 2.038.<br />
<br />
====Number of System Downing Failures, <math>Componen{{t}_{NSDF}}\,\!</math>====<br />
This is the number of times that this component's failure caused the system to be down. Note that this may be different from the Number of System Downing Events. It only counts the failure events that downed the system and does not include zero duration system failures.<br />
<br />
====Number of OFF events by Trigger, <math>Componen{{t}_{OFF}}\,\!</math>====<br />
The total number of events where the block is turned off by state change triggers. An OFF event is not a failure but it may be included in system reliability calculations.<br />
<br />
====Mean Availability (All Events), <math>{{\overline{A}}_{AL{{L}_{Component}}}}\,\!</math>====<br />
<br />
This has the same definition as for the system with the exception that this accounts only for the component.<br />
<br />
====Mean Availability (w/o PM, OC & Inspection), <math>{{\overline{A}}_{C{{M}_{Component}}}}\,\!</math>====<br />
<br />
The mean availability of all downing events for the block, not including preventive, on condition or inspection tasks, during the simulation.<br />
<br />
====Block Uptime, <math>{{T}_{Componen{{t}_{UP}}}}\,\!</math>====<br />
<br />
This is the total amount of time that the block was up (i.e., operational) during the simulation. For component <math>A\,\!</math>, this is 279.8212.<br />
<br />
====Block Downtime, <math>{{T}_{Componen{{t}_{Down}}}}\,\!</math>====<br />
<br />
This is the total amount of time that the block was down (i.e., not operational) for any reason during the simulation, averaged over all simulations. For component <math>A\,\!</math>, this is 20.1788.<br />
<br />
===Metrics===<br />
====RS DECI====<br />
<br />
The ReliaSoft Downing Event Criticality Index for the block. This is a relative index showing the percentage of times that a downing event of the block caused the system to go down (i.e., the number of system downing events caused by the block divided by the total number of system downing events). For component <math>A\,\!</math>, this is 63.93%. This implies that 63.93% of the times that the system went down, the system failure was due to the fact that component <math>A\,\!</math> went down. This is obtained from:<br />
<br />
::<math>\begin{align}<br />
RSDECI=\frac{Componen{{t}_{NSDE}}}{{{N}_{AL{{L}_{Down}}}}} <br />
\end{align}\,\!</math><br />
<br />
====Mean Time Between Downing Events====<br />
This is the mean time between downing events of the component, which is computed from:<br />
<br />
::<math>MTBDE=\frac{{{T}_{Componen{{t}_{UP}}}}}{Componen{{t}_{NDE}}}\,\!</math><br />
<br />
For component <math>A\,\!</math>, this is 137.3019.<br />
<br />
====RS FCI====<br />
ReliaSoft's Failure Criticality Index (RS FCI) is a relative index showing the percentage of times that a failure of this component caused a system failure. For component <math>A\,\!</math>, this is 63.93%. This implies that 63.93% of the times that the system failed, it was due to the fact that component <math>A\,\!</math> failed. This is obtained from:<br />
<br />
::<math>\begin{align}<br />
RSFCI=\frac{Componen{{t}_{NSDF}}+{{F}_{ZD}}}{{{N}_{F}}} <br />
\end{align}\,\!</math><br />
<br />
<math>{{F}_{ZD}}\,\!</math> is a special counter of system failures not included in <math>Componen{{t}_{NSDF}}\,\!</math>. This counter is not explicitly shown in the results but is maintained by the software. The reason for this counter is the fact that zero duration failures are not counted in <math>Componen{{t}_{NSDF}}\,\!</math> since they really did not down the system. However, these zero duration failures need to be included when computing RS FCI.<br />
<br />
It is important to note that for both RS DECI and RS FCI, if overlapping events are present, the component that caused the system event gets credited with the system event. Subsequent component events that do not bring the system down (since the system is already down) do not get counted in these metrics.<br />
<br />
====MTBF, <math>MTB{{F}_{C}}\,\!</math>====<br />
<br />
Mean time between failures is the mean (average) time between failures of this component, in real clock time. This is computed from:<br />
<br />
::<math>MTB{{F}_{C}}=\frac{{{T}_{S}}-CFDowntime}{Componen{{t}_{NF}}}\,\!</math><br />
<br />
<math>CFDowntime\,\!</math> is the downtime of the component due to failures only (without PM, OC and inspection). The discussion regarding what constitutes failure downtime that was presented in the section explaining Mean Availability (w/o PM, OC & Inspection) also applies here.<br />
For component <math>A\,\!</math>, this is 137.3019. Note that this value could fluctuate for the same component depending on the simulation end time. As an example, consider the deterministic scenario for this component. It fails every 100 hours and takes 10 hours to repair. Thus, it would be failed at 100, repaired by 110, failed at 210 and repaired by 220. Therefore, its uptime is 280 with two failure events, MTBF = 280/2 = 140. Repeating the same scenario with an end time of 330 would yield failures at 100, 210 and 320. Thus, the uptime would be 300 with three failures, or MTBF = 300/3 = 100. Note that this is not the same as the MTTF (mean time to failure), commonly referred to as MTBF by many practitioners. <br />
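The end-time sensitivity described above can be reproduced with a short deterministic sketch (illustrative code, not BlockSim's simulation engine):<br />

```python
def component_mtbf(fail_time, repair_time, sim_end):
    """Deterministic fail/repair cycle: count failures and failure downtime,
    then MTBF_C = (T_S - CFDowntime) / N_F, as in the text."""
    t, n_failures, downtime = 0.0, 0, 0.0
    while True:
        t += fail_time            # next failure (repairs restore as good as new)
        if t > sim_end:
            break
        n_failures += 1
        downtime += min(repair_time, sim_end - t)  # clip repair at sim end
        t += repair_time
    return (sim_end - downtime) / n_failures

print(component_mtbf(100, 10, 300))  # 140.0  (failures at 100 and 210)
print(component_mtbf(100, 10, 330))  # 100.0  (failures at 100, 210, 320)
```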
<br />
====Mean Downtime per Event, <math>MDPE\,\!</math>====<br />
Mean downtime per event is the average downtime for a component event. This is computed from:<br />
<br />
::<math>MDPE=\frac{{{T}_{Componen{{t}_{Down}}}}}{Componen{{t}_{NDE}}}\,\!</math><br />
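A sketch computing three of the block indices defined above from the example counters for component <math>A\,\!</math> (the names are illustrative; the zero-duration counter <math>{{F}_{ZD}}\,\!</math> used by RS FCI is omitted for brevity):<br />

```python
def component_metrics(uptime, downtime, n_downing_events,
                      n_system_downing_events, n_system_downing_total):
    """RS DECI, MTBDE and MDPE as defined in the text."""
    return {
        "RS DECI": n_system_downing_events / n_system_downing_total,
        "MTBDE": uptime / n_downing_events,
        "MDPE": downtime / n_downing_events,
    }

# Component A from the example:
m = component_metrics(279.8212, 20.1788, 2.038, 2.038, 3.188)
print(round(m["RS DECI"] * 100, 2))  # 63.93 (%)
print(round(m["MTBDE"], 4))          # 137.3019
```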
<br />
====RS DTCI====<br />
The ReliaSoft Downtime Criticality Index for the block. This is a relative index showing the contribution of the block to the system’s downtime (i.e., the system downtime caused by the block divided by the total system downtime).<br />
<br />
====RS BCCI====<br />
The ReliaSoft Block Cost Criticality Index for the block. This is a relative index showing the contribution of the block to the total costs (i.e., the total block costs divided by the total costs).<br />
<br />
====Non-Waiting Time CI====<br />
A relative index showing the contribution of repair times to the block’s total downtime. (The ratio of the time that the crew is actively working on the item to the total down time). <br />
<br />
====Total Waiting Time CI====<br />
A relative index showing the contribution of wait factor times to the block’s total downtime. Wait factors include crew conflict times, crew wait times and spare part wait times. (The ratio of downtime not including active repair time to the total downtime.) <br />
<br />
====Waiting for Opportunity/Maximum Wait Time Ratio====<br />
A relative index showing the contribution of crew conflict times. This is the ratio of the time spent waiting for the crew to respond (not including crew logistic delays) to the total wait time (not including the active repair time). <br />
<br />
====Crew/Part Wait Ratio====<br />
The ratio of the crew and part delays. A value of 100% means that both waits are equal. A value greater than 100% indicates that the crew delay was in excess of the part delay. For example, a value of 200% would indicate that the wait for the crew is two times greater than the wait for the part.<br />
<br />
====Part/Crew Wait Ratio====<br />
The ratio of the part and crew delays. A value of 100% means that both waits are equal. A value greater than 100% indicates that the part delay was in excess of the crew delay. For example, a value of 200% would indicate that the wait for the part is two times greater than the wait for the crew.<br />
<br />
===Downtime Summary===<br />
====Non-Waiting Time====<br />
Time that the block was undergoing active maintenance/inspection by a crew. If no crew is defined, then this will return zero.<br />
<br />
====Waiting for Opportunity====<br />
The total downtime for the block due to crew conflicts (i.e., time spent waiting for a crew while the crew is busy with another task). If no crew is defined, then this will return zero. <br />
<br />
====Waiting for Crew====<br />
The total downtime for the block due to crew wait times (i.e., time spent waiting for a crew due to logistical delay). If no crew is defined, then this will return zero. <br />
<br />
====Waiting for Parts====<br />
The total downtime for the block due to spare part wait times. If no spare part pool is defined then this will return zero. <br />
<br />
====Other Results of Interest====<br />
The remaining component (block) results are similar to those defined for the system with the exception that now they apply only to the component.<br />
<br />
=Imperfect Repairs= <!-- THIS SECTION HEADER IS LINKED TO: http://help.synthesis8.com/rcm8/tasks.htm. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK. --><br />
{{:Imperfect Repairs}}<br />
<br />
=Using Resources: Pools and Crews=<br />
In order to make the analysis more realistic, one may wish to consider additional sources of delay times in the analysis or study the effect of limited resources. In the prior examples, we used a repair distribution to identify how long it takes to restore a component. The factors that one chooses to consider in this time may include the time it takes to do the repair and/or the time it takes to get a crew, a spare part, etc. While all of these factors may be included in the repair duration, optimized usage of these resources can only be achieved if the resources are studied individually and their dependencies are identified.<br />
<br />
As an example, consider the situation where two components in parallel fail at the same time and only a single repair person is available. Because this person would not be able to execute the repair on both components simultaneously, an additional delay will be encountered that also needs to be included in the modeling. One way to accomplish this is to assign a specific repair crew to each component.<br />
<br />
===Including Crews===<br />
<br />
BlockSim allows you to assign one or more maintenance crews to each component from the Maintenance Task Properties window. Note that there may be different crews for each action (i.e., corrective, preventive, on condition and inspection).<br />
<br />
A crew record needs to be defined for each named crew, as shown in the picture below. The basic properties for each crew include factors such as:<br />
<br><br />
* Logistic delays. How long does it take for the crew to arrive?<br />
* Is there a limit to the number of tasks this crew can perform at the same time? If yes, how many simultaneous tasks can the crew perform?<br />
* What is the cost per hour for the crew?<br />
* What is the cost per incident for the crew?<br />
<br />
[[Image:8.16.png|center|518px|link=]]<br />
<br />
===Illustrating Crew Use===<br />
To illustrate the use of crews in BlockSim, consider the deterministic scenario described by the following RBD and properties.<br />
<br />
[[Image:r12.png|center|350px|link=]]<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
! Unit<br />
! Failure<br />
! Repair<br />
! Crew<br />
|-<br />
| <math>A\,\!</math><br />
| <math>100\,\!</math><br />
| <math>10\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>B\,\!</math><br />
| <math>120\,\!</math><br />
| <math>20\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>C\,\!</math><br />
| <math>140\,\!</math><br />
| <math>20\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|-<br />
| <math>D\,\!</math><br />
| <math>160\,\!</math><br />
| <math>10\,\!</math><br />
| Crew <math>A\,\!</math> : Delay = 20, Single Task<br />
|}<br />
<br />
<br />
[[Image:BS8.17.png|center|600px|link=]]<br />
<br />
As shown in the figure above, the System Up/Down plot illustrates the sequence of events, which are:<br />
<br />
::#At 100, <math>A\,\!</math> fails. It takes 20 to get the crew and 10 to repair, thus the component is repaired by 130. The system is failed/down during this time. <br />
::#At 150, <math>B\,\!</math> fails since it would have accumulated an operating age of 120 by this time. It again has to wait for the crew and is repaired by 190. <br />
::#At 170, <math>C\,\!</math> fails. Upon this failure, <math>C\,\!</math> requests the only available crew. However, this crew is currently engaged by <math>B\,\!</math> and, since the crew can only perform one task at a time, it cannot respond immediately to the request by <math>C\,\!</math>. Thus, <math>C\,\!</math> will remain failed until the crew becomes available. The crew will finish with unit <math>B\,\!</math> at 190 and will then be dispatched to <math>C\,\!</math>. Upon dispatch, the logistic delay will again be considered and <math>C\,\!</math> will be repaired by 230. The system continues to operate until the failures of <math>B\,\!</math> and <math>C\,\!</math> overlap (i.e., the system is down from 170 to 190).<br />
::#At 210, <math>D\,\!</math> fails. It again has to wait for the crew and repair.<br />
::#<math>D\,\!</math> is up at 260.<br />
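The queueing behavior in this sequence can be reproduced with a minimal first-come, first-served sketch (failure times taken from the narrative above; this is illustrative code, not BlockSim's engine):<br />

```python
def single_crew_schedule(requests, logistic_delay):
    """FCFS single-task crew: each job waits until the crew is free, then
    incurs the logistic delay plus its repair time.
    requests: list of (name, failure_time, repair_time), sorted by time."""
    crew_free = 0.0
    completions = {}
    for name, fail_time, repair_time in requests:
        start = max(fail_time, crew_free)          # wait if the crew is busy
        crew_free = start + logistic_delay + repair_time
        completions[name] = crew_free
    return completions

# Failure times and repair durations from the worked example:
jobs = [("A", 100, 10), ("B", 150, 20), ("C", 170, 20), ("D", 210, 10)]
print(single_crew_schedule(jobs, 20))  # {'A': 130, 'B': 190, 'C': 230, 'D': 260}
```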
The following figure shows an example of some of the possible crew results (details), which are presented next. <br />
<br />
[[Image:BS8.18.png|thumb|center|500px|Crew results shown in the BlockSim's Simulation Results Explorer.|link=]]<br />
<br />
====Explanation of the Crew Details====<br />
::#Each request made to a crew is logged. <br />
::#If a request is successful (i.e., the crew is available), the call is logged once in the Calls Received counter and once in the Accepted Calls counter. <br />
::#If a request is not accepted (i.e., the crew is busy), the call is logged once in the Calls Received counter and once in the Rejected Calls counter. When the crew is free and can be called upon again, the call is logged once in the Calls Received counter and once in the Accepted Calls counter.<br />
::#In this scenario, there were two instances when the crew was not available, Rejected Calls = 2, and there were four instances when the crew performed an action, Calls Accepted = 4, for a total of six calls, Calls Received = 6.<br />
::#Percent Accepted and Percent Rejected are the ratios of calls accepted and calls rejected with respect to the total calls received.<br />
::#Total Utilization is the total time that the crew was used. It includes both the time required to complete the repair action and the logistic time. In this case, this is 140, or: <br />
<br />
::<math>\begin{align}<br />
{{T}_{{{R}_{A}}}}= & 10,{{T}_{{{L}_{A}}}}=20 \\ <br />
{{T}_{{{R}_{B}}}}= & 20,{{T}_{{{L}_{B}}}}=20 \\ <br />
{{T}_{{{R}_{C}}}}= & 20,{{T}_{{{L}_{C}}}}=20 \\ <br />
{{T}_{{{R}_{D}}}}= & 10,{{T}_{{{L}_{D}}}}=20 \\ <br />
{{T}_{U}}= & \left( {{T}_{{{R}_{A}}}}+{{T}_{{{L}_{A}}}} \right)+\left( {{T}_{{{R}_{B}}}}+{{T}_{{{L}_{B}}}} \right) \\ <br />
& +\left( {{T}_{{{R}_{C}}}}+{{T}_{{{L}_{C}}}} \right)+\left( {{T}_{{{R}_{D}}}}+{{T}_{{{L}_{D}}}} \right) \\ <br />
{{T}_{U}}= & 140 <br />
\end{align}\,\!</math><br />
<br />
:::7. Average Call Duration is the average duration of each crew usage, and it also includes both logistic and repair time. It is the total usage divided by the number of accepted calls. In this case, this is 35.<br />
:::8. Total Wait Time is the time that blocks in need of a repair waited for this crew. In this case, it is 40 (<math>C\,\!</math> and <math>D\,\!</math> both waited 20 each).<br />
:::9. Total Crew Costs are the total costs for this crew. It includes the per incident charge as well as the per unit time costs. In this case, this is 180. There were four incidents at 10 each for a total of 40, as well as 140 time units of usage at 1 cost unit per time unit.<br />
:::10. Average Cost per Call is the total cost divided by the number of accepted calls. In this case, this is 45.<br />
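Using the example's four accepted calls, the utilization and cost figures above reduce to simple sums and ratios (the cost rates of 1 per time unit and 10 per incident are those stated in the text):<br />

```python
def crew_summary(calls, cost_per_hour, cost_per_incident):
    """calls: list of (logistic_time, repair_time) for each accepted call."""
    total_utilization = sum(l + r for l, r in calls)
    n = len(calls)
    total_cost = total_utilization * cost_per_hour + n * cost_per_incident
    return {
        "Total Utilization": total_utilization,
        "Average Call Duration": total_utilization / n,
        "Total Crew Costs": total_cost,
        "Average Cost per Call": total_cost / n,
    }

# Four accepted calls from the example (logistic 20 each; repairs 10,20,20,10):
s = crew_summary([(20, 10), (20, 20), (20, 20), (20, 10)], 1, 10)
print(s)  # utilization 140, avg duration 35.0, costs 180, avg cost 45.0
```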
<br />
Note that crew costs that are attributed to individual blocks can be obtained from the Blocks reports, as shown in the figure below. <br />
<br />
[[Image:BS8.19.png|thumb|center|650px|Allocation of crew costs.|link=]]<br />
<br />
====How BlockSim Handles Crews====<br />
::#Crew logistic time is added to each repair time. <br />
::#The logistic time is always present, and the same, regardless of where the crew was called from (i.e., whether the crew was at another job or idle at the time of the request).<br />
::#For any given simulation, each crew's logistic time is constant (taken from the distribution) across that single simulation run regardless of the task (CM, PM or inspection).<br />
::#A crew can perform either a finite number of simultaneous tasks or an infinite number. <br />
::#If the finite limit of tasks is reached, the crew will not respond to any additional request until the number of tasks the crew is performing is less than its finite limit.<br />
::#If a crew is not available to respond, the component will "wait" until a crew becomes available.<br />
::#BlockSim maintains the queue of rejected calls and will dispatch the crew to the next repair on a "first come, first served" basis.<br />
::#Multiple crews can be assigned to a single block (see overview in the next section).<br />
::#If no crew has been assigned for a block, it is assumed that no crew restrictions exist and a default crew is used. The default crew can perform an infinite number of simultaneous tasks and has no delays or costs.<br />
<br />
====Looking at Multiple Crews====<br />
Multiple crews may be available to perform maintenance for a particular component. When multiple crews have been assigned to a block in BlockSim, the crews are assigned to perform maintenance based on their order in the crew list, as shown in the figure below.<br />
<br />
[[Image:r23.png||thumb|center|500px|A single component with two corrective maintenance crews assigned to it.|link=]]<br />
<br />
In the case where more than one crew is assigned to a block, and if the first crew is unavailable, then the next crew is called upon and so forth. As an example, consider the prior case but with the following modifications (i.e., Crews <math>A\,\!</math> and <math>B\,\!</math> are assigned to all blocks):<br />
<br />
[[Image:r8.png|center|400px|link=]]<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
! Unit<br />
! Failure<br />
! Repair<br />
! Crew<br />
|-<br />
| <math>A\,\!</math><br />
| <math>100\,\!</math> <br />
| <math>10\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>B\,\!</math><br />
| <math>120\,\!</math> <br />
| <math>20\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>C\,\!</math><br />
| <math>140\,\!</math><br />
| <math>20\,\!</math><br />
| <math>A,B\,\!</math><br />
|-<br />
| <math>D\,\!</math><br />
| <math>160\,\!</math><br />
| <math>10\,\!</math><br />
| <math>A,B\,\!</math><br />
|}<br />
<br />
<br />
{| border="1" align="center" style="border-collapse: collapse;" cellpadding="5" cellspacing="5"<br />
|-<br />
| Crew <math>A\,\!</math> ; Delay = 20, Single Task<br />
|-<br />
| Crew <math>B\,\!</math> ; Delay = 30, Single Task<br />
|}<br />
<br />
<br />
The system would behave as shown in the figure below.<br />
<br />
[[Image:r13.png|center|550px|link=]]<br />
<br />
In this case, Crew <math>B\,\!</math> was used for the <math>C\,\!</math> repair since Crew <math>A\,\!</math> was busy. On all others, Crew <math>A\,\!</math> was used. It is very important to note that once a crew has been assigned to a task it will complete the task. For example, if we were to change the delay time for Crew <math>B\,\!</math> to 100, the system behavior would be as shown in the figure below.<br />
<br />
[[Image:r14.png|center|550px|System up/down plot with the delay time for Crew B changed to 100.|link=]]<br />
<br />
In other words, even though Crew <math>A\,\!</math> would have finished the repair on <math>C\,\!</math> more quickly if it had been available when originally called, <math>B\,\!</math> was assigned the task because <math>A\,\!</math> was not available at the instant that the crew was needed.<br />
<br />
===Additional Rules on Crews===<br />
<br />
::1. If all assigned crews are engaged, the next crew that will be chosen is the crew that can get there first. <br />
:::a) This accounts for the time it would take a particular crew to complete its current task (or all tasks in its queue) and its logistic time.<br />
::2. If a crew is available, it gets used regardless of what its logistic delay time is. <br />
:::a) In other words, if a crew with a shorter logistic time is busy, but almost done, and another crew with a much higher logistic time is currently free, the free one will get assigned to the task.<br />
::3. For each simulation each crew's logistic time is computed (taken randomly from its distribution or its fixed time) at the beginning of the simulation and remains constant across that one simulation for all actions (CM, PM and inspection).<br />
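The crew-selection rules above can be sketched in Python. This is a minimal illustration under assumed names, not BlockSim's actual implementation:

```python
# Hypothetical sketch of the crew-selection rules (all names are illustrative).
def select_crew(crews, now):
    """Rule 2: any free crew is used, regardless of its logistic delay.
    Rule 1: if all crews are busy, pick the crew that can get there first,
    i.e., the time to finish its current work plus its logistic delay."""
    free = [c for c in crews if c["busy_until"] <= now]
    if free:
        # Ties among free crews go to the first crew in the assigned list.
        return free[0]
    return min(crews, key=lambda c: max(c["busy_until"], now) + c["delay"])

# Crews A and B from the example, with logistic delays of 20 and 30.
crews = [
    {"name": "A", "delay": 20, "busy_until": 0},
    {"name": "B", "delay": 30, "busy_until": 0},
]
print(select_crew(crews, 0)["name"])  # both free -> "A"
```

Rule 3 (logistic times drawn once per simulation) is reflected here by treating each crew's delay as a constant.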
<br />
===Using Spare Part Pools===<br />
<br />
BlockSim also allows you to specify spare part pools (or depots). Spare part pools allow you to model and manage spare part inventory and study the effects associated with limited inventories. Each component can have a spare part pool associated with it. If a spare part pool has not been defined for a block, BlockSim's analysis assumes a default pool of infinite spare parts. To speed up the simulation, no details on pool actions are kept during the simulation if the default pool is used.<br />
<br />
Pools allow you to define multiple aspects of the spare part process, including stock levels, logistic delays and restock options. Every time a part is repaired under a CM or scheduled action (PM, OC and Inspection), a spare part is obtained from the pool. If a part is available in the pool, it is then used for the repair. Spare part pools perform their actions based on the simulation clock time. <br />
<br />
====Spare Properties====<br />
<br />
A spare part pool is identified by a name. The general properties of the pool are its stock level (must be greater than zero), cost properties and logistic delay time. If a part is available (in stock), the pool will dispense that part to the requesting block after the specified logistic time has elapsed. One needs to think of a pool as an independent entity. It accepts requests for parts from blocks and dispenses them to the requesting blocks after a given logistic time. Requests for spares are handled on a first come, first served basis. In other words, if two blocks request a part and only one part is in stock, the first block that made the request will receive the part. Blocks request parts from the pool immediately upon the initiation of a CM or scheduled event (PM, OC and Inspection).<br />
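The first-come, first-served dispensing behavior can be sketched as follows. This is a simplified model under assumed names; the real pool also handles restocks and emergency orders:

```python
from collections import deque

# Illustrative sketch of a spare part pool dispensing parts on a
# first-come, first-served basis after a fixed logistic delay.
class SparePool:
    def __init__(self, stock, logistic_delay):
        self.stock = stock
        self.logistic_delay = logistic_delay
        self.queue = deque()  # blocks waiting for a part, in request order

    def request(self, block, now):
        """A block requests a part; return the time it receives one,
        or None if it must wait for a restock."""
        self.queue.append(block)
        if self.stock > 0 and self.queue[0] is block:
            self.stock -= 1
            self.queue.popleft()
            return now + self.logistic_delay
        return None

pool = SparePool(stock=1, logistic_delay=10)
print(pool.request("Block A", now=100))  # first request gets the part at 110
print(pool.request("Block B", now=105))  # pool is empty -> Block B waits
```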
<br />
====Restocking the Pool====<br />
<br />
If the pool has a finite number of spares, restock actions may be incorporated. The figure below shows the restock properties. Specifically, a pool can restock itself either through a scheduled restock action or based on specified conditions.<br />
<br />
[[Image:BS8.24.png|center|500px|link=]]<br />
<br />
A scheduled restock action adds a set number of parts to the pool on a predefined scheduled part arrival time. For the settings in the figure above, one spare part would be added to the pool every 100 hours, based on the system (simulation) time. In other words, for a simulation of 1,000 hours, a spare part would arrive at 100 hours, 200 hours, etc. The part is available to the pool immediately after the restock action and without any logistic delays. <br />
<br />
In an on-condition restock, a restock action is initiated when the stock level reaches (or is below) a specified value. In the figure above, five parts are ordered when the stock level reaches 0. Note that unlike the scheduled restock, parts added through an on-condition restock become available after a specified logistic delay time. In other words, in a scheduled restock the parts are pre-ordered and arrive when needed, whereas in an on-condition restock the parts are ordered when the condition occurs and thus arrive after the specified time. For on-condition restocks, the condition is triggered if and only if the stock level drops to or below the specified stock level, regardless of how the spares arrived at the pool or were distributed by the pool. In addition, the restock trigger value must be less than the initial stock.<br />
<br />
Lastly, a maximum capacity can be assigned to the pool. This maximum capacity must be equal to or greater than the initial stock. Once the limit is reached, no more items are added to the pool and any further restock actions are ignored. For example, if the pool has a maximum capacity of ten and a current stock level of eight, and a restock action is set to add five items to the pool, then only two will be accepted.<br />
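The capacity-clipping rule amounts to a small calculation, sketched here with illustrative names:

```python
# Minimal sketch of restocking with a maximum pool capacity.
def restock(stock, quantity, max_capacity):
    """Add up to `quantity` parts, clipping at the pool's maximum capacity.
    Returns the new stock level and the number of parts actually accepted."""
    accepted = min(quantity, max_capacity - stock)
    return stock + accepted, accepted

# The example from the text: capacity 10, stock 8, restock of 5 -> 2 accepted.
print(restock(stock=8, quantity=5, max_capacity=10))  # -> (10, 2)
```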
<br />
====Obtaining Emergency Spares====<br />
<br />
Emergency restock actions can also be defined. The figure below illustrates BlockSim's Emergency Spare Provisions options. An emergency action is triggered only when a block requests a spare and the part is not currently in stock. This is the only trigger condition. It does not account for whether a part has been ordered or if one is scheduled to arrive. Emergency spares are ordered when the condition is triggered and arrive after a time equal to the required time to obtain emergency spare(s).<br />
<br />
[[Image:BS8.25.png|center|500px|link=]]<br />
<br />
===Summary of Rules for Spare Part Pools===<br />
<br />
The following rules summarize some of the logic when dealing with spare part pools. <br />
<br />
====Basic Logic Rules====<br />
<br />
::1. '''Queue Based''': Requests for spare parts from blocks are queued and executed on a "first come, first served" basis.<br />
::2. '''Emergency''': Emergency restock actions are performed only when a part is not available.<br />
::3. '''Scheduled Restocks''': Scheduled restocks are added instantaneously to the pool at the scheduled time.<br />
::4. '''On-Condition Restock''': On-condition restock happens when the specified condition is reached (e.g., when the stock drops to two or if a request is received for a part and the stock is below the restock level).<br />
:::a) For example, if a pool has three items in stock and it dispenses one, an on-condition restock is initiated the instant that the request is received (without regard to the logistic delay time). The restocked items will be available after the required time for stock arrival has elapsed.<br />
:::b) The way that this is defined allows for the possibility of multiple restocks. Specifically, every time a part needs to be dispensed and the stock is lower than the specified quantity, parts are ordered. In the case of a long logistic delay time, it is possible to have multiple re-orders in the queue.<br />
::5. '''Parts Become Available after Spare Acquisition Logistic Delay''': If there is a spare acquisition logistic time delay, the requesting block will get the part after that delay. <br />
:::a) For example, if a block with a repair duration of 10 fails at 100 and requests a part from a pool with a logistic delay time of 10, that block will not be up until 120.<br />
::6. '''Compound Delays''': If a part is not available and an emergency part (or another part) can be obtained, then the total wait time for the part is the sum of both the logistic time and the required time to obtain a spare.<br />
::7. '''First Available Part is Dispensed to the First Block in the Queue''': The pool will dispense a requested part if it has one in stock or when it becomes available, regardless of what action (i.e., as needed restock or emergency restock) that request may have initiated. <br />
:::a) For example, if Block A requests a part from a pool and that triggers an emergency restock action, but a part arrives before the emergency restock through another action (e.g., scheduled restock), then the pool will dispense the newly arrived part to Block A (if Block A is next in the queue to receive a part).<br />
::8. '''Blocks that Trigger an Action Get Charged with the Action''': A block that triggers an emergency restock is charged for the additional cost to obtain the emergency part, even if it does not use an emergency part (i.e., even if another part becomes available first).<br />
::9. '''Triggered Action Cannot be Canceled.''' If a block triggers a restock action but then receives a part from another source, the action that the block triggered is not canceled.<br />
:::a) For example, if Block A initiates an emergency restock action but was then able to use a part that became available through other actions, the emergency request is not canceled and an emergency spare part will be added to the pool's stock level. <br />
:::b) Another way to explain this is by looking at the part acquisition logistic times as transit times. Because an ordered part is en-route to you after you order it, you will receive it regardless of whether the conditions have changed and you no longer need it.<br />
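Rules 5 and 6 above reduce to simple time arithmetic, sketched here under assumed names:

```python
# Sketch of rules 5 and 6: the requesting block comes back up only after
# the spare-acquisition delays plus the repair duration.
def block_up_time(fail_time, repair_duration, pool_logistic_delay,
                  emergency_time=0):
    """Rule 5: the part arrives after the pool's logistic delay.
    Rule 6: if an emergency part is needed, the delays compound."""
    part_arrival = fail_time + pool_logistic_delay + emergency_time
    return part_arrival + repair_duration

# Rule 5a example from the text: fails at 100, repair 10, logistic delay 10.
print(block_up_time(100, 10, 10))  # -> 120
# Rule 6: an emergency acquisition time is added on top of the logistic time.
print(block_up_time(100, 10, 10, emergency_time=24))  # -> 144
```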
<br />
===Simultaneous Dispatch of Crews and Parts Logic===<br />
<br />
Some special rules apply when a block has both logistic delays in acquiring parts from a pool and when waiting for crews. BlockSim dispatches requests for crews and spare parts simultaneously. The repair action does not start until both crew and part arrive, as shown next.<br />
<br />
[[Image:r18.png|center|400px|link=]]<br />
<br />
If a crew arrives and it has to wait for a part, then this time (and cost) is added to the crew usage time.<br />
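The simultaneous-dispatch rule can be expressed as a sketch (illustrative names): the repair starts at the later of the two arrivals, and any crew idle time waiting for the part is charged to crew usage.

```python
# Sketch of simultaneous dispatch: crew and part are requested at the same
# time, and the repair starts only when both have arrived.
def repair_window(fail_time, crew_delay, part_delay, repair_duration):
    crew_arrival = fail_time + crew_delay
    part_arrival = fail_time + part_delay
    start = max(crew_arrival, part_arrival)  # repair waits for both
    crew_wait = start - crew_arrival         # idle time charged to the crew
    return start, start + repair_duration, crew_wait

start, end, crew_wait = repair_window(100, 10, 25, 20)
print(start, end, crew_wait)  # -> 125 145 15
```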
<br />
===Example Using Both Crews and Pools===<br />
<br />
Consider the following example, using both crews and pools.<br />
<br />
[[Image:r19.png|center|300px|link=]]<br />
<br />
where:<br />
<br />
[[Image:r20.png|center|400px|link=]]<br />
<br />
And the crews are:<br />
<br />
[[Image:r21.png|center|400px|link=]]<br />
<br />
While the spare pool is: <br />
<br />
[[Image:r22.png|center|500px|link=]]<br />
<br />
The behavior of this system from 0 to 300 is shown graphically in the figure below.<br />
<br />
[[Image:8.26.png|center|600px|link=]]<br />
<br />
The discrete system events during that time are as follows:<br />
<br />
::1. Component <math>A\,\!</math> fails at 100 and Crew <math>A\,\!</math> is engaged. <br />
<br />
:::a) At 110, Crew <math>A\,\!</math> arrives and completes the repair by 120. <br />
:::b) This repair uses the only spare part in inventory and triggers an on-condition restock. A part is ordered and is scheduled to arrive at 160.<br />
:::c) A scheduled restock part is also set to arrive at 150.<br />
:::d) Pool [on-hand = 0, pending: 150, 160].<br />
::2. Component <math>B\,\!</math> fails at 121. Crew <math>A\,\!</math> is available and it is engaged. <br />
:::a) Crew <math>A\,\!</math> arrives by 131 but no part is available. <br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part was ordered and is scheduled to arrive at 181.<br />
:::c) Pool [on-hand = 0, pending: 150, 160, 181].<br />
:::d) At 150, the first part arrives and is used by Component <math>B\,\!</math>.<br />
:::e) Repair on Component <math>B\,\!</math> is completed 20 time units later, at 170.<br />
:::f) Pool [on-hand=0, pending: 160, 181].<br />
::3. Component <math>C\,\!</math> fails at 122. Crew <math>A\,\!</math> is already engaged by Component <math>B\,\!</math>, thus Crew <math>B\,\!</math> is engaged. <br />
:::a) Crew <math>B\,\!</math> arrives at 137 but no part is available.<br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 182.<br />
:::c) Pool [on-hand = 0, pending: 160, 181,182].<br />
:::d) At 160, the part arrives and Component <math>C\,\!</math> is repaired by 180. <br />
:::e) Pool [on-hand = 0, pending: 181,182].<br />
::4. Component <math>F\,\!</math> fails at 123. No crews are available until 170 when Crew <math>A\,\!</math> becomes available.<br />
:::a) Crew <math>A\,\!</math> arrives by 180 and has to wait for a part.<br />
:::b) The failure found the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 183.<br />
:::c) Pool [on-hand = 0, pending: 181,182, 183].<br />
:::d) At 181, a part is obtained.<br />
:::e) By 201, the repair is completed.<br />
:::f) Pool [on-hand = 0, pending: 182, 183]<br />
::5. Component <math>D\,\!</math> fails at 171 with no crew available. <br />
:::a) Crew <math>B\,\!</math> becomes available at 180 and arrives by 195. <br />
:::b) The failure finds the pool with no parts, triggering the on-condition restock. A part is ordered and is scheduled to arrive at 231.<br />
:::c) The next part becomes available at 182 and the repair is completed by 205.<br />
:::d) Pool [on-hand = 0, pending: 183, 231]<br />
::6. End time is at 300. The last scheduled part arrives at the pool at 300.<br />
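A few of the timings in the walkthrough above can be re-derived from the simultaneous-dispatch rule. This is only a cross-check; the part-arrival times are taken as given from the example:

```python
# Repair completes after both the crew and the part are on hand, plus the
# repair duration (a sketch; part-arrival times are taken from the example).
def repair_done(fail_time, crew_delay, part_ready, repair_duration):
    start = max(fail_time + crew_delay, part_ready)
    return start + repair_duration

print(repair_done(100, 10, 100, 10))  # Component A: completed at 120
print(repair_done(121, 10, 150, 20))  # Component B: completed at 170
print(repair_done(122, 15, 160, 20))  # Component C: completed at 180
```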
<br />
=Using Maintenance Tasks=<br />
One of the most important benefits of simulation is the ability to define how and when actions are performed. In our case, the actions of interest are part repairs/replacements. This is accomplished in BlockSim through the use of maintenance tasks. Specifically, four different types of tasks can be defined for maintenance actions: corrective maintenance, preventive maintenance, on condition maintenance and inspection.<br />
<br><br />
<br />
===Corrective Maintenance Tasks===<br />
A corrective maintenance task defines when a corrective maintenance (CM) action is performed. The figure below shows a corrective maintenance task assigned to a block in BlockSim. Corrective actions will be performed either immediately upon failure of the item or upon finding that the item has failed (for hidden failures that are not detected until an inspection). BlockSim allows the selection of either category. <br />
*'''Upon item failure''': The CM action is initiated immediately upon failure. If the user doesn't specify the choice for a CM, then this is the default option. All prior examples were based on the instruction to perform a CM upon failure. <br />
*'''When found failed during an Inspection''': The CM action will only be initiated after an inspection is done on the failed component. How and when the inspections are performed is defined by the block's inspection properties. This has the effect of defining a dependency between the corrective maintenance task and the inspection task.<br />
<br />
<br />
[[Image:r23.png|center|500px|link=]]<br />
<br />
<div class="noprint"><br />
{{Examples Box|BlockSim Examples|<p>More application examples are available! See also:</p> {{Examples Link|BlockSim_Example:_CM_Triggered_by_Subsystem_Down|CM Triggered by Subsystem Down}}}}<br />
</div><br />
<br />
===Scheduled Tasks===<br />
Scheduled tasks can be performed on a known schedule, which can be based on any of the following:<br />
* A time interval, either fixed or dynamic, based on the item's age (item clock) or on calendar time (system clock). See [[#Item and System Ages|Item and System Ages]].<br />
* The occurrence of certain events, including:<br />
**The system goes down. <br />
**Certain events happen in a maintenance group. The events and groups are user-specified, and the item that the task is assigned to does not need to be part of the selected maintenance group(s).<br />
<br />
The types of scheduled tasks include:<br />
*Inspection tasks<br />
*Preventive maintenance tasks<br />
*On condition tasks<br />
<br />
====Item and System Ages====<br />
It is important to keep in mind that the system and each component of the system maintain separate clocks within the simulation. When setting intervals to perform a scheduled task, the intervals can be based on either type of clock. Specifically:<br />
*Item age refers to the accumulated age of the block, which gets adjusted each time the block is repaired (i.e., restored). If the block is repaired at least once during the simulation, this will be different from the elapsed simulation time. For example, if the restoration factor is 1 (i.e., “as good as new”) and the assigned interval is 100 days based on item age, then the task will be scheduled to be performed for the first time at 100 days of elapsed simulation time. However, if the block fails at 85 days and it takes 5 days to complete the repair, then the block will be fully restored at 90 days and its accumulated age will be reset to 0 at that point. Therefore, if another failure does not occur in the meantime, the task will be performed for the first time 100 days later at 190 days of elapsed simulation time.<br />
<br />
[[Image:Updown_item_age.png|center|450px|link=]]<br />
<br />
*Calendar time refers to the elapsed simulation time. If the assigned interval is 100 days based on calendar time, then the task will be performed for the first time at 100 days of elapsed simulation time, for the second time at 200 days of elapsed simulation time and so on, regardless of whether the block fails and gets repaired correctively between those times.<br />
<br />
[[Image:Updown_system_age.png|center|450px|link=]]<br />
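The difference between the two clocks can be sketched as follows. This is a simplified illustration under assumed names, ignoring tasks that themselves consume time:

```python
# Sketch contrasting the item clock and the calendar (system) clock.
def next_task_time(interval, clock, sim_time, item_age):
    """`clock` is "item" (block's accumulated age, reset on restoration)
    or "calendar" (elapsed simulation time, unaffected by repairs)."""
    if clock == "calendar":
        # Fires at fixed multiples of the interval in simulation time.
        return sim_time + (interval - sim_time % interval)
    # Item clock: fires when the block accumulates `interval` of age.
    return sim_time + (interval - item_age)

# Example from the text: 100-day interval; the block is fully restored
# (age reset to 0) at 90 days of elapsed simulation time.
print(next_task_time(100, "item", sim_time=90, item_age=0))      # -> 190
print(next_task_time(100, "calendar", sim_time=90, item_age=0))  # -> 100
```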
<br />
====Inspection Tasks====<br />
Like all scheduled tasks, inspections can be performed based on a time interval or upon certain events. Inspections can be specified to bring the item or system down or not.<br />
<br />
====Preventive Maintenance Tasks====<br />
The figure below shows the options available in a preventive maintenance (PM) task within BlockSim. PMs can be performed based on a time interval or upon certain events. Because PM tasks always bring the item down, one can also specify whether preventive maintenance will be performed if the task brings the system down.<br />
<br />
[[Image:r25.png|center|556px|link=]]<br />
<br />
====On Condition Tasks====<br />
On condition maintenance relies on the capability to detect failures before they happen so that preventive maintenance can be initiated. If, during an inspection, maintenance personnel find evidence that the equipment is approaching the end of its life, then it may be possible to delay the failure, prevent it from happening or replace the equipment at the earliest convenience rather than allowing the failure to occur and possibly cause severe consequences. In BlockSim, on condition tasks consist of an inspection task that triggers a preventive task when an impending failure is detected during inspection. <br />
=====Failure Detection=====<br />
Inspection tasks can be used to check for indications of an approaching failure. BlockSim models when an approaching failure becomes detectable upon inspection using two properties: the failure detection threshold and the P-F interval. The failure detection threshold is a number between 0 and 1 indicating the fraction of an item's life that must elapse before an approaching failure can be detected. For instance, if the failure detection threshold is set to 0.8, then the failure of a component can be detected only during the last 20% of its life. If an inspection occurs during this time, the approaching failure is detected and the inspection triggers a preventive maintenance task to take the necessary precautions, delaying the failure by either repairing or replacing the component.<br />
<br />
The P-F interval is the amount of time before the failure of a component during which the approaching failure can be detected by an inspection. It represents the warning period that spans from P (when a potential failure can be detected) to F (when the failure occurs). If the P-F interval is set to 200 hours, then the approaching failure of the component can be detected only within the 200 hours preceding the failure. Thus, if a component has a fixed life of 1,000 hours and the P-F interval is set to 200 hours, an inspection that occurs at or beyond 800 hours detects the approaching failure that is to occur at 1,000 hours, and a preventive maintenance task is triggered to take action against this failure.<br />
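Both properties define the same kind of detection window, differing only in whether it is specified as a fraction of life or as an absolute time. A sketch, with illustrative names:

```python
# Sketch of the two detection models: both yield the age at which an
# approaching failure becomes detectable by an inspection.
def detection_window(life, threshold=None, pf_interval=None):
    """Return the component age from which detection is possible."""
    if threshold is not None:
        return threshold * life   # last (1 - threshold) fraction of life
    return life - pf_interval     # the P-F interval before the failure

# Threshold 0.8 on a 1,000-hour life: detectable during the last 20% of life.
print(detection_window(1000, threshold=0.8))    # -> 800.0
# P-F interval of 200 hours on the same life: also detectable from 800 on.
print(detection_window(1000, pf_interval=200))  # -> 800
```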
<br />
=====Rules for On Condition Tasks=====<br />
<br />
*An inspection that finds a block at or beyond the failure detection threshold or within the range of the P-F interval will trigger the associated preventive task as long as preventive maintenance can be performed on that block.<br />
<br />
*If a non-downing inspection triggers a preventive maintenance action because the failure detection threshold or P-F interval range was reached, no other maintenance task will be performed between the inspection and the triggered preventive task; tasks that would otherwise have happened at that time due to system age, system down or group maintenance will be ignored.<br />
<br />
*A preventive task that would have been triggered by a non-downing inspection will not happen if the block fails during the inspection, as corrective maintenance will take place instead.<br />
<br />
*If a failure will occur within the failure detection threshold or P-F interval set for the inspection, but the preventive task is only supposed to be performed when the system is down, the simulation waits until the requirements of the preventive task are met to perform the preventive maintenance.<br />
<br />
*If the on condition inspection triggers the preventive maintenance part of the task, the simulation assumes that the maintenance crew will forego any routine servicing associated with the inspection part of the task. In other words, the restoration will come from the preventive maintenance, so any restoration factor defined for the inspection will be ignored in these circumstances.<br />
<br />
=====Example Using P-F Interval=====<br />
<br />
To illustrate the use of the P-F interval in BlockSim, consider a component <math>A\,\!</math> that fails every 700 <math>tu\,\!</math>. The corrective maintenance on this equipment takes 100 <math>tu\,\!</math> to complete, while the preventive maintenance takes 50 <math>tu\,\!</math> to complete. Both the corrective and preventive maintenance actions have a type II restoration factor of 1. Inspection tasks of 10 <math>tu\,\!</math> duration are performed on the component every 300 <math>tu\,\!</math>. There is no restoration of the component during the inspections. The P-F interval for this component is 100 <math>tu\,\!</math>.<br />
<br />
The component behavior from 0 to 2000 <math>tu\,\!</math> is shown in the figure below and described next.<br />
<br />
::#At 300 <math>tu\,\!</math> the first scheduled inspection of 10 <math>tu\,\!</math> duration occurs. At this time the age of the component is 300 <math>tu\,\!</math>. This inspection does not lie in the P-F interval of 100 <math>tu\,\!</math> (which begins at the age of 600 <math>tu\,\!</math> and ends at the age of 700 <math>tu\,\!</math>). Thus, no approaching failure is detected during this inspection.<br />
::#At 600 <math>tu\,\!</math> the second scheduled inspection of 10 <math>tu\,\!</math> duration occurs. At this time the age of the component is 590 <math>tu\,\!</math> (no age is accumulated during the first inspection from 300 <math>tu\,\!</math> to 310 <math>tu\,\!</math> as the component does not operate during this inspection). Again this inspection does not lie in the P-F interval. Thus, no approaching failure is detected during this inspection.<br />
::#At 720 <math>tu\,\!</math> the component fails after having accumulated an age of 700 <math>tu\,\!</math>. A corrective maintenance task of 100 <math>tu\,\!</math> duration occurs to restore the component to as-good-as-new condition.<br />
::#At 900 <math>tu\,\!</math> the third scheduled inspection occurs. At this time the age of the component is 80 <math>tu\,\!</math>. This inspection does not lie in the P-F interval (from age 600 <math>tu\,\!</math> to 700 <math>tu\,\!</math>). Thus, no approaching failure is detected during this inspection.<br />
::#At 1200 <math>tu\,\!</math> the fourth scheduled inspection occurs. At this time the age of the component is 370 <math>tu\,\!</math>. Again, this inspection does not lie in the P-F interval and no approaching failure is detected.<br />
::#At 1500 <math>tu\,\!</math> the fifth scheduled inspection occurs. At this time the age of the component is 660 <math>tu\,\!</math>, which lies in the P-F interval. As a result, an approaching failure is detected and the inspection triggers a preventive maintenance task. A preventive maintenance task of 50 <math>tu\,\!</math> duration occurs at 1510 <math>tu\,\!</math> to restore the component to as-good-as-new condition.<br />
::#At 1800 <math>tu\,\!</math> the sixth scheduled inspection occurs. At this time the age of the component is 240 <math>tu\,\!</math>. This inspection does not lie in the P-F interval (from age 600 <math>tu\,\!</math> to 700 <math>tu\,\!</math>) and no approaching failure is detected.<br />
<br />
[[Image:BS8.32.png|center|600px|link=]]<br />
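The detection logic of the walkthrough can be cross-checked with a short sketch: an inspection triggers the preventive task only if the component's age at inspection falls in the P-F window. The component ages are taken as given from the walkthrough:

```python
# Light sketch of the P-F example: life 700 tu, P-F interval 100 tu, so
# an inspection detects the approaching failure only for ages in [600, 700).
LIFE, PF = 700, 100

def detected(age):
    return LIFE - PF <= age < LIFE

# Component age at each of the six inspections, as derived above.
ages = {300: 300, 600: 590, 900: 80, 1200: 370, 1500: 660, 1800: 240}
triggers = [t for t, age in ages.items() if detected(age)]
print(triggers)  # -> [1500]  (only the fifth inspection triggers a PM)
```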
<br />
====Rules for PMs and Inspections====<br />
<br />
All the options available in the Maintenance task window were designed to maximize the modeling flexibility within BlockSim. However, maximizing the modeling flexibility introduces issues that you need to be aware of and requires you to carefully select options in order to assure that the selections do not contradict one another. One obvious case would be to define a PM action on a component in series (which will always bring the system down) and then assign a PM policy to the block that has the Do not perform maintenance if the action brings the system down option set. With these settings, no PMs will ever be performed on the component during the BlockSim simulation. The following sections summarize some issues and special cases to consider when defining maintenance properties in BlockSim.<br />
<br />
::#Inspections do not consume spare parts. However, an inspection can have a renewal effect on the component if the restoration factor is set to a number other than the default of 0.<br />
::#On the inspection tab, if Inspection brings system down is selected, this also implies that the inspection brings the item down.<br />
::#If a PM or an inspection is scheduled based on the item's age, then it will occur exactly when the item reaches that age. However, it is important to note that failed items do not age. Thus, if an item fails before it reaches that age, the action will not be performed. This means that if the item fails before the scheduled inspection (based on item age) and the CM is set to be performed upon inspection, the CM will never take place. The reason that this option is allowed in BlockSim is for the flexibility of specifying renewing inspections.<br />
::#Downtime due to a failure discovered during a non-downing inspection is included when computing results "w/o PM, OC & Inspections."<br />
::#If a PM upon item age is scheduled and is not performed because it brings the system down (based on the option in the PM task) the PM will not happen unless the item reaches that age again (after restoration by CM, inspection or another type of PM).<br />
::#If the CM task is upon inspection and a failed component is scheduled for PM prior to the inspection, the PM action will restore the component and the CM will not take place.<br />
::#In the case of simultaneous events, only one event is executed (the exception is a maintenance phase, in which all simultaneous events are executed in order). The following precedence order is used: 1) tasks based on intervals or upon the start of a maintenance phase; 2) tasks based on events in a maintenance group, where the triggering event applies to a block; 3) tasks based on system down; 4) tasks based on events in a maintenance group, where the triggering event applies to a subdiagram. Within these categories, order is determined according to the priorities specified in the URD (i.e., the higher the task is on the list, the higher the priority).<br />
::#The PM option of Do not perform if it brings the system down is only considered at the time that the PM needs to be initiated. If the system is down at that time, due to another item, then the PM will be performed regardless of any future consequences to the system up state. In other words, when the other item is fixed, it is possible that the system will remain down due to this PM action. In this case, the PM time difference is added to the system PM downtime. <br />
::#Downing events cannot overlap. If a component is down due to a PM and another PM is suggested based on another trigger, the second call is ignored.<br />
::#A non-downing inspection with a restoration factor restores the block based on the age of the block at the beginning of the inspection (i.e., duration is not restored). <br />
::#Non-downing events can overlap with downing events. If a non-downing inspection and a downing event happen concurrently, the non-downing event will be managed in parallel with the downing event.<br />
::#If a failure or PM occurs during a non-downing inspection and the CM or PM has a restoration factor and the inspection action has a restoration factor, then both restoration factors are used (compounded).<br />
::#A PM or inspection on system down is triggered only if the system was up at the time that the event brought the system down.<br />
::#A non-downing inspection with restoration factor of 0 does not affect the block.<br />
<br />
===Example===<br />
<br />
To illustrate the use of maintenance policies in BlockSim we will use the same example from [[Repairable_Systems_Analysis_Through_Simulation#Example_Using_Both_Crews_and_Pools|Example Using Both Crews and Pools]] with the following modifications (The figures below also show these settings): <br />
<br />
Blocks A and D: <br />
#Belong to the same group (Group 1).<br />
#Corrective maintenance actions are upon inspection (not upon failure) and the inspections are performed every 30 hours, based on system time. Inspections have a duration of 1 hour. Furthermore, unlimited free crews are available to perform the inspections.<br />
#Whenever either item gets a CM, the other one gets a PM.<br />
#The PM has a fixed duration of 10 hours.<br />
#The same crews are used for both corrective and preventive maintenance actions.<br />
<br />
[[Image:r29.png|center|650px| CM and Inspection settings for blocks A and D | link= ]]<br />
<br />
<br />
[[Image:r29b.png|center|650px| CM and Inspection settings for blocks A and D | link= ]]<br />
<br />
<br />
[[Image:r30.png|center|650px| PM settings for blocks A and D | link= ]]<br />
<br />
====System Overview====<br />
<br />
The item and system behavior from 0 to 300 hours is shown in the figure below and described next. <br />
<br />
[[Image:BS8.35.png|center|600px|link=]]<br />
<br />
::1. At 100, block <math>A\,\!</math> goes down and brings the system down. <br />
:::a) No maintenance action is performed since an upon inspection policy was used.<br />
:::b) The next scheduled inspection is at 120, thus Crew <math>A\,\!</math> is called to perform the maintenance by 121 (end of the inspection).<br />
::2. Crew <math>A\,\!</math> arrives and initiates the repair on <math>A\,\!</math> at 131.<br />
:::a) The only part in the pool is used and an on-condition restock is triggered.<br />
:::b) Pool [on-hand = 0, pending: 150 <math>_{s}\,\!</math>, 181].<br />
:::c) Block <math>A\,\!</math> is repaired by 141.<br />
::3. At the same time (121), a PM is initiated for block <math>D\,\!</math> because the PM task called for "PM upon the start of corrective maintenance on another group item."<br />
:::a) Crew <math>B\,\!</math> is called for block <math>D\,\!</math> and arrives at 136.<br />
:::b) No part is available until 150. An on-condition restock is triggered for 181.<br />
:::c) Pool [on-hand = 0, pending: 150 <math>_{s}\,\!</math>, 181, 181].<br />
:::d) At 150, a part becomes available and the PM is completed by 160.<br />
:::e) Pool [on-hand = 0, pending: 181, 181].<br />
::4. At 161, block <math>B\,\!</math> fails (corrective maintenance upon failure).<br />
:::a) Block <math>B\,\!</math> gets Crew <math>A\,\!</math>, which arrives at 171.<br />
:::b) No part is available until 181. An on-condition restock is triggered for 221.<br />
:::c) Pool [on-hand = 0, pending: 181, 181, 221].<br />
:::d) A part arrives at 181.<br />
:::e) The repair is completed by 201.<br />
:::f) Pool [on-hand = 0, pending: 181, 221].<br />
::5. At 162, block <math>C\,\!</math> fails.<br />
:::a) Block <math>C\,\!</math> gets Crew <math>B\,\!</math>, which arrives at 177.<br />
:::b) No part is available until 181. An on-condition restock is triggered for 222.<br />
:::c) Pool [on-hand = 0, pending: 181, 221, 222].<br />
:::d) A part arrives at 181.<br />
:::e) The repair is completed by 201.<br />
:::f) Pool [on-hand = 0, pending: 221, 222]. <br />
::6. At 163, block <math>F\,\!</math> fails and brings the system down.<br />
:::a) Block <math>F\,\!</math> calls Crew <math>A\,\!</math> then <math>B\,\!</math>. Both are busy.<br />
:::b) Crew <math>A\,\!</math> will be the first available, so block <math>F\,\!</math> calls Crew <math>A\,\!</math> again and waits.<br />
:::c) No part is available until 221. An on-condition restock is triggered for 223.<br />
:::d) Pool [on-hand = 0, pending: 221, 222, 223].<br />
:::e) Crew <math>A\,\!</math> arrives at 211.<br />
:::f) Repair begins at 221.<br />
:::g) Repair is completed by 241.<br />
:::h) Pool [on-hand = 0, pending: 222, 223]. <br />
::7. At 298, block <math>A\,\!</math> goes down and brings the system down.<br />
<br />
====System Uptimes/Downtimes====<br />
::1. Uptime: This is 200 hours. <br />
:::a) This can be obtained by observing the following system up durations: 0 to 100, 160 to 163 and 201 to 298.<br />
::2. CM Downtime: This is 58 hours.<br />
:::a) Observe that even though the system failed at 100, the CM action (on block <math>A\,\!</math> ) was initiated at 121 and lasted until 141, thus only 20 hours of this downtime are attributed to the CM action.<br />
:::b) The next CM action started at 163 when block <math>F\,\!</math> failed and lasted until 201 when blocks <math>B\,\!</math> and <math>C\,\!</math> were restored, thus adding another 38 hours of CM downtime.<br />
::3. Inspection Downtime: This is 1 hour. <br />
:::a) The only time the system was under inspection was from 120 to 121, during the inspection of block <math>A\,\!</math>.<br />
::4. PM Downtime: This is 19 hours. <br />
:::a) Note that the entire PM action duration on block <math>D\,\!</math> was from 121 to 160.<br />
:::b) Until 141, and from the system perspective, the CM on block <math>A\,\!</math> was the cause for the downing. Once block <math>A\,\!</math> was restored (at 141), then the reason for the system being down became the PM on block <math>D\,\!</math>.<br />
:::c) Thus, the PM on block <math>D\,\!</math> was only responsible for the downtime after block <math>A\,\!</math> was restored, or from 141 to 160.<br />
::5. OC Downtime: This is 0, since there is no on-condition task in this example. <br />
::6. Total Downtime: This is 100 hours. <br />
:::a) This includes all of the above downtimes plus the 20 hours (100 to 120) and the 2 hours (298 to 300) that the system was down due to the undiscovered failure of block <math>A\,\!</math>.<br />
<br />
[[Image:R32.png|center|600px|link=]]<br />
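The uptime/downtime bookkeeping above can be reproduced with a short script. This is only a sketch of the accounting, not BlockSim's simulation engine; the intervals and their cause labels are copied from the walkthrough:

```python
# Minimal sketch of the downtime accounting above (not BlockSim itself).
sim_end = 300

# System-down intervals, tagged with the cause the walkthrough attributes them to.
down_intervals = [
    (100, 120, "undiscovered"),  # A failed, not yet discovered
    (120, 121, "inspection"),    # inspection of block A
    (121, 141, "CM"),            # CM on block A
    (141, 160, "PM"),            # PM on block D (after A was restored)
    (163, 201, "CM"),            # F failed; up again when B and C restored
    (298, 300, "undiscovered"),  # A failed again, undiscovered at sim end
]

def total(cause=None):
    """Total downtime, optionally filtered by cause."""
    return sum(e - s for s, e, c in down_intervals if cause in (None, c))

uptime = sim_end - total()
print(uptime)               # 200
print(total("CM"))          # 58
print(total("inspection"))  # 1
print(total("PM"))          # 19
```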
<br />
====System Metrics====<br />
::1. Mean Availability (All Events): <br />
::<math>\frac{300-100}{300}=0.6667\,\!</math><br />
::2. Mean Availability (w/o PM & Inspection):<br />
:::a) This is due to the CM downtime of 58, the undiscovered downtime of 22 and the inspection downtime of 1, or: <br />
::<math>\frac{300-(58+22+1)}{300}=0.73\,\!</math><br />
:::b) It should be noted that the inspection downtime was included even though the definition was "w/o PM & Inspection." The reason for this is that the inspection did not cause the downtime in this case. Only downtimes caused by the PM or inspections are excluded. <br />
::3. Point Availability and Reliability at 300 is zero because the system was down at 300.<br />
::4. Expected Number of Failures is 3. <br />
:::a) The system failed at 100, 163 and 298.<br />
::5. The standard deviation of the number of failures is 0.<br />
::6. The MTTFF is 100 because the example is deterministic.<br />
<br />
====The System Downing Events====<br />
::1. Number of Failures is 3.<br />
:::a) The first is the failure of block <math>A\,\!</math>, the second is the failure of block <math>F\,\!</math> and the third is the failure of block <math>A\,\!</math>.<br />
::2. Number of CMs is 2. <br />
:::a) The first is the CM on block <math>A\,\!</math> and the second is the CM on block <math>F\,\!</math>.<br />
::3. Number of Inspections is 1.<br />
::4. Number of PMs is 1.<br />
::5. Total Events is 6. These are the events to which the downtime can be attributed. Specifically, the following events were observed:<br />
:::a) The failure of block <math>A\,\!</math> at 100. <br />
:::b) Inspection on block <math>A\,\!</math> at 120.<br />
:::c) The CM action on block <math>A\,\!</math>.<br />
:::d) The PM action on block <math>D\,\!</math> (after <math>A\,\!</math> was fixed).<br />
:::e) The failure of block <math>F\,\!</math> at 163.<br />
:::f) The failure of block <math>A\,\!</math> at 298.<br />
<br />
====Block Details====<br />
The details for blocks <math>A,B,C,D\,\!</math> and <math>F\,\!</math> are shown below.<br />
<br />
[[Image:r33.png|center|600px| Block details for this example.|link=]]<br />
<br />
We will discuss some of these results. First note that there are four downing events on block <math>A\,\!</math>: the initial failure, the inspection, the CM, plus the last failure at 298. All other blocks have just one. Also, block <math>A\,\!</math> had a total downtime of <math>41+2=43\,\!</math> hours, giving it a mean availability of <math>257/300=0.8567\,\!</math>. The first time-to-failure for block <math>A\,\!</math> occurred at 100, while the second occurred after <math>298-141=157\,\!</math> hours of operation, yielding an average time between failures (MTBF) of <math>257/2=128.5\,\!</math>. (Note that this is the same as uptime/failures.) Block <math>D\,\!</math> never failed, so its MTBF cannot be determined. Furthermore, the MTBDE for each item is determined by dividing the block's uptime by its number of events. The RS FCI and RS DECI metrics are obtained by comparing the SD Failures and SD Events of the item to the number of system failures and events. Specifically, the only items that caused system failure are blocks <math>A\,\!</math> and <math>F\,\!</math>: <math>A\,\!</math> at 100 and 298, and <math>F\,\!</math> at 163. It is important to note that even though one could argue that block <math>F\,\!</math> alone did not cause the failure (<math>B\,\!</math> and <math>C\,\!</math> had also failed), the downing was attributed to <math>F\,\!</math> because the system reached a failed state only when block <math>F\,\!</math> failed. <br />
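As a sketch of the arithmetic for block <math>A\,\!</math> (the numbers are taken from the walkthrough above; the variable names are ours, not BlockSim's):

```python
# Hypothetical bookkeeping for block A, using values from the walkthrough.
sim_end = 300
downtime_A = 41 + 2              # down 100-141, plus the undiscovered 298-300
uptime_A = sim_end - downtime_A  # 257
failures_A = 2                   # failures at 100 and at 298

mean_availability_A = uptime_A / sim_end  # ~0.8567
mtbf_A = uptime_A / failures_A            # 128.5, i.e., uptime/failures
print(round(mean_availability_A, 4), mtbf_A)  # 0.8567 128.5
```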
<br />
As for the number of inspections, which were scheduled every 30 hours: nine occurred for block <math>A\,\!</math> [30, 60, 90, 120, 150, 180, 210, 240, 270] and eight for block <math>D\,\!</math>. Block <math>D\,\!</math> was not inspected at 150 because it was undergoing a PM action at that time.<br />
<br />
====Crew Details====<br />
<br />
The figure below shows the crew results.<br />
<br />
[[Image:r34.png|center|400px| Crew details for this example.|link=]]<br />
<br />
Crew <math>A\,\!</math> received a total of six calls and accepted three. Specifically,<br />
<br />
::#At 121, the crew was called by block <math>A\,\!</math> and the call was accepted.<br />
::#At 121, block <math>D\,\!</math> also called for its PM action and was rejected. Block <math>D\,\!</math> then called crew <math>B\,\!</math>, which accepted the call.<br />
::#At 161, block <math>B\,\!</math> called crew <math>A\,\!</math>. Crew <math>A\,\!</math> accepted.<br />
::#At 162, block <math>C\,\!</math> called crew <math>A\,\!</math>. Crew <math>A\,\!</math> rejected and block <math>C\,\!</math> called crew <math>B\,\!</math>, which accepted the call.<br />
::#At 163, block <math>F\,\!</math> called crew <math>A\,\!</math> and then crew <math>B\,\!</math> and both rejected. Block <math>F\,\!</math> then waited until a crew became available at 201 and called that crew again. This was crew <math>A\,\!</math>, which accepted.<br />
<br />
The total wait time is the time that blocks had to wait for the maintenance crew. Block <math>F\,\!</math> is the only component that waited, waiting 38 hours for crew <math>A\,\!</math>.<br />
<br />
Also, the costs for crew <math>A\,\!</math> were 1 per unit time and 10 per incident, thus the total costs were 100 + 30. The costs for Crew <math>B\,\!</math> were 2 per unit time and 20 per incident, thus the total costs were 156 + 40.<br />
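The cost accounting can be sketched as follows. The busy times of 100 and 78 hours are implied by the stated per-unit-time totals (100 and 156); the variable names are ours:

```python
# Sketch of the crew cost arithmetic above (busy times implied by the totals).
crew_A_busy, crew_A_jobs = 100, 3  # crew A: 1 per unit time, 10 per incident
crew_B_busy, crew_B_jobs = 78, 2   # crew B: 2 per unit time, 20 per incident

cost_A = 1 * crew_A_busy + 10 * crew_A_jobs  # 100 + 30
cost_B = 2 * crew_B_busy + 20 * crew_B_jobs  # 156 + 40
print(cost_A, cost_B)  # 130 196
```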
<br />
====Pool Details====<br />
The figure below shows the spare part pool results.<br />
<br />
[[Image:r35.png|center|300px| Pool details for this example.|link=]]<br />
<br />
The pool started with a stock level of 1 and ended up with 2. Specifically,<br />
<br />
::#At 121, the pool dispensed a part to block <math>A\,\!</math> and ordered another to arrive at 181.<br />
::#At 121, it dispensed a part to block <math>D\,\!</math> and ordered another to arrive at 181.<br />
::#At 150, a scheduled part arrived to restock the pool.<br />
::#At 161 the pool dispensed a part to block <math>B\,\!</math> and ordered another to arrive at 221.<br />
::#At 181, it dispensed a part to block <math>C\,\!</math> and ordered another to arrive at 222.<br />
::#At 221, it dispensed a part to block <math>F\,\!</math> and ordered another to arrive at 223.<br />
::#The 222 and 223 arrivals remained in stock until the end of the simulation.<br />
<br />
Overall, five parts were dispensed. Blocks had to wait a total of 126 hours to receive parts (B: 181-161=20, C: 181-162=19, D: 150-121=29 and F: 221-163=58).<br />
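The wait-time accounting can be sketched in a few lines (request and dispense times are read from the event log; the names are ours, not BlockSim's):

```python
# Sketch of the spare-part wait-time accounting above.
requested = {"B": 161, "C": 162, "D": 121, "F": 163}  # when a part was needed
received  = {"B": 181, "C": 181, "D": 150, "F": 221}  # when a part was dispensed

wait = {blk: received[blk] - requested[blk] for blk in requested}
total_wait = sum(wait.values())
print(wait, total_wait)  # {'B': 20, 'C': 19, 'D': 29, 'F': 58} 126
```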
<br />
=Subdiagrams and Multi Blocks in Simulation=<br />
<br />
Any subdiagrams and multi blocks that may be present in the BlockSim RBD are expanded and/or merged into a single diagram before the system is simulated. As an example, consider the system shown in the figure below.<br />
<br />
[[Image:r38.png|center|350px| A system made up of three subsystems, A, B, and C.|link=]]<br />
<br />
BlockSim will internally merge the system into a single diagram before the simulation, as shown in the figure below. This means that all the failure and repair properties of the items in the subdiagrams are also considered.<br />
<br />
[[Image:r39.png|center|350px| The simulation engine view of the system and subdiagrams|link=]]<br />
<br />
In the case of multi blocks, the blocks are also fully expanded before simulation. This means that unlike the analytical solution, the execution speed (and memory requirements) for a multi block representing ten blocks in series is identical to the representation of ten individual blocks in series.<br />
<br />
=Containers in Simulation=<br />
===Standby Containers===<br />
When you simulate a diagram that contains a standby container, the container acts as the switch mechanism (as shown below), in addition to defining the standby relationships and the number of active units that are required. The container's failure and repair properties are really those of the switch itself. The switch can fail with a distribution while waiting to switch, or during the switch action. Repair properties restore the switch regardless of how the switch failed. Failure of the switch itself does not bring the container down because the switch is not really needed unless called upon to switch. The container will go down if the units within the container fail, or if the switch is failed when a switch action is needed. The restoration time for this is based on the repair distributions of the contained units and the switch. Furthermore, the container is down during a switch process that has a delay. <br />
<br />
[[Image:8.43.png|center|500px| The standby container acts as the switch, thus the failure distribution of the container is the failure distribution of the switch. The container can also fail when called upon to switch.|link=]]<br />
<br />
[[Image:8_43_1_new.png|center|150px|link=]]<br />
<br />
To better illustrate this, consider the following deterministic case.<br />
<br />
::#Units <math>A\,\!</math> and <math>B\,\!</math> are contained in a standby container.<br />
::#The standby container is the only item in the diagram, thus failure of the container is the same as failure of the system. <br />
::#<math>A\,\!</math> is the active unit and <math>B\,\!</math> is the standby unit. <br />
::#Unit <math>A\,\!</math> fails every 100 <math>tu\,\!</math> (active) and takes 10 <math>tu\,\!</math> to repair. <br />
::#<math>B\,\!</math> fails every 3 <math>tu\,\!</math> (active) and also takes 10 <math>tu\,\!</math> to repair. <br />
::#The units cannot fail while in quiescent (standby) mode. <br />
::#Furthermore, assume that the container (acting as the switch) fails every 30 <math>tu\,\!</math> while waiting to switch and takes 4 <math>tu\,\!</math> to repair. If not failed, the container switches with 100% probability. <br />
::#The switch action takes 7 <math>tu\,\!</math> to complete.<br />
::#After repair, unit <math>A\,\!</math> is always reactivated. <br />
::#The container does not operate through system failure and thus the components do not either. <br />
<br />
Keep in mind that we are tracking two states for the container: the container being down and the container switch being down.<br />
<br />
The system event log is shown in the figure below and is as follows:<br />
<br />
[[Image:BS8.44.png|center|600px| The system behavior using a standby container.|link=]]<br />
<br />
::#At 30, the switch fails and gets repaired by 34. The container switch is failed and being repaired; however, the container is up during this time.<br />
::#At 64, the switch fails and gets repaired by 68. The container is up during this time.<br />
::#At 98, the switch fails. It will be repaired by 102.<br />
::#At 100, unit <math>A\,\!</math> fails. Unit <math>A\,\!</math> attempts to activate the switch to go to <math>B\,\!</math> ; however, the switch is failed.<br />
::#At 102, the switch is operational.<br />
::#From 102 to 109, the switch is in the process of switching from unit <math>A\,\!</math> to unit <math>B\,\!</math>. The container and system are down from 100 to 109.<br />
::#By 110, unit <math>A\,\!</math> is fixed and the system is switched back to <math>A\,\!</math> from <math>B\,\!</math>. The return switch action brings the container down for 7 <math>tu\,\!</math>, from 110 to 117. During this time, note that unit <math>B\,\!</math> has only functioned for 1 <math>tu\,\!</math>, 109 to 110.<br />
::#At 146, the switch fails and gets repaired by 150. The container is up during this time.<br />
::#At 180, the switch fails and gets repaired by 184. The container is up during this time.<br />
::#At 214, the switch fails and gets repaired by 218. <br />
::#At 217, unit <math>A\,\!</math> fails. The switch is failed at this time.<br />
::#At 218, the switch is operational and the system is switched to unit <math>B\,\!</math> within 7 <math>tu\,\!</math>. The container is down from 218 to 225.<br />
::#At 225, unit <math>B\,\!</math> takes over. After 2 <math>tu\,\!</math> of operation at 227, unit <math>B\,\!</math> fails. It will be restored by 237. <br />
::#At 227, unit <math>A\,\!</math> is repaired and the switchback action to unit <math>A\,\!</math> is initiated. By 234, the system is up.<br />
::#At 262, the switch fails and gets repaired by 266. The container is up during this time.<br />
::#At 296, the switch fails and gets repaired by 300. The container is up during this time.<br />
<br />
The system results are shown in the figure below and discussed next.<br />
[[Image:BS8.45.png|center|600px| System overview results.|link=]]<br />
<br />
::1. System CM Downtime is 24. <br />
:::a) CM downtime includes all downtime due to failures as well as the delay in switching from a failed active unit to a standby unit. It does not include the switchback time from the standby to the restored active unit. Thus, the times from 100 to 109, 217 to 225 and 227 to 234 are included. The time to switchback, 110 to 117, is not included.<br />
::2. System Total Downtime is 31. <br />
:::a) It includes the CM downtime and the switchback downtime.<br />
::3. Number of System Failures is 3. <br />
:::a) It includes the failures at 100, 217 and 227. <br />
:::b) This is the same as the number of CM downing events. <br />
::4. The Total Downing Events are 4. <br />
:::a) This includes the switchback downing event at 110.<br />
::5. The Mean Availability (w/o PM and Inspection) does not include the downtime due to the switchback event.<br />
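The downtime classification above can be checked with a few lines (a sketch of the accounting only; the intervals are taken from the event log):

```python
# Sketch of the standby-container downtime classification (not BlockSim output).
cm_intervals = [(100, 109), (217, 225), (227, 234)]  # failures + switch delay
switchback_intervals = [(110, 117)]                  # returning to unit A

cm_downtime = sum(e - s for s, e in cm_intervals)
total_downtime = cm_downtime + sum(e - s for s, e in switchback_intervals)
print(cm_downtime, total_downtime)  # 24 31
```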
<br />
====Additional Rules and Assumptions for Standby Containers====<br />
<br />
::1) A container will only attempt to switch if there is an available non-failed item to switch to. If there is no such item, it will switch if and when an item becomes available; the switch action is canceled if the failed active unit is restored before a standby item becomes available. <br />
:::a) As an example, consider the case of unit <math>A\,\!</math> failing active while unit <math>B\,\!</math> failed in a quiescent mode. If unit <math>B\,\!</math> gets restored before unit <math>A\,\!</math>, then the switch will be initiated. If unit <math>A\,\!</math> is restored before unit <math>B\,\!</math>, the switch action will not occur.<br />
::2) If the container switch is failed and a switching action is required, the switching action will occur after the switch has been restored if it is still required (i.e., if the active unit is still failed).<br />
::3) If a switch fails during the delay time of the switching action based on the reliability distribution (quiescent failure mode), the action is still carried out unless a failure based on the switch probability/restarts occurs when attempting to switch. <br />
::4) During switching events, the change from the operating to quiescent distribution (and vice versa) occurs at the end of the delay time.<br />
::5) The option of whether components continue to operate while the system is down is now defined at the component level. (This is different from BlockSim 7, in which the contained items inherited this option from the container.) Two rules apply:<br />
:::a) If a path inside the container is down, blocks inside the container that are in that path do not continue to operate.<br />
:::b) Blocks that are up do not continue to operate while the container is down.<br />
::6) A switch can have a repair distribution and maintenance properties without having a reliability distribution. <br />
:::a) This is because maintenance actions are performed regardless of whether the switch failed while waiting to switch (reliability distribution) or during the actual switching process (fixed probability).<br />
::7) A switch fails during switching when the restarts are exhausted.<br />
::8) A restart is executed every time the switch fails to switch (based on its fixed probability of switching).<br />
::9) If a delay is specified, restarts happen after the delay.<br />
::10) If a container brings the system down, the container is responsible for the system going down (not the blocks inside the container).<br />
<br />
===Load Sharing Containers===<br />
<br />
When you simulate a diagram that contains a load sharing container, the container defines the load that is shared. A load sharing container has no failure or repair distributions. The container itself is considered failed if all the blocks inside the container have failed (or <math>k\,\!</math> blocks in a <math>k\,\!</math> -out-of- <math>n\,\!</math> configuration).<br />
<br />
To illustrate this, consider the following container with items <math>A\,\!</math> and <math>B\,\!</math> in a load sharing redundancy.<br />
<br />
Assume that <math>A\,\!</math> fails every 100 <math>tu\,\!</math> and <math>B\,\!</math> every 120 <math>tu\,\!</math> if both items are operating and they fail in half that time if either is operating alone (i.e., the items age twice as fast when operating alone). They both get repaired in 5 <math>tu\,\!</math>.<br />
<br />
[[Image:8.46.png|center|600px| Behavior of a simple load sharing system.|link=]]<br />
<br />
The system event log is shown in the figure above and is as follows:<br />
<br />
::1. At 100, <math>A\,\!</math> fails. It takes 5 <math>tu\,\!</math> to restore <math>A\,\!</math>. <br />
::2. From 100 to 105, <math>B\,\!</math> is operating alone and is experiencing a higher load.<br />
::3. At 115, <math>B\,\!</math> fails. <math>B\,\!</math> would normally be expected to fail at 120; however: <br />
:::a) From 0 to 100, it accumulated the equivalent of 100 <math>tu\,\!</math> of damage.<br />
:::b) From 100 to 105, it accumulated 10 <math>tu\,\!</math> of damage, which is twice the damage since it was operating alone. Put another way, <math>B\,\!</math> aged by 10 <math>tu\,\!</math> over a period of 5 <math>tu\,\!</math>.<br />
:::c) At 105, <math>A\,\!</math> is restored but <math>B\,\!</math> has only 10 <math>tu\,\!</math> of life remaining at this point.<br />
:::d) <math>B\,\!</math> fails at 115.<br />
::4. At 120, <math>B\,\!</math> is repaired.<br />
::5. At 200, <math>A\,\!</math> fails again. <math>A\,\!</math> would normally be expected to fail at 205; however, the failure of <math>B\,\!</math> from 115 to 120 added additional damage to <math>A\,\!</math>. In other words, the age of <math>A\,\!</math> at 115 was 10; by 120 it was 20. Thus it reached an age of 100, 95 <math>tu\,\!</math> later, at 200. <br />
::6. <math>A\,\!</math> is restored by 205.<br />
::7. At 235, <math>B\,\!</math> fails. <math>B\,\!</math> would normally be expected to fail at 240; however, the failure of <math>A\,\!</math> at 200 caused the reduction.<br />
:::a) At 200, <math>B\,\!</math> had an age of 80.<br />
:::b) By 205, <math>B\,\!</math> had an age of 90.<br />
:::c) <math>B\,\!</math> fails 30 <math>tu\,\!</math> later at 235.<br />
::8. The system itself never failed.<br />
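The accelerated-aging rule used in this walkthrough can be sketched as follows. This is a simplified model of load sharing that assumes a fixed aging-rate factor of 2 while a unit operates alone; `advance` is our own helper, not a BlockSim function:

```python
# Sketch of the load-sharing aging rule: life is consumed at twice the
# normal rate while the companion unit is down.
def advance(age, start, end, alone):
    """Accumulated age after operating from `start` to `end`."""
    rate = 2.0 if alone else 1.0
    return age + rate * (end - start)

# Reproduce block B's first failure (B's life is 120 tu at normal load):
age_B = advance(0.0, 0, 100, alone=False)      # age 100 at t = 100
age_B = advance(age_B, 100, 105, alone=True)   # age 110 at t = 105 (A down)
remaining = 120 - age_B                        # 10 tu of life left
print(105 + remaining)                         # prints 115.0: B fails at 115
```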
<br />
====Additional Rules and Assumptions for Load Sharing Containers====<br />
<br />
::1. The option of whether components continue to operate while the system is down is now defined at the component level. (This is different from BlockSim 7, in which the contained items inherited this option from the container.) Two rules apply:<br />
:::a) If a path inside the container is down, blocks inside the container that are in that path do not continue to operate.<br />
:::b) Blocks that are up do not continue to operate while the container is down.<br />
::2. If a container brings the system down, the block that brought the container down is responsible for the system going down. (This is the opposite of standby containers.)<br />
<br />
=State Change Triggers=<br />
{{:State Change Triggers}}<br />
<br />
=Discussion=<br />
<br />
Even though the examples and explanations presented here are deterministic, the sequence of events and logic used to view the system is the same as the one that would be used during simulation. The difference is that the process would be repeated multiple times during simulation and the results presented would be the average results over the multiple runs.<br />
<br />
Additionally, multiple metrics and results are presented and defined in this chapter. Many of these results can also be used to obtain additional metrics not explicitly given in BlockSim's Simulation Results Explorer. As an example, to compute mean availability with inspections but without PMs, the explicit downtimes given for each event could be used. Furthermore, all of the results given are for operating times starting at zero to a specified end time (although the components themselves could have been defined with a non-zero starting age). Results for a starting time other than zero could be obtained by running two simulations and looking at the difference in the detailed results where applicable. As an example, the difference in uptimes and downtimes can be used to determine availabilities for a specific time window.</div>Miklos Szidarovszky
https://www.reliawiki.com/index.php?title=Mixture_Design&diff=62849
Mixture Design
2016-01-21T16:20:04Z
<p>Miklos Szidarovszky: /* Mixture Design Types */</p>
<hr />
<div>{{Template:Doebook|14}}<br />
<br />
==Introduction==<br />
<br />
When a product is formed by mixing together two or more ingredients, the product is called a mixture, and the ingredients are called mixture components. In a general mixture problem, the measured response is assumed to depend only on the proportions of the ingredients in the mixture, not on the amount of the mixture. For example, the taste of a fruit punch recipe (i.e., the response) might depend on the proportions of watermelon, pineapple and orange juice in the mixture. The taste of a small cup of fruit punch will obviously be the same as that of a big cup.<br />
<br />
Sometimes the responses of a mixture experiment depend not only on the proportions of ingredients, but also on the settings of variables in the process of making the mixture. For example, the tensile strength of stainless steel is not only affected by the proportions of iron, copper, nickel and chromium in the alloy; it is also affected by process variables such as temperature, pressure and curing time used in the experiment. <br />
<br />
One of the purposes of conducting a mixture experiment is to find the best proportion of each component and the best value of each process variable, in order to optimize a single response or multiple responses simultaneously. In this chapter, we will discuss how to design effective mixture designs and how to analyze data from mixture experiments with and without process variables. <br />
<br />
==Mixture Design Types==<br />
<br />
There are several different types of mixture designs. The most common ones are simplex lattice, simplex centroid, simplex axial and extreme vertex designs, each of which is used for a different purpose. <br />
<br />
*If there are many components in a mixture, the first step is to screen for the most important ones. Simplex axial and simplex centroid designs are used for this purpose. <br />
*If the number of components is not large, but a high order polynomial equation is needed in order to accurately describe the response surface, then a simplex lattice design can be used.<br />
*Extreme vertex designs are used for the cases when there are constraints on one or more components (e.g., if the proportion of watermelon juice in a fruit punch recipe is required to be less than 30%, and the combined proportion of watermelon and orange juice should always be between 40% and 70%). <br />
<br />
===Simplex Plot===<br />
<br />
Since the sum of all the mixture components is always 100%, the experiment space is usually represented by a simplex plot. The experiment space for the fruit punch experiment is shown in the following triangle, or simplex, plot. <br />
<br />
[[Image:doe_14.1.png|500 px|center]]<br />
<br />
The triangular region in the above plot is defined by the fact that the sum of the three ingredients is 1 (100%). At the vertices, the punch has only one ingredient; for instance, point 1 contains only watermelon. The edge opposite point 1 represents mixtures with no watermelon. <br />
<br />
The coordinate system used for the values of the ingredients <math>{{x}_{i}}\,\!</math>, <math>i=1,2,...,q\,\!</math>, where <math>q\,\!</math> is the number of ingredients, is called a simplex coordinate system. A simplex plot can visually display only three ingredients; if there are more than three ingredients, the values of the other ingredients must be provided. For the fruit punch example, the coordinate of point 1 is (1, 0, 0). The interior points of the triangle represent mixtures in which none of the three components is absent; that is, all <math>{{x}_{i}}>0\,\!</math>, <math>i=1,2,3\,\!</math>. Point 0 in the middle of the triangle is called the center point; in this case, it is the centroid of the face/plane, with coordinate (1/3, 1/3, 1/3). Points 2, 4 and 6 are each called an edge centroid; their coordinates are (0.5, 0.5, 0), (0, 0.5, 0.5) and (0.5, 0, 0.5).<br />
<br />
===Simplex Lattice Design===<br />
<br />
The response in a mixture experiment usually is described by a polynomial function. This function represents how the components affect the response. To better study the shape of the response surface, the natural choice for a design would be the one whose points are spread evenly over the whole simplex. An ordered arrangement consisting of a uniformly spaced distribution of points on a simplex is known as a lattice. <br />
<br />
A {q, m} simplex lattice design for q components consists of points defined by the following coordinate settings: the proportions assumed by each component take the m+1 equally spaced values from 0 to 1, <br />
<br />
::<math>{{x}_{i}}=0,\frac{1}{m},\frac{2}{m},....,1\text{ }i=1,2,....,q\,\!</math><br />
<br />
and the design space consists of all combinations of these values for which the component proportions sum to 1. <math>m\,\!</math> is usually called the degree of the lattice. For example, for a {3, 2} design, <math>{{x}_{i}}=0,\frac{1}{2},1\,\!</math> and its design space has 6 points. They are:<br />
<br />
[[Image:doe_14.2.png|500 px|center]]<br />
<br />
For a {3, 3} design, <math>{{x}_{i}}=0,\frac{1}{3},\frac{2}{3},1\,\!</math>, and its design space has 10 points. They are:<br />
<br />
[[Image:doe_14.3.png|500 px|center]]<br />
<br />
<br />
For a simplex design with degree of m, each component has m + 1 different values, therefore, the experiment results can be used to fit a polynomial equation up to an order of m. A {3, 3} simplex lattice design can be used to fit the following model. <br />
<br />
<br />
::<math>\begin{align}<br />
& y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}} \\ <br />
& +{{\delta }_{12}}{{x}_{1}}{{x}_{2}}\left( {{x}_{1}}-{{x}_{2}} \right)+{{\delta }_{13}}{{x}_{1}}{{x}_{3}}\left( {{x}_{1}}-{{x}_{3}} \right)+{{\delta }_{23}}{{x}_{2}}{{x}_{3}}\left( {{x}_{2}}-{{x}_{3}} \right) \\ <br />
& +{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}} <br />
\end{align}\,\!</math><br />
<br />
<br />
The above model is called the full cubic model. Note that the intercept term is not included in the model due to the correlation between all the components (their sum is 100%). <br />
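To make the structure of this model concrete, the following sketch builds the corresponding model matrix (the function name and the use of the {3, 3} lattice points as the design are our choices for illustration):

```python
# Sketch: regressor row for the Scheffe full cubic model (no intercept,
# since the component proportions always sum to 1).
def full_cubic_terms(x1, x2, x3):
    return [x1, x2, x3,
            x1 * x2, x1 * x3, x2 * x3,
            x1 * x2 * (x1 - x2), x1 * x3 * (x1 - x3), x2 * x3 * (x2 - x3),
            x1 * x2 * x3]

# The ten points of a {3, 3} simplex lattice design:
third = 1.0 / 3.0
points = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
          (2 * third, third, 0), (third, 2 * third, 0),
          (2 * third, 0, third), (third, 0, 2 * third),
          (0, 2 * third, third), (0, third, 2 * third),
          (third, third, third)]

X = [full_cubic_terms(*p) for p in points]
print(len(X), len(X[0]))  # 10 10: ten runs fit the ten-term model exactly
```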
<br />
Simplex lattice design includes all the component combinations. For a {q, m} design, the total number of runs is <math>\binom{q+m-1}{m}\,\!</math>. Therefore, to reduce the number of runs while still being able to fit a high order polynomial model, we can sometimes use a simplex centroid design, which is explained next.<br />
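A {q, m} lattice can be generated by enumerating all combinations of the allowed proportion levels that sum to 1. The sketch below is our own helper (not part of any package referenced here) and also confirms the run-count formula:

```python
# Sketch: generate the points of a {q, m} simplex lattice design.
from itertools import product
from fractions import Fraction
from math import comb

def simplex_lattice(q, m):
    """All q-component mixtures whose proportions are multiples of 1/m."""
    levels = [Fraction(i, m) for i in range(m + 1)]
    return [p for p in product(levels, repeat=q) if sum(p) == 1]

print(len(simplex_lattice(3, 2)))           # 6 runs
print(len(simplex_lattice(3, 3)))           # 10 runs
print(comb(3 + 3 - 1, 3))                   # 10, matching C(q+m-1, m)
```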
<br />
===Simplex Centroid Design===<br />
<br />
A simplex centroid design only includes the centroid points. In a simplex centroid design, the components that appear in a given run all have the same value. <br />
<br />
[[Image:doe_14.4.png|500 px|center]]<br />
<br />
In the above simplex plot, points 2, 4 and 6 are 2nd degree centroids. Each of them has two non-zero components with equal values. Point 0 is a 3rd degree centroid and all three components have the same value. For a design with q components, the highest degree of centroid is q. It is called the overall centroid, or the center point of the design. <br />
<br />
For a q component simplex centroid design with a degree of centroid of q, the total number of runs is <math>{{2}^{q}}-1\,\!</math>. The runs correspond to the q permutations of (1, 0, 0, …, 0), the <math>\binom{q}{2}\,\!</math> permutations of (1/2, 1/2, 0, 0, …, 0), the <math>\binom{q}{3}\,\!</math> permutations of (1/3, 1/3, 1/3, 0, 0, …, 0), …, and the overall centroid (1/q, 1/q, …, 1/q). If the degree of centroid is defined as <math>m\,\!</math> (m < q), then the total number of runs will be <math>\binom{q}{1}+\binom{q}{2}+...+\binom{q}{m}\,\!</math>.<br />
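The centroid points can be generated as the permutations just described; a small sketch (illustrative only), shown for both the full degree-q case and a reduced degree m:<br />

```python
from itertools import combinations

def simplex_centroid(q, m=None):
    """Points of a simplex centroid design with q components.

    m is the highest degree of centroid; it defaults to q,
    which gives 2**q - 1 runs."""
    if m is None:
        m = q
    points = []
    for k in range(1, m + 1):                 # degree-k centroids
        for idx in combinations(range(q), k):
            p = [0.0] * q
            for i in idx:
                p[i] = 1.0 / k                # non-zero components are equal
            points.append(tuple(p))
    return points

full = simplex_centroid(3)        # degree 3: 2^3 - 1 = 7 runs
print(len(full))                  # 7
partial = simplex_centroid(4, 2)  # C(4,1) + C(4,2) = 10 runs
print(len(partial))               # 10
```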
<br />
Since a simplex centroid design usually has fewer runs than a simplex lattice design with the same degree, a polynomial model with fewer terms should be used. A {3, 3} simplex centroid design can be used to fit the following model. <br />
<br />
<br />
::<math>y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}}+{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}}\,\!</math><br />
<br />
<br />
<br />
The above model is called the special cubic model. Note that the intercept term is not included due to the correlation between all the components (their sum is 100%).<br />
<br />
===Simplex Axial Design===<br />
<br />
The simplex lattice and simplex centroid designs are boundary designs since the points of these designs are positioned on boundaries (vertices, edges, faces, etc.) of the simplex factor space, with the exception of the overall centroid. Axial designs, on the other hand, are designs consisting mainly of the points positioned inside the simplex. Axial designs have been recommended for use when component effects are to be measured in a screening experiment, particularly when first degree models are to be fitted. <br />
<br />
Definition of Axis: The axis of component <math>i\,\!</math> is defined as the imaginary line extending from the base point <math>{{x}_{i}}=0\,\!</math>, <math>{{x}_{j}}=1/\left( q-1 \right)\,\!</math> for all <math>j\ne i\,\!</math>, to the vertex where <math>{{x}_{i}}=1,{{x}_{j}}=0\,\!</math> for all <math>j\ne i\,\!</math>. [John Cornell] <br />
<br />
In a simplex axial design, all the points lie on the component axes. The simplest form of axial design is one whose points are positioned equidistant from the overall centroid <math>\left( {1}/{q,{1}/{q,}\;{1}/{q,}\;...}\; \right)\,\!</math>. Traditionally, points located at half the distance from the overall centroid to a vertex are called axial points/blends. This is illustrated in the following plot.<br />
<br />
[[Image:doe_14.5.png|500 px|center]]<br />
<br />
Points 4, 5 and 6 are the axial blends. <br />
<br />
By default, a simplex axial design in DOE++ has only the vertices, the axial blends, the centroids of the constraint planes and the overall centroid. For a design with q components, the constraint plane centroids are the center points of the (q-1)-dimensional faces of the simplex: one component is 0, and the remaining components have equal values. The number of constraint plane centroids equals the number of components, q. The total number of runs in a simplex axial design is therefore 3q+1: q vertices, q constraint plane centroids, q axial blends and 1 overall centroid. <br />
<br />
A simplex axial design for 3 components has 10 points as given below. <br />
<br />
[[Image:doe_14.6.png|500 px|center]]<br />
<br />
Points 1, 2 and 3 are the three vertices; points 4, 5, 6 are the axial blends; points 7, 8 and 9 are the centroids of constraint planes, and point 0 is the overall center point.<br />
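The 3q + 1 points of this default axial design can be generated directly from the descriptions above; a minimal sketch (an illustration, not the DOE++ implementation):<br />

```python
def simplex_axial(q):
    """Default axial design: q vertices, q constraint-plane centroids,
    q axial blends and the overall centroid, 3q + 1 runs in total."""
    points = []
    # vertices: one component at 1, the rest at 0
    for i in range(q):
        p = [0.0] * q
        p[i] = 1.0
        points.append(tuple(p))
    # constraint-plane centroids: one component at 0, the rest equal
    for i in range(q):
        p = [1.0 / (q - 1)] * q
        p[i] = 0.0
        points.append(tuple(p))
    # axial blends: halfway between the overall centroid and each vertex
    for i in range(q):
        p = [0.5 / q] * q
        p[i] = 0.5 + 0.5 / q
        points.append(tuple(p))
    # overall centroid
    points.append(tuple([1.0 / q] * q))
    return points

design = simplex_axial(3)
print(len(design))   # 3*3 + 1 = 10 points
```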
<br />
===Extreme Vertex Design===<br />
<br />
Extreme vertex designs are used when both lower and upper bound constraints on the components are present, or when linear constraints involving several components are added. For example, if a mixture design with 3 components has the following constraints:<br />
<br />
<br />
:*<math>{{x}_{2}}\le 0.7\,\!</math><br />
:*<math>-2{{x}_{1}}+2{{x}_{2}}+3{{x}_{3}}\ge 0\,\!</math><br />
:*<math>48{{x}_{1}}+13{{x}_{2}}-{{x}_{3}}\ge 0\,\!</math><br />
<br />
<br />
Then the feasible region is defined by the six points in the following simplex plot. To meet the above constraints, all the runs conducted in the experiment should be in the feasible region or on its boundary. <br />
<br />
[[Image:doe_14.7.png|500 px|center]]<br />
<br />
<br />
The CONSIM method described in [Snee 1979] is used in DOE++ to check the consistency of all the constraints and to get the vertices defined by them. <br />
<br />
Extreme vertex designs by default use the vertices on the boundary of the feasible region. Additional points, such as the centroids of spaces of different dimensions, axial points and the overall center point, can be added. In extreme vertex designs, axial points lie between the overall center point and the vertices. For the above example, if the axial points and the overall center point are added, then all the runs in the experiment will be:<br />
<br />
[[Image:doe_14.8.png|500 px|center]]<br />
<br />
Point 0 in the center of the feasible region is the overall centroid. The other red points are the axial points. They are at the middle of the lines connecting the center point with the vertices.<br />
<br />
==Mixture Design Data Analysis==<br />
<br />
In the following section, we will discuss the most popular regression models in mixture design data analysis. Due to the correlation between all the components in mixture designs, the intercept term usually is not included in the regression model. <br />
<br />
===Models Used in Mixture Design===<br />
<br />
For a design with three components, the following models are commonly used.<br />
<br />
*Linear model: <br />
<br />
<br />
::<math>y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}\,\!</math><br />
<br />
<br />
If the intercept were included in the model, then the linear model would be<br />
<br />
<br />
::<math>y=\beta _{0}^{'}+\beta _{1}^{'}{{x}_{1}}+\beta _{2}^{'}{{x}_{2}}+\beta _{3}^{'}{{x}_{3}}\,\!</math><br />
<br />
<br />
However, since <math>{{x}_{1}}+{{x}_{2}}+{{x}_{3}}=1\,\!</math> (can be other constants as well), the above equation can be written as<br />
<br />
<br />
::<math>\begin{align}<br />
& y=\beta _{0}^{'}\left( {{x}_{1}}+{{x}_{2}}+{{x}_{3}} \right)+\beta _{1}^{'}{{x}_{1}}+\beta _{2}^{'}{{x}_{2}}+\beta _{3}^{'}{{x}_{3}} \\ <br />
& =\left( \beta _{0}^{'}+\beta _{1}^{'} \right){{x}_{1}}+\left( \beta _{0}^{'}+\beta _{2}^{'} \right){{x}_{2}}+\left( \beta _{0}^{'}+\beta _{3}^{'} \right){{x}_{3}} \\ <br />
& ={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}} <br />
\end{align}\,\!</math><br />
<br />
<br />
The equation has thus been reformatted to omit the intercept. <br />
<br />
*Quadratic model: <br />
<br />
<br />
::<math>y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}}\,\!</math><br />
<br />
<br />
There are no classic quadratic terms such as <math>x_{1}^{2}\,\!</math>. This is because <br />
<br />
<br />
::<math>x_{1}^{2}={{x}_{1}}\left( 1-{{x}_{2}}-{{x}_{3}} \right)={{x}_{1}}-{{x}_{1}}{{x}_{2}}-{{x}_{1}}{{x}_{3}}\,\!</math><br />
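This substitution can be verified numerically for any blend whose components sum to 1:<br />

```python
x1, x2, x3 = 0.2, 0.3, 0.5            # any blend with x1 + x2 + x3 = 1
lhs = x1 ** 2
rhs = x1 - x1 * x2 - x1 * x3          # x1 * (1 - x2 - x3)
print(abs(lhs - rhs) < 1e-12)         # True: the squared term is redundant
```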
<br />
<br />
*Full cubic model:<br />
<br />
<br />
::<math>\begin{align}<br />
& y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}} \\ <br />
& +{{\delta }_{12}}{{x}_{1}}{{x}_{2}}\left( {{x}_{1}}-{{x}_{2}} \right)+{{\delta }_{13}}{{x}_{1}}{{x}_{3}}\left( {{x}_{1}}-{{x}_{3}} \right)+{{\delta }_{23}}{{x}_{2}}{{x}_{3}}\left( {{x}_{2}}-{{x}_{3}} \right) \\ <br />
& +{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}} <br />
\end{align}\,\!</math><br />
<br />
<br />
*Special cubic model: the <math>{{\delta }_{ij}}{{x}_{i}}{{x}_{j}}\left( {{x}_{i}}-{{x}_{j}} \right)\,\!</math> terms are removed from the full cubic model. <br />
<br />
<br />
::<math>\begin{align}<br />
& y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}} \\ <br />
& +{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}} <br />
\end{align}\,\!</math><br />
<br />
<br />
The above types of models are called Scheffe type models. They can be extended to designs with more than three components. <br />
<br />
In regular regression analysis, the effect of an explanatory variable or factor is represented by the value of its coefficient. The ratio of the estimated coefficient to its standard error is used for the t-test, which tells us whether a coefficient is 0 or not. If a coefficient is statistically 0, then the corresponding factor has no significant effect on the response. However, for Scheffe type models, since the intercept term is not included in the model, we cannot use the regular t-test to test each individual main effect. In other words, we cannot test whether the coefficient for each component is 0 or not. <br />
<br />
Similarly, in the ANOVA analysis, the linear effects of all the components are tested together as a single group. The main effect test for each individual component is not conducted. To perform ANOVA analysis, the Scheffe type model needs to be reformatted to include the hidden intercept. For example, the linear model<br />
<br />
<br />
::<math>y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}\,\!</math><br />
<br />
<br />
can be rewritten as<br />
<br />
<br />
::<math>\begin{align}<br />
& y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}} \\ <br />
& ={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}\left( 1-{{x}_{1}}-{{x}_{2}} \right) \\ <br />
& ={{\beta }_{3}}+\left( {{\beta }_{1}}-{{\beta }_{3}} \right){{x}_{1}}+\left( {{\beta }_{2}}-{{\beta }_{3}} \right){{x}_{2}} \\ <br />
& ={{\beta }_{0}}+\beta _{1}^{'}{{x}_{1}}+\beta _{2}^{'}{{x}_{2}} <br />
\end{align}\,\!</math><br />
<br />
<br />
where <math>{{\beta }_{0}}={{\beta }_{3}}\,\!</math>, <math>\beta _{1}^{'}={{\beta }_{1}}-{{\beta }_{3}}\,\!</math>, <math>\beta _{2}^{'}={{\beta }_{2}}-{{\beta }_{3}}\,\!</math>. All other models such as the quadratic, cubic and special cubic model can be reformatted using the same procedure. By including the intercept in the model, the correct sum of squares can be calculated in the ANOVA table. If ANOVA analysis is conducted directly using the Scheffe type models, the result will be incorrect.<br />
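The equivalence of the Scheffe form and the intercept form can be checked numerically. In the sketch below the coefficient values and the blend are arbitrary illustrations, not values from the text:<br />

```python
def scheffe_linear(b, x):
    """Scheffe linear model: y = b1*x1 + b2*x2 + b3*x3 (no intercept)."""
    return sum(bi * xi for bi, xi in zip(b, x))

def intercept_form(b):
    """Reformat (b1, b2, b3) to (b0, b1', b2') with the hidden intercept:
    b0 = b3, b1' = b1 - b3, b2' = b2 - b3."""
    b1, b2, b3 = b
    return (b3, b1 - b3, b2 - b3)

b = (4.0, 7.0, 2.0)          # arbitrary Scheffe coefficients
x = (0.2, 0.5, 0.3)          # arbitrary blend summing to 1
b0, b1p, b2p = intercept_form(b)
y1 = scheffe_linear(b, x)
y2 = b0 + b1p * x[0] + b2p * x[1]
print(abs(y1 - y2) < 1e-9)   # True: both forms predict the same response
```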
<br />
===L-Pseudocomponent, Proportion, and Actual Values===<br />
<br />
In mixture designs, the total amount of the mixture is usually given. For example, we can make either a one-pound or a two-pound cake. Regardless of whether the cake is one or two pounds, the proportion of each ingredient is the same. When the total amount is given, the upper and lower limits for each ingredient are usually given in amounts, which is easier for the experimenter to understand. Of course, if the limits or other constraints are given in terms of proportions, these proportions need to be converted to the real amount values when conducting the experiment. To keep everything consistent, all the constraints in DOE++ are treated as amounts.<br />
<br />
In regular factorial design and response surface methods, the regression model is calculated using coded values. Coded values scale all the factors to the same magnitude, which makes the analysis much easier and reduces convergence error. Similarly, the analysis in mixture design is conducted using the so-called L-pseudocomponent value. L-pseudocomponent values scale all the components' values to between 0 and 1. In DOE++, all the designs and calculations for mixture factors are based on L-pseudocomponent values. The relationship between L-pseudocomponent values, proportions and actual amounts is explained next. <br />
<br />
====Example for L-Pseudocomponent Value====<br />
<br />
We are going to make one gallon (about 3.8 liters) of fruit punch. Three ingredients will be in the punch with the following constraints.<br />
<br />
<br />
::<math>1.2\le A\le 3.8\,\!</math>, <math>1.5\le B\le 3\,\!</math>, <math>0\le C\le 3.8\,\!</math><br />
<br />
<br />
Let <math>x_{i}^{A}\,\!</math> (i = 1, 2, 3) be the actual amount value, <math>{{x}_{i}}\,\!</math> be the L-pseudocomponent value and <math>x_{i}^{R}\,\!</math> be the proportion value. Also let <math>{{l}_{i}}\,\!</math> be the lower limit of component i and <math>T\,\!</math> be the total amount. Then the equations for the conversion between them are:<br />
<br />
<br />
::<math>{{x}_{i}}=\frac{x_{i}^{A}-{{l}_{i}}}{\left( T-\sum\limits_{j=1}^{q}{{{l}_{j}}} \right)}\,\!</math>, <math>x_{i}^{A}={{l}_{i}}+\left( T-\sum\limits_{j=1}^{q}{{{l}_{j}}} \right){{x}_{i}}\,\!</math>, <math>x_{i}^{R}=\frac{x_{i}^{A}}{T}\,\!</math><br />
<br />
<br />
where <math>{{x}_{1}}\,\!</math>, <math>x_{1}^{A}\,\!</math> and <br />
<math>x_{1}^{R}\,\!</math> are for component A, <math>{{x}_{2}}\,\!</math>, <math>x_{2}^{A}\,\!</math> and <math>x_{2}^{R}\,\!</math> are for component B, and <math>{{x}_{3}}\,\!</math>, <math>x_{3}^{A}\,\!</math> and <math>x_{3}^{R}\,\!</math> are for component C. <br />
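The three conversion formulas can be sketched directly. The example below uses the fruit punch bounds above (T = 3.8, lower limits 1.2, 1.5 and 0); the blend amounts chosen are an arbitrary feasible illustration:<br />

```python
def to_pseudo(amounts, lowers, total):
    """Amounts -> L-pseudocomponent values: x_i = (A_i - l_i) / (T - sum(l))."""
    span = total - sum(lowers)
    return [(a - l) / span for a, l in zip(amounts, lowers)]

def to_amount(pseudo, lowers, total):
    """L-pseudocomponent values -> amounts: A_i = l_i + (T - sum(l)) * x_i."""
    span = total - sum(lowers)
    return [l + span * xi for xi, l in zip(pseudo, lowers)]

def to_proportion(amounts, total):
    """Amounts -> proportions: r_i = A_i / T."""
    return [a / total for a in amounts]

T = 3.8
lowers = [1.2, 1.5, 0.0]        # lower bounds for A, B and C
amounts = [1.75, 1.5, 0.55]     # a feasible blend; sums to 3.8
x = to_pseudo(amounts, lowers, T)
print(x)                        # approximately [0.5, 0.0, 0.5]
print(to_amount(x, lowers, T))  # recovers the original amounts
print(to_proportion(amounts, T))
```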
<br />
Since components in this example have both lower and upper limit constraints, an extreme vertex design is used. The design settings are given below.<br />
<br />
[[Image:doe_14.9.png|500 px|center]]<br />
<br />
The created design in terms of L-pseudocomponent values is:<br />
<br />
[[Image:doe_14.10.png|500 px|center]]<br />
<br />
Displayed in amount values, it is:<br />
<br />
[[Image:doe_14.11.png|500 px|center]]<br />
<br />
Displayed in proportion values, it is:<br />
<br />
[[Image:doe_14.12.png|500 px|center]]<br />
<br />
====Check Constraint Consistency====<br />
<br />
In the above example, all the constraints are consistent. However, if we set the constraints to <br />
<br />
<br />
::<math>1.2\le A\le 3.8\,\!</math>, <math>1.5\le B\le 3\,\!</math>, <math>2\le C\le 3.8,\,\!</math><br />
<br />
<br />
then they are not consistent. This is because the total is only 3.8, but the sum of all the lower limits is 4.7. Therefore, not all the lower limits can be satisfied at the same time. If only lower limits and upper limits are present for all the components, then we can adjust the lower bounds to make the constraints consistent. The method given by [Piepel 1983] is used and summarized below.<br />
<br />
Define the range of a component to be <math>{{R}_{i}}={{U}_{i}}-{{L}_{i}}\,\!</math>, where <math>{{U}_{i}}\,\!</math> and <math>{{L}_{i}}\,\!</math> are the upper and lower limits for component i. The implied range of component i is <math>R_{i}^{*}=U_{i}^{*}-L_{i}^{*}\,\!</math>, where <math>L_{i}^{*}=T-\sum\limits_{j\ne i}^{q}{{{U}_{j}}}\,\!</math> and <math>U_{i}^{*}=T-\sum\limits_{j\ne i}^{q}{{{L}_{j}}}\,\!</math>. T is the total amount. The steps for checking and adjusting the bounds are given below. <br />
<br />
Step 1: Check whether <math>L_{i}^{*}\,\!</math> and <math>U_{i}^{*}\,\!</math> are greater than 0. If they are, the constraints meet the basic requirement for consistency, and we can move forward to step 2. If not, the constraints cannot be adjusted to be consistent, and we should stop. <br />
<br />
Step 2: For each component, check if <math>{{L}_{i}}\ge L_{i}^{*}\,\!</math> and <math>{{U}_{i}}\le U_{i}^{*}\,\!</math>. If they are, then this component’s constraints are consistent. Otherwise, if <math>{{L}_{i}}<L_{i}^{*}\,\!</math>, then set <math>{{L}_{i}}=L_{i}^{*}\,\!</math>, and if <math>{{U}_{i}}>U_{i}^{*}\,\!</math>, then set <math>{{U}_{i}}=U_{i}^{*}\,\!</math>.<br />
<br />
Step 3: Whenever a bound is changed, restart from Step 1 to use the new bound to check if all the constraints are consistent. Repeat this until all the limits are consistent. <br />
<br />
For extreme vertex design where linear constraints are allowed, DOE++ will give a warning and stop creating the design if inconsistent linear combination constraints are found. No adjustment will be conducted for linear constraints.<br />
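The checking and adjustment steps above can be sketched as a small routine. This is a simplified illustration, not the DOE++ implementation; the infeasibility test used here is the necessary condition that the lower limits must not sum to more than the total, nor the upper limits to less:<br />

```python
def adjust_bounds(lowers, uppers, total, max_passes=100):
    """Check component bounds and tighten them to their implied limits
    so they are consistent with the mixture total (a sketch of the
    adjustment procedure described above)."""
    L, U = list(lowers), list(uppers)
    q = len(L)
    # necessary feasibility condition on the totals
    if sum(L) > total or sum(U) < total:
        raise ValueError("constraints cannot be made consistent")
    for _ in range(max_passes):
        changed = False
        for i in range(q):
            L_star = total - sum(U[j] for j in range(q) if j != i)
            U_star = total - sum(L[j] for j in range(q) if j != i)
            if L[i] < L_star:      # lower bound looser than implied limit
                L[i], changed = L_star, True
            if U[i] > U_star:      # upper bound looser than implied limit
                U[i], changed = U_star, True
        if not changed:            # repeat until all limits are consistent
            return L, U
    raise RuntimeError("bounds did not stabilize")

# consistent example from the text: 1.2<=A<=3.8, 1.5<=B<=3, 0<=C<=3.8, T=3.8
L, U = adjust_bounds([1.2, 1.5, 0.0], [3.8, 3.0, 3.8], 3.8)
print([round(u, 6) for u in U])    # implied upper limits [2.3, 2.6, 1.1]
```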
<br />
===Response Trace Plot===<br />
<br />
Due to the correlation between all the components, the regular t-test is not used to test the significance of each component. A special plot called the Response Trace Plot can be used to see how the response changes when each component changes from its reference point [John Cornell]. <br />
<br />
A reference point can be any point inside the experiment space. An imaginary line can be drawn from this reference point to each vertex <math>{{x}_{i}}=1\,\!</math>, and <math>{{x}_{j}}=0\,\!</math> (<math>i\ne j\,\!</math>). This line is the direction of change for component i. Component i can either increase or decrease its value along this line, while the ratio of the other components <math>{{{x}_{j}}}/{{{x}_{k}}}\,\!</math> (<math>j,k\ne i\,\!</math>) remains constant. If the simplex plot is defined in terms of proportions, then the direction is called Cox’s direction, and <math>{{{x}_{j}}}/{{{x}_{k}}}\,\!</math> is a ratio of proportions. If the simplex plot is defined in terms of pseudocomponent values, then the direction is called Piepel’s direction, and <math>{{{x}_{j}}}/{{{x}_{k}}}\,\!</math> is a ratio of pseudocomponent values. <br />
<br />
Assume the reference point in terms of proportions is <math>s=\left( {{s}_{1}},{{s}_{2}},...,{{s}_{q}} \right)\,\!</math> where <math>{{s}_{1}}+{{s}_{2}}+...+{{s}_{q}}=1\,\!</math>. Suppose the proportion of component <math>i\,\!</math> is now changed from <math>{{s}_{i}}\,\!</math> by <math>{{\Delta }_{i}}\,\!</math> (<math>{{\Delta }_{i}}\,\!</math> can be greater than or less than 0) in Cox’s direction, so that the new proportion becomes <math>{{x}_{i}}={{s}_{i}}+{{\Delta }_{i}}\,\!</math>.<br />
<br />
Then the proportions of the remaining <math>q-1\,\!</math> components resulting from the change from <math>{{s}_{i}}\,\!</math> will be<br />
<br />
::<math>{{x}_{j}}={{s}_{j}}-\frac{{{\Delta }_{i}}{{s}_{j}}}{1-{{s}_{i}}}\,\!</math><br />
<br />
After the change, the ratio of components j and k is unchanged. This is because<br />
<br />
::<math>\frac{{{x}_{j}}}{{{x}_{k}}}=\frac{{{s}_{j}}-\frac{{{\Delta }_{i}}{{s}_{j}}}{1-{{s}_{i}}}}{{{s}_{k}}-\frac{{{\Delta }_{i}}{{s}_{k}}}{1-{{s}_{i}}}}=\frac{{{s}_{j}}}{{{s}_{k}}}\cdot \frac{1-\frac{{{\Delta }_{i}}}{1-{{s}_{i}}}}{1-\frac{{{\Delta }_{i}}}{1-{{s}_{i}}}}=\frac{{{s}_{j}}}{{{s}_{k}}}\,\!</math> <br />
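A small sketch of one step along Cox's direction (illustrative reference blend), verifying that the new blend still sums to 1 and that the ratio of the other components is preserved:<br />

```python
def cox_step(s, i, delta):
    """Move component i by delta along Cox's direction from the
    reference blend s. The other components shrink or grow so the
    blend still sums to 1 and their pairwise ratios are unchanged."""
    q = len(s)
    x = [s[j] - delta * s[j] / (1 - s[i]) for j in range(q)]
    x[i] = s[i] + delta
    return x

s = [0.2, 0.3, 0.5]                              # reference blend
x = cox_step(s, 0, 0.1)                          # increase component 1 by 0.1
print(abs(sum(x) - 1) < 1e-12)                   # still a valid mixture
print(abs(x[1] / x[2] - s[1] / s[2]) < 1e-12)    # other ratios preserved
```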
<br />
As <math>{{x}_{i}}\,\!</math> changes along Cox’s direction, we can use a fitted regression model to get the response value y. A response trace plot for a mixture design with three components will look like this: <br />
<br />
[[Image:doe_14.13.png|500 px|center]]<br />
<br />
The x-axis is the deviation from the reference point, and the y-value is the fitted response. Each component has one curve. Since the red curve for component A changes significantly, component A has a significant effect along its axis. The blue curve for component C is almost flat; this means that when C changes along Cox’s direction while the other components keep the same ratio, the response Y does not change very much. The effect of component B is between those of components A and C.<br />
<br />
===Example===<br />
<br />
Watermelon (A), pineapple (B) and orange juice (C) are used for making 3.8 liters of fruit punch. At least 30% of the fruit punch must be watermelon. Therefore the constraints are<br />
<br />
<br />
::<math>1.14\le A\le 3.8\,\!</math>, <math>0\le B\le 3.8\,\!</math>, <math>0\le C\le 3.8,\,\!</math><br />
<br />
<br />
Different blends of the three-juice recipe were evaluated by a panel. A value from 1 (extremely poor) to 9 (very good) is used for the response [John Cornell, page 74]. A {3, 2} simplex lattice design is used with one center point and three axial points. Three replicates were conducted for each ingredient combination. The settings for creating this design in DOE++ are: <br />
<br />
[[Image:doe_14.14.png|500 px|center]]<br />
<br />
The generated design in L-pseudocomponent values and the response values from the experiment are<br />
<br />
[[Image:doe_14.15.png|500 px|center]]<br />
<br />
The simplex design point plot is<br />
<br />
[[Image:doe_14.16.png|500 px|center]]<br />
<br />
Main effects and 2-way interactions are included in the regression model. The resulting regression model in terms of L-pseudocomponents is<br />
<br />
<br />
::<math>y=4.81{{x}_{1}}+6.03{{x}_{2}}+6.16{{x}_{3}}+1.13{{x}_{1}}{{x}_{2}}+2.45{{x}_{1}}{{x}_{3}}+1.69{{x}_{2}}{{x}_{3}}\,\!</math><br />
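As a quick illustration (not a DOE++ output), the fitted model can be evaluated at any L-pseudocomponent blend, and a coarse grid search over the simplex reproduces the observation made later from the contour plot that the best-tasting blends contain little watermelon:<br />

```python
def punch_taste(x1, x2, x3):
    """Fitted quadratic Scheffe model from the text (L-pseudocomponent values)."""
    return (4.81 * x1 + 6.03 * x2 + 6.16 * x3
            + 1.13 * x1 * x2 + 2.45 * x1 * x3 + 1.69 * x2 * x3)

print(punch_taste(1, 0, 0))     # pure (minimum-constraint) watermelon blend: 4.81

# coarse grid search over the simplex in steps of 0.05
grid = ((i / 20, j / 20, (20 - i - j) / 20)
        for i in range(21) for j in range(21 - i))
best = max(grid, key=lambda x: punch_taste(*x))
print(best)                     # the best blend on the grid has no watermelon
```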
<br />
<br />
The regression information table is<br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center"<br />
! colspan="8" style="text-align: center; font-weight: bold; background:#CCCCCC"| Regression Information<br />
|-<br />
| style="text-align: center; font-weight: bold;" | Term<br />
| style="text-align: center; font-weight: bold;" | Coefficient<br />
| style="text-align: center; font-weight: bold;" | Standard Error<br />
| style="text-align: center; font-weight: bold;" | Low Confidence<br />
| style="text-align: center; font-weight: bold;" | High Confidence<br />
| style="text-align: center; font-weight: bold;" | T Value<br />
| style="text-align: center; font-weight: bold;" | P Value<br />
| style="text-align: center; font-weight: bold;" | Variance Inflation Factor<br />
|-<br />
| style="text-align: center;" | A: Watermelon<br />
| style="text-align: center;" | 4.8093<br />
| style="text-align: center;" | 0.3067<br />
| style="text-align: center;" | 4.2845<br />
| style="text-align: center;" | 5.3340<br />
| style="text-align: center;" | <br />
| style="text-align: center;" | <br />
| style="text-align: center;" | 1.9636<br />
|-<br />
| style="text-align: center;" | B: Pineapple<br />
| style="text-align: center;" | 6.0274<br />
| style="text-align: center;" | 0.3067<br />
| style="text-align: center;" | 5.5027<br />
| style="text-align: center;" | 6.5522<br />
| style="text-align: center;" | <br />
| style="text-align: center;" | <br />
| style="text-align: center;" | 1.9636<br />
|-<br />
| style="text-align: center;" | C: Orange<br />
| style="text-align: center;" | 6.1577<br />
| style="text-align: center;" | 0.3067<br />
| style="text-align: center;" | 5.6330<br />
| style="text-align: center;" | 6.6825<br />
| style="text-align: center;" | <br />
| style="text-align: center;" | <br />
| style="text-align: center;" | 1.9636<br />
|-<br />
| style="text-align: center;" | A • B<br />
| style="text-align: center;" | 1.1253<br />
| style="text-align: center;" | 1.4137<br />
| style="text-align: center;" | -1.2934<br />
| style="text-align: center;" | 3.5439<br />
| style="text-align: center;" | 0.7960<br />
| style="text-align: center;" | 0.4339<br />
| style="text-align: center;" | 1.9819<br />
|-<br />
| style="text-align: center;" | <span style="color:red;"> A • C </span><br />
| style="text-align: center;" | 2.4525<br />
| style="text-align: center;" | 1.4137<br />
| style="text-align: center;" | 0.0339<br />
| style="text-align: center;" | 4.8712<br />
| style="text-align: center;" | 1.7348<br />
| style="text-align: center;" | <span style="color:red;"> 0.0956 </span><br />
| style="text-align: center;" | 1.9819<br />
|-<br />
| style="text-align: center;" | B • C<br />
| style="text-align: center;" | 1.6889<br />
| style="text-align: center;" | 1.4137<br />
| style="text-align: center;" | -0.7298<br />
| style="text-align: center;" | 4.1075<br />
| style="text-align: center;" | 1.1947<br />
| style="text-align: center;" | 0.2439<br />
| style="text-align: center;" | 1.9819<br />
|}<br />
<br />
<br />
The results show that, at a significance level of 0.1, the taste of the fruit punch is significantly affected by the interaction between watermelon and orange juice. <br />
<br />
The ANOVA table is<br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center"<br />
! colspan="6" style="text-align: center; font-weight: bold; background:#CCCCCC"| ANOVA Table<br />
|-<br />
| style="text-align: center; font-weight: bold;" | Source of Variation<br />
| style="text-align: center; font-weight: bold;" | Degrees of Freedom<br />
| style="text-align: center; font-weight: bold;" | Sum of Squares [Partial]<br />
| style="text-align: center; font-weight: bold;" | Mean Squares [Partial]<br />
| style="text-align: center; font-weight: bold;" | F Ratio<br />
| style="text-align: center; font-weight: bold;" | P Value<br />
|-<br />
| <span style="color:red;"> Model </span><br />
| style="text-align: center;" | 5<br />
| style="text-align: center;" | 6.5517<br />
| style="text-align: center;" | 1.3103<br />
| style="text-align: center;" | 4.3181<br />
| style="text-align: center;" | <span style="color:red;"> 0.0061 </span><br />
|-<br />
| style="text-align: center;" | <span style="color:red;"> Linear </span><br />
| style="text-align: center;" | 2<br />
| style="text-align: center;" | 3.6513<br />
| style="text-align: center;" | 1.8256<br />
| style="text-align: center;" | 6.0162<br />
| style="text-align: center;" | <span style="color:red;"> 0.0076 </span><br />
|-<br />
| style="text-align: center;" | A • B<br />
| style="text-align: center;" | 1<br />
| style="text-align: center;" | 0.1923<br />
| style="text-align: center;" | 0.1923<br />
| style="text-align: center;" | 0.6336<br />
| style="text-align: center;" | 0.4339<br />
|-<br />
| style="text-align: center;" | <span style="color:red;"> A • C </span><br />
| style="text-align: center;" | 1<br />
| style="text-align: center;" | 0.9133<br />
| style="text-align: center;" | 0.9133<br />
| style="text-align: center;" | 3.0097<br />
| style="text-align: center;" | <span style="color:red;"> 0.0956 </span><br />
|-<br />
| style="text-align: center;" | B • C<br />
| style="text-align: center;" | 1<br />
| style="text-align: center;" | 0.4331<br />
| style="text-align: center;" | 0.4331<br />
| style="text-align: center;" | 1.4272<br />
| style="text-align: center;" | 0.2439<br />
|-<br />
| Residual<br />
| style="text-align: center;" | 24<br />
| style="text-align: center;" | 7.2829<br />
| style="text-align: center;" | 0.3035<br />
| style="text-align: center;" | <br />
| style="text-align: center;" | <br />
|-<br />
| style="text-align: center;" | <span style="color:red;"> Lack of Fit </span><br />
| style="text-align: center;" | 4<br />
| style="text-align: center;" | 4.4563<br />
| style="text-align: center;" | 1.1141<br />
| style="text-align: center;" | 7.8825<br />
| style="text-align: center;" | <span style="color:red;"> 0.0006 </span><br />
|-<br />
| style="text-align: center;" | Pure Error<br />
| style="text-align: center;" | 20<br />
| style="text-align: center;" | 2.8267<br />
| style="text-align: center;" | 0.1413<br />
| <br />
| <br />
|-<br />
| Total<br />
| style="text-align: center;" | 29<br />
| style="text-align: center;" | 13.8347<br />
| style="text-align: center;" | <br />
| <br />
| <br />
|} <br />
<br />
The simplex contour plot in L-pseudocomponent values is<br />
<br />
[[Image:doe_14.17.png|500 px|center]]<br />
<br />
From this plot, we can see that as the amount of watermelon is reduced, the taste of the fruit punch improves.<br />
<br />
<br />
<br />
In order to find the best proportion of each ingredient, the optimization tool in DOE++ can be used, with the following settings:<br />
<br />
[[Image:doe_14.18.png|500 px|center]]<br />
<br />
The resulting optimal plot is<br />
<br />
[[Image:doe_14.19.png|500 px|center]]<br />
<br />
This plot shows that when the amounts for watermelon, pineapple and orange juice are 1.141, 1.299 and 1.359, respectively, the rated taste of the fruit punch is highest.<br />
<br />
==Mixture Design with Process Variables==<br />
<br />
Process variables often play very important roles in mixture experiments. A simple example is baking a cake. Even with the same ingredients, different baking temperatures and baking times can produce completely different results. In order to study the effect of process variables and find their best settings, we need to consider them when conducting a mixture experiment. <br />
<br />
An easy way to do this is to make mixtures with the same ingredients under different combinations of process variables. If all the process variables are independent, then we can plan a regular factorial design for these process variables. By combining such a design with a separate mixture design, the effects of the mixture components and the effects of the process variables can both be studied. <br />
<br />
For example, a {3, 2} simplex lattice design is used for a mixture with 3 components. Together with the center point, it has a total of 7 runs, or 7 different ingredient combinations. Assume 2 process variables are potentially important and a two-level factorial design is used for them, giving a total of 4 combinations of the 2 process variables. If the 7 different mixtures are made under each of the 4 process variable combinations, then the experiment has a total of 28 runs. This is illustrated in the figure below. <br />
<br />
[[Image:doe_14.20.png|500 px|center]]<br />
<br />
Of course, if possible, all 28 runs should be conducted in a random order. <br />
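The crossing described above can be sketched programmatically; a minimal example, assuming the seven {3, 2} lattice-plus-centroid blends and two coded process variables:<br />

```python
from itertools import product

# {3, 2} simplex lattice plus the centroid: 7 blends
blends = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
          (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5),
          (1/3, 1/3, 1/3)]

# two process variables at two coded levels each: 4 combinations
process = list(product([-1, 1], repeat=2))

# cross every blend with every process-variable combination
runs = [blend + z for blend, z in product(blends, process)]
print(len(runs))    # 7 * 4 = 28 runs
```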
<br />
===Model with Process Variables===<br />
<br />
In DOE++, regression models including both mixture components and process variables are available. For mixture components, we use L-pseudocomponent values, and for process variables coded values are used. <br />
<br />
Assume a design has 3 mixture components and 2 process variables, as illustrated in the above figure. We can use the following models for them. <br />
<br />
*For the 3 mixture components, the following special cubic model is used.<br />
<br />
<br />
::<math>y={{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}}+{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}}\,\!</math><br />
<br />
<br />
*For the 2 process variables the following model is used.<br />
<br />
<br />
::<math>y={{\alpha }_{0}}+{{\alpha }_{1}}{{z}_{1}}+{{\alpha }_{2}}{{z}_{2}}+{{\alpha }_{12}}{{z}_{1}}{{z}_{2}}\,\!</math><br />
<br />
<br />
*The combined model with both mixture components and process variables is<br />
<br />
<br />
::<math>\begin{align}<br />
& y=\sum\limits_{i=1}^{3}{\gamma _{i}^{0}{{x}_{i}}}+\sum{\sum\limits_{i<j}^{3}{\gamma _{ij}^{0}{{x}_{i}}{{x}_{j}}}+}\gamma _{123}^{0}{{x}_{1}}{{x}_{2}}{{x}_{3}} \\ <br />
& +\left( \sum\limits_{i=1}^{3}{\gamma _{i}^{1}{{x}_{i}}}+\sum{\sum\limits_{i<j}^{3}{\gamma _{ij}^{1}{{x}_{i}}{{x}_{j}}}+}\gamma _{123}^{1}{{x}_{1}}{{x}_{2}}{{x}_{3}} \right){{z}_{1}} \\ <br />
& +\left( \sum\limits_{i=1}^{3}{\gamma _{i}^{2}{{x}_{i}}}+\sum{\sum\limits_{i<j}^{3}{\gamma _{ij}^{2}{{x}_{i}}{{x}_{j}}}+}\gamma _{123}^{2}{{x}_{1}}{{x}_{2}}{{x}_{3}} \right){{z}_{2}} \\ <br />
& +\left( \sum\limits_{i=1}^{3}{\gamma _{i}^{12}{{x}_{i}}}+\sum{\sum\limits_{i<j}^{3}{\gamma _{ij}^{12}{{x}_{i}}{{x}_{j}}}+}\gamma _{123}^{12}{{x}_{1}}{{x}_{2}}{{x}_{3}} \right){{z}_{1}}{{z}_{2}} <br />
\end{align}\,\!</math><br />
<br />
<br />
The above combined model has a total of 7 × 4 = 28 terms. By expanding it, we get the following model:<br />
<br />
<br />
::<math>\begin{align}<br />
& y=\gamma _{1}^{0}{{x}_{1}}+\gamma _{2}^{0}{{x}_{2}}+\gamma _{3}^{0}{{x}_{3}}+\gamma _{12}^{0}{{x}_{1}}{{x}_{2}}+\gamma _{13}^{0}{{x}_{1}}{{x}_{3}}+\gamma _{23}^{0}{{x}_{2}}{{x}_{3}}+\gamma _{123}^{0}{{x}_{1}}{{x}_{2}}{{x}_{3}} \\ <br />
& +\gamma _{1}^{1}{{x}_{1}}{{z}_{1}}+\gamma _{2}^{1}{{x}_{2}}{{z}_{1}}+\gamma _{3}^{1}{{x}_{3}}{{z}_{1}}+\gamma _{12}^{1}{{x}_{1}}{{x}_{2}}{{z}_{1}}+\gamma _{13}^{1}{{x}_{1}}{{x}_{3}}{{z}_{1}}+\gamma _{23}^{1}{{x}_{2}}{{x}_{3}}{{z}_{1}}+\gamma _{123}^{1}{{x}_{1}}{{x}_{2}}{{x}_{3}}{{z}_{1}} \\ <br />
& +\gamma _{1}^{2}{{x}_{1}}{{z}_{2}}+\gamma _{2}^{2}{{x}_{2}}{{z}_{2}}+\gamma _{3}^{2}{{x}_{3}}{{z}_{2}}+\gamma _{12}^{2}{{x}_{1}}{{x}_{2}}{{z}_{2}}+\gamma _{13}^{2}{{x}_{1}}{{x}_{3}}{{z}_{2}}+\gamma _{23}^{2}{{x}_{2}}{{x}_{3}}{{z}_{2}}+\gamma _{123}^{2}{{x}_{1}}{{x}_{2}}{{x}_{3}}{{z}_{2}} \\ <br />
& +\gamma _{1}^{12}{{x}_{1}}{{z}_{1}}{{z}_{2}}+\gamma _{2}^{12}{{x}_{2}}{{z}_{1}}{{z}_{2}}+\gamma _{3}^{12}{{x}_{3}}{{z}_{1}}{{z}_{2}}+\gamma _{12}^{12}{{x}_{1}}{{x}_{2}}{{z}_{1}}{{z}_{2}}+\gamma _{13}^{12}{{x}_{1}}{{x}_{3}}{{z}_{1}}{{z}_{2}}+\gamma _{23}^{12}{{x}_{2}}{{x}_{3}}{{z}_{1}}{{z}_{2}}+\gamma _{123}^{12}{{x}_{1}}{{x}_{2}}{{x}_{3}}{{z}_{1}}{{z}_{2}} <br />
\end{align}\,\!</math><br />
<br />
<br />
The combined model crosses every term in the mixture components model with every term in the process variables model. From a mathematical point of view, this is just a regular regression model; therefore, traditional regression analysis methods can still be used to obtain the model coefficients and calculate the ANOVA table.<br />
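To illustrate the crossing, a design matrix row for the combined model can be built by multiplying each mixture term by each process-variable term. The following Python sketch is illustrative only (the function names are ours, not part of DOE++):

```python
def mixture_terms(x1, x2, x3):
    """Scheffe special-cubic terms for a 3-component mixture (no intercept)."""
    return [x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3]

def process_terms(z1, z2):
    """2-way interaction model terms for 2 process variables, intercept included."""
    return [1.0, z1, z2, z1*z2]

def combined_row(x, z):
    """Cross every mixture term with every process term -> 7 x 4 = 28 columns."""
    return [m * p for p in process_terms(*z) for m in mixture_terms(*x)]

row = combined_row((0.5, 0.5, 0.0), (-1, 1))
print(len(row))  # 28 columns, matching the expanded model
```

The columns are grouped as the pure mixture terms, then the mixture terms times <math>{{z}_{1}}\,\!</math>, <math>{{z}_{2}}\,\!</math> and <math>{{z}_{1}}{{z}_{2}}\,\!</math>, in the same order as the four rows of the expanded model above.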
<br />
===Example===<br />
<br />
Three kinds of meats (beef, pork and lamb) are mixed together to form burger patties. The meat comprises 90% of the total mixture, with the remaining 10% reserved for flavoring ingredients. A {3, 2} simplex design with the center point is used for the experiment. The design has 7 meat combinations, which are given below using L-pseudocomponent values. <br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center"<br />
!A: Beef<br />
!B: Pork<br />
!C: Lamb<br />
|-<br />
|style="text-align: center"|1||style="text-align: center"| 0||style="text-align: center"| 0<br />
|-<br />
|style="text-align: center"|0.5||style="text-align: center"| 0.5||style="text-align: center"| 0<br />
|-<br />
|style="text-align: center"|0.5||style="text-align: center"| 0||style="text-align: center"| 0.5<br />
|-<br />
|style="text-align: center"|0||style="text-align: center"| 1||style="text-align: center"| 0<br />
|-<br />
|style="text-align: center"|0||style="text-align: center"| 0.5||style="text-align: center"| 0.5<br />
|-<br />
|style="text-align: center"|0||style="text-align: center"| 0||style="text-align: center"| 1<br />
|-<br />
|style="text-align: center"|0.333333||style="text-align: center"| 0.333333||style="text-align: center"| 0.333333<br />
|}<br />
<br />
Two process variables in making the patties are also studied: cooking temperature and cooking time. The low and high temperature values are 375°F and 425°F, and the low and high time values are 25 and 40 minutes. A two-level full factorial design is used and displayed below with coded values. <br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center" style="text-align:center" <br />
!Temperature<br />
!Time<br />
|-<br />
| -1|| -1<br />
|-<br />
| -1|| 1<br />
|-<br />
| 1|| -1<br />
|-<br />
| 1|| 1<br />
|}<br />
<br />
One of the properties of the burger patties is texture. The texture is measured by a compression test that measures the grams of force required to puncture the surface of the patty. <br />
<br />
Combining the simplex design and the factorial design together, we get the following 28 runs. The corresponding texture reading for each blend is also provided. <br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center" style="text-align:center" <br />
!Standard Order<br />
!A: Beef<br />
!B: Pork<br />
!C: Lamb<br />
!Z1: Temperature<br />
!Z2: Time<br />
!Texture (<math>10^3\,\!</math> gram)<br />
|-<br />
|1|| 1|| 0|| 0|| -1|| -1|| 1.84<br />
|-<br />
|2|| 0.5|| 0.5|| 0|| -1|| -1|| 0.67<br />
|-<br />
|3|| 0.5|| 0|| 0.5|| -1|| -1|| 1.51<br />
|-<br />
|4|| 0|| 1|| 0|| -1|| -1|| 1.29<br />
|-<br />
|5|| 0|| 0.5|| 0.5|| -1|| -1|| 1.42<br />
|-<br />
|6|| 0|| 0|| 1|| -1|| -1|| 1.16<br />
|-<br />
|7|| 0.333|| 0.333|| 0.333|| -1|| -1|| 1.59<br />
|-<br />
|8|| 1|| 0|| 0|| 1|| -1|| 2.86<br />
|-<br />
|9|| 0.5|| 0.5|| 0|| 1|| -1|| 1.1<br />
|-<br />
|10|| 0.5|| 0|| 0.5|| 1|| -1|| 1.6<br />
|-<br />
|11|| 0|| 1|| 0|| 1|| -1|| 1.53<br />
|-<br />
|12|| 0|| 0.5|| 0.5|| 1|| -1|| 1.81<br />
|-<br />
|13|| 0|| 0|| 1|| 1|| -1|| 1.5<br />
|-<br />
|14|| 0.333|| 0.333|| 0.333|| 1|| -1|| 1.68<br />
|-<br />
|15|| 1|| 0|| 0|| -1|| 1|| 3.01<br />
|-<br />
|16|| 0.5|| 0.5|| 0|| -1|| 1|| 1.21<br />
|-<br />
|17|| 0.5|| 0|| 0.5|| -1|| 1|| 2.32<br />
|-<br />
|18|| 0|| 1|| 0|| -1|| 1|| 1.93<br />
|-<br />
|19|| 0|| 0.5|| 0.5|| -1|| 1|| 2.57<br />
|-<br />
|20|| 0|| 0|| 1|| -1|| 1|| 1.83<br />
|-<br />
|21|| 0.333|| 0.333|| 0.333|| -1|| 1|| 1.94<br />
|-<br />
|22|| 1|| 0|| 0|| 1|| 1|| 4.13<br />
|-<br />
|23|| 0.5|| 0.5|| 0|| 1|| 1|| 1.67<br />
|-<br />
|24|| 0.5|| 0|| 0.5|| 1|| 1|| 2.57<br />
|-<br />
|25|| 0|| 1|| 0|| 1|| 1|| 2.26<br />
|-<br />
|26|| 0|| 0.5|| 0.5|| 1|| 1|| 3.15<br />
|-<br />
|27|| 0|| 0|| 1|| 1|| 1|| 2.22<br />
|-<br />
|28|| 0.333|| 0.333|| 0.333|| 1|| 1|| 2.6<br />
|}<br />
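The 28-run layout above is simply the cross product of the 7 simplex blends and the 4 factorial settings. A small sketch (illustrative, not DOE++ output) confirms the count and ordering:

```python
# {3, 2} simplex design points with the center point (L-pseudocomponent values)
blends = [(1, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5),
          (0, 1, 0), (0, 0.5, 0.5), (0, 0, 1),
          (1/3, 1/3, 1/3)]

# 2-level full factorial for temperature (z1) and time (z2), in the
# order used by the standard-order column of the table above
factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]

# Crossing the two designs runs every blend at every process setting
runs = [(a, b, c, z1, z2) for (z1, z2) in factorial for (a, b, c) in blends]
print(len(runs))  # 7 blends x 4 settings = 28 runs
```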
<br />
Using a quadratic model for the mixture components and a 2-way interaction model for the process variables, we get the following results. <br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center" style="text-align:center" <br />
!Term<br />
!Coefficient<br />
!Standard Error<br />
!T Value<br />
!P Value<br />
!Variance Inflation Factor<br />
|-<br />
|<span style="color:red;"> A:Beef </span>|| 2.9421|| 0.1236|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> B:Pork </span>|| 1.7346|| 0.1236|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> C:Lamb </span>|| 1.6596|| 0.1236|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> A • B </span>|| -4.4170|| 0.5680|| -7.7766|| <span style="color:red;"> 0.0015 </span>|| 1.5695<br />
|-<br />
|A • C|| -0.9170|| 0.5680|| -1.6146|| 0.1817|| 1.5695<br />
|-<br />
|<span style="color:red;"> B • C </span>|| 2.4480|| 0.5680|| 4.3099|| <span style="color:red;"> 0.0125 </span>|| 1.5695<br />
|-<br />
|<span style="color:red;"> Z1 • A </span>|| 0.5324|| 0.1236|| 4.3084|| <span style="color:red;"> 0.0126 </span>|| 1.5989<br />
|-<br />
|Z1 • B|| 0.1399|| 0.1236|| 1.1319|| 0.3209|| 1.5989<br />
|-<br />
|Z1 • C|| 0.1799|| 0.1236|| 1.4557|| 0.2192|| 1.5989<br />
|-<br />
|Z1 • A • B|| -0.4123|| 0.5680|| -0.7260|| 0.5081|| 1.5695<br />
|-<br />
|Z1 • A • C|| -1.0423|| 0.5680|| -1.8352|| 0.1404|| 1.5695<br />
|-<br />
|Z1 • B • C|| 0.3727|| 0.5680|| 0.6561|| 0.5476|| 1.5695<br />
|-<br />
|<span style="color:red;"> Z2 • A </span>|| 0.6193|| 0.1236|| 5.0117|| <span style="color:red;"> 0.0074 </span>|| 1.5989<br />
|-<br />
|<span style="color:red;"> Z2 • B </span>|| 0.3518|| 0.1236|| 2.8468|| <span style="color:red;"> 0.0465 </span>|| 1.5989<br />
|-<br />
|<span style="color:red;"> Z2 • C </span>|| 0.3568|| 0.1236|| 2.8873|| <span style="color:red;"> 0.0447 </span>|| 1.5989<br />
|-<br />
|Z2 • A • B|| -0.9802|| 0.5680|| -1.7258|| 0.1595|| 1.5695<br />
|-<br />
|Z2 • A • C|| -0.3202|| 0.5680|| -0.5638|| 0.6030|| 1.5695<br />
|-<br />
|Z2 • B • C|| 0.9248|| 0.5680|| 1.6282|| 0.1788|| 1.5695<br />
|-<br />
|Z1 • Z2 • A|| 0.0177|| 0.1236|| 0.1433|| 0.8930|| 1.5989<br />
|-<br />
|Z1 • Z2 • B|| 0.0152|| 0.1236|| 0.1231|| 0.9080|| 1.5989<br />
|-<br />
|Z1 • Z2 • C|| 0.0052|| 0.1236|| 0.0422|| 0.9684|| 1.5989<br />
|-<br />
|Z1 • Z2 • A • B|| 0.0808|| 0.5680|| 0.1423|| 0.8937|| 1.5695<br />
|-<br />
|Z1 • Z2 • A • C|| 0.2308|| 0.5680|| 0.4064|| 0.7052|| 1.5695<br />
|-<br />
|Z1 • Z2 • B • C|| 0.2658|| 0.5680|| 0.4680|| 0.6641|| 1.5695<br />
|}<br />
<br />
The above table shows that all the terms with <math>{{z}_{1}}\times {{z}_{2}}\,\!</math> have very large P values; therefore, we can remove these terms from the model. We can also remove the other terms with P values larger than 0.5. After recalculating with the remaining terms, the final results are:<br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center" style="text-align:center" <br />
!Term<br />
!Coefficient<br />
!Standard Error<br />
!T Value<br />
!P Value<br />
!Variance Inflation Factor<br />
|-<br />
|<span style="color:red;"> A:Beef </span>|| 2.9421|| 0.0875|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> B:Pork </span>|| 1.7346|| 0.0875|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> C:Lamb </span>|| 1.6596|| 0.0875|| *|| *|| 1.5989<br />
|-<br />
|<span style="color:red;"> A • B </span>|| -4.4170|| 0.4023|| -10.9782|| <span style="color:red;"> 6.0305E-08 </span>|| 1.5695<br />
|-<br />
|<span style="color:red;"> A • C </span>|| -0.9170|| 0.4023|| -2.2792|| <span style="color:red;"> 0.0402 </span>|| 1.5695<br />
|-<br />
|<span style="color:red;"> B • C </span>|| 2.4480|| 0.4023|| 6.0842|| <span style="color:red;"> 3.8782E-05 </span>|| 1.5695<br />
|-<br />
|<span style="color:red;"> Z1 • A </span>|| 0.4916|| 0.0799|| 6.1531|| <span style="color:red;"> 3.4705E-05 </span>|| 1.3321<br />
|-<br />
|<span style="color:red;"> Z1 • B </span>|| 0.1365|| 0.0725|| 1.8830|| <span style="color:red;"> 0.0823 </span>|| 1.0971<br />
|-<br />
|<span style="color:red;"> Z1 • C </span>|| 0.2176|| 0.0799|| 2.7235|| <span style="color:red;"> 0.0174 </span>|| 1.3321<br />
|-<br />
|<span style="color:red;"> Z1 • A • C </span>|| -1.0406|| 0.4015|| -2.5916|| <span style="color:red;"> 0.0224 </span>|| 1.5631<br />
|-<br />
|<span style="color:red;"> Z2 • A </span>|| 0.5910|| 0.0800|| 7.3859|| <span style="color:red;"> 5.3010E-06 </span>|| 1.3364<br />
|-<br />
|<span style="color:red;"> Z2 • B </span>|| 0.3541|| 0.0875|| 4.0475|| <span style="color:red;"> 0.0014 </span>|| 1.5971<br />
|-<br />
|<span style="color:red;"> Z2 • C </span>|| 0.3285|| 0.0800|| 4.1056|| <span style="color:red;"> 0.0012 </span>|| 1.3364<br />
|-<br />
|<span style="color:red;"> Z2 • A • B </span>|| -0.9654|| 0.4019|| -2.4020|| <span style="color:red;"> 0.0320 </span>|| 1.5661<br />
|-<br />
|<span style="color:red;"> Z2 • B • C </span>|| 0.9396|| 0.4019|| 2.3378|| <span style="color:red;"> 0.0360 </span>|| 1.5661<br />
|}<br />
<br />
The regression model is<br />
<br />
<br />
::<math>\begin{align}<br />
& y=2.9421{{x}_{1}}+1.7346{{x}_{2}}+1.6596{{x}_{3}}-4.4170{{x}_{1}}{{x}_{2}}-0.9170{{x}_{1}}{{x}_{3}}+2.4480{{x}_{2}}{{x}_{3}} \\ <br />
& +0.4916{{x}_{1}}{{z}_{1}}+0.1365{{x}_{2}}{{z}_{1}}+0.2176{{x}_{3}}{{z}_{1}}-1.0406{{x}_{1}}{{x}_{3}}{{z}_{1}}+0.5910{{x}_{1}}{{z}_{2}} \\ <br />
& +0.3541{{x}_{2}}{{z}_{2}}+0.3285{{x}_{3}}{{z}_{2}}-0.9654{{x}_{1}}{{x}_{2}}{{z}_{2}}+0.9396{{x}_{2}}{{x}_{3}}{{z}_{2}} <br />
\end{align}\,\!</math><br />
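As a quick check, the fitted equation can be evaluated at one of the design points. The sketch below (illustrative code, not DOE++ output) predicts the texture for pure beef at the high temperature and high time setting (run 22); the prediction of about 4.02 compares with the observed value of 4.13:

```python
def texture(x1, x2, x3, z1, z2):
    """Predicted texture (10^3 g) from the fitted reduced model above."""
    return (2.9421*x1 + 1.7346*x2 + 1.6596*x3
            - 4.4170*x1*x2 - 0.9170*x1*x3 + 2.4480*x2*x3
            + 0.4916*x1*z1 + 0.1365*x2*z1 + 0.2176*x3*z1
            - 1.0406*x1*x3*z1
            + 0.5910*x1*z2 + 0.3541*x2*z2 + 0.3285*x3*z2
            - 0.9654*x1*x2*z2 + 0.9396*x2*x3*z2)

# Pure beef at high temperature and long cooking time (run 22):
print(round(texture(1, 0, 0, 1, 1), 4))  # 4.0247, vs. the observed 4.13
```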
<br />
<br />
The ANOVA table for this model is<br />
<br />
{| border="1" cellpadding="5" cellspacing="0" align="center" <br />
! colspan="6" style="background: #CCCCCC;"|ANOVA Table<br />
|-<br />
!Source of Variation||Degrees of Freedom||Sum of Squares [Partial]|| Mean Squares [Partial]|| F Ratio|| P Value<br />
|-<br />
|<span style="color:red"> Model</span>||style="text-align: center"| 14||style="text-align: center"| 14.5066||style="text-align: center"| 1.0362|| style="text-align: center"|33.5558||style="text-align: center"| 6.8938E-08<br />
|-<br />
|style="text-align: center"|Component Only || || || || || <br />
|-<br />
|style="text-align: right"|<span style="color:red"> Linear </span> ||style="text-align: center"| 2||style="text-align: center"| 4.1446||style="text-align: center"| 2.0723||style="text-align: center"| 67.1102||style="text-align: center"| 1.4088E-07<br />
|-<br />
|style="text-align: right"|<span style="color:red"> A • B </span>||style="text-align: center"| 1||style="text-align: center"| 3.7216||style="text-align: center"| 3.7216||style="text-align: center"| 120.5208||style="text-align: center"| 6.0305E-08<br />
|-<br />
|style="text-align: right"|<span style="color:red"> A • C </span> ||style="text-align: center"| 1||style="text-align: center"| 0.1604||style="text-align: center"| 0.1604||style="text-align: center"| 5.1949||style="text-align: center"| 0.0402<br />
|-<br />
|style="text-align: right"|<span style="color:red"> B • C </span>||style="text-align: center"| 1||style="text-align: center"| 1.1431||style="text-align: center"| 1.1431||style="text-align: center"| 37.0173 ||style="text-align: center"|3.8782E-05<br />
|-<br />
|style="text-align: center"|Component • Z1|| || || || || <br />
|-<br />
|style="text-align: right"| <span style="color:red"> Z1 • A </span>||style="text-align: center"|1||style="text-align: center"| 1.1691||style="text-align: center"| 1.1691||style="text-align: center"| 37.8604||style="text-align: center"| 3.4705E-05<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z1 • B </span> ||style="text-align: center"| 1||style="text-align: center"| 0.1095||style="text-align: center"| 0.1095||style="text-align: center"| 3.5456||style="text-align: center"| 0.0823<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z1 • C </span>||style="text-align: center"| 1||style="text-align: center"| 0.2290||style="text-align: center"| 0.2290||style="text-align: center"| 7.4172||style="text-align: center"| 0.0174<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z1 • A • C </span>||style="text-align: center"| 1||style="text-align: center"| 0.2074||style="text-align: center"| 0.2074||style="text-align: center"| 6.7165||style="text-align: center"| 0.0224<br />
|-<br />
|style="text-align: center"|Component • Z2|| || || || || <br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z2 • A </span>||style="text-align: center"| 1||style="text-align: center"| 1.6845||style="text-align: center"| 1.6845|| style="text-align: center"|54.5517||style="text-align: center"| 5.3010E-06<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z2 • B </span>||style="text-align: center"| 1||style="text-align: center"| 0.5059||style="text-align: center"| 0.5059||style="text-align: center"| 16.3819||style="text-align: center"| 0.0014<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z2 • C </span>||style="text-align: center"| 1||style="text-align: center"| 0.5205||style="text-align: center"| 0.5205||style="text-align: center"| 16.8556||style="text-align: center"| 0.0012<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z2 • A • B </span>||style="text-align: center"| 1||style="text-align: center"| 0.1782||style="text-align: center"| 0.1782||style="text-align: center"| 5.7698||style="text-align: center"| 0.0320<br />
|-<br />
|style="text-align: right"|<span style="color:red"> Z2 • B • C </span>||style="text-align: center"| 1||style="text-align: center"| 0.1688||style="text-align: center"| 0.1688||style="text-align: center"| 5.4651||style="text-align: center"| 0.0360<br />
|-<br />
|style="text-align: left"|Residual ||style="text-align: center"| 13||style="text-align: center"| 0.4014||style="text-align: center"| 0.0309|| || <br />
|-<br />
|style="text-align: center"|Lack of Fit||style="text-align: center"| 13||style="text-align: center"| 0.4014||style="text-align: center"| 0.0309 || || <br />
|-<br />
|style="text-align: left"|Total||style="text-align: center"| 27||style="text-align: center"| 14.9080|| || || <br />
|}<br />
<br />
The above table shows that both process factors have significant effects on the texture of the patties. Since the model is fairly complicated, the best settings for the process variables and the mixture components cannot be easily identified by inspection. <br />
<br />
The optimization tool in DOE++ is used for the above model. The target texture value is <math>3\times {{10}^{3}}\,\!</math> grams with an acceptable range of <math>2.5-3.5\times {{10}^{3}}\,\!</math> grams. <br />
<br />
[[Image:doe_14.21.png|600 px|center]]<br />
<br />
The optimal solution is Beef = 98.5%, Pork = 0.7%, Lamb = 0.7%, Temperature = 375.7°F, and Time = 40 minutes.<br />
<br />
==References==<br />
<br/><br />
1. Cornell, John (2002), Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, John Wiley & Sons, Inc. New York.<br />
<br/><br />
2. Piepel, G. F. (1983), “Defining consistent constraint regions in mixture experiments,” Technometrics, Vol. 25, pp. 97-101. <br />
<br/><br />
3. Snee, R. D. (1979), “Experimental designs for mixture systems with multiple component constraints,” Communications in Statistics, Theory and Methods, Vol. A8, pp. 303-326.</div>Miklos Szidarovszkyhttps://www.reliawiki.com/index.php?title=Parametric_Recurrent_Event_Data_Analysis&diff=57372Parametric Recurrent Event Data Analysis2015-04-02T17:19:59Z<p>Miklos Szidarovszky: /* The GRP Model */</p>
<hr />
<div><noinclude>{{Banner Weibull Articles}}<br />
''This article appears in the [[Recurrent_Event_Data_Analysis#Parametric_Recurrent_Event_Data_Analysis|Life Data Analysis Reference book]].''<br />
<br />
{{Navigation box}}<br />
</noinclude>Weibull++'s parametric RDA&nbsp;folio is a tool for modeling recurrent event data. It can capture the trend, estimate the rate and predict the total number of recurrences. The failure and repair data of a repairable system can be treated as one type of recurrence data. Past and current repairs may affect the future failure process. For most recurrent events, time (distance, cycles, etc.) is a key factor. With time, the recurrence rate may remain constant, increase or decrease. For other recurrent events, not only the time, but also the number of events can affect the recurrence process (e.g., the debugging process in software development). <br />
<br />
<br />
The parametric analysis approach utilizes the General Renewal Process (GRP) model, as discussed in Mettas and Zhao [[Appendix:_Life_Data_Analysis_References|[28]]]. In this model, the repair time is assumed to be negligible so that the processes can be viewed as point processes. This model provides a way to describe the rate of occurrence of events over time, such as in the case of data obtained from a repairable system. This model is particularly useful in modeling the failure behavior of a specific system and understanding the effects of the repairs on the age of that system. For example, consider a system that is repaired after a failure, where the repair does not bring the system to an ''as-good-as-new'' or an ''as-bad-as-old'' condition. In other words, the system is partially rejuvenated after the repair. Traditionally, in as-bad-as-old repairs, also known as ''minimal repairs'', the failure data from such a system would have been modeled using a homogeneous or non-homogeneous Poisson process (NHPP). On rare occasions, a Weibull distribution has been used as well in cases where the system is almost as-good-as-new after the repair, also known as a ''perfect renewal process'' (PRP). However, for the intermediate states after the repair, there has not been a commercially available model, even though many models have been proposed in literature. In Weibull++, the GRP model provides the capability to model systems with partial renewal (''general repair'' or ''imperfect repair/maintenance'') and allows for a variety of predictions such as reliability, expected failures, etc. <br />
<br />
== The GRP Model ==<br />
In this model, the concept of virtual age is introduced. Let&nbsp;<math>{{t}_{1}},{{t}_{2}},\cdots ,{{t}_{n}}\,\!</math> represent the&nbsp;successive failure times and let <math>{{x}_{1}},{{x}_{2}},\cdots ,{{x}_{n}}\,\!</math> represent the time between failures ( <math>{{t}_{i}}=\sum_{j=1}^{i}{{x}_{j}})\,\!</math>. Assume that after each event, actions are taken to improve the system performance. Let <math>q\,\!</math> be the action effectiveness factor. There are two GRP models: <br />
<br />
Type I: <br />
<br />
::<math>\begin{align}<br />
v_{i}=v_{i-1}+qx_{i}=qt_{i}<br />
\end{align}\,\!</math><br />
<br />
<br />
Type II: <br />
<br />
::<math>{{v}_{i}}=q({{v}_{i-1}}+{{x}_{i}})={{q}^{i}}{{x}_{1}}+{{q}^{i-1}}{{x}_{2}}+\cdots +q{{x}_{i}}\,\!</math><br />
<br />
where <math>{{v}_{i}}\,\!</math> is the virtual age of the system right after the <math>i\,\!</math>th repair. The Type I model assumes that the <math>i\,\!</math>th repair cannot remove the damage incurred before the <math>(i-1)\,\!</math>th repair. It can only reduce the additional age <math>{{x}_{i}}\,\!</math> to <math>{{qx}_{i}}\,\!</math>. The Type II model assumes that at the <math>i\,\!</math>th repair, the virtual age has been accumulated to <math>v_{i-1} + {{x}_{i}}\,\!</math>. The <math>i\,\!</math>th repair will remove the cumulative damage from both current and previous failures by reducing the virtual age to <math>q(v_{i-1} + x_{i})\,\!</math>. <br />
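The two virtual age recursions can be sketched directly (illustrative Python, not part of Weibull++):

```python
def virtual_ages(x, q, model="I"):
    """Virtual age after each repair, for interarrival times x and repair
    effectiveness q (q = 0: as good as new, q = 1: as bad as old)."""
    v, ages = 0.0, []
    for xi in x:
        # Type I reduces only the newly added age; Type II reduces all of it
        v = v + q * xi if model == "I" else q * (v + xi)
        ages.append(v)
    return ages

x = [100.0, 80.0, 60.0]
print(virtual_ages(x, 0.5, "I"))   # [50.0, 90.0, 120.0]  (= q * t_i)
print(virtual_ages(x, 0.5, "II"))  # [50.0, 65.0, 62.5]
```

With <math>q=1\,\!</math>, both recursions return the actual ages <math>{{t}_{i}}\,\!</math>, i.e., minimal repair.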
<br />
The power law function is used to model the rate of recurrence, which is: <br />
<br />
::<math>\begin{align}<br />
\lambda(t)=\lambda \beta t^{\beta -1} <br />
\end{align}\,\!</math><br />
<br />
<br />
The conditional ''pdf'' is: <br />
<br />
::<math>f({{t}_{i}}|{{t}_{i-1}})=\lambda \beta {{({{x}_{i}}+{{v}_{i-1}})}^{\beta -1}}{{e}^{-\lambda \left[ {{\left( {{x}_{i}}+{{v}_{i-1}} \right)}^{\beta }}-v_{i-1}^{\beta } \right]}}\,\!</math><br />
<br />
MLE method is used to estimate the model parameters. The log likelihood function is discussed in Mettas and Zhao [[Appendix:_Life_Data_Analysis_References|[28]]]: <br />
<br />
::<math>\begin{align}<br />
& \ln (L)= n(\ln \lambda +\ln \beta )-\lambda \left[ {{\left( T-{{t}_{n}}+{{v}_{n}} \right)}^{\beta }}-v_{n}^{\beta } \right] \\ <br />
& -\lambda \underset{i=1}{\overset{n}{\mathop \sum }}\,\left[ {{\left( {{x}_{i}}+{{v}_{i-1}} \right)}^{\beta }}-v_{i-1}^{\beta } \right]+(\beta -1)\underset{i=1}{\overset{n}{\mathop \sum }}\,\ln ({{x}_{i}}+{{v}_{i-1}}) <br />
\end{align}\,\!</math><br />
<br />
where <math>n\,\!</math> is the total number of events during the entire observation period. <math>T\,\!</math> is the stop time of the observation. <math>T = t_{n}\,\!</math> if the observation stops right after the last event.<br />
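For illustration, the log-likelihood can be evaluated directly from the interarrival times. In this sketch (not ReliaSoft code), the summand is written with <math>v_{i-1}\,\!</math>, matching the exponent of the conditional pdf above; with <math>\beta =1\,\!</math> and <math>q=1\,\!</math> the model reduces to a homogeneous Poisson process, whose log-likelihood <math>n\ln \lambda -\lambda T\,\!</math> provides a convenient sanity check:

```python
import math

def grp_loglik(lam, beta, q, x, T, model="I"):
    """Log-likelihood of the GRP model for interarrival times x observed
    up to time T, using the Type I or Type II virtual age (a sketch)."""
    n = len(x)
    ll = n * (math.log(lam) + math.log(beta))
    v = 0.0  # virtual age before the first failure
    for xi in x:
        ll -= lam * ((xi + v) ** beta - v ** beta)  # conditional pdf exponent
        ll += (beta - 1) * math.log(xi + v)
        v = v + q * xi if model == "I" else q * (v + xi)  # age after repair
    ll -= lam * ((T - sum(x) + v) ** beta - v ** beta)  # survival past t_n to T
    return ll

# Sanity check: beta = 1, q = 1 gives the HPP value n*ln(lambda) - lambda*T
print(round(grp_loglik(0.01, 1.0, 1.0, [10.0, 20.0, 30.0], 70.0), 4))  # -14.5155
```

Maximizing this function numerically over <math>\lambda ,\beta ,q\,\!</math> is what the MLE step amounts to.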
<br />
== Confidence Bounds ==<br />
In general, in order to obtain the virtual age, the exact occurrence time of each event (failure) should be available (see equations for Type I and Type II models). However, the times are unknown until the corresponding events occur. For this reason, there are no closed-form expressions for total failure number and failure intensity, which are functions of failure times and virtual age. Therefore, in Weibull++, a Monte Carlo simulation is used to predict values of virtual time, failure number, MTBF and failure rate. The approximate confidence bounds obtained from simulation are provided. The uncertainty of model parameters is also considered in the bounds. <br />
<br />
=== Bounds on Cumulative Failure (Event) Numbers ===<br />
The variance of the cumulative failure number <math>N(t)\,\!</math> is: <br />
<br />
::<math>Var[N(t)]=Var\left[ E(N(t)|\lambda ,\beta ,q) \right]+E\left[ Var(N(t)|\lambda ,\beta ,q) \right]\,\!</math><br />
<br />
The first term accounts for the uncertainty of the parameter estimation. The second term considers the uncertainty caused by the renewal process even when model parameters are fixed. However, unless <math>q = 1\,\!</math> , <math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]\,\!</math> cannot be calculated because <math>E(N(t))\,\!</math> cannot be expressed as a closed-form function of <math>\lambda,\beta\,\,</math>, and <math>q\,\!</math>. In order to consider the uncertainty of the parameter estimation, <math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]\,\!</math> is approximated by: <br />
<br />
::<math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]=Var[E(N({{v}_{t}})|\lambda ,\beta )]=Var[\lambda v_{t}^{\beta }]\,\!</math><br />
<br />
where <math>v_{t}\,\!</math> is the expected virtual age at time <math>t\,\!</math> and <math>Var[\lambda v_{t}^{\beta }]\,\!</math> is: <br />
<br />
::<math>\begin{align}<br />
& Var[\lambda v_{t}^{\beta }]={{\left( \frac{\partial (\lambda v_{t}^{\beta })}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial (\lambda v_{t}^{\beta })}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial (\lambda v_{t}^{\beta })}{\partial \beta }\frac{\partial (\lambda v_{t}^{\beta })}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda }) <br />
\end{align}\,\!</math><br />
<br />
By conducting this approximation, the uncertainty of <math>\lambda\,\!</math> and <math>\beta\,\!</math> are considered. The value of <math>v_{t}\,\!</math> and the value of the second term in the equation for the variance of number of failures are obtained through the Monte Carlo simulation using parameters <math>\hat{\lambda },\hat{\beta },\hat{q},\,\!</math> which are the ML estimators. The same simulation is used to estimate the cumulative number of failures <math>\hat{N}(t)=E(N(t)|\hat{\lambda },\hat{\beta },\hat{q})\,\!</math>. <br />
<br />
Once the variance and the expected value of <math>N(t)\,\!</math> have been obtained, the bounds can be calculated by assuming that&nbsp;<math>N(t)\,\!</math> is lognormally distributed as: <br />
<br />
::<math>\frac{\ln N(t)-\ln \hat{N}(t)}{\sqrt{Var(\ln N(t))}}\tilde{\ }N(0,1)\,\!</math><br />
<br />
The upper and lower bounds for a given confidence level <math>\alpha\,\!</math> can be calculated by: <br />
<br />
::<math>N{{(t)}_{U,L}}=\hat{N}(t){{e}^{\pm {{z}_{a}}\sqrt{Var(N(t))}/\hat{N}(t)}}\,\!</math><br />
<br />
where <math>z_{a}\,\!</math> is the percentile of the standard normal distribution corresponding to the given confidence level. <br />
<br />
If <math>N(t)\,\!</math> is assumed to be normally distributed, the bounds can be calculated by: <br />
<br />
::<math>N{{(t)}_{U}}=\hat{N}(t)+{{z}_{a}}\sqrt{Var(N(t))}\,\!</math><br />
<br />
::<math>N{{(t)}_{L}}=\hat{N}(t)-{{z}_{a}}\sqrt{Var(N(t))}\,\!</math><br />
<br />
In Weibull++, <math>N(t)_{U}\,\!</math> is set to the smaller of the upper bounds obtained from the lognormal and normal distribution approximations, and <math>N(t)_{L}\,\!</math> is set to the larger of the lower bounds. This combined method prevents out-of-range bound values for small <math>t\,\!</math> values.<br />
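The combination rule can be sketched as follows (illustrative code; <math>\hat{N}(t)\,\!</math> and <math>Var(N(t))\,\!</math> would come from the simulation described above):

```python
import math

def n_bounds(n_hat, var_n, z=1.96):
    """Two-sided bounds on N(t): compute lognormal- and normal-based bounds,
    then keep the smaller upper and the larger lower bound."""
    s = math.sqrt(var_n)
    up = min(n_hat * math.exp(z * s / n_hat), n_hat + z * s)
    lo = max(n_hat * math.exp(-z * s / n_hat), n_hat - z * s)
    return lo, up

lo, up = n_bounds(10.0, 4.0)
print(round(lo, 3), round(up, 3))  # lognormal lower and normal upper win here
```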
<br />
=== Bounds of Cumulative Failure Intensity and MTBF ===<br />
For a given time <math>t\,\!</math> , the expected value of cumulative MTBF <math>m_{c}(t)\,\!</math> and cumulative failure intensity <math>\lambda_{c}(t)\,\!</math> can be calculated using the following equations: <br />
<br />
::<math>{{\hat{\lambda }}_{c}}(t)=\frac{\hat{N}(t)}{t};{{\hat{m}}_{c}}(t)=\frac{t}{\hat{N}(t)}\,\!</math><br />
<br />
The bounds can be easily obtained from the corresponding bounds of <math>N(t)\,\!</math>.<br />
<br />
::<math>\begin{align}<br />
& {{{\hat{\lambda }}}_{c}}{{(t)}_{L}}=\frac{\hat{N}{{(t)}_{L}}}{t};\text{ }{{{\hat{\lambda }}}_{c}}{{(t)}_{U}}=\frac{\hat{N}{{(t)}_{U}}}{t};\text{ } \\ <br />
& {{{\hat{m}}}_{c}}{{(t)}_{L}}=\frac{t}{\hat{N}{{(t)}_{U}}};\text{ }{{{\hat{m}}}_{c}}{{(t)}_{U}}=\frac{t}{\hat{N}{{(t)}_{L}}} <br />
\end{align}\,\!</math><br />
<br />
=== Bounds on Instantaneous Failure Intensity and MTBF ===<br />
The instantaneous failure intensity is given by: <br />
<br />
::<math>{{\lambda }_{i}}(t)=\lambda \beta v_{t}^{\beta -1}\,\!</math><br />
<br />
where <math>v_{t}\,\!</math> is the virtual age at time <math>t\,\!</math>. When <math>q\ne 1,\,\!</math> it is obtained from simulation. When <math>q = 1\,\!</math>, <math>v_{t} = t\,\!</math> from model Type I and Type II. <br />
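For a fixed virtual age, the point estimate is a one-line computation (illustrative sketch):

```python
def inst_failure_intensity(lam, beta, v_t):
    """Instantaneous failure intensity lambda_i(t) at virtual age v_t."""
    return lam * beta * v_t ** (beta - 1)

# With q = 1 (as bad as old), the virtual age is simply t:
rate = inst_failure_intensity(0.001, 1.5, 400.0)
print(rate, 1.0 / rate)  # intensity, and the corresponding instantaneous MTBF
```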
<br />
The variance of instantaneous failure intensity can be calculated by: <br />
<br />
::<math>\begin{align}<br />
& Var({{\lambda }_{i}}(t))= {{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial {{\lambda }_{i}}(t)}{\partial \beta }\frac{\partial {{\lambda }_{i}}(t)}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda })+{{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial v(t)} \right)}^{2}}Var({{{\hat{v}}}_{t}}) <br />
\end{align}\,\!</math><br />
<br />
The expected value and variance of <math>v_{t}\,\!</math> are obtained from the Monte Carlo simulation with parameters <math>\hat{\lambda },\hat{\beta },\hat{q}.\,\!</math> Because of simulation accuracy and convergence problems in the calculation of <math>Var(\hat{\beta }),Var(\hat{\lambda })\,\!</math> and <math>Cov(\hat{\beta },\hat{\lambda }),\,\!</math> <math>Var(\lambda_{i}(t))\,\!</math> can be negative at some time points. When this happens, the bounds on the instantaneous failure intensity are not provided. <br />
<br />
Once the variance and the expected value of <math>\lambda_{i}(t)\,\!</math> are obtained, the bounds can be calculated by assuming that &nbsp;<math>\lambda_{i}(t)\,\!</math> is lognormally distributed as: <br />
<br />
::<math>\frac{\ln {{\lambda }_{i}}(t)-\ln {{{\hat{\lambda }}}_{i}}(t)}{\sqrt{Var(\ln {{\lambda }_{i}}(t))}}\tilde{\ }N(0,1)\,\!</math><br />
<br />
The upper and lower bounds for a given confidence level <math>\alpha\,\!</math> can be calculated by: <br />
<br />
::<math>{{\lambda }_{i}}(t)={{\hat{\lambda }}_{i}}(t){{e}^{\pm {{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}/{{{\hat{\lambda }}}_{i}}(t)}}\,\!</math><br />
<br />
where <math>z_{a}\,\!</math> is the percentile of the standard normal distribution corresponding to the given confidence level. <br />
<br />
If <math>\lambda_{i}(t)\,\!</math> is assumed to be normally distributed, the bounds can be calculated by: <br />
<br />
::<math>{{\lambda }_{i}}{{(t)}_{U}}={{\hat{\lambda }}_{i}}(t)+{{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}\,\!</math><br />
<br />
::<math>{{\lambda }_{i}}{{(t)}_{L}}={{\hat{\lambda }}_{i}}(t)-{{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}\,\!</math><br />
<br />
In Weibull++, <math>\lambda_{i}(t)_{U}\,\!</math> is set to the smaller of the two upper bounds obtained from the above lognormal and normal distribution approximations. <math>\lambda_{i}(t)_{L}\,\!</math> is set to the larger of the two lower bounds. This combined method prevents out-of-range bound values when <math>t\,\!</math> is small. <br />
<br />
For a given time <math>t\,\!</math>, the expected value of the instantaneous MTBF <math>m_{i}(t)\,\!</math> is: <br />
<br />
::<math>{{\hat{m}}_{i}}(t)=\frac{1}{{{{\hat{\lambda }}}_{i}}(t)}\text{ }\,\!</math><br />
<br />
The upper and lower bounds can be easily obtained from the corresponding bounds of <math>\lambda_{i}(t)\,\!</math>: <br />
<br />
::<math>{{\hat{m}}_{i}}{{(t)}_{U}}=\frac{1}{{{{\hat{\lambda }}}_{i}}{{(t)}_{L}}}\,\!</math><br />
<br />
<br />
::<math>{{\hat{m}}_{i}}{{(t)}_{L}}=\frac{1}{{{{\hat{\lambda }}}_{i}}{{(t)}_{U}}}\,\!</math><br />
<br />
=== Bounds on Conditional Reliability ===<br />
Given mission start time <math>t_{0}\,\!</math> and mission time <math>T\,\!</math>, the conditional reliability can be calculated by: <br />
<br />
::<math>R(T|{{t}_{0}})=\frac{R(T+{{v}_{0}})}{R({{v}_{0}})}={{e}^{-\lambda [{{({{v}_{0}}+T)}^{\beta }}-v_{0}^{\beta }]}}\,\!</math><br />
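The conditional reliability can be sketched as follows (illustrative code; both exponent terms are raised to <math>\beta \,\!</math>, consistent with the ratio <math>R(T+{{v}_{0}})/R({{v}_{0}})\,\!</math> for a power-law process). Here <math>{{v}_{0}}\,\!</math> is the virtual age at mission start:

```python
import math

def conditional_reliability(lam, beta, v0, T):
    """Probability of completing a mission of length T, given virtual age v0
    at mission start (sketch of the expression above)."""
    return math.exp(-lam * ((v0 + T) ** beta - v0 ** beta))

# An aged-but-repaired system (v0 = 100) vs. a brand new one (v0 = 0):
print(conditional_reliability(0.001, 1.5, 100.0, 50.0))
print(conditional_reliability(0.001, 1.5, 0.0, 50.0))
```

For <math>\beta >1\,\!</math> (wear-out), the older virtual age gives the lower mission reliability, as expected; for <math>\beta =1\,\!</math> the result is independent of <math>{{v}_{0}}\,\!</math> (memoryless case).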
<br />
<math>v_{0}\,\!</math> is the virtual age corresponding to time <math>t_{0}\,\!</math>. The expected value and the variance of <math>v_{0}\,\!</math> are obtained from Monte Carlo simulation. The variance of the conditional reliability <math>R(T|t_{0})\,\!</math> is: <br />
<br />
::<math>\begin{align}<br />
& Var(R)= {{\left( \frac{\partial R}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial R}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial R}{\partial \beta }\frac{\partial R}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda })+{{\left( \frac{\partial R}{\partial {{v}_{0}}} \right)}^{2}}Var({{{\hat{v}}}_{0}}) <br />
\end{align}\,\!</math><br />
<br />
Because of simulation accuracy and convergence problems in the calculation of <math>Var(\hat{\beta }),Var(\hat{\lambda })\,\!</math> and <math>Cov(\hat{\beta },\hat{\lambda }),\,\!</math> <math>Var(R)\,\!</math> can be negative at some time points. When this happens, the bounds are not provided. <br />
<br />
The bounds are based on: <br />
<br />
::<math>\log \text{it}(\hat{R}(T))\tilde{\ }N(0,1)\,\!</math><br />
<br />
::<math>\log \text{it}(\hat{R}(T))=\ln \left\{ \frac{\hat{R}(T)}{1-\hat{R}(T)} \right\}\,\!</math><br />
<br />
The confidence bounds on reliability are given by: <br />
<br />
::<math>R=\frac{{\hat{R}}}{\hat{R}+(1-\hat{R}){{e}^{\pm \sqrt{Var(R)}/[\hat{R}(1-\hat{R})]}}}\,\!</math><br />
<br />
It will be compared with the bounds obtained from: <br />
<br />
::<math>R=\hat{R}{{e}^{\pm {{z}_{a}}\sqrt{Var(R)}/\hat{R}}}\,\!</math><br />
<br />
The smaller of the two upper bounds will be the final upper bound and the larger of the two lower bounds will be the final lower bound.<br />
<br />
==Example: Air Condition Unit== <br />
<br />
{{:Example:_Parametric_RDA_-_Air_Condition_Unit}}</div>Miklos Szidarovszkyhttps://www.reliawiki.com/index.php?title=Parametric_Recurrent_Event_Data_Analysis&diff=57371Parametric Recurrent Event Data Analysis2015-04-02T17:05:14Z<p>Miklos Szidarovszky: /* The GRP Model */</p>
<hr />
<div><noinclude>{{Banner Weibull Articles}}<br />
''This article appears in the [[Recurrent_Event_Data_Analysis#Parametric_Recurrent_Event_Data_Analysis|Life Data Analysis Reference book]].''<br />
<br />
{{Navigation box}}<br />
</noinclude>Weibull++'s parametric RDA&nbsp;folio is a tool for modeling recurrent event data. It can capture the trend, estimate the rate and predict the total number of recurrences. The failure and repair data of a repairable system can be treated as one type of recurrence data. Past and current repairs may affect the future failure process. For most recurrent events, time (distance, cycles, etc.) is a key factor. With time, the recurrence rate may remain constant, increase or decrease. For other recurrent events, not only the time, but also the number of events can affect the recurrence process (e.g., the debugging process in software development). <br />
<br />
<br />
The parametric analysis approach utilizes the General Renewal Process (GRP) model, as discussed in Mettas and Zhao [[Appendix:_Life_Data_Analysis_References|[28]]]. In this model, the repair time is assumed to be negligible so that the processes can be viewed as point processes. This model provides a way to describe the rate of occurrence of events over time, such as in the case of data obtained from a repairable system. This model is particularly useful in modeling the failure behavior of a specific system and understanding the effects of the repairs on the age of that system. For example, consider a system that is repaired after a failure, where the repair does not bring the system to an ''as-good-as-new'' or an ''as-bad-as-old'' condition. In other words, the system is partially rejuvenated after the repair. Traditionally, in as-bad-as-old repairs, also known as ''minimal repairs'', the failure data from such a system would have been modeled using a homogeneous or non-homogeneous Poisson process (NHPP). On rare occasions, a Weibull distribution has been used as well in cases where the system is almost as-good-as-new after the repair, also known as a ''perfect renewal process'' (PRP). However, for the intermediate states after the repair, there has not been a commercially available model, even though many models have been proposed in literature. In Weibull++, the GRP model provides the capability to model systems with partial renewal (''general repair'' or ''imperfect repair/maintenance'') and allows for a variety of predictions such as reliability, expected failures, etc. <br />
<br />
== The GRP Model ==<br />
In this model, the concept of virtual age is introduced. Let&nbsp;<math>{{t}_{1}},{{t}_{2}},\cdots ,{{t}_{n}}\,\!</math> represent the&nbsp;successive failure times and let <math>{{x}_{1}},{{x}_{2}},\cdots ,{{x}_{n}}\,\!</math> represent the time between failures ( <math>{{t}_{i}}=\sum_{j=1}^{i}{{x}_{j}})\,\!</math>. Assume that after each event, actions are taken to improve the system performance. Let <math>q\,\!</math> be the action effectiveness factor. There are two GRP models: <br />
<br />
Type I: <br />
<br />
::<math>\begin{align}<br />
v_{i}=v_{i-1}+qx_{i}=qt_{i}<br />
\end{align}\,\!</math><br />
<br />
<br />
Type II: <br />
<br />
::<math>{{v}_{i}}=q({{v}_{i-1}}+{{x}_{i}})={{q}^{i}}{{x}_{1}}+{{q}^{i-1}}{{x}_{2}}+\cdots +q{{x}_{i}}\,\!</math><br />
<br />
where <math>{{v}_{i}}\,\!</math> is the virtual age of the system right after the <math>i\,\!</math>th repair. The Type I model assumes that the <math>i\,\!</math>th repair cannot remove the damage incurred before the <math>(i-1)\,\!</math>th repair; it can only reduce the additional age <math>{{x}_{i}}\,\!</math> to <math>q{{x}_{i}}\,\!</math>. The Type II model assumes that by the <math>i\,\!</math>th repair, the virtual age has accumulated to <math>v_{i-1} + {{x}_{i}}\,\!</math>; the <math>i\,\!</math>th repair removes the cumulative damage from both current and previous failures by reducing the virtual age to <math>q(v_{i-1} + x_{i})\,\!</math>. <br />
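The two virtual age recursions can be illustrated with a short sketch (for illustration only; this code is not part of Weibull++, and the function name is hypothetical):

```python
def virtual_ages(x, q, model="I"):
    """Virtual ages v_1, ..., v_n from interarrival times x_1, ..., x_n.

    Type I : v_i = v_{i-1} + q*x_i   (equivalently, v_i = q*t_i)
    Type II: v_i = q*(v_{i-1} + x_i)
    """
    v, ages = 0.0, []
    for xi in x:
        v = v + q * xi if model == "I" else q * (v + xi)
        ages.append(v)
    return ages
```

With <math>q = 0\,\!</math>, every repair restores the system to as-good-as-new (all virtual ages are zero); with <math>q = 1\,\!</math>, the virtual age equals the real age <math>t_{i}\,\!</math> under both models, which corresponds to minimal (as-bad-as-old) repair.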
<br />
The power law function is used to model the rate of recurrence, which is: <br />
<br />
::<math>\begin{align}<br />
\lambda(t)=\lambda \beta t^{\beta -1} <br />
\end{align}\,\!</math><br />
<br />
<br />
The conditional ''pdf'' is: <br />
<br />
::<math>f({{t}_{i}}|{{t}_{i-1}})=\lambda \beta {{({{x}_{i}}+{{v}_{i-1}})}^{\beta -1}}{{e}^{-\lambda \left[ {{\left( {{x}_{i}}+{{v}_{i-1}} \right)}^{\beta }}-v_{i-1}^{\beta } \right]}}\,\!</math><br />
<br />
The MLE method is used to estimate the model parameters. The log-likelihood function is discussed in Mettas and Zhao [[Appendix:_Life_Data_Analysis_References|[28]]]: <br />
<br />
::<math>\begin{align}<br />
& \ln (L)= n(\ln \lambda +\ln \beta )-\lambda \left[ {{\left( T-{{t}_{n}}+{{v}_{n}} \right)}^{\beta }}-v_{n}^{\beta } \right] \\ <br />
& -\lambda \underset{i=1}{\overset{n}{\mathop \sum }}\,\left[ {{\left( {{x}_{i}}+{{v}_{i-1}} \right)}^{\beta }}-v_{i-1}^{\beta } \right]+(\beta -1)\underset{i=1}{\overset{n}{\mathop \sum }}\,\ln ({{x}_{i}}+{{v}_{i-1}}) <br />
\end{align}\,\!</math><br />
<br />
where <math>n\,\!</math> is the total number of events during the entire observation period. <math>T\,\!</math> is the stop time of the observation. <math>T = t_{n}\,\!</math> if the observation stops right after the last event.<br />
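As a sketch of how this log-likelihood can be evaluated numerically (assuming the Type I virtual-age update and <math>v_{0} = 0\,\!</math>; the function name is illustrative, not a Weibull++ API):

```python
import math

def grp_loglik(lam, beta, q, x, T):
    """Log-likelihood of the GRP Type I model (Mettas & Zhao form).

    x: interarrival times x_1, ..., x_n; T: observation stop time (T >= t_n).
    """
    n = len(x)
    ll = n * (math.log(lam) + math.log(beta))
    v_prev = 0.0                      # virtual age v_{i-1}, starting at v_0 = 0
    for xi in x:
        ll += (beta - 1) * math.log(xi + v_prev)
        ll -= lam * ((xi + v_prev) ** beta - v_prev ** beta)
        v_prev += q * xi              # Type I update: v_i = v_{i-1} + q*x_i
    t_n = sum(x)
    # contribution of the event-free interval (t_n, T]
    ll -= lam * ((T - t_n + v_prev) ** beta - v_prev ** beta)
    return ll
```

Maximizing this function over <math>\lambda\,\!</math>, <math>\beta\,\!</math> and <math>q\,\!</math> with a numerical optimizer yields the ML estimates.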
<br />
== Confidence Bounds ==<br />
In general, in order to obtain the virtual age, the exact occurrence time of each event (failure) should be available (see the equations for the Type I and Type II models). However, the times are unknown until the corresponding events occur. For this reason, there are no closed-form expressions for the total failure number and the failure intensity, which are functions of the failure times and the virtual age. Therefore, in Weibull++, Monte Carlo simulation is used to predict values of the virtual age, failure number, MTBF and failure rate. Approximate confidence bounds obtained from the simulation are provided. The uncertainty of the model parameters is also considered in the bounds. <br />
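One way such a simulation can be sketched is by inverse-transform sampling from the conditional cdf <math>F(x|v)=1-{{e}^{-\lambda [{{(v+x)}^{\beta }}-{{v}^{\beta }}]}}\,\!</math> implied by the conditional ''pdf'' above (an illustrative sketch under these assumptions, not the Weibull++ implementation; the function name is hypothetical):

```python
import math, random

def simulate_grp(lam, beta, q, T, model="I", seed=None):
    """Simulate one realization of GRP event times up to time T by
    inverting the conditional cdf F(x|v) = 1 - exp(-lam*((v+x)**beta - v**beta))."""
    rng = random.Random(seed)
    t, v, times = 0.0, 0.0, []
    while True:
        u = rng.random()
        # inverse transform: x = (v**beta - ln(1-u)/lam)**(1/beta) - v
        x = (v ** beta - math.log(1.0 - u) / lam) ** (1.0 / beta) - v
        t += x
        if t > T:
            return times
        times.append(t)
        v = v + q * x if model == "I" else q * (v + x)
```

Repeating this simulation many times and averaging across realizations gives estimates of the expected virtual age, the cumulative number of failures and their variances at any time of interest.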
<br />
=== Bounds on Cumulative Failure (Event) Numbers ===<br />
The variance of the cumulative failure number <math>N(t)\,\!</math> is: <br />
<br />
::<math>Var[N(t)]=Var\left[ E(N(t)|\lambda ,\beta ,q) \right]+E\left[ Var(N(t)|\lambda ,\beta ,q) \right]\,\!</math><br />
<br />
The first term accounts for the uncertainty of the parameter estimation. The second term considers the uncertainty caused by the renewal process even when model parameters are fixed. However, unless <math>q = 1\,\!</math>, <math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]\,\!</math> cannot be calculated because <math>E(N(t))\,\!</math> cannot be expressed as a closed-form function of <math>\lambda ,\beta \,\!</math>, and <math>q\,\!</math>. In order to consider the uncertainty of the parameter estimation, <math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]\,\!</math> is approximated by: <br />
<br />
::<math>Var\left[ E(N(t)|\lambda ,\beta ,q) \right]=Var[E(N({{v}_{t}})|\lambda ,\beta )]=Var[\lambda v_{t}^{\beta }]\,\!</math><br />
<br />
where <math>v_{t}\,\!</math> is the expected virtual age at time <math>t\,\!</math> and <math>Var[\lambda v_{t}^{\beta }]\,\!</math> is: <br />
<br />
::<math>\begin{align}<br />
& Var[\lambda v_{t}^{\beta }]= {{\left( \frac{\partial (\lambda v_{t}^{\beta })}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial (\lambda v_{t}^{\beta })}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial (\lambda v_{t}^{\beta })}{\partial \beta }\frac{\partial (\lambda v_{t}^{\beta })}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda }) <br />
\end{align}\,\!</math><br />
<br />
Through this approximation, the uncertainty of <math>\lambda\,\!</math> and <math>\beta\,\!</math> is considered. The value of <math>v_{t}\,\!</math> and the value of the second term in the equation for the variance of the number of failures are obtained through the Monte Carlo simulation using the ML estimates <math>\hat{\lambda },\hat{\beta },\hat{q}\,\!</math>. The same simulation is used to estimate the cumulative number of failures <math>\hat{N}(t)=E(N(t)|\hat{\lambda },\hat{\beta },\hat{q})\,\!</math>. <br />
<br />
Once the variance and the expected value of <math>N(t)\,\!</math> have been obtained, the bounds can be calculated by assuming that&nbsp;<math>N(t)\,\!</math> is lognormally distributed as: <br />
<br />
::<math>\frac{\ln N(t)-\ln \hat{N}(t)}{\sqrt{Var(\ln N(t))}}\tilde{\ }N(0,1)\,\!</math><br />
<br />
The upper and lower bounds for a given confidence level <math>\alpha\,\!</math> can be calculated by: <br />
<br />
::<math>N{{(t)}_{U,L}}=\hat{N}(t){{e}^{\pm {{z}_{a}}\sqrt{Var(N(t))}/\hat{N}(t)}}\,\!</math><br />
<br />
where <math>z_{a}\,\!</math> is the critical value of the standard normal distribution corresponding to the confidence level <math>\alpha\,\!</math>. <br />
<br />
If <math>N(t)\,\!</math> is assumed to be normally distributed, the bounds can be calculated by: <br />
<br />
::<math>N{{(t)}_{U}}=\hat{N}(t)+{{z}_{a}}\sqrt{Var(N(t))}\,\!</math><br />
<br />
::<math>N{{(t)}_{L}}=\hat{N}(t)-{{z}_{a}}\sqrt{Var(N(t))}\,\!</math><br />
<br />
In Weibull++, <math>N(t)_{U}\,\!</math> is set to the smaller of the two upper bounds obtained from the lognormal and normal distribution approximations, and <math>N(t)_{L}\,\!</math> is set to the larger of the two lower bounds. This combined method prevents out-of-range bound values for some small <math>t\,\!</math> values.<br />
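The combination rule can be sketched as follows (illustrative only; the helper name is hypothetical, and <math>z\,\!</math> denotes the standard normal critical value):

```python
import math

def combined_bounds(n_hat, var_n, z):
    """Bounds on N(t): the smaller of the lognormal/normal upper bounds
    and the larger of the two lower bounds."""
    s = math.sqrt(var_n)
    lognorm_u = n_hat * math.exp(z * s / n_hat)
    lognorm_l = n_hat * math.exp(-z * s / n_hat)
    norm_u = n_hat + z * s
    norm_l = n_hat - z * s
    return max(lognorm_l, norm_l), min(lognorm_u, norm_u)
```

For small <math>\hat{N}(t)\,\!</math> the normal lower bound can become negative, while the lognormal lower bound is always positive; taking the larger of the two keeps the final lower bound in range.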
<br />
=== Bounds of Cumulative Failure Intensity and MTBF ===<br />
For a given time <math>t\,\!</math> , the expected value of cumulative MTBF <math>m_{c}(t)\,\!</math> and cumulative failure intensity <math>\lambda_{c}(t)\,\!</math> can be calculated using the following equations: <br />
<br />
::<math>{{\hat{\lambda }}_{c}}(t)=\frac{\hat{N}(t)}{t};{{\hat{m}}_{c}}(t)=\frac{t}{\hat{N}(t)}\,\!</math><br />
<br />
The bounds can be easily obtained from the corresponding bounds of <math>N(t)\,\!</math>.<br />
<br />
::<math>\begin{align}<br />
& {{{\hat{\lambda }}}_{c}}{{(t)}_{L}}=\frac{\hat{N}{{(t)}_{L}}}{t};\text{ }{{{\hat{\lambda }}}_{c}}{{(t)}_{U}}=\frac{\hat{N}{{(t)}_{U}}}{t};\text{ } \\ <br />
& {{{\hat{m}}}_{c}}{{(t)}_{L}}=\frac{t}{\hat{N}{{(t)}_{U}}};\text{ }{{{\hat{m}}}_{c}}{{(t)}_{U}}=\frac{t}{\hat{N}{{(t)}_{L}}} <br />
\end{align}\,\!</math><br />
<br />
=== Bounds on Instantaneous Failure Intensity and MTBF ===<br />
The instantaneous failure intensity is given by: <br />
<br />
::<math>{{\lambda }_{i}}(t)=\lambda \beta v_{t}^{\beta -1}\,\!</math><br />
<br />
where <math>v_{t}\,\!</math> is the virtual age at time <math>t\,\!</math>. When <math>q\ne 1\,\!</math>, it is obtained from simulation. When <math>q = 1\,\!</math>, <math>v_{t} = t\,\!</math> for both the Type I and Type II models. <br />
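The intensity evaluation itself is a direct application of the power law to the virtual age (a minimal sketch; the function name is illustrative):

```python
def instantaneous_intensity(lam, beta, v_t):
    """lambda_i(t) = lam * beta * v_t**(beta - 1).

    v_t is the virtual age at time t; when q = 1, v_t = t and this reduces
    to the power law intensity lam * beta * t**(beta - 1).
    """
    return lam * beta * v_t ** (beta - 1)
```

The reciprocal of this quantity gives the instantaneous MTBF discussed at the end of this section.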
<br />
The variance of instantaneous failure intensity can be calculated by: <br />
<br />
::<math>\begin{align}<br />
& Var({{\lambda }_{i}}(t))= {{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial {{\lambda }_{i}}(t)}{\partial \beta }\frac{\partial {{\lambda }_{i}}(t)}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda })+{{\left( \frac{\partial {{\lambda }_{i}}(t)}{\partial v(t)} \right)}^{2}}Var({{{\hat{v}}}_{t}}) <br />
\end{align}\,\!</math><br />
<br />
The expected value and variance of <math>v_{t}\,\!</math> are obtained from the Monte Carlo simulation with parameters <math>\hat{\lambda },\hat{\beta },\hat{q}.\,\!</math> Because of limited simulation accuracy and convergence problems in the calculation of <math>Var(\hat{\beta }),Var(\hat{\lambda })\,\!</math> and <math>Cov(\hat{\beta },\hat{\lambda }),\,\!</math> <math>Var(\lambda_{i}(t))\,\!</math> can be negative at some time points. When this happens, the bounds of the instantaneous failure intensity are not provided. <br />
<br />
Once the variance and the expected value of <math>\lambda_{i}(t)\,\!</math> are obtained, the bounds can be calculated by assuming that &nbsp;<math>\lambda_{i}(t)\,\!</math> is lognormally distributed as: <br />
<br />
::<math>\frac{\ln {{\lambda }_{i}}(t)-\ln {{{\hat{\lambda }}}_{i}}(t)}{\sqrt{Var(\ln {{\lambda }_{i}}(t))}}\tilde{\ }N(0,1)\,\!</math><br />
<br />
The upper and lower bounds for a given confidence level <math>\alpha\,\!</math> can be calculated by: <br />
<br />
::<math>{{\lambda }_{i}}(t)={{\hat{\lambda }}_{i}}(t){{e}^{\pm {{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}/{{{\hat{\lambda }}}_{i}}(t)}}\,\!</math><br />
<br />
where <math>z_{a}\,\!</math> is the critical value of the standard normal distribution corresponding to the confidence level <math>\alpha\,\!</math>. <br />
<br />
If <math>\lambda_{i}(t)\,\!</math> is assumed to be normally distributed, the bounds can be calculated by: <br />
<br />
::<math>{{\lambda }_{i}}{{(t)}_{U}}={{\hat{\lambda }}_{i}}(t)+{{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}\,\!</math><br />
<br />
::<math>{{\lambda }_{i}}{{(t)}_{L}}={{\hat{\lambda }}_{i}}(t)-{{z}_{a}}\sqrt{Var({{\lambda }_{i}}(t))}\,\!</math><br />
<br />
In Weibull++, <math>\lambda_{i}(t)_{U}\,\!</math> is set to the smaller of the two upper bounds obtained from the above lognormal and normal distribution approximations, and <math>\lambda_{i}(t)_{L}\,\!</math> is set to the larger of the two lower bounds. This combination method can prevent out-of-range bound values when <math>t\,\!</math> is small. <br />
<br />
For a given time <math>t\,\!</math>, the expected value of the instantaneous MTBF <math>m_{i}(t)\,\!</math> is: <br />
<br />
::<math>{{\hat{m}}_{i}}(t)=\frac{1}{{{{\hat{\lambda }}}_{i}}(t)}\text{ }\,\!</math><br />
<br />
The upper and lower bounds can be easily obtained from the corresponding bounds of <math>\lambda_{i}(t)\,\!</math>: <br />
<br />
::<math>{{\hat{m}}_{i}}{{(t)}_{U}}=\frac{1}{{{{\hat{\lambda }}}_{i}}{{(t)}_{L}}}\,\!</math><br />
<br />
<br />
::<math>{{\hat{m}}_{i}}{{(t)}_{L}}=\frac{1}{{{{\hat{\lambda }}}_{i}}{{(t)}_{U}}}\,\!</math><br />
<br />
=== Bounds on Conditional Reliability ===<br />
Given mission start time <math>t_{0}\,\!</math> and mission time <math>T\,\!</math>, the conditional reliability can be calculated by: <br />
<br />
::<math>R(T|{{t}_{0}})=\frac{R(T+{{v}_{0}})}{R({{v}_{0}})}={{e}^{-\lambda [{{({{v}_{0}}+T)}^{\beta }}-v_{0}^{\beta }]}}\,\!</math><br />
<br />
where <math>v_{0}\,\!</math> is the virtual age corresponding to time <math>t_{0}\,\!</math>. The expected value and the variance of <math>v_{0}\,\!</math> are obtained from Monte Carlo simulation. The variance of the conditional reliability <math>R(T|t_{0})\,\!</math> is: <br />
<br />
::<math>\begin{align}<br />
& Var(R)= {{\left( \frac{\partial R}{\partial \beta } \right)}^{2}}Var(\hat{\beta })+{{\left( \frac{\partial R}{\partial \lambda } \right)}^{2}}Var(\hat{\lambda }) \\ <br />
& +2\frac{\partial R}{\partial \beta }\frac{\partial R}{\partial \lambda }Cov(\hat{\beta },\hat{\lambda })+{{\left( \frac{\partial R}{\partial {{v}_{0}}} \right)}^{2}}Var({{{\hat{v}}}_{0}}) <br />
\end{align}\,\!</math><br />
<br />
Because of limited simulation accuracy and convergence problems in the calculation of <math>Var(\hat{\beta }),Var(\hat{\lambda })\,\!</math> and <math>Cov(\hat{\beta },\hat{\lambda }),\,\!</math> <math>Var(R)\,\!</math> can be negative at some time points. When this happens, the bounds are not provided. <br />
<br />
The bounds are based on: <br />
<br />
::<math>\text{logit}(\hat{R}(T))\tilde{\ }N(0,1)\,\!</math><br />
<br />
::<math>\text{logit}(\hat{R}(T))=\ln \left\{ \frac{\hat{R}(T)}{1-\hat{R}(T)} \right\}\,\!</math><br />
<br />
The confidence bounds on reliability are given by: <br />
<br />
::<math>R=\frac{{\hat{R}}}{\hat{R}+(1-\hat{R}){{e}^{\pm {{z}_{a}}\sqrt{Var(R)}/[\hat{R}(1-\hat{R})]}}}\,\!</math><br />
<br />
These bounds are compared with the bounds obtained from: <br />
<br />
::<math>R=\hat{R}{{e}^{\pm {{z}_{a}}\sqrt{Var(R)}/\hat{R}}}\,\!</math><br />
<br />
The smaller of the two upper bounds will be the final upper bound and the larger of the two lower bounds will be the final lower bound.<br />
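The conditional reliability and the bound-combination rule described above can be sketched as follows (illustrative only; the function names are hypothetical, and the logit bound is written with the critical value <math>z\,\!</math> included):

```python
import math

def conditional_reliability(lam, beta, T, v0):
    """R(T|t0) = exp(-lam*((v0 + T)**beta - v0**beta)); v0 is the virtual
    age corresponding to the mission start time t0."""
    return math.exp(-lam * ((v0 + T) ** beta - v0 ** beta))

def reliability_bounds(r_hat, var_r, z):
    """Combine the logit-based and proportional bounds: report the smaller
    of the two upper bounds and the larger of the two lower bounds."""
    s = math.sqrt(var_r)
    w = math.exp(z * s / (r_hat * (1.0 - r_hat)))
    logit_u = r_hat / (r_hat + (1.0 - r_hat) / w)
    logit_l = r_hat / (r_hat + (1.0 - r_hat) * w)
    prop_u = r_hat * math.exp(z * s / r_hat)
    prop_l = r_hat * math.exp(-z * s / r_hat)
    return max(logit_l, prop_l), min(logit_u, prop_u)
```

The logit form keeps the bounds inside the interval (0, 1), which is why the combination of the two approximations behaves well near <math>R = 1\,\!</math>.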
<br />
==Example: Air Condition Unit== <br />
<br />
{{:Example:_Parametric_RDA_-_Air_Condition_Unit}}</div>Miklos Szidarovszky