Summary Software reliability modeling has, surprisingly to many, been around since the early 1970s with the pioneering works of Jelinski and Moranda, Shooman, and Coutinho. The theory behind software reliability is presented, and some of the major models that have appeared in the literature from both historical and applications perspectives are described. Emerging techniques for software reliability research field are also included. The following four key components in software reliability theory and modeling: historical background, theory, modeling, and emerging techniques are addressed.

These items are discussed in a general way, rather than attempting to discuss a long list of details. Software reliability modeling has, surprisingly to many, been around since the early 1970s with the pioneering works of Jelinski and Moranda (1972), Shooman (1972, 1973, 1976, 1977), and Coutinho (1973). We present the theory behind software reliability, and describe some of the major models that have appeared in the literature from both historical and applications perspectives. Emerging techniques for software reliability research field are also included.

We address the following four key components in software reliability theory and modeling: historical background, theory, modeling, and emerging techniques. These topics are introduced in brief below 1. Historical Background 1. 1. Basic Definitions Software reliability is centered on a very important software attribute: reliability. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment (ANSI, 1991).

We notice the three major ingredients in the definition of software reliability: failure, time, and operational environment. We now define these terms and other related software reliability terminology. 1. 1. 1. Failures A failure occurs when the user perceives that a software program ceases to deliver the expected service. The user may choose to identify several severity levels of failures, such as catastrophic, major, and minor, depending on their impacts to the system service and the consequences that the loss of a particular service can cause, such as dollar cost, human life, and property damage.

The definitions of these severity levels vary from system to system. 1. 1. 2. Faults A fault is uncovered when either a failure of the program occurs, or an internal error (e. g. , an incorrect state) is detected within the program. The cause of the failure or the internal error is said to be a fault. It is also referred as a “bug. ” In most cases the fault can be identified and removed; in other cases it remains a hypothesis that cannot be adequately verified (e. g. , timing faults in distributed systems).

In summary, a software failure is an incorrect result with to the specification or unexpected software behavior perceived by the user at the boundary of the software system, while a software fault is the identified or hypothesized cause of the software failure. 1. 1. 3. Defects When the distinction between fault and failure is not critical, “defect” can be used as a generic term to refer to either a fault (cause) or a failure (effect). Chillarege and co-workers (1992) provided a complete and practical classification of software defects from various perspectives. 1. 1. 4. Errors

The term “error” has two different meanings: |1. |A discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct| | |value or condition. Errors occur when some part of the computer software produces an undesired state. Examples include | | |exceptional conditions raised by the activation of existing software faults, and incorrect computer status due to an | | |unexpected external interference. This term is especially useful in fault-tolerant computing to describe an intermediate | | |stage in between faults and failures. |2. |A human action that results in software containing a fault. Examples include omission or misinterpretation of user | | |requirements in a software specification, and incorrect translation or omission of a requirement in the design | | |specification. However, this is not a preferred usage, and the term “mistake” is used instead to avoid the confusion. | 1. 1. 5. Time Reliability quantities are defined with respect to time, although it is possible to define them with respect to other bases such as program runs of number of transactions.

We discuss the notation of time in the next session. 1. 1. 6. Failure Functions When a time basis is determined, failures can be expressed in several ways: the cumulative failure function, the failure intensity function, the failure rate function, and the mean-time-to-failure function. The cumulative failure function (also called the mean-value function) denotes the expected cumulative failures associated with each point of time. The failure intensity function represents the rate of change of the cumulative failure function.

The failure rate function (or called the rate of occurrence of failures) is defined as the probability that a failure per unit time occurs in the interval [t , t + Dt], given that a failure has not occurred before t. The mean time to failure (MTTF) function represents the expected time that the next failure will be observed. (MTTF is also known as MTBF, mean time between failures. ) 1. 1. 7. Mean Time to Repair and Availability Another quantity related to time is mean time to repair (MTTR), which represents the expected time until a system will be repaired after a failure is observed.

When the MTTF and MTTR for a system are measured, its availability can be obtained. Availability is the probability that a system is available when needed. Typically, it is measured by |[pic] | 1. 1. 8. Operational Profile The operational profile of a system is defined as the set of operations that the software can execute along with the probability with which they will occur.

An operation is a group of runs that typically involve similar processing. Software Reliability Engineering in this encyclopedia provides a detailed description on the structure, development, illustration, and project application of the operational profile. In general, the number of possible software operations is quite large. When it is not practical to determine all the operations and their probabilities in complete detail, operations based on grouping or partitioning of input states (or system states) into domains are determined. 1. 2.

The Advantages of Using Execution Time Three kinds of time are relevant to software reliability: execution time, calendar time, and clock time. The execution time for a software system is the CPU time that is actually spent by the computer in executing the software, the calendar time is the time people normally experience in terms of years, months, weeks, days, etc. ; and the clock time is the elapsed time from start to end of computer execution in running the software. In measuring clock time, the periods during which the computer is shut down are not counted.

If computer utilization, the fraction of time the processor is executing a program, is constant, clock time will be proportional to execution time. In 1972 Shooman published a study where the behavior of a key parameter in his model was related to how the project personnel profile varied over time. Several different mathematical forms were proposed for the project personnel profile, and the choice of the best one depended on the particular project. Similarly, Schneidewind (1972) approached software reliability modeling from an empirical viewpoint.

He recommended the investigation of different reliability functions and selection of the one that best fit the particular project in question. He found that the best distribution varied from project to project. Up to 1975, the time domain used in software reliability modeling was exclusively calendar time. The lack of modeling universality resulting from using calendar time caused Musa (1975) to question the suitability of this time domain. He postulated that execution time was the best practical measure for characterizing failure-inducing stress being placed on a software program.

Calendar time, used by Shooman and Schneidewind, did not account for varying usage of a program in a straightforward way. It turned out that the removal of this confounding factor greatly simplified modeling and yielded better model prediction results. There is substantial evidence showing the superiority of execution time over calendar time for software reliability growth models (Trachtenberg, 1985; Musa and Okumoto, 1984a; Hecht, 1981). Nevertheless, many published models continue to use calendar time or do not explicitly specify the type of time being used.

Some models, originally developed as calendar-time models or without regard to the type of time, are now being interpreted as execution-time models. If execution time is not readily available, approximations such as clock time, weighted clock time, staff working time, or units that are natural to the application, such as transactions or test cases executed, may be used. Musa et al. , 1987 developed a modeling component that converts modeling results between execution time and calendar time. Huang et al. (2001) proposed the incorporation of testing effort into the modeling process, which can be measured by the human power, the number of test cases, or the execution-time information. 1. 3. Two Data Types in Time-Domain Modeling Software reliability models generally fall into two categories depending on the domain they operate in. By far the largest and most popular category of models is based on time. Their central feature is that reliability measures, such as failure intensity, are derived as a function of time.

The second category of software reliability models provides a contrasting approach by using operational inputs as their central feature. These models measure reliability as the ratio of successful runs to total number of runs. However, this approach has some problems, including the fact that many systems have runs of widely varying lengths (so that the proportion may give inaccurate estimates) and that the resulting measures are incompatible with the time-based measures used for hardware.

The time-domain modeling approach employs either the observed number of failures discovered per time period or the observed time between failures of the software. The models therefore fall into two basic classes depending on the type of data the model uses: (1) failures per time period or (2) time between failures. These classes are, however, not mutually disjoint. There are models that can handle either data type. In fact many of the models for one data type can still be applied even if the user has data of the other type.

For instance, if the user has only time between failures data and wants to apply a particular model that uses only failures-per time period, the user needs only to set a specified time length (e. g. , a week) and then superimpose the failure history over these time intervals and observe how many fell within each interval. Conversely, if one had only failures-per-time period data, one could randomly assign the fault occurrences within each period and then observe the time-between-failures data created when the periods are linked.

Either of these procedures from “transforming” from one data type into another requires that the user test the applied model to determine the adequacy of the resulting fit. These classes can themselves be considered part of the larger “time domain” approach to software reliability modeling in contrast to the “error seeding and tagging” approach and the “data domain” approach. Space limitations do not allow us to address these other important approaches as well as time-series models in this chapter. The reader is referred to Farr (1983) nd Xie (1991), among others, where these alternative approaches are described. For the failures-per-time-period data, the unit of time could represent a day, week, etc. , over which the software is being tested or observed. For the class based upon time between failures, we have recorded either the elapsed calendar time or execution time between each software failure. Typical failures-per-time period data are shown in Table 1, and typical time-between-failures data are shown in Table 2. Table 1. Failures-per-Time-Period Data [pic] | |Time (hours) | |Failures per Time Period | |Cummulative Failures | | | |[pic] | | | | 8 | |4 | | 4 | | | |16 | |4 | | 8 | | | |24 | |3 | |11 | | | |32 | |5 | |16 | | |40 | |3 | |19 | | | |48 | |2 | |21 | | | |56 | |1 | |22 | | | |64 | |1 | |23 | | | |72 | |1 | |24 | | | |[pic] | Table 2. Time-between-Failures Data [pic] | |Failure Number | |Failure Interval (hours) | |Failure Times (hours) | | | |[pic] | | | | 1 | |0. 5 | |0. | | | | 2 | |1. 2 | |1. 7 | | | | 3 | |2. 8 | |4. 5 | | | | 4 | |2. | |7. 2 | | | | 5 | |2. 8 | |10. 0 | | | | 6 | |3. 0 | |13. | | | | 7 | |1. 8 | |14. 8 | | | | 8 | |0. 9 | |15. 7 | | | | 9 | |1. | |17. 1 | | | |10 | |3. 5 | |20. 6 | | | |11 | |3. 4 | |24. | | | |12 | |1. 2 | |25. 2 | | | |13 | |0. 9 | |26. 1 | | | |14 | |1. | |27. 8 | | | |15 | |1. 4 | |29. 2 | | | |16 | |2. 7 | |31. | | | |17 | |3. 2 | |35. 1 | | | |18 | |2. 5 | |37. 6 | | | |19 | |2. | |39. 6 | | | |20 | |4. 5 | |44. 1 | | | |21 | |3. 5 | |47. | | | |22 | |5. 2 | |52. 8 | | | |23 | |7. 2 | |60. | | | |24 | |10. 7 | |70. 7 | | | |[pic] | 1. 4. Software Reliability Modeling and Measurement Software reliability measurement includes two types of activity, reliability estimation and reliability prediction: | |Estimation. This activity determines current software reliability by applying statistical inference techniques to failure | | |data obtained during system test or during system operation.

This is a measure regarding the achieved reliability from the past| | |until the current point. Its main purpose is to assess the current reliability, and determine whether a reliability model is a | | |good fit in retrospect. | | |Prediction. This activity determines future software reliability based on available software metrics and measures. Depending| | |on the software development stage, prediction involves different techniques: | | | | | |When failure data are available (e. g. software is in system test or operation stage), the estimation techniques can be used to| | |parameterize and verify software reliability models, which can perform future reliability prediction. | | | | | | | | |When failure data are not available (e. g. , software is in the design or coding stage), the metrics obtained from the software | | |development process and the characteristics of the resulting product can be used to determine reliability of the software upon | | |testing or delivery. | | | The first definition is also referred to as “reliability prediction” and the second definition, as “early prediction. ” When there is no ambiguity in the text, only the word “prediction” will be used. Most current software reliability models fall in the estimation category to do reliability prediction. A software reliability model specifies the general form of the dependence of the failure process on the principal factors that affect it: fault introduction, fault removal, and the operational environment. Figure 1 shows the basic ideas of software reliability modeling. | | [pic] |Figure 1. Basic ideas on software reliability modeling. [Full View] | If the reliability improves over time, as faults are discovered and corrected, one would expect that the number of failures detected per unit of time would be decreasing and the time between failures would be increasing. It is this behavior that the software reliability models attempt to capture. In Figure 1, the failure rate of a software system is generally decreasing due to the discovery and removal of software faults. At any particular time (say, the point marked “present time”), it is possible to observe a history of the failure rate of the software.

Software reliability modeling forecasts the curve of the failure rate by statistical evidences. The purpose of this measure is twofold: (1) to predict the extra time needed to test the software to achieve a specified objective and (2) to predict the expected reliability of the software when the testing is finished. If an adequate fit to the past data is achieved and the testing or operation of the software doesn't radically change in the future, the fitted reliability curves can be used to make various reliability predictions. 1. 5. Relationship between Hardware and Software Reliability Hardware reliability is a generally understood and accepted concept with a long history.

Early during the much shorter history of software reliability it became apparent to researchers that a division (often perceived to be large) exists between hardware reliability and software reliability. Software reliability is similar to hardware reliability in that both are stochastic processes and can be described by probability distributions. However, software reliability is different from hardware reliability in the sense that software does not wear out, burn out, or deteriorate; i. e. , its reliability does not decrease with time. Moreover, software generally enjoys reliability growth during testing and operation since software faults can be detected and removed when software failures occur.

On the other hand, software may experience reliability decrease because of abrupt changes of its operational usage or incorrect modifications to the software. Software is also continuously modified throughout its life cycle. The malleability of software makes it inevitable for us to consider variable failure rates. At first these differences raised the question of whether reliability theory can be applied to software at all. It was discovered that the distinction between hardware and software is somewhat artificial. Both may be defined in the same way, so that hardware and software component reliabilities can be combined to get system reliability.

Traditionally, hardware reliability focused on physical phenomena because failures resulting from these factors are much more likely to occur than design-related failures. It was possible to keep hardware design failures low because hardware was generally less complex logically than software. Besides, hardware design failures had to be kept low because of the large expense involved in retrofitting of manufactured items in the field. However, when hardware tests show that reliability is not within specified design limits because of problems or faults in the original design, a sequence of engineering changes may be necessary to improve reliability.

Thus hardware reliability can and has been modeled like software reliability when the failures are the result of design faults, such as the highly visible Pentium floating-point division fault that resulted in massive callbacks in 1994 (Wolfe, 1994). Perhaps the first hardware reliability model that can also be used as a model of reliability for software was developed in 1956 by Northrop Aircraft (Weiss, 1956). This model considers complex systems where engineering changes are made to improve system reliability. It was used to determine the level of system reliability, how rapidly reliability was improving, and the expected reliability at the end of the projected development program.

Two other early hardware reliability models along similar lines consider the problem of estimating reliability of a system undergoing development testing and changes to correct design deficiencies (Corcoran, et al. , 1964; Barlow and Scheuer, 1966). It is now also generally accepted that the software failure process is random. This randomness is introduced in many ways. The location of design faults within the software is random because the overall system design is extremely complex. The programmers who introduce the design faults are human, and human failure behavior is so complex that it can best be modeled using a random process. Also, the occurrence of failures is dependent on the operational profile, which is defined by input states. It is usually not nown which input state will occur next, and sometimes an input state may occur unexpectedly. These events make it impossible to predict where a fault is located or when it will be evoked to cause a failure in a large software system. In an attempt to unify hardware and software for an overall system reliability, software reliability theory has generally been developed in a way that is compatible with hardware reliability theory, so that system reliability figures may be computed using standard hardware combinatorial techniques (Shooman, 1990; Lloyd and Lipow, 1984). However, unlike hardware faults that are mostly physical faults, software faults are design faults that are harder to visualize, classify, detect, and correct.

As a result, software reliability is a much more difficult measure to obtain and analyze than hardware reliability. Usually hardware reliability theory relies on the analysis of stationary processes, because only physical faults are considered. However, with the increase of systems complexity and the introduction of design faults in software, reliability theory based on stationary process becomes unsuitable to address nonstationary phenomena such as reliability growth or reliability decrease experienced in software. This makes software reliability a challenging problem that requires an employment of several methods to attack. We now describe the theory behind software reliability. 2. Theory 2. 1. Reliability Theory

Since the development of software reliability models was based on concepts adapted from hardware reliability theory, we need to define some reliability functions and concepts that show the relationships among these different functions. Because the failure time of a software system is a random variable, which we'll denote as T, this variable has an associated probability density function, fT(t), and cumulative distribution function, FT(t), where |[pic] | |(1) | The reliability function, RT(t), which is the probability the software has not failed by time t, is then calculated as [pic] | |(2) | A third function that is important in reliability modeling is the hazard rate function, ZT(t). It is the conditional probability that the software will fail in an interval (t , t + Dt), given that it has not failed before time t. If T is the time at which the failure occurs, then |[pic] | Dividing both sides by Dt, expressing the probability in its conditional form, and letting Dt approach zero, we have [pic] | |(3) | |[pic] | |(4) | From Equation (3) we see that the hazard rate function is simply the conditional probability density function for the failure of the system given no failures have occurred up to time t. From Equation (4) and the fact that RT(t) = 1 – FT(t), we have the following: |[pic] | or [pic] | or |[pic] | thus |[pic] | |(5) | The reliability function, in turn, can be related to the mean time to failure, (MTTF), of the software; |[pic] | |(6) | All of these relationships hold for the corresponding conditional functions as well.

One simply replaces the hazard, reliability, cumulative distribution, or probability density function for a “single” fault, T, by the associated conditional functions. For example, suppose the system has failed at time ti, then the conditional hazard rate function is denoted as ZT(t|ti), where t ? ti and |[pic] | The last important functions that we will consider are the failure intensity function and the mean value function for the cumulative number of failures. We'll denote the failure intensity function as l(t). This is the instantaneous rate of change of the expected number of failures with respect to time.

Suppose we let M(t) be the random process denoting the cumulative number of failures by time t and we denote m(t) as its mean value function: |[pic] | |(7) | The failure intensity function is then obtained from m(t) as its derivative: |[pic] | |(8) | In order to have reliability growth, we should have (d l(t)/dt) ; 0 for all t ? t0 for some t0. The failure rate function may also exhibit a zigzag type behavior but it must still be decreasing to achieve reliability growth. 2. 2. Random Point Processes

The reliability of software is influenced or determined mainly by three factors: fault introduction, fault removal, and the operational profile. Fault introduction depends primarily on the characteristics of the developed code (code written or modified for the program) and the development process. The code characteristic with the greatest effect is size. Development process characteristics include the software engineering technologies and tools employed and the average level of experience of programmers. Note that code is developed when adding features or removing faults. Fault removal is affected by time, the operational profile, and the quality of the repair activity.

Since most of these factors are probabilistic in nature and operate over time, the behavior of software failures with execution time is usually modeled using some kind of random point process. The upper portion of Figure 2 shows an example of such a process where each failure epoch is represented by an X. A counting process giving the number of failures experienced by execution time t is associated with every point process. Figure 2 also shows a typical counting process, denoted by M(t), having a unit jump at each failure epoch. If we let M(t) be the random number of failures that are experienced by time t with mean value function m(t), i. e. m(t) We remark that the failure counting process being considered here is strictly nondecreasing. Furthermore, if limt ; ? m(t) ; ? i. e. , is finite, we have a finite failure model category; otherwise we have a model of the infinite failures category. We will now relate some important properties of the Poisson and binomial processes. These processes play a key role in the unification and classification of many published models, and there are some central theoretical results that come into consideration. We will make use of these properties for specific classes of models in subsequent sections. | | |[pic] |Figure 2.

A random point process and its associated counting process. [Full View] | First for the Poisson type models, we consider that we have a Poisson process over time. By this we mean that if let t0 = 0 , t1 , ? , ti – 1 , ti , ? , tn = t be a partition of our time interval 0 to t and let m(t) be as defined as above, then we have a Poisson process if each [pic]the number of failures detected in the ith interval, ti – 1 to ti), are independent Poisson random variables with means, [pic]. We make the following assumptions: |1. |The probability of failure (failure intensity) depends on time t and the number of past failures M(t). |2. |Failures do not occur simultaneously. | |3. |Failures do not occur at preassigned times. | |4. |There is no finite interval in [t , ? ] where failures occur with certainty. | |5. |There is no failure in the beginning of the process. | Then for each of the random variables t ? i values , i = 1 , ? , n, the probability mass function is: |[pic] | [Note: If m(t) is a linear function of time, i. e. , m(t) = at for some constant, a ; 0, we say that the Poisson process, M(t), is a homogeneous Poisson process (HPP).

If, however, it is nonlinear, we refer to the process as being a nonhomogeneous Poisson process (NHPP). ] If we have a Poisson process model, we can show a relationship between the failure intensity function and the reliability function (hence the hazard rate and the probability density function using the relationships established in the previous section). Suppose that we denote R( t + Dt|t) as the conditional reliability function that the software will still operate after t + D given that it has not failed after time t. Then |[pic] |

The relationship between the failure intensity function and the hazard rate for a Poisson process can also be derived. It can be shown that |[pic] | |(9) | where ti – 1 is the time of the (i – 1)st failure and Dt is any point such that ti – 1ti + Dt ; ti. This shows that the conditional hazard rate and the failure intensity function are the same if the failure intensity function is evaluated at the current time ti – 1 + Dt. Another relationship that one can establish for the Poisson type of models is that |[pic] | |(10) | here a is some constant and Fa(t) is the cumulative distribution function of the time to failure of an individual failure a. From this, if we consider also distributions that belong to the finite failure category, i. e. , lim t ; ? m(t) ; ? we have that lim t ; ? m(t) = a since lim t ; ? Fa(t) = 1. Thus a represents the eventual number of failures detected in the system if it could have been observed over an infinite amount of time. Using Equation (10) and the relationship between the mean value function and the failure intensity function, we have also for the Poisson type of model |[pic] | |(11) | here fa(t) is the probability density function of the time to failure of the individual failure a. For the binomial type of model, we have the following assumptions: |1. |There is a fixed number of faults (N) in the software at the beginning of the time in which the software is observed. | |2. |When a fault is detected, it is removed immediately. | |3. |If Ta is the random variable denoting the time to failure of fault a, then the Ta values, a = 1 , ? , n, are | | |independently and identically distributed random variables as Fa(t) for all remaining faults. | The cumulative distribution function, Fa(t), density function, fa(t), and hazard rate function, Za(t), are the same for all faults for this class.

Moreover, we notice for this class that a failure and a fault are synonymous and no new faults are introduced into the software in the fault detection/correction process. We can show for this class, the failure intensity function is obtained from the probability density function for a single fault as |[pic] | |(12) | The mean value function is, in turn, related to the cumulative distribution function, Fa(t), as |[pic] | |(13) | Notice the similarity between equations (12) and (13) and (10) and (11) for the Poisson type. For the binomial we have a fixed number of faults one-to-one correspondence to failures) at start, N, while for the Poisson type, a, is the eventual number of failures that could be discovered over an infinite amount of time. Kremer (1983) unified many software reliability models using a nonhomogeneous birth–death Markov process. He showed that many of the Poisson-type and binomial-type models discussed extensively in the literature are special cases of a Markov birth process with specific forms for l(t) and m(t). It is well worth discussing the preceding conditions for a Markov process here because this will help determine the plausibility of the entire modeling approach. Some of the affects of altering the first condition were already investigated when the origin of the Poisson-type models and binomial-type models was clarified.

If this condition is relaxed so that the failure intensity may depend on t, M(t), and the occurrence times for all failures before t, a self-exciting random point process is obtained. In this type of process, since M(t) is a random variable, the failure intensity itself is a random process. Other types of point process can be obtained if an “outside” process affects the failure intensity. These processes are beyond the scope of this discussion. A general self-exciting point process can be conceived of as a modified nonhomogeneous Poisson process in which the future evolution of failures not only is a function of time but also can be influenced by all past occurrences of failures. Of course, when the evolution depends only on time and the total number of past failures, a Markov birth process results.

Note that the assumptions made for a Poisson process in modeling software failures are generally well accepted by researchers in the field, but they can also be easily relaxed, especially for a nonhomogeneous Poisson process. For example, the condition requiring that the process start out with no failures can be easily changed to one that assumes that the process starts out with a known or random number of failures simply by treating this as a separate term in the model formulations. As another example, the condition that no failures occur simultaneously can be relaxed, leading to what is called a compound Poisson process (Sahinoglu, 1992). Many other powerful and seful generalizations are possible and can be found in Snyder (1975). Reliability estimates for nonhomogeneous Poisson process models come directly from Equation (3) and by noting that the event M(T) ; i is equivalent to the event Tint, where T is a random variable for the ith failure time. In addition, unknown model parameters are usually determined using the maximum likelihood principle, least squares, or Bayesian techniques. Again, specific derivations are omitted; however, Table 3 provides a summary of some important relationships for these models (Musa, et al. , 1987). These can be used for a particular model, that is, for a particular mean value function. Table 3.

Some Derived Relationships for a General Poisson Process |[pic] | |Quality | |Formulaa | | | |[pic] | | | |Failures experienced | |[pic] | | | | | |Expected Value = m(t) | | | | | |Variance = m(t) | | | |Failure time | |[pic] | | | |Reliability | |[pic] | | | |Conditional failure time |[pic] | | | |Failure intensity | |[pic] | | | |Unconditional failure time | |[pic] | | | |Maximum-likelihood equations | |[pic] | | | | | |[pic] | | | |[pic] | a Where [pic] 2. 3. Exponential Order Statistics

The exponential order statistics approach to modeling software reliability, studied by Downs (1985; 1986), Miller (1986), and Ross (1985a,1985b) is essentially equivalent to the preceding approach except that it provides a more intuitive feeling for the actual failure process. Figure 3 shows the modeling environment. A piece of software initially containing an unknown number of faults is subjected to testing where input states A, B, and C, are chosen at random according to some operational profile. Most input states result in successful execution (correct program results). Some input states exercise a collection of instructions containing a fault and cause a failure to occur. Still others exercise a collection of instructions containing a fault but do not cause a failure to occur because the data state or specific conditions are not right.

For example, input state C in Figure 3 does not encounter a fault and causes no failure, whereas input states A and B both encounter fault a with only input state A causing a failure. Therefore, the only input states of consequence as far as fault a causing a failure are the ones like input state A in the example. The collection of these input states is called fault a's fail set. | | |[pic] |Figure 3. Software reliability modeling environment. [Full View] | Two factors determine the failure-causing potential of a fault. They are the size or number of input states in the fault's fail set and the frequency with which these states are chosen for execution.

Clearly, if the operational profile were to change, so also would the importance or contribution of the input states in the fail set to the failure-causing potential of the fault. Let the failure intensity for fault a denoted by la; then the failure intensity for the program depends on whether faults are being removed. Assume for the moment that they are not being removed. This is typical of a program in production use. The program failure intensity l is determined by summing the contributions of the individual faults: |[pic] | |(14) | Equation (14) implicitly assumes that the failures from different faults are independent. This assumption may raise several issues.

The first issue that may come up is the matter of input state dependencies; for example, a specific input state may always be executed after a given input state. This can be incorporated into the model, but the increase in accuracy, if any, is probably outweighed by the added complexity. The second issue is that a fault may prevent access to code containing one or more faults. This issue is less common during system testing or operational use than it is during unit testing and is considered to be a secondary effect and is therefore ignored. Finally, another issue concerns multiple failures resulting from an execution of a given input state. This is usually considered to be secondary r not worth the effort to explicitly model. In most cases, failures are independent because they are the result of two processes: the introduction of faults and the activation of faults through the selection of input states. Both of these processes are random, and hence the chance that one failure would affect another is small. The independence conclusion is supported by a study of correlations on failure data from 15 projects (Musa, 1979) that found no significant correlation. Now suppose that faults are being repaired. The program is exercised for a period of time t, and each time a failure occurs, the software is debugged and the fault responsible for the failure is removed.

Let Ia(t) be a function that takes on a value of 1 if fault a has not caused a failure by t and 0 otherwise. As before, the program failure intensity at time t, denoted by L(t) since this time it is a random variable, is determined by summing the contributions of the individual faults still remaining in the program. Thus |[pic] | |(15) | Note that In contributes to the sum only if fault a has not been removed yet. Implicit assumptions being made here are that failures from different faults are independent and that a fault causing an observed failure is immediately resolved. The latter assumption, however, need not be the case.

Subsequent occurrences of the same failure can be ignored in the analysis if necessary. An assumption concerning the failure process itself must be made. The usual assumption is that faults cause failures to occur in accordance with independent homogeneous Poisson processes with unknown failure intensities. A natural result of this is that failure times for a fault are exponentially distributed. Miller (1986) defines an exponential order statistic model based on a failure counting process and an associated failure occurrence-time process, which is characterized by the parameter set of failure intensities set of la , 1 ? a ? w0, with the only restrictions that la ; 0 and [pic].

Miller shows many possibilities exist for the form of la, values which can be treated in several ways. One such way is deterministically; that is, following a known pattern. Two examples include constant failure intensities (la = l0 for all a) and geometric failure intensities (la = a ba , 0 ; b ; 1 for all a). The failure intensities can also be treated as a finite collection of independent and identically distributed random variables drawn from some distribution; for example, the gamma distribution. Finally, the failure intensities can be treated as a realization of a random point process such as the nonhomogeneous Poisson process.

It is also mathematically possible to permit an infinite value for w0 and treat finite w0 as a special subcase. Doing this, it is possible to simultaneously deal with infinite failures and finite failures models and to unify a great many models. Miller (1986) gives a complete discussion on the types of models resulting from these and other patterns of failure intensities with some rather significant results. One result is the inability to distinguish between the different types of models for the la, values based on a single observation of the failure process. Another closely linked idea is that the mean value function is the primary characteristic of a model and the particular type of model is a secondary characteristic.

For example, a model based on deterministic In values and a model based on In values drawn from a probability distribution are close to each other if their mean value functions are close. An example of an application of this idea is in Musa and Okumoto (1984b) where the Littlewood and Verrall (1973) model was analyzed using a nonhomogeneous Poisson process with appropriate mean value function. Given the preceding assumption about the failure process, the probability of fault a not causing a failure by time t is e –l at. Therefore, the expected failure intensity for the program, l(t), is given by |[pic] | |(16) | This equation is sufficient to completely describe the overall failure process for the program.

The integral of this equation gives the mean value function m(t) or the expected number of failures at time t: |[pic] | |(17) | Equations (16) and (17) show how l(t) and m(t), respectively, depend on the failure intensity of the individual faults. Now we determine what Equation (17) reveals about the kinds of mean value functions likely to be good candidates for models. Let [pic]represent the average per-fault failure intensity of the inherent faults: |[pic] |

Also, let s2 be a measure of the variation in the inherent per-fault failure intensities: |[pic] | Then, expanding Equation (17) in a Taylor series about l keeping t fixed yields |[pic] | |(18) | where higher order terms are not explicitly shown. The significant point about Equation (18) is the first term, which represents an exponential mean value function characteristic of many popular and useful models (Musa, 1975; Goel and Okumoto, 1979).

The conclusion to be drawn is that the exponential mean value function is a first-order approximation of all possible mean value functions based on a finite number of initial faults. This explains, in part, why the exponential nonhomogeneous Poisson process model enjoys such success in many applications (Zinnel, 1990; Ehrlich et al. , 1991; Iannino and Musa, 1991). As can be seen, the exponential nonhomogeneous Poisson process model is exact if all per-fault failure intensities are the same. Note that this does not imply a uniform operational profile, as is sometimes claimed in the literature. It does imply that the joint effect of the operational profile and of the fail set size, the fault exposure, is the same for all faults.

Thus a fault with a large fail set size and small individual operation usage probabilities may be the same, in terms of failure intensity, as a fault with small fail set size and large individual operation usage probabilities. Relaxing the homogeneous Poisson process assumption underlying each fault's failure process can further strengthen the case for the exponential mean value function. It turns out that any per-fault failure process whose first time to failure is exponentially distributed will lead to the same conclusions, provided subsequent occurrences of the same failure are ignored. This is an important observation especially when dealing with failures that tend to cluster as described in Ehrlich and co-workers (1991).

In summary, the nonhomogeneous Poisson process is emerging as the most practical and useful choice for modeling software reliability. This is based on many empirical results. The process is fully specified by its mean value function, and an emerging choice here, both in theory and in practice, is the exponential mean value function. This function was shown to play a central role in modeling and to be quite robust from departures in its assumptions. 3. Modelling 3. 1. Model Classification A model classification scheme proposed in Musa and Okumoto (1983) allows relationships to be established for models within the same classification groups and shows where model development has occurred.

It classified models in terms of five different attributes: | |Time domain—calendar versus execution time. | | |Category—the total number of failures that can be experienced in infinite time. This is either finite or infinite, representing| | |two subgroups. | | |Type—The distribution of the number of the failures experienced by time t. Two important types that we will consider are the | | |Poisson and binomial. | | |Class—(finite failure category only) functional form of the failure intensity expressed in terms of time. | | |Family—(infinite failure category only) Functional form of the failure intensity function expressed in terms of the expected | | |number of failures experienced. |

In the “Random Point Processes” section we described the Markov processes that are characterized by the distribution of the number of failures over time, and the two most important distributions, the Poisson and binomial. Models based on the binomial distribution are finite failure models, that is, they postulate that a finite number of failures will be experienced in infinite time. Models based on the Poisson distribution can be either finite failure or infinite failure models, depending on how they are specified. Table 4 shows how some of the published models are classified using this approach. The Bayesian model of Littlewood and Verrall (1973) and the geometric deeutrophication model of Moranda (1975, 1979) are among the few published models that are not Markovian. Table 4. Markov Software Reliability Models [pic] | |Poisson Type | |Binomial Type | |Other Types | | | |[pic] | | | |Crow (1974) | |Jelinski and Moranda (1972) | |Shooman and Trivedi (1975) | | |Musa (1975) | |Shooman (1972) | |Kim and co-workers (1982) | | | |Moranda (1975, 1979) | |Schick and Wolverton (1973) | |Kremer (1983) | | | |Schneidewind (1975) | |Wagoner (1973) | |Laprie (1984) | | | |Goel and Okumoto (1979) | |Goel (1988a) | |S