Statistics in Quality
Once facts or data have been classified and summarized, they must be interpreted, presented, or communicated in an efficient manner to drive data-based decisions. Statistical problem-solving methods are used to determine if processes are on target, if the total variability is small compared to specifications, and if the process is stable over time. Businesses can no longer afford to make decisions based on averages alone; process variations and their sources must be identified and eliminated.
Descriptive statistics are used to describe or summarize a specific collection of data (typically samples of data). Descriptive statistics encompass both numerical and graphical techniques, and are used to determine the:
 Central tendency of the data.
 Spread or dispersion of the data.
 Symmetry and skewness of the data.
Inferential statistics is the method of collecting samples of data and making inferences about population parameters from the sample data. Before reviewing basic statistics, the different types of data must be identified. The type of data that has been collected as process inputs (x’s) and/or outputs (y’s) will determine the type of statistics or analysis that can be performed.
Types of Data
Data is objective information that everyone can agree on. Measurability is important in collecting data. The three types of data are attribute data, variables data, and locational data. Of these three, attribute and variables data are more widely used.
Attribute data is discrete. This means that the data values can only be integers, for example, 3, 48, 1029. Counted data or attribute data are answers to questions like “how many”, “how often”, or “what kind.” Examples include:
 How many of the final products are defective?
 How many people are absent each day?
 How many days did it rain last month?
 What kind of performance was achieved?
Variables data is continuous. This means that the data values can be any real number, for example, 1.037, 4.69, 84.35. Measured data (variables data) are answers to questions like “how long,” “what volume,” “how much time” and “how far.” This data is generally measured with some instrument or device. Examples include:
 How long is each item?
 How long did it take to complete the task?
 What is the weight of the product?
Measured data is regarded as being better than counted data. It is more precise and contains more information. For example, one would certainly know much more about the climate of an area if they knew how much it rained each day rather than how many days it rained. However, collecting measured data is often difficult or expensive, in which case counted data must be used. In some situations, data will only occur as counted data. For example, a food producer may measure the performance of microwave popcorn by counting the number of unpopped kernels of corn in each bag tested. For information which can be obtained as either attribute or variables data, it is generally preferable to collect variables data.
A third type of data, which fits neither attribute data nor variables data, is known as locational data. It simply answers the question “where.” Charts that utilize locational data are often called “measles charts” or concentration charts. Examples are a drawing showing locations of paint blemishes on an automobile or a map of Pune with sales and distribution offices indicated.
Another way to classify data is as discrete or continuous.
Continuous data:
 Has no boundaries between adjoining values.
 Includes most non-counting intervals and ratios (e.g., time).
Discrete data:
 Has clear boundaries.
 Includes nominals, counts, and rank orders (e.g., Monday vs. Friday, an electrical circuit with or without a short).
Conversion of Attributes Data to Variables Measures
Some data may only have discrete values, such as this part is good or bad, or I like or dislike the quality of this product. Since variables data provides more information than does attribute data, for a given sample size, it is desirable to use variables data whenever possible. When collecting data, there are opportunities for some types of data to be either attributes or variables. Instead of a good or bad part, the data can be stated as to how far out of tolerance or within tolerance it is. The like or dislike of product quality can be converted to a scale of how much do I like or dislike it.
Referring back to the table above, two of the data examples could easily be presented as variables data: 10 scratches could be reported as a total scratch length of 8.37 inches, and 25 paint runs as 3.2 sq. in. surface area of paint runs. The cost of collecting variables versus attributes data should also be considered when choosing the method. Typically, the measuring instruments are more costly for performing variables measurements, and the cost to organize, analyze and store variables data is higher as well. A go/no-go ring gage can be used to quickly check outside diameter threads. Determining the actual pitch diameter is a slower and more costly process. Variables data requires storing of individual values and computations for the mean, standard deviation, and other estimates of the population. Attributes data requires only counts of each category and hence requires very little data storage space. For manual data collection, the required skill level of the technician is higher for variables data than for attribute data. Likewise, the cost of automated equipment for variables data is higher than for attributes data. The ultimate purpose for the data collection and the type of data are the most significant factors in the decision to collect attribute or variables data.
The table details the four measurement scales in increasing order of statistical desirability.
Many of the interval measures may be useful for ratio data as well.
Examples of continuous data, discrete data, and measurement scales:
 Continuous data: A wagon weighs 478.61 kg
 Discrete data: Of a lot, 400 pieces failed
 Ordinal scale: Defects are categorized as critical, major A, major B, and minor
 Nominal scale: A printout of all shipping codes for last week’s orders
 Ratio scale: The individual weights of a sample of widgets
 Interval scale: The temperatures of steel rods (°F) after one hour of cooling
Ensuring Data Accuracy and Integrity
Bad data is not only costly to capture, but corrupts the decision-making process. Some considerations include:
 Avoid emotional bias relative to targets or tolerances when counting, measuring, or recording digital or analog displays.
 Avoid unnecessary rounding. Rounding often reduces measurement sensitivity. Averages should be calculated to at least one more decimal position than individual readings.
 If data occurs in time sequence, record the order of its capture.
 If an item characteristic changes over time, record the measurement or classification as soon as possible after its manufacture, as well as after a stabilization period.
 To apply statistics which assume a normal population, determine whether the expected dispersion of data can be represented by at least 8 to 10 resolution increments. If not, the default statistic may be the count of observations which do or do not meet specification criteria.
 Screen or filter data to detect and remove data entry errors such as digital transposition and magnitude shifts due to a misplaced decimal point.
 Avoid removal by hunch. Use objective statistical tests to identify outliers.
 Each important classification identification should be recorded along with the data. This information can include: time, machine, auditor, operator, gage, lab, material, target, process change and conditions, etc.
It is important to select a sampling plan appropriate for the purpose of the use of the data. There are no standards as to which plan is to be used for data collection and analysis, therefore the analyst makes a decision based upon experience and the specific needs. There are many other sampling techniques that have been developed for specific needs.
Population vs. Sample
A population is every possible observation or census, but it is very rare to capture the entire population in data collection. Instead, samples, or subsets of populations as illustrated in the following figure, are captured. A statistic, by definition, is a number that describes a sample characteristic. Information from samples can be used to “infer” or approximate a population characteristic called a parameter.

Random Sampling
Sampling is often undertaken because of time and economic advantages. The use of a sampling plan requires randomness in sample selection. Obviously, true random sampling requires giving every part an equal chance of being selected for the sample. The sample must be representative of the lot and not just the product that is easy to obtain. Thus, the selection of samples requires some up front thought and planning. Often, emphasis is placed on the mechanics of sampling plan usage and not on sample identification and selection. Sampling without randomness ruins the effectiveness of any plan. The product to be sampled may take many forms: in a layer, on a conveyor, in sequential order, etc. The sampling sequence must be based on an independent random plan. The sample is determined by selecting an appropriate number from a hat or random number table.
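As a minimal sketch of giving every part an equal chance of selection, the random number table can be replaced by a software random number generator. The lot of serial numbers, the sample size, and the function name `draw_random_sample` are hypothetical and chosen only for illustration.

```python
import random

def draw_random_sample(lot, sample_size, seed=None):
    """Return sample_size distinct items from lot, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(lot, sample_size)

# Hypothetical lot: 500 parts identified by serial numbers 1..500
lot = list(range(1, 501))
sample = draw_random_sample(lot, sample_size=20, seed=42)
print(sorted(sample))
```

The draw is made without replacement, so no part can appear in the sample twice.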

Sequential Sampling
Sequential sampling plans are similar to multiple sampling plans except that sequential sampling can theoretically continue indefinitely. Usually, these plans are ended after the number inspected has exceeded three times the sample size of a corresponding single sampling plan. Sequential testing is used for costly or destructive testing with sample sizes of one, and is based on a probability ratio test developed by Wald.

Stratified Sampling
One of the basic assumptions made in sampling is that the sample is randomly selected from a homogeneous lot. When sampling, the “lot” may not be homogeneous. For example, parts may have been produced on different lines, by different machines, or under different conditions. One product line may have well maintained equipment, while another product line may be older or have poorly maintained equipment. The concept behind stratified sampling is to attempt to select random samples from each group or process that is different from other similar groups or processes. The resulting mix of samples thus drawn can be biased if the proportion of the samples does not reflect the relative frequency of the groups. To the person using the sample data, the implication is that they must first be aware of the possibility of stratified groups and second, phrase the data report such that the observations are relevant only to the sample drawn and may not necessarily reflect the overall system.
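One way to keep the mix of samples in line with the relative frequency of the groups is to allocate the sample proportionally to the size of each stratum. The sketch below assumes three hypothetical production lines and a hypothetical total sample size of 50. Proportional allocation is only one possible approach.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    lot_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Rounding can make the total differ slightly from the requested size
        k = round(total_sample_size * len(items) / lot_size)
        sample[name] = rng.sample(items, k)
    return sample

# Hypothetical lot made on three lines with different output volumes
strata = {
    "line_1": [f"L1-{i}" for i in range(600)],
    "line_2": [f"L2-{i}" for i in range(300)],
    "line_3": [f"L3-{i}" for i in range(100)],
}
picked = stratified_sample(strata, total_sample_size=50, seed=1)
counts = {name: len(items) for name, items in picked.items()}
print(counts)  # {'line_1': 30, 'line_2': 15, 'line_3': 5}
```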
Data Collection Methods
Collecting information is expensive. To ensure that the collected data is relevant to the problem, some prior thought must be given to what is expected. Manual data collection requires a data form. Check sheets, tally sheets, and checklists are data collection methods which are widely used. Other data collection methods include automatic measurement and data coding.
Some data collection guidelines are:
 Formulate a clear statement of the problem
 Define precisely what is to be measured
 List all the important characteristics to be measured
 Carefully select the right measurement technique
 Construct an uncomplicated data form
 Decide who will collect the data
 Arrange for an appropriate sampling method
 Decide who will analyze and interpret the results
 Decide who will report the results
Without an operational definition, most data is meaningless. Both attribute and variable specifications must be defined. Data collection includes both manual and automatic methods. Data collected manually may be done using printed forms or by data entry at the time the measurements are taken. Manual systems are labor intensive and subject to human errors in measuring and recording the correct values. Automatic data collection includes electronic chart recorders and digital storage. The data collection frequency may be synchronous, based on a set time interval, or asynchronous, based on events. Automatic systems have higher initial costs than manual systems, and have the disadvantage of collecting both “good” and “erroneous” data. Advantages to using automatic data collection systems include high accuracy rates and the ability to operate unattended.
Automatic Measurement
Automatic sorting gages are widely used to sort parts by dimension. They are normally accurate within 0.0001″. When computers are used as part of an automated measurement process, there are several important issues. Most of these stem from the requirements of software quality engineering, but have important consequences in terms of ensuring that automated procedures get answers at least as “correct” as those that arise from manual measurements. Computer controlled measurement systems may offer distinct advantages over their human counterparts. (Examples include improved test quality, shorter inspection times, lower operating costs, automatic report generation, improved accuracy, and automatic calibration.) Automated measurement systems have the capacity and speed to be used in high volume operations. Automated systems have the disadvantages of higher initial costs, and a lack of mobility and flexibility compared to humans. Automated systems may require technical malfunction diagnostics. When used properly, they can be a powerful tool to aid in the improvement of product quality. Applications for automatic measurement and digital vision systems are quite extensive. The following incomplete list is intended to show examples:
 Error proofing a process
 Avoiding human boredom and errors
 Sorting acceptable from defective parts
 Detecting flaws, surface defects, or foreign material
 Creating CAD drawings from an object
 Building prototypes by duplicating a model
 Making dimensional measurements
 Performing high speed inspection of critical parameters
 Machining, using either laser or mechanical methods
 Marking and identifying parts
 Inspecting solder joints on circuit boards
 Verifying and inspecting packaging
 Providing optical character and bar code recognition
 Identifying missing components
 Controlling motion
 Assembling components
 Verifying color
Data Coding
The efficiency of data entry and analysis is frequently improved by data coding. Problems due to not coding include:
 Inspectors trying to squeeze too many digits into small blocks on a form
 Reduced throughput and increased errors by clerks at keyboards reading and entering large sequences of digits for a single observation
 Insensitivity of analytic results due to rounding large sequences of digits
Coding by adding or subtracting a constant or by multiplying or dividing by a factor:
Let the subscript, lowercase c, represent a coded statistic; the absence of a subscript represents raw data; uppercase C indicates a constant; and lowercase f represents a factor. Then:
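The effect of this kind of coding can be checked numerically. Subtracting a constant shifts the mean by that constant and leaves the standard deviation unchanged, while multiplying by a factor scales both. The sketch below uses hypothetical readings and hypothetical values for C and f. The helper names `decode_mean` and `decode_stdev` are illustrative only.

```python
import statistics

# Hypothetical raw readings and coding values (not taken from the text)
raw = [10.2, 10.4, 10.1, 10.5, 10.3]
C = 10.0    # constant subtracted from each reading
f = 100.0   # factor applied after subtracting the constant

coded = [(x - C) * f for x in raw]

def decode_mean(coded_mean, constant, factor):
    """Undo the coding on a mean: divide by the factor, then add the constant back."""
    return coded_mean / factor + constant

def decode_stdev(coded_stdev, factor):
    """Undo the coding on a standard deviation: only the factor matters."""
    return coded_stdev / factor

mean_from_coded = decode_mean(statistics.mean(coded), C, f)
stdev_from_coded = decode_stdev(statistics.stdev(coded), f)

print(round(mean_from_coded, 4), round(statistics.mean(raw), 4))
print(round(stdev_from_coded, 4), round(statistics.stdev(raw), 4))
```

Both printed pairs should agree, which is why statistics computed on coded data can be decoded afterwards without returning to the raw values.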
Coding by substitution:
Consider a dimensional inspection procedure in which the specification is nominal plus and minus 1.25″. The measurement resolution is 1/8 of an inch and inspectors, using a ruler, record plus and minus deviations from nominal. A typical recorded observation might be 32-3/8″ crammed in a space that was designed to accommodate three characters. The data can be coded as integers expressing the number of 1/8 inch increments deviating from nominal. The suggestion that check sheet blocks could be made larger could be countered by the objection that there would be fewer data points per page.
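A minimal sketch of this substitution, assuming deviations are recorded as fractions of an inch. The deviation values and the function names `encode` and `decode` are hypothetical.

```python
from fractions import Fraction

RESOLUTION = Fraction(1, 8)  # measurement resolution in inches

def encode(deviation_inches):
    """Code a deviation from nominal as a whole number of 1/8-inch increments."""
    increments = Fraction(deviation_inches) / RESOLUTION
    if increments.denominator != 1:
        raise ValueError("deviation is not a multiple of the resolution")
    return int(increments)

def decode(code):
    """Convert a coded integer back to a deviation in inches."""
    return code * RESOLUTION

# Hypothetical deviations from nominal, in inches
codes = [encode(d) for d in ("3/8", "-1/4", "5/4")]
print(codes)       # [3, -2, 10]
print(decode(3))   # 3/8
```

With this coding the ±1.25″ tolerance becomes the integer range -10 to +10, which fits comfortably in a three-character block.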
Coding by truncation of repetitive place values:
Measurements such as 0.55303, 0.55310, 0.55308, in which the digits 0.553 repeat in all observations, can be recorded as the last two digits expressed as integers. Depending on the objectives of the analysis, it may or may not be necessary to decode the measurements.
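The truncation can be sketched as follows, using the three measurements quoted above. `Decimal` is used here to avoid binary floating-point rounding; the names `BASE`, `SCALE`, `encode`, and `decode` are illustrative assumptions.

```python
from decimal import Decimal

BASE = Decimal("0.553")      # digits that repeat in every observation
SCALE = Decimal("0.00001")   # place value of the last recorded digit

def encode(measurement):
    """Keep only the last two digits of the measurement, as an integer."""
    return int((Decimal(measurement) - BASE) / SCALE)

def decode(code):
    """Restore the full measurement from its two-digit code."""
    return BASE + code * SCALE

codes = [encode(m) for m in ("0.55303", "0.55310", "0.55308")]
print(codes)  # [3, 10, 8]
```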
Probability
Most quality theories use statistics to make inferences about a population based on information contained in samples. The mechanism one uses to make these inferences is probability.
Conditions for Probability
The probability of any event, E, lies between 0 and 1. The sum of the probabilities of all simple events in a sample space, S, equals 1.
Simple Events
An event that cannot be decomposed is a simple event, E. The set of all sample points for an experiment is called the sample space, S.
If an experiment is repeated a large number of times, N, and the event, E, is observed n_{E} times, the probability of E is approximately:
P(E) ≈ n_{E}/N
For example, the probability of observing a 3 on the toss of a single die is:
P(E_{3}) = 1/6
What is the probability of getting 1, 2, 3, 4, 5, or 6 by throwing a die?
P(E_{1}) + P(E_{2}) + P(E_{3}) + P(E_{4}) + P(E_{5}) + P(E_{6}) = 6 × 1/6 = 1
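The relative frequency idea can be illustrated with a small simulation of a fair die. This is only a sketch: the number of trials, the seed, and the function name `estimate_probability` are arbitrary choices, and the estimate will vary slightly from the theoretical 1/6.

```python
import random

def estimate_probability(event, trials, seed=None):
    """Estimate P(E) as n_E / N by simulating N tosses of a fair die."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

# Event: a 3 is observed
p_three = estimate_probability(lambda face: face == 3, trials=100_000, seed=0)
# Event: any face from 1 to 6 is observed (the whole sample space)
p_any = estimate_probability(lambda face: 1 <= face <= 6, trials=100_000, seed=0)

print(round(p_three, 3), p_any)
```

The first value should be close to 1/6 ≈ 0.167, and the second is exactly 1.0 because the event covers every sample point.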
Use of Venn (Circle) Diagrams
A Venn diagram or set diagram is a diagram that shows all possible logical relations between a finite collection of different sets. Venn diagrams were conceived around 1880 by John Venn. They are used to teach elementary set theory, as well as illustrate simple set relationships in probability, logic, statistics, linguistics and computer science. On occasion, a circle diagram can help conceptualize the relationship between work elements in order to optimize work activities. Shown below is a hypothetical analysis of the work load for a shipping employee using a Venn (or circle) diagram.
A Venn (circle) diagram illustrates relationships between events. In this case, there is an overlap between packing and data entry, as well as packing and pulling stock. Making CDs is exclusive of other activities. If the sample space equals 1.0 or 100%, then one can determine both the busy time and idle time in an 8-hour shift.
Busy time = Packing + Data entry + Pulling stock + Making CDs – Overlap
= 0.30 + 0.20 + 0.25 + 0.10 – 0.06 – 0.04
= 0.85 – 0.10
= 0.75
In an 8-hour shift, there are 6.0 hours of activity. By the same logic, there are 2.0 idle hours. After deducting customary lunch and break times, one can consider whether additional duties can be assumed by this individual. Venn diagrams are normally used to explain probability theory. In the above diagram, making CDs and packing are mutually exclusive, but packing and pulling stock are not. The final calculation is reflected in the additive law of probability.
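The busy-time arithmetic can be reproduced in a few lines. The assignment of each fraction to an activity is assumed from the order of the terms in the calculation above, and the two overlap values are kept as an unlabeled list because the diagram itself is needed to tell which overlap is which.

```python
# Fractions of the shift, assumed to follow the order of terms in the calculation
activities = {
    "packing": 0.30,
    "data_entry": 0.20,
    "pulling_stock": 0.25,
    "making_cds": 0.10,
}
overlaps = [0.06, 0.04]  # time counted twice where two activities overlap
shift_hours = 8

busy_fraction = sum(activities.values()) - sum(overlaps)
busy_hours = shift_hours * busy_fraction
idle_hours = shift_hours - busy_hours

print(round(busy_fraction, 2), round(busy_hours, 1), round(idle_hours, 1))  # 0.75 6.0 2.0
```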
Compound Events
Compound events are formed by a composition of two or more events. They consist of more than one point in the sample space. For example, if two dice are tossed, what is the probability of getting an 8? A die and a coin are tossed. What is the probability of getting a 4 and tail? The two most important probability theorems are additive and multiplicative. For the following discussion, E_{A} = A and E_{B} = B.
I. Composition.
Composition consists of two possibilities: a union or an intersection.
A. Union of A and B
If A and B are two events in a sample space, S, the union of A and B (A U B) contains all sample points in event A, B, or both.
Example: In the die toss, let E_{1}, E_{2}, E_{3}, E_{4}, E_{5} and E_{6} be the events of getting 1, 2, 3, 4, 5, or 6, and consider the following:
If A = E_{1}, E_{2} and E_{3} (numbers less than 4)
and B = E_{1}, E_{3} and E_{5} (odd numbers),
then A U B = E_{1}, E_{2}, E_{3} and E_{5}.
B. Intersection of A and B
If A and B are two events in a sample space, S, the intersection of A and B (A ∩ B) is composed of all sample points that are in both A and B.
From the above example, A ∩ B = E_{1} and E_{3}.
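The die example maps directly onto Python sets, where `|` is the union and `&` is the intersection. The sketch assumes a fair die, so the probability of an event is its number of sample points divided by six. The helper name `prob` is illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}   # numbers less than 4
B = {1, 3, 5}   # odd numbers

union = A | B          # sample points in A, B, or both
intersection = A & B   # sample points in both A and B

def prob(event):
    """Probability of an event when all six faces are equally likely."""
    return Fraction(len(event), len(sample_space))

print(sorted(union), prob(union))                 # [1, 2, 3, 5] 2/3
print(sorted(intersection), prob(intersection))   # [1, 3] 1/3
```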
II. Event Relationships.
There are three relationships involved in finding the probability of an event: complementary, conditional, and mutually exclusive.
A. Complement of an Event
The complement of event A is all sample points in the sample space, S, but not in A. The probability of the complement of A is 1 – P_{A}.
For example, if P_{A} (cloudy days) is 0.3, the probability of the complement of A would be 1 – P_{A} = 0.7 (clear).
B. Conditional Probabilities
The conditional probability of event A occurring, given that event B has occurred is:
P(A|B) = P(A ∩ B)/P(B), provided P(B) ≠ 0
For example, if event A (rain) = 0.2 and event B (cloudiness) = 0.3, what is the probability of rain on a cloudy day? (Note, it will not rain without clouds.)
Because it will not rain without clouds, P(A ∩ B) = P(A) = 0.2, so P(A|B) = 0.2/0.3 = 0.67.
Two events A and B are said to be independent if either:
P(A|B) = P(A) or P(B|A) = P(B)
However, P(A|B) = 0.67 and P(A) = 0.2 (no equality), and
P(B|A) = 1.00 and P(B) = 0.3 (no equality).
Therefore, the events are said to be dependent.
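The rain and cloudiness example can be worked through numerically. The sketch relies on the note that it will not rain without clouds, which is taken to mean that the probability of rain and clouds together equals the probability of rain. The variable names are illustrative.

```python
import math

p_rain = 0.2
p_cloudy = 0.3
# Assumption from the note: rain never occurs without clouds,
# so P(rain and cloudy) equals P(rain)
p_rain_and_cloudy = p_rain

p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy   # P(A|B)
p_cloudy_given_rain = p_rain_and_cloudy / p_rain     # P(B|A)

# Independence would require P(A|B) = P(A)
independent = math.isclose(p_rain_given_cloudy, p_rain)

print(round(p_rain_given_cloudy, 2), p_cloudy_given_rain, independent)  # 0.67 1.0 False
```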
C. Mutually Exclusive Events
If event A contains no sample points in common with event B, then they are said to be mutually exclusive.
For example, obtaining a 3 and obtaining a 2 on the toss of a single die are mutually exclusive events. The probability of observing both events simultaneously is zero.
The probability of obtaining either a 3 or a 2 is:
P(E_{2}) + P(E_{3}) = 1/6 + 1/6 = 1/3

The Additive Law
 If the two events are not mutually exclusive:
P(A U B) = P(A) + P(B) – P(A ∩ B)
Note that P(A U B) is shown in many texts as P(A + B) and is read as the probability of A or B.
For example, if one owns two cars and the probability of each car starting on a cold morning is 0.7, what is the probability of getting to work by car? (The two cars are assumed to start independently, so P(A ∩ B) = 0.7 × 0.7.)
P(A U B) = 0.7 + 0.7 – (0.7 × 0.7)
= 1.4 – 0.49 = 0.91 or 91%
 If the two events are mutually exclusive, the law reduces to:
P(A U B) = P(A) + P(B), also written P(A + B) = P(A) + P(B)
For example, if the probability of finding a black sock in a dark room is 0.4 and the probability of finding a blue sock is 0.3, what is the chance of finding a blue or black sock?
P(A U B) = 0.4 + 0.3 = 0.7 or 70%
Note: The problem statements center around the word “or”:
Will car A or B start?
Will one get a black or blue sock?
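Both forms of the additive law fit in one small function, since the mutually exclusive case is simply the general case with a zero intersection. The function name `p_union` is illustrative, and the car example assumes the two cars start independently.

```python
def p_union(p_a, p_b, p_a_and_b=0.0):
    """Additive law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Two cars, each starting with probability 0.7 (assumed independent)
p_some_car_starts = p_union(0.7, 0.7, 0.7 * 0.7)

# Black sock or blue sock: mutually exclusive, so the intersection is zero
p_black_or_blue = p_union(0.4, 0.3)

print(round(p_some_car_starts, 2), round(p_black_or_blue, 2))  # 0.91 0.7
```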

The Multiplicative Law
If events A and B are dependent, the probability of event A influences the probability of event B. This is known as conditional probability and the sample space is reduced.
 For any two events, A and B, such that P(B) ≠ 0:
P(A|B) = P(A ∩ B)/P(B) and P(A ∩ B) = P(A|B)P(B)
Note that in some texts P(A ∩ B) is shown as P(A * B) and is read as the probability of A and B. P(B|A) is read as the probability of B given that A has occurred.
For example, if a shipment of 100 TV sets contains 30 defective units and two samples are obtained, what is the probability of finding both defective? (Event A is the first sample and the sample space is reduced, and event B is the second sample.)
P(A ∩ B) = (30/100) × (29/99) = 0.088 or 8.8%
 If events A and B are independent:
P(A ∩ B) = P(A) × P(B)
For example, one relay in an electric circuit has a probability of working equal to 0.9. Another relay in series has a chance of 0.8. What is the probability that the circuit will work?
P(A ∩ B) = 0.9 × 0.8 = 0.72 or 72%
Note: The problem statements center around the word “and”:
Will TV A and B work?
Will relay A and B operate?
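The two worked examples can be checked with exact fractions for the dependent case and ordinary floats for the independent case. The variable names are illustrative.

```python
from fractions import Fraction

# Dependent events: two TV sets drawn without replacement from 100, of which 30 are defective
p_first_defective = Fraction(30, 100)
p_second_defective_given_first = Fraction(29, 99)   # reduced sample space
p_both_defective = p_first_defective * p_second_defective_given_first

# Independent events: two relays in series
p_circuit_works = 0.9 * 0.8

print(p_both_defective, round(float(p_both_defective), 3))  # 29/330 0.088
print(round(p_circuit_works, 2))                            # 0.72
```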
Descriptive Statistics
Descriptive statistics include measures of central tendency, measures of dispersion, probability density function, frequency distributions, and cumulative distribution functions.
Measures of Central Tendency
Measures of central tendency represent different ways of characterizing the central value of a collection of data. Three of these measures will be addressed here: mean, mode, and median.
The Mean (X bar)
The mean is the total of all data values divided by the number of data points.
For example, the mean of the following 9 numbers, 5 3 7 9 8 5 4 5 8, is 54/9 = 6.
The arithmetic mean is the most widely used measure of central tendency.
Advantages of using the mean:
 It is the center of gravity of the data
 It uses all data
 No sorting is needed
Disadvantages of using the mean:
 Extreme data values may distort the picture
 It can be time-consuming
 The mean may not be the actual value of any data point
The Mode
The mode is the most frequently occurring number in a data set.
For example, the mode of the following data set: 5 3 7 9 8 5 4 5 8 is 5.
Note: It is possible for groups of data to have more than one mode.
Advantages of using the mode:
 No calculations or sorting are necessary
 It is not influenced by extreme values
 It is an actual value
 It can be detected visually in distribution plots
Disadvantage of using the mode:
 The data may not have a mode, or may have more than one mode
The Median (Midpoint)
The median is the middle value when the data is arranged in ascending or descending order. For a data set with an even number of values, the median is the average of the middle two values.
Examples: Find the median of the following data set:
(10 Numbers) 2 2 2 3 4 6 7 7 8 9
(9 Numbers) 2 2 3 4 5 7 8 8 9
Answer: 5 for both examples
Advantages of using the median:
 Provides an idea of where most data is located
 Little calculation required
 Insensitivity to extreme values
Disadvantages of using the median:
 The data must be sorted and arranged
 Extreme values may be important.
 Two medians cannot be averaged to obtain a combined distribution median
 The median will have more variation (between samples) than the average
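The three measures of central tendency can be verified on the data sets used in the examples above with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not a prescribed tool.

```python
import statistics

data = [5, 3, 7, 9, 8, 5, 4, 5, 8]
even_set = [2, 2, 2, 3, 4, 6, 7, 7, 8, 9]   # 10 numbers
odd_set = [2, 2, 3, 4, 5, 7, 8, 8, 9]       # 9 numbers

mean_value = statistics.mean(data)        # total of all values / number of values
modes = statistics.multimode(data)        # list, because data may have several modes
median_even = statistics.median(even_set) # average of the two middle values (4 and 6)
median_odd = statistics.median(odd_set)   # the single middle value

print(mean_value, modes, median_even, median_odd)
```

`multimode` returns a list so that data sets with more than one mode are reported in full.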
Measures of Dispersion
Other than central tendency, the other important parameter to describe a set of data is spread or dispersion. Three main measures of dispersion will be reviewed: range, variance, and standard deviation.
Range (R)
The range of a set of data is the difference between the largest and smallest values.
Example Find the range of the following data set: 5 3 7 9 3 5 4 5 3
Answer: 9 – 3 = 6
Variance ( σ^{2},s^{2})
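The variance, σ^{2} or s^{2}, is equal to the sum of the squared deviations from the mean, divided by the number of data points (N for a population, n – 1 for a sample). The formula for variance is:
σ^{2} = Σ(x – μ)^{2}/N (population) or s^{2} = Σ(x – x̄)^{2}/(n – 1) (sample)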
The variance is equal to the standard deviation squared.
Standard Deviation (σ, s)
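The standard deviation is the square root of the variance.
σ = √[Σ(x – μ)^{2}/N] (population) or s = √[Σ(x – x̄)^{2}/(n – 1)] (sample)
Alternatively, the sample standard deviation can be computed from the sums of the values and of their squares:
s = √[(Σx^{2} – (Σx)^{2}/n)/(n – 1)]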
N is used for a population and n – 1 for a sample to remove potential bias in relatively small samples (less than 30).
Example: Calculate the Standard Deviation of the following Data set using the formula
Coefficient of Variation (COV)
The coefficient of variation equals the standard deviation divided by the mean and is expressed as a percentage: COV = (s/x̄) × 100%.
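As a sketch of these dispersion measures, the calculation below reuses the nine values from the mean example (5 3 7 9 8 5 4 5 8), treating them first as a sample and then as a population. Only the standard `statistics` module is used.

```python
import statistics

data = [5, 3, 7, 9, 8, 5, 4, 5, 8]   # values from the mean example; mean is 6

data_range = max(data) - min(data)          # largest minus smallest value

sample_var = statistics.variance(data)      # divides by n - 1
sample_sd = statistics.stdev(data)
population_var = statistics.pvariance(data) # divides by N
population_sd = statistics.pstdev(data)

# Coefficient of variation, expressed as a percentage
cov_percent = sample_sd / statistics.mean(data) * 100

print(data_range, sample_var, round(sample_sd, 3))   # 6 4.25 2.062
print(round(population_var, 3), round(population_sd, 3))
print(round(cov_percent, 1))
```

The sum of squared deviations from the mean is 34, so the sample variance is 34/8 = 4.25 and the population variance is 34/9.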
Probability Density Function
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value. The probability of the random variable falling within a particular range of values is given by the integral of this variable’s density over that range—that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one.
Suppose a species of bacteria typically lives 4 to 6 hours. What is the probability that a bacterium lives exactly 5 hours? The answer is actually 0%. Lots of bacteria live for approximately 5 hours, but there is negligible chance that any given bacterium dies at exactly 5.0000000000... hours. Instead we might ask: What is the probability that the bacterium dies between 5 hours and 5.01 hours? Let’s say the answer is 0.02 (i.e., 2%). Next: What is the probability that the bacterium dies between 5 hours and 5.001 hours? The answer is probably around 0.002, since this is 1/10th of the previous interval. The probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. The ratio (probability of dying during an interval) / (duration of the interval) is approximately constant, and equal to 2 per hour (or 2/hour). For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability / 0.01 hours) = 2/hour. This quantity 2/hour is called the probability density for dying at around 5 hours.
Therefore, in response to the question “What is the probability that the bacterium dies at 5 hours?”, a literally correct but unhelpful answer is “0”, but a better answer can be written as (2/hour) dt. This is the probability that the bacterium dies within a small (infinitesimal) window of time around 5 hours, where dt is the duration of this window. For example, the probability that it lives longer than 5 hours, but shorter than (5 hours + 1 nanosecond), is (2/hour) × (1 nanosecond) ≃ 6×10^{−13} (using the unit conversion 3.6×10^{12} nanoseconds = 1 hour). There is a probability density function f with f(5 hours) = 2/hour. The integral of f over any window of time (not only infinitesimal windows but also large windows) is the probability that the bacterium dies in that window.
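The arithmetic in the bacteria example can be sketched as density multiplied by window length, which is a reasonable approximation only while the window is short enough for the density to stay roughly constant. The function name `probability_in_window` is illustrative.

```python
NANOSECONDS_PER_HOUR = 3.6e12

def probability_in_window(density_per_hour, window_hours):
    """Approximate probability for a short window: density times duration."""
    return density_per_hour * window_hours

density_at_5h = 2.0  # probability density (per hour) around 5 hours, from the example

p_hundredth_hour = probability_in_window(density_at_5h, 0.01)
p_one_nanosecond = probability_in_window(density_at_5h, 1 / NANOSECONDS_PER_HOUR)

print(p_hundredth_hour)           # 0.02
print(f"{p_one_nanosecond:.1e}")  # 5.6e-13, i.e. about 6×10^-13
```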
The probability density function, f(x), describes the behavior of a random variable. Typically, the probability density function is viewed as the “shape” of the distribution. It is normally a grouped frequency distribution. Consider the histogram for the length of a product shown in Figure below.
A histogram is an approximation of the distribution’s shape. The histogram shown appears symmetrical. The following figure shows this histogram with a smooth curve overlaying the data. The smooth curve is the statistical model that describes the population; in this case, the normal distribution.
When using statistics, the smooth curve represents the population. The differences between the sample data represented by the histogram and the population data represented by the smooth curve are assumed to be due to sampling error. In reality, the difference could also be caused by lack of randomness in the sample or an incorrect model. The probability density function is similar to the overlaid model. The area below the probability density function to the left of a given value, x, is equal to the probability of the random variable represented on the x-axis being less than the given value x. Since the probability density function represents the entire sample space, the area under the probability density function must equal one. Since negative probabilities are impossible, the probability density function, f(x), must be non-negative for all values of x.
Stating these two requirements mathematically for continuous distributions: f(x) ≥ 0 for all x, and ∫_{–∞}^{+∞} f(x) dx = 1.
Figure below demonstrates how the probability density function is used to compute probabilities. The area of the shaded region represents the probability of a single product drawn randomly from the population having a length less than 185. This probability is 15.9% and can be determined by using the standard normal table.
Cumulative Distribution Function
The cumulative distribution function, F(x), denotes the area beneath the probability density function to the left of x.
The area of the shaded region of the probability density function is 0.2525 which corresponds to the cumulative distribution function at x = 190. Mathematically, the cumulative distribution function is equal to the integral of the probability density function to the left of x:
F(x) = ∫_{–∞}^{x} f(t) dt
For example, a random variable has the probability density function f(x) = 0.125x, where x is valid from 0 to 4. The probability of x being less than or equal to 2 is:
F(2) = ∫_{0}^{2} 0.125x dx = 0.0625x^{2} evaluated from 0 to 2 = 0.25
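The integral for f(x) = 0.125x can also be approximated numerically. The sketch below uses a simple midpoint rule, one of several possible integration methods; the step count and the function names `pdf` and `cdf` are arbitrary choices.

```python
def pdf(x):
    """Probability density f(x) = 0.125x on the interval 0 to 4, zero elsewhere."""
    return 0.125 * x if 0 <= x <= 4 else 0.0

def cdf(x, steps=10_000):
    """Area under the pdf to the left of x, by the midpoint rule."""
    upper = min(max(x, 0.0), 4.0)   # no area lies outside 0..4
    width = upper / steps
    return sum(pdf((i + 0.5) * width) for i in range(steps)) * width

print(round(cdf(2), 4))   # 0.25
print(round(cdf(4), 4))   # 1.0, the total area under the density
```

The second value confirms that the density integrates to one over its whole range.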
Properties of a Normal Distribution
A normal distribution can be described by its mean and standard deviation. The standard normal distribution is a special case of the normal distribution and has a mean of zero and a standard deviation of one. The tails of the distribution extend to ± infinity. The area under the curve represents 100% of the possible observations. The curve is symmetrical such that each side of the mean has the same shape and contains 50% of the total area. Theoretically, about 95% of the population is contained within ± 2 standard deviations.
If a data set is normally distributed, then the standard deviation and mean can be used to determine the percentage (or probability) of observations within a selected range. Any normally distributed scale can be transformed to its equivalent Z scale or score using the formula: Z = (x – μ)/σ
x will often represent a lower specification limit (LSL) or upper specification limit (USL). Z, the “sigma value,” is a measure of standard deviations from the mean. Any normal data distribution can be transformed to a standard normal curve using the Z transformation. The area under the curve is used to predict the probability of an event occurring.
Example: If the mean is 85 days and the standard deviation is five days, what would be the yield if the USL is 90 days?
Here Z = (90 - 85)/5 = 1. A standard Z table is used to determine the area under the curve. The area under the curve represents probability.
Because the curve is symmetric, the area shown as yield would be 1 - P(Z > 1) = 0.841 or 84.1%.
In accordance with the equation, Z can be calculated for any “point of interest,” x.
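The yield example above (mean 85 days, standard deviation 5 days, USL 90 days) can be reproduced without a Z table. The sketch below uses Python's statistics.NormalDist in place of the table lookup; the variable and function names are illustrative.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, std_dev: float) -> float:
    """Z transformation: number of standard deviations between x and the mean."""
    return (x - mean) / std_dev


mean_days, std_days, usl_days = 85.0, 5.0, 90.0

z = z_score(usl_days, mean_days, std_days)
# Area to the left of Z under the standard normal curve = yield below the USL
yield_fraction = NormalDist().cdf(z)

print(z)                         # 1.0
print(round(yield_fraction, 3))  # 0.841
```

The same two steps apply to any point of interest x, such as a lower specification limit, where the area to the left of Z would instead be the fraction falling below the limit.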
Variation
The following figure shows three normal distributions with the same mean. What differs between the distributions is the variation.
The first distribution displays less variation or dispersion about the mean. The second distribution displays more variation and would have a greater standard deviation. The third distribution displays even more variation.
Short-term vs. Long-term Variation
The duration over which data is collected will determine whether short-term or long-term variation has been captured within the subgroup.
There are two types of variation in every process: common cause variation and special cause variation. Common cause variation is completely random (i.e., the next data point’s specific value cannot be predicted). It is the natural variation of the process. Special cause variation is the non-random variation in the process. It is the result of an event, an action, or a series of events or actions. The nature and causes of special cause variation are different for every process.
Short-term data is data that is collected from the process in subgroups. Each subgroup is collected over a short length of time to capture common cause variation only (i.e., data is not collected across different shifts because variation can exist from operator to operator). Thus, the subgroup consists of “like” things collected over a narrow time frame and is considered a “snapshot in time” of the process. For example, a process may use several raw material lots per shift. A representative short-term sample may consist of CTQ measurements within one lot.
Long-term data is considered to contain both special and common causes of variation that are typically observed when all of the input variables have varied over their full range. To continue with the same example, long-term data would consist of several raw material lots measured across several short-term samples.
Processes tend to exhibit more variation in the long term than in the short term. Long-term variability is made up of short-term variability and process drift. The shift from short term to long term can be quantified by taking both short-term and long-term samples.
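One way to see the difference between the two kinds of variability is to compare a within-subgroup standard deviation with the overall standard deviation of the same data. The sketch below is only an illustration: the measurements are hypothetical, and the pooled within-subgroup standard deviation is one common estimator of short-term variation, not necessarily the one used in any particular capability study.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical CTQ measurements: each inner list is one subgroup taken
# from a single raw material lot over a short time frame.
subgroups = [
    [10.1, 10.3, 9.9, 10.2],
    [10.6, 10.8, 10.5, 10.7],
    [9.7, 9.9, 9.6, 9.8],
]

# Short-term estimate: pooled within-subgroup standard deviation
# (a simple average of variances is valid because subgroup sizes are equal).
pooled_variance = mean([variance(group) for group in subgroups])
sigma_short_term = sqrt(pooled_variance)

# Long-term estimate: standard deviation of all measurements together,
# which also picks up the drift between subgroup means.
all_values = [value for group in subgroups for value in group]
sigma_long_term = stdev(all_values)

print(f"short-term sigma: {sigma_short_term:.3f}")
print(f"long-term sigma:  {sigma_long_term:.3f}")
```

With these made-up lots the subgroup means differ noticeably, so the long-term figure comes out larger than the short-term one, which is the pattern described above.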
On average, short-term process means tend to shift and drift by 1.5 sigmas.
Z_{lt} = Z_{st} - 1.5
The short-term Z (Z_{st}) is also known as the benchmark sigma value. A Six Sigma process would have six standard deviations between the mean and the closest specification limit for a short-term capability study. The following figure illustrates the Z-score relationship to the Six Sigma philosophy:
In a Six Sigma process, customer satisfaction and business objectives are robust to shifts caused by process or product variation.
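The 1.5 sigma shift can be turned into an expected defect rate. The sketch below assumes a normally distributed characteristic and counts only the tail beyond the closest specification limit; the function names and the defects-per-million framing are illustrative additions, not part of the original text.

```python
from statistics import NormalDist

SHIFT = 1.5  # average short-term to long-term shift, in sigmas


def long_term_z(z_short_term: float) -> float:
    """Z_lt = Z_st - 1.5"""
    return z_short_term - SHIFT


def defects_per_million(z: float) -> float:
    """One-sided tail area beyond z under the standard normal curve, per million."""
    return (1.0 - NormalDist().cdf(z)) * 1_000_000


z_st = 6.0               # short-term (benchmark) sigma value of a Six Sigma process
z_lt = long_term_z(z_st)

print(z_lt)                                 # 4.5
print(round(defects_per_million(z_lt), 1))  # 3.4
```

After the shift, a process that is six sigma in the short term sits 4.5 standard deviations from the nearest limit, which corresponds to roughly 3.4 defects per million opportunities.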
Drawing Valid Statistical Conclusions
The objective of statistical inference is to draw conclusions about population characteristics based on the information contained in a sample. Statistical inference in a practical situation contains two elements: (1) the inference and (2) a measure of its validity. The steps involved in statistical inference are:
 Define the problem objective precisely
 Decide if the problem will be evaluated by a one-tail or two-tail test
 Formulate a null hypothesis and an alternate hypothesis
 Select a test distribution and a critical value of the test statistic reflecting the degree of uncertainty that can be tolerated (the alpha, α, risk)
 Calculate a test statistic value from the sample information
 Make an inference about the population by comparing the calculated value to the critical value. This step determines if the null hypothesis is to be rejected. If the null is rejected, the alternate must be accepted.
 Communicate the findings to interested parties
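The steps above can be sketched for one simple case: a two-tail, one-sample Z test with a known standard deviation. Everything specific here is an assumption made for illustration, including the sample values, the hypothesized mean of 85 days, the known sigma of 5 days, and the choice of the normal distribution as the test distribution.

```python
from math import sqrt
from statistics import NormalDist, mean

# Problem objective (hypothetical): has the mean cycle time changed from 85 days?
sample = [88.0, 91.0, 84.0, 90.0, 87.0, 93.0, 86.0, 89.0, 92.0]
mu_0 = 85.0    # H0: mu = 85   versus   H1: mu != 85 (two-tail test)
sigma = 5.0    # population standard deviation, assumed known
alpha = 0.05   # tolerated risk of rejecting a true null hypothesis

# Critical value of the test statistic for a two-tail test
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Test statistic calculated from the sample information
z_stat = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))

# Inference: compare the calculated value to the critical value
reject_null = abs(z_stat) > z_critical

print(f"z = {z_stat:.2f}, critical = {z_critical:.2f}, reject H0: {reject_null}")
# z = 2.33, critical = 1.96, reject H0: True
```

For a one-tail test the critical value would come from 1 - alpha rather than 1 - alpha / 2, and the comparison would use the signed statistic instead of its absolute value.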
Every day, in their personal and professional lives, individuals are faced with decisions between choice A and choice B. In most situations, relevant information is available, but it may be presented in a form that is difficult to digest. Quite often, the data seems inconsistent or contradictory. In these situations, an intuitive decision may be little more than an outright guess. While most people feel their intuitive powers are quite good, the fact is that decisions made on gut feeling are often wrong.
Null Hypothesis and Alternate Hypothesis
The null hypothesis is the hypothesis to be tested. The null hypothesis directly stems from the problem statement and is denoted as H_{o}.
The alternate hypothesis must include all possibilities which are not included in the null hypothesis and is designated H_{1}.
Examples of null and alternate hypotheses:
Null hypothesis: H_{o}: Y_{a} = Y_{b} H_{o}: A ≤ B
Alternate hypothesis: H_{1}: Y_{a} ≠ Y_{b} H_{1}: A > B
A null hypothesis can only be rejected or fail to be rejected. It cannot be accepted merely because of a lack of evidence to reject it.
Test Statistic
In order to test a null hypothesis, a test calculation must be made from sample information. This calculated value is called a test statistic and is compared to an appropriate critical value. A decision can then be made to reject or not reject the null hypothesis.
Types of Errors
When formulating a conclusion regarding a population based on observations from a small sample, two types of errors are possible:
 Type I error: This error results when the null hypothesis is rejected when it is, in fact, true.
 Type II error: This error results when the null hypothesis is not rejected when it should be rejected.
The degree of risk (α), which is the probability of committing a Type I error, is normally chosen by the concerned parties in arriving at the critical value of the test statistic. α is commonly taken as 5%.
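The two error types can be given numbers for a simple case. The sketch below assumes an upper one-tail Z test of H_{o}: μ ≤ μ_0 with a known standard deviation; the symbol β for the Type II risk, the function name, and all input values are illustrative assumptions rather than part of the original text.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_risk(mu_0: float, mu_true: float, sigma: float, n: int, alpha: float) -> float:
    """Probability of NOT rejecting H0: mu <= mu_0 when the true mean is mu_true."""
    std_error = sigma / sqrt(n)
    # Sample-mean cutoff above which the null hypothesis is rejected
    cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * std_error
    # Chance that the sample mean still falls below the cutoff
    return NormalDist(mu_true, std_error).cdf(cutoff)


alpha = 0.05  # Type I risk chosen in advance
beta = type_ii_risk(mu_0=85.0, mu_true=88.0, sigma=5.0, n=9, alpha=alpha)

print(f"alpha (Type I risk): {alpha:.2f}")
print(f"beta (Type II risk): {beta:.3f}")
```

For a fixed sample size, lowering α moves the cutoff further from μ_0 and therefore raises β; collecting a larger sample is the usual way to reduce both risks together.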
Enumerative (Descriptive) Studies
Enumerative data is data that can be counted. Examples include the classification of things and the classification of people into intervals of income, age, or health. A census is an enumerative collection and study. Useful tools for tests of hypothesis conducted on enumerative data are the chi-square, binomial, and Poisson distributions. Deming, in 1975, defined a contrast between enumeration and analysis:
 Enumerative study: A study in which action will be taken on the universe.
 Analytical study: A study in which action will be taken on a process to improve performance in the future.
Numerical descriptive measures create a mental picture of a set of data. The measures calculated from a sample are called statistics. When these measures describe a population, they are called parameters.
The table shows examples of statistics and parameters for the mean and standard deviation. The mean is a measure of central tendency, and the standard deviation is a measure of dispersion.
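The distinction between statistics and parameters can be illustrated with Python's statistics module. The data below is hypothetical, and the symbols in the comments (μ and σ for the population, x̄ and s for the sample) follow the usual textbook convention, which is assumed here.

```python
from statistics import mean, pstdev, stdev

# Hypothetical population: every unit produced in one small batch
population = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

# Parameters describe the whole population
mu = mean(population)       # population mean (μ)
sigma = pstdev(population)  # population standard deviation (σ), divides by N

# Statistics are calculated from a sample drawn from that population
sample = population[::2]    # every second unit, for illustration only
x_bar = mean(sample)        # sample mean (x̄)
s = stdev(sample)           # sample standard deviation (s), divides by n - 1

print(f"parameters: mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"statistics: x_bar = {x_bar:.3f}, s = {s:.3f}")
```

The sample values only estimate the population values, which is why inferential methods attach a measure of validity to any conclusion drawn from them.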
Summary of Analytical and Enumerative Studies
Analytical studies start with the hypothesis statement made about population parameters. A sample statistic is then used to test the hypothesis and either reject or fail to reject the null hypothesis. At a stated level of confidence, one is then able to make inferences about the population.