What is the normal distribution?
The normal distribution formula is based on two simple parameters—mean and standard deviation—to quantify the characteristics of a given data set.
The mean represents the "center" of the entire data set, while the standard deviation represents the "spread," or variation, of the data points around that mean.
Key points
- The normal distribution formula is based on two simple parameters—mean and standard deviation—to quantify the characteristics of a given data set.
- To provide a uniform method that simplifies calculation and applies to practical problems, values are converted to standard Z values, which form the basis of the normal distribution table.
- The characteristics of the normal distribution include: the normal curve is symmetric about the mean; the mean is in the middle, dividing the area in half; the total area under the curve is equal to 1 (for any mean and standard deviation); and the distribution is completely described by its mean and standard deviation.
- The normal distribution table is used in securities trading to help identify upward or downward trends, support or resistance levels, and other technical indicators.
Example of a normal distribution
Consider the following 2 data sets:
- Data set 1 = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10}
- Data set 2 = {6, 8, 10, 12, 14, 14, 12, 10, 8, 6}
For data set 1, mean = 10 and standard deviation (stddev) = 0
For data set 2, mean = 10 and standard deviation (stddev) = 2.83
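These figures can be checked with a short script. The sketch below uses Python's standard `statistics` module and assumes the population standard deviation (`pstdev`) is the one intended, since that is the formula that reproduces 2.83; the sample formula would give a slightly larger value. The variable names are illustrative only.

```python
from statistics import mean, pstdev

# The two data sets from the example above.
data_set_1 = [10] * 10
data_set_2 = [6, 8, 10, 12, 14, 14, 12, 10, 8, 6]

for name, data in (("Data set 1", data_set_1), ("Data set 2", data_set_2)):
    # pstdev divides by N (population formula), not N - 1.
    print(name, "mean =", mean(data), "stddev =", round(pstdev(data), 2))
```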
Let's plot these values for data set 1:
The same for data set 2:
The red horizontal line in each plot represents the mean (10 in both cases) of the data set. The pink arrows in the second plot indicate the spread of the data values around the mean. For data set 2, this spread is summarized by a standard deviation of 2.83. Since all values in data set 1 are the same (each value is 10) and do not vary, its stddev is zero, so no pink arrows apply.
The stddev value has some important and useful characteristics that are very helpful for data analysis. For a normal distribution, the data values are symmetrically distributed on both sides of the mean. Plotting any normally distributed data set with the distance from the mean (in standard deviations) on the horizontal axis and the number of data values on the vertical axis results in the following figure.
Properties of the normal distribution
- The normal curve is symmetric about the mean;
- The mean is in the middle, dividing the area into two halves;
- The total area under the curve is equal to 1 (for any mean and standard deviation);
- The distribution is completely described by its mean and standard deviation
As can be seen from the above figure, stddev represents the following:
- 68.3% of data values are within 1 standard deviation of the mean (-1 to +1)
- 95.4% of data values are within 2 standard deviations of the mean (-2 to +2)
- 99.7% of data values are within 3 standard deviations of the mean (-3 to +3)
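These percentages can be reproduced numerically. The sketch below relies on the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2); the helper name `within_k_stddev` is made up for this illustration.

```python
import math

def within_k_stddev(k):
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} stddev: {within_k_stddev(k):.1%}")
```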
The area under the bell curve over a given range represents the probability that a measured data value falls within that range:
- Less than X: For example, the probability that the data value is less than 70
- Greater than X: For example, the probability that the data value is greater than 95
- Between X1 and X2: For example, the probability that the data value is between 65 and 85
Where X is the value of interest (example below).
It is not always convenient to plot the curve and calculate the area, because different data sets have different mean and standard deviation values. To provide a uniform method that simplifies calculation and applies to practical problems, values are converted to standard Z values, which form the basis of the normal distribution table.
Z = (X – mean)/stddev, where X is a random variable.
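As a rough illustration of the conversion, the sketch below applies the Z formula to data set 2 from the earlier example and checks the mean and standard deviation of the resulting Z values. The function name `z_score` is hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Standardize x: its distance from the mean in units of stddev."""
    return (x - mu) / sigma

data = [6, 8, 10, 12, 14, 14, 12, 10, 8, 6]  # data set 2 from the example
mu, sigma = mean(data), pstdev(data)         # population stddev assumed

z_values = [z_score(x, mu, sigma) for x in data]

# After the conversion, the Z values have mean 0 and stddev 1
# (up to floating-point rounding).
print(round(mean(z_values), 6), round(pstdev(z_values), 6))
```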
In effect, this conversion standardizes the mean and standard deviation to 0 and 1, respectively, so that a single standard set of Z values (from the normal distribution table) can be used for easy calculation. A snapshot of the standard z-value table containing probability values is as follows:
z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06
0.0 | 0.00000 | 0.00399 | 0.00798 | 0.01197 | 0.01595 | 0.01994 | …
0.1 | 0.0398 | 0.04380 | 0.04776 | 0.05172 | 0.05567 | 0.05966 | …
0.2 | 0.0793 | 0.08317 | 0.08706 | 0.09095 | 0.09483 | 0.09871 | …
0.3 | 0.11791 | 0.12172 | 0.12552 | 0.12930 | 0.13307 | 0.13683 | …
0.4 | 0.15542 | 0.15910 | 0.16276 | 0.16640 | 0.17003 | 0.17364 | …
0.5 | 0.19146 | 0.19497 | 0.19847 | 0.20194 | 0.20540 | 0.20884 | …
0.6 | 0.22575 | 0.22907 | 0.23237 | 0.23565 | 0.23891 | 0.24215 | …
0.7 | 0.25804 | 0.26115 | 0.26424 | 0.26730 | 0.27035 | 0.27337 | …
… | … | … | … | … | … | … | …
To find the probability associated with the z value 0.239865, first round it to 2 decimal places (i.e., 0.24). Then find the row for the first two digits (0.2) and the column for the remaining hundredths digit (0.04). The cell where they meet gives a value of 0.09483.
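The same lookup can be approximated in code. The sketch below assumes the table entries are the area under the standard normal curve between 0 and z, which equals 0.5 · erf(z/√2); the function name `z_table_value` is hypothetical and is not part of any library.

```python
import math

def z_table_value(z):
    """Area under the standard normal curve between 0 and z,
    with z first rounded to 2 decimals as in a printed table."""
    z = round(z, 2)
    return 0.5 * math.erf(z / math.sqrt(2))

# Lookup for the z value from the text; compare with the 0.2 row, 0.04 column.
print(round(z_table_value(0.239865), 5))
```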
The complete normal distribution table lists probability values, including those for negative z values, to a precision of 5 decimal places.
Let's look at some real-life examples. The heights of individuals in large groups approximately follow a normal distribution. Suppose we have a group of 100 people whose heights are recorded, and the mean and standard deviation are calculated as 66 inches and 6 inches, respectively.
Here are some example questions that can be easily answered using the z-value table:
What is the probability that one person in the group is 70 inches or less tall?
The problem is to find the cumulative probability P(X<=70), that is, what fraction of the values in the entire data set of 100 lie between 0 and 70.
Let’s first convert the X value of 70 to the equivalent Z value.
Z = (X – mean)/stddev = (70-66)/6 = 4/6 = 0.66667 ≈ 0.67 (rounded to two decimal places)
The complete z table gives the area between the mean and this z value: P(0 <= Z <= 0.67) = 0.24857
That is, the probability that someone in this group is between 66 inches (the mean) and 70 inches tall is 24.857%.
But hold on: that is incomplete. Remember, we are looking for the probability of all possible heights up to 70 inches. The value above covers only the part from the mean to the desired value (i.e., 66 to 70). We need to include the other half, the heights below 66 inches, to get the correct answer.
Since the heights below the mean make up half of the distribution, their probability is 0.5.
Therefore, the correct probability that a person is not more than 70 inches tall = 0.24857 + 0.5 = 0.74857 = 74.857%
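For comparison, the cumulative probability can also be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the printed table; it is one possible way to check the arithmetic, not the method the table itself is built on. It shows both the table-style result (z rounded to 0.67) and the unrounded result, which differ slightly.

```python
from statistics import NormalDist

heights = NormalDist(mu=66, sigma=6)   # mean and stddev from the example

# Table-style calculation: round z to two decimals first.
z = round((70 - 66) / 6, 2)            # 0.67
p_table = NormalDist().cdf(z)          # standard normal, P(Z <= 0.67)

# Direct calculation without rounding z.
p_exact = heights.cdf(70)              # P(X <= 70)

print(round(p_table, 5), round(p_exact, 5))
```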
Graphically (by calculating the area), these are the two addition regions that represent the solution:
What is the probability that a person is 75 inches or taller?
That is, find the complementary cumulative probability P(X>=75).
Z = (X – mean)/stddev = (75-66)/6 = 9/6 = 1.5
P (Z >=1.5) = 1- P (Z <= 1.5) = 1 – (0.5+0.43319) = 0.06681 = 6.681%
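A quick numerical check of this complement, again sketched with `statistics.NormalDist` rather than the printed table (the variable names are illustrative):

```python
from statistics import NormalDist

heights = NormalDist(mu=66, sigma=6)   # mean and stddev from the example

# P(X >= 75) is the complement of the cumulative probability P(X <= 75).
p_at_least_75 = 1 - heights.cdf(75)

print(f"{p_at_least_75:.3%}")
```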
What is the probability that a person is between 52 inches and 67 inches?
Find P(52<=X<=67).
P(52<=X<=67) = P [(52-66)/6 <= Z <= (67-66)/6] = P(-2.33 <= Z <= 0.17)
= P(Z <= 0.17) - P(Z <= -2.33) = (0.5 + 0.06749) - (0.5 - 0.49010) = 0.56749 - 0.00990 = 0.55759 = 55.759%
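The interval probability is the difference of two cumulative probabilities, which can be verified numerically. The sketch below uses `statistics.NormalDist` with z values rounded to two decimals, mirroring a table lookup; because of that rounding, the result differs slightly from an unrounded calculation.

```python
from statistics import NormalDist

std_normal = NormalDist()              # mean 0, stddev 1

# z values for 52 and 67 inches (mean 66, stddev 6), rounded as for a table.
z_low = round((52 - 66) / 6, 2)        # -2.33
z_high = round((67 - 66) / 6, 2)       # 0.17

# P(z_low <= Z <= z_high) = P(Z <= z_high) - P(Z <= z_low)
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"{p_between:.3%}")
```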
The normal distribution table (and z values) is commonly used to estimate the probability of expected price changes in stocks and indices. It is used in range-based trading and to identify upward or downward trends, support or resistance levels, and other technical indicators based on the normal distribution concepts of mean and standard deviation.