Power in Hypothesis testing
To follow this blog you should know what hypothesis testing is and the intuition behind it. If you are not
familiar with the topic, you can refer to the previous blog, Understanding Hypothesis Testing.
In this blog we will try to understand the following things:
1. Types of error in hypothesis testing
2. Power in hypothesis testing
3. Problem on power
1. Types of error in hypothesis testing
In this section we will be discussing types of errors in significance tests. There are 2 types
of errors we can see in significance tests. Those are:
1. Type - I error
2. Type - II error
Let's try to understand these with an example:
As we can recall from hypothesis testing, we have 2 distributions: the population distribution and the
sample distribution. We define a threshold in the population distribution and call it the significance level (α = 0.05).
Below is the diagram for significance level (or area or probability)
Given the threshold for the population distribution,
we compute the probability of seeing the mean of the sample distribution, or any value greater than it,
in the population distribution:
P( x >= mean of sample distribution | population distribution )
We compare this probability with the threshold and decide whether or not to reject the null hypothesis. If we take too small a region as our threshold, there is a high chance that we fail to reject the null hypothesis even when it is false. If we take too big a region, there is a high chance that we reject the null hypothesis even when it is true.
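The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name upper_tail_probability and the numbers (null mean 100, standard error 2, observed mean 104) are hypothetical and are not taken from this blog's problem.

```python
from statistics import NormalDist

def upper_tail_probability(observed_mean, null_mean, standard_error):
    """P(x >= observed_mean) when x follows the null (population) distribution."""
    null_dist = NormalDist(mu=null_mean, sigma=standard_error)
    return 1.0 - null_dist.cdf(observed_mean)

# Hypothetical numbers, chosen only for illustration.
alpha = 0.05
tail_prob = upper_tail_probability(observed_mean=104.0, null_mean=100.0, standard_error=2.0)

# Reject the null hypothesis only when the tail probability is below the threshold.
reject_null = tail_prob < alpha
print(f"tail probability = {tail_prob:.4f}, reject null: {reject_null}")
```

Here the observed mean sits 2 standard errors above the null mean, so the tail probability is smaller than 0.05 and the null hypothesis is rejected.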
So what is the ideal region? Is it 0.05, or 0.10?
By convention, 0.05 is a widely used threshold, but whichever threshold we choose, some error remains in our inferential
statistics. These are Type I and Type II errors.
Type I error: If the null hypothesis is true in the real world and we end up rejecting it, this is called a
Type I error (a false positive).
Type II error: If the null hypothesis is false in the real world and we fail to reject it, this is
called a Type II error (a false negative).
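We can associate a probability with each of these 2 statements:
α = P( rejecting the null hypothesis | null hypothesis is true ), the probability of a Type I error (this is the significance level)
β = P( failing to reject the null hypothesis | null hypothesis is false ), the probability of a Type II error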
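To get a feel for these two error probabilities, here is a small simulation sketch. Everything in it is an assumption made for illustration: the helper rejection_rate, the left-tailed z-test, and the numbers (null mean 100, σ = 10, n = 25, and a "real world" mean of 96 when the null is false). As a shortcut it draws each sample mean directly from its sampling distribution instead of drawing 25 individual values.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, null_mean, sigma, n, alpha, trials, rng):
    """Fraction of simulated experiments where a left-tailed z-test rejects the null."""
    se = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(alpha)  # negative cutoff, about -1.645 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Shortcut: sample the mean of n observations from its sampling distribution.
        sample_mean = rng.gauss(true_mean, se)
        z = (sample_mean - null_mean) / se
        if z <= z_crit:
            rejections += 1
    return rejections / trials

rng = random.Random(0)  # fixed seed so the run is repeatable

# Null is true (true mean = null mean): every rejection is a Type I error.
type_1_rate = rejection_rate(100, 100, 10, 25, 0.05, 20000, rng)

# Null is false (true mean = 96): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(96, 100, 10, 25, 0.05, 20000, rng)

print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The Type I rate should land close to the chosen significance level, while the Type II rate depends on how far the true mean is from the null mean.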
2. Power in hypothesis testing
Power is 1-β, that is, the probability of rejecting the null hypothesis given that the null hypothesis is false in the
real world, where β is the probability of a Type II error.
Note: When you build a new version of the apple app and run a significance test, you hope to reject the null hypothesis, because
you believe the new version is better than the old version. The power of the test tells us how likely the test is to reject the
null hypothesis when the new version really is better, because power is the probability of rejecting the null hypothesis given that it is false.
Let's not dive too much into definitions. Let's go ahead and understand power more intuitively.
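Definition: Power is the probability of rejecting the null hypothesis given that the null hypothesis is false in the real world.
So we start from the assumption that the null hypothesis is false,
i.e. the mean of the sample distribution is far from the mean of the population distribution,
i.e. the sample distribution is significantly different from the population distribution,
i.e. we expect the p-value to be less than the significance level. Power tells us how often this actually happens.
If we plot this on a graph, power is the area of the sample distribution that lies beyond the significance threshold of the population distribution.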
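The two-distribution picture can also be sketched numerically. The function below is a minimal sketch for a left-tailed z-test with a known standard deviation; its name power_left_tailed and the numbers used (null mean 100, σ = 10, n = 25) are hypothetical and chosen only to show how power grows as the true mean moves away from the null mean.

```python
from statistics import NormalDist

def power_left_tailed(null_mean, true_mean, sigma, n, alpha=0.05):
    """Power of a left-tailed z-test when the true mean is `true_mean`."""
    se = sigma / n ** 0.5
    # Cutoff below which we reject, defined on the null sampling distribution.
    critical_mean = NormalDist(null_mean, se).inv_cdf(alpha)
    # Power is the share of the true sampling distribution that falls below the cutoff.
    return NormalDist(true_mean, se).cdf(critical_mean)

# Hypothetical true means, moving further below the null mean of 100.
true_means = (100, 98, 96, 94)
powers = [power_left_tailed(100, m, sigma=10, n=25) for m in true_means]

for m, p in zip(true_means, powers):
    print(f"true mean = {m}: power = {p:.3f}")
```

When the true mean equals the null mean, the "power" is just the significance level; the further apart the two distributions are, the larger the area beyond the threshold.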
3. Problem on hypothesis testing and power
An API has an average response time of 500ms with a standard deviation of 25ms. A developer made a
few changes and concluded from 100 samples that the API response time has significantly
reduced to 495ms after the changes. Can you validate this and state the confidence in that validation?
Answer:
Let's define the hypotheses
null hypothesis = There is no significant change in response time
alt hypothesis = There is a significant reduction in response time
population statistics: μ=500 σ=25
sample statistics: μm=495 n=100
significance level: α=0.05
Calculating z-score
z = (μm - μ) / (σ / √n)
= (495 - 500) / (25 / √100)
= -5 / 2.5
z-score = -2.0
Hence the observed mean is 2.0 standard errors of the mean (SE) below the population mean
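The same z-score arithmetic, written as a short Python check. The variable names are my own; the numbers are the ones given in the problem.

```python
from math import sqrt

population_mean = 500.0   # μ, in ms
population_sd = 25.0      # σ, in ms
sample_mean = 495.0       # μm, in ms
sample_size = 100         # n

# Standard error of the mean: σ / √n
standard_error = population_sd / sqrt(sample_size)

# How many standard errors the sample mean is from the population mean.
z_score = (sample_mean - population_mean) / standard_error
print(standard_error, z_score)
```

A negative z-score simply means the observed mean is below the population mean.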
Calculating p-value:
p( x <= μm | Null hypothesis is true ) is called the p-value, because the alternative hypothesis is a reduction (left tail)
We want the area under the normal distribution that lies more than 2.0 SE below the mean. By symmetry this equals the area more than 2.0 SE above the mean
p-value = Total area on one side of the mean - Area from the mean to 2.0 SE
= 0.50 - ztable(2.0)
= 0.50 - 0.4772
p-value = 0.0228
Hence p( x <= μm | Null hypothesis is true ), the p-value, is 0.0228
Conclusion:
Since significance level or α > p-value we can reject null hypothesis
Hence there is a significant reduction in the mean of api response time
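Instead of a z-table lookup, the p-value and the decision can be checked with Python's standard library. This is a sketch under the same assumptions as the worked answer (known σ, one-sided test for a reduction); the variable names are mine.

```python
from statistics import NormalDist

z_observed = -2.0   # z-score computed for the sample mean of 495ms
alpha = 0.05        # significance level

# Left-tail area of the standard normal distribution below the observed z-score.
p_value = NormalDist().cdf(z_observed)

# Reject the null hypothesis when the p-value is smaller than alpha.
reject_null_hypothesis = p_value < alpha
print(f"p-value = {p_value:.4f}, reject null: {reject_null_hypothesis}")
```

The small difference from the z-table figure, if any, comes only from rounding in the table.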
Now that we have rejected the null hypothesis, let's find the power of the test,
i.e. the probability of rejecting the null hypothesis given that the null hypothesis is false in the real world. For this calculation we take the true mean response time to be the observed 495ms.
First we need to find the response time at the significance level:
Let's call the response time at the significance level Rs
step 1) Find the area from the population mean to the significance level
area = 0.50 - 0.05
= 0.45
step 2) Find how many SE the point Rs is from the population mean.
Refer to the z-table and see at what point the area is 0.45
Hence,
The significance point is about 1.65 SE away from the population mean
step 3) Find the significance point, i.e. the response time at the significance level
SE = σ / √n = 25/10 = 2.5
Hence,
The significance point is 1.65*2.5 = 4.125ms away from the population mean, on the lower side because we are testing for a reduction
i.e. significance point Rs = 500 - (1.65*2.5)
= 495.875, or about 495.9
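The three steps above can be sketched in Python, using the inverse of the normal CDF in place of the reverse z-table lookup. The variable names are my own, and the result differs slightly from the hand calculation because the table value 1.65 is a rounded version of roughly 1.645.

```python
from math import sqrt
from statistics import NormalDist

null_mean = 500.0   # μ, in ms
sigma = 25.0        # σ, in ms
n = 100
alpha = 0.05

se = sigma / sqrt(n)

# z-value that leaves an area of alpha in the left tail (about -1.645).
z_alpha = NormalDist().inv_cdf(alpha)

# Response time at the significance level: reject the null below this value.
critical_response_time = null_mean + z_alpha * se
print(f"z = {z_alpha:.3f}, critical response time = {critical_response_time:.2f} ms")
```

Any sample mean below this response time leads us to reject the null hypothesis at the 0.05 level.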
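Find the power: We reject the null hypothesis whenever the sample mean falls below 495.9, so power is the area of the sample distribution (mean 495) below 495.9, i.e. power = 0.50 + area from the mean (495) to 495.9
z-score = (495.9 - 495) / (25/√100) = 0.9 / 2.5 = 0.36
power = 0.50 + z-table(0.36) = 0.50 + 0.1406 = 0.6406, i.e. 64.06%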
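As a cross-check, the whole power calculation can be sketched end to end in Python. It rests on the same assumption as the hand calculation, namely that the true mean response time is the observed 495ms; the variable names are mine. Because it does not round the z-value to 1.65 or the cutoff to 495.9, the result is close to, but not exactly, 64.06%.

```python
from math import sqrt
from statistics import NormalDist

null_mean = 500.0      # mean under the null hypothesis, in ms
assumed_true_mean = 495.0  # assumption: the true mean equals the observed sample mean
sigma = 25.0
n = 100
alpha = 0.05

se = sigma / sqrt(n)

# Cutoff on the null sampling distribution: reject when the sample mean is below it.
cutoff = NormalDist(null_mean, se).inv_cdf(alpha)

# Power: probability that a sample mean drawn from the true distribution falls below the cutoff.
test_power = NormalDist(assumed_true_mean, se).cdf(cutoff)
print(f"cutoff = {cutoff:.2f} ms, power = {test_power:.4f}")
```

A power of roughly 64% means that, if the true mean really is 495ms, a test on 100 samples would detect the reduction in about 64 out of 100 repeated experiments.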