Hypothesis Testing

Hypothesis

A hypothesis is a statement or supposition made with limited or no evidence. A theory is an explanation for a law or phenomenon that has been tested and is repeatable. A law, likewise, has been tested and its results are reproducible. For example, Sir Isaac Newton’s third law of motion states that “if body A exerts a force on body B, then body B exerts an equal but opposite force on body A”. This has been tested and its results are reproducible: it explains why walking is possible, and why an aircraft moves in the direction opposite to that of the expelled gases. A hypothesis, on the other hand, has not yet been verified to be true and reproducible.

A hypothesis that holds for a single test but fails for other tests isn’t reproducible; hence, we cannot conclude that the cause stated in the hypothesis is responsible for the effect. After all, if it isn’t reproducible, one can argue that the result we got from a particular test was just due to random chance and not because of the cause proposed in the hypothesis. To truly accept the claims of a hypothesis, we would need to perform multiple tests, each yielding the result the hypothesis predicts. The problem is that repeating a test on many different samples is rarely practical. In statistics, hypothesis testing addresses this: we perform a single test on one sample and use its result to estimate the probability that such a result would arise by random chance alone.

Hypothesis testing is a statistical method used to test the result of research, an experiment, a survey, or an assumed value of a population parameter. Hypothesis testing often starts with a research hypothesis or hypothesis statement. For example:

“Data Scientists earn more than an average employee in the UK”

The above statement is a hypothesis and, like other hypotheses, it is an educated guess that may or may not be true. Hypothesis testing works in a way similar to proof by contradiction: we support a claim by first assuming that it is wrong and then checking whether the evidence contradicts that assumption. This is done by formulating two other hypotheses, namely the null hypothesis and the alternate hypothesis.

Null Hypothesis

The null hypothesis is the hypothesis that we seek to nullify or disprove based on the research hypothesis. It opposes the research hypothesis and assumes that the cause proposed in the research hypothesis isn’t true and doesn’t make any difference. It is sometimes the generally accepted fact prior to the research hypothesis. It is often denoted by $H_0$. For our example above, if we know from prior research that the average annual salary of employees in the UK is £30,000, then the null hypothesis for the research hypothesis that “Data Scientists earn more than an average employee in the UK” would be that being a data scientist doesn’t change anything: the average annual salary of a data scientist is not greater than the known average annual salary of employees in the UK. Mathematically, writing $\mu$ for the average annual salary of data scientists in the UK in pounds, the null hypothesis for this example is:

$$H_0: \mu \leq 30{,}000$$

We typically assume the null hypothesis by default and believe it to be true until the outcome of the hypothesis test shows otherwise.

Alternate Hypothesis

The alternate hypothesis is what we want to use to reject or nullify the null hypothesis. It supports the research hypothesis. For our research hypothesis example, the alternate hypothesis would be that data scientists earn more than an average employee in the UK. Writing $\mu$ for the average annual salary of data scientists in the UK in pounds, it can be represented mathematically as:

$$H_1: \mu > 30{,}000$$

The Test

To decide which of the null hypothesis and the alternate hypothesis the data supports, we look for evidence, and the test leads to one of two outcomes:

Reject the null hypothesis and accept the alternate hypothesis.
Fail to reject the null hypothesis, in which case the alternate hypothesis is not supported.

Note: We never “accept” the null hypothesis as a result of the test; it is simply the hypothesis assumed by default. We either find enough evidence to reject it or fail to reject it.

Now that the stage is set, we can carry out a hypothesis test on our research hypothesis example by collecting data on the salaries of data scientists in the UK. That would be a difficult task because we cannot get the salary data of all data scientists in the UK; however, we can get a sample.

Statistics

Statistics in hypothesis testing are the values computed from a test, and there are different types of tests in hypothesis testing. The nature of the research hypothesis and the type of data should guide our choice of statistic for the test. The statistic used is sometimes referred to as the type of hypothesis test. Popular ones are:

Normality Test: tests if the sample is drawn from a normal distribution.
T-test: used to compare the mean of a sample with a known value or with that of another sample.
Chi-square test: used to test for association between two categorical variables in a sample.
Analysis of Variance (ANOVA) test: similar to t-test, but used for three or more samples.
Correlation test: used to test for the correlation between two continuous variables.
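As a rough illustration of the T-test entry above, the sketch below computes a one-sample t-statistic for the running salary example using only the Python standard library. The salary figures and the variable names are invented for illustration; they are not real data and do not come from any survey.

```python
import math
import statistics

# Hypothetical annual salaries (in pounds) for a small sample of data scientists.
# These figures are invented for illustration only.
salaries = [31500, 29800, 34200, 30100, 33000, 28700, 35600, 32400, 30900, 31800]

mu_0 = 30000  # mean salary assumed under the null hypothesis

n = len(salaries)
sample_mean = statistics.mean(salaries)
sample_sd = statistics.stdev(salaries)     # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)  # estimated standard deviation of the sample mean

# One-sample t-statistic: how many standard errors the sample mean lies above mu_0.
t_statistic = (sample_mean - mu_0) / standard_error
degrees_of_freedom = n - 1

print(f"mean = {sample_mean:.0f}, t = {t_statistic:.2f}, df = {degrees_of_freedom}")
```

The t-statistic measures the distance between the sample mean and the known value in units of standard error, which is what allows samples of different sizes and spreads to be judged on the same scale.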

Significance Level

Since we can only get a sample, and not the whole population, of the data we want to carry out the hypothesis test on, an outcome that agrees with the alternate hypothesis (a mean greater than £30,000 in our example) is open to an objection: one can argue that the outcome is not reproducible with other samples, that it is purely due to random chance and hence not significant. To settle this, we set a probability threshold called the significance level. If the probability of observing an outcome at least as extreme as ours, assuming the null hypothesis is true, is below this threshold, we consider the outcome significant; otherwise, we attribute it to random chance. There is no perfect threshold: any value we set involves a trade-off, with different consequences depending on whether it is too high or too low.

Type I Error

A Type I error is a false positive, i.e. flagging something as positive when it is actually negative. If the significance level is too high, we run the risk of incorrectly rejecting the null hypothesis and accepting the alternate hypothesis. An error resulting from wrongly accepting the alternate hypothesis (a positive result) is a Type I error (false positive).

Type II Error

A Type II error is a false negative, i.e. flagging something as negative when it is actually positive. With a very low significance level, we increase our chance of incorrectly failing to reject the null hypothesis when the alternate hypothesis is in fact true. An error resulting from incorrectly failing to reject the null hypothesis (a negative result) is a Type II error (false negative).

Our choice of significance level comes with certain risks and depends on the context of the problem. The rule-of-thumb significance level for most hypothesis tests is 0.05. This, however, is not an absolute rule, and the choice of significance level is still yours. The significance level should, however, be set prior to the hypothesis test.
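The trade-off between the two error types can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: it uses a one-sided z-test with a known population standard deviation instead of a t-test, to keep the code short, and the population values (a standard deviation of £5,000, a true mean of £32,000 when the null hypothesis is false, 30 salaries per study) are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

MU_0 = 30000    # mean under the null hypothesis
SIGMA = 5000    # population standard deviation, assumed known (hypothetical)
N = 30          # sample size per simulated study
TRIALS = 2000   # number of simulated studies

def z_statistics(true_mean):
    """Simulate TRIALS studies and return the z-statistic of each sample mean."""
    se = SIGMA / N ** 0.5
    stats = []
    for _ in range(TRIALS):
        sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
        stats.append((sum(sample) / N - MU_0) / se)
    return stats

def rejection_rate(stats, alpha):
    """Fraction of studies that reject H0 in a one-sided test at level alpha."""
    critical_z = NormalDist().inv_cdf(1 - alpha)
    return sum(z > critical_z for z in stats) / len(stats)

null_true = z_statistics(true_mean=30000)   # H0 is true: every rejection is a Type I error
null_false = z_statistics(true_mean=32000)  # H0 is false: every non-rejection is a Type II error

for alpha in (0.10, 0.05, 0.01):
    type_1 = rejection_rate(null_true, alpha)
    type_2 = 1 - rejection_rate(null_false, alpha)
    print(f"alpha={alpha:.2f}  Type I rate={type_1:.3f}  Type II rate={type_2:.3f}")
```

In a run of this sketch, the Type I rate should land close to the chosen significance level, while lowering the significance level raises the Type II rate, which is the trade-off described above.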

For example, if for our null hypothesis $H_0: \mu \leq 30{,}000$ and alternate hypothesis $H_1: \mu > 30{,}000$ (where $\mu$ is the mean annual salary of data scientists in the UK in pounds) we set a significance level of 0.05, it means that we will only consider a value greater than 30,000 to be significant if the probability of observing that value or more, assuming the null hypothesis is true, is less than 0.05. If the probability is greater than 0.05, we attribute the observed difference to random chance. If we assume a t-test with 30 degrees of freedom for the hypothesis test, the significance level is shown in the image below:

[Figure: salary graph in pounds]

The area of the shaded region is 0.05 in a t-distribution with 30 degrees of freedom.
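The left edge of that shaded region is the critical t-value. As a sketch, it can be found numerically with the standard library alone: the code below integrates the t-distribution density with Simpson's rule and searches for the point whose upper-tail area is 0.05. The helper names (`t_pdf`, `upper_tail`, `critical_t`) are hypothetical and chosen for this illustration; in practice a statistics library would normally supply these functions.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies right of 0 by symmetry

def critical_t(alpha, df):
    """Value of t whose upper-tail area equals alpha, found by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if upper_tail(mid, df) > alpha:
            low = mid   # tail area still too large: move right
        else:
            high = mid  # tail area small enough: move left
    return (low + high) / 2

t_crit = critical_t(alpha=0.05, df=30)
print(f"critical t for alpha=0.05, df=30: {t_crit:.3f}")
```

For 30 degrees of freedom and a significance level of 0.05, the critical value is approximately 1.697, so any t-statistic larger than that falls inside the shaded region.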

There are three ways to reach a conclusion on the outcome of the test.

Comparing Raw Values: In this method, we compare the raw values of the salaries. If the mean salary of data scientists in our sample data falls inside the 0.05 significance region (e.g. £32,000), we can conclude that the data has enough evidence to reject the null hypothesis that the mean salary of data scientists in the UK is less than or equal to £30,000. However, if we find from our data that the mean salary of data scientists in the UK is £31,000, then although £31,000 is greater than £30,000, we conclude that we do not have enough evidence to reject the null hypothesis, because the difference between £31,000 and £30,000 does not fall in the 0.05 significance region of the t-distribution with 30 degrees of freedom.
Comparing the t-statistics: In this method, we compare the t-statistic of the mean salary of data scientists in the UK with the critical t-statistic (the value that has an area of 0.05 to its right). The figure is as shown below:

[Figure: significance region graph]

If the t-statistic of the mean salary of data scientists in our sample data falls inside the 0.05 significance region (e.g. 2.0), we can conclude that the data has enough evidence to reject the null hypothesis that the mean salary of data scientists in the UK is less than or equal to £30,000. However, if we find from our data that the t-statistic of the mean salary of data scientists in the UK is, say, 1.0, then although 1.0 is greater than 0, we conclude that we do not have enough evidence to reject the null hypothesis, because the difference between 1.0 and 0 does not fall in the 0.05 significance region of the t-distribution with 30 degrees of freedom.

Comparing Probabilities: We could instead compare probabilities by finding the probability associated with the t-statistic or with the raw value (the observed mean salary of data scientists in the UK in our data) in a t-distribution with 30 degrees of freedom. This probability, the chance of observing a value at least that large if the null hypothesis is true, is called the p-value. If the p-value is less than the chosen threshold (0.05 in our example), say 0.02, we can conclude that the data has enough evidence to reject the null hypothesis that the mean salary of data scientists in the UK is less than or equal to £30,000. However, if the p-value is greater than 0.05, say 0.3, we conclude that we do not have enough evidence to reject the null hypothesis, because the p-value does not fall below the 0.05 significance level.
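The three methods above always lead to the same verdict, because each is the same comparison expressed on a different scale. The sketch below checks this for the two sample means used in the examples. It assumes a standard error of £1,000 for the sample mean, a hypothetical figure chosen so that £32,000 and £31,000 correspond to t-statistics of 2.0 and 1.0; the helper names are also hypothetical.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on the density and symmetry for t < 0."""
    if t < 0:
        return 1 - upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(alpha, df):
    """Value of t whose upper-tail area equals alpha, found by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if upper_tail(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2

MU_0 = 30000           # mean under the null hypothesis
ALPHA = 0.05           # significance level
DF = 30                # degrees of freedom
STANDARD_ERROR = 1000  # hypothetical standard error of the sample mean, in pounds

def three_verdicts(sample_mean):
    """Return (raw, t, p) booleans: True means 'reject the null hypothesis'."""
    t_crit = critical_t(ALPHA, DF)
    raw_cutoff = MU_0 + t_crit * STANDARD_ERROR     # method 1: compare raw values
    t_stat = (sample_mean - MU_0) / STANDARD_ERROR  # method 2: compare t-statistics
    p_value = upper_tail(t_stat, DF)                # method 3: compare probabilities
    return (sample_mean > raw_cutoff, t_stat > t_crit, p_value < ALPHA)

for mean_salary in (32000, 31000):
    print(mean_salary, three_verdicts(mean_salary))
```

Under these assumptions, a sample mean of £32,000 leads all three methods to reject the null hypothesis, while £31,000 leads all three to fail to reject it.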

Conclusion

Hypothesis testing is used to test the result of research, an experiment, a survey, or an assumed value of a population parameter. It often starts with a research hypothesis or hypothesis statement. The null and alternate hypotheses are then formulated, and the test ends with us either rejecting the null hypothesis or failing to reject it.
