# Interpreting the result of a chi-square test on the Megastar contest

#1

Update:
While going through the documentation of `scipy.stats.chi2_contingency`, I found that the expected count for each cell of a cross table between two variables is obtained by multiplying the row total for that cell by the column total for that cell and then dividing by the total number of observations.
What is the significance of computing the expected counts this way?
What can we say about the correlation between the two variables by looking at the p-value from the chi-square test?
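A minimal sketch of the expected-count calculation described above, using a small made-up table (not the contest data). The idea behind the formula is that if the two variables were independent, the probability of landing in cell (i, j) would be the product of the row and column proportions, so the expected count is N × (row_total / N) × (col_total / N) = row_total × col_total / N. The names `observed` and `expected_manual` are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table of observed counts (not the contest data).
observed = np.array([[10, 20, 30],
                     [20, 20, 20]])

row_totals = observed.sum(axis=1)   # one total per row
col_totals = observed.sum(axis=0)   # one total per column
grand_total = observed.sum()        # total number of observations

# Expected count for cell (i, j) = row_totals[i] * col_totals[j] / grand_total
expected_manual = np.outer(row_totals, col_totals) / grand_total

# chi2_contingency returns its own expected table as the fourth value.
chi2_stat, p_value, dof, expected_scipy = chi2_contingency(observed)

print(expected_manual)
print(np.allclose(expected_manual, expected_scipy))
```

The chi-square statistic then measures how far the observed counts are from these "independence" counts, which is why a large value counts as evidence against independence.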

Hi,
I am working on the Megastar contest. The objective is to predict the correct category of working professionals in India. Below is the data dictionary of the variables.

After applying the chi-square test to two categorical variables (UG_Education and Category) using the following code:
`sc.stats.chisquare(pd.crosstab(train.UG_Education,train.Category,margins=True))`
I get the following result:

I have the following questions about the two categorical variables (UG_Education and Category):
1. Are they correlated?
2. What is the null hypothesis in this case if the cross table is:

3. If the two variables are not correlated, does that mean UG_Education has no effect on the outcome (Category is the outcome variable)?

Danish

#2

Your chi-square statistic is very high even with 45 degrees of freedom (check a chi-square table), so education and category are not independent. The null hypothesis H0 is that the two variables are independent, which is not the case here.
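As a rough sketch of the "check in the table" step, the critical value can also be looked up with `scipy.stats.chi2` instead of a printed table. For an r × c cross table the degrees of freedom are (r − 1)(c − 1). The significance level of 0.05 and the statistic of 200.0 below are assumptions for illustration, not values taken from this thread.

```python
from scipy.stats import chi2

dof = 45       # degrees of freedom quoted in the reply
alpha = 0.05   # assumed significance level

# Critical value: independence is rejected if the statistic exceeds this.
critical_value = chi2.ppf(1 - alpha, dof)
print(round(critical_value, 2))

# Hypothetical statistic, for illustration only (not the value from the thread).
example_statistic = 200.0

# Survival function = probability of a value at least this large under H0.
p_value = chi2.sf(example_statistic, dof)
print(example_statistic > critical_value, p_value < alpha)
```

Comparing the statistic with the critical value and comparing the p-value with the cutoff are two views of the same decision.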
Hope this helps.
Alain

#3

@Lesaffrea,
Thanks for the interpretation of the high chi-square value. As for the other question, I wanted to know what the null hypothesis would be in this particular case.

#4

It seems you have the p-values of the margins in your output from sc.stats… (I do not know this function… Python is not my forte!!), therefore you can reject the null hypothesis: they are lower than the cutoff.
As mentioned, H0 is that the categories are independent… so you can reject it.
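For reference, `scipy.stats.chisquare` is a one-way goodness-of-fit test, and `margins=True` adds a totals row and column to the cross table, so the usual choice for a test of independence is `scipy.stats.chi2_contingency` on the table without margins. The sketch below assumes a tiny made-up `train` DataFrame standing in for the contest data, and a cutoff of 0.05; both are assumptions, not values from this thread.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for the contest's `train` DataFrame.
train = pd.DataFrame({
    "UG_Education": ["BTech", "BTech", "BSc", "BSc", "BA", "BA"] * 20,
    "Category":     ["A", "A", "B", "B", "B", "A"] * 20,
})

# Cross table of observed counts, without the margins (totals) row and column.
table = pd.crosstab(train.UG_Education, train.Category)

# Test of independence between the row variable and the column variable.
chi2_stat, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed cutoff
if p_value < alpha:
    print("Reject H0: the variables appear to be associated.")
else:
    print("Fail to reject H0: no evidence against independence.")
```

Rejecting H0 only says the two variables are associated in the sample; it does not by itself say how strong the association is or how useful UG_Education would be as a predictor.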
Hope this helps.
Alain