How to verify that data is random in data science

randomness
statistics

#1

When predicting results from a data set using any ML technique, how do I conclude that there is no pattern and that the underlying data is random in nature?
In other words, given a data set of random numbers, how can I establish that no prediction is possible and that the underlying data is random?


#2

Hi Heyjag

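Run some permutation tests on the variables you believe are important. If shuffling a variable does not change your results, that variable carries no usable pattern.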
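A minimal sketch of what such a permutation check could look like, using only the Python standard library. The helper names (`abs_correlation`, `permutation_p_value`), the synthetic data, and the choice of absolute correlation as the statistic are all assumptions made for illustration; any score, such as a model's validation error, could take its place.

```python
import random
import statistics


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Share of shuffles whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs_correlation(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling destroys any real link between x and y
        if abs_correlation(x, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Synthetic data (an assumption for this sketch): one unrelated target, one dependent target.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y_noise = [rng.gauss(0, 1) for _ in range(200)]
y_signal = [2 * a + rng.gauss(0, 1) for a in x]

p_noise = permutation_p_value(x, y_noise)
p_signal = permutation_p_value(x, y_signal)
print(f"p-value, unrelated target: {p_noise:.3f}")
print(f"p-value, dependent target: {p_signal:.3f}")
```

A small p-value suggests the variable is related to the target. A large one only means this particular statistic found nothing; it does not prove the data is random.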

Hope this helps.

Alain


#3

hello @heyjag,

That is why the concept of hypothesis testing exists in statistics. Several tests, such as the z-test, t-test, and ANOVA, exist to determine whether the observed results are due to randomness or to a real pattern.
In the hypothesis testing (HT) framework, the null hypothesis (H0) says that the results are due to randomness, whereas the alternative hypothesis (Ha) says that the results are not due to randomness but to a pattern in the values we are seeing.

For example, if you run a linear regression, the null hypothesis states that the slope of the regression line = 0, and the alternative states that the slope != 0.
Look at the results of a linear regression run.

The last part of this result shows the F-statistic, which is significant. This says that the model is significant, which in turn means that the results are not due to randomness and that the model has predictive power.

Note: slope = 0 means you are using the average value of y to predict y, because if there is no pattern in the data, you can use the average to estimate y.
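To make the slope = 0 idea concrete, here is a rough sketch in plain Python that compares a fitted line with the mean-only model and computes the F-statistic for a one-predictor regression. The function name `simple_regression_f` and the synthetic data are assumptions for illustration, not output from the run discussed above.

```python
import random


def simple_regression_f(x, y):
    """Least-squares slope and F-statistic for a one-predictor linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Error of the slope = 0 model, which predicts mean(y) for every point.
    sst = sum((b - mean_y) ** 2 for b in y)
    # Error of the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Improvement over the mean-only model, scaled by the residual variance.
    f_stat = (sst - sse) / (sse / (n - 2))
    return slope, f_stat


# Synthetic data (an assumption for this sketch).
rng = random.Random(1)
x = [i / 10 for i in range(100)]
y_signal = [2 * a + rng.gauss(0, 1) for a in x]  # y depends on x
y_noise = [rng.gauss(0, 1) for _ in x]           # y ignores x

slope_signal, f_signal = simple_regression_f(x, y_signal)
slope_noise, f_noise = simple_regression_f(x, y_noise)
print(f"dependent y: slope={slope_signal:.2f}, F={f_signal:.1f}")
print(f"unrelated y: slope={slope_noise:.2f}, F={f_noise:.1f}")
```

To turn F into a p-value, compare it with an F distribution with (1, n - 2) degrees of freedom, for instance with `scipy.stats.f.sf(f_stat, 1, n - 2)` if SciPy is available. A large F means the line explains far more than the average alone.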

Though this is not a comprehensive explanation, I hope this helps!


#4

Hi Shuvayan,

Hypothesis testing cannot verify randomness; it can only reject or fail to reject a hypothesis. Also, if you use an F-test or a t-test, you make one fundamental assumption: the distribution of your data.
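As one illustration of a check that does not assume a particular distribution for the values, here is a hedged sketch of the Wald-Wolfowitz runs test around the median. The function name `runs_test_p_value` is hypothetical, and the p-value relies on a normal approximation that is only reasonable for fairly long sequences. The same caveat applies: a large p-value does not prove the data is random.

```python
import math
import random
import statistics


def runs_test_p_value(values):
    """Two-sided runs test on whether values fall above or below their median."""
    median = statistics.median(values)
    signs = [v > median for v in values if v != median]  # drop ties with the median
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need values on both sides of the median")
    # A run is a maximal stretch of consecutive values on the same side.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Normal approximation: two-sided tail probability of |z|.
    return math.erfc(abs(z) / math.sqrt(2))


# Synthetic sequences (assumptions for this sketch).
rng = random.Random(7)
noise = [rng.gauss(0, 1) for _ in range(200)]
trend = list(range(200))  # strongly ordered, so only two runs

p_random = runs_test_p_value(noise)
p_trend = runs_test_p_value(trend)
print(f"runs test p-value, noise: {p_random:.3f}")
print(f"runs test p-value, trend: {p_trend:.3g}")
```

This only examines the order of the values, so it can miss patterns that do not show up as unusually few or unusually many runs.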

Alain