Hi,

I’m pretty new to data science (especially predictive modeling). During exploration I often find new information about my data, but I have no idea how to use it to improve the accuracy of my model.

Scenario:

I have a model, and I’m looking to improve it. During EDA, I find (through visualizations) that a particular categorical factor (say x1) has four levels. When I plot the distribution of the target variable for each level of the factor using boxplots, I find that the median of the target variable is noticeably higher in one of the levels than in the others.
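To make the scenario concrete, here is a minimal sketch of the kind of summary I am looking at. It is not my real data: the level names ("A" to "D"), the sample sizes, and the shift applied to level "D" are all made up for illustration, and `median_by_level` is just a hypothetical helper that computes the numbers a boxplot would show as the median lines.

```python
import random
import statistics
from collections import defaultdict

random.seed(0)  # fixed seed so the synthetic example is reproducible

# Synthetic stand-in for my data (an assumption, not the real dataset):
# x1 has four made-up levels, and level "D" is deliberately shifted upward.
levels = ["A", "B", "C", "D"]
shift = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 2.0}
rows = [
    (level, random.gauss(10.0 + shift[level], 1.0))
    for level in levels
    for _ in range(50)
]


def median_by_level(rows):
    """Group target values by factor level and return the median per level."""
    groups = defaultdict(list)
    for level, y in rows:
        groups[level].append(y)
    return {level: statistics.median(ys) for level, ys in groups.items()}


medians = median_by_level(rows)
for level in sorted(medians):
    print(level, round(medians[level], 2))
```

In this toy version one level stands out because I built it that way; in my real data I only see the difference in the plot and don't know whether it is more than noise.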

Questions:

How can I test for statistical significance of this phenomenon?

How do I incorporate this newfound information into my model?

This is just one scenario, in which I’m trying to predict the value of a continuous target variable. However, I have run into the same issue repeatedly in the past.

Thanks in advance for any answers!