I’m pretty new to data science (especially predictive modeling). I keep finding new information in my data, but I have no idea how to use it to improve the accuracy of my model.
I have a model that I’m trying to improve. During EDA, I found (through visualizations) that a particular factor, say x1, has four levels. When I plot the target variable against the levels of x1 using boxplots, the median of the target is higher in one level than in the others.
How can I test whether this difference is statistically significant?
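One common way to check this (a sketch, not the only option) is a Kruskal-Wallis test, which asks, roughly, whether the target tends to be larger in some levels of x1 than in others. Because the "interesting" level was picked after looking at the plot, it is safer to test all four levels together than to test only that one level against the rest. The code below computes the H statistic by hand and gets a permutation p-value by shuffling the level labels; `average_ranks`, `kruskal_h` and `permutation_pvalue` are hypothetical helper names, and the data is simulated (level "d" shifted upward). With SciPy installed, `scipy.stats.kruskal(*groups)` gives the classic chi-square-approximation p-value instead.

```python
import random
from collections import defaultdict


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_h(ranks, labels):
    """Kruskal-Wallis H statistic (tie correction omitted; it is constant
    across permutations, so it does not change the permutation p-value)."""
    n = len(ranks)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(ranks, labels):
        sums[g] += r
        counts[g] += 1
    s = sum(sums[g] ** 2 / counts[g] for g in sums)
    return 12.0 / (n * (n + 1)) * s - 3 * (n + 1)


def permutation_pvalue(y, labels, n_perm=2000, seed=0):
    """Return (H, p): p is the share of label shufflings with H >= observed."""
    rng = random.Random(seed)
    ranks = average_ranks(y)  # ranks stay fixed; only the labels are shuffled
    observed = kruskal_h(ranks, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kruskal_h(ranks, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated example: 30 rows per level, level "d" shifted up by 3.
rng = random.Random(42)
labels = [lvl for lvl in "abcd" for _ in range(30)]
y = [rng.gauss(3.0 if lvl == "d" else 0.0, 1.0) for lvl in labels]
h, p = permutation_pvalue(y, labels)
print(f"H = {h:.1f}, permutation p = {p:.4f}")
```

A small p only says the levels differ somewhere; a follow-up pairwise comparison (with a multiple-testing correction) would be needed to say which level stands out.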
How do I incorporate this newfound information into my model?
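A standard approach (sketched here under assumptions, not as the definitive method) is to give the model x1 as indicator (dummy) columns, one per level except a reference level, and then check whether it actually helps by comparing cross-validated error with and without those columns. The example below is dependency-free: `one_hot`, `solve`, `fit_ols` and `cv_mse` are hypothetical helpers, the model is plain least squares, and the data is simulated with one continuous feature x2 and a shift of 3 for level "d". In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` together with `cross_val_score` do the same job.

```python
import random


def one_hot(labels, levels):
    """0/1 columns for every level except levels[0], the reference level."""
    return [[1.0 if lab == lvl else 0.0 for lvl in levels[1:]] for lab in labels]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def cv_mse(X, y, k=5, seed=0):
    """k-fold cross-validated mean squared error of the least-squares fit."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for test in (idx[i::k] for i in range(k)):
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            sq_err += (y[i] - pred) ** 2
    return sq_err / len(y)


# Simulated data: 50 rows per level of x1; level "d" adds 3 to the target.
rng = random.Random(1)
levels = ["a", "b", "c", "d"]
x1 = [levels[i % 4] for i in range(200)]
x2 = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 * a + (3.0 if lvl == "d" else 0.0) + rng.gauss(0, 1) for a, lvl in zip(x2, x1)]

X_base = [[1.0, a] for a in x2]  # intercept + x2 only
X_full = [[1.0, a] + d for a, d in zip(x2, one_hot(x1, levels))]  # + x1 dummies
mse_base, mse_full = cv_mse(X_base, y), cv_mse(X_full, y)
print(f"CV MSE without x1: {mse_base:.2f}, with x1: {mse_full:.2f}")
```

If only one level really differs, a single indicator such as `x1 == "d"` is a simpler alternative to the full set of dummies; the same cross-validation comparison can decide between them. Tree-based models can also pick up the factor from these encodings without manual interaction terms.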
This question describes one scenario, in which I’m trying to predict a continuous target variable, but I have been plagued by the same issue in other projects in the past.
Thanks in advance for any answers!