Hi, this is Sridevi. I have just started learning and want to begin a career in this field. I have a dataset called “wineQualityReds” that contains outliers, so I tried replacing them with the mean, mode, median, and sd. Every time, I still got outliers (in some cases more than in the original data). How should I deal with these, and how do I reach a conclusion?
Can you give more details about your problem? Which variable has the outlier, and what did you replace it with: the mean, median, mode, or sd?
You can also check out this discussion thread on outlier treatment:
Hi Aishwarya, I learned that all the variables should be tested for outliers. Is that correct? So I started with the “fixed.acidity” column and tried replacing its outliers with the mean, mode, sd, and median, one by one. Every time, I got more outliers.
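A common way to flag outliers, and the rule that R's boxplot() draws, is Tukey's fences: anything below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR counts as an outlier. Below is a minimal sketch of that check for one column. It is written in Python only as a language-neutral illustration. The sample values are hypothetical, not taken from wineQualityReds. Quartiles use the `inclusive` method, which matches R's default quantile() (type 7). R's boxplot.stats() uses fivenum() hinges instead, so borderline points can differ slightly.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Values lying outside the Tukey fences, in their original order."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical values standing in for one column such as fixed.acidity
sample = [10, 12, 13, 14, 15, 16, 18, 20, 100, 200]
print(tukey_fences(sample))   # (3.875, 28.875)
print(find_outliers(sample))  # [100, 200]
```

Running the same check on each numeric column is a reasonable first pass. Whether a flagged value should be treated at all depends on whether it is a recording error or a genuine extreme.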
Can we remove outliers from all variables in a dataset at once? If so, is there any data loss?
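Removing outliers from all variables at once usually means dropping every row that is flagged in at least one column, so the loss adds up across columns. The sketch below makes the same assumptions as before: Python for illustration, hypothetical two-column data, and Tukey fences with k = 1.5. `cap_at_fences` is a hypothetical helper that shows capping (winsorizing) as an alternative that keeps every row.

```python
import statistics

def tukey_fences(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outlier_rows(columns, k=1.5):
    """Indices of rows that are an outlier in at least one column."""
    flagged = set()
    for values in columns.values():
        lower, upper = tukey_fences(values, k)
        flagged.update(i for i, v in enumerate(values) if v < lower or v > upper)
    return flagged

def cap_at_fences(values, k=1.5):
    """Clip each value into [lower, upper] instead of dropping the row."""
    lower, upper = tukey_fences(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical two-column data (not from wineQualityReds)
data = {
    "x": [10, 12, 13, 14, 15, 16, 18, 20, 100, 200],
    "y": [5, 60, 6, 7, 6, 5, 7, 6, 5, 6],
}
n_rows = len(data["x"])
drop = outlier_rows(data)
kept = {name: [v for i, v in enumerate(vals) if i not in drop]
        for name, vals in data.items()}
print(sorted(drop))                              # [1, 8, 9]
print(f"kept {n_rows - len(drop)} of {n_rows} rows")  # kept 7 of 10 rows
print(cap_at_fences(data["x"])[-2:])             # [28.875, 28.875]
```

Here x flags two rows and y flags one, so 3 of 10 rows (30%) are dropped. With many columns, the dropped share can grow quickly, and that is the data loss in question. Capping avoids losing rows, at the cost of altering the extreme values.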
Once you replace the outliers with the median of the data, the original outliers are gone, and you don't need to keep repeating the treatment. So for the given data:
100 and 200 are the outliers. Once you replace them, you don't have to worry about 20 being flagged as an outlier.
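This also explains why rechecking after treatment keeps finding "new" outliers. The sketch below uses Python again, with hypothetical values chosen to mirror the 100/200/20 example, since the original data is not shown. The mean is pulled up by the outliers, so the mean-filled values are flagged themselves. Median filling shrinks Q3 (19.5 → 15.875) and tightens the upper fence (28.875 → 19.8125), so 20 is flagged on the recheck.

```python
import statistics

def tukey_fences(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

def replace_outliers(values, fill, k=1.5):
    """Replace values outside the fences (computed once, on the input) with fill."""
    lower, upper = tukey_fences(values, k)
    return [fill if v < lower or v > upper else v for v in values]

sample = [10, 12, 13, 14, 15, 16, 18, 20, 100, 200]  # hypothetical values

by_mean = replace_outliers(sample, statistics.mean(sample))      # fill = 41.8
by_median = replace_outliers(sample, statistics.median(sample))  # fill = 15.5

# Re-running detection on the treated data recomputes the fences
print(find_outliers(by_mean))    # [41.8, 41.8]
print(find_outliers(by_median))  # [20]
```

One common approach is to compute the fences once on the original data, treat the values outside them once, and not re-run the detection on the treated column.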
But in my dataset I am getting more outliers, even though I replaced them with the median. I am also getting errors when using basic.stats():
Error: Can’t use matrix or array for column indexing
rlang::last_error() to see a backtrace
Here, wine1 is my dataset.