Handling Outliers

outliers

#1

Hi, this is Sridevi. I have just started learning and want to start a career. I have a dataset called “wineQualityReds”. I found outliers, so I tried replacing them with the mean, mode, median and sd. Every time, I still got outliers (in some cases more than in the original data). So, how should I deal with these, and how do I reach a conclusion?


#2

Hi @sridevi.tadisetti,

Can you give more details about your problem? Which variable do you have an outlier in, and what did you replace it with: the mean, median, mode or sd?

You can also check out this discussion thread on outlier treatment:


#3

@AishwaryaSingh,
Hi Aishwarya, I came to know that all the variables should be tested for outliers. Is that correct? So, I started with the “fixed.acidity” column and tried replacing the outliers with the mean, mode, sd and median, one by one. Every time, I got more outliers.


#4

Can we remove outliers from all the variables in a dataset at once? If so, is there any data loss?


#5

Hi @sridevi.tadisetti,

Once you replace the outliers with the median of the data, you cannot have more outliers. So for the given data:

2,4,6,8,10,12,14,20,100,200

100 and 200 are the outliers. Once you replace these, you don’t have to worry about 20 being an outlier.
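To make this concrete, here is a minimal sketch in R using the ten values above. It assumes outliers are defined the way boxplot.stats() defines them (points more than 1.5 times the IQR beyond the hinges). The helper name replace_outliers_with_median and the small data frame df are made up for illustration and are not part of the wine data. Note that boxplot.stats() takes a numeric vector, so with a data frame it is applied one column at a time.

```r
# Hypothetical helper (name is an assumption, not from any package):
# replace the values flagged by boxplot.stats() with the median of x.
replace_outliers_with_median <- function(x) {
  out <- boxplot.stats(x)$out            # values beyond 1.5 * IQR from the hinges
  x[x %in% out] <- median(x, na.rm = TRUE)
  x
}

# The example values from the post above
x <- c(2, 4, 6, 8, 10, 12, 14, 20, 100, 200)

out_before <- boxplot.stats(x)$out       # 100 and 200 are flagged
x_replaced <- replace_outliers_with_median(x)

# Re-check: the fences are recomputed from the modified data
out_after <- boxplot.stats(x_replaced)$out   # empty for this example

print(out_before)
print(x_replaced)
print(out_after)

# Made-up data frame: apply the helper to each numeric column separately
df <- data.frame(a = x, b = rev(x))
df[] <- lapply(df, replace_outliers_with_median)

# Count remaining flagged values per column
remaining <- sapply(df, function(col) length(boxplot.stats(col)$out))
print(remaining)
```

In this example 100 and 200 become 11 (the median), and 20 is not flagged on the second check. Because the fences are recomputed from the modified data, other datasets may behave differently from this small example, so it is worth re-checking after each replacement.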


#6

Hi @AishwaryaSingh,

But in my dataset, I am getting more outliers, even though I replaced them with the median. I am also getting an error while using boxplot.stats():
1<-boxplot.stats(wine1)
Error: Can’t use matrix or array for column indexing
Call rlang::last_error() to see a backtrace

Here, wine1 is my dataset.