Ways to find outliers in SAS

outliers
sas
analytics

#1

Hi,

I am working on a business analytics problem using SAS, and during the data exploration stage I want to detect outliers. Can you help me identify or highlight them? What are the ways to do this in SAS, and which one is more efficient than the others?

Regards,
Imran


#2

Hello @Imran,

There is a very nice document
http://www.nesug.org/Proceedings/nesug10/ad/ad07.pdf
I think this will give you a great understanding about outlier detection & treatment.
Thanks!


#3

@Imran

You can also refer to the link below for outlier detection and treatment.

Thx,
Steve


#4

Try this code to get a box plot:

proc univariate data=dataset_name plot;
    var var_name;
run;

The box plot it produces is drawn from the quartiles.

Ideally, values that lie more than 3 * the interquartile range (IQR) below the first quartile or above the third quartile are outliers.

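You can then rerun it with a WHERE statement to try different cutoffs:

proc univariate data=dataset_name plot;
    var var_name;
    /* lower_cutoff and upper_cutoff are placeholders for your own numeric limits */
    where var_name >= lower_cutoff and var_name <= upper_cutoff;
run;

If the resulting box plot is clean, you can generally assume the outliers have been removed.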
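
The IQR rule above can also be applied in code rather than read off the plot. The following is only a sketch under assumptions: dataset_name and var_name are the same placeholders as above, while quartiles, flagged, is_outlier and the macro variable k are names made up for this example. It also assumes your data set does not already contain variables called q1, q3 or iqr.

```sas
/* Multiplier for the IQR fences: 3 as suggested above;
   1.5 is the more common, stricter convention. */
%let k = 3;

/* Step 1: compute the quartiles and the interquartile range. */
proc means data=dataset_name noprint;
    var var_name;
    output out=quartiles q1=q1 q3=q3 qrange=iqr;
run;

/* Step 2: attach the quartiles to every row and flag values
   that fall outside [Q1 - k*IQR, Q3 + k*IQR]. */
data flagged;
    if _n_ = 1 then set quartiles(keep=q1 q3 iqr);
    set dataset_name;
    is_outlier = (not missing(var_name)) and
                 (var_name < q1 - &k * iqr or var_name > q3 + &k * iqr);
run;

/* Step 3: list the flagged rows for review. */
proc print data=flagged;
    where is_outlier = 1;
run;
```

Flagging instead of deleting keeps the original rows available, so you can decide later whether to drop, cap or keep them.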


#5

The logic is something you need to create per variable, but SAS has enough procs to get percentiles, box plots and other descriptive statistics and graphs for you to decide how you want to treat the outliers.

You can build a macro for the data set you are using to automate this, applying domain knowledge to the data and fine-tuning the logic you want for treating the outliers.

Example: say income. There is a chance that there are 20 senior people in your company whose salaries are so high that they appear as a cluster and as outliers. Whether to treat (cap) them or not is driven by what you are trying to achieve, and you will have to use business knowledge and the purpose of the report to decide.

Easiest approach: get the percentiles from 1 to 15, from 85 to 99, and from 99 to 99.99, and look for the largest percentage drop in value between neighbouring percentiles to see where the realistic outliers begin.
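
A rough sketch of this percentile check for the upper tail, assuming the placeholders dataset_name and var_name from earlier in the thread. The percentile points listed and the names upper_pctls, pctl_long, pctl_steps and pct_change are illustrative choices, not anything prescribed here. The lower tail (1 to 15) can be examined the same way with a different PCTLPTS= list.

```sas
/* Step 1: request a set of upper-tail percentiles.
   Output variables are named p_85, p_90, ..., p_99_95, p_99_99. */
proc univariate data=dataset_name noprint;
    var var_name;
    output out=upper_pctls
           pctlpts = 85 90 95 99 99.5 99.9 99.95 99.97 99.99
           pctlpre = p_;
run;

/* Step 2: turn the single wide row into one row per percentile. */
proc transpose data=upper_pctls
               out=pctl_long(rename=(col1=value))
               name=percentile;
run;

/* Step 3: relative change from each percentile to the next higher one.
   A large pct_change marks the point where values jump sharply. */
data pctl_steps;
    set pctl_long;
    prev_value = lag(value);
    if prev_value not in (., 0) then
        pct_change = (value - prev_value) / abs(prev_value);
run;

proc print data=pctl_steps;
run;
```

The sketch measures the jump going up the percentiles, which is the same gap as the drop described above seen from the other direction.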

Say the value drops by 30% going from the 99.99th percentile down to the 99.95th. I would then cap at the 99.95th percentile, which also means I am not losing too many values.

If you want a blind rule, cap at the 99.97th percentile.
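
One way the capping could be wrapped in a macro is sketched below. The macro name cap_upper, its parameters, the work data set _cap and the variable cap_limit are all hypothetical names invented for this example; dataset_name and var_name are the placeholders used earlier in the thread. It assumes the input data set has no variable called cap_limit.

```sas
%macro cap_upper(data=, var=, pct=99.97, out=capped);

    /* Compute the chosen percentile; PCTLPRE= and PCTLNAME= together
       name the output variable cap_limit. */
    proc univariate data=&data noprint;
        var &var;
        output out=_cap pctlpts=&pct pctlpre=cap_ pctlname=limit;
    run;

    /* Replace every value above the percentile with the percentile itself.
       Missing values are left as missing. */
    data &out;
        if _n_ = 1 then set _cap;
        set &data;
        if &var > cap_limit then &var = cap_limit;
        drop cap_limit;
    run;

%mend cap_upper;

/* Example call: cap var_name at the 99.97th percentile. */
%cap_upper(data=dataset_name, var=var_name);
```

Passing pct=99.95 instead applies the data-driven cutoff from the example above, and writing to a separate output data set leaves the original values untouched for comparison.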