How to use diagnostic statistics in R for outliers and influential observations?

r
regression

#1

Hello,

While learning about regression, I came across diagnostic statistics such as STUDENT, COOKD, RSTUDENT, and DFFITS, which are used to detect outliers and identify influential observations. How can I compute and use these in R when fitting a regression?

Thanks.


#2

see http://www.statmethods.net/stats/rdiagnostics.html
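
The snippets in this answer use an object called fit without defining it. A minimal setup might look like the following, assuming the built-in mtcars data (which the nrow(mtcars) call in this answer suggests) and an arbitrary set of predictors. The formula is only an illustration, not something stated in the linked page or in this thread.

```r
# Assumed setup: the model formula is illustrative only.
# install.packages("car")   # run once if the car package is not installed
library(car)                # provides outlierTest(), qqPlot(), influencePlot(), ...

# Multiple linear regression on the built-in mtcars data
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
summary(fit)
```

With fit defined this way, the calls in the rest of this answer can be run as they are.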

Outliers

Assessing Outliers

outlierTest(fit) # Bonferroni p-value for most extreme obs
qqPlot(fit, main="QQ Plot") # QQ plot for studentized resid
leveragePlots(fit) # leverage plots

Influential Observations

added variable plots

avPlots(fit) # called av.Plots() in old versions of car

Cook’s D plot

identify D values > 4/(n-k-1)

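cutoff <- 4/(nrow(mtcars) - length(fit$coefficients)) # length(fit$coefficients) is k+1, so this is 4/(n-k-1)
plot(fit, which=4, cook.levels=cutoff)
abline(h=cutoff, lty=2) # which=4 does not draw cook.levels, so mark the cutoff by hand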

Influence Plot

influencePlot(fit, id=list(method="identify"), main="Influence Plot", sub="Circle size is proportional to Cook's Distance") # car >= 3.0; older versions used id.method="identify"

Cook’s distance (or Cook’s D): A measure that combines an observation's leverage and its residual, showing how much the fitted values would change if that observation were dropped.
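
The names in the question (STUDENT, COOKD, RSTUDENT, DFFITS) are SAS keywords. Base R's stats package has counterparts that work on any lm object without extra packages. As a rough sketch, assuming an illustrative mtcars model (the formula and the rule-of-thumb cutoffs are my assumptions, not part of the linked page):

```r
# Illustrative model; replace with your own lm() fit.
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

n <- nrow(mtcars)        # number of observations
p <- length(coef(fit))   # number of coefficients, intercept included

diag_stats <- data.frame(
  student  = rstandard(fit),       # internally studentized residual (SAS STUDENT)
  rstudent = rstudent(fit),        # externally studentized residual (SAS RSTUDENT)
  cookd    = cooks.distance(fit),  # Cook's distance (SAS COOKD)
  dffits   = dffits(fit),          # DFFITS
  leverage = hatvalues(fit)        # diagonal of the hat matrix
)

# Common rules of thumb (conventions, not hard limits)
flagged <- with(diag_stats,
  abs(rstudent) > 2 |
  cookd > 4 / (n - p) |
  abs(dffits) > 2 * sqrt(p / n)
)

# Observations worth a closer look
diag_stats[flagged, ]
```

influence.measures(fit) prints most of these statistics in a single table and marks the cases it considers influential, which can be a quicker first look than building the data frame by hand.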