How to generate insights using Box plots?




I want to compare sepal length across the different classes (setosa, versicolor and virginica) of the Iris data set. I have used a box plot, drawn with the plot function, to compare the lengths.


Looking at the above plot, I can easily say that the sepal length of class "virginica" is higher than that of the other two. Can I say statistically that the "virginica" length is higher than the other two?




A box plot does not provide any statistical inference for comparing groups; it only helps you examine the distributions.
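If you do want a formal comparison, one option (among others, such as ANOVA or a Kruskal-Wallis test) is a permutation test on the difference in medians. The following is only a minimal sketch using the Python standard library. The function name `permutation_p_value` and the two sample lists are made up for illustration and are not the real Iris measurements.

```python
import random
import statistics

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """One-sided permutation test: is median(b) greater than median(a)?

    Hypothetical helper for illustration, not part of any library.
    """
    rng = random.Random(seed)
    observed = statistics.median(b) - statistics.median(a)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = statistics.median(pooled[len(a):]) - statistics.median(pooled[:len(a)])
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)

# Made-up example values, NOT the real Iris data.
group_a = [4.9, 5.0, 5.1, 5.4, 5.6, 5.8]
group_b = [6.0, 6.3, 6.5, 6.7, 7.1, 7.4]
print("estimated p-value:", permutation_p_value(group_a, group_b))
```

A small p-value suggests the difference in medians is unlikely under random relabelling of the groups. Whether a permutation test is the right choice depends on your data and question.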

A box plot shows the minimum, first quartile, median, third quartile and maximum of the data. You can draw the following insights from the plot:

  • Compare the medians to compare the central values.
  • Compare the interquartile ranges (that is, the box lengths) to compare dispersion.
  • Look at the range (max - min) and compare it across the plots.
  • Look at the position of the median within the box to check skewness (left-skewed, symmetric, right-skewed). If the data do not appear symmetric, does each group show the same kind of asymmetry?
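The quantities in the list above can also be computed directly, which makes the comparison less dependent on reading the plot by eye. This is a rough sketch in Python using only the standard library. The helper `five_number_summary` and the sample values are assumptions for illustration (not the real Iris data), and quartile conventions differ between tools, so the numbers may not match a given plotting function exactly.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max). Hypothetical helper for illustration."""
    # method="inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different Q1 and Q3.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Made-up example values, NOT the real Iris data.
groups = {
    "group_a": [4.6, 4.8, 5.0, 5.1, 5.4],
    "group_b": [5.5, 5.7, 5.9, 6.1, 6.6],
}

for name, values in groups.items():
    lo, q1, med, q3, hi = five_number_summary(values)
    print(name)
    print("  median:", med)
    print("  IQR (box length):", round(q3 - q1, 2))
    print("  range (max - min):", round(hi - lo, 2))
    # If the median sits closer to Q1 than to Q3, the box suggests right skew,
    # and the reverse suggests left skew.
    print("  median - Q1:", round(med - q1, 2), "| Q3 - median:", round(q3 - med, 2))
```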



You cannot conclude from the medians alone that one group is greater than the other, because the actual distributions may differ. You can see that in the following diagram: