Regarding the use of bigger exponents, the answer is simple:

This is what happens when an outlier (actually an influential point) is present in the data and SSE is used to compute the coefficients. Just imagine what would happen if we increased the emphasis on outliers even further: the regression line could end up nearly orthogonal to the real trend, in other words, as wrong as possible. So using bigger exponents is a big no-no.
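To see the effect, here's a minimal sketch (the data and the slope-only model are made up for illustration): we fit y = m·x by minimizing the sum of |residual|^p for different exponents p, on a clean linear trend with one influential point added.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: a clean trend y = 2x, plus one influential point.
x = np.arange(10.0)
y = 2.0 * x
y[9] = -40.0  # the outlier (true value would be 18)

def fit_slope(p):
    """Fit y = m*x by minimizing sum(|y - m*x|**p); intercept omitted for simplicity."""
    loss = lambda m: np.sum(np.abs(y - m * x) ** p)
    return minimize(loss, x0=[1.0], method="Nelder-Mead").x[0]

for p in (1, 2, 4):
    print(f"p={p}: fitted slope = {fit_slope(p):.2f}")
```

With p=1 the fitted slope stays near the true value of 2; with p=2 the outlier already drags it far down; with p=4 the slope goes negative, i.e. the line points against the real trend.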

About absolute errors: they are actually used, just not as much as squared errors. The key difference is that absolute error gives the same weight to every point in the data, while squared error puts more emphasis on points that are far from the regression line. Which one you should use depends on the kind of data and what you want to achieve with it.
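The weighting difference shows up clearly in the simplest possible model, a single constant: minimizing squared error yields the mean (pulled toward distant points), while minimizing absolute error yields the median (each point counts equally). A small sketch with made-up data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one distant point

sse = lambda c: np.sum((data - c) ** 2)   # sum of squared errors
sae = lambda c: np.sum(np.abs(data - c))  # sum of absolute errors

c2 = minimize_scalar(sse).x  # minimizer of SSE: the mean (22.0)
c1 = minimize_scalar(sae).x  # minimizer of SAE: the median (3.0)
print(c2, c1)
```

The single distant point moves the squared-error solution from 3 to 22, while the absolute-error solution ignores how far away it is.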

The only advantage of squared error over absolute error that I know of is that squared error is differentiable everywhere, while absolute error is not differentiable at 0 (the error itself is continuous, but its derivative jumps from -1 to +1 there). So optimization techniques for squared error are less complex (least squares even has a closed-form solution) than the ones for absolute error, which translates to being faster.
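A sketch of that contrast, on synthetic data: squared error can be solved in one shot via the normal equations (here `np.linalg.lstsq`), while for absolute error a common approach is iterative subgradient descent, using `np.sign` as a subgradient since the derivative is undefined at zero residuals. The data, step size, and iteration count are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, 50)  # true intercept 1, slope 3
X = np.column_stack([np.ones_like(x), x])

# Squared error: smooth everywhere, closed-form least-squares solution.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Absolute error: no closed form; iterate with a subgradient
# (np.sign returns 0 exactly at zero residuals).
beta = np.zeros(2)
lr = 0.01
for _ in range(5000):
    resid = y - X @ beta
    beta += lr * (X.T @ np.sign(resid)) / len(y)

print("least squares:", beta_ls)
print("least absolute deviations:", beta)
```

Both land near the true coefficients on clean data; the point is that the absolute-error fit needed thousands of iterations where the squared-error fit needed one linear solve.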