Why is the error term considered to have mean zero in linear regression?



I was studying the mathematics of linear regression and came across the following.

If we denote the variable we are trying to predict as $Y$ and our covariates as $X$, we may assume a relationship between them of the form $Y = f(X) + \epsilon$, where the error term $\epsilon$ is normally distributed with mean zero: $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$.
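For concreteness, here is a minimal simulation of this setup. It assumes a linear $f(X) = \beta_0 + \beta_1 X$ with hypothetical values $\beta_0 = 1$, $\beta_1 = 2$ and $\sigma_\epsilon = 0.5$ (none of these come from the text I was reading). It draws data, fits ordinary least squares with an intercept, and compares the mean of the true errors with the mean of the fitted residuals.

```python
import numpy as np

def simulate(n, beta0=1.0, beta1=2.0, sigma=0.5, seed=0):
    """Draw n samples from Y = f(X) + eps, assuming a hypothetical linear
    f(X) = beta0 + beta1 * X and eps ~ N(0, sigma^2).

    Note: `sigma` is the standard deviation (numpy's `scale`), so the
    variance of eps is sigma**2.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    eps = rng.normal(loc=0.0, scale=sigma, size=n)  # zero-mean Gaussian errors
    y = beta0 + beta1 * x + eps
    return x, y, eps

def fit_ols(x, y):
    """Ordinary least squares with an intercept column; returns [intercept, slope]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

x, y, eps = simulate(100_000)
b0, b1 = fit_ols(x, y)
residuals = y - (b0 + b1 * x)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"mean of true errors={eps.mean():.4f}, mean of residuals={residuals.mean():.2e}")
```

With an intercept in the model, the OLS residuals average to zero by construction (the residual vector is orthogonal to the column of ones), whatever the true errors look like; the mean of the true errors is only close to zero because it was simulated that way.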

Why does the error term have mean zero?

Is it a mathematical assumption?

And what if the error is not normally distributed, or has a nonzero mean?
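One way to probe this, assuming again a linear $f$ with an intercept: if $E[\epsilon] = \mu \neq 0$, the model can be rewritten as $Y = (\beta_0 + \mu) + \beta_1 X + (\epsilon - \mu)$, where $\epsilon - \mu$ has mean zero, so a constant error mean cannot be told apart from the intercept. The sketch below checks this with hypothetical exponential errors (non-normal, mean $\mu = 3$) drawn independently of $X$; all parameter values are illustrative assumptions.

```python
import numpy as np

def fit_with_shifted_noise(n=100_000, beta0=1.0, beta1=2.0, mu=3.0, seed=1):
    """Fit OLS to Y = beta0 + beta1*X + eps, where eps is exponential with mean mu.

    All parameter values are hypothetical; eps is drawn independently of X.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    eps = rng.exponential(scale=mu, size=n)  # non-normal errors, E[eps] = mu
    y = beta0 + beta1 * x + eps
    design = np.column_stack([np.ones_like(x), x])  # intercept column + X
    (b0, b1), *_ = np.linalg.lstsq(design, y, rcond=None)
    return b0, b1

b0, b1 = fit_with_shifted_noise()
# Expect b0 near beta0 + mu = 4.0 and b1 near beta1 = 2.0
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

In this sketch the slope estimate stays close to $\beta_1$ despite the skewed, nonzero-mean errors; that relies on the errors being independent of $X$, not on their being normal.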

Any help would be greatly appreciated.