I was exploring deep learning and found it pretty intriguing. However, I really need to know how the activation function is selected when building a model. There must be some criteria for the selection.
There are many activation functions to choose from, each with its own strengths and weaknesses. Most of the time, though, Rectified Linear Units (ReLU) are preferred, because they perform well in almost all cases.
As far as selection criteria are concerned, it depends on the following points:
- The position of the neuron (i.e. the layer it sits in): e.g. for a neuron in a hidden layer of a deep network, ReLU is preferred over softmax.
- The problem you are trying to solve: for a classification problem you would apply the softmax function at the output layer, whereas for a regression problem you would apply a linear function.
- The depth of the network: in a shallow network you might apply tanh in the hidden layers; otherwise (in a deep network), apply ReLU.
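To make these criteria concrete, here is a minimal NumPy sketch of the four activations mentioned above, with comments noting where each one is typically used. This is an illustrative implementation, not the only way to define them:

```python
import numpy as np

def relu(x):
    # ReLU: the usual default for hidden layers in deep networks
    return np.maximum(0.0, x)

def tanh(x):
    # tanh: sometimes preferred in the hidden layers of shallow networks
    return np.tanh(x)

def softmax(x):
    # softmax: output layer for classification; turns scores into a
    # probability distribution (non-negative, sums to 1)
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def linear(x):
    # linear (identity): output layer for regression
    return x

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))     # negative inputs are clipped to 0
print(softmax(z))  # entries sum to 1
```

Note that the choice only affects how a layer transforms its pre-activation values; the layer's weights and the rest of the architecture stay the same.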
There’s still a lot of research going on in this domain, so there’s no definitive “guide book” and no universal “default” activation function. It largely boils down to experience in building neural networks and your personal preferences. After all, building a deep neural network is an art, right?
PS: To get an in-depth intro to some of the popular activation functions, here’s a resource.