Output layer and criterion options (all three are equivalent; the first is the most popular; see the sketch after the list):
- Linear + LogSoftMax + ClassNLLCriterion
- Linear + SoftMax + Log + ClassNLLCriterion
- Linear + CrossEntropyCriterion
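As a quick check of the equivalence, here is a minimal PyTorch sketch of the three options (the list above uses the Lua Torch names; the PyTorch counterparts are LogSoftmax/NLLLoss and CrossEntropyLoss). The shapes and random inputs are illustrative only:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw scores from a Linear layer
targets = torch.tensor([0, 2, 1, 0])

# Option 1: LogSoftmax + NLLLoss (ClassNLLCriterion in Lua Torch)
loss1 = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Option 2: Softmax + Log + NLLLoss (less numerically stable)
loss2 = F.nll_loss(torch.log(F.softmax(logits, dim=1)), targets)

# Option 3: CrossEntropyLoss (softmax is applied internally)
loss3 = F.cross_entropy(logits, targets)

print(loss1, loss2, loss3)  # all three agree
```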
It should be noted that CrossEntropyLoss applies the softmax internally, so it expects raw logits as input rather than probabilities.
A softmax output layer with a log-likelihood cost also tends to learn faster than one trained with MSELoss, because the log-likelihood gradient does not vanish when the softmax output saturates.
The log-likelihood loss is
$$C = -\sum_k y_k \ln a_k,$$
where $a_k$ is the output of neuron $k$ and $y_k$ is the ground truth.
The cross-entropy loss is
$$C = -\sum_k \left[ y_k \ln a_k + (1 - y_k) \ln (1 - a_k) \right].$$
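To make the two formulas concrete, here is a small hand computation; the outputs $a$ and the one-hot target $y$ are made-up illustrative numbers:

```python
import torch

a = torch.tensor([0.2, 0.7, 0.1])   # illustrative outputs, already probabilities
y = torch.tensor([0.0, 1.0, 0.0])   # one-hot ground truth

log_likelihood = -(y * a.log()).sum()                           # -sum_k y_k ln a_k
cross_entropy = -(y * a.log() + (1 - y) * (1 - a).log()).sum()

print(log_likelihood)  # tensor(0.3567), i.e. -ln(0.7)
print(cross_entropy)   # tensor(0.6852), also penalizes mass on the wrong classes
```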
And what is LogSoftmax? It is just the logarithm of the softmax:
$$\operatorname{LogSoftmax}(x_i) = \ln\!\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)$$
What's more, it is implemented directly in torch.nn.functional as log_softmax, which computes the same quantity in a numerically stable way.
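A minimal sketch of that equivalence, and of why the fused version is preferred; the extreme logits are there only to expose the instability:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0]])
print(torch.log(F.softmax(x, dim=1)))  # log(softmax(x)) ...
print(F.log_softmax(x, dim=1))         # ... equals log_softmax(x)

# The fused version stays finite on extreme logits:
big = torch.tensor([[1000.0, 0.0]])
print(torch.log(F.softmax(big, dim=1)))  # tensor([[0., -inf]])
print(F.log_softmax(big, dim=1))         # tensor([[0., -1000.]])
```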
The NLLLoss is
$$\ell_n = -w_{y_n}\, x_{n, y_n},$$
where $x$ holds log-probabilities (e.g. the output of log_softmax) and the class weight $w_n$ defaults to 1.
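In other words, NLLLoss just gathers the negative log-probability at each target index. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 3), dim=1)  # batch of 4, 3 classes
targets = torch.tensor([0, 2, 1, 0])

# Pick out -x_{n, y_n} for each sample and average (w_n = 1 by default)
manual = -log_probs[torch.arange(4), targets].mean()

print(manual)
print(F.nll_loss(log_probs, targets))  # matches the manual computation
```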
The BCELoss is the cross-entropy loss specialized for binary classification, and its input must be passed through a sigmoid before BCELoss is applied. What's more, BCEWithLogitsLoss combines the sigmoid and the BCELoss in a single class, which is more numerically stable.
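A minimal sketch of that relationship; the random logits and targets are illustrative only:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5)                       # raw scores for 5 binary decisions
targets = torch.randint(0, 2, (5,)).float()   # binary ground truth

# BCELoss expects probabilities, so a sigmoid must come first
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), targets)

# BCEWithLogitsLoss fuses the sigmoid into the loss (more numerically stable)
loss_b = F.binary_cross_entropy_with_logits(logits, targets)

print(loss_a, loss_b)  # equal up to floating-point error
```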
References
- https://github.com/torch/nn/issues/357
- https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#log_softmax
- https://pytorch.org/docs/stable/nn.html?highlight=log_softmax#torch.nn.functional.log_softmax