Softmax and LogSoftmax in PyTorch

Output layer and criterion options (all three are mathematically equivalent; option 1 is the most popular):

  1. Linear + LogSoftMax + ClassNLLCriterion
  2. Linear + SoftMax + Log + ClassNLLCriterion
  3. Linear + CrossEntropyCriterion

It should be noted that CrossEntropyLoss already includes the softmax operation: in PyTorch it combines LogSoftmax and NLLLoss, so it expects raw logits rather than probabilities.
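
As a quick sanity check, the three options above can be written out in PyTorch (the modern names for the criteria are nn.LogSoftmax, nn.NLLLoss, and nn.CrossEntropyLoss); all three produce the same loss value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # output of a Linear layer: batch of 4, 10 classes
targets = torch.tensor([1, 5, 0, 9])  # ground-truth class indices

# Option 1: LogSoftmax + NLLLoss
loss1 = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Option 2: Softmax + log + NLLLoss (equivalent, but numerically less stable)
loss2 = nn.NLLLoss()(torch.log(nn.Softmax(dim=1)(logits)), targets)

# Option 3: CrossEntropyLoss applied directly to the raw logits
loss3 = nn.CrossEntropyLoss()(logits, targets)

print(loss1.item(), loss2.item(), loss3.item())  # all agree up to float error
```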

Softmax with the log-likelihood cost also learns faster than softmax with MSELoss: when the output saturates on a wrong answer, the MSE gradient is damped by the softmax derivative, while the log-likelihood gradient is not.
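
A minimal sketch of this effect (the numeric values are illustrative): for a confidently wrong prediction, the gradient of the log-likelihood cost with respect to the logits stays large, while the MSE gradient nearly vanishes:

```python
import torch
import torch.nn.functional as F

# A confidently wrong prediction: a large logit on the wrong class.
logits = torch.tensor([[8.0, -8.0]], requires_grad=True)
target = torch.tensor([1])

# Log-likelihood cost: the gradient w.r.t. the logits stays large,
# so the network keeps learning even when it is badly wrong.
F.cross_entropy(logits, target).backward()
print(logits.grad)  # roughly [[ 1., -1.]]

logits.grad = None

# MSE on the softmax output: the softmax Jacobian damps the gradient
# when the output saturates, causing a learning slowdown.
probs = F.softmax(logits, dim=1)
one_hot = F.one_hot(target, num_classes=2).float()
F.mse_loss(probs, one_hot).backward()
print(logits.grad)  # roughly [[ 2e-7, -2e-7]]
```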

The log-likelihood loss is

$$C = -\sum_k y_k \ln a_k$$

where $a_k$ is the output of a neuron and $y_k$ is the corresponding target.
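
For example, with one-hot truth $y = (1, 0, 0)$ and softmax outputs $a = (0.7, 0.2, 0.1)$, the loss reduces to $C = -\ln 0.7 \approx 0.357$.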

The cross-entropy loss is

$$C = -\frac{1}{n}\sum_x \sum_k \left[\, y_k \ln a_k + (1 - y_k)\ln(1 - a_k)\,\right]$$

And what is LogSoftmax? It is simply the logarithm of the softmax:

$$\text{LogSoftmax}(x_i) = \log\!\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$

What's more, it is implemented directly in torch.nn.functional as log_softmax. Doing the two operations separately as log(softmax(x)) is slower and numerically unstable, so the function uses an alternative formulation to compute the result correctly.
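
The stability difference is easy to see with extreme logits, where the naive composition underflows to $-\infty$ but F.log_softmax stays finite:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1000.0, -1000.0]])  # extreme logits

naive = torch.log(F.softmax(x, dim=1))  # softmax underflows to 0 -> log gives -inf
stable = F.log_softmax(x, dim=1)        # stable formulation stays finite

print(naive)   # tensor([[0., -inf]])
print(stable)  # tensor([[0., -2000.]])
```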

The NLLLoss is

$$l_n = -w_{y_n}\, x_{n,\,y_n}, \qquad \ell(x, y) = \frac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n}},$$

where $x$ contains log-probabilities (e.g. the output of LogSoftmax), $y_n$ is the target class index of the $n$-th sample, and the class weights $w$ default to 1.
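
Concretely, NLLLoss just gathers the log-probability of the target class for each sample, negates it, and (by default) averages over the batch; a minimal check:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)  # batch of 3, 5 classes
targets = torch.tensor([0, 2, 4])

# Pick out -log p(target class) per sample and average over the batch.
manual = -log_probs[torch.arange(3), targets].mean()
builtin = F.nll_loss(log_probs, targets)
print(torch.allclose(manual, builtin))  # True
```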

The BCELoss is the cross-entropy loss for binary classification. It expects probabilities, so a sigmoid must be applied to the inputs before using BCELoss. What's more, BCEWithLogitsLoss combines the sigmoid and BCELoss in a single class, which is more numerically stable than using them separately.
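
A small sketch of the correspondence:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)                     # raw scores for 4 binary examples
targets = torch.randint(0, 2, (4, 1)).float()  # labels in {0., 1.}

# BCELoss expects probabilities, so apply a sigmoid first
loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)

# BCEWithLogitsLoss fuses the sigmoid and BCELoss (more numerically stable)
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_a.item(), loss_b.item())  # same value
```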

References

https://github.com/torch/nn/issues/357

https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#log_softmax

https://pytorch.org/docs/stable/nn.html?highlight=log_softmax#torch.nn.functional.log_softmax