Output layer and criterion options (all three are equivalent; the first is the most popular; see the sketch after the list):
- Linear + LogSoftMax + ClassNLLCriterion
- Linear + SoftMax + Log + ClassNLLCriterion
- Linear + CrossEntropyCriterion
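As a quick check of the equivalence, here is a minimal PyTorch sketch of the three options (the list above uses the Lua Torch names; the PyTorch counterparts are LogSoftmax/NLLLoss and CrossEntropyLoss). The shapes and random inputs are illustrative only:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw scores from a Linear layer
targets = torch.tensor([0, 2, 1, 0])

# Option 1: LogSoftmax + NLLLoss (ClassNLLCriterion in Lua Torch)
loss1 = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Option 2: Softmax + Log + NLLLoss (less numerically stable)
loss2 = F.nll_loss(torch.log(F.softmax(logits, dim=1)), targets)

# Option 3: CrossEntropyLoss (softmax is applied internally)
loss3 = F.cross_entropy(logits, targets)

print(loss1, loss2, loss3)  # all three agree
```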
It should be noted that CrossEntropyLoss applies the softmax internally, so it expects raw logits as input rather than probabilities.
A softmax output layer with a log-likelihood cost also tends to learn faster than one trained with MSELoss, because the log-likelihood gradient does not vanish when the softmax output saturates.
The log-likelihood loss is
$$C = -\sum_k y_k \ln a_k,$$
where $a_k$ is the output of neuron $k$ and $y_k$ is the ground truth.
The cross-entropy loss is
$$C = -\sum_k \left[ y_k \ln a_k + (1 - y_k) \ln (1 - a_k) \right].$$
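To make the two formulas concrete, here is a small hand computation; the outputs $a$ and the one-hot target $y$ are made-up illustrative numbers:

```python
import torch

a = torch.tensor([0.2, 0.7, 0.1])   # illustrative outputs, already probabilities
y = torch.tensor([0.0, 1.0, 0.0])   # one-hot ground truth

log_likelihood = -(y * a.log()).sum()                           # -sum_k y_k ln a_k
cross_entropy = -(y * a.log() + (1 - y) * (1 - a).log()).sum()

print(log_likelihood)  # tensor(0.3567), i.e. -ln(0.7)
print(cross_entropy)   # tensor(0.6852), also penalizes mass on the wrong classes
```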
And what is LogSoftmax? It is just the logarithm of the softmax:
$$\operatorname{LogSoftmax}(x_i) = \ln\!\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)$$
What's more, it is implemented directly in torch.nn.functional as log_softmax, which computes the same quantity in a numerically stable way.
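A minimal sketch of that equivalence, and of why the fused version is preferred; the extreme logits are there only to expose the instability:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0]])
print(torch.log(F.softmax(x, dim=1)))  # log(softmax(x)) ...
print(F.log_softmax(x, dim=1))         # ... equals log_softmax(x)

# The fused version stays finite on extreme logits:
big = torch.tensor([[1000.0, 0.0]])
print(torch.log(F.softmax(big, dim=1)))  # tensor([[0., -inf]])
print(F.log_softmax(big, dim=1))         # tensor([[0., -1000.]])
```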
The NLLLoss is
$$\ell_n = -w_{y_n}\, x_{n, y_n},$$
where $x$ holds log-probabilities (e.g. the output of log_softmax) and the class weight $w_n$ defaults to 1.
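In other words, NLLLoss just gathers the negative log-probability at each target index. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 3), dim=1)  # batch of 4, 3 classes
targets = torch.tensor([0, 2, 1, 0])

# Pick out -x_{n, y_n} for each sample and average (w_n = 1 by default)
manual = -log_probs[torch.arange(4), targets].mean()

print(manual)
print(F.nll_loss(log_probs, targets))  # matches the manual computation
```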
The BCELoss is the cross-entropy loss specialized for binary classification, and its input must be passed through a sigmoid before BCELoss is applied. What's more, BCEWithLogitsLoss combines the sigmoid and the BCELoss in a single class, which is more numerically stable.
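A minimal sketch of that relationship; the random logits and targets are illustrative only:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5)                       # raw scores for 5 binary decisions
targets = torch.randint(0, 2, (5,)).float()   # binary ground truth

# BCELoss expects probabilities, so a sigmoid must come first
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), targets)

# BCEWithLogitsLoss fuses the sigmoid into the loss (more numerically stable)
loss_b = F.binary_cross_entropy_with_logits(logits, targets)

print(loss_a, loss_b)  # equal up to floating-point error
```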
References
- https://github.com/torch/nn/issues/357
- https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#log_softmax
- https://pytorch.org/docs/stable/nn.html?highlight=log_softmax#torch.nn.functional.log_softmax