Let's say your output is not a tensor of size 1 (i.e., not a scalar). In addition, assume there is some function f acting on the output, so that you get something like newOutput = f(output).

You can pass a gradient grad to output.backward(grad). The idea is that if you are doing backpropagation manually, and you know the gradient of newOutput with respect to output (i.e., the gradient flowing back from f), then you can pass that gradient into the backward call of the layer that produced output.
When calling backward manually, this output may be just one layer of the whole network, with other layers after it. If you want the gradient at output to account for those later layers, you need to pass the gradient of the later layers with respect to output into the backward call as its argument.
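A minimal sketch of this in PyTorch (the tensor names and the choice of f here are just for illustration): we compute the gradient of the "later layers" with respect to output by hand, then feed it into output.backward(grad).

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
output = x ** 2                        # a non-scalar output (size 3)

# f: the "later layer(s)" acting on output.
newOutput = (3 * output).sum()         # scalar result of f(output)

# Gradient of newOutput with respect to output, computed manually.
# retain_graph=True keeps the graph alive for the backward call below.
grad = torch.autograd.grad(newOutput, output, retain_graph=True)[0]
# grad is d(newOutput)/d(output) = tensor([3., 3., 3.])

# Pass that gradient back into output's backward pass.
output.backward(grad)

print(x.grad)   # d(newOutput)/dx = 3 * 2x = tensor([ 6., 12., 18.])
```

Calling output.backward() here without the grad argument would raise an error, because output is not a scalar; the grad argument supplies the missing downstream gradient.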