You mean the output of the transformer? It does not "compute" confidence values. It's still doing token prediction.
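Here is a rough sketch of the last step, assuming a standard decoder-only setup. The final layer produces one raw score (logit) per vocabulary token, and a softmax turns those scores into a probability distribution over the *next token*. The three-word vocabulary and the logit values below are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and logits; a real model scores tens of thousands of tokens.
vocab = ["yes", "no", "maybe"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
# Greedy decoding: emit whichever token has the highest probability.
next_token = vocab[probs.index(max(probs))]

print({tok: round(p, 3) for tok, p in zip(vocab, probs)})
print(next_token)
```

The roughly 0.66 assigned to "yes" is the probability of emitting that token next. It is not a separate, computed estimate of whether the answer is correct; any "confidence" would have to be read off or derived from these next-token numbers.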