At least to me, the difference is that one is ostensibly an explanation of how the AI arrived at the joke, the other is a post-hoc explanation of the joke.
You can be pretty sure the AI isn't doing a post-hoc explanation because the only writable memory it has access to is the tokens it has already output - i.e. the explanation of the joke. Everything else is reset between every token.
As long as it comes up with different jokes different times you ask it (assuming randomness in sampling) - how could it.
The problem is it can’t remember what it hasn’t written but the end result still makes sense, so there has to be some goal after parsing the initial context that the tokens are emitted towards to. This means there’s nothing stopping it from producing an explanation, it might be in there from the very start.