The limitation on doing anything interesting with this type of AI is always going to be the simulation itself: it has limits on accuracy and fidelity, you reach those limits before you get anything interesting, and the AI ends up exploiting your simulation in a non-generalizable way. See also OpenAI beating human pros at DotA, much earlier in the current AI hype cycle: if you actually look into it, they were winning mostly by exploiting APM (actions per minute) more than anything else.
At any rate, physics and statistics always run into limits: the experiment might be seriously difficult to implement (complex, unstable or costly), or the data available might simply run out. That's the day-to-day normal of these fields. Even some very basic physics or electronics lab setups are the bane of students, producing all kinds of results except the expected ones. More seriously, an experimental scheme is often proposed (and its author gets credit for it), and it takes a few years before a team takes it on (and gets credit if successful). It's a commitment of time and money, plus a judgment call on whether the team can achieve it and thus claim the credit, or will never manage to pull it off.