Our AI Overlords discuss MAG vault line deductions

Apr 30

“They ain’t anything ‘til I call them” - Bill Klem, legendary Major League Baseball umpire.

To deduct, or not to deduct—that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of a foot misplaced,
Or to take arms against a sea of gray areas
And, by opposing, end them?

“What should happen in the following scenario? A gymnast lands his vault with one foot out of bounds. The gymnast completes the landing by moving both heels together in the proper manner, but in doing so his second foot goes out of bounds. What should the deduction be? 0.1 or 0.3?”

People can’t seem to agree on this, so I asked three different AI chatbots what they thought. These machines have no wolverines, farmers, or sticks of wood in the fight. I gave them the above screenshot, the statement from the Code of Points (COP) that “the gymnast must complete the landing by bringing his heels together without lifting and moving the front of his feet”, and the rule that the vault ends with a landing behind the table in a standing position with legs together facing either towards or away from the table (frontways or rearways).

Neither I nor these chatbots are gymnasts, coaches, or judges. I was just curious to see what answer they came up with. AI can reason logically. It can also make things up or misunderstand, but so can humans. This is an experiment, not an assessment of what judges should have actually done in a specific situation or which team “should” have gotten a specific placing at the 2025 or 2026 NCAA Championships.

Spoiler alert: they didn’t all agree, but they all recognized that it’s a question on which reasonable people can disagree.

Men’s NCAA gymnastics can follow, and has followed, slightly different rules than the FIG code (no zero vaults, and last season only a C skill was required for full EGR credit). It’s possible that the call was made differently in the 2026 NCAA championship finals than in 2025 because there was an internal clarification before the 2026 meet. The FIG has no jurisdiction over NCAA competitions, but the FIG should absolutely weigh in on how the rules should be interpreted, as the next time it happens could be in a World or Olympic final.

What I took verbatim from the machines is in bold type.

ChatGPT went with a 0.1 deduction.

ChatGPT states the OOB (out of bounds) is triggered by the initial landing contact, not the final aesthetic completion. The gymnast is not placing the second foot OOB - it is ending up OOB as part of a continuous, allowed motion. It is interpreting the end of the vault as when the gymnast shows a controlled, finished position with his feet together, but does not think the OOB should be judged from the final pose. ChatGPT thinks there is a contradiction between requiring the gymnast to bring the heels together and restricting him from stepping to control that position, which would penalize a gymnast for obeying a required movement constraint.

The process is (according to ChatGPT):

Initial contact: one foot OOB → establishes penalty
Completion: legal heel closure → no new action
Therefore:
- Only one OOB event exists

FIG is thinking (at least ChatGPT claims it is):

“Uncontrolled contact events trigger OOB; completion is judged separately.”

So I asked ChatGPT, “says who?” It answered that the FIG doesn’t say that, but that this is what the FIG meant, given how the rules I gave it are written and applied together: the FIG wants to avoid penalizing a gymnast for properly completing a requirement, and it does not say “evaluate all deductions at the final frame.”

This chatbot wasn’t having any of the argument that the deduction should be 0.3 because of when the vault actually ends - “Vault ends…legs together” describes the finish position, not the OOB trigger.

What a completed vault should look like (presentation requirement)
Not how out-of-bounds is assessed

These are judged by different mechanisms:

Finish position → E-jury (execution / control / posture)
Out of bounds → D-jury (boundary violation)
If this were scored –0.30:
- The gymnast would be penalized more for correctly completing the landing
- Than if they had just:
  - Stayed with feet apart (and taken –0.10 E deduction)
That would create a perverse incentive, which FIG avoids. (Well, a lot of people probably disagree with that…).

The robot understood why this could be confusing.

The Code never explicitly answers:

“Do we judge OOB based on the first contact or the final completed position?”

So people fill that gap with:

Model A (Event-based) → –0.10
Model B (Final-state) → –0.30
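The two models can be sketched in a few lines of Python. This is only an illustration of the disagreement, not anything taken from the FIG code: the `Landing` record and the function names are made up for this post, and the only numbers used are the 0.10 (one foot out) and 0.30 (both feet out) values quoted above.

```python
from dataclasses import dataclass

# Deduction values as quoted in this post (assumed, not looked up in the Code).
ONE_FOOT_OUT = 0.10
BOTH_FEET_OUT = 0.30


@dataclass
class Landing:
    """Hypothetical record: how many feet are outside the landing area."""
    feet_out_at_first_contact: int  # 0, 1, or 2
    feet_out_at_finish: int         # 0, 1, or 2, after the heels are joined


def deduction_for(feet_out: int) -> float:
    """Map a count of feet outside the landing area to a single deduction."""
    if feet_out == 0:
        return 0.0
    if feet_out == 1:
        return ONE_FOOT_OUT
    return BOTH_FEET_OUT


def event_based(landing: Landing) -> float:
    """Model A: only the initial landing contact decides the deduction."""
    return deduction_for(landing.feet_out_at_first_contact)


def final_state(landing: Landing) -> float:
    """Model B: only the completed landing position decides the deduction."""
    return deduction_for(landing.feet_out_at_finish)


# The scenario in question: one foot lands out, the heel-join carries the other out.
scenario = Landing(feet_out_at_first_contact=1, feet_out_at_finish=2)
print(f"Model A (event-based): -{event_based(scenario):.2f}")
print(f"Model B (final-state): -{final_state(scenario):.2f}")
```

The same landing goes in both times. The only thing that changes is which moment the judge is told to look at, and that is exactly the part the Code leaves open.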

Perplexity calls it as 0.3.

At first contact, one foot is out of bounds, so the “one foot outside the landing area” condition (0.10) is satisfied.
When he then brings the heels together and the second foot also ends up out of bounds, he has now “touched with feet … outside of the landing area,” which matches the higher 0.30 category.
Out‑of‑bounds deductions are not meant to stack per contact; the judge applies the single description that best matches the final fault (extent of body outside), just as with amplitude or posture faults where you take the largest applicable value, not a running sum.
Therefore, you upgrade the situation from 0.10 (one foot out) to 0.30 (feet out), and you take only the 0.30 once from the final score. The “bringing heels together” rule is satisfied for landing posture, but it does not negate or reduce the boundary penalty; it simply means there is no additional “feet apart” deduction layered on top.

So the clean judging sequence is: assess landing posture per the heels‑together rule, then for boundary apply a single 0.30 deduction for both feet outside the landing area at completion of the landing.

The chatbot is concluding that the situation matches the 0.30 definition, because “feet” (plural) are outside the landing area.

I told the robot that some judges don’t agree on that answer. It replied: I … agree that the ambiguity is real and that a good-faith case can be made for either 0.10 or 0.30 with the current text…but even though both positions have some textual justification, if your priority is fairness and reproducibility for human beings in real time, a final‑position standard (which in this scenario yields 0.30) is simpler to teach, easier to see, and more likely to give consistent calls across a panel.

Claude calls it as 0.3.

If the vault ends with legs together, then by the time the gymnast is in their final landing position, both feet are together — meaning if one foot lands out of bounds and they slide/join the heels correctly, both feet end up out of bounds in the final position.

This would suggest the deduction should be -0.30 — because the completed landing has both feet outside the landing area.

Claude had this to say about the disagreement.

The -0.10 argument: the initial landing contact was one foot out, which is what triggered the out-of-bounds situation
The -0.30 argument: the completed landing position has both feet out, which is what the judges see at the end

The ambiguity likely comes from the code being written without this exact sequence of events in mind — a gymnast who correctly executes the heel-join but in doing so moves from a -0.10 situation into what looks like a -0.30 situation.

Until the FIG clarifies, judges could reasonably go either way, and you'd likely see inconsistency across competitions.

The frustrating thing for everyone arguing about it is that both sides have a legitimate case with the current wording, so nobody can truly "win" the argument until FIG steps in and clarifies. It's one of those situations where the code just has a gap.

The people arguing -0.10 probably have the more sympathetic argument, but the people arguing -0.30 probably have the more technically literal reading of the code. The uncomfortable truth, though, is that if -0.30 is the correct call, it essentially punishes a gymnast for doing the heel-join correctly.

That's a quirk in the code that FIG really does need to address!

So, what should it be?

Claude thinks the 0.3 deduction is easier on the judges - Judges just look at the final picture — both feet outside = -0.30, done. It’s a simple binary, and doesn’t require tracking a very fast sequence in real time (although video review could help with that). There is no need to then also decide that the second foot going out was a consequence of the heel-join rather than an independent error, which requires a judgment call about intent/causation, which is always harder to apply consistently. The final position is always easier to judge than the process that led to it.

But the 0.3 penalizes the gymnast for doing something correctly.

Exactly — and that's the cruel irony at the heart of this debate.

So, the more honest gymnast is penalized more heavily. That's a real problem with whichever interpretation you choose:

-0.10 is fairer to the gymnast but harder to judge consistently.
-0.30 is easier to judge consistently but creates a perverse incentive.

Until the FIG clarifies, judges are stuck choosing between consistency and fairness — which is a terrible position to be in.

So, which team won in 2025 and was third in 2026? Michigan.

It doesn’t matter how the FIG clarifies this at some later date. Once the competition is over, the results are final. No major sport changes final results due to an ambiguity in the rules. Outright missed calls (far more clear-cut than many of the arguments about gymnastics judging, including this one) have decided the results of FIFA World Cups or which team went to the Super Bowl. Those final results were left unchanged.

Probably the most famous example is the “Hand of God” in the 1986 FIFA World Cup match between England and Argentina, where Diego Maradona used his hand to score a goal, but the referee didn’t see it. Argentina won the match and went on to win the World Cup. It was caught on camera and obvious on TV replays. England complained then and for years after, but the result stood. Years later, Maradona admitted he did it on purpose because he wanted revenge against England for the defeat of Argentina in the Falklands War.

In 2010, Armando Galarraga of the Detroit Tigers was one out away from a perfect game and baseball immortality, but the umpire called the runner safe when the runner was actually out. The authors of the book Bad Call: Technology’s Attack on Referees and Umpires and How to Fix It describe it thus: “even though all parties involved knew shortly after the decision had been made that the umpire had it wrong, and the authorities were requested to change the record and award the pitcher a perfect game, they did not: everyone knew that reality had been made by the umpire and it could not be remade.”

Shelli Koszdin
