Game Design: The Reverse Turing Test

Diefenbach VS Wagner, a sleep-deprived production.

So I made a game called Diefenbach VS Wagner in a day. Diefenbach and Wagner are two of the game production professors I studied under, and in the game they beat each other up and there are explosions and lightning and grass and all sorts of other things. In my defense, I was sleep deprived. There were going to be DBZ-style floating rocks swirling around them, but unfortunately I didn’t get that far.

Download the game to play here. Comes with Mac, PC, and web builds. 2 players required (or you can play by yourself if you feel like it).

But I’m not bringing up DvsW as a shameless plug for my own game (well, kind of). The game design behind DvsW is actually fundamentally flawed, and it’s a good opportunity to talk about what that flaw is. This ties in to one of my principles of game design, which I call the Reverse Turing Test.

“If you can reasonably and easily create a computer program to play your game perfectly, then you have a problem.” – the Reverse Turing Test

The original Turing Test was something much more academic than this, and applied to a completely different field. But the main idea is that in the Turing Test, if a machine is successful it’s a good thing. With the Reverse Turing Test, it’s the opposite: if a machine is successful, there’s a problem. It’s also important to note that I mention “reasonably and easily creating a computer program,” so Deep Blue’s performance in chess doesn’t count.

So let’s break down DvsW and see how it plays into the Reverse Turing Test.

Mechanics of DvsW

DvsW is a minimalist fighting game, sort of like Divekick. In DvsW, each player gets three keys (ASD for Diefenbach, JKL for Wagner) that will each activate one of three attacks. The first person to land an attack on the other wins: it’s a one-hit-kill game.

The three attacks are functionally all the same, but as soon as one player launches an attack, the other player is given the opportunity to counterattack by pressing one of his three buttons. Say that Diefenbach attacks by pressing A, which executes his front kick attack. As soon as Diefenbach attacks, Wagner is able to press K (his punch) to counterattack. If a counterattack successfully happens, the counterattacker lands the hit and wins the round. But if the counterattacker presses the wrong button or doesn’t react in time, the win goes to the original attacker. Different attacks are countered differently: front kicks (A and J) are countered by punches (S and K), which are countered by side kicks (D and L), which are countered by front kicks.
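The counter triangle is small enough to write down as a lookup table. Here’s a minimal Python sketch of how a single exchange could be resolved. It is not the game’s actual code: the names `COUNTERED_BY` and `resolve_round` are made up for illustration.

```python
# Which move counters which, as described above:
# front kick is countered by punch, punch by side kick, side kick by front kick.
COUNTERED_BY = {
    "front_kick": "punch",
    "punch": "side_kick",
    "side_kick": "front_kick",
}


def resolve_round(attack, response=None, in_time=True):
    """Return "attacker" or "defender" for one exchange.

    attack   -- the move the attacker launched
    response -- the defender's move, or None if no button was pressed
    in_time  -- whether the response arrived before the attack landed
    """
    if response is not None and in_time and COUNTERED_BY[attack] == response:
        return "defender"
    # Wrong button, late button, or no button: the original attack connects.
    return "attacker"
```

So Diefenbach’s front kick answered by Wagner’s punch goes to Wagner, and anything else goes to Diefenbach.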

However, the game also has another trick: the speed at which you attack will ramp up as the game time progresses. If you attack right at the beginning of the match, your strike will take about a second to land, which is plenty of time for your opponent to use the appropriate counterattack. But with each second that passes, your next strike will be faster, and thus give your enemy a smaller window to counterattack (which means a greater chance that your original attack will connect).

Thus, it becomes a game about timing. Should I strike now, and risk being countered by my enemy, or should I strike later and have a greater chance of landing my attack? Or should I focus on counterattacking now, so that if my enemy attacks first I’ll be ready to react, even though the longer I wait the lower my chances of a successful counterattack will be?

Now that all of that is done, let’s take a look at our Reverse Turing Test. Would it be possible for one to reasonably and easily create a computer program to play DvsW perfectly? The answer is a resounding yes.

First of all, a computer program can have (essentially) zero reaction time. It would be a simple matter to write a script that instantly reacts with the proper counter the exact moment that a player attacks. If player uses front kick, use punch. If player uses punch, use side kick. If player uses side kick, use front kick. With this, there’s absolutely no way for the computer program to lose. We can even write this computer program to go on the offensive, too. If enough game time passes that my next attack would be faster than the player’s reaction time, launch an attack.

Every frame, check two things: if the player has launched an attack, and if the speed of my next attack would be faster than the player’s reaction time. If the former is true, react with the proper counterattack. If the latter is true, attack. There you have it: a computer program that can play DvsW perfectly. It would take ten lines of code.
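To give a sense of how little code that is, here’s a rough Python sketch of the per-frame decision. It isn’t the game’s actual code: the function names, the linear speed ramp in `attack_duration`, and the 0.25-second human reaction time are all assumptions picked for illustration.

```python
# Which move counters which (front kick < punch < side kick < front kick).
COUNTERED_BY = {
    "front_kick": "punch",
    "punch": "side_kick",
    "side_kick": "front_kick",
}


def attack_duration(elapsed, start=1.0, decay=0.1, floor=0.05):
    """Assumed ramp: seconds an attack launched at `elapsed` takes to land.

    Starts at about one second and shrinks linearly; the real curve may differ.
    """
    return max(floor, start - decay * elapsed)


def bot_action(elapsed, incoming_attack=None, human_reaction=0.25):
    """Decide the bot's move for one frame; None means keep waiting."""
    # Check 1: the player has attacked, so answer with the matching counter.
    if incoming_attack is not None:
        return COUNTERED_BY[incoming_attack]
    # Check 2: our attack would now land faster than a human can react.
    if attack_duration(elapsed) < human_reaction:
        return "front_kick"  # the three attacks are functionally the same
    return None
```

Called once per frame with the current match time, `bot_action` waits, counters anything thrown at it, and strikes as soon as the (assumed) ramp outpaces the (assumed) human reaction time.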

What Does The Reverse Turing Test Do?

The Reverse Turing Test helps identify what kind of choices players have to make in the game. Sid Meier is famously quoted (and disputed) for saying that “A game is a series of interesting choices,” and even though I don’t want to go quite THAT far, the significance of a choice is still a vital factor for game design. Players make choices, receive feedback from the game regarding the choice they made, adjust their behavior, and do it again. Say you’re playing Halo and you’re fighting a pair of Hunters. You make the choice to try to snipe them from afar, but they split up and kill you with lasers. Next time you respawn, before you pick the sniper rifle back up you remember that you died last time you tried that, so instead you fight them head-on and try to kill them with grenades. Choices are an important part of the gameplay loop, and ultimately the whole experience.

But not all choices are born equal. When you sit down at a slot machine, you have a choice: pull the lever, or not pull the lever. Well, first of all, if you choose not to pull the lever you’re not playing the game, so it’s a faulty choice right from the beginning, but let’s move on. After you’ve made the choice to pull the lever, you have no more choices to make, and you get a randomized output. There is no connection between the choice you made and the reward you get. Sometimes you get a lot of money, sometimes you get a little bit of money, sometimes you get nothing, all for the same action. It’s why some gamblers develop rituals, like periodically changing machines in the hopes of winning. They want clearer outcomes from the choices they make, but the choices they make aren’t meaningful: they’re wholly chance-based, and no matter how much agency you exert you can’t change anything. Luck-based choices are bad.

Unfortunately, the Reverse Turing Test can’t identify luck-based choices, because there is no perfect way to play a luck-based game. However, the test helps expose other problematic choices: those that are based on how well you can do menial, programmatic tasks that could be easily automated. The whole reason why humanity developed machines was so that they could do all the boring grunt labor for us and free up time for us to do more important tasks that require high-level conceptual thinking. It’s paradoxical for video games to force us to do the same boring grunt labor that machines and computers were initially invented to free us from.

Examples of Reverse Turing Test Failures

I feel like I spend a lot of time hating on Maplestory.

Asian MMOs like Maplestory have plenty of design flaws exposed by this test. There are tons of situations in Maplestory that fail the Reverse Turing Test, but my personal peeve is grinding. When I was a kid, I built a clamp out of legos to hold down a button on my keyboard, which was the button mapped to one of my character’s abilities, and my character would just keep using that one ability over and over and gain experience from it. Nowadays, people are more sophisticated: there are computer programs called “bots” that can be used to automate grinding (1). The choice that Maplestory presents to you is “do you want to spend a ton of time doing a boring routine task, or do you want to stop playing this game.” Timesink-based choices are bad.

Hardcore fighting games such as Dead or Alive have a different kind of problem. Many fighting games are notorious for having high skill caps because you need to input a very specific button combo in a short period of time in order for your character to execute a certain move. The choice of whether or not to do the move is meaningful: your timing and your distance from your opponent are all factors to consider, and that’s where the core gameplay actually is. However, say you’ve decided to do a certain move, and now you have to actually make sure you do it. In this case, there is no choice: you either do the move, or you fail to do it and screw yourself over. No one would ever voluntarily choose to fail the move; whether you pull it off depends on how much time you’ve dedicated to moving your fingers in a precise pattern, time that you could have spent playing the actual game of yomi and enemy prediction. It’s no wonder that so many high-end gaming peripherals let you bind macros to single buttons, which is essentially using a computer program to do the task for you. Dexterity-based choices are bad.

Diefenbach VS Wagner is a game wholly decided by reaction time. There is never a time when you would intentionally choose to press a button other than the one that activates your counterattack. The speed at which you react is a constant, not something you can design meaningful choice around. It would be like if I designed a game where the taller person wins: there are no choices involved, and the players have no mechanisms that they can use to strive for victory. Deciding on when you want to attack is a meaningful choice, but counterattacking is not, so essentially only one person (the attacker) has the agency to exert force on the game state. Reaction-based choices are bad.

Examples of Reverse Turing Test Successes

Divekick should have a robot character named Turing who plays like a badly programmed AI.

Divekick, one of the games that inspired DvsW’s design philosophy, managed to boil down the essence of fighting games into two aspects: positioning and timing. In fact, if you really want to get philosophical, positioning and timing are basically the same concept, so I’ll focus on positioning. Understanding the distance between you and your opponent, your own threat range versus your enemy’s threat range, the speed at which you can take action, your enemy’s behavior patterns, and more are all aspects that feed into positioning. Do I move closer and possibly walk into my enemy’s trap, or do I fall back and prepare a trap of my own for the enemy to walk into? Do I poke with weak long-ranged attacks, or do I close in for the kill? All of these decisions have to be made dynamically in reaction to the opponent’s actions, and there’s no perfectly correct answer to any given situation. Positioning-based choices are good. The largest fighting game series in the world (Super Smash Bros) is a game of positioning and movement.

League of Legends has certain abilities that are “skill shots,” which means that they have to be aimed manually rather than automatically homing in on targets like most other abilities. At first glance, using a skill shot seems like something that could be easily automated: calculate enemy position and velocity, find out the speed of my own skill shot, make sure they both line up at the same point, and you have a hit. However, interaction around skill shots is so much more nuanced than a mathematical formula. Maybe you can try to bait out an enemy skill shot by moving in an erratic pattern. Maybe you can try to force an enemy into a bad position by firing a skill shot with the intention of having them dodge it in a certain direction. Maybe if you’re rushing at an enemy and they’re firing a skill shot at you, you’ll choose to eat the attack and continue your pursuit, rather than dodge and lose your target. It’s the same reason why FPS games will pass the Reverse Turing Test: even if you’re fighting against a computer program with perfect accuracy, accuracy isn’t everything. Aiming-based choices with proper counterplay measures in place are good. That’s a subtle but important distinction from aiming-based choices without counterplay, such as the turn-based artillery game Gunbound (and guess what, Gunbound is dominated by aimbots).
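For reference, the “mathematical formula” part really is just a quadratic. Here’s a hedged Python sketch of the lead calculation, assuming a 2D plane, a target that keeps moving in a straight line at constant velocity, and a shot that travels at constant speed. `intercept_point` is a hypothetical helper, not anything taken from League of Legends.

```python
import math


def intercept_point(shooter, target, target_vel, shot_speed):
    """Return the (x, y) point to aim at, or None if the shot can't catch up.

    Solves |p + v*t| = shot_speed * t for the earliest t > 0, where p is the
    target's position relative to the shooter and v is the target's velocity.
    """
    px, py = target[0] - shooter[0], target[1] - shooter[1]
    vx, vy = target_vel
    a = vx * vx + vy * vy - shot_speed ** 2
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py

    if abs(a) < 1e-12:
        # Shot and target are equally fast: the equation is linear, b*t + c = 0.
        if abs(b) < 1e-12:
            return None
        candidates = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None  # no real solution: the shot never meets the target
        root = math.sqrt(disc)
        candidates = [(-b - root) / (2 * a), (-b + root) / (2 * a)]

    times = [t for t in candidates if t > 0]
    if not times:
        return None
    t = min(times)
    return (target[0] + vx * t, target[1] + vy * t)
```

The one assumption baked into that solver, that the target keeps doing what it’s doing, is exactly the thing a human opponent gets to break.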

Games about managing resource systems (2) such as Magic: The Gathering involve a lot of decision making that can’t be boiled down to computational formulas. If I use all my mana and summon this big powerful monster now, what do I do if the enemy casts a game-changing spell during his turn and I can’t do anything about it because I’m out of mana? What if I don’t actually have a way of dealing with enemy game-changing spells, but I decide to bluff and leave my mana untapped as if I were going to unleash a counter? But what if my enemy doesn’t have any kind of game-changing spell in the first place? It’s obvious when you’re playing against an MtG bot or a Starcraft 2 bot, because the decisions involved in resource-based games play into the human element: bluffing, lying, taunting, luring. You can calculate the odds as much as you want, but ultimately the decisions are based on human judgments rather than computerized ones. Resource-based choices are good (in strategic games).

Is The Reverse Turing Test Always Right?

There are plenty of games that fail the Reverse Turing Test that are still successful. Guitar Hero. Infinity Blade. Bit.Trip Runner. Arguably golf. What’s the point of a design principle if all these games ignore it and still do perfectly fine in the market? Why are these games still appealing, even if the decisions presented in them are reaction-based (3)?

Tasks that can be computerized or automated are also tasks that can spit out instantaneous feedback, and instantaneous feedback is a good way of inducing flow. All you have to do is compare the player’s performance with a perfect performance, and you get feedback. DDR and Guitar Hero are flow machines because of how quickly they’re able to give you feedback. When you’re playing Divekick and you’re hanging out a good distance away from your enemy, there’s no feedback telling you whether or not you’re doing the right move (because there is no such thing as a “right” move). But as soon as you miss a key in Guitar Hero, the game tells you that you suck (well, the audience does, but same thing). Computerize-able tasks are good at giving fast feedback, and fast feedback is good feedback.
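That comparison against a perfect performance is about as simple as code gets. Here’s a small Python sketch of a rhythm-game style timing judgment; `judge_hit` and its window sizes are invented for illustration and aren’t taken from Guitar Hero, DDR, or any other real game.

```python
def judge_hit(press_time, note_time, perfect_window=0.05, good_window=0.12):
    """Grade one key press against the time the note should have been hit.

    All times are in seconds. The window sizes are assumed values.
    """
    error = abs(press_time - note_time)
    if error <= perfect_window:
        return "perfect"
    if error <= good_window:
        return "good"
    return "miss"
```

The verdict is available the instant the key goes down, which is what makes this kind of feedback so fast.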

And that’s a perfectly fine design direction to go in, if that’s your cup of tea. I’ve had a lot of fun playing Diefenbach VS Wagner with friends, despite how much I whine about its design. But fun isn’t and shouldn’t be how we measure the quality of game design. Anything can be fun if you add explosions and lightning. How, then, are we able to measure the quality of game design?

That’s up to you, but my personal design philosophy has always focused on the development of individualized player skill. People should be able to find their own unique ways of playing games, because different people are different. Choices in games allow players to make decisions based on their own personal style, and they’re able to develop in different directions. But look at every game I’ve accused of failing the Reverse Turing Test. In Maplestory, you develop in one direction, and that’s getting a higher level. In Dead or Alive, your development correlates to your dexterity and combo memorization. In Diefenbach VS Wagner, you don’t develop at all, and instead rely on your reaction time that was built into you from birth.

But Divekick, League of Legends, and Magic: The Gathering? These games all have very unique ways of being played. You can be a defensive turtle, or an all-in berserker, or a speedy cocky trickster, or any multitude of styles. As you play these games, you develop your own style, and you clash against other people who have developed their own styles, and you see who wins. But if they win, you don’t just roll over and say “oh, well I guess they were right and I was wrong, I should conform to their style.” Instead, you stick to your own method and refine it based on what you learned, and after you’ve evolved you throw yourself right back into the fray and do it all over again. But when you play games that fail the Reverse Turing Test, it’s a competition of who can be more computer-like. There’s no personal style involved.

My goal is not to make games that have one perfect correct answer. Computer programs can find perfect correct answers, but people are different. People are weird and strange and confusing and ambiguous, and they each come to their own answers. Individual skill is a virtue, and games are one of the few mediums that can nurture that virtue. I want to make games about individual skill, and the Reverse Turing Test is one of the tools I use to identify and create such games.


(1) There are actual businesses where you can hire someone (a human, not a bot) to grind on your MMO character for you while you go to work or something else. At that point it’s just ridiculous. Why play a game that’s so boring that you would pay someone else to play it for you?

(2) I write more about resource systems in game design in my essay on The Mana System Paradox. In fact, the problem I define as the mana system paradox can be reworded as a problem of choice. Do you choose to play the game as it was intended to be played and be punished for it, or do you choose a boring and uninteresting way of playing? Just like all of the other Reverse Turing Test failures, it’s a false choice that players shouldn’t be forced to make on their own.

(3) The game Zorba actually puts a lot of this discussion into practice. The developer, Pippin Barr, wrote an excerpt that’s especially relevant:

“Specifically, I became attached to the idea that in many video games it’s effectively the case that the AI is only pretending you can beat it. A computer dancing (digitally) to the Zorba song obviously doesn’t have to make a single mistake. And so I immediately wanted a game where you dance to the song against a computer and, as it becomes impossibly fast and beyond your capacity, the computer just keeps right on dancing without a care in the world. It plays on ideas of what “skill” is, what’s “fair” from an AI opponent, and so on. Amusing.”

5 responses to “Game Design: The Reverse Turing Test”

  1. > (1) There are actual businesses where you can hire someone (a human, not a bot) to grind on your MMO character for you while you go to work or something else. At that point it’s just ridiculous. Why play a game that’s so boring that you would pay someone else to play it for you?

    I think this is easy to understand if you ever played any MMORPG: if you want to dominate you have to work in the game, which is something sometimes you don’t have time for.

    It can be basic leveling or a pure grind for something crucial, and you do it because you can’t and don’t want to upset the people you play with. Because you depend on each other, because you can’t own anything crucial and important as a solo player, and because you are in a kind of social agreement with your “party”, you basically have no other choice: you pull the lever (play) or leave.

    Some people take vacations at launches or for important stuff, some people buy currency. Hiring a driver is the only thing you can do when you can’t buy the thing you need with currency, for example a BindOnPickup drop. If you are interested I can tell you much more, because I have experience with stuff like that.

    • I’d love to hear more, I am admittedly not a diehard MMO fan. I’ve played a few back in the day, but nowadays my main online game is LoL which can’t quite be classified as an MMORPG.

  2. What about games like Zelda where a person has to solve puzzles (and similar games)? Does the Reverse Turing Test apply to them? Solving the puzzles is meaningful, but you could get a computer program to do it as well.

    • I don’t know about you but I have no idea how I would go about creating a puzzle-solving program. But you’re right, the core question’s still there and they do fail the test. Puzzles have one perfect correct answer. That’s not to say that puzzle games are bad, but puzzle games don’t really support a variety of playstyles the way that Divekick, LoL, or MtG do. It becomes a question of how you define “meaningful.” Personally, I find meaning in varied playstyles.
