ParaCyberBellum's Blog
Understanding AI limitations with a Tic-Tac-Toe game

"AI-Powered" is definitely a must. No one would dare to release a product, solution or platform that is not "AI-Powered" labeled. Indeed, despite the usual marketing fuzz, AI can bring a lot to IT, if properly designed and used.

On the other hand, AI has limitations. We know it, but it is sometimes difficult to really grasp how limited AIs are. We can feel it, yet it is hard to put a finger on these limitations. Either we read articles stating that "AI is not perfect" without solid proof, or we dive into research papers that do prove something, but in such a complicated way that we don't really understand what the limitation actually is...

The purpose of this article is to explain the main limitations of AI in an understandable way, through a simple game: Professor Falken's famous Tic-Tac-Toe [1]. We will illustrate these limitations and find options to circumvent them.

A Genesis

Although I have already worked on a lot of AI models, I never really got into reinforcement learning beyond simply using existing libraries. When I came across an "AI-Powered" Tic-Tac-Toe program [2] written by antirez [3], I thought it was a good opportunity to close that gap, simply because it is always better to know what you are doing (and talking about).

So, I recoded the program in another language [4] (the best way I know to really understand what a program does), tested it, and went deeper into its limitations and how to circumvent them.

Greetings, Professor Falken. Shall we play a game?

Under the hood

The core of the AI is a simple Artificial Neural Network (ANN) that initially plays randomly. The inputs of the ANN are the 9 squares of the board, and the outputs are the probabilities for each square to be the best move. Between the two sits a hidden "layer" of neurons, each connected to all the inputs and all the outputs, which processes the inputs and produces the probabilities delivered at the outputs.
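To make this concrete, here is a minimal numpy sketch of such a network: 9 inputs, one hidden layer, 9 output probabilities. The activation function, weight initialization and board encoding below are illustrative choices of mine, not necessarily those of the original code (the hidden layer size of 10 matches the note in "Going deeper").

```python
import numpy as np

HIDDEN = 10  # size of the single hidden layer (see "Going deeper")

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(9, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(scale=0.1, size=(HIDDEN, 9)), np.zeros(9)

def forward(board):
    """board: vector of 9 cells (e.g. 1 = our mark, -1 = opponent, 0 = empty).
    Returns one probability per square of being the best move."""
    hidden = np.tanh(board @ W1 + b1)        # hidden layer
    logits = hidden @ W2 + b2                # one score per square
    exp = np.exp(logits - logits.max())      # softmax -> probabilities
    return exp / exp.sum()

# Example: empty board, play the square with the highest probability
probs = forward(np.zeros(9))
print(probs.argmax(), probs)
```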

After each game, the engine reviews its moves one by one. For each move it rebuilds the board and adjusts the parameters of its network based on the outcome of the game: win, loss or tie.

In case of a win or a draw, it considers that the move it played was the best possible one: that move gets a weight of 1 and the other options a weight of 0. In case of a loss, the played move gets a weight of 0 and the other possible moves share an equal probability.

Those probabilities are then used to (more or less) recalculate the weights (i.e. the importance) of each neuron's operations. The new parameters are used in the next game, adjusted again based on the outcome, and so on.
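Here is a small sketch of how the target probabilities of a replayed move could be built under this scheme. This is my own reading of the mechanism, not a copy of the original code; the resulting targets would then drive a standard gradient step on the network weights.

```python
import numpy as np

def targets_for_move(played, legal, outcome):
    """Target probabilities over the 9 squares for one replayed move.
    `legal` lists the free squares at that point; `outcome` is 'win', 'tie' or 'loss'."""
    target = np.zeros(9)
    if outcome in ("win", "tie"):
        target[played] = 1.0                    # the move we made is treated as the best one
    else:
        others = [i for i in legal if i != played]
        if others:                              # spread the probability over the alternatives
            target[others] = 1.0 / len(others)
    return target

# Example: we lost after playing the centre while squares 0, 2 and 6 were also free
print(targets_for_move(played=4, legal=[0, 2, 4, 6], outcome="loss"))
```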

Building a baseline

To train the model, we need an opponent. In the initial version, this opponent is a dumb algorithm that randomly picks one of the available squares on the board. So let's first set a baseline and look at the results of 150,000 games played by two of these random algorithms. The results of 3 different sessions of 150,000 games are provided below. In the best case, the first player wins about 118,000 games, loses 15,000 and ends up with a tie 17,000 times.

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type   ┃   Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Random │ 108841 │  24526 │ 16633 │
│ 2      │ Random │  24526 │ 108841 │ 16633 │
└────────┴────────┴────────┴────────┴───────┘

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type   ┃  Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Random │ 61185 │  68456 │ 20359 │
│ 2      │ Random │ 68456 │  61185 │ 20359 │
└────────┴────────┴───────┴────────┴───────┘

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type   ┃   Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Random │ 117724 │  15020 │ 17256 │
│ 2      │ Random │  15020 │ 117724 │ 17256 │
└────────┴────────┴────────┴────────┴───────┘

Release the bAIst

Now that we have a baseline, our first goal is to create a model that will get better results than a basic random player.

So let's pit our AI player against our random algorithm. After 150,000 games (which is, by the way, far more than the number of Tic-Tac-Toe games any human plays in an entire lifetime) we reach a point where the AI (playing first) seems to do quite well: roughly 140,400 wins, 300 losses and 9,200 ties. This is much better than a random player.

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Player ┃ Type                                 ┃   Wins ┃ Losses ┃ Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ 1      │ Neural Network (Player 1 vs. Random) │ 140404 │    327 │ 9269 │
│ 2      │ Random                               │    327 │ 140404 │ 9269 │
└────────┴──────────────────────────────────────┴────────┴────────┴──────┘

Even better, when the same AI (now trained) plays another 150,000 games against the same stupid opponent, we get 144,300 wins (+4,000), 100 losses (-200) and 5,600 ties (-3,700). We created an intelligent machine! I am proud to introduce the first AI-Powered Tic-Tac-Toe, a game-changer in the computer gaming industry!

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Player ┃ Type                                 ┃   Wins ┃ Losses ┃ Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ 1      │ Neural Network (Player 1 vs. Random) │ 144308 │    112 │ 5580 │
│ 2      │ Random                               │    112 │ 144308 │ 5580 │
└────────┴──────────────────────────────────────┴────────┴────────┴──────┘

Good. So what about playing a few games against the genius we built from the ground up... 10 games, 10 victories for me playing second. Not even a single tie. And honestly, the model is stupid...

Identifying and (somewhat) closing the gap

What went wrong?

Well, first I had bugs in my code (not in the latest version, hopefully, but the earlier results were disastrous). One could say that this is not an inherent issue of AI. Yes and no. Remember that AIs are programs written by humans, who have their own limitations. Moreover, these programs are usually very difficult to debug and require a level of expertise that you won't find on every street corner. But fair enough, these are human errors, not an AI limitation per se.

So the real mistake was that we deliberately ignored the first principle of machine learning: "Garbage In, Garbage Out". Our model learnt from games against a stupid opponent that played with no consistency and no strategy. We knew this principle but relied on the high number of games, hoping that, in the end, it would learn the basics of the game, such as playing the center square first. It didn't. That is a simple illustration of the main limitation: the quality of the data.

Another mistake, with less impact at this stage of our analysis, is the fact that the model only learns from its own moves. What if it plays against a world-class champion? It would obviously be worth learning from that champion. That's another limitation: the dataset is not comprehensive, and we miss valuable inputs.

If we dive into the code, we also realize that the importance of a move is weighted by a factor of (move number / total number of moves in the game). This makes the last move much more important than the first. In a game like this one, where the first move is critical, this design simply doesn't fit. And that's another limitation: AI models are not "one size fits all", and it is sometimes necessary to dig deep to validate their design.
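A quick numeric illustration of what this factor does:

```python
# Weight factor (move number / total moves) for a game that lasted 7 moves,
# with the AI playing first (its moves are the 1st, 3rd, 5th and 7th of the game):
total_moves = 7
for move_number in (1, 3, 5, 7):
    print(move_number, round(move_number / total_moves, 2))
# 1 -> 0.14, 3 -> 0.43, 5 -> 0.71, 7 -> 1.0
# The opening move weighs roughly 7 times less than the last one.
```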

A last one is that we believed we had created a smart Tic-Tac-Toe player. Indeed, the game statistics against a random algorithm were quite convincing. The return to earth was quite bumpy: our genius would not beat a 5-year-old child... We cannot really consider this an AI limitation, but it remains essential to remember that lab results may not match the real world.

Fixing the "trainer"

Before any optimization, we need a valid opponent to train our AI. This is what I call the "Trainer": a simple deterministic algorithm that plays the most obvious moves and relies on randomness only when no predefined condition is met.

Our trainer picks its move according to the sequence of rules below (a minimal sketch follows the list).

  1. Play a move that makes it possible to win the game
  2. Play a move that prevents the opponent from winning the game when its turn comes
  3. Play the center square if it is free
  4. Play a random free square
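Here is a minimal sketch of such a trainer. This is my own rewrite of the idea, not the exact code from the repository; the board is assumed to be a list of 9 cells containing 'X', 'O' or ' '.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winning_move(board, mark):
    """Return a square that completes a line for `mark`, or None."""
    for line in LINES:
        values = [board[i] for i in line]
        if values.count(mark) == 2 and values.count(" ") == 1:
            return line[values.index(" ")]
    return None

def trainer_move(board, me, opponent):
    free = [i for i, cell in enumerate(board) if cell == " "]
    move = winning_move(board, me)             # 1. win if possible
    if move is None:
        move = winning_move(board, opponent)   # 2. block the opponent
    if move is None and 4 in free:
        move = 4                               # 3. take the centre
    if move is None:
        move = random.choice(free)             # 4. otherwise play at random
    return move

# Example: X threatens the top row, so the trainer (playing O) blocks square 2
board = ["X", "X", " ",
         " ", "O", " ",
         " ", " ", " "]
print(trainer_move(board, me="O", opponent="X"))
```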

This is not rocket science, but still worth a try. First, we train a brand new AI against this much more valuable opponent. After 150,000 games, with our AI playing first, we get the following statistics: 2,500 wins, 124,400 losses, 22,700 ties. The Trainer definitely beats us hands down.

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type                                  ┃   Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Neural Network (Player 1 vs. Trainer) │   2474 │ 124740 │ 22786 │
│ 2      │ Trainer                               │ 124740 │   2474 │ 22786 │
└────────┴───────────────────────────────────────┴────────┴────────┴───────┘

However, the results against a random player are much better, with about 5,000 more wins, mostly taken from the ties.

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Player ┃ Type                                  ┃   Wins ┃ Losses ┃ Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ 1      │ Neural Network (Player 1 vs. Trainer) │ 140604 │    269 │ 9127 │
│ 2      │ Random                                │    269 │ 140604 │ 9127 │
└────────┴───────────────────────────────────────┴────────┴────────┴──────┘

But guess what... playing again against a human player of average IQ, the AI still loses 100% of the time.

Learning from the best

What we are still missing is the ability to learn only from the winning games (and from draws, with a lesser impact on the outcome).

If we change our algorithm to learn only from the winner's moves, and train it against a valid opponent (our "Trainer"), we get even better results when facing a random player: 142,800 wins (+2,000), only 140 losses (-130), and 7,100 ties (-2,100).
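Conceptually, the change boils down to filtering which moves are replayed after each game. A sketch of the idea (the 0.5 weight for draws is an illustrative value of mine, not the one used in the code):

```python
def moves_to_learn_from(history, result):
    """history: list of (player, board_before, move) tuples recorded during one game.
    result: 1 or 2 (the winning player) or 'tie'. Returns the samples to replay and a weight."""
    if result == "tie":
        return history, 0.5        # draws still teach something, but with a lower impact
    return [h for h in history if h[0] == result], 1.0   # only the winner's moves are replayed

# Example: player 2 won, so only its two moves are kept for the update
game = [(1, "board#1", 0), (2, "board#2", 4), (1, "board#3", 1), (2, "board#4", 8)]
print(moves_to_learn_from(game, result=2))
```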

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Player ┃ Type                                  ┃   Wins ┃ Losses ┃ Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ 1      │ Neural Network (Player 1 vs. Trainer) │ 142785 │    142 │ 7073 │
│ 2      │ Random                                │    142 │ 142785 │ 7073 │
└────────┴───────────────────────────────────────┴────────┴────────┴──────┘

Sounds good, but again the real-world experiment is far from what we expected. 10 games, with the AI playing first and me second: 0 wins for the AI, 6 draws, 4 losses.

It is better, but still not good. For example, it never played the center square first! Thinking about it, this is simply because it learnt from games where the winning side, the Trainer, was playing second. So it never actually "saw" the best first move.

We then need to train our AI again for 150,000 games, this time as the second player.

We get slightly better results: 0 wins, 3 losses, and 7 draws for the AI. It now plays the center square first but still makes mistakes (it actually missed a winning move).

Maybe with more training games...

Indeed, after 500,000 additional training games (a total of 650,000) we get a proper world-class player: 1 win (yes, I made a mistake and it exploited it), 0 losses, 9 draws.

Empowering the AI

Let's explore another option to improve our stupid Artificial Neural Network.

From the beginning we relied on an automated learning mechanism. The theory behind it is that after a sufficient number of games, the model will learn the best practices and apply them by itself. That's the theory, and we can intuitively accept that the weights in the neural network will eventually converge to a good fit. But how long will it take? And are we sure our intuition is right?

In the meantime, we can simply make a slight change to our algorithm. We apply the first 3 basic rules we designed for the "Trainer", and if none of these rules applies, we ask our AI to find the best move instead of choosing randomly. This is what I call a deterministic AI (semi-deterministic actually, but anyway).
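A sketch of this hybrid move selection, reusing the `winning_move` helper from the Trainer sketch above (again my own illustration of the idea, not the actual code):

```python
def deterministic_ai_move(board, me, opponent, net_probabilities):
    """Rules 1-3 of the Trainer, then the neural network instead of pure randomness.
    `net_probabilities` is any callable returning one probability per square."""
    free = [i for i, cell in enumerate(board) if cell == " "]
    move = winning_move(board, me)                 # 1. win if possible
    if move is None:
        move = winning_move(board, opponent)       # 2. block the opponent
    if move is None and 4 in free:
        move = 4                                   # 3. take the centre
    if move is None:
        probs = net_probabilities(board)
        move = max(free, key=lambda i: probs[i])   # 4. best *legal* square according to the ANN
    return move
```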

To evaluate the improvement our AI brings, we need to set up another baseline, defined by the results of 150,000 games played by the Trainer against itself.

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type    ┃  Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Trainer │ 44999 │   6179 │ 98822 │
│ 2      │ Trainer │  6179 │  44999 │ 98822 │
└────────┴─────────┴───────┴────────┴───────┘

The results of our deterministic AI playing against the Trainer are quite impressive: 65,200 wins, only 500 losses and 84,300 ties. That's much better than our baseline!

[>>>] Games Played: 150000

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Player ┃ Type                                    ┃  Wins ┃ Losses ┃  Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 1      │ Deterministic AI (Player 1 vs. Trainer) │ 65189 │    490 │ 84321 │
│ 2      │ Trainer                                 │   490 │  65189 │ 84321 │
└────────┴─────────────────────────────────────────┴───────┴────────┴───────┘

By adding a helper (nowadays called an agent, or expert knowledge), we properly leveraged the power of AI, at least from a theoretical standpoint. Now let's see how it performs when playing against me.

[>>>] Games Played: 8

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Player ┃ Type                                    ┃ Wins ┃ Losses ┃ Ties ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━┩
│ 1      │ Deterministic AI (Player 1 vs. Trainer) │    0 │      0 │    8 │
│ 2      │ Human                                   │    0 │      0 │    8 │
└────────┴─────────────────────────────────────────┴──────┴────────┴──────┘

There we are, 100% ties. I forgot how boring Tic-Tac-Toe is...

Conclusion

Through a trivial example, we illustrated some of the limitations AIs face.

We started with a state-of-the-art Artificial Neural Network with Reinforcement Learning. With that many buzzwords, our "AI-Powered Tic-Tac-Toe" could only be the best player in the world, and our simulations seemed to show exactly that. However, the real-life experiment definitely proved us wrong.

Improving such a basic engine has been a long journey.

First we had to improve the quality and relevance of the data from which it learns. Then we had to dig into the code we copied/pasted to find tricky parameters that "may" not be relevant in our specific case. Last, we added a deterministic agent to take most of the decisions. Then, and only then, did we eventually get an acceptable AI-Powered Tic-Tac-Toe player.

The effort, time and expertise necessary to obtain a relevant result properly illustrate the complexity of building an efficient AI engine. This also raises a question: is AI always the best option?

In our case the answer is obviously "no".

An unbeatable Tic-Tac-Toe player can be created with a deterministic engine that doesn't require training, is easy to debug and only requires basic coding skills.
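For the record, such an engine fits in a few dozen lines. Below is a plain minimax sketch: no training, no tuning, and optimal play by construction (on a 3x3 board the full search is instantaneous enough).

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, to_move, me):
    """Best (score, move) for the side to move; scores are from `me`'s point of view."""
    won = winner(board)
    if won is not None:
        return (1 if won == me else -1), None
    free = [i for i, cell in enumerate(board) if cell == " "]
    if not free:
        return 0, None                       # full board: draw
    results = []
    for square in free:
        board[square] = to_move
        score, _ = minimax(board, "O" if to_move == "X" else "X", me)
        board[square] = " "
        results.append((score, square))
    # maximise on our turn, minimise on the opponent's turn
    return max(results) if to_move == me else min(results)

def best_move(board, me):
    return minimax(list(board), me, me)[1]

# With perfect play every opening square draws, so ties are broken by square index
print(best_move([" "] * 9, "X"))
```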

Some important notes

This is not a research paper!

The purpose of this article is simply to illustrate some of the limitations of AI engines, and how they can be circumvented. Period.

99% of the criteria for a real scientific paper are not met. As an example, there is no record of the games played (and honestly, I wonder who would really want to dig into the millions of games played for this article...).

Moreover, a lot of the data relies on pseudo-random algorithms. During my tests I had some surprisingly weird results, and I only kept the data showing the results that occurred most frequently. Again, this is for educational purposes, not for science.

Going deeper

AI tuning is a full-time job. The available literature explicitly states that AI models are black boxes, and that parameter optimization is mostly empirical. Indeed, our ANN uses a single hidden layer with 10 "neurons", but there is actually no rule defining the number of layers or the number of neurons per layer. There are only a few generic guidelines, and who knows if they fit our model?

I made the AI-Powered Tic-Tac-Toe available [4] so you can replay the tests in this article, and even go further.

Several parameters can be tuned: the number of neurons in the hidden layer, the reward factors, the learning rate, whether or not to give a higher weight to the last moves, and whether to learn only from the winner's game.
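As an illustration only, these knobs could look like the following. The names and values below are hypothetical, not the ones used in the repository; check the code for the actual parameters and their defaults.

```python
# Hypothetical tuning knobs (illustrative names and values, not the repository's API)
config = {
    "hidden_neurons": 10,        # size of the single hidden layer
    "learning_rate": 0.1,
    "reward_win": 1.0,           # reward factors per outcome
    "reward_tie": 0.3,
    "reward_loss": 0.0,
    "weight_late_moves": True,   # scale updates by (move number / total moves)
    "learn_from_winner": True,   # replay only the winner's moves
}
```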

You are welcome to run your own tests and improve the model!

Credits

A big thank you to antirez, who has always been a great source of inspiration: back in 1998 when he designed the IDLE scan technique [5], and today with his "Tic Tac Toe with Reinforcement Learning".

[1] WarGames - https://en.wikipedia.org/wiki/WarGames
[2] Tic Tac Toe with Reinforcement Learning - https://github.com/antirez/ttt-rl
[3] Salvatore Sanfilippo aka antirez - http://invece.org/
[4] AI-Powered Tic-Tac-Toe - https://github.com/rbidou/ai-powered-tic-tac-toe
[5] New TCP scan method - https://seclists.org/bugtraq/1998/Dec/79