This simulation implements the model discussed in “Inventing New Signals” (forthcoming in Dynamic Games and Applications). It shows how reinforcement learning, combined with a certain model of forgetting, often yields efficient and minimal signalling systems in Lewis sender-receiver games.
Sender-receiver games were first introduced by David Lewis in his book Convention. In economics, a well-known relative is the “Beer-Quiche” signalling game. In a sender-receiver game, Nature picks a state of the world and reveals it to the Sender, who then sends a signal to another player, the Receiver. The Receiver, who does not observe the state directly, then selects an action to perform. If the action is the correct one for that state of the world, both players receive a positive payoff; if an incorrect action is performed, neither player receives anything.
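To make the rules concrete, here is a minimal Python sketch of a single round with fixed (non-learning) strategies. It assumes, as in Figure 1, that states and actions are numbered alike so that action i is the correct response to state i, that Nature picks states uniformly at random, and that a success pays both players 1. The function name `play_round` and the dictionary encoding of strategies are hypothetical.

```python
import random


def play_round(n_states, sender_strategy, receiver_strategy, rng=random):
    """Play one round of a Lewis sender-receiver game with fixed strategies.

    sender_strategy maps each state to a signal; receiver_strategy maps each
    signal to an action. Action i is assumed to be correct for state i.
    Returns (state, signal, action, payoff).
    """
    state = rng.randrange(n_states)       # Nature picks a state (uniform: assumption)
    signal = sender_strategy[state]       # the Sender sees the state and signals
    action = receiver_strategy[signal]    # the Receiver sees only the signal
    payoff = 1 if action == state else 0  # common payoff to both players
    return state, signal, action, payoff


# A signalling system for the 2-state, 2-signal, 2-action game of Figure 1.
sender = {0: "a", 1: "b"}
receiver = {"a": 0, "b": 1}
# A pooling Sender that ignores the state and always sends "a".
pooling_sender = {0: "a", 1: "a"}
```

With the signalling system every round pays off; with the pooling Sender, the Receiver succeeds only when Nature happens to pick state 0.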
Figure 1 below illustrates the simplest interesting sender-receiver game featuring two states of the world, two signals, and two actions.
Figure 1: A sender-receiver game with 2 states and 2 actions.
The simulation here differs from the game illustrated in Figure 1 in that the set of possible signals is neither fixed in advance nor bounded. Briefly: if there are N states of the world, the Sender possesses N Hoppe-Pólya urns, one for each state. On learning the state of the world, the Sender reaches into the corresponding urn and draws a coloured ball. This colour (i.e., the signal) is then broadcast to the Receiver.
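A minimal sketch of the Sender's side, assuming each urn is a `collections.Counter` from colour to number of balls and that each Hoppe-Pólya urn starts with a single black ball, as described at the end of this section. The names `make_sender_urns`, `draw`, `sender_draw`, and `BLACK` are illustrative, not taken from the original simulation.

```python
import random
from collections import Counter

BLACK = "black"  # the special ball of a Hoppe-Pólya urn (hypothetical label)


def make_sender_urns(n_states):
    """One Hoppe-Pólya urn per state, each starting with a single black ball."""
    return {state: Counter({BLACK: 1}) for state in range(n_states)}


def draw(urn, rng=random):
    """Draw one ball with replacement.

    Every ball is equally likely, so a colour is chosen with probability
    proportional to its count. The urn itself is not modified.
    """
    colours = list(urn)
    weights = [urn[c] for c in colours]
    return rng.choices(colours, weights=weights, k=1)[0]


def sender_draw(sender_urns, state, rng=random):
    """The Sender consults the urn for the observed state and draws a ball."""
    return draw(sender_urns[state], rng)
```

Using `random.choices` with the ball counts as weights reproduces a uniform draw over individual balls, and leaving the `Counter` untouched models sampling with replacement.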
In contrast to the Sender, the Receiver begins with no urns. On receiving a signal it has not seen before, the Receiver creates a new Pólya urn to be used in “interpreting” that signal. If there are N states of the world (and hence N possible actions), the new Pólya urn is filled with N coloured balls, one for each action the Receiver can perform. The Receiver then chooses an action by drawing a ball from the urn for the signal it received.
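The Receiver's lazily created urns can be sketched as follows. This assumes, as the text does, that the number of actions equals the number of states, and it encodes actions as integers `0..n_actions-1`; the `Receiver` class and its methods are hypothetical.

```python
import random
from collections import Counter


class Receiver:
    """A Receiver that starts with no urns and creates one Pólya urn per new signal."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.urns = {}  # signal -> Counter mapping action -> number of balls

    def urn_for(self, signal):
        # First time this signal is seen: one ball for each possible action.
        if signal not in self.urns:
            self.urns[signal] = Counter({a: 1 for a in range(self.n_actions)})
        return self.urns[signal]

    def act(self, signal, rng=random):
        """Draw an action (with replacement) from the urn for this signal."""
        urn = self.urn_for(signal)
        actions = list(urn)
        return rng.choices(actions, weights=[urn[a] for a in actions], k=1)[0]
```

A freshly created urn gives every action the same probability, so the Receiver's first response to a new signal is a uniform guess.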
If the action performed is correct for the state of the world, both the Sender and the Receiver reinforce the urns they drew from by adding one ball of the colour drawn. If the action is incorrect, no reinforcement (or deinforcement) occurs. (Sampling from urns is always done with replacement.)
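The reinforcement rule amounts to a single conditional update. This sketch assumes each urn is a `collections.Counter` mapping colours (for the Sender) or actions (for the Receiver) to ball counts, and that action i is correct for state i; `reinforce` is a hypothetical helper name.

```python
from collections import Counter


def reinforce(sender_urns, receiver_urns, state, signal, action):
    """Apply the reinforcement rule after one round.

    On success (action == state) the Sender adds one ball of the drawn colour
    (the signal) to the urn for `state`, and the Receiver adds one ball of the
    drawn action to the urn for `signal`. On failure nothing changes.
    Returns True if reinforcement happened.
    """
    if action != state:
        return False  # no reinforcement and no deinforcement
    sender_urns[state][signal] += 1
    receiver_urns[signal][action] += 1
    return True
```

Because failures leave the urns untouched, only successful state-signal-action combinations become more likely over time.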
Given this basic model, it can be proven that the number of signals grows without bound, even for a simple problem with three states and three actions. Most of these signals, however, will see little use. Ideally, the Sender and the Receiver would settle on an efficient and minimal signalling system, that is, one using no more signals than necessary. It turns out that, in many cases, this is possible if we allow forgetting of the right kind.
We consider two possible forms of deinforcement, or forgetting. The first, which we call “Forgetting A”, consists of the Sender selecting an urn at random, drawing a coloured ball from it, and throwing that ball away. The second, which we call “Forgetting B”, consists of the Sender selecting an urn at random, selecting one of the colours in it at random, and then throwing away one ball of that colour from that urn.
The key difference between these methods of forgetting is easy to see. Under Forgetting A, the discarded ball is most likely to be of the colour that has been reinforced the most, so colours with very little representation in the urn are unlikely to be selected; yet pruning exactly these rarely used signals is what we need. Forgetting B, on the other hand, selects a colour independently of its representation in the urn, so a seldom-used colour is just as likely to be selected as a frequently used one.
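The two forgetting rules differ only in how the colour to discard is chosen. The sketch below assumes `collections.Counter` urns, never discards the black ball (only coloured balls are forgotten), drops a colour from an urn once its count reaches zero, and does nothing when the chosen urn holds no coloured balls; that last behaviour is an assumption, as are all the function names. The helper `selection_probs` computes the exact probability that each colour is targeted once a given urn has been chosen.

```python
import random
from collections import Counter
from fractions import Fraction

BLACK = "black"  # hypothetical label for the Hoppe-Pólya black ball


def coloured(urn):
    """Colours in the urn other than the black ball, with positive counts."""
    return [c for c, n in urn.items() if c != BLACK and n > 0]


def _remove_one(urn, colour):
    urn[colour] -= 1
    if urn[colour] == 0:
        del urn[colour]  # the colour disappears from this urn


def forget_a(sender_urns, rng=random):
    """Forgetting A: pick an urn at random, draw a coloured ball
    (probability proportional to count) and throw it away."""
    urn = sender_urns[rng.choice(list(sender_urns))]
    colours = coloured(urn)
    if not colours:
        return None  # nothing to forget in this urn (assumption)
    colour = rng.choices(colours, weights=[urn[c] for c in colours], k=1)[0]
    _remove_one(urn, colour)
    return colour


def forget_b(sender_urns, rng=random):
    """Forgetting B: pick an urn at random, pick one of its colours uniformly
    at random, and throw away one ball of that colour."""
    urn = sender_urns[rng.choice(list(sender_urns))]
    colours = coloured(urn)
    if not colours:
        return None  # nothing to forget in this urn (assumption)
    colour = rng.choice(colours)
    _remove_one(urn, colour)
    return colour


def selection_probs(urn):
    """Exact probability that each colour is targeted, under A and under B."""
    colours = coloured(urn)
    total = sum(urn[c] for c in colours)
    prob_a = {c: Fraction(urn[c], total) for c in colours}
    prob_b = {c: Fraction(1, len(colours)) for c in colours}
    return prob_a, prob_b
```

For an urn holding one black ball, 99 red balls and 1 blue ball, `selection_probs` gives blue a probability of 1/100 under Forgetting A but 1/2 under Forgetting B, which is why B can prune a rarely used signal while A almost never does.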
Under Forgetting B, efficient and minimal signalling systems are often obtained.
Each Hoppe-Pólya urn begins with a single black ball. When the black ball is drawn, the Sender invents a new colour, one not already present in the urn, and uses it as the signal. (A minor technicality regarding this point is discussed in the paper: it concerns what happens if the Sender invents a new colour and the signalling attempt is unsuccessful. Essentially, the new colour is discarded and never used again.)
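Putting the pieces together, one round with signal invention might look like the sketch below. Several details are assumptions rather than the paper's specification: states are drawn uniformly, new colours are generated as fresh strings from a counter (so a discarded colour can never reappear), a new colour enters the Sender's urn only through a successful reinforcement, and the Receiver's urn for a discarded colour is deleted. The `Game` class and its methods are hypothetical, and no forgetting is applied.

```python
import itertools
import random
from collections import Counter

BLACK = "black"  # hypothetical label for the Hoppe-Pólya black ball


class Game:
    """Sender with one Hoppe-Pólya urn per state; Receiver with one
    lazily created Pólya urn per signal. Number of actions = number of states."""

    def __init__(self, n_states, seed=None):
        self.n = n_states
        self.rng = random.Random(seed)
        self.sender = {s: Counter({BLACK: 1}) for s in range(n_states)}
        self.receiver = {}
        self._fresh = itertools.count()  # source of never-before-used colours

    def _draw(self, urn):
        keys = list(urn)
        return self.rng.choices(keys, weights=[urn[k] for k in keys], k=1)[0]

    def play(self):
        """Play one round; return True if the Receiver's action was correct."""
        state = self.rng.randrange(self.n)
        ball = self._draw(self.sender[state])
        invented = ball == BLACK
        # Drawing the black ball means inventing a brand-new colour as the signal.
        signal = f"colour{next(self._fresh)}" if invented else ball
        urn = self.receiver.setdefault(
            signal, Counter({a: 1 for a in range(self.n)}))
        action = self._draw(urn)
        if action == state:
            # Success: both reinforce; an invented colour thereby enters the urn.
            self.sender[state][signal] += 1
            urn[action] += 1
            return True
        if invented:
            # Failure with a new colour: discard it; the counter never reissues it.
            del self.receiver[signal]
        return False

    def signals_in_use(self):
        """All coloured (non-black) signals present in any Sender urn."""
        return {c for urn in self.sender.values() for c in urn if c != BLACK}
```

Without forgetting, the black ball stays in every urn, so the Sender keeps inventing occasional new signals, which is consistent with the growth in the number of signals described above.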
Throughout this section, “coloured ball” refers to a non-black ball. Signal invention occurs solely on the Sender's side: the Receiver simply responds to whatever colour it receives.