Signalling and reinforcement learning

J McKenzie Alexander
Department of Philosophy
London School of Economics
Brian Skyrms
Logic and Philosophy of Science
University of California - Irvine
Sandy Zabell
Department of Mathematics
Northwestern University

Description

This simulation implements the model discussed in “Inventing New Signals” (forthcoming in Dynamic Games and Applications). It shows how reinforcement learning, combined with a certain model of forgetting, often yields efficient and minimal signalling systems in Lewis sender-receiver games.

Sender-receiver games were first introduced by David Lewis in his book Convention. In economics, it is often known as the “Beer-Quiche” game. In a sender-receiver game, Nature picks a state of the world and reveals it to the Sender. The Sender then sends a signal to another player, known as the Receiver. The Receiver then selects an action to perform. If the correct action is performed, given the state of the world, both players receive a positive payoff; if an incorrect action is performed, neither player receives anything.

Figure 1 below illustrates the simplest interesting sender-receiver game featuring two states of the world, two signals, and two actions.

sender-receiver-game

Figure 1: A sender-receiver game with 2 states and 2 actions.

The simulation here differs from the game illustrated in Figure 1 in that the number of possible signals is not predetermined nor limited in number. Briefly: if there are N states of the world, the Sender possesses N Hoppe-Pólya urns, one urn for each state. Upon receipt of information about the state of the world, the Sender reaches into the appropriate urn and draws a coloured ball. This colour (i.e., a signal) is then broadcast to the Receiver.

In contrast to the Sender, the Receiver begins life with no urns. Upon receipt of a new signal, the Receiver creates a new Pólya urn to be used in “interpreting” that signal. If there are N states of the world, the new Pólya urn is filled with N coloured balls — one ball for each of the possible actions the Receiver can perform. The Receiver then chooses what action to perform by drawing a ball from the appropriate urn.

If the correct action is performed, given the state of the world, both the Sender and Receiver reinforce their urns by adding one ball of the colour drawn. If an incorrect action was performed, no reinforcement (or deinforcement) occurs. (Sampling from urns is always done with replacement.)

Given this basic model, it can be proven that the number of signals — even for a simple three-state, three-action problem — will grow arbitrarily large. But most of these signals will not be used. It seems that the Sender and the Receiver ought to be able to find an efficient and minimal signalling system — that is, one using no more signals than necessary. It turns out that this is possible, in many cases, if we allow forgetting of the right kind.

Types of forgetting

We consider two possible ways of deinforcement, or forgetting. The first, which we call “Forgetting A” consists of the Sender selecting an urn at random, drawing a coloured ball, and throwing it away. The second, which we call “Forgetting B”, consists of the Sender selecting an urn at random, selecting a colour at random, and then throwing away one ball of that colour from that urn.

The key difference between these methods of forgetting can easily be seen: the colour of the ball selected by Forgetting A is most likely to be the colour which has been reinforced the most. So colours which have very little representation in the urn are unlikely to be selected — yet this is exactly what we need to do in order to prune rarely used signals. On the other hand, Forgetting B selects a colour independent of its representation in the urn, so a seldomly used colour is just as likely to be selected as a frequently used colour.

Under Forgetting B, efficient and minimal signalling systems are often obtained.

The Hoppe-Pólya urn

The Hoppe-Pólya urn begins with just a black ball. When the black ball is drawn, the Sender selects a new colour — not already present in the urn — to use. (There is a minor technicality regarding this point which is discussed in the paper; it concerns what happens if the Sender selects a new colour and yet the signalling attempt is unsuccessful. Essentially, this new colour is discarded and never used again.)

In all of the preceding discussion in this section, reference to a “coloured ball” refers to a non-black ball. Signal invention occurs solely from the point of view of the Sender: the Receiver only responds to receipt of a colour.

How to use it

Click "Setup" to initialise the model. Clicking "Step" moves the model forward one iteration, whereas clicking “Go” causes the model to run forward indefinitely until the “Go” button is clicked again.

Many of the parameters are largely self-explanatory: such as the slider to control the number of states of the world and the rate of forgetting (the real number is interpreted as a probability). Forgetting A and B are as described above.

It is possible to skew the probability distribution for the states of the world. The input box labeled “world-state-weights” contains a list (indicated by the brackets) of weights. These weights are normalised into probabilities (i.e., weights of [1 1 1] are the same as [2 2 2]). If equiprobable-states? is set to On, the variable world-state-weights is ignored and equal probabilities are used by default — which is useful if you want to vary the number of possible states from run to run, because then you don't have to manually fiddle with changing the weights to suitable values.

The display settings appearing on the right-hand side of the simulation control how the urns are drawn on screen. Some experimentation, combined with their names, will reveal their meaning.

How to interpret the display.

Initially, the Sender will have N edges drawn in a fan to the left, one edge for each state of the world. These edges will be labelled 0, 1,…, N-1. The nodes at the other end of these edges indicate the Hoppe-Pólya urn used for that state of the world. Initially, all of these urns have only a single edge drawn, labelled “0”, with a count at the end, initially “1”. The edge label indicates the colour, with 0 representing the black ball. Successively new colours introduced are numbered 1, 2, 3, and so on. The count at the end of the edge is the number of balls in the urn which are of that colour.

Initially, the Receiver has no urns because she hasn't received any signals. After stepping through the model a few times, eventually the Receiver will have an urn. (The Receiver only keeps an urn for a new signal received if the signalling attempt was successful.) The edge leading away from the Receiver will be labelled according to the signal it represents. So all of the Receiver’s urns will be numbered greater than or equal to 1. Unlike the Sender, the Receiver’s urn contains balls representing possible actions, so each urn of the Receiver will contain the same number of colours as the state of the world. (N.B. For the Receiver, “0” represents act 0, which is appropriate for state 0, not the black ball.) But, as for the Sender, the numbers at the end of the edges indicate the number of balls in the Receiver’s urn of that colour.

If show-urn-results? is set to On, the appropriate edges will be coloured red to indicate which urns were used and which ball was drawn from that urn.

System Requirements.

The applet requires Java 5 or higher. Java must be enabled in your browser settings. Mac users must have Mac OS X 10.4 or higher. Windows and Linux users may obtain the latest Java from Sun's Java site.


View/download model file: signalling and reinforcement.nlogo


Procedures.
extensions [ array table ]

breed [ agents agent ]
breed [ urn-nodes urn-node ]
breed [ ball-color-nodes ball-color-node ]
;breed [ arm-labels arm-label ]
;undirected-link-breed [ arm-links arm-link ]
;undirected-link-breed [ arm-label-links arm-label-link ]

globals [
  sender
  receiver
  nature
  history
  world-state
  next-color
  signal-sent
  action-done
  new-signal-tried?
  results-to-display?
]

agents-own [  
  ; An urn is coded as a list of two lists:
  ;  - the first list is the set of ball colours
  ;  - the second list is the number of balls of each colour
  ; [ [...colours...] [...counts...] ]
  urn-array   ; just for sender
  dictionary ; just for receiver
]

; parent = "Sender" or "Receiver"
; id = urn # or ball color #
; kind = 0 for urn link, 1 for ball color link
links-own [
  parent
  id
]

urn-nodes-own [
  reference 
]

ball-color-nodes-own [
  ball-color 
  urn-tag
  agent-type
]

to setup
  clear-all
  set history []
  set results-to-display? false
  set next-color 1
  
  if (equiprobable-states? = false and length state-probabilities != number-of-states)
  [
    beep
    print "You have not specified probabilities for the correct number of states."
    stop    
  ]
  
  create-turtles 1 [
    setxy -3 1
    set label "Sender"
    set size 0 
  ]

  create-turtles 1 [
    setxy 5.2 1
    set label "Receiver"
    set size 0 
  ]
  
  create-agents 1 [ 
    set sender self
    setxy -6 0
    set shape "circle"
    set size player-node-size
    set color blue
    set urn-array (array:from-list (n-values number-of-states [ [[0] [1]] ]))
    
    let i 0
    while [i < number-of-states ] [
      hatch-urn-nodes 1 [
        let urn self
        set reference i
        set color green
        set size 1
        set shape "circle"
        create-link-with sender [
          set parent "Sender"
          set id i ; this is the urn label
          set color white
          set label i 
        ]
        
        hatch-ball-color-nodes 1 [
           set color blue
           set size 0.5
           set shape "circle"
           set label 1
           set agent-type "Sender"
           set ball-color 0
           set urn-tag i
           create-link-with urn [
             set parent (word "Sender-Urn-" i)
             set id 0 ; this is the ball color
             set label 0 
           ] 
        ]
      ]
      set i (i + 1)
    ]
  ]
    
  create-agents 1 [
    set receiver self
    setxy 6 0
    set shape "circle"
    set size player-node-size
    set color blue
    set dictionary table:make
  ]
  
  
  create-turtles 1 [
    set nature self
    setxy 4 -5
    set shape "square"
    set size 0
    set color green
    set label ""
  ]
  
  create-turtles 1 [
     setxy 3 -5
     set size 0
     set label "State of the world:" 
  ]

  update-display
end

to go
  step
end

to step
  set new-signal-tried? false
  set results-to-display? true
  
  ; determine the state of the world and note in on the display
  set world-state select-world-state
  ask nature [ set label world-state ]
  comment (word "World state: " world-state)
  
  set signal-sent get-signal-from-sender
  comment (word "Signal from sender: " signal-sent) 
  
  set action-done get-action-from-receiver signal-sent
  comment (word "Action performed by receiver: " action-done)
  
  ifelse (action-done = world-state) 
  [
    set history (lput 1 history)
    if (length history > 1000)
    [
      set history (but-first history)
    ]
    
    ; reinforce the appropriate urns
    if (new-signal-tried? = true) [
      add-new-ball-color-to-sender-urns signal-sent 
      
      ; must update the urns..
      create-urn-nodes 1 [
        set color green
        set size 1
        set shape "circle"
        set reference signal-sent
        
        create-link-with receiver [
          set parent "Receiver"
          set id signal-sent ; this is the urn label
          set color white
          set label signal-sent 
        ]
        
        ; now create the nodes for the various action types
        let urn self
        let i 0
        while [i < number-of-states] 
        [
          hatch-ball-color-nodes 1 [
            set color blue
            set size 0.5
            set shape "circle"
            set label 1
            set urn-tag signal-sent
            set ball-color i
            set agent-type "Receiver"
            create-link-with urn [
              set parent (word "Receiver-Urn-" signal-sent)
              set id i ; this is the ball color
              set color white
              set label i 
            ]
          ]
          set i (i + 1) 
        ]
      ]
       
      set next-color (next-color + 1)
    ]
    
    reinforce-sender-urn world-state signal-sent
    reinforce-receiver-urn signal-sent action-done
  ]
  [
    ; we didn't succeed in signalling
    set history (lput 0 history)
    if (length history > 1000)
    [
      set history (but-first history)
    ]
    
    if (new-signal-tried? = true) 
    [
       ; This is just done to keep the urns in alignment
       remove-urn-from-receiver signal-sent
    ] 
  ]
  
  if (enable-forgetting? = true and (random-float 1.0 < forgetting-rate)) 
  [
    forget 
  ]
  
  check-for-extinct-signals
  update-display
  update-plot
  tick
end

to update-plot
  plot mean history
end

to check-for-extinct-signals
  let signals-in-use []
  let i 0
  while [i < number-of-states]
  [
    let urn array:item ([urn-array] of sender) i
    set signals-in-use (sentence signals-in-use (item 0 urn))
    set i (i + 1)
  ]
  set signals-in-use (sort (remove-duplicates signals-in-use))
  
  let signals-receiver-responds-to (table:keys ([dictionary] of receiver))
  
  foreach signals-receiver-responds-to 
  [
    if ((member? ? signals-in-use) = false)
    [
      ; kill the nodes in the display
      ask ball-color-nodes with [ agent-type = "Receiver" and urn-tag = ?]
        [ die ]
      
      ask receiver [
        ask link-neighbors with [ reference = ? ]
          [ die ]
          
        table:remove dictionary ?
      ] 
    ]
  ]
end

to remove-urn-from-receiver [ signal ]
  ask receiver [
    table:remove dictionary signal 
  ]
end

to add-new-ball-color-to-sender-urns [ col ]
  ask sender [
    let i 0
    while [ i < number-of-states ] 
    [
      let urn (array:item urn-array i)  
      array:set urn-array i (add-new-ball-color-to-urn  col urn)
      set i (i + 1)
    ]
    
    ask link-neighbors [
      let r reference
      hatch-ball-color-nodes 1 [ 
        set urn-tag r
        set ball-color col
        set agent-type "Sender"
        set label 1
        set color blue
        set size 0.1
        set shape "circle"
        create-link-with myself [
          set parent (word "Sender-Urn-" r)
          set id col ; this is the ball color
          set label col 
        ]  
      ] 
    ]
    
  ]
end

to reinforce-sender-urn [ state signal ]
  ask sender [
    let urn (array:item urn-array state)
    set urn (reinforce-urn signal urn)
    array:set urn-array state urn
  ]
end

to reinforce-receiver-urn [ signal action ]
  ask receiver [
    let urn (table:get dictionary signal)
    table:put dictionary signal (reinforce-urn action urn)
  ]
end

to-report get-action-from-receiver [ signal ]
  let response -1
  
  ask receiver [
    ifelse (table:has-key? dictionary signal = true) 
    [
      ; retrieve the urn and select an action
      let urn table:get dictionary signal
      set response (draw-ball-from-urn urn)
    ]
    [
      ; create a new urn and add it to the table
      let ball-colors (n-values number-of-states [?])
      let ball-counts (n-values number-of-states [1])
      let urn  (list ball-colors ball-counts)
      table:put dictionary signal urn
      set response (draw-ball-from-urn urn)
    ]
  ]
  
  report response
end

to-report get-signal-from-sender 
  let signal-to-send -1
  ask sender [
    let urn (array:item urn-array world-state )
    let ball (draw-ball-from-urn urn)
    ifelse (ball = 0)
      [
        set new-signal-tried? true 
        set signal-to-send next-color 
      ]
      [ set signal-to-send ball ]
  ]
  report signal-to-send
end

; World states can be 0, 1, 2, etc. 
; Indexing the states from 0 is more convenient for list extraction.
to-report select-world-state
  let list-of-states n-values number-of-states [?]
  report select-randomly-by-weight list-of-states state-probabilities
end

to-report state-probabilities
  ifelse (equiprobable-states? = true)
  [
    report n-values number-of-states [ 1 ] 
  ]
  [
    report read-from-string world-state-weights
  ]
end

to update-display
  ask agents [
    set size player-node-size 
  ]
  
  ask ball-color-nodes with [ agent-type = "Sender" ]
  [
    let urn (array:item ([urn-array] of sender) urn-tag)
    set label (number-of-balls-in-urn-of-this-color ball-color urn)
  ]
  
  ask ball-color-nodes with [ agent-type = "Receiver" ]
  [
    let urn (table:get ([dictionary] of receiver) urn-tag)
    set label (number-of-balls-in-urn-of-this-color ball-color urn)
  ]
  
  ; need to include this always in case we flipped off
  ; the show-urn-results? switch
  ask links [
      set color white
      set thickness 0
    ]
    
    
  if (show-urn-results? = true and results-to-display? = true)
  [
    ask links with [ parent = "Sender" and id = world-state ]
    [ 
      set color red
      set thickness 0.5
    ]
    
    ask links with [ parent = (word "Sender-Urn-" world-state) and id = signal-sent ]
    [ 
      set color red
      set thickness 0.5
    ]
    
    ask links with [ parent = "Receiver" and id = signal-sent ]
    [ 
      set color red
      set thickness 0.5
    ]
    
    ask links with [ parent = (word "Receiver-Urn-" signal-sent) and id = action-done ]
    [ 
      set color red
      set thickness 0.5
    ]

  ]
  
  layout-sender-urns
  layout-receiver-urns
end


to forget
  if (forgetting-type = "Forgetting A") 
    [ do-forgetting-A ]
    
  if (forgetting-type = "Forgetting B")
    [ do-forgetting-B ]
end    

to do-forgetting-A
  ask sender [
    ; pick an urn
    let i (random number-of-states) 
    let urn (array:item urn-array i)
    let c draw-ball-from-urn urn

    if (c = 0) 
      [ stop ]
            
    ; Check to see how many balls of that color there are
    ; because, if this is the last one, we need to remove the
    ; corresponding node from the display
    if ((number-of-balls-in-urn-of-this-color c urn) = 1)
    [
      ; kill the node 
      ask ball-color-nodes with [ agent-type = "Sender" and urn-tag = i and ball-color = c ]
        [ die ]
    ]
    
    array:set urn-array i (deinforce-urn c urn)
  ]
end

to do-forgetting-B
  ask sender [
    ; pick an urn
    let i (random number-of-states) 
    let urn (array:item urn-array i)
    
    ; pick a color
    let ball-colors (item 0 urn)
    let c (item (random (length ball-colors)) ball-colors)

    if (c = 0) 
      [ stop ]
            
    ; Check to see how many balls of that color there are
    ; because, if this is the last one, we need to remove the
    ; corresponding node from the display
    if ((number-of-balls-in-urn-of-this-color c urn) = 1)
    [
      ; kill the node 
      ask ball-color-nodes with [ agent-type = "Sender" and urn-tag = i and ball-color = c ]
        [ die ]
    ]
    
    array:set urn-array i (deinforce-urn c urn)
  ]
end

;to discount-the-past [ rate ]
;  ask researchers [
;    let ball-colors (item 0 urn)
;    comment (word "Ball colors: " ball-colors)
;    
;    let ball-counts (item 1 urn)
;    comment (word "Ball counts: " ball-counts)
;    
;    let new-ball-counts map [ ? * rate ] ball-counts
;    
;    ; reset the count of the black ball to 1, if it exists
;    let pos (position 0 ball-colors)
;    if (pos != false) 
;    [
;      set ball-counts (replace-item pos new-ball-counts 1)
;    ]
;    set urn (list ball-colors ball-counts)
;    
;    ; now prune the urn of colors below a certain threshold
;    foreach ball-colors [
;      if ( (number-of-balls-in-urn-of-this-color ?) / total-number-of-balls-in-urn < 0.000001)
;      [
;         remove-color-from-urn ?
;         remove-arm-from-bandit ?
;         ask arm-link-neighbors with [ arm-colour = ? ] [ die ]
;      ]
;    ]
;  ]
;end



; observer context
to-report select-randomly-by-weight [ lst weights ]
  let total-weight (sum weights)
  let r (random-float total-weight) 
  let list-position 0
  let item-selected false
  
  while [item-selected = false] [
    let current-weight (item list-position weights)

    ifelse r > current-weight
    [
      set r (r - current-weight)
      set list-position (list-position + 1)
    ]
    [ set item-selected true]
  ]
  
  report item list-position lst
end


; researcher context
to-report total-number-of-balls-in-urn [ urn ]
  report sum (item 1 urn)
end

; researcher context
to-report number-of-balls-in-urn-of-this-color [ c urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let pos (position c ball-colors)
  ifelse (pos = false)
    [ report 0 ]
    [ report item pos ball-counts]
end

; researcher context
to-report number-of-colors-in-urn [ urn ]
  report length (item 0 urn)
end

; researcher context
to-report ball-color-probability [ col urn ]
  report (number-of-balls-in-urn-of-this-color col urn) / ( total-number-of-balls-in-urn urn )
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
; researcher context
to-report draw-ball-from-urn [ urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let number-of-balls (sum ball-counts )
  let r (random-float number-of-balls)
  let list-position 0
  let ball-selected false
  
  while [ball-selected = false] [
    let current-ball-count (item list-position ball-counts)
    ifelse r < current-ball-count
    [
      set ball-selected true 
    ]
    [
      set r (r - current-ball-count)
      set list-position (list-position + 1)
    ]
  ]
  
  report item list-position ball-colors
end

to-report add-new-ball-color-to-urn [ col urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  set ball-colors (lput col ball-colors)
  set ball-counts (lput 1 ball-counts)
  report (list ball-colors ball-counts)
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
to-report reinforce-urn [ col urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let list-position (position col ball-colors)
  let cnt (item list-position ball-counts)
  set ball-counts (replace-item list-position ball-counts (cnt + 1))
  report (list ball-colors ball-counts)
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
to-report deinforce-urn [ col urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let list-position (position col ball-colors)
  let cnt (item list-position ball-counts)
  ifelse (cnt > 1) 
  [
    set ball-counts (replace-item list-position ball-counts (cnt - 1))
    report (list ball-colors ball-counts)
  ]
  [
    set ball-colors (remove-item list-position ball-colors)
    set ball-counts (remove-item list-position ball-counts)
    report (list ball-colors ball-counts)
  ]
end

to-report remove-color-from-urn [ col urn ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let pos (position col ball-colors)
  set ball-colors (remove-item pos ball-colors)
  set ball-counts (remove-item pos ball-counts)
  report (list ball-colors ball-counts)
end

to layout-sender-urns
  ask sender [
    let x xcor
    let y ycor
    let i 0
    let delta sender-urn-cone-angle / (number-of-states - 1)
    
    while [ i < number-of-states ] 
    [
      ask link-neighbors with [ reference = i ] 
      [
        ; this positions the node represent the urn
        setxy x y
        set heading -90 - (sender-urn-cone-angle / 2)
        set heading (heading + delta * i)
        fd length-of-links
        
        ; the following positions the nodes representing the contents of the urn
        let new-x xcor
        let new-y ycor
        let default-heading heading
        let s (count link-neighbors with [breed = ball-color-nodes])
        let new-delta 0
        if (s > 1)
        [ 
          set new-delta sender-urn-contents-cone-angle / (s - 1)
        ]
        let j 0
        
        foreach (sort link-neighbors with [ breed = ball-color-nodes ] )
        [
          ask ? [
            setxy new-x new-y
            ifelse (s > 1)
            [
              set heading default-heading - (sender-urn-contents-cone-angle / 2)
              set heading (heading + new-delta * j)
              fd length-of-links
            ]
            [
              set heading default-heading 
              fd length-of-links
            ]
            set j (j + 1)
          ]
        ]
      ] 
      set i (i + 1)
    ]
  ]
end

to layout-receiver-urns
  ask receiver [
    let keys (sort table:keys dictionary)
    let x xcor
    let y ycor
    let i 0
    if (length keys) = 0 
      [ stop ]
      
    let delta 0
    if (length keys) > 1 
    [ 
      set delta (receiver-urn-cone-angle / ((length keys) - 1)) 
    ]
    
    while [ i < (length keys) ] 
    [
      ask link-neighbors with [ reference = (item i keys) ] 
      [
        setxy x y
        ifelse (length keys) = 1
        [
          set heading 90
        ]
        [
          set heading 90 + (receiver-urn-cone-angle / 2)
          set heading (heading - delta * i)
        ]
        fd length-of-links
        
        ; the following positions the nodes representing the contents of the urn
        let new-x xcor
        let new-y ycor
        let default-heading heading
        let new-delta receiver-urn-contents-cone-angle / (number-of-states - 1)
        
        let j 0
        
        foreach (sort link-neighbors with [ breed = ball-color-nodes ] )
        [
          ask ? [
            setxy new-x new-y
            set heading default-heading + (receiver-urn-contents-cone-angle / 2)
            set heading (heading - new-delta * j)
            fd length-of-links
            set j (j + 1)
          ]
        ]
      ] 
      set i (i + 1)
    ]
  ]
end

;; some helper reporters


to comment [ string ]
  if show-console-comments?
    [ print string ]
end

to-report trim-string [ string chars ]
  let len ((length string) - 1)
  report substring string 0 (len - (chars - 1))  
end

to-report probability-to-greyscale [ p ]
  let g (round (255 * p))
  report (list g g g )
end

to-report probability-to-red-green-mix [ p ]
  let r (round (255 * (1 - p)))
  let g (round (255 * p))
  report (list r g 0 )
end