Social Reinforcement and the Search for Successful Theories

Dr J McKenzie Alexander

Department of Philosophy, Logic and Scientific Method
London School of Economics and Political Science

Description

A multi-arm bandit is a slot machine with N arms, each of which has a fixed (and different) probability of winning. If each arm pays off the same amount when it wins, the learning problem presented by a multi-arm bandit concerns the tradeoff between exploration and exploitation. A learning rule which converges too quickly may lock on to a suboptimal arm simply because of a lucky winning streak early in the exploration phase. A learning rule which converges too slowly may never settle down to playing only the highest-probability arm, because it continues to explore.
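
In NetLogo terms (a minimal sketch with hypothetical names, separate from the model code listed under Procedures below), such a bandit is just a list of winning probabilities, and pulling an arm is a Bernoulli trial:

; a fixed N-arm bandit, represented as a list of winning probabilities
globals [ arm-probabilities ]

to setup-bandit [ n ]
  ; each arm receives its own fixed, randomly chosen probability of winning
  set arm-probabilities n-values n [ random-float 1.0 ]
end

; pull arm i; report true exactly when the pull wins
to-report pull-arm [ i ]
  report (random-float 1.0) < (item i arm-probabilities)
end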

If we interpret "winning" as "making a successful prediction", a multi-arm bandit may be thought of as a formal model of choosing between competing scientific theories. After all, we don't really know whether any of our scientific theories are really true - all we know is the extent to which they enable us to make correct predictions about the future.

Zollman (2007) uses multi-arm bandits to model competing scientific research programmes. Whereas Zollman considers multi-arm bandits with a fixed number of arms (representing, for example, the set of theories which are currently "on the table"), this model uses variable-arm bandits to model the search for new scientific theories which are potentially more successful than our current ones.

A variable-arm bandit is a slot machine which may have any countable number of arms, from zero to infinitely many. In each iteration of play, a scientist may take one of the following actions:

  1. Attach a new arm to the bandit. (We assume a newly attached arm is randomly assigned a probability of winning. This probability is held constant the entire time the arm is attached to the bandit.)
  2. Remove an arm from the bandit.
  3. Pull a selected arm.

In the case of variable-arm bandits, the learning problem concerns a three-way tradeoff between exploitation, exploration and innovation. The question is: can boundedly rational scientists eventually work their way, by attaching arms and exploring their success, to settling on an arm whose probability of winning is arbitrarily close to 1? This would correspond to settling on a scientific theory which always (or almost always) makes successful predictions.
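
For concreteness, the Procedures below store a bandit as a list of two lists, [ [...colours...] [...probabilities...] ], so attaching or removing an arm is a simple list operation. The following sketch illustrates (attach-arm and detach-arm are hypothetical names; the model's own versions are add-arm-with-colour-and-probability and remove-arm-from-bandit):

; attach a new arm: append its colour id and its winning probability
to-report attach-arm [ bandit-state col prob ]
  report (list (lput col (item 0 bandit-state))
               (lput prob (item 1 bandit-state)))
end

; remove the arm with the given colour id
to-report detach-arm [ bandit-state col ]
  let pos (position col (item 0 bandit-state))
  report (list (remove-item pos (item 0 bandit-state))
               (remove-item pos (item 1 bandit-state)))
end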

Here, we assume that scientific researchers are boundedly rational and learn via reinforcement learning. The reinforcement learning occurs via an urn model, specifically a Hoppe-Polya urn. This is an urn which begins with a single black ball inside. Now suppose a scientist draws a ball from the urn.

  1. If the black ball is drawn, the scientist attaches a new arm to the bandit and pulls it. If she wins, she colour-codes the arm (with a unique colour) and adds a ball of that colour to the urn. If she does not win, she removes the arm and throws it away. In either case, the black ball is returned to the urn as well.
  2. If a coloured ball is drawn, the scientist pulls the arm of the bandit with the corresponding colour. If she wins, she reinforces by adding another ball of the same colour to the urn. (The ball drawn is always returned to the urn, regardless of whether the pull results in a win.)
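
Putting this together, one iteration of a single researcher's learning amounts to the following (a condensed sketch; learning-step is a hypothetical name, and the model itself implements this logic in the experiment, possibly-add-arm and pull-bandit procedures listed under Procedures below):

; one learning step for a researcher (the black ball is coded as 0)
to learning-step
  let ball draw-ball-from-urn
  ifelse (ball = 0)
  [
    ; innovation: try out a brand-new arm, keeping it only if it
    ; wins its very first pull (a losing arm is simply discarded)
    let prob new-arm-probability
    if (random-float 1.0 <= prob)
      [ add-arm-with-colour-and-probability next-ball prob ]
    set next-ball (next-ball + 1)
  ]
  [
    ; exploitation: pull the arm matching the coloured ball;
    ; a win adds another ball of that colour to the urn
    pull-bandit ball
  ]
end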

That's the basic model. Now, one can prove that, in the limit, the standard Hoppe-Polya urn will end up containing infinitely many colours; hence the scientist will, in the limit, attempt to attach infinitely many arms to the bandit. Consider now the following questions: will such a scientist learn to play an arm with a probability of winning arbitrarily close to one? If so, can social dynamics help a population of scientific researchers achieve consensus on an optimal theory (or at least a better theory) more quickly than individuals working in isolation can?

How to use it.

Click on "Setup" to initialise the model according to the settings indicated. Once the model has been initialised, clicking on "Go" will cause the model to run forever until the "Go" button is clicked again.

The variables which can be configured fall into two categories: model parameters (which crucially determine what the model does) and display parameters (which determine how the model looks onscreen).

Model parameters.

Parameters not discussed below are not properly integrated into the current version.

number-of-researchers: The number of agents used in the model. This number remains constant unless dynamic-population? is switched on.

enable-forgetting?: If on, agents will deinforce what they have previously learned according to the method of forgetting selected by forgetting-type.

forgetting-type: Can be set to one of two values, "Discrete" or "Discount the past":

If discrete forgetting has been selected, the variable forgetting-rate indicates the frequency with which agents remove a ball from the urn (thereby deinforcing what they have previously learnt). A ball is selected as follows: first a colour (other than black) is chosen at random, then a ball of that colour is discarded. This deinforcement process was chosen because it has proved particularly successful at achieving efficient signalling systems in Lewis sender-receiver games.

If discounting the past has been selected, the interpretation of forgetting-rate changes: it now stands for the discount factor applied to the weight attached to each colour in the urn at the end of each iteration. (The black ball's weight is always reset to 1, and any colour whose relative weight falls below a small threshold is pruned from the urn, along with the corresponding arm.) Weights attached to colours are, of course, no longer restricted to integer values in this case. For example, with forgetting-rate 0.9, urn counts [1 4 10] for colours [black red green] become [1 3.6 9].

allow-social-learning?: If on, agents will engage in social learning of the type indicated by social-learning-kind.

social-learning-kind: Can be set to one of three possible values: "Random visitation", "Preferential attachment" or "None: Moran process".

If random visitation is selected, each agent selects another researcher at random, at the frequency specified by social-visitation-rate, and reaches into that researcher's urn to sample a ball at random. If the visited researcher's efficiency (defined below) exceeds the agent's own, the agent adds a ball of the corresponding colour to her urn and, if needed, attaches the appropriate arm to her bandit (i.e., one having the same - unknown - probability of winning as that of the researcher she visited). (If the black ball is sampled, nothing is learned from that visit.) Visitations are shown using arrows for five iterations; each arrow fades gradually over time until it disappears. This provides a short visual record of the visitation history without the screen becoming too cluttered.

If preferential attachment is selected, agents choose to connect to another agent with probability proportional to that agent's efficiency at making correct predictions. The efficiency of an agent is defined to be the number of non-black balls in their urn divided by the number of iterations that agent has been playing; this measures how successful the agent has been over their lifespan. For example, an agent who has accumulated 30 non-black balls over 60 iterations has an efficiency of 0.5.

If a Moran process is selected, then, at the frequency specified by social-visitation-rate, an agent is selected with probability proportional to their efficiency, as defined above, and cloned. Once cloned, an agent is selected at random (using a uniform probability distribution) to be eliminated. This simple evolutionary dynamic keeps the population size constant.

social-visitation-rate: The per-iteration probability with which social learning events of the selected kind occur.

distribution: Determines the distribution from which the winning probability of each newly attached arm is generated. Can be set to "Exponential", "Gamma" or "Uniform".

The shape of the Gamma distribution is determined by two parameters: alpha and beta. Increasing alpha shifts the probability mass to the right, increasing the likelihood that a new arm added to the bandit will have a high probability of winning.

The shape of the Exponential distribution is controlled by a single parameter, lambda, which here gives the mean of the distribution. Higher values of lambda increase the heaviness of the tail.

If Uniform is selected, each new arm's winning probability is simply drawn uniformly from [0, 1]. In the Exponential and Gamma cases, the sampled value x lies in [0, ∞) and is squashed into [0, 1) via (atan x 1) / 90 (see new-arm-probability in the Procedures below).

dynamic-population?: If on, allows the population size to vary according to the settings specified in mortality-rate and new-researcher-rate. Notice that if the mortality rate is greater than the rate at which new researchers are introduced into the population, then the population will almost certainly go extinct. This may even happen if the mortality rate is set equal to the new-researcher rate. Perhaps somewhat oddly, if the population does go extinct, new researchers can still be introduced (as they are generated ex nihilo rather than being cloned).
mortality-rate: The per-iteration probability that a single researcher, selected at random from the population, is eliminated.

new-researcher-rate: The per-iteration probability that a new researcher is inserted into the population.

Display parameters.

show-id?: If on, shows the id number of each agent in the model. The id number can then be entered into the input box labelled agent-monitor, which causes the Urn monitor and Bandit monitor to display details of the urn and bandit for that agent.

show-bandits?: If on, tells the model to draw the current state of the bandit attached to each agent. Agents are displayed as blue nodes and the bandit is the attached "fan" of nodes connected to the agent. Because the initial state of the bandit is to have 0 arms, immediately after clicking "Setup" all you will see are blue nodes.

Each arm of the bandit is drawn as a node coloured on a continuum between red and green. Red represents probability 0 of winning and green represents probability 1 of winning.

show-arm-probabilities?: If on, labels each bandit arm node with the probability of winning. Not terribly useful once the bandit acquires more than a few arms.

draw-arm-links-in-greyscale?: If on, draws the edges connecting the nodes representing bandit arms using the following convention to indicate the probability of that arm being pulled: black (and, hence, invisible) indicates probability 0, white indicates probability 1, and shades of grey all probabilities in between.

show-social-visitation-links?: If on, draws edges to indicate which social visits / learning events have occurred. (Switching this option off slightly speeds up the model.)

researcher-node-size: Specifies how large to draw the blue node representing a researcher.

bandit-arm-node-size: Specifies how large to draw the nodes representing bandit arms.

bandit-arm-length: Specifies how long to draw the edges connecting the arm nodes to the researcher.

bandit-arm-cone-angle: Specifies how wide to make the "fan" representing the variable-arm bandit.

network-radius: If a circular node layout is used (rather than a spring layout), this specifies how large a radius to use.

spring-length: The spring layout effect needs to know the "natural length" of an edge connecting two nodes. This is the value used.

network-layout: Specifies how the researcher nodes are arranged onscreen: either "Ring" (a circular layout) or "Spring layout".
System Requirements.

The applet requires Java 5 or higher. Java must be enabled in your browser settings. Mac users must have Mac OS X 10.4 or higher. Windows and Linux users may obtain the latest Java from Sun's Java site.


View/download model file: scientific-research-programmes.nlogo


Procedures
breed [ researchers researcher ]
breed [ arms arm ]
breed [ arm-labels arm-label ]
undirected-link-breed [ arm-links arm-link ]
undirected-link-breed [ arm-label-links arm-label-link ]
directed-link-breed [ social-links social-link ]

globals [
  next-ball  ; counter assigning a unique colour id to each new ball/arm pair
]

researchers-own [
  age 
  previous-expected-accuracy
  expected-accuracy
  
  last-arm-pulled
  success?
  
  ; An urn is coded as a list of two lists:
  ;  - the first list is the set of ball colours
  ;  - the second list is the number of balls of each colour
  ; [ [...colours...] [...counts...] ]
  urn
  
  ; A bandit is coded as a list of two lists:
  ;  - the first list is the set of arm colours
  ;  - the second list is the winning probability
  ; [ [...colours...] [...probabilities...] ]
  bandit 
]

arm-links-own [
  link-colour  
]

arms-own [
  arm-colour 
  arm-probability
]

social-links-own [
  ticks-left-to-display 
]


to setup
  clear-all
  set next-ball 1
  
  ifelse ideologues? = false
  [
    create-researchers number-of-researchers [ 
      set success? false
      set shape "circle"
      set size researcher-node-size
      set color blue
      set urn [[0] [1]]
      set bandit []
      set age 1
      set previous-expected-accuracy 0
      set expected-accuracy 0
    ]
  ]
  [
    create-researchers number-of-researchers [ 
      set success? false
      set shape "circle"
      set size researcher-node-size
      set color blue
      set next-ball (next-ball + 1)
      set age 1
      set previous-expected-accuracy 0
      set expected-accuracy 0
    ]
    ask researchers [
      add-arm-with-colour-and-probability next-ball new-arm-probability
      set next-ball (next-ball + 1)
    ]
  ]

  update-display
  
  ask researchers [
    update-expected-accuracy
  ]
end

to go
  step
end

to step
  if dynamic-population?
  [
    if (random-float 1.0 < mortality-rate)
      [ kill-one-researcher ]
      
    if (random-float 1.0 < new-researcher-rate)
      [ add-new-researcher ]
  ]
  
  if (allow-mutation? = true)
  [
    if (random-float 1.0 < mutation-rate) 
    [
      kill-one-researcher
      add-new-researcher 
    ] 
  ]
  
  experiment
      
  if allow-social-learning?
    [ do-social-learning ]
  
  if enable-forgetting? 
    [ forget ]
  
  update-display
  
  ; increment the researchers age by one
  ask researchers [
    update-expected-accuracy
    set age (age + 1)
  ]
    
  draw-plots
  tick
end


; In a Moran process, we select 1 agent by their fitness (here
; we use the researcher-efficiency as a proxy) and clone that agent,
; removing another one from the population
to do-moran-process
  let sorted-researchers (sort researchers)
  let fitness-list (map [ [researcher-efficiency] of ? ] sorted-researchers)
  let researcher-to-clone (select-randomly-by-weight sorted-researchers fitness-list)
  
  comment (word "Researcher to clone: " researcher-to-clone)
  
  let new-researcher (clone-researcher researcher-to-clone)
  
  ask new-researcher [
    layout-arms-for-individual-researcher 
  ]
  
  kill-one-researcher 
end


; the following procedure creates a clone of the specified researcher.
; It does nothing regarding the layout of the individual nodes, though.
to-report clone-researcher [ researcher-to-clone ]
  ; clone the urn and bandit
  let new-researcher 0
  
  create-researchers 1 [
    set new-researcher self
    set xcor 0
    set ycor 0.1
    set label who
    
    comment (word "New researcher id: " self)
    
    set success? ([success?] of researcher-to-clone)
    set shape "circle"
    set size researcher-node-size
    set color ([color] of researcher-to-clone)
    set urn ([urn] of researcher-to-clone)
    set bandit ([bandit] of researcher-to-clone)
    set age 1
    set expected-accuracy ([expected-accuracy] of researcher-to-clone)
  ]
  
  ; create the required arms and links
  ask researcher-to-clone [
    
    let arm-links-to-clone ( sort my-arm-links) 
    
    comment (word "Number of arms to copy: " length arm-links-to-clone)
    
    foreach arm-links-to-clone [
      let arm [other-end] of ?
      comment (word "Arm colour: " [arm-colour] of arm)
      comment (word "Arm probability: " [arm-probability] of arm)
      
      hatch-arms 1 [ 
        set arm-colour ([arm-colour] of arm)
        set arm-probability ([arm-probability] of arm)
        set color ([color] of arm)
        set shape ([shape] of arm)
        set size ([size] of arm)
        set label ([label] of arm)
        
        create-arm-link-with new-researcher [
          comment (word "Link colour: " [color] of ? )
          set link-colour [link-colour] of ?
          set color [color] of ? 
          ifelse show-bandits? 
            [ show-link ]
            [ hide-link ]
        ]
      ] 
    ]
  ]
  
  report new-researcher
end

to kill-one-researcher
  if (count researchers >= 1) 
  [
    ask one-of researchers [
      ask my-arm-links [
        ask other-end [ die ]
        die
      ]
      die
    ]
  ]
end

to update-display
  ifelse show-bandits?
    [ 
      ask arms [ 
        set size bandit-arm-node-size
        show-turtle 
      ]
      ask arm-links [ show-link ] 
    ]
    [ 
      ask arms [ hide-turtle ] 
      ask arm-links [ hide-link ]
    ]
  
  ifelse show-social-visitation-links?
    [ ask social-links [ show-link ] ]
    [ ask social-links [ hide-link ] ]
    
  layout-researchers
  update-labels
  update-arm-link-colors
  update-social-link-colors
  layout-arms
end

to layout-researchers
  ifelse (count researchers = 1)
  [ 
    ask researchers [
      set xcor 0
      set ycor 0.1
      set size researcher-node-size
    ]
    stop
  ]
  [
    ask researchers [ set size researcher-node-size ] 
  ]
  
  if network-layout = "Ring"
    [ layout-circle (sort researchers) network-radius ]
    
  ifelse network-layout = "Spring layout" and ticks > 0
    [ layout-spring researchers social-links 0.2 spring-length 1 ]
    [ layout-circle (sort researchers) network-radius ]
end

; The following is a useful function to have because sometimes we want to lay
; out the arms for a single agent, rather than the entire population...
; researcher context
to layout-arms-for-individual-researcher
  let arm-list (sort arm-link-neighbors)
  let n (length arm-list)
  let theta (atan xcor ycor)
  let delta 0
  if n > 1 [
    set delta (bandit-arm-cone-angle / (n - 1))
  ]
  let i 0

  foreach arm-list [
    ask ? [
      move-to myself
      set heading theta
      if n > 1
      [
        lt bandit-arm-cone-angle / 2
        rt (delta * i)
      ]
      fd bandit-arm-length
      set i (i + 1)
    ]
  ]
end

to layout-arms 
  ask researchers [
    layout-arms-for-individual-researcher
  ]
end

to draw-plots
  if population-accuracy-plot = "Aggregate" [ 
    let total 0
    ask researchers [
      set total (total + expected-accuracy) 
    ]
    set-current-plot "Population accuracy"
    set-plot-pen-mode 0

    ifelse (count researchers != 0)
      [ plot total / (count researchers) ]
      [ plot 0.0 ]
  ]
  
  if population-accuracy-plot = "Individuals" [ 
    set-current-plot "Population accuracy"
    set-plot-pen-mode 0
    
    ask researchers [
      plot-pen-up
      plotxy (ticks - 1) previous-expected-accuracy
      plot-pen-down
      plotxy ticks expected-accuracy
    ]
  ]
  
  if population-accuracy-plot = "Histogram" [
    set-current-plot "Population accuracy"
    set-plot-pen-mode 1
    set-plot-x-range 0 1
    set-histogram-num-bars 20
    histogram [expected-accuracy] of researchers 
  ]
end

to update-labels
  ask researchers [
    ifelse show-id? = true
      [ set label who ]
      [ set label "" ]
  ]

  ask arm-labels [ die ]
  
  ask arms [
    ifelse show-arm-probabilities? = true
    [ 
      let prob (word "" arm-probability)
      if (length prob > 4)
        [ set prob (substring prob 0 4) ]
      
      hatch-arm-labels 1 [
        set label prob
        set heading 0
        set size 0
        fd 1
        create-arm-label-link-with myself [
          set color black
          tie 
        ]
      ]      
    ]
    [
      set label ""
    ]
  ]    
end

to update-arm-link-colors
  ask researchers [
    let tot total-number-of-balls-in-urn
    let cnt 0
    
    ask my-arm-links [
      let col link-colour
      ask myself [
        set cnt ( number-of-balls-in-urn-of-this-color col )
      ]
      ifelse draw-arm-links-in-greyscale? 
        [ set color (probability-to-greyscale (cnt / tot)) ]
        [ set color (probability-to-red-green-mix (cnt / tot)) ]
    ] 
  ]
end

to update-social-link-colors

  if social-learning-kind = "Random visitation" 
  [ 
    ask social-links [
      set ticks-left-to-display (ticks-left-to-display - 1)
      ifelse (ticks-left-to-display = 0)
        [ die ] 
        [
          let g (0.2 * ticks-left-to-display)
          set color (probability-to-greyscale g)
        ]
    ]
  ]
end

to add-new-researcher
  ifelse ideologues? = false
  [
    create-researchers 1 [
      set shape "circle"
      set size researcher-node-size
      set color blue
      set xcor 0.01
      set ycor 0.01
      set urn [[0] [1]]
      set bandit []
      set age 1
    ]
  ]
  [
    let new-researcher 0
    create-researchers 1 [
      set new-researcher self 
      set success? false
      set shape "circle"
      set size researcher-node-size
      set color blue
      set age 1
      set expected-accuracy 0
    ]
    ask new-researcher [
      add-arm-with-colour-and-probability next-ball new-arm-probability
      set next-ball (next-ball + 1)
    ]
  ]
end

to experiment
  ask researchers [
    let ball draw-ball-from-urn
    ifelse (ball = 0) [
      possibly-add-arm
    ]
    [
      pull-bandit ball
    ]
  ]
end

; researcher context
to possibly-add-arm
  let prob new-arm-probability
  ifelse (random-float 1.0 <= prob) 
  [
    ; Success! so add the arm
    add-arm-with-colour-and-probability next-ball prob
    set last-arm-pulled next-ball
    set success? true
    set next-ball (next-ball + 1)
  ]
  [
    set last-arm-pulled next-ball
    set success? false 
  ]
end

; researcher context
to add-arm-with-colour-and-probability [ col prob ]
  ifelse ideologues? = false
  [
    ifelse (length bandit) > 0
    [
      set urn (list (lput col (item 0 urn)) (lput 1 (item 1 urn) ))
      set bandit (list (lput col (item 0 bandit)) (lput prob (item 1 bandit)))
    ]
    [
      set urn (list (lput col (item 0 urn)) (lput 1 (item 1 urn) ))
      set bandit (list (list col) (list prob))
    ]
  ]
  [
    ; ideologues only have 1 theory
    set urn (list (list col ) (list 1 ))
    set bandit (list (list col) (list prob ) )
  ]
    
  hatch-arms 1 [
    set arm-colour col
    set arm-probability prob
    set shape "circle"
    set color (probability-to-red-green-mix prob)
    set label ""
    set size bandit-arm-node-size
    
    ifelse show-bandits?
      [ show-turtle ]
      [ hide-turtle ]
    
    create-arm-link-with myself [
      set link-colour col 
      ifelse show-bandits? 
        [ show-link ]
        [ hide-link ]
    ]
  ]
end

; researcher context
to forget
  if (forgetting-type = "Discrete") 
    [ forget-discretely ]
  
  if (forgetting-type = "Discount the past" )
    [ discount-the-past forgetting-rate ]
end

to forget-discretely
  ask researchers [
    ; pick a ball color at random if there's more than 1
    if (random-float 1.0 < forgetting-rate) [
      if number-of-colors-in-urn > 1 
      [
        let color-to-forget (item ((random (number-of-colors-in-urn - 1)) + 1) (item 0 urn))
        
        ifelse (number-of-balls-in-urn-of-this-color color-to-forget) > 1
        [
          deinforce-urn color-to-forget
        ]
        [
         ; we need to remove the arm from the bandit, too
          deinforce-urn color-to-forget
          remove-arm-from-bandit color-to-forget
          ask arm-link-neighbors with [ arm-colour = color-to-forget ] [ die ]
        ]
      ]
    ]
  ]
end

to discount-the-past [ rate ]
  ask researchers [
    let ball-colors (item 0 urn)
    comment (word "Ball colors: " ball-colors)
    
    let ball-counts (item 1 urn)
    comment (word "Ball counts: " ball-counts)
    
    let new-ball-counts map [ ? * rate ] ball-counts
    
    ; reset the count of the black ball to 1, if it exists
    let pos (position 0 ball-colors)
    if (pos != false)
    [
      set new-ball-counts (replace-item pos new-ball-counts 1)
    ]
    set urn (list ball-colors new-ball-counts)
    
    ; now prune the urn of colors below a certain threshold
    foreach ball-colors [
      if ( (number-of-balls-in-urn-of-this-color ?) / total-number-of-balls-in-urn < 0.000001)
      [
         remove-color-from-urn ?
         remove-arm-from-bandit ?
         ask arm-link-neighbors with [ arm-colour = ? ] [ die ]
      ]
    ]
  ]
end

to do-social-learning
  if social-learning-kind = "Random visitation"
    [ do-random-visits ]
    
  if social-learning-kind = "Preferential attachment"
    [ 
      do-preferential-attachment
      learn-from-network
    ]
    
  if social-learning-kind = "None: Moran process"
    [
      if (random-float 1.0 < social-visitation-rate) 
      [
        do-moran-process 
      ]
    ]
end

to do-preferential-attachment
  ask researchers [
    if (random-float 1.0 < social-visitation-rate and (count researchers != 1) ) [
      ; if we have previously linked to a researcher, consider rewiring
      ; if her efficiency is lower than mine...
      
      let my-efficiency -1
      let her-efficiency -1
      
      if (count my-out-social-links = 1) [
        set my-efficiency ( [researcher-efficiency] of self )
        let agent (item 0 [other-end] of my-out-social-links)
        set her-efficiency ( [researcher-efficiency] of agent )
        ifelse (my-efficiency < her-efficiency)
          [ stop ]
          [ ask my-out-social-links [ die ] ]
      ]
      
      let lst (sort other researchers)
      let weights (map [[researcher-efficiency] of ?] lst)
      let agent (select-randomly-by-weight lst weights)
      
      comment (word "Me: " self)
      comment (word "Selected agent: " agent)
      
      create-social-link-to agent [
        set color white
        set shape "stealth"
        set thickness .125
        ifelse show-social-visitation-links?
          [ show-link ]
          [ hide-link ] 
      ]
     
      
    ]
     
  ]
end

; researcher context
to learn-from-network
  ask researchers [
    comment (word "I am researcher " self)

    ask my-out-social-links [
      let arm ( [last-arm-pulled] of other-end )
      let succ ( [success?] of other-end )

      if succ = true [
        let arm-test false
        ask myself [ set arm-test (bandit-has-arm? arm) ]

        ifelse arm-test = true
          [ ask myself [ reinforce-urn arm ] ]
          [
            let prob 0
            ask other-end [
              set prob (get-arm-probability arm)
            ]
            ask myself [ add-arm-with-colour-and-probability arm prob ]
          ]
      ]
    ]
  ]
end

; observer context
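; Weighted random selection: r is drawn uniformly from [0, sum of weights);
; we then walk down the list, subtracting weights, until r falls within the
; weight of the current item.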
to-report select-randomly-by-weight [ lst weights ]
  let total-weight (sum weights)
  let r (random-float total-weight) 
  let list-position 0
  let item-selected false
  
  while [item-selected = false] [
    let current-weight (item list-position weights)

    ifelse r > current-weight
    [
      set r (r - current-weight)
      set list-position (list-position + 1)
    ]
    [ set item-selected true]
  ]
  
  report item list-position lst
end

to do-random-visits
  ask researchers [
    if (random-float 1.0 < social-visitation-rate and (count researchers != 1)) [
      let visiting-researcher self
      let my-efficiency researcher-efficiency
      comment "Attempting social learning..."
      comment (word "  My efficiency (" who "): " my-efficiency)
      
      let her-efficiency 0
      
      ; the values of -1 will be changed before they are used.
      let ball-colour -1
      let arm-probability-to-add -1
      let have-sample-from-other-urn false
      
      ask one-of other researchers [       
        set her-efficiency researcher-efficiency
        comment (word "  Her efficiency (" who "): " her-efficiency)
        
        ; it's ok that we get this information, since we only use it later if 
        ; the efficiency of this other researcher is higher
        set ball-colour draw-ball-from-urn
        comment (word "  colour ball sampled from urn: " ball-colour)
        if (ball-colour != 0) ; skip, if we got the black ball...
        [
          set arm-probability-to-add (get-arm-probability ball-colour)
          set have-sample-from-other-urn true
        ]
        
        ; This just creates a social link to show the visit that occurred
        create-social-link-from visiting-researcher [
          set ticks-left-to-display 5
          set color white
          set shape "mylink"
          set thickness .125
          ifelse show-social-visitation-links?
            [ show-link ]
            [ hide-link ]
        ] 
      ]
      
      if (her-efficiency > my-efficiency and have-sample-from-other-urn)
      [
        ifelse ( (number-of-balls-in-urn-of-this-color ball-colour) = 0 )
          [ add-arm-with-colour-and-probability ball-colour arm-probability-to-add ]
          [ reinforce-urn ball-colour ]
        
        comment "  Social learning occurred!"
        comment "  Resulting urn / bandit state:"  
        comment (report-info who)
        
      ]
    ] 
  ]
end

to pull-bandit [ c ]
  set last-arm-pulled c
  
  let r (random-float 1.0)
  ifelse (r <= (get-arm-probability c)) ; we won!
  [
    reinforce-urn c
    set success? true
  ]
  [ set success? false ]
end

; researcher context
to-report bandit-has-arm? [ col ]
  ifelse (length bandit) = 0
    [ report false ]
    [
      let arm-colours (item 0 bandit)
      report member? col arm-colours
    ]
end

to-report get-arm-probability [ col ]
  let arm-colours (item 0 bandit)
  let arm-probs (item 1 bandit)
  let pos (position col arm-colours)
  report (item pos arm-probs)
end

to introduce-newton
  create-researchers 1 [
    show who
    set shape "circle"
    set size researcher-node-size
    set color white
    set age 1
    set urn [ [ 0 ] [ 1 ] ]
    set bandit []
    add-arm-with-colour-and-probability next-ball 1.0

    set next-ball (next-ball + 1)
  ]
  update-display
end

; Sample the winning probability for a newly attached arm.  The Exponential
; and Gamma draws lie in [0, infinity), so they are squashed into [0, 1)
; via (atan x 1) / 90.  (Note: NetLogo's random-exponential takes the mean
; of the distribution as its argument.)
to-report new-arm-probability
  let x 0
  
  if distribution = "Exponential" [
    set x (random-exponential lambda)
    report (atan x 1 / 90.0)
  ] 
  
  if distribution = "Gamma" [
    set x (random-gamma alpha beta)
    report (atan x 1 / 90.0)
  ]
  
  if distribution = "Uniform" [
    report (random-float 1.0)
  ]
end

to-report total-list [ alist ]
  let total 0
  foreach alist [ set total ( total + ? ) ]
  report total
end

; researcher context
to-report total-number-of-balls-in-urn
  report total-list (item 1 urn)
end

; researcher context
to-report number-of-balls-in-urn-of-this-color [ c ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let pos (position c ball-colors)
  ifelse (pos = false)
    [ report 0 ]
    [ report item pos ball-counts]
end

; researcher context
to-report number-of-colors-in-urn
  report length (item 0 urn)
end

; researcher context
to-report ball-color-probability [ col ]
  report (number-of-balls-in-urn-of-this-color col) / total-number-of-balls-in-urn
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
; researcher context
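; Draw a ball at random: a colour is selected with probability proportional
; to its count (or, under discounting, its weight) in the urn.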
to-report draw-ball-from-urn
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let number-of-balls (sum ball-counts )
  let r (random-float number-of-balls)
  let list-position 0
  let ball-selected false
  
  while [ball-selected = false] [
    let current-ball-count (item list-position ball-counts)
    ifelse r < current-ball-count
    [
      set ball-selected true 
    ]
    [
      set r (r - current-ball-count)
      set list-position (list-position + 1)
    ]
  ]
  
  report item list-position ball-colors
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
; researcher context
to reinforce-urn [ col ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let list-position (position col ball-colors)
  let cnt (item list-position ball-counts)
  set ball-counts (replace-item list-position ball-counts (cnt + 1))
  set urn (list ball-colors ball-counts)
end

; the structure of an urn is [ [ ...colours...] [ ...numbers...] ]
; researcher context
to deinforce-urn [ col ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let list-position (position col ball-colors)
  let cnt (item list-position ball-counts)
  ifelse (cnt > 1) 
  [
    set ball-counts (replace-item list-position ball-counts (cnt - 1))
    set urn (list ball-colors ball-counts)
  ]
  [
    set ball-colors (remove-item list-position ball-colors)
    set ball-counts (remove-item list-position ball-counts)
    set urn (list ball-colors ball-counts)
  ]
end

; researcher context
to remove-color-from-urn [ col ]
  let ball-colors (item 0 urn)
  let ball-counts (item 1 urn)
  let pos (position col ball-colors)
  set ball-colors (remove-item pos ball-colors)
  set ball-counts (remove-item pos ball-counts)
  set urn (list ball-colors ball-counts)
end

; researcher context
to remove-arm-from-bandit [ c ]
  let arm-colours (item 0 bandit)
  let arm-probs (item 1 bandit) 
  let pos (position c arm-colours)
  
  set arm-colours (remove-item pos arm-colours)
  set arm-probs (remove-item pos arm-probs)
  set bandit (list arm-colours arm-probs)
end

; researcher context
; Efficiency = number of non-black balls in the urn (i.e., the number of
; reinforcements received) divided by the researcher's age, as described
; under "Model parameters" above.
to-report researcher-efficiency
  let actual-count (total-number-of-balls-in-urn - 1)
  report actual-count / age
end


;; some helper reporters

; report the urn / bandit of the researcher whose id has been entered into
; the agent-monitor input box (used by the Urn and Bandit monitors)
to-report retrieve-urn
  report [urn] of researcher agent-monitor
end

to-report retrieve-bandit
  report [bandit] of researcher agent-monitor
end

; observer context
to-report report-info [ index ]
  let o1 (report-urn index)
  let o2 (report-bandit index)
  report (word o1 "\n" o2)
end

; observer context
to show-info [ index ]
  print (report-info index)
end



; observer context
to-report report-urn [ num ]
  let output "[ "
  
  ask researcher num [
    (foreach (item 0 urn) (item 1 urn) [
      set output (word output ?1 " -> " ?2 ", ")
    ])
  ]
  set output (word (trim-string output 2) " ]")
  report output  
end

to show-urn [ num ]
  print report-urn num
end

; observer context
to-report report-bandit [ num ]
  let output "[ "
  
  ask researcher num [
    (foreach (item 0 bandit) (item 1 bandit) [
      let prob (word "" ?2)
      if (length prob > 4)
        [ set prob (substring prob 0 4) ]
      set output (word output ?1 " -> " prob ", ")
    ])
  ]
  set output (word (trim-string output 2) " ]")
  report output
end

to show-bandit [ num ]
  print report-bandit num 
end

to comment [ string ]
  if show-console-comments?
    [ print string ]
end

; report the string with its last 'chars' characters removed
to-report trim-string [ string chars ]
  report substring string 0 ((length string) - chars)
end

to-report probability-to-greyscale [ p ]
  let g (round (255 * p))
  report (list g g g )
end

to-report probability-to-red-green-mix [ p ]
  let r (round (255 * (1 - p)))
  let g (round (255 * p))
  report (list r g 0 )
end

; researcher context
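; Expected accuracy = sum, over the coloured balls in the urn, of
; P(drawing that colour) * P(the corresponding arm wins a pull).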
to update-expected-accuracy
  let ball-colors (item 0 urn)
  set previous-expected-accuracy expected-accuracy
  set expected-accuracy 0
  
  foreach ball-colors [
    if (? != 0) [
       set expected-accuracy (expected-accuracy + ((ball-color-probability ?) * (get-arm-probability ?)))
    ]
  ]
end