Q-Learning in MDPs

Author: Larry Lin
Written in NetLogo 5.0.5
Model code:

patches-own[
  q-val-north
  q-val-south
  q-val-east
  q-val-west
]
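
;; note: alpha, gamma, reward, winning-state-value and losing-state-value
;; are not declared in this listing; they are presumably defined as
;; sliders on the model's interface tab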

to setup
  ca
  reset-ticks
  
  ;; set initial q-values and patch colors
  set-patch
  
  ;; create agent
  crt 1[
    setxy 0 0
    set shape "car"
    set size 0.5
    set color yellow
    set heading 0
  ]
end 

to go
  tick
  
  ask turtle 0[
    ;; retrieve current x and y coordinates
    let c-xcor xcor
    let c-ycor ycor
    
    ;; choose the intended direction uniformly at random:
    ;; N, E, S, W each with probability 0.25
    set heading ((random 4) * 90) ;; headings 0, 90, 180, 270
    
    ;; stochastic transitions: 0.8 chance of moving in the intended
    ;; direction, 0.1 chance each of slipping to the left or right
    let prob random-float 1
    
    ifelse(prob < 0.8)[
      ;; move in the intended direction
      ;; no change in heading required
    ][
      ifelse(prob < 0.9)[
        ;; slip to the left (-90) of the intended direction
        set heading (heading - 90)
      ][
        ;; slip to the right (+90) of the intended direction
        set heading (heading + 90)
      ]
    ]
    
    ;; after setting direction, move forward 1 step
    fd 1
    
    ;; the blue patch at (1, 1) acts as a wall: if the move landed
    ;; there, step back to the previous cell
    if(xcor = 1) and (ycor = 1)[
      bk 1
    ]
        
    ;; update the q-value for the state-action pair just taken
    set-qval c-xcor c-ycor heading xcor ycor
    
    ;; reset the agent to the start if it has reached the winning (green)
    ;; or losing (red) terminal state; black patches are ordinary cells
    ;; and the blue wall is never entered
    if ([pcolor] of patch-here != black) and ([pcolor] of patch-here != blue)[
      setxy 0 0
    ]
    
  ]
    
  ;; redraw the patches and refresh their q-value labels
  set-patch
end 
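
;; Illustrative sketch (not part of the original model): the uniform random
;; action choice in GO could be replaced by epsilon-greedy selection that
;; exploits the learned q-values. `epsilon` here is a hypothetical slider
;; (e.g. 0.1); everything else uses only the patch variables defined above.
to choose-action ;; turtle procedure
  ifelse random-float 1 < epsilon [
    ;; explore: random direction
    set heading ((random 4) * 90)
  ][
    ;; exploit: pick the action with the highest q-value on this patch
    let vals [(list q-val-north q-val-east q-val-south q-val-west)] of patch-here
    set heading ((position (max vals) vals) * 90) ;; N = 0, E = 90, S = 180, W = 270
  ]
end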

to set-qval [cur-xcor cur-ycor cur-heading new-xcor new-ycor]
  
  ;; optimal future value: max over a' of Q(s', a')
  let opt-fut-val 0
  
  ;; compute optimal future value
  ask patch new-xcor new-ycor[
    set opt-fut-val (max (list q-val-north q-val-east q-val-south q-val-west))
  ]
  
  ;; apply the computed update to Q(s, a) for the action taken;
  ;; (precision ... 1) rounds to one decimal place so labels stay readable
  ask patch cur-xcor cur-ycor[
    if(cur-heading = 0)[
      ;; north
      set q-val-north (precision (q-val-north + alpha * (reward + (gamma * opt-fut-val) - q-val-north)) 1)
    ]
    if(cur-heading = 90)[
      ;; east
      set q-val-east (precision (q-val-east + alpha * (reward + (gamma * opt-fut-val) - q-val-east)) 1)
    ]
    if(cur-heading = 180)[
      ;; south
      set q-val-south (precision (q-val-south + alpha * (reward + (gamma * opt-fut-val) - q-val-south)) 1)
    ]
    if(cur-heading = 270)[
      ;; west
      set q-val-west (precision (q-val-west + alpha * (reward + (gamma * opt-fut-val) - q-val-west)) 1)
    ]
  ]
end 
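
;; the update applied above is the standard one-step Q-learning rule:
;;   Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))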

to set-patch
  
  ask patches[
    set pcolor black
  ]
  
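  ;; blue patch: a wall the agent can never occupy (see GO)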
  ask patch 1 1[
    set pcolor blue
    set q-val-west 0
    set q-val-north 0
    set q-val-east 0
    set q-val-south 0
  ]
    
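  ;; green patch: winning terminal state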
  ask patch 3 2[
    set pcolor green
    set q-val-west winning-state-value
    set q-val-north winning-state-value
    set q-val-east winning-state-value
    set q-val-south winning-state-value
  ]
  
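  ;; red patch: losing terminal state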
  ask patch 3 1[
    set pcolor red
    set q-val-north losing-state-value
    set q-val-east losing-state-value
    set q-val-south losing-state-value
    set q-val-west losing-state-value
  ]
  
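  ;; show each patch's q-values as its label, in the order (west north east south)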
  ask patches[
    set plabel (list q-val-west q-val-north q-val-east q-val-south)
  ]
end 
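
To train the model, run setup once and then go repeatedly; for example, from NetLogo's command center:

setup
repeat 10000 [ go ]

Once the q-values have stabilized, the greedy policy can be read off each patch. A minimal sketch of a reporter that does this, assuming only the q-val-* patch variables defined above (the reporter name is illustrative, not part of the original model):

to-report best-heading ;; patch procedure: heading of the highest-valued action
  let vals (list q-val-north q-val-east q-val-south q-val-west)
  report (position (max vals) vals) * 90
end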
