Deep Neural Networks and Dropout


Jacob Samson (Author)

Model group MAM-2015
Model was written in NetLogo 5.2.0-RC5



WHAT IS IT?

This is a model of arbitrarily large neural networks. It is based on the Multilayer Perceptron model, but the network architecture is user-determined.

This model is intended to provide a visualization of the process of neural network training, and to serve as a platform for experimentation with an eye toward qualitative intuitions.

HOW IT WORKS

Initially the weights on the links of the network are random, scaled according to the size of the layer each link comes from.
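
This initialization scheme can be sketched in Python (an illustrative sketch of the same idea, not part of the model's NetLogo code): each weight is drawn uniformly from [-r, r] with r = 1 / sqrt(fan-in), matching the `connect-all` procedure below.

```python
import random

def init_weights(fan_in, fan_out):
    """Random initial weights scaled by layer size: each weight is drawn
    uniformly from [-r, r] with r = 1 / sqrt(fan_in)."""
    r = 1.0 / (fan_in ** 0.5)
    return [[random.uniform(-r, r) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

Scaling by fan-in keeps the weighted sums feeding each node at a similar magnitude regardless of how wide the previous layer is.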

The nodes in the leftmost layer are called the input nodes, the nodes in the middle layers are called the hidden nodes, and the nodes in the rightmost layer are called the output nodes.

The activation values of the input nodes are the inputs to the network. Each hidden and output node computes its activation by taking the activations of the layer before it, multiplying each by the weight of the corresponding link, summing the results, and passing the sum through the tanh function. Each output node reports 1 if its activation is greater than 0 and -1 if it is less than 0.

The tanh function maps negative inputs to values between -1 and 0, and positive inputs to values between 0 and 1. The output increases nonlinearly from -1 toward 1, changing most steeply around 0.
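
The activation rule for a single node can be sketched in Python (illustrative only; the model's own NetLogo implementation appears in `new-activation` below):

```python
import math

def node_activation(inputs, weights):
    """Activation of one hidden or output node: the weighted sum of the
    previous layer's activations, passed through tanh."""
    return math.tanh(sum(a * w for a, w in zip(inputs, weights)))

# Tiny example: one node with three incoming links.
acts = [1.0, -1.0, 0.5]    # activations of the previous layer
wts  = [0.2,  0.4, -0.1]   # link weights
out = node_activation(acts, wts)   # tanh(0.2 - 0.4 - 0.05)
```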

To train the network, many examples are presented to it along with the correct classification of each. The network uses a back-propagation algorithm to pass error backwards from the output nodes and uses this error to update the weight on each link.
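
One training step can be sketched for a tiny 2-1-1 network (a simplified illustration of the update rule the model uses; the model's per-node learning coefficients are omitted here for clarity):

```python
import math

def backprop_step(x, target, w1, w2, lr):
    """One forward/backward pass for a 2-input, 1-hidden, 1-output tanh net.
    Output error is (target - activation); hidden error is scaled by the
    tanh derivative (1 - a^2); each weight moves by lr * downstream error
    * upstream activation."""
    # Forward pass
    h = math.tanh(sum(xi * wi for xi, wi in zip(x, w1)))
    y = math.tanh(h * w2)
    # Error fed backwards from the output
    err_out = target - y
    err_hid = (1 - h * h) * (w2 * err_out)
    # Weight updates
    w2_new = w2 + lr * err_out * h
    w1_new = [wi + lr * err_hid * xi for xi, wi in zip(x, w1)]
    return w1_new, w2_new
```

After one such step the network's output moves closer to the target, which is exactly what repeating the step over many examples exploits.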

If dropout learning is enabled, hidden nodes are randomly dropped from training at each training step. This helps prevent the model from "overfitting", that is, from drawing overly strong conclusions about the entire data set based on the examples it has seen so far.
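
The per-step dropout decision can be sketched as follows (illustrative; the model makes the same per-mille draw for each hidden node in its `train` procedure):

```python
import random

def dropped_nodes(n_hidden, dropout_rate):
    """Mark each hidden node as dropped with probability dropout_rate / 1000,
    the same per-mille convention the DROPOUT-RATE slider uses. Dropped
    nodes take no part in this training step; the mask is resampled
    every step."""
    return [random.randrange(1000) < dropout_rate for _ in range(n_hidden)]
```

For example, with `dropout_rate = 500` each hidden node sits out roughly half of the training steps, so no single node can dominate the learned representation.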

HOW TO USE IT

Press SETUP to load the training data and initialize the patches.

Enter a bracketed list of space-separated positive integers (for example, "[30 20]") into HIDDEN-SIZES-STRING to set the number of hidden nodes in each hidden layer (and, implicitly, how many hidden layers there are).
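
The model reads this string with NetLogo's `read-from-string`; the equivalent parsing can be sketched in Python (illustrative only):

```python
def parse_hidden_sizes(s):
    """Parse a string like "[30 20]" into a list of hidden-layer sizes."""
    return [int(tok) for tok in s.strip("[]").split()]

# parse_hidden_sizes("[30 20]") describes two hidden layers,
# with 30 and 20 nodes respectively.
```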

Press INIT-NET to initialize the network.

Press TRAIN ONCE to run one epoch of training. The number of examples presented to the network during this epoch is controlled by the EXAMPLES-PER-EPOCH slider.

Press TRAIN to continually train the network.

In the view, the more intense a link's color, the greater the absolute value of its weight. Red links have positive weights; blue links have negative weights.

LEARNING-RATE controls how much the neural network will learn from any one example.

DROPOUT? activates dropout learning.

DROPOUT-RATE controls the probability, expressed per 1000, that each hidden node drops out at each training step. For example, a value of 500 drops each hidden node with probability 0.5.

THINGS TO NOTICE

As the network trains, high-weight edges (intuitively, connections that the model is placing importance on) become brighter. This exposes the actual process of learning in an intuitive visual medium, and can sometimes be extremely informative.

THINGS TO TRY

Manipulate the HIDDEN-SIZES-STRING. What happens to the speed of learning as the number of nodes increases? What happens to the accuracy? How does this affect the visualization of the model?

EXTENDING THE MODEL

The ability to set a threshold weight under which edges become transparent could make the model visualization less cluttered and easier to interpret.

The model would be a viable candidate for extension with any standard neural network optimization, which could then be compared against plain training in the same way dropout is compared here.

NETLOGO FEATURES

This model uses the link primitives. It also makes heavy use of lists. Additionally, the csv extension is used to load the MNIST data set.

COPYRIGHT AND LICENSE

Copyright 2006 Uri Wilensky.

CC BY-NC-SA 3.0

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at uri@northwestern.edu.



extensions [csv]

links-own [
  weight         ;; Weight given to end1 activation by end2
  inlayer        ;; Layer index of end1
]

breed [bias-nodes bias-node]
breed [input-nodes input-node]
breed [output-nodes output-node]
breed [hidden-nodes hidden-node]

turtles-own [
  activation     ;; Determines the nodes output
  err            ;; Used by backpropagation to feed error backwards
  layer          ;; Layer of network node is contained in. Used for agentset manipulation.
  estvar         ;; Estimated variance of error signal
  learncoeff     ;; Individual learning coefficient
  dropped?       ;; Boolean true if node currently dropped
]

globals [
  epoch-error    ;; Measurement of how many training examples the network got wrong in the epoch
  input-size     ;; Size of inputs
  hiddensizes    ;; Vector of layer sizes, determines net topology
  output-size    ;; Size of outputs
  traindata      ;; Training data
  testdata       ;; Testing data
]

;;;
;;; LOAD FILES
;;;

to load-files
  file-close-all
  file-open "mnist_train.csv"
  set traindata (list)
  repeat 20000 [
    set traindata lput (csv:from-row file-read-line) traindata
  ]
  file-close
  set testdata csv:from-file "mnist_test.csv"
end 

;;;
;;; SETUP PROCEDURES
;;;

;; Set patches, shapes, and files (invariant under node change)

to setup
  clear-all
  ask patches [ set pcolor gray ]
  set-default-shape bias-nodes "bias-node"
  set-default-shape input-nodes "circle"
  set-default-shape output-nodes "circle"
  set-default-shape hidden-nodes "output-node"
  set-default-shape links "small-arrow-shape"
  load-files
end 

;; Set up nodes and links, initialize values, and propagate so
;; that activations make sense

to init-net
  clear-links
  clear-turtles
  clear-plot
  setup-nodes
  setup-links
  recolor
  set-learncoeffs
  propagate
  reset-ticks
end 

;; Create, initialize, and position the nodes in the network

to setup-nodes
  set input-size 400
  set hiddensizes read-from-string hidden-sizes-string
  set output-size 10
  let l-index 0
  let index 0
  
  create-bias-nodes 1 [
     setxy nodex l-index nodey l-index index (input-size + 1)
     set activation 1
     set layer l-index
     set dropped? false
  ]
  
  set index 1
  
  repeat input-size [
    create-input-nodes 1 [
      setxy nodex l-index nodey l-index index (input-size + 1)
      set activation ((random 2) * 2) - 1
      set layer l-index
      set dropped? false
    ]
    set index index + 1
  ]
  
  set l-index 1
  set index 0
  
  foreach hiddensizes [
    
    create-bias-nodes 1 [
       setxy nodex l-index nodey l-index index (? + 1)
       set activation 1
       set layer l-index
       set dropped? false
    ]
     
    set index 1
     
    repeat ? [
     create-hidden-nodes 1 [
       setxy nodex l-index nodey l-index index (? + 1)
       set activation ((random 2) * 2) - 1
       set layer l-index
       set dropped? false
     ]
     set index index + 1
    ]
    
    set l-index l-index + 1
    set index 0
  ]
  
  repeat output-size [
    create-output-nodes 1 [
      setxy nodex l-index nodey l-index index output-size
      set activation ((random 2) * 2) - 1
      set layer l-index
      set dropped? false
    ]
    set index index + 1
  ]
  ask turtles [set size 0.5]
end 

;; Create and initialize links between nodes in the network

to setup-links
  let l-index 0
  repeat (length hiddensizes) [
   connect-all (turtles with [layer = l-index]) (hidden-nodes with [layer = (l-index + 1)])
   set l-index l-index + 1
  ]
  connect-all (turtles with [layer = l-index]) (output-nodes with [layer = (l-index + 1)])
end 

;; Completely connect nodes1 to nodes2 with links

to connect-all [nodes1 nodes2]
  let r 1 / (sqrt (count nodes1))
  ask nodes1 [
    create-links-to nodes2 [
      set weight random-float (2 * r) - r
      set inlayer [layer] of end1
    ]
  ]
end 

;; Adjust color of nodes and edges according to values

to recolor
  ask turtles [
    set color item (step activation) [black white]
  ]
  let l-index 0
  let maxw 0
  repeat (length hiddensizes) + 1 [
   set maxw max [abs weight] of links with [inlayer = l-index]
   ask links with [inlayer = l-index] [
     let wquotient (weight / maxw)
     let colorstr (wquotient * 127)
     let colorvec (list (colorstr + 127) (127 - (abs colorstr)) (127 - colorstr) 196)
     set color colorvec
   ]
   set l-index l-index + 1
  ]
  ask turtles [
    if dropped? [set color [127 127 127]]
  ]
  ask links [
    if ([dropped?] of end1) or ([dropped?] of end2) [set color [127 127 127 196]]
  ]
end 

;; Set the local learning rate coefficients for the nodes

to set-learncoeffs
  let l-index ((length hiddensizes) + 1)
  let v (1 / (item (l-index - 2) hiddensizes))
  let lc (1 / ((item (l-index - 2) hiddensizes) * sqrt v))
  
  ask output-nodes [
    set estvar v
    set learncoeff lc
  ]
  
  set l-index (l-index - 1)
  
  repeat (length hiddensizes) - 1 [
    set v (((count hidden-nodes with [layer = (l-index - 1)]) * v) / (item (l-index - 2) hiddensizes))
    set lc (1 / ((item (l-index - 2) hiddensizes) * sqrt v))
    
    ask hidden-nodes with [layer = l-index] [
      set estvar v
      set learncoeff lc
    ]
    
    set l-index (l-index - 1)
  ]
  
  set v (((count input-nodes) * v) / (count input-nodes))
  set lc (1 / ((count input-nodes) * (sqrt v)))
  
  ask hidden-nodes with [layer = l-index] [
    set estvar v
    set learncoeff lc
  ]
end 
    

;;;
;;; VISUAL LAYOUT FUNCTIONS
;;;

;; Find the appropriate x coordinate for this layer

to-report nodex [l-index]
  report min-pxcor + (((l-index + 1) * (world-width - 1)) / (length hiddensizes + 3))
end 

;; Find the appropriate y cooridinate for this node

to-report nodey [l-index index in-layer]
  report max-pycor - (((index + 1) * (world-height - 1)) / (in-layer + 1))
end 

;;;
;;; TRAINING PROCEDURES
;;;

to train
  set epoch-error 0
  let currentdatum (one-of traindata)
  let sortin sort input-nodes
  let index 0
  let target n-values 10 [ifelse-value (? = (item 0 currentdatum)) [1][-1]]
  
  repeat examples-per-epoch [
    if dropout? [
      ask hidden-nodes [if (random 1000) < dropout-rate [set dropped? true]]
    ]
    repeat (length sortin) [
      ask (item index sortin) [set activation ((item (index + 1) currentdatum) / 127.5) - 1]
      set index index + 1
    ]
    propagate
    back-propagate  target
    set index 0
    set currentdatum (one-of traindata)
    set target n-values 10 [ifelse-value (? = (item 0 currentdatum)) [1][-1]]
    ask hidden-nodes [set dropped? false]
  ]
  set epoch-error epoch-error / examples-per-epoch
  tick
end 

;;;
;;; PROPAGATION PROCEDURES
;;;

;; carry out one calculation from beginning to end

to propagate
  let l-index 1
  repeat length hiddensizes [
    ask hidden-nodes with [layer = l-index and not dropped?] [ set activation new-activation ]
    set l-index l-index + 1
  ]
  ask output-nodes [set activation new-activation]
  recolor
end 

;; Determine the activation of a node based on the activation of its input nodes

to-report new-activation  ;; node procedure
  report tanh sum [[activation] of end1 * weight] of my-in-links with [not [dropped?] of end1]
end 

;; changes weights to correct for errors

to back-propagate [target]
  let example-error 0
  let sortout sort output-nodes
  let l-index (length hiddensizes) + 1
  let index 0
  
  repeat (count output-nodes) [
    ask (item index sortout) [
      set err (item index target) - activation
      set example-error example-error + (err ^ 2)
    ]
    set index index + 1
  ]
  
  set example-error .5 * example-error
  set l-index l-index - 1
  
  repeat length hiddensizes [
    ask hidden-nodes with [layer = l-index and not dropped?] [
      let sumerror sum [weight * ([err] of end2)] of my-out-links
      set err (1 - (activation ^ 2)) * sumerror
    ]
    set l-index l-index - 1  ;; move backwards through the hidden layers
  ]
  
  ask links with [not [dropped?] of end1 and not [dropped?] of end2] [
    let change ([err * learncoeff * learning-rate] of end2)*([activation] of end1)
    set weight weight + change
  ]
  
  set epoch-error epoch-error + example-error
end 

;;;
;;; MISC PROCEDURES
;;;

;; Calculates the tanh function

to-report tanh [input]
  let exp2x e ^ (2 * input)
  report (exp2x - 1) / (exp2x + 1)
end 

;; computes the step function given an input value and the weight on the link

to-report step [input]
  report ifelse-value (input > 0) [1][0]
end 


; Copyright 2006 Uri Wilensky.
; See Info tab for full copyright and license.

There is only one version of this model, created almost 4 years ago by Jacob Samson.

Attached files

All files uploaded by Jacob Samson, almost 4 years ago:

- Deep Architectures and Dropout Learning Poster Slam.pptx (powerpoint): Poster Slam Slides
- Deep Neural Networks and Dropout.png (preview): Preview for 'Deep Neural Networks and Dropout'
- DeepNetPoster.pptx (powerpoint): Poster Slam Poster
- Final Report.docx (word): Final Report
- JacobSamson_June1.docx (word): Project Update - June 1st
- JacobSamson_May18.docx (word): Project Update - May 18th
- JacobSamson_May25.docx (word): Project Update - May 25th
- Project Proposal.docx (word): Project Proposal
