Simple Genetic Algorithm

2 collaborators

Uri Wilensky (Author)

Tags

computer science

Tagged by Reuven M. Lerner over 10 years ago

Model group CCL | Visible to everyone | Changeable by group members (CCL)
Model was written in NetLogo 5.0.4 • Viewed 998 times • Downloaded 101 times • Run 1 time

WHAT IS IT?

This model demonstrates the use of a genetic algorithm on a very simple problem. Genetic algorithms (GAs) are a biologically-inspired computer science technique that combine notions from Mendelian genetics and Darwinian evolution to search for good solutions to problems (including difficult problems). The GA works by generating a random population of solutions to a problem, evaluating those solutions and then using cloning, recombination and mutation to create new solutions to the problem.

In this model we use the simple "ALL-ONES" problem to demonstrate how this is possible. We use such a simple problem in this model in order to highlight the solution technique only. The idea of the "ALL-ONES" problem is to find a string of bits (that is, a sequence of just ones and zeros) that contains all ones, and no zeros. Thus the string that best solves this problem is "111111...111".

HOW IT WORKS

The genetic algorithm is composed of the following steps.

1) A population of random solutions is created. Each solution consists of a string of randomly mixed "1"s and "0"s.

2) Each solution is evaluated on the basis of how well it solves the problem. This measure of the "goodness" of the solution is called its "fitness". In this model, our goal is simply to find a solution that consists of all "1"s. (In real-world applications of the genetic algorithm, the goals are much more complex, but the solutions are still usually encoded as binary strings.)

3) A new generation of solutions is created from the old generation, where solutions that have a higher fitness scores are more likely to be chosen as "parent" solutions than those that have low fitness scores.

A) The selection method used in this model is called "tournament selection", with a tournament size of 3. This means that 3 solutions are drawn randomly from the old generation, and the one with the highest fitness is chosen to become a parent.

B) Either one or two parents are chosen to create children. With one parent, the child is a clone or copy of the parent. With two parents, the process is the digital analog of sexual recombination -- the two children inherit part of their genetic material from one parent and part from the other.

C) There is also a chance that mutation will occur, and some of the child's bits will be changed from "1"s to "0"s, or vice versa.

4) Steps 2 and 3 above are repeated until a solution is found that successfully solves the problem.

HOW TO USE IT

Press the SETUP button to create an initial random population of solutions.

Press the STEP button to have one new generation created from the old generation.

Press the GO button to have the genetic algorithm run until a solution has been found.

The best solution found in each generation is displayed in the VIEW. Each white column represents a "1"-bit and each black column represents a "0"-bit.

=== Parameters ===

The POPULATION-SIZE slider controls the number of solutions that are present in each generation.

The CROSSOVER-RATE slider controls what percent of each new generation is created through sexual reproduction (recombination or crossover between two parents' genetic material), and what percent (100 - CROSSOVER-RATE) is created through asexual reproduction (cloning of one parent's genetic material).

The MUTATION-RATE slider controls the percent chance of mutation. This chance applies to each position in the string of bits of a new individual. For instance, if the string is 100 bits long, and the mutation-rate is set at 1%, then on average one bit will be changed during the creation of each new individual.

The PLOT-DIVERSITY? switch controls whether the amount of diversity within the population of solutions is plotted each generation, shown in the "Diversity Plot". Turning off PLOT-DIVERSITY? significantly increases the speed of the model because calculating diversity requires a lot of computation.

The "Fitness Plot" is used to show the best, average, and worst fitness values of the solutions at each generation.

THINGS TO NOTICE

Step through the model slowly, and look at the visual representation of the best solution found in each generation, displayed in the VIEW. How often does the best solution in Generation X+1 appear to be the offspring of the best solution in Generation X?

As the fitness in the population increases, the diversity decreases. Why is this?

THINGS TO TRY

Explore the effects of larger or smaller population sizes on the number of generations it takes to solve the problem completely. What happens if you measure the amount of time (in seconds) that it takes to solve the problem completely?

How does asexual reproduction compare to sexual reproduction for solving this problem? (What if CLONING-RATE is 100, or CLONING-RATE is 0?)

How much mutation is beneficial for the genetic algorithm? Can the genetic algorithm find a perfect solution if there is MUTATION-RATE is 0? What about if MUTATION-RATE is 10.0? Can you find an optimal MUTATION-RATE?

EXTENDING THE MODEL

Many variations on this simple genetic algorithm exist. For example, some genetic algorithms include "elitism". In this case, the best X% of solutions from the old generation are always copied directly into the new generation. Modify this model so that it uses elitism.

Another type of selection for reproduction that is sometimes used in genetic algorithms is called "roulette selection". In this case, you may imagine each solution in the population being assigned a wedge of a large roulette wheel. The size of the wedge is determined by dividing the fitness of each solution by the sum of the fitnesses of all solutions in the population. Thus, the probability of selecting any given solution to reproduce is directly proportional to its fitness. Try implementing this type of selection, and compare its performance to the "tournament selection" method that is currently used in this model.

As noted above, the "ALL-ONES" problem is a toy problem that is not very interesting in its own right. A natural extension of this model is to use the genetic algorithm to solve a problem that is significantly more interesting. Fortunately, you can change the problem that the genetic algorithm is solving by only modifying one thing, the "fitness function", which evaluates how good a given string of bits is at solving whatever problem you are trying to solve. For example, you could evolve rules for how a turtle should move, in order to maximize its food collection as it travels through the world. To do so, you might change the `ga-calculate-fitness` procedure to run a little simulation where a turtle moves in the world (according to some rules that are defined by the string of "1"s and "0"s), count how much food the turtle collects, and then set the fitness accordingly.

NETLOGO FEATURES

Note that NetLogo's powerful ability to work with agentsets makes it very easy to code the "tournament selection" used in this model. The following code is sufficient:

``````max-one-of (n-of 3 old-generation) [ga-fitness]
``````

RELATED MODELS

Echo is another model that is inspired by the work of John H. Holland. It examines issues of evolutionary fitness and natural selection.

There are several NetLogo models that examine principles of evolution from a more biological standpoint, including Altruism, Bug Hunt Camouflage, Cooperation, Mimicry, Peppered Moths, as well as the set of Genetic Drift models.

Sunflower Biomorph uses an artistic form of simulated evolution, driven by aesthetic choices made by the user.

CREDITS AND REFERENCES

This model is based off of work by John H. Holland, who is widely regarded as the father of the genetic algorithms. See Holland's book "Adaptation in Natural and Artificial Systems", 1992, MIT Press.

Additional information about genetic algorithms is available from a plethora of sources online.

HOW TO CITE

If you mention this model in a publication, we ask that you include these citations for the model itself and for the NetLogo software:

Click to Run Model

```;; Each potential solution is represented by a turtle.

turtles-own [
bits           ;; list of 0's and 1's
fitness
]

globals [
winner         ;; turtle that currently has the best solution
]

to setup
clear-all
create-turtles population-size [
set bits n-values world-width [one-of [0 1]]
calculate-fitness
hide-turtle  ;; the turtles' locations are not used, so hide them
]
update-display
reset-ticks
end

to go
if [fitness] of winner = world-width
[ stop ]
create-next-generation
update-display
tick
end

to update-display
set winner max-one-of turtles [fitness]
ifelse item pxcor ([bits] of winner) = 1
[ set pcolor white ]
[ set pcolor black ]
]
end

;; ===== Generating Solutions

;; Each solution has its "fitness score" calculated.
;; Higher scores mean "more fit", and lower scores mean "less fit".
;; The higher a fitness score, the more likely that this solution
;;   will be chosen to reproduce and create offspring solutions
;;   in the next generation.
;;

to calculate-fitness       ;; turtle procedure
;; For the "ALL-ONES" problem, the fitness is simply equal to the number of ones
;; that occur in this solution's bits.
;; However, you could solve more interesting problems by changing this procedure
;; to evaluate the bits in other ways.  For instance, the bits might
;; encode rules for how a turtle should move across the world in a search for food.
set fitness length (remove 0 bits)
end

;; This procedure does the main work of the genetic algorithm.
;; We choose solutions with good fitness to produce offspring
;; through crossover (sexual recombination), and to be cloned
;; (asexual reproduction) into the next generation.
;; There is also a chance of mutation occurring in each individual.
;; After a full new generation of solutions has been created,
;; the old generation dies.

to create-next-generation
; The following line of code looks a bit odd, so we'll explain it.
; if we simply wrote "LET OLD-GENERATION TURTLES",
; then OLD-GENERATION would mean the set of all turtles, and when
; new solutions were created, they would be added to the breed, and
; OLD-GENERATION would also grow.  Since we don't want it to grow,
; we instead write "TURTLES WITH [TRUE]", which makes OLD-GENERATION
; an agentset, which doesn't get updated when new solutions are created.
let old-generation turtles with [true]

; Some number of the population is created by crossover each generation
; we divide by 2 because each time through the loop we create two children.
let crossover-count  (floor (population-size * crossover-rate / 100 / 2))

repeat crossover-count
[
; We use "tournament selection", with tournament size = 3.
; This means, we randomly pick 3 solutions from the previous generation
; and select the best one of those 3 to reproduce.

let parent1 max-one-of (n-of 3 old-generation) [fitness]
let parent2 max-one-of (n-of 3 old-generation) [fitness]

let child-bits crossover ([bits] of parent1) ([bits] of parent2)

; create the two children, with their new genetic material
ask parent1 [ hatch 1 [ set bits item 0 child-bits ] ]
ask parent2 [ hatch 1 [ set bits item 1 child-bits ] ]
]

; the remainder of the population is created by cloning
; selected members of the previous generation
repeat (population-size - crossover-count * 2)
[
ask max-one-of (n-of 3 old-generation) [fitness]
[ hatch 1 ]
]

; now we're just talking to the new generation of solutions here
[
; there's a chance of mutations occurring
mutate
; finally we update the fitness value for this solution
calculate-fitness
]
end

;; ===== Mutations

;; This reporter performs one-point crossover on two lists of bits.
;; That is, it chooses a random location for a splitting point.
;; Then it reports two new lists, using that splitting point,
;; by combining the first part of bits1 with the second part of bits2
;; and the first part of bits2 with the second part of bits1;
;; it puts together the first part of one list with the second part of
;; the other.

to-report crossover [bits1 bits2]
let split-point 1 + random (length bits1 - 1)
report list (sentence (sublist bits1 0 split-point)
(sublist bits2 split-point length bits2))
(sentence (sublist bits2 0 split-point)
(sublist bits1 split-point length bits1))
end

;; This procedure causes random mutations to occur in a solution's bits.
;; The probability that each bit will be flipped is controlled by the
;; MUTATION-RATE slider.

to mutate   ;; turtle procedure
set bits map [ifelse-value (random-float 100.0 < mutation-rate) [1 - ?] [?]]
bits
end

;; ===== Diversity Measures

;; Our diversity measure is the mean of all-pairs Hamming distances between
;; the genomes in the population.

to-report diversity
let distances []
let bits1 bits
ask turtles with [self > myself] [
set distances fput (hamming-distance bits bits1) distances
]
]
; The following  formula calculates how much 'disagreement' between genomes
; there could possibly be, for the current population size.
; This formula may not be immediately obvious, so here's a sketch of where
; it comes from.  Imagine a population of N turtles, where N is even, and each
; turtle has  only a single bit (0 or 1).  The most diverse this population
; can be is if half the turtles have 0 and half have 1 (you can prove this
; using calculus!). In this case, there are (N / 2) * (N / 2) pairs of bits
; that differ.  Showing that essentially the same formula (rounded down by
; the floor function) works when N is odd, is left as an exercise to the reader.
let max-possible-distance-sum floor (count turtles * count turtles / 4)

; Now, using that number, we can normalize our diversity measure to be
; between 0 (completely homogeneous population) and 1 (maximally heterogeneous)
report (sum distances) / max-possible-distance-sum
end

;; The Hamming distance between two bit sequences is the fraction
;; of positions at which the two sequences have different values.
;; We use MAP to run down the lists comparing for equality, then
;; we use LENGTH and REMOVE to count the number of inequalities.

to-report hamming-distance [bits1 bits2]
report (length remove true (map [?1 = ?2] bits1 bits2)) / world-width
end

```

There are 10 versions of this model.

Uri Wilensky almost 11 years ago Updated to NetLogo 5.0.4 Download this version
Uri Wilensky over 11 years ago Updated version tag Download this version
Uri Wilensky over 11 years ago Updated to version from NetLogo 5.0.3 distribution Download this version
Uri Wilensky over 13 years ago Updated from NetLogo 4.1 Download this version
Uri Wilensky over 13 years ago Updated from NetLogo 4.1 Download this version
Uri Wilensky over 13 years ago Updated from NetLogo 4.1 Download this version
Uri Wilensky over 13 years ago Updated from NetLogo 4.1 Download this version
Uri Wilensky over 13 years ago Model from NetLogo distribution Download this version
Uri Wilensky over 13 years ago Simple Genetic Algorithm Download this version

Attached files

File Type Description Last updated
Simple Genetic Algorithm.png preview Preview for 'Simple Genetic Algorithm' almost 11 years ago, by Uri Wilensky Download

This model does not have any ancestors.

This model does not have any descendants.