# Prob Graphs Basic


## WHAT IS IT?

Prob Graphs Basic is a basic introduction to probability and statistics.

A sample space is the collection of all possible outcomes in an experiment. An example of a sample space is the numbers "1, 2, 3, 4, 5, 6, 7." An event is what you get when you run an experiment. For example, if I am running an experiment that randomly selects a single number out of the sample space "1, 2, 3, 4, 5, 6, 7," then an event might be "5." A sample is a collection of events that occur in an experiment. You could have a sample of size 1 that contains just 1 event, or you could have a sample of size 4 that contains 4 events, e.g., "5, 3, 3, 7."
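The three terms above can be illustrated with a minimal Python sketch. It is not part of the NetLogo model; the names `sample_space`, `run_experiment`, and `draw_sample` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical names for illustration; none of these come from the model.
sample_space = [1, 2, 3, 4, 5, 6, 7]

def run_experiment(rng):
    """Run the experiment once: randomly select a single number (one event)."""
    return rng.choice(sample_space)

def draw_sample(size, rng):
    """Collect `size` events into a sample."""
    return [run_experiment(rng) for _ in range(size)]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = draw_sample(4, rng)
print(sample)  # a sample of size 4; every event comes from sample_space
```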

In this model, 3 graphs monitor a single experiment as it unfolds. The experiment here is finding how often the number "1" shows up when you randomly select a number within a range that you define. This range could be, for example, between 1 and 2. An example of a sample space of only two values is a coin that can be either 'heads' or 'tails.' An example of a sample space of 6 values is a die that can land on the values 1 through 6. By observing this simple experiment through 3 different graphs, you will learn 3 different ways of making sense of the phenomenon.

The top graph, "m/n convergence to limiting value," shows how the rate settles down to the expected, or mathematical, probability. For instance, the limiting value of a coin falling on "heads" is 0.5 because it happens 1/2 of the time. The unit of analysis is a single trial, and the rate is always informed by all previous trials. To explain this further, let's think of a "batting average." The sample space in batting is a 'hit' or a 'no hit,' which is much the same as whether a coin falls on "heads" or on "tails" (except, of course, that batting is not random like tossing a coin; otherwise Babe Ruth's average would have been the same as anyone's). So there are exactly 2 possible outcomes. The "batting average" keeps track, over time, of how many "hits" occurred out of all attempts to hit, known as "at bats." So the "batting average" is calculated as

Hits / At-Bats = Batting Average

For instance, using "H" for hit and "N" for no-hit, a baseball player's at-bat events may look like this, over 20 attempts:

```
N N N H H N N N N H N H N N H H H N N H
```

'Hits' are called 'favored events' because they are the events we care about when we do the statistics: we count how often 'hits' occurred out of all the at-bat events. The m/n interpretation (favored events / total events) reads this string of events as 8 hits / 20 at bats, a probability of .4 (the same as .400), or a score of 400 (out of 1000).
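As a rough illustration of the m/n interpretation, the following Python sketch (separate from the model's NetLogo code; the function name `running_rate` is an assumption made for this example) recomputes the rate after every at-bat in the string above.

```python
# The 20 at-bat events from the example above ("H" = hit, "N" = no hit).
events = "N N N H H N N N N H N H N N H H H N N H".split()

def running_rate(events, favored="H"):
    """Return m/n after each trial: favored events so far / trials so far."""
    rates = []
    hits = 0
    for attempts, event in enumerate(events, start=1):
        if event == favored:
            hits += 1
        rates.append(hits / attempts)
    return rates

rates = running_rate(events)
print(rates[3])   # 0.25 (1 hit in the first 4 at bats)
print(rates[-1])  # 0.4  (8 hits in 20 at bats)
```

Because every new trial is divided into an ever larger total, each single event moves the rate less and less as the string grows.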

You may be familiar with the fact that as the baseball season progresses, it is more and more difficult for an individual player to change his "average." This model may help you understand, or at least simulate, this phenomenon. Remember, though, that a batter, unlike a coin or a die, is not behaving randomly, whereas in this model the behavior is random. We have discussed batting only to give you context for thinking about the graph. A truer context would be a coin that has 2 sides. In fact, this model can simulate objects with not just 2 sides but more. You know all about dice that have 6 sides, right? If you set the size of your sample space to 5, then the model will simulate an experiment in which a die of 5 sides is rolled over and over again.

The middle graph, "Attempts-until-Success Distribution," counts how many trials it takes for the favored event to occur. For instance, if you're tossing a coin, it takes on average 2 tosses to get "heads," and if you're rolling a die it takes on average 6 rolls to get a "5." This graph is tracking the exact same experiment as the top graph, but it is "parsing" the events differently; that is, it is using a different rule to divide up the sequence of events over time. (We will continue using "N" and "H," but you can think of the coin with 2 sides or of the die with as many sides as you want.)

```
N N N H H N N N N H N H N N H H H N N H
```

So the unit of analysis in this interpretation of the experiment's results is the number of events leading up to and including a hit. As you see, the number of events per unit changes. In this example the string of numbers is [4; 1; 5; 2; 3; 1; 1; 3]. Note that in this string the numeral "1" appears 3 times, the numeral "2" appears 1 time, the numeral "3" appears 2 times, the numeral "4" appears 1 time, and the numeral "5" appears 1 time. The histogram of this string would peak over '1' (this peak will be of height 3), then go down to '2' (frequency of 1), etc. Perhaps this interpretation is a bit like what a batter's fans feel -- their suspense grows over failed hits until there is a hit, they are relieved and happy, and then they start counting again. So according to the context you are in -- what you're interested in finding, how you're feeling -- the world can appear different.
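The parsing rule described above can be sketched in Python as follows. This is an illustrative sketch, not the model's own code; the function name `attempts_until_success` is an assumption.

```python
from collections import Counter

events = "N N N H H N N N N H N H N N H H H N N H".split()

def attempts_until_success(events, favored="H"):
    """Count the events leading up to and including each favored event."""
    runs = []
    counter = 0
    for event in events:
        counter += 1
        if event == favored:
            runs.append(counter)
            counter = 0  # start counting again after each hit
    return runs

runs = attempts_until_success(events)
print(runs)  # [4, 1, 5, 2, 3, 1, 1, 3]

# Frequency of each run length, i.e. the heights of the histogram columns.
frequencies = Counter(runs)
print(sorted(frequencies.items()))  # [(1, 3), (2, 1), (3, 2), (4, 1), (5, 1)]
```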

The bottom graph, "Successes-per-Sample Distribution," takes yet another perspective on the experiment, namely a sampling perspective. The sampling perspective is used in statistics. Let's analyze the same string of events from our experiment, this time chopping it up into samples of equal size, say size 5.

```
N N N H H N N N N H N H N N H H H N N H
```

See that in the first sample there are 2 hits, in the second sample there is 1 hit, in the third sample there are 2 hits, and in the last sample there are 3 hits. This observation could be summed up as [2; 1; 2; 3]. A histogram of this result would show a frequency of 0 (y axis) over the 0 (x axis), because all samples had at least a single 'H.' Then over the '1' there will be a column of height 1, over the '2' there will be a column of height 2, and over the '3' there will be a column of height 1.
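The sampling interpretation can be sketched the same way. Again, this is a Python illustration written for this text rather than code from the model, and `successes_per_sample` is a hypothetical name.

```python
from collections import Counter

events = "N N N H H N N N N H N H N N H H H N N H".split()

def successes_per_sample(events, sample_size, favored="H"):
    """Chop the events into consecutive samples and count hits in each."""
    counts = []
    # Only complete samples are counted; a trailing partial sample is ignored.
    for start in range(0, len(events) - sample_size + 1, sample_size):
        sample = events[start:start + sample_size]
        counts.append(sample.count(favored))
    return counts

counts = successes_per_sample(events, 5)
print(counts)  # [2, 1, 2, 3]

# Column heights of the histogram over x = 0, 1, 2, 3.
frequencies = Counter(counts)
heights = [frequencies[k] for k in range(4)]
print(heights)  # [0, 1, 2, 1]
```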

Understanding the differences and relations between these 3 graphs will give you a strong head start in studying Probability and Statistics.

This model is a part of the ProbLab curriculum. The ProbLab Curriculum is currently under development at the CCL. For more information about the ProbLab Curriculum please refer to http://ccl.northwestern.edu/curriculum/ProbLab/.

## HOW IT WORKS

The model first generates a random value between 1 and sample-space-size inclusive. The number of attempts (trials) is increased by one. If the random value is equal to 1, then the number of successes (favored events or "hits") is also increased by one. The number of attempts and the number of successes are interpreted in three different ways, with each way shown in a graph as follows: (1) single attempt (trial) and single success; (2) trials (attempts) through to each success; or (3) successes in each sample (fixed number of trials). Each of the graphs comes to be associated with typical shapes.
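For readers who want to experiment outside NetLogo, the loop described above might look roughly like this in Python. This is a sketch under assumptions: the function `run_model` and its return values are invented for illustration, and plotting is replaced by three plain lists, one per graph.

```python
import random

def run_model(sample_space_size, sample_size, how_many_samples, seed=None):
    """Simulate the experiment and return the data behind the three graphs."""
    rng = random.Random(seed)
    total_attempts = 0
    total_successes = 0
    counter = 0                  # trials since the last success
    attempts_this_sample = 0
    successes_this_sample = 0
    rates = []                   # graph 1: m/n after every trial
    attempts_until_success = []  # graph 2: run length ending at each success
    successes_per_sample = []    # graph 3: successes in each completed sample

    while len(successes_per_sample) < how_many_samples:
        event = rng.randint(1, sample_space_size)  # both ends inclusive
        total_attempts += 1
        counter += 1
        if event == 1:  # a '1' is the favored event
            total_successes += 1
            attempts_until_success.append(counter)
            counter = 0
            successes_this_sample += 1
        rates.append(total_successes / total_attempts)
        attempts_this_sample += 1
        if attempts_this_sample == sample_size:
            successes_per_sample.append(successes_this_sample)
            attempts_this_sample = 0
            successes_this_sample = 0
    return rates, attempts_until_success, successes_per_sample

rates, runs, per_sample = run_model(2, 10, 300, seed=1)
print(len(rates))       # 3000 trials in total
print(len(per_sample))  # 300 samples
print(rates[-1])        # final m/n rate; the exact value depends on the seed
```

Keeping the three lists separate mirrors the point of the model: one stream of events, three ways of dividing it up.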

## HOW TO USE IT

Begin with the default settings. If you have changed them, then do the following: set the sample-space-size to 2 (so outcomes are either '1' or '2'), set the sample-size to 10 (so each sample will be a string of 10 events), and set the 'how-many-samples?' slider to 300 (so that the experiment will run a total of 300 samples of size 10 each, making a total of 3,000 trials). Press 'setup' to be sure all the variables are initialized, so that you will not have leftover values from a previous experiment. Press 'go.' Watch the 'event' monitor to see the number that the randomized procedure has reported. It will be either '1' or '2' because you have set the sample-space-size to 2.

You may want to use the speed slider above the view to slow down the simulation. As you become more comfortable with understanding what you are seeing, you can speed up the simulation by moving the slider farther right.

Note how the event does not necessarily alternate between '1' and '2' according to any particular pattern. Rather, only in the long run do you see what the constant is in the phenomenon you are observing. "In the long run" is precisely what this experiment shows. You can control how long this run will be by increasing or decreasing the 'sample-size' slider, the 'how-many-samples?' slider, or both.

### Buttons

'setup' -- initializes all variables. Press this button to begin a new experiment.

'go' -- begins the simulation running. You can press it again to pause the model.

### Sliders

'sample-space-size' -- set the size of the sample space (in integers).

'sample-size' -- set the number of trials per sample.

'how-many-samples?' -- set the number of samples you wish to run in the experiment.

### Monitors

'event' -- the number that the randomized procedure has generated this trial.

'total-successes' -- total number of favored events over all trials.

'total-attempts' -- total number of trials.

'rate' -- total-successes / total-attempts.

'counter' -- shows how many trials have passed since the last success (or, if no success has occurred yet, how many trials have passed since the model began running).

'attempts-this-sample' -- counts how many trials there have been so far in the current sample. It returns to 0 each time a sample of 'sample-size' trials is completed.

'successes-this-sample' -- counts how many successes there have been so far in the current sample. It returns to 0 each time a sample is completed.

'samples counter' -- counts how many samples have been completed since the beginning of this experiment.

'min', 'mean', 'max' -- the minimum, mean, and maximum values of the Successes-per-Sample distribution

### Plots

m/n convergence to limiting value -- cumulative rate of successes (hits or favored events) per total trials.

Attempts-until-Success Distribution -- histogram of number of trials it takes until each success.

Successes-per-Sample Distribution -- histogram of number of successes within each sample.

## THINGS TO NOTICE

What are the characteristic shapes of each graph?

Look at the 'rate' monitor. What can you say about the fluctuation of numbers? What can you say about the value it settles on? What other settings in the model can you relate to this rate value?

The "Attempts-until-Success Distribution" never has values for 0, whereas the other plots sometimes do. Why is that?

Also, what can you say about the mean of this distribution? Does this make sense to you?

## THINGS TO TRY

A sample-size of 10 that is run 300 times and a sample-size of 300 that is run 10 times both produce 3000 trials, because 10 x 300 and 300 x 10 give the same product. Run the experiment under both combination conditions. Did this make any difference? If so, which of the three graphs did it affect and which did it not affect? Run the experiment under other pairs of combination conditions. How different do the factors have to be to cause any difference in the graphs? How does the sample-space-size play in with all this?
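If you would like to try this comparison outside NetLogo, here is a small Python sketch. It is an assumption-laden stand-in for the model, not its actual code: the function name `successes_per_sample` and the use of a shared seed (so both conditions see the same 3000 trials) are choices made for this example.

```python
import random

def successes_per_sample(sample_space_size, sample_size, how_many_samples, seed):
    """Return the number of '1' events in each sample of a seeded run."""
    rng = random.Random(seed)
    per_sample = []
    for _sample in range(how_many_samples):
        hits = 0
        for _trial in range(sample_size):
            if rng.randint(1, sample_space_size) == 1:  # '1' is the favored event
                hits += 1
        per_sample.append(hits)
    return per_sample

# Same seed, so both conditions chop up the identical stream of 3000 trials.
many_small = successes_per_sample(2, 10, 300, seed=7)
few_large = successes_per_sample(2, 300, 10, seed=7)

print(len(many_small), len(few_large))  # 300 10
print(sum(many_small) == sum(few_large))  # True: the trials themselves are the same
print(min(many_small), max(many_small))   # spread of successes per sample of 10
print(min(few_large), max(few_large))     # spread of successes per sample of 300
```

Comparing the two spreads against the sample sizes is one way to start answering the questions above before checking the plots in the model itself.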

By now you may have noticed the typically bell-shaped histogram of the Successes-per-Sample distribution. Try to find settings that do not create this shape and analyze why this is the case.

## EXTENDING THE MODEL

As a beginning, try adding monitors to show values from variables you are interested in tracking. For instance, you may want to know the minimum, mean, and maximum values of the "Attempts-until-Success Distribution." Also, you may want to change parameters of the sliders.

Challenge: Add to the "Attempts-until-Success" plot a line that indicates the mean.

Challenge: Think of a modification that keeps the 'random' reporter but "helps" the program have more hits. Of course, this will completely change the nature of the simulation, so think about what you have created and give the program a new name.

## NETLOGO FEATURES

This model is unusual in that it doesn't use the view at all. Everything that happens visually happens in the plots and monitors.

## CREDITS AND REFERENCES

This model is a part of the ProbLab curriculum. The ProbLab Curriculum is currently under development at Northwestern's Center for Connected Learning and Computer-Based Modeling. For more information about the ProbLab Curriculum please refer to http://ccl.northwestern.edu/curriculum/ProbLab/.

## HOW TO CITE

If you mention this model in a publication, we ask that you include these citations for the model itself and for the NetLogo software:

- Abrahamson, D. and Wilensky, U. (2004). NetLogo Prob Graphs Basic model. http://ccl.northwestern.edu/netlogo/models/ProbGraphsBasic. Center for Connected Learning and Computer-Based Modeling, Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL.
- Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL.

## COPYRIGHT AND LICENSE

Copyright 2004 Uri Wilensky.

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at uri@northwestern.edu.

This model was created as part of the projects: PARTICIPATORY SIMULATIONS: NETWORK-BASED DESIGN FOR SYSTEMS LEARNING IN CLASSROOMS and/or INTEGRATED SIMULATION AND MODELING ENVIRONMENT. The project gratefully acknowledges the support of the National Science Foundation (REPP & ROLE programs) -- grant numbers REC #9814682 and REC-0126227.

## Comments and Questions

```
globals [
  event
  total-attempts
  total-successes
  counter
  counter-list
  successes-per-sample-list
  attempts-this-sample
  successes-this-sample
  samples-counter
]

to setup  ;; resets everything to appropriate initial values
  clear-all
  set event "-"
  set total-attempts 0
  set total-successes 0
  set counter 0
  set counter-list []
  set attempts-this-sample 0
  set successes-this-sample 0
  set samples-counter 0
  set successes-per-sample-list []
  reset-ticks
end

to go
  if samples-counter = how-many-samples? [stop]
  set total-attempts total-attempts + 1
  set counter counter + 1
  select-and-check
  tick
  update-and-plot
end

to select-and-check
  ;; This procedure simulates a chance event by randomly selecting a number between 1 and
  ;; sample-space-size, for instance between 1 and 5, as if you are rolling a die with 5
  ;; sides. Next, the procedure checks to see if this event (what you "rolled") happens to
  ;; be '1.' A '1' is a success. Note that 'random' reports a number between 0 and value,
  ;; so "random 1" is only 0, and "random 2" is 0 or 1. That is why we have to add 1.
  set event ( 1 + random sample-space-size )
  if event = 1
  [
    set total-successes total-successes + 1
    set counter-list lput counter counter-list
    set counter 0
    set successes-this-sample successes-this-sample + 1
  ]
end

to update-and-plot  ;; updates values for each of the three plots
  update-and-plot-m/n
  update-and-plot-attempts
  update-and-plot-successes
end

to update-and-plot-m/n
  set-current-plot "m/n convergence to limiting value"
  plot (total-successes / total-attempts)
end

to update-and-plot-attempts
  if length counter-list = 0 [stop]
  set-current-plot "Attempts-until-Success Distribution"
  ;; setting the range just beyond the maximum value (e.g.,5 beyond but it could be more or less)
  ;; helps the eye pick up that the right-most value is indeed the maximum value
  set-plot-x-range 0 ( (max counter-list) + 5)
  histogram counter-list
  let maxbar modes counter-list
  let maxrange length filter [ ? = item 0 maxbar ] counter-list
  set-plot-y-range 0 max list 10 maxrange
end

to update-and-plot-successes
  set attempts-this-sample attempts-this-sample + 1
  if attempts-this-sample = sample-size
  [
    set successes-per-sample-list lput successes-this-sample successes-per-sample-list
    set-current-plot "Successes-per-Sample Distribution"
    ;; This line adjusts the top range of the x-axis so as to stabilize and centralize
    ;; the distribution. The idea is to try and keep the emergent graph shape in the
    ;; middle of the plot. The 'ceiling' primitive keeps the maximum range value an integer.
    set-plot-x-range 0 ( max ( list plot-x-max ( 3 + ( ceiling ( 2 * mean successes-per-sample-list ) ) ) ) )
    histogram successes-per-sample-list
    let maxbar modes successes-per-sample-list
    let maxrange length filter [ ? = item 0 maxbar ] successes-per-sample-list
    set-plot-y-range 0 max list 25 maxrange
    set attempts-this-sample 0
    set successes-this-sample 0
    set samples-counter samples-counter + 1
  ]
end

; Copyright 2004 Uri Wilensky.
; See Info tab for full copyright and license.
```

There are 15 versions of this model.

## Attached files

File | Type | Description | Last updated
---|---|---|---
Prob Graphs Basic.png | preview | Preview for 'Prob Graphs Basic' | about 10 years ago, by Uri Wilensky

This model does not have any ancestors.

This model does not have any descendants.

George Dombi

## 2 Model Suggestions (Question)

1) Please make an XY plotter with minimal statistics. Possible design:

- On Setup: 12 pairs of input boxes for 12 x and 12 y values open (you can use all 12 or not).
- On Go: the NetLogo space shows up with an XY grid that bounds the listed points. From a location at the group mean XY, the turtles migrate to their respective positions on the XY graph. Once these are settled, a pair of invisible turtles leave the mean XY nest in opposite directions to draw the best-fit straight line between the points. The values of mean x, mean y, and the linear regression equation are calculated and displayed in a comment box on the side.
- Variation: add a slider for 1-3 groups. These are plotted with turtles of different colors. These turtles start from the XY mean of their group and migrate to their own XY points. The mean XY nest for each group stays on the screen. These group nests have the same color as the turtles in their group. In addition, a grand mean XY group nest emerges, and two invisible turtles draw a best-fit straight line between the mean points but covering the whole data field. An ANOVA test can be done with the groups to show the F-test, and a Tukey test can be done to show the multiple post hoc tests. If the group means are not statistically different, they would get the same shape.

2) 3-D version. Do the same as above but in 3-D, for 12 sets of points each with an X, Y and Z value. Turtles are born at the mean X,Y,Z point and migrate to their respective xyz coordinates. The best-fit line is created by two invisible turtles that draw lines away from the mean X,Y,Z nest point to form the linear regression line. The mean values and the Pearson and regression line info are calculated and written in a comment box off the NetLogo space. In 3-D mode the true best-fit line is drawn. If there is more than 1 group, the turtles for each group are born at the group mean xyz point and migrate to their respective xyz positions. Then a grand XYZ point opens and the two invisible turtles draw the 3-D linear regression. All the appropriate means, Std, n-values, Pearson regression and regression equation values are calculated, as well as a 1-way ANOVA table and the Tukey group difference results, off in a separate comment box.

Comments: These plots don't really make use of NetLogo to calculate the statistics; that would be done by math in the code. The fun part would be to watch the turtles migrate from their group mean location. Also, the plotting of the XYZ space would be interesting, especially if accurate, as there are very few programs that visually plot points in 3-D.

Bye for now, George gDombi@chm.uri.edu

Posted about 9 years ago