Probability vs nonprobability sampling


WHAT IS IT?

This model helps students understand two basic research sampling methods: simple random (probability) sampling and convenience (nonprobability) sampling. It also shows how the estimates produced by the two methods differ.

HOW IT WORKS

A research population of households is generated, and samples from this population can then be virtually surveyed about their income. The survey data are used to estimate the average income of the whole population.

Two sampling methods are available. Simple random sampling draws the selected number of households at random from the whole population. Convenience sampling picks a first household at random and then draws the remaining households from its neighborhood.

HOW TO USE IT

First, generate the population to sample from using the Setup button. Some basic parameters of the population can be set beforehand: the number of settlements, the average settlement size, and the standard deviation of the settlement size.

Once the population has been generated, its average income and income distribution are shown. Estimates calculated from samples drawn by different methods and with different sizes can then be compared with the known population characteristic (average income).

Then two decisions about sampling have to be made:

1) Sample size: set with the appropriate slider.

2) Sampling method: chosen by clicking the appropriate button.

It is also possible to repeat the sampling 100 times, which shows a more general pattern of differences between the estimates and the true average income.
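The two quantities involved can be written out as follows. The notation is introduced here only for illustration and does not appear in the model's interface: $x_1, \dots, x_n$ are the incomes of the $n$ surveyed households, $\mu$ is the true average income of the population, and $\hat{\mu}_k$ is the estimate from the $k$-th of the 100 samples. The formulas follow the model's code, where the second quantity is stored in the global `average-100-estimates`.

```latex
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\bar{e} = \frac{1}{100}\sum_{k=1}^{100} \frac{\lvert \hat{\mu}_k - \mu \rvert}{\mu} \cdot 100\,\%
```

So $\bar{e}$ is the mean absolute error of the 100 estimates, expressed as a percentage of the true average income.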

THINGS TO NOTICE

Simple random sampling (SRS) gives better results, and its estimates become more precise (i.e. the estimation error gets smaller) as the sample size increases. This follows from the laws of probability: every household has the same chance of being selected. A meaningful statistical error can also be calculated for SRS, which can be visualized on the chart with 100 samples (for a 95 % confidence level, find the error value that is exceeded by only 5 of the samples, i.e. 5 % of 100).
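The 95 % reading described above can also be taken in code rather than from the chart. The following is a minimal sketch, not part of the original model: the reporter names `sample-errors` and `error-bound-95` are hypothetical, and the sketch assumes the model's own `random-sample` procedure and `average-estimate` global. It reads off an empirical percentile of the observed errors; it is not a formal confidence-interval calculation.

```netlogo
; Hypothetical helper (not in the original model): draw `times` simple random
; samples and collect each estimate's absolute error in % of the true average.
to-report sample-errors [times]
  let true-mean mean [income] of households
  let errors []
  repeat times [
    random-sample   ; model procedure; stores its result in average-estimate
    set errors lput ((abs (average-estimate - true-mean)) * 100 / true-mean) errors
  ]
  report errors
end

; Hypothetical helper: the error value that 95 % of the samples do not exceed.
; With 100 errors this is the 95th smallest, so at most 5 samples lie above it.
to-report error-bound-95 [errors]
  let sorted sort errors
  let k (ceiling (length sorted * 95 / 100)) - 1
  report item k sorted
end
```

After Setup, typing `show error-bound-95 sample-errors 100` in the Command Center should print the bound for the current sample size.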

Convenience sampling typically gives very unreliable estimates, and there is no clear relationship between the sample size and the estimation error. This shows why sampling based on, for instance, a network of Facebook friends or geographical proximity can be very problematic when generalizing to the whole population.

THINGS TO TRY

Explore the relationship between sampling method, sample size, and estimation error. For example, run 100 samples for each of 5 different sample sizes (30, 100, 200, 500, 1000) and compare the results.
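This comparison can be scripted instead of clicking through the interface. The sketch below is an illustration only: `report-errors-for` and `compare-sample-sizes` are hypothetical procedures added for this example, and they rely on the model's `Sample-size` slider, its two 100-sample procedures, and the `average-100-estimates` global.

```netlogo
; Hypothetical helper (not in the original model): run both 100-sample
; procedures at one sample size and print the two mean errors.
to report-errors-for [size-to-try]
  set Sample-size size-to-try      ; Sample-size is the model's slider variable
  hundred-random-samples
  let srs-error average-100-estimates
  hundred-convenience-samples
  let convenience-error average-100-estimates
  print (word "n = " size-to-try
    ": SRS " precision srs-error 2 " %"
    ", convenience " precision convenience-error 2 " %")
end

; Hypothetical helper: the five sample sizes suggested in the text.
to compare-sample-sizes
  report-errors-for 30
  report-errors-for 100
  report-errors-for 200
  report-errors-for 500
  report-errors-for 1000
end
```

Run `compare-sample-sizes` from the Command Center after Setup. The population must be large enough for the biggest sample size, and convenience sampling additionally needs a settlement with that many households nearby. Because each 100-sample procedure clears the plot, only the last run stays on the chart.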

EXTENDING THE MODEL

Income is generated from settlement size, so the resulting distribution is not normal; this could be changed.
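One possible way to do this is sketched below, under clearly stated assumptions: `make-income-normal` is a hypothetical procedure, and the mean and standard deviation are illustrative inputs rather than values from the model.

```netlogo
; Hypothetical helper (not in the original model): overwrite every household's
; income with a normally distributed value. The mean and standard deviation
; are illustrative inputs, not values taken from the model.
to make-income-normal [mean-income sd-income]
  ask households [
    set income random-normal mean-income sd-income
  ]
end
```

For example, run `make-income-normal 100 15` after Setup. Two caveats apply. A large standard deviation relative to the mean can produce negative incomes. This version also draws incomes independently of settlement, so the spatial clustering that biases convenience samples would presumably disappear; keeping a settlement effect would need a different design.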

CREDITS AND REFERENCES

(c) 2014 Viktor Vojtko, Faculty of Economics, University of South Bohemia. All rights reserved.

Permission to use, modify or redistribute this model is hereby granted, provided that both of the following requirements are followed:

a) this copyright notice is included.

b) this model will not be redistributed for profit without permission from Viktor Vojtko, Faculty of Economics, University of South Bohemia. Contact the author for appropriate licenses for redistribution for profit.

Posted over 11 years ago


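breed [households household]

patches-own [
  psize    ; settlement size on this patch; 0 on patches without a settlement
]

households-own [
  income   ; household income, derived from settlement size in setup
]

globals [
  average-estimate        ; average income estimated from the most recent sample
  average-100-estimates   ; mean absolute error over 100 samples, in % of the true average income
  n                       ; number of samples drawn so far in the current 100-sample run
]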

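to setup
  clear-all

  set-default-shape households "house"

  ask patches [set pcolor green]

  ; turn No-of-settlements randomly chosen patches into settlements
  ask n-of No-of-settlements patches [
    ; settlement size is normally distributed; clamp it at 0 so that a negative
    ; draw cannot give a negative household count or a negative spread below
    set psize max list 0 (random-normal Average-settlement-size Std-dev-settlement-size)
    sprout-households psize [
      set size 0.5
      ; income grows with settlement size, plus a small random component
      set income psize * 2 + random-float 3
      set color white
      ; scatter households slightly around the settlement centre
      left random-float 360
      jump random-normal 0 (psize / 10000)
    ]
  ]

  reset-ticks
end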

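to random-sample
  let pom 0   ; running sum of the sampled incomes
  ask households [set color white]
  ; simple random sampling: every household has the same chance of selection
  ; (n-of raises an error if Sample-size exceeds the number of households)
  ask n-of Sample-size households [
    set color red
    set pom pom + income
  ]
  set average-estimate pom / Sample-size
end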

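to hundred-random-samples
  set average-100-estimates 0
  let pom 0   ; running sum of absolute percentage errors
  let true-mean mean [income] of households   ; known population average
  set n 0
  clear-plot
  repeat 100 [
    set n n + 1
    random-sample
    ; absolute error of this estimate as a percentage of the true average
    set pom pom + (abs (average-estimate - true-mean)) * 100 / true-mean
    tick
  ]
  set average-100-estimates pom / 100
end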

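to convenience-sample
  let pom 0   ; running sum of the sampled incomes
  ask households [set color white]
  ; candidate starting patches: occupied, with at least Sample-size households
  ; on the patch and its eight neighbors
  let candidates patches with [(count turtles-here > 0) and ((count turtles-here + count turtles-on neighbors) >= Sample-size)]
  if not any? candidates [
    error "No settlement is large enough for this sample size."
  ]
  ask one-of candidates [
    ask one-of households-here [
      ; this household is the starting point; the sample itself is drawn at
      ; random from all households within radius 7, which may include it
      set color red
      ask n-of Sample-size households in-radius 7 [
        set color red
        set pom pom + income
      ]
    ]
  ]
  set average-estimate pom / Sample-size
end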

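to hundred-convenience-samples
  set average-100-estimates 0
  let pom 0   ; running sum of absolute percentage errors
  let true-mean mean [income] of households   ; known population average
  set n 0
  clear-plot
  repeat 100 [
    set n n + 1
    convenience-sample
    ; absolute error of this estimate as a percentage of the true average
    set pom pom + (abs (average-estimate - true-mean)) * 100 / true-mean
    tick
  ]
  set average-100-estimates pom / 100
end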

There are 3 versions of this model.

Uploaded by	When	Description
Viktor Vojtko	over 11 years ago	Info added
Viktor Vojtko	over 11 years ago	Info added
Viktor Vojtko	over 11 years ago	Initial upload

