The sspse package

This is an R package to implement successive sampling population size estimation (SS-PSE).

SS-PSE is used to estimate the size of hidden populations using respondent-driven sampling (RDS) data. The package can implement SS-PSE, visibility SS-PSE, and capture-recapture SS-PSE.

The package was developed by the Hard-to-Reach Population Methods Research Group (HPMRG).

Installation

The package is available on CRAN and can be installed using

install.packages("sspse")

To install the latest development version from GitHub, the best way is to use git to create a local copy and install it as usual from there. If you just want to install it, you can also use:

# If devtools is not installed:
# install.packages("devtools")

devtools::install_github("HPMRG/sspse")
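After either installation route, it can be useful to confirm that the package is actually available before continuing. The following is a minimal sketch using only base R functions; the variable names `pkg` and `installed` are illustrative.

```r
# Check whether sspse is installed without attaching it.
pkg <- "sspse"
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Report the installed version so it can be quoted in bug reports.
  message(pkg, " version: ", as.character(packageVersion(pkg)))
} else {
  message(pkg, " is not installed; try install.packages(\"", pkg, "\")")
}
```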

Implementation

Load package and example data

library(sspse)
data(fauxmadrona)

fauxmadrona is a simulated RDS data set with no seed dependency, which is used to demonstrate RDS estimators. It has the format of an rds.data.frame and is a sample of size 500 with 10 seeds and 2 coupons from a population of size 1000. For the purpose of this example, we will assume the population size is unknown and our goal is to estimate it.
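A quick way to check the description above is to inspect the loaded object. This sketch uses only base R inspection functions and is guarded so that it does nothing if sspse is unavailable; `sample_size` is an illustrative variable name.

```r
sample_size <- NA_integer_

if (requireNamespace("sspse", quietly = TRUE)) {
  library(sspse)
  data(fauxmadrona)

  # The class vector should include "rds.data.frame".
  print(class(fauxmadrona))

  # Number of respondents in the sample (seeds included).
  sample_size <- nrow(fauxmadrona)
}

sample_size
```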

We can make a quick visualization of the recruitment chains, where the size of the node is proportional to the reported degree and the color represents separate chains.

reingold.tilford.plot(fauxmadrona, 
                      vertex.label=NA, 
                      vertex.size="degree",
                      show.legend=FALSE,
                      vertex.color="seed")

The posteriorsize() function

The posteriorsize() function fits both the original and visibility variants of SS-PSE. It requires some prior knowledge about the population size, $N$, which is usually supplied as a prior median through the median.prior.size= argument.

Although there are many options within the posteriorsize function, most can be left at their default values unless you have a specific reason to believe they should be set differently.

Original SS-PSE example

Set visibility=FALSE. By default, 1000 samples will be drawn from the posterior distribution for $N$ using a burnin of 1000 and an interval of 10. This may take a few seconds to run.

fit1 <- posteriorsize(fauxmadrona, 
              median.prior.size=1000,
              visibility=FALSE)
## Using non-measurement error model with K = 14.
## Taken 1 samples...
## Taken 2 samples...
## Taken 4 samples...
...
## Taken 500 samples...
## Taken 1000 samples...
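To see what the default sampler settings quoted above imply, the arithmetic can be sketched in base R. This assumes conventional burn-in and thinning, where one draw is retained every `interval` iterations after the burn-in; it is generic MCMC bookkeeping, not a reproduction of the package's internal code, and the variable names are illustrative.

```r
# Sampler settings as described in the text (not read from sspse itself).
n_samples <- 1000  # retained posterior draws
burnin    <- 1000  # initial iterations discarded
interval  <- 10    # keep one draw every `interval` iterations

# Total iterations under conventional burn-in plus thinning.
total_iterations <- burnin + n_samples * interval

# Iteration indices that would be retained under this scheme.
kept <- seq(from = burnin + interval, by = interval, length.out = n_samples)

total_iterations
range(kept)
```

Under these assumptions, increasing the interval reduces autocorrelation between retained draws at the cost of a proportionally longer run.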

Plot the posterior distribution for $N$.

plot(fit1, type="N")

Create a table summary for the prior and posterior distributions for population size, specifying that we are interested in a 90% credible interval for $N$.

summary(fit1, HPD.level = 0.9)
## Summary of Population Size Estimation
##           Mean Median Mode 25%  75%  90%  5%  95%
## Prior     1247   1000  680 748 1480 2240 583 2852
## Posterior  974    936  874 808 1100 1275 656 1400
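The 5% and 95% columns can be read as the endpoints of an equal-tailed 90% interval, assuming they are posterior quantiles. The sketch below shows how such endpoints are computed from a vector of draws in base R. The draws here are synthetic, generated from a log-normal distribution purely for illustration; they are not output from sspse, and the names `draws` and `ci90` are hypothetical.

```r
set.seed(1)

# Synthetic stand-in for posterior draws of N (NOT produced by sspse).
draws <- round(rlnorm(1000, meanlog = log(950), sdlog = 0.25))

# Equal-tailed 90% interval: the 5th and 95th percentiles of the draws.
ci90 <- quantile(draws, probs = c(0.05, 0.95))
post_median <- median(draws)

ci90
post_median
```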

Visibility SS-PSE example

Set visibility=TRUE. Because this variant fits a measurement error model, it takes a little longer to run, perhaps a minute or so.

fit2 <- posteriorsize(fauxmadrona, 
              median.prior.size=1000,
              visibility=TRUE)
## Using a Exponentially Weighted Poisson measurement error model with K = 35.

## computing ...
## Taken 1 samples...
## Taken 2 samples...
...
## Taken 500 samples...
## Taken 1000 samples...

Plot the posterior distribution for $N$.

plot(fit2, type="N")

Create a table summary for the prior and posterior distributions for population size, specifying that we are interested in a 90% credible interval for $N$.

summary(fit2, HPD.level = 0.9)
## Summary of Population Size Estimation
##           Mean Median Mode 25%  75%  90%  5%  95%
## Prior     1247   1000  680 748 1480 2240 583 2852
## Posterior 1275   1061  839 823 1486 2156 609 2732
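The two fits can be compared by placing the printed summaries side by side. The sketch below copies the median, 5%, and 95% values from the two summary tables above into a base R data frame and computes the width of each 90% interval, along with whether it contains the simulated population size of 1000. The column names are illustrative.

```r
# Values copied from the two summary tables above.
intervals <- data.frame(
  model  = c("Prior", "Original SS-PSE", "Visibility SS-PSE"),
  median = c(1000, 936, 1061),
  q05    = c(583, 656, 609),
  q95    = c(2852, 1400, 2732)
)

# Width of the 90% interval for each row.
intervals$width <- intervals$q95 - intervals$q05

# fauxmadrona was simulated from a population of size 1000.
true_N <- 1000
intervals$covers_truth <- intervals$q05 <= true_N & true_N <= intervals$q95

intervals
```

In this example the original SS-PSE posterior interval is considerably narrower than the prior interval, while the visibility SS-PSE interval remains close to the prior width; both contain the simulated population size.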

Resources

Please use the GitHub repository to report bugs or request features: https://github.com/HPMRG/sspse

See the following papers for more information and examples:

Statistical Methodology

Applications