Converting distributions: from random individual to population average

This page is a knowledge crystal of subtype method. The page identifier is Op_en3544
Moderator: Jouni


Converting distributions: from random individual to population average is a mathematical method for converting one kind of distribution into another. The starting point is a distribution that describes the variation of a property in a defined (sub)population, from which individuals are randomly drawn. The outcome distribution describes the uncertainty of the subpopulation mean. The method is operationalised in the Analytica tool file Open assessment functions.ANA.

Scope

How can the distribution of random individuals be converted to the distribution of the subpopulation average?

Definition

Input

a: the distribution that describes a property in a defined population, from which individuals are randomly drawn.
n: the number of observations from which the distribution a was derived.
m: the number of individuals in the subpopulation whose average we want to estimate.

Output

The outcome distribution describes the uncertainty of the subpopulation mean.

Rationale

Let's assume that we want to estimate the average of a subpopulation that consists of m individuals randomly drawn from the original population. The best estimate of the average is the sample mean, and the uncertainty about it is described by the standard error of the mean. However, if we start from a distribution instead of a real population, we can draw a sample of any size, and the standard error can be made arbitrarily small. This does not reflect our true knowledge, because the distribution we use is usually based on a rather modest sample, not a huge one. It may even be based on no sample at all, if expert elicitation is used.
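The shrinking standard error can be illustrated with a small simulation. This is only a sketch: the normal distribution with mean 10 and standard deviation 2, and the helper names `draw_individuals` and `standard_error`, are assumptions made for the illustration and do not come from the method itself.

```python
import random
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def draw_individuals(size, rng):
    """Draw `size` individuals from a hypothetical distribution (normal, mean 10, sd 2)."""
    return [rng.gauss(10.0, 2.0) for _ in range(size)]


rng = random.Random(1)
# The distribution stays the same, but the standard error keeps shrinking
# as we draw more simulated individuals from it.
for size in (10, 1_000, 100_000):
    print(size, standard_error(draw_individuals(size, rng)))
```

The printed standard error falls roughly with the square root of the sample size, although no new real-world evidence has been collected. This is the problem that the observed sample size n is meant to correct.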

A method is needed to reflect our uncertainty about the random individual distribution. The distribution itself does not contain this information, so we must provide it by other means. The observed sample size n is an intuitive way to do this. It is the number of observations that were collected to calculate the random individual distribution. Even if the distribution was not based on a particular study, a number that reflects the strength of evidence can be given (e.g., "the expert opinion is worth ten actual observations").

Result

The method is given as Analytica code for a function. The input parameters have been defined above. A description is added row by row.

Index i adds a dimension of size n, the number of observations behind the distribution.
Local variable b is first a table indexed by i and run (the Monte Carlo iterations). Each cell holds a random integer between 1 and samplesize, that is, a randomly chosen iteration number.
Each iteration number is then replaced by the corresponding realisation of the random individual distribution a. Averaging these over i gives, for every iteration, the average of n random draws from a. This is itself a distribution, and it reflects the uncertainty about the subpopulation mean that is due to imperfect data.
Based on the central limit theorem, the average of the m individuals in the subpopulation is approximately normally distributed. Its mean is the value b just calculated, and its standard deviation is the standard deviation of a divided by the square root of m, i.e. the standard error of a mean of m individuals.
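The spread of the resulting distribution can be worked out from these steps. The symbols are assumptions introduced here for illustration: μ and σ are the mean and standard deviation of a, b is the average of the n draws, and x̄ is the output. Treating σ as known and the n draws as independent, the law of total variance gives approximately:

```latex
\begin{align}
\operatorname{E}[\bar{x}] &= \operatorname{E}[b] = \mu \\
\operatorname{Var}[\bar{x}]
  &= \operatorname{E}\!\left[\operatorname{Var}(\bar{x} \mid b)\right]
   + \operatorname{Var}\!\left(\operatorname{E}[\bar{x} \mid b]\right)
   \approx \frac{\sigma^2}{m} + \frac{\sigma^2}{n}
\end{align}
```

For example, with σ = 2, n = 10 and m = 100 the variance is about 0.04 + 0.4 = 0.44, so the standard deviation of the output is about 0.66. The term with n dominates here: a small evidence base limits how precisely the subpopulation mean can be known, however large the subpopulation is.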

Note! As n grows, the calculation may become slow. However, this function should not take a lot of memory. This has not yet been included in any Analytica file, so you need to create the function yourself.

<anacode> Parameters: (a: probabilistic; n, m)

Definition:
index i := 1..n;
var b := random(uniform(1, samplesize, integer: true), over: i, run);
b := average(a[run=b], i);
normal(b, Sdeviation(a)/sqrt(m)) </anacode>
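For readers without Analytica, the same logic can be sketched in plain Python. This is an illustrative translation under stated assumptions, not part of the tool file. The function name `population_average_distribution` is hypothetical, the Monte Carlo sample of a is passed in as a list, and the sample standard deviation stands in for Sdeviation(a).

```python
import math
import random
import statistics


def population_average_distribution(a_samples, n, m, rng=None):
    """Sketch of the method: return one output value per Monte Carlo iteration.

    a_samples: Monte Carlo sample of the random individual distribution a
    n: number of observations behind the distribution
    m: number of individuals in the subpopulation
    """
    if rng is None:
        rng = random.Random()
    sd = statistics.stdev(a_samples)  # assumed stand-in for Sdeviation(a)
    result = []
    for _ in range(len(a_samples)):  # one pass per iteration (run)
        # Pick n iterations of a at random (with replacement) and average them.
        draws = [rng.choice(a_samples) for _ in range(n)]
        b = statistics.fmean(draws)
        # Add the sampling variation of a mean of m individuals.
        result.append(rng.gauss(b, sd / math.sqrt(m)))
    return result


# Usage with a hypothetical individual distribution (normal, mean 50, sd 10).
rng = random.Random(42)
a = [rng.gauss(50.0, 10.0) for _ in range(1000)]
out = population_average_distribution(a, n=25, m=100, rng=rng)
print(statistics.fmean(out), statistics.stdev(out))
```

The inner loop costs n random picks per iteration, which matches the note above that the calculation slows down as n grows while memory use stays modest.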

References