# Diversity index

## Question

How to calculate diversity indices?

Use the function diversity to calculate the most common indices. These parameters are used:

• amount: the number (or proportion) of individuals of a species in a transect.
• species: an identifier for a species
• transect: an identifier for a transect
• q: exponent for the diversity calculation

Amount, species, and transect are vectors (i.e. ordered sets of values). The parameters can be given to the function in several different ways. This is hopefully practical for the user.

• Amount, species, and transect must be of same length.
• If parameter names are not used (e.g. diversity(param1, param2, param3)), it is assumed that they are given in this order: amount, species, transect, q.
• If individual data is given using species identifiers, it the parameter name must be given: diversity(species = param1).
• The following default values are used for parameters:
• Amount: 1 for each species, i.e. each species is equally abundant.
• Species: 1, 2, 3, ... n, where n = number of rows in data, i.e. each row is a different species.
• Transect: 1, i.e. there is only one transect.
• q: q.wiki. You MUST define an object q.wiki before using diversity, otherwise alpha diversities will be calculated wrong. q.wiki can be a single number or a vector of numbers.

There are several ways to upload your data so that you can use the diversity function.

• Upload your data to Opasnet Base and call for the data using op_baseGetData function.
• Upload a CSV file to Opasnet using Special:Upload and call for the data using opasnet.data function.
• Use Example 1 code below to enter your data. The data must be in format c(4,6,2,5,7,4,3,9) where the values are either
• identifiers of the species 1,2,3... in which the individuals belong (one entry per individual), or
• abundancies of species, i.e. proportions or amounts of individuals belonging to each species among the whole population (one entry per species).

## Rationale

Actual function diversity

### Initiate functions

 ``` library(OpasnetUtils) ####NOTE! q.wiki MUST BE DEFINED OR ALPHA DIVERSITIES WILL BE CALCULATED WRONG ####### qD calculates true i.e. gamma diversity for abundance pi with exponent q. qD <- function(pi, q = q.wiki){ # q.wiki is the q given by the user. pi <- pi/sum(pi) out <- rep(NA, length(q)) for(i in 1:length(q)){ if(q[i] == 1) {out[i] <- exp(-sum(pi * log(pi)))} else if( q[i] == 999){ out[i] <- 1/max(pi)} else {out[i] <- (sum(pi^q[i]))^(1 / (1 - q[i])) } } q[q == 999] <- "∞" names(out) <- q return(out) } ###### qDa calculates alpha diversity using gamma diversities of each transect. qDa <- function(divj, wj = 1, q = q.wiki){ q. <- rep(q, times = length(wj)) wj <- rep(wj, each = length(q)) minqDj <- tapply(divj, q., min) # Should this be the inverse? out <- ifelse(q. == 1, wj * log(divj), wj * divj^(1-q.)) out <- tapply(out, q., sum) out <- ifelse(q == 1, exp(out), out^(1 / (1 - q))) out[q == 999] <- minqDj[q == 999] q[q == 999] <- "∞" names(out) <- q return(out) } ####### diversity is the function for the user. It calculates several diversity ####### indices. Parameters: ## amount: the number of individuals of a species in a transect. ## species: an identifier for a species ## transect: an identifier for a transect ## q: exponent for the diversity calculation diversity <- function(amount = rep(1,length(species)), species = 1:length(amount), transect = 1, q = q.wiki){ pij <- as.data.frame(as.table(tapply(amount, data.frame(Transect = transect, Species = species), sum))) colnames(pij)[3] <- "pij" pij <- dropall(pij[!is.na(pij\$pij) & pij\$pij != 0, ]) pij\$pij <- pij\$pij/sum(pij\$pij) wj <- as.data.frame(as.table(tapply(pij\$pij, pij\$Transect, sum))) pi <- as.data.frame(as.table(tapply(pij\$pij, pij\$Species, sum)))[, 2] colnames(wj) <- c("Transect", "wj") out <- merge(pij, wj) out <- tapply(out\$pij, out\$Transect, qD) qDj <- NULL for(i in 1:nrow(wj)){ qDj <- c(qDj, out[[i]]) } gamma <- qD(pi, q) alpha <- qDa(qDj, wj\$wj, q) dim(qDj) <- c(length(q), length(qDj)/length(q)) q[q == 999] <- "∞" dimnames(qDj) <- list(q, levels(as.factor(pij\$Transect))) return(list(gamma = gamma, alpha = alpha, tr.gammas = t(qDj))) } ########## diversity.table makes a data table out of the results. diversity.table <- function(amount = rep(1,length(species)), species = 1:length(amount), transect = 1, q = q.wiki){ temp <- diversity(amount, species, transect, q) gamma <- temp[["gamma"]] alpha <- temp[["alpha"]] beta. <- gamma / alpha beta.A <- gamma - alpha beta.W <- gamma / alpha - 1 beta.P <- 1 - alpha / gamma qt <- gsub("999", "∞", q) outlabel <- c( paste("Gamma diversity with q=", qt, sep=""), paste("Alpha diversity with q=", qt, sep=""), paste("Beta diversity (strict) with q=", qt, sep=""), paste("Beta diversity (absolute turnover) with q=", qt, sep=""), paste("Beta diversity (Whittaker's turnover) with q=", qt, sep=""), paste("Beta diversity (proportional turnover) with q=", qt, sep="")) outsymbol <- c( paste(qt, "D", sep=""), paste(qt, "Da", sep=""), paste(qt, "Beta.", sep=""), paste(qt, "Beta.A", sep=""), paste(qt, "Beta.W", sep=""), paste(qt, "Beta.P", sep="")) out <- data.frame(Name = c(outlabel, "Richness", "Shannon index", "Simpson index", "Inverse Simpson index", "Gini-Simpson index", "Berger-Parker index"), Symbol = c(outsymbol, "S", "H' or log(1D)", "λ or 1/(2D)", "1/λ or 2D", "1-λ or 1-1/(2D)", "1/(∞D)"), Value = c(gamma, alpha, beta., beta.A, beta.W, beta.P, length(pi), log(qD(pi,1)), 1/qD(pi,2), qD(pi,2), 1-1/qD(pi,2), 1/qD(pi,999))) return(out) } ##################### Data management function longfo <- function( spraw, # Species data.frame in wide format trapraw, # Trap data.frame id.vars = character() # Column names used as identifiers (not species mearurements) ) { a <- merge(trapraw, spraw) # Assumed that the only common column is the Trap identifier. a\$StartDay <- as.POSIXct(paste(a\$StartYear, a\$StartMonth, a\$StartDay, sep = "-"), tz = "UTC") a\$EndDay <- as.POSIXct(paste(a\$EndYear, a\$EndMonth, a\$EndDay, sep = "-"), tz = "UTC") a\$NumDays <- as.numeric(a\$EndDay - a\$StartDay) a <- a[ , !colnames(a) %in% c("StartYear", "StartMonth", "EndYear", "EndMonth")] id.vars <- colnames(a)[colnames(a) %in% union(id.vars, c("Trap", "StartDay", "EndDay", "NumDays", "TrapID", "SeqInTrap", "SampleCode", "SoilType", "Region", "Latitude"))] # The typical id.vars used. a <- melt(a, id.vars = id.vars, variable.name = "Species", value.name = "Individuals" ) return(a) } ########## THIS FUNCTION CREATES A SAMPLE OF INDIVIDUAL SAMPLING PERIODS ########## AND THEN AGGREGATES ACROSS THE GROUPS. prepare <- function( dat, # data.frame with the data samp = TRUE, # do we sample or take rows in order? lens = c(0, 1, 2, 3, 5, 7, 10, 14, 20, 30, 50, 70, 100, 99999999), # Numbers of lengths in samples reps = 2, # how many repetitions? cols = c("TrapID", "Species"), # Column names used for grouping verbose = FALSE # Verbose returns a list with out as well as a. ) { cols <- colnames(dat)[colnames(dat) %in% cols] groups <- aggregate(dat\$Individuals, by = dat[cols], FUN = length) # This code makes a list of lengths that is used for index j to avoid overwhelmingly many samples for large groups. # First, possible lengths of a sample are determined by len. Second, the number of observations in each group # determine how far you go in the list. lengths <- list() for(i in 1:length(groups\$x)) lengths[[i]] <- c(lens[1:(findInterval(groups\$x[[i]], lens) - 1)], groups\$x[[i]])[-1] if(!samp) reps <- 1 a <- data.frame() # Sampled data.frame going to aggregation for(i in 1:nrow(groups)) { for(j in lengths[[i]]) { for(k in 1:reps) { if(samp) ro <- sample(groups\$x[i], j, replace = FALSE) else ro <- 1:j if(j != 0) {a <- rbind(a, data.frame( Len = j, Reps = k, merge(dat, groups[i , colnames(groups) != "x"])[ro , ] ))} } } } if(verbose) { print(head(a)) print(nrow(a)) } ##### Create new variable Spec which is the first occurrence of a species in a group. cols <- union(cols, c("Len", "Reps")) a <- a[order(a\$StartDay) , ] # Order by start day. a\$Spec <- 0 rownames(a) <- 1:nrow(a) a\$Spec[as.numeric(rownames(unique(subset(a, Individuals > 0, cols))))] <- 1 ##### AGGREGATION FROM THE SAMPLED PERIODS out <- aggregate(a[colnames(a) %in% c("Individuals", "Spec", "NumDays")], by = a[cols], FUN = sum) temp <- aggregate(a[colnames(a) %in% c("Region")], by = a[cols], FUN = function(x) x[1]) out <- cbind(out, temp[!colnames(temp) %in% cols]) # Remove redundant columns before cbind. temp <- aggregate(a[colnames(a) %in% c("SoilType", "TrapID")], by = a[cols], FUN = function(x) paste(x, collapse = " ")) out <- cbind(out, temp[!colnames(temp) %in% cols]) temp <- aggregate(a[colnames(a) %in% c("StartDay")], by = a[cols], FUN = min) out <- cbind(out, temp[!colnames(temp) %in% cols]) temp <- aggregate(a[colnames(a) %in% c("EndDay", "Len", "Reps")], by = a[cols], FUN = max) out <- cbind(out, temp[!colnames(temp) %in% cols]) out\$SamplingGap <- as.numeric(out\$EndDay - out\$StartDay - out\$NumDays) if(verbose) return(list(out, a)) else return(out) } objects.store(qD, qDa, diversity, diversity.table, longfo, prepare) cat("Functions qD, qDa, diversity, and diversity.table, longfo, prepare stored.\n") ```

The function diversity produces a list where the first, second, and third element are the gamma, the alpha, and transect-specific gamma diversities, respectively.

Function diversity.table produces a data.frame of several diversity indices.

### Examples

Example 1 to use function

 Give your data in R format or leave empty for example data:Is your data individual data or group abundancies?:Individual Group ```library(xtable) q.wiki = 0 if(is.null(data)) {data <- c(1,4,6,9,3,4,5,6,5,4,3,3,5,5,7,5,5,4,3,4,5,6,8,9,5,4,5,4,3,3,4,9,6,6,4,5,3,2,1,1,2,3,4,3,2,3,4,5,6,7,7)} data diversity(data) if(individual==TRUE) out <- diversity(species = data, q = q.wiki) else out <- diversity(amount = data, q = q.wiki) print(xtable(out), type = 'html') ```

Example 2

 Select your data:Tigre Which q values you want to calculate.: 0 0.5 1 2 3 6 ∞ ```library(OpasnetBaseUtils) library(xtable) data <- opasnet.csv(wikidata, wiki = "opasnet_en", sep = ",") data <- data[data\$Species != 1, ] species <- data\$Species amount <- data\$Individuals transect <- data\$Transect diversity(amount, species, transect, q.wiki) print(xtable(diversity(amount, species, transect, q.wiki)[[3]]), type = 'html') print(xtable(diversity.table(amount, species, transect, q.wiki)), type = 'html') ```

The data should be given in R format as a list of values in parenthesis, beginning with c:

```c(3,5,3,5,2,1,3,3,4,2) or equivalently c(0.1,0.2,0.4,0.1,0.2)
```

where the values are either

• identifiers of the species 1,2,3... in which the individuals belong (one entry per individual), or
• abundancies of species, i.e. proportions of individuals belonging to each species among the whole population (one entry per species).

### Calculations

Diversity indices are thoroughly described in Wikipedia.