Input.interp: Difference between revisions

Latest revision as of 17:44, 24 January 2013

Redirect to:

OpasnetUtils/Interpret

@@ Line 1: / Line 1: @@
-[[Category:R tool]]
+#REDIRECT [[OpasnetUtils/Interpret]]
-{{method|moderator=Jouni|stub=Yes}}
-'''input.interp''' is an R function that interprets model inputs from a user-friendly format into explicit and exact mathematical format. The purpose is to make it easy for a user to give input without a need to worry about technical modelling details.
-==Question==
-What should be a list of important user input formats, and how should they be interpreted?
-==Answer==
-The basic feature is that if a text string can be converted to a meaningful numeric object, it will be. This function can be used when data is downloaded from [[Opasnet Base]]: if Result.Text contains this kind of numeric information, it is converted to numbers and fused with Result.
-n is the number of iterations in the model. # is any numeric character in the text string.
-{| {{prettytable}}
-!Example!!Regular expression!!Interpretation !! Output in R
-|----
-| 12 000 ||# # || 12000. Text is interpreted as number if space removal makes it a number. || as.numeric(gsub(" ", "", Result.text))
-|----
-| 12,345 ||#,# || 12.345. Commas are interpreted as decimal points. || as.numeric(gsub(",", ".", Result.text)) # Note! Do not use comma as a thousand separator!
-|----
-| -14,23 || -# || -14.23. Minus in the beginning of entry is interpreted as minus, not a sign for a range. ||
-|----
-| 50 - 125 ||# - # ||Uniform distribution between 50 and 125 || data.frame(iter=1:n, result=runif(n,50,125))
-|----
-| -12 345 - -23,56 || || Uniform distribution between -12345 and -23.56. ||
-|----
-| 1 - 50 ||# - # || Loguniform distribution between 1 and 50 (Lognormality is assumed if the ratio of upper to lower is => 30) ||
-|----
-| 3.1 ± 1.2 or 3.1 +- 1.2||# ± # or # +- # ||Normal distribution with mean 3.1 and SD 1.2 || data.frame(iter=1:n, result=rnorm(n,3.1,1.2))
-|----
-| 2.4 (1.8 - 3.0) || # (# - #) ||Normal distribution with mean 2.4 and 95 % confidence interval from 1.8 to 3.0 || data.frame(iter=1:n, result=rnorm(n,2.4,(3.0-1.8)/2/1.96))
-|----
-| 2.4 (2.0 - 3.2) || # (# - #) ||Lognormal distribution with mean 2.4 and 95 % confidence interval from 2.0 to 3.0. Lognormality is assumed if the difference from mean to upper limit is => 50 % greater than from mean to lower limit.||
-|----
-| 24 - 35 (odds 5:1) || # - # (odds #:#) || Odds is five to one that the truth is between 24 and 35. How to calculate this, I don't know yet, but there must be a prior.|| {{attack|# |I am not sure whether this is actually needed. Who expresses uncertainties in this way?|--[[User:Jouni|Jouni]] 14:00, 28 December 2011 (EET)}}
-|----
-| 2;4;7 || || Each entry (2, 4, and 7 in this case) are equally likely to occur. Entries can also be text.||
-|----
-| * (in index, or explanatory, columns) || || The result applies to all locations of this index.|| With merge() function, this column is not used as a criterion when these rows are merged.
-|}
-How to actually make this happen in R?
-# Make a temporary result ''temp'' by removing all spaces from Result.Text. Columns: ''Indices,Result.Result.Text,temp'' (Indices contains all explanatory columns.)
-# Replace all "," with "."
-# Check if there are parentheses "()". If yes, assume that they contain 95 % CI.
-# Check if there are ranges "#-#".
-# Divide the rows of the data.frame into two new data.frames with the same list of columns (''Indices,Result'').
-#* If temp is a syntactically correct distribution, take the row to data.frame A and replace ''Result'' with ''temp''.
-#* Otherwise, take the row to data.frame B and replace ''Result'' with ''Result.Text'' if that is not NA.
-# Create a new data.frame with index Iter = 1:n.
-# Make a random sample from each probability distribution in data.frame A using Iter.
-# Merge the data.frame B with Iter.
-# Join data.frames A and B with rbind(). Columns: ''Iter,Index,Result''.
-<rcode>
-lmean <- function(parmean, parsd) {return(log(parmean)-log(1+(parsd^2)/(parmean^2))/2)}
-lsd <- function(parmean, parsd) {return(log(1+(parsd^2)/(parmean^2)))}
-input.interp <- function(res.char, n = 1000) {
-	res.char <- gsub(" ", "", res.char)
-	res.char <- gsub(",", ".", res.char)
-	plusminus <- gregexpr(paste("\\+-|", rawToChar(as.raw(177)), sep = ""), res.char) # saattaa osoittautua ongelmaksi enkoodauksen vuoksi
-	plusminus.length <- sapply(plusminus, length)
-	plusminus.exists <- unlist(plusminus)[cumsum(c(0, plusminus.length[-length(plusminus.length)])) + 1] > 0
-	minus <- gregexpr("-", res.char)
-	minus.length <- sapply(minus, length)
-	minus.exists <- unlist(minus)[cumsum(c(0, minus.length[-length(minus.length)])) + 1] > 0
-	brackets <- gregexpr("\\(.*\\)", res.char) # matches for brackets "(...)"
-	brackets.length <- as.numeric(unlist(lapply(brackets, attributes)))
-	brackets.pos <- unlist(brackets)
-	out <- list()
-	for(i in 1:length(res.char)) {
-		if(brackets.pos[i] >= 0) {
-			minus.relevant <- unlist(minus)[(cumsum(c(0, minus.length)) + 1)[i]:cumsum(minus.length)[i]] # Meni hieman monimutkaiseksi ylla olevan vektorisoinnin vuoksi
-			n.minus.inside.brackets <- sum(minus.relevant > brackets.pos[i] & minus.relevant < brackets.pos[i] + brackets.length[i])
-			imean <- as.numeric(substr(res.char[i], 1, brackets.pos[i] - 1))
-			if(n.minus.inside.brackets == 1) {
-				ici <- c(as.numeric(substr(res.char[i], brackets.pos[i] + 1, minus.relevant[minus.relevant > brackets.pos[i]] - 1)), as.numeric(substr(res.char[i],
-					minus.relevant[minus.relevant > brackets.pos[i]] + 1, brackets.pos[i] + brackets.length[i] - 2)))
-				isd <- sum(abs(ici - imean) / 2) / qnorm(0.975)
-				if((ici[2] - imean) / (ici[1] - imean) < 1.5) {
-					out[[i]] <- rnorm(n, imean, isd)
-				} else {
-					out[[i]] <- rlnorm(n, lmean(imean, isd), lsd(imean, isd)) # menee vaarin koska isd on laskettu normaalijakaumalle
-				}
-			} else
-			if(n.minus.inside.brackets == 2|n.minus.inside.brackets == 3) {
-				# consecutive.minuses <-  minus.relevant + 1 == c(minus.relevant[2:length(minus.relevant)], 0) # turha jos oletetaan etta ensimmainen luku sulkujen sisalla on aina pienempi
-				ici <- c(as.numeric(substr(res.char[i], brackets.pos[i] + 1, minus.relevant[minus.relevant > brackets.pos[i]][2] - 1)), as.numeric(substr(res.char[i],
-					minus.relevant[minus.relevant > brackets.pos[i]][2] + 1, brackets.pos[i] + brackets.length[i] - 2)))
-				isd <- sum(abs(ici - imean) / 2) / qnorm(0.975)
-				out[[i]] <- rnorm(n, imean, isd)
-			} else out[[i]] <- paste("Unable to interpret \"", res.char[i], "\"", sep = "")
-		} else {
-			if(minus.exists[i]) {
-				minus.relevant <- unlist(minus)[(cumsum(c(0, minus.length)) + 1)[i]:cumsum(minus.length)[i]]
-				if(length(minus.relevant)==1) {if(as.numeric(substr(res.char[i], 1, minus.relevant - 1)) / as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))) >= 1/100) {
-					out[[i]] <- runif(n, as.numeric(substr(res.char[i], 1, minus.relevant - 1)), as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))))} else {
-					out[[i]] <- exp(runif(n, log(as.numeric(substr(res.char[i], 1, minus.relevant - 1))), log(as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))))))}
-				} else {out[[i]] <- runif(n, as.numeric(substr(res.char[i], 1, minus.relevant[2] - 1)), as.numeric(substr(res.char[i], minus.relevant[2] + 1, nchar(res.char[i]))))}
-			} else {
-				if(plusminus.exists[i]) {
-					out[[i]] <- rnorm(n, as.numeric(substr(res.char[i], 1, plusminus[[i]][1] - 1)), as.numeric(substr(res.char[i], plusminus[[i]][1] + 1, nchar(res.char[i]))))
-				}
-			}
-		}
-	}
-	out
-}
-iter.f <- function(x) {
-:x
-}
-input.interp.df <- function(data, rescol = "Result.text", N = 1000) {
-	temp <- input.interp(data[, rescol], N)
-	temp.lengths <- sapply(temp, length)
-	out <- data[rep(1:nrow(data), each = temp.lengths),]
-	out$Interp.Result <- unlist(temp)
-	dim(temp.lengths) <- length(temp.lengths)
-	out$Iteration <- unlist(apply(temp.lengths, 1, iter.f))
-	out
-}
-</rcode>
-{{comment|# |Koodi on vielä vaiheessa, ottaa character vectorin alkiot ja antaa tulkinnat listana. Virhetoleranssi hyvin huono.|--[[User:Teemu R|Teemu R]] 03:09, 24 January 2012 (EET)}}
-{{comment|# |Data.framelle oma wrapperi. Testaamaton, ottaa ja antaa data.framen.|--[[User:Teemu R|Teemu R]] 14:15, 24 January 2012 (EET)}}

Input.interp: Difference between revisions

Latest revision as of 17:44, 24 January 2013

Navigation menu