Input.interp: Difference between revisions
(tested and fixed the wrapper for data frames)
(";" interpretation added, error message added)
Line 92 (old) / Line 92 (new):

```diff
 isd <- sum(abs(ici - imean) / 2) / qnorm(0.975)
 out[[i]] <- rnorm(n, imean, isd)
-} else out[[i]] <- paste("Unable to interpret \"", res.char[i], "\"", sep = "")
+} else {
+out[[i]] <- NA
+warning(paste("Unable to interpret \"", res.char[i], "\"", sep = ""))
+}
 } else {
 if(minus.exists[i]) {
```

Line 105 (old) / Line 108 (new):

```diff
 }
 }
-}
+} else {
+if(sum(unlist(strsplit(res.char[i], ""))==";") > 0) out[[i]] <- sample(sapply(strsplit(res.char[i], ";"), as.numeric), N, replace = TRUE)
+} else {out[[i]] <- NA; warning(paste("Unable to interpret \"", res.char[i], "\"", sep = ""))}
 }
 out
```
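In the hunk at old line 105 / new line 108, the `} else {` opened just before the `;` test is closed on the next line. That closing brace is then followed by a second `else`, which R rejects as an unexpected `else`. The new `sample()` call also draws `N` values, while the rest of the visible code uses `n`. The sketch below is a hypothetical, standalone reconstruction of only the two branches this revision touches. It assumes `N` was meant to be `n` and that the `;` test was meant to have its own `else`. The function name and the loop are illustrative and not part of the page's code.

```r
# Hypothetical reconstruction of the two branches changed in this revision.
# Assumes `N` in the diff was meant to be `n`; res.char, out and n mirror the diff.
interp_new_branches <- function(res.char, n) {
  out <- vector("list", length(res.char))
  for (i in seq_along(res.char)) {
    if (grepl(";", res.char[i], fixed = TRUE)) {
      # ";"-separated list: draw n values, each entry equally likely
      items <- as.numeric(strsplit(res.char[i], ";", fixed = TRUE)[[1]])
      out[[i]] <- items[sample.int(length(items), n, replace = TRUE)]
    } else {
      # Behaviour introduced by this revision: NA plus a warning, not a message string
      out[[i]] <- NA
      warning(paste("Unable to interpret \"", res.char[i], "\"", sep = ""))
    }
  }
  out
}
```

`grepl(..., fixed = TRUE)` replaces the character-by-character count of `;`. Indexing through `sample.int()` avoids the `sample()` pitfall where a single numeric entry `k` is treated as `1:k`.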
Revision as of 21:54, 29 January 2012
Moderator: Jouni
This page is a stub. You may improve it into a full page.
input.interp is an R function that interprets model inputs given in a user-friendly format and converts them into an explicit, exact mathematical format. The purpose is to let users give input without having to worry about technical modelling details.
Question
What should be a list of important user input formats, and how should they be interpreted?
Answer
The basic feature is that if a text string can be converted to a meaningful numeric object, it will be. This function can be used when data is downloaded from Opasnet Base: if Result.Text contains this kind of numeric information, it is converted to numbers and merged into the Result column.
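A minimal sketch of this basic feature follows. It assumes a data.frame with the Opasnet Base columns Result (numeric) and Result.Text (character). `fuse_result` is a hypothetical name. Only plain numbers are handled here, using the space-removal and comma rules from the table below. Distributions are left untouched.

```r
# Hypothetical helper: fill Result from Result.Text wherever the text is a plain number.
fuse_result <- function(d) {
  # Remove spaces ("12 000" -> "12000") and read commas as decimal points
  txt <- gsub(",", ".", gsub(" ", "", d$Result.Text, fixed = TRUE), fixed = TRUE)
  num <- suppressWarnings(as.numeric(txt))   # NA when the text is not a plain number
  d$Result <- ifelse(is.na(num), d$Result, num)
  d
}
```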
n is the number of iterations in the model. # stands for any number in the text string.
Example | Pattern | Interpretation | Output in R |
---|---|---|---|
12 000 | # # | 12000. Text is interpreted as a number if removing spaces makes it a number. | as.numeric(gsub(" ", "", Result.Text)) |
12,345 | #,# | 12.345. Commas are interpreted as decimal points. Note! Do not use a comma as a thousands separator! | as.numeric(gsub(",", ".", Result.Text)) |
-14,23 | -# | -14.23. A minus at the beginning of an entry is interpreted as a minus sign, not as a range separator. | |
50 - 125 | # - # | Uniform distribution between 50 and 125 | data.frame(iter=1:n, result=runif(n,50,125)) |
-12 345 - -23,56 | -# - -# | Uniform distribution between -12345 and -23.56 | |
1 - 50 | # - # | Log-uniform distribution between 1 and 50 (log-uniformity is assumed if the ratio of the upper to the lower limit is ≥ 30) | |
3.1 ± 1.2 or 3.1 +- 1.2 | # ± # or # +- # | Normal distribution with mean 3.1 and SD 1.2 | data.frame(iter=1:n, result=rnorm(n,3.1,1.2)) |
2.4 (1.8 - 3.0) | # (# - #) | Normal distribution with mean 2.4 and 95 % confidence interval from 1.8 to 3.0 | data.frame(iter=1:n, result=rnorm(n,2.4,(3.0-1.8)/2/1.96)) |
2.4 (2.0 - 3.2) | # (# - #) | Lognormal distribution with mean 2.4 and 95 % confidence interval from 2.0 to 3.2. Lognormality is assumed if the distance from the mean to the upper limit is at least 50 % greater than the distance from the mean to the lower limit. | |
24 - 35 (odds 5:1) | # - # (odds #:#) | The odds are five to one that the truth is between 24 and 35. It is not yet known how to calculate this, but a prior must be involved. | ⇤--#: . I am not sure whether this is actually needed. Who expresses uncertainties in this way? --Jouni 14:00, 28 December 2011 (EET) (type: truth; paradigms: science: attack) |
2;4;7 | #;#;# | Each entry (here 2, 4, and 7) is equally likely to occur. Entries can also be text. | data.frame(iter=1:n, result=sample(c(2,4,7), n, replace=TRUE)) |
* (in index, i.e. explanatory, columns) | * | The result applies to all locations of this index. | With merge(), this column is not used as a criterion when these rows are merged. |
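The rows above could be turned into a single-entry parser roughly as follows. This is a minimal sketch, not the page's input.interp. `interp_one` is a hypothetical name, and the checks run in a fixed order: `;` list, CI, `+-`, range, plain number. The lognormal case treats the point value as the median and takes the standard deviation of the log from the CI. That parametrisation is an assumption, because the table does not say how the mean and CI define the lognormal. The odds row and the `*` index rule are not covered. Plain numbers come back as one value rather than n copies.

```r
# Hypothetical sketch of the table's rules for one text entry; not the page's input.interp.
interp_one <- function(x, n) {
  if (is.na(x)) return(NA)
  s <- gsub("\u00b1", "+-", x, fixed = TRUE, useBytes = TRUE)      # plus-minus sign -> "+-"
  s <- gsub(",", ".", gsub(" ", "", s, fixed = TRUE), fixed = TRUE) # drop spaces, comma = decimal point
  num <- "(-?[0-9.]+)"
  grab <- function(pattern) {                  # numbers captured by pattern, or NULL
    m <- regmatches(s, regexec(pattern, s))[[1]]
    v <- suppressWarnings(as.numeric(m[-1]))
    if (length(m) == 0 || anyNA(v)) NULL else v
  }
  # "2;4;7": every entry equally likely; if any entry is not a number, all are kept as text
  if (grepl(";", s, fixed = TRUE)) {
    vals <- suppressWarnings(as.numeric(strsplit(s, ";", fixed = TRUE)[[1]]))
    items <- if (anyNA(vals)) trimws(strsplit(x, ";", fixed = TRUE)[[1]]) else vals
    return(items[sample.int(length(items), n, replace = TRUE)])
  }
  # "2.4 (1.8 - 3.0)": point value with a 95 % CI
  v <- grab(paste0("^", num, "\\(", num, "-", num, "\\)$"))
  if (!is.null(v)) {
    m <- v[1]; lo <- v[2]; hi <- v[3]
    z <- qnorm(0.975)
    if (lo > 0 && hi - m >= 1.5 * (m - lo)) {
      # ASSUMPTION: lognormal with median m and sdlog taken from the CI on the log scale
      return(rlnorm(n, log(m), (log(hi) - log(lo)) / 2 / z))
    }
    return(rnorm(n, m, (hi - lo) / 2 / z))
  }
  # "3.1 +- 1.2": normal with mean and SD
  v <- grab(paste0("^", num, "\\+-", num, "$"))
  if (!is.null(v)) return(rnorm(n, v[1], v[2]))
  # "50 - 125": uniform; log-uniform if upper/lower >= 30 (positive limits only)
  v <- grab(paste0("^", num, "-", num, "$"))
  if (!is.null(v)) {
    if (v[1] > 0 && v[2] / v[1] >= 30) return(exp(runif(n, log(v[1]), log(v[2]))))
    return(runif(n, v[1], v[2]))
  }
  # "12 000", "12,345", "-14,23": plain numbers
  v <- suppressWarnings(as.numeric(s))
  if (!is.na(v)) return(v)
  warning(paste0("Unable to interpret \"", x, "\""))
  NA
}
```

For example, `interp_one("-12 345 - -23,56", 1000)` first becomes `"-12345--23.56"`. The range pattern then splits it into -12345 and -23.56, because a leading minus belongs to the number that follows it.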
How to actually make this happen in R?
- Make a temporary result temp by removing all spaces from Result.Text. Columns: Indices, Result, Result.Text, temp (Indices contains all explanatory columns.)
- Replace all "," with "." in temp.
- Check if there are parentheses "()". If yes, assume that they contain a 95 % confidence interval.
- Check if there are ranges "#-#".
- Divide the rows of the data.frame into two new data.frames with the same list of columns (Indices,Result).
- If temp is a syntactically correct distribution, take the row to data.frame A and replace Result with temp.
- Otherwise, take the row to data.frame B and replace Result with Result.Text if that is not NA.
- Create a new data.frame with index Iter = 1:n.
- Make a random sample from each probability distribution in data.frame A using Iter.
- Merge the data.frame B with Iter.
- Join data.frames A and B with rbind(). Columns: Iter,Index,Result.
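The steps above can be sketched as follows. This is an illustration under stated assumptions, not the page's wrapper. `interp_df` is a hypothetical name, and the explanatory columns are reduced to a single column `Index`. Only `#-#` (uniform) and `#+-#` (normal) count as distributions here. The CI, log-uniform and `;` rules from the table are left out for brevity.

```r
# Hypothetical sketch of the step list: one index column `Index`; only "#-#" (uniform)
# and "#+-#" (normal) are recognised as distributions.
interp_df <- function(d, n) {
  # temp: Result.Text without spaces, with "," replaced by "."
  temp <- gsub(",", ".", gsub(" ", "", d$Result.Text, fixed = TRUE), fixed = TRUE)
  pat <- "^(-?[0-9.]+)(-|\\+-)(-?[0-9.]+)$"
  is_dist <- !is.na(temp) & grepl(pat, temp)

  # Split rows: A = syntactically correct distributions, B = everything else
  A <- d[is_dist, c("Index", "Result")]
  B <- d[!is_dist, c("Index", "Result")]
  textB <- d$Result.Text[!is_dist]
  B$Result <- ifelse(is.na(textB), B$Result, textB)  # use Result.Text when it is not NA

  iter <- data.frame(Iter = seq_len(n))

  # Draw n samples for each distribution row in A
  tempA <- temp[is_dist]
  drawn <- lapply(seq_len(nrow(A)), function(j) {
    m <- regmatches(tempA[j], regexec(pat, tempA[j]))[[1]]
    a <- as.numeric(m[2]); b <- as.numeric(m[4])
    res <- if (m[3] == "-") runif(n, a, b) else rnorm(n, a, b)
    data.frame(Iter = iter$Iter, Index = A$Index[j], Result = res,
               stringsAsFactors = FALSE)
  })
  A <- do.call(rbind, drawn)

  # Merge B with Iter: no shared columns, so every B row is repeated for each Iter
  B <- merge(iter, B)

  # Join A and B; columns Iter, Index, Result
  rbind(A, B[, c("Iter", "Index", "Result")])
}
```

`merge()` with no common column names returns the Cartesian product, which is exactly the "merge B with Iter" step. Because B rows take Result.Text when it is present, Result becomes a character column whenever such text remains. In that case `as.numeric()` may be needed afterwards.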
----#: . The code is still a work in progress: it takes the elements of a character vector and returns their interpretations as a list. Its error tolerance is very poor. --Teemu R 03:09, 24 January 2012 (EET) (type: truth; paradigms: science: comment)
----#: . A separate wrapper for data.frame. Untested; it takes a data.frame and returns a data.frame. --Teemu R 14:15, 24 January 2012 (EET) (type: truth; paradigms: science: comment)