Object-oriented programming in Opasnet
{{method|moderator=Jouni|stub=Yes}}
[[Category:Code under inspection]]
'''[[:en:Object-oriented programming|Object-oriented programming]]''' is an approach where programs (or, in '''Opasnet''', typically assessment models) have a modular structure in such a way that each part is considered as a separate object that has specific properties and interacts with other objects in standard ways.
Home work.2
1. What are S4 class objects
'''Learning to work with modelling objects in Opasnet'''
* [http://www.ustream.tv/recorded/22619949 '''WATCH recorded shows online''']
* Objective: To learn the structure and use of S4 class objects oassessment and ovariable.
** How are they created in the wiki?
Line 77:
Objects have two different implementations: a wiki page in Opasnet, and an S4 class object called ''ovariable'' (open assessment variable) in [[R-tools]]. The wiki page is the user-friendly interface for users, and ovariable is the versatile format for efficient, standardised modelling. The default direction for data is long (using the terminology in the merge function).
{{defend_invalid|# |Should we have attribute "target" that defines the target of the variable estimate. For example, "height" may estimate the whole variation of heights of individuals in a population, or it may estimate the mean height of the population. Somehow the population, the target that is the basic unit (individual in this case) and the statistical parameter should be explicitly described. Can this be done by using vector attributes that have a value for each index column in the sample? Is this an index-specific issue, or variable-specific?|--[[User:Jouni|Jouni]] 07:50, 9 April 2012 (EEST)}}
: {{attack|# |I don't understand what I am talking about. Maybe this should be removed.|--[[User:Jouni|Jouni]] 17:07, 16 October 2012 (EEST)}}
{| {{prettytable}}
Line 84:
! What it contains
! How implemented in the wiki
! How implemented in the R-tools as an S4 class object ovariable
|----
! colspan="4"|These attributes are needed in R-tools.
|----
| '''name
| The name of the object.
| The name of the wiki page.
| The Name slot in ovariable; it must be the same as the name of the ovariable object.
|----
| '''data
| Observations, expert judgement, discussions, and other pieces of information.
| Subheading under Rationale
| Slot data = "data.frame". The data frame can contain Obs and Unit columns, at least one index column, and Result as the observation column. However, it must not contain an Iter column.
|----
| '''output
| The output of the calculations.
| Not shown
| Slot output = "data.frame". The data frame may contain columns Iter, Obs, Unit, one column for each index, and ''Name''Result (where ''Name'' is the name of the object).
|----
| '''marginal
Line 107:
| Not implemented in wiki.
| Slot marginal = "vector". Especially for indices with many locations, a joint distribution needs much less memory.
|----
| '''dependencies
| A list of dependencies, i.e. objects that are causally upstream.
| List of links to other pages under subheading Dependencies.
| Slot dependencies = "data.frame". It contains a column Name with object names and a column Key with keys for model runs.
|----
| '''formula
| A computer code or algorithm to derive the answer from rationale and objects listed in dependencies. The formula may assume a deterministic dependency (e.g. y <- k*x + b), a conditional probability structure (y ~ dnorm(x, sd)), or a rank correlation matrix.
| Subheading under Rationale, often using <rcode> tags.
| Slot formula = "function". Formula contains the R code that is needed for calculating the output.
|----
! colspan="4"|These attributes are not (yet) implemented in R-tools. {{attack|# |The text below is outdated. Don't believe all details.|--[[User:Jouni|Jouni]] 17:07, 16 October 2012 (EEST)}}
|----
| '''question
Line 159:
| Subheading under Rationale with plain text. Also mentioned in the data table with parameter unit. If data table rows differ in units, there must be a Unit index.
| There is no separate slot for unit. The unit is merged with data and subsequently with sample. This must be done, because if rows are reordered, it is impossible to attach units to the right rows based on separate information.
|----
| '''formula.prob
Line 184:
{{comment|# |Function merge only uses indices that are of type "marginal" and not indices that are of type "joint". For a non-ovariable data.frame, by default all indices are marginals.|--[[User:Jouni|Jouni]] 14:58, 30 April 2012 (EEST)}}
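The comment above says that merge joins only on marginal indices. The Ops methods in the code below combine two ovariables by first merging them and then applying the operator to the Result.x and Result.y columns. What follows is a minimal base-R sketch of that idea on plain data.frames. It is not the Opasnet merge method itself: the helper merge_marginal and the example data are hypothetical.

```r
# Hypothetical helper (not part of Opasnet): join two long-format data.frames
# only on index columns that both share and that both flag as marginal.
merge_marginal <- function(x, y, marginal_x, marginal_y) {
  by <- intersect(intersect(colnames(x), colnames(y)),
                  intersect(marginal_x, marginal_y))
  # Shared non-key columns such as Result come back as Result.x and Result.y.
  merge(x, y, by = by)
}

# Made-up example data: an exposure indexed by Iter and Area, and an
# exposure-response factor indexed by Iter only.
exposure <- data.frame(Iter = rep(1:2, each = 2),
                       Area = rep(c("North", "South"), times = 2),
                       Result = c(10, 20, 12, 18),
                       stringsAsFactors = FALSE)
erf <- data.frame(Iter = 1:2, Result = c(1.5, 1.6))

m <- merge_marginal(exposure, erf,
                    marginal_x = c("Iter", "Area"),
                    marginal_y = "Iter")
# The arithmetic step: apply the operator row by row to the two Result columns.
m$Result <- m$Result.x * m$Result.y
m <- m[, c("Iter", "Area", "Result")]
print(m)
```

Because erf has no Area column, each of its values is repeated for both areas. This is the kind of expansion one would expect when multiplying ovariables that have different indices.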
====Formula====
<rcode
name="answer"
label="Initiate functions"
include="page:OpasnetBaseUtils|name:generic"
>
Line 208:
)
#################### Defines the S4 class "oassessment" which is the object type for open assessments.
setClass(
  "oassessment",
  representation(
    names = "data.frame",
    decisions = "data.frame",
    probabilities = "data.frame",
    stakeholders = "data.frame",
    vars = "list"
  )
)
########## Arithmetic operations of ovariables: first they are merged by index columns,
### then the operation is performed for the Result.x and Result.y columns.
### If one of the expressions is numeric, it is first transformed to ovariable.
setMethod(
  f = "Ops",
  signature = signature(e1 = "ovariable", e2 = "ovariable"),
  definition = function(e1, e2) {
    out <- merge(e1, e2)@sample
    # Strip only a trailing ".x" suffix; the unescaped pattern ".x" would also mangle names such as "Index".
    colnames(out) <- sub("\\.x$", "", colnames(out))
    out$Result <- callGeneric(out$Result, out$Result.y)
    if(!is.null(out$Unit.y)) {out$Unit <- paste(out$Unit, "|(", out$Unit.y, ")", sep = "")}
    e1@sample <- out[, !colnames(out) %in% c("Result.y", "Unit.y")]
    return(e1)
  }
)
setMethod(
  f = "Ops",
  signature = signature(e1 = "ovariable", e2 = "numeric"),
  definition = function(e1, e2) {
    e2 <- make.ovariable(e2)
    e1 <- callGeneric(e1, e2)
    return(e1)
  }
)
setMethod(
  f = "Ops",
  signature = signature(e1 = "numeric", e2 = "ovariable"),
  definition = function(e1, e2) {
    e1 <- make.ovariable(e1)
    e1 <- callGeneric(e1, e2) # Keep the operand order, so that e.g. 5 - x is not computed as x - 5.
    return(e1)
  }
)
#################### Math defines basic mathematical operations (log, exp, abs, ...) for ovariables
setMethod(
  f = "Math",
  signature = signature(x = "ovariable"),
  definition = function(x) {
    x@sample$Result <- callGeneric(x@sample$Result)
    return(x)
  }
)
Line 292:
)
############ tapply of ovariables applies a function to each cell of a ragged array, that is to each (non-empty) group of
############ values given by a unique combination of the levels of certain factors.
### parameters (other parameters are as in generic tapply):
### X: an ovariable
setMethod(f = "tapply",
  signature = signature(X = "ovariable"),
  definition = function(X, INDEX, FUN = NULL, ..., simplify = TRUE) {
    out <- as.data.frame(as.table(tapply(X@sample$Result, INDEX, FUN, ..., simplify = simplify)))
    colnames(out)[colnames(out) == "Freq"] <- "Result"
    X@sample <- out
    return(X)
  }
)
############################ orbind combines two ovariables using clever rbind
orbind <- function(x, y) {
  if(is(x, "ovariable")) {xsample <- x@sample} else {xsample <- x}
  if(is(y, "ovariable")) {ysample <- y@sample} else {ysample <- y}
  cols <- setdiff(colnames(ysample), colnames(xsample)) # Take all columns that do not exist in x and add them.
  col <- as.data.frame(array("NA", dim = c(1, length(cols))))
  colnames(col) <- cols
  if("Unit" %in% cols) {col[, "Unit"] <- "?"}
  temp <- cbind(xsample, col)
  cols <- setdiff(colnames(xsample), colnames(ysample)) # Take all columns that do not exist in y and add them.
  col <- as.data.frame(array("NA", dim = c(1, length(cols))))
  colnames(col) <- cols
  if("Unit" %in% cols) {col[, "Unit"] <- "?"}
  xsample <- rbind(temp, cbind(ysample, col)) # Combine x and y with rbind.
  return(xsample)
  #Should this be made S4 function for ovariables? Then it could be named simply rbind.
}
################# plot diagrams about ovariable data
setMethod(
  f = "plot",
  signature = signature(x = "ovariable"),
  definition = function(x) {
    x <- x@sample
Line 339:
)
################# plot diagrams about oassessment data
setMethod(
  f = "plot",
  signature = signature(x = "oassessment"),
  definition = function(x) {
    for(i in names(x@vars)) {
      y <- x@vars[[i]]@sample
      plot(y$Source, y$Result, ylab = y[y$Source == "Data", "Unit"][1], main = i)
    }
  }
)
######################### print oassessment and ovariable contents
setMethod(
  f = "print",
  signature = signature(x = "oassessment"),
  definition = function(x) {
    cat("Names\n")
    print(xtable(x@names), type = 'html')
    cat("Decisions\n")
    print(xtable(x@decisions), type = 'html')
    cat("Stakeholders\n")
    print(xtable(x@stakeholders), type = 'html')
    cat("Probabilities\n")
    print(xtable(x@probabilities), type = 'html')
    cat("\n\nThe list of variables in this assessment.\n")
Line 378:
setMethod(
  f = "print",
  signature = signature(x = "ovariable"),
  definition = function(x) {
    cat("Sample\n")
Line 389:
)
###################### fetch downloads a variable.
fetch <- function(x) {
  out <- tidy(op_baseGetData("opasnet_base", x), direction = "wide")
  return(out)
}
########### init.assessment creates an oassessment object from a names table, including decisions, stakeholders, probabilities, and variables.
########### NOTE! You must include the formula code from each variable page, otherwise formulas and dependencies are not updated.
########### Parameters:
## names: the Opasnet Base identifier of a table that has the structure of oassessment@names (Columns: Name, Identifier, Direction, Result); it is downloaded with fetch.
init.assessment <- function(names)
{
  names <- fetch(names)
  decisions <- fetch(names[names$Result == "decisions", "Identifier"])
  stakeholders <- fetch(names[names$Result == "stakeholders", "Identifier"])
  probabilities <- fetch(names[names$Result == "probabilities", "Identifier"])
  names <- names[!names$Result %in% c("decisions", "stakeholders", "probabilities"), ]
  vars <- list()
  for(x in 1:nrow(names)) { # Objects with names as aliases are created and filled with data from Opasnet Base.
    cat("Initialising variable ", as.character(names$Name[x]), ".\n", sep = "")
    ident <- as.character(names$Identifier[x])
    temp <- tidy(op_baseGetData("opasnet_base", ident), direction = as.character(names$Direction[x]))
    temp <- make.ovariable(temp)
    if(exists(paste("formula.", ident, sep = ""))) {
      temp@formula <- get(paste("formula.", ident, sep = ""))
    }
    if(exists(paste("dependencies.", ident, sep = ""))) {
      temp@dependencies <- get(paste("dependencies.", ident, sep = ""))
    }
    vars[[as.character(names$Result[x])]] <- temp # [[ ]] stores the ovariable itself as one list element.
  }
  assessment <- new("oassessment",
    names = names,
    decisions = decisions,
    stakeholders = stakeholders,
    probabilities = probabilities,
    vars = vars
  )
  return(assessment)
  # Does it cause problems to use the wide direction? Anyway, there should be exactly one Result column.
  ## No it doesn't if the Observation columns are such as Unit, Result, Description.
}
########### update updates the sample of an ovariable based on data and function.
setMethod(
  f = "update",
  signature = "ovariable",
  definition = function(object) {
    dat <- data.frame(Source = "Data", interpret(object@data))
    form <- object@formula(object@dependencies)
    if(is.vector(form)) {form <- data.frame(Result = form)}
    if(is(form, "ovariable")) {form <- form@sample}
    form <- data.frame(Source = "Formula", form)
    object@sample <- orbind(dat, form)
    object@marginal <- c(TRUE, object@marginal) # This alone is not enough; orbind must operate with marginals as well.
    return(object)
  }
)
#### This code was taken out of update. It is still needed somewhere but not here.
# dep <- object@dependencies
# for(i in 1:length(dep)) {
#   if(class(dep[[i]]) == "ovariable") {
#     dep[[i]] <- dep[[i]]@sample
#   } else {
#     if(length(grep("Op_(en|fi)", dep[[i]])) > 0) {
#       dep[[i]] <- op_baseGetData("opasnet_base", dep[[i]])}
#     else {
#       if(class(dep[[i]]) != "data.frame" & !is.numeric(dep[[i]])) {
#         dep[[i]] <- get(dep[[i]])
#       }
#     }
#   }
# }
#################### interpret takes a vector and makes a data.frame out of it (to be used in e.g. make.ovariable).
### It also turns range abbreviations such as "1-2" into uniform probability samples.
### NOTE: n, the number of iterations, is not defined here and must exist in the calling environment.
interpret <- function(data) {
  sample <- NULL
  if(is.vector(data)) {data <- data.frame(Result = data)}
  if("Iter" %in% colnames(data)) {
    out <- data
  } else {
    test <- !is.na(as.numeric(as.character(data$Result)))
    for(i in 1:nrow(data)) {
      if(test[i]) {
        sample <- c(sample, rep(as.numeric(as.character(data[i, "Result"])), n))
      } else {
        samplingguide <- as.numeric(strsplit(gsub(" ", "", data[i, "Result"]), "-")[[1]])
        if(is.na(samplingguide[1]) | is.na(samplingguide[2])) {
          sample <- c(sample, rep(data[i, "Result"], n))
        } else {
          sample <- c(sample, runif(n, samplingguide[1], samplingguide[2]))
        }
      }
    }
    out <- as.data.frame(array(1:(n*nrow(data)*(ncol(data)+1)), dim = c(n*nrow(data), ncol(data) + 1)))
    colnames(out) <- c("Iter", colnames(data))
    for(i in colnames(data)) {
      out[i] <- rep(data[, i], each = n)
    }
    out$Iter <- 1:n
    out$Result <- sample
  }
  return(out)
}
#################################################################################
########### make.ovariable takes a vector, data.frame, list, or ovariable and makes an ovariable out of it.
make.ovariable <- function(
  data,
  formula = function(dependencies){return(0)},
  dependencies = list(x = 0)
) {
  return(movariable(data, formula, dependencies))
}
setGeneric("make.ovariable") # Makes make.ovariable a generic S4 function.
setMethod(
  f = "make.ovariable",
  signature = signature(data = "data.frame"),
  definition = function(
    data,
    formula = function(dependencies){return(0)},
    dependencies = list(x = 0)
  ) {
    cat("Data frame\n")
    return(movariable(data, formula, dependencies))
  }
)
setMethod(
  f = "make.ovariable",
  signature = signature(data = "vector"),
  definition = function(
    data,
    formula = function(dependencies){return(0)},
    dependencies = list(x = 0)
  ) {
    cat("Vector\n")
    data <- data.frame(Result = data)
    return(make.ovariable(data, formula, dependencies))
  }
)
setMethod(
  f = "make.ovariable",
  signature = signature(data = "list"),
  definition = function(
    data,
    formula = function(dependencies){return(0)},
    dependencies = list(x = 0)
  ) {
    for(i in 1:length(data)) {
      cat("List", i, "\n")
      data[[i]] <- make.ovariable(data[[i]], formula, dependencies)
    }
    return(data)
  }
)
setMethod(
  f = "make.ovariable",
  signature = signature(data = "ovariable"),
  definition = function(
    data,
    formula = NULL,
    dependencies = NULL
  ) {
    cat("ovariable\n")
    if(is.null(formula)) {formula <- data@formula}
    if(is.null(dependencies)) {dependencies <- data@dependencies}
    return(movariable(data@data, formula, dependencies))
  }
)
########### movariable takes a data.frame, a function, and a list and makes an ovariable out of them. It is a
##### subfunction of make.ovariable and prevents infinite recursion of S4 methods.
movariable <- function(
  data,
  formula,
  dependencies
) {
  sample <- interpret(data)
  out <- new("ovariable",
    sample = sample,
    data = data,
    marginal = ifelse(colnames(sample) %in% c("Result", "Unit"), FALSE, TRUE),
    formula = formula,
    dependencies = dependencies)
  out <- update(out)
  return(out)
}
</rcode>
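In update above, the sample is built from two sources. Rows interpreted from @data get Source = "Data", rows returned by @formula(@dependencies) get Source = "Formula", and the two are combined with orbind. The sketch below mimics this in plain base R, so it runs without the Opasnet S4 classes. The formulas, the dependencies list and the data value are all hypothetical. The sketch also shows the two kinds of formula mentioned in the attribute table: a deterministic one (y <- k*x + b) and a probabilistic one (y ~ dnorm(x, sd)).

```r
set.seed(1)
n <- 5  # number of iterations; interpret() on this page also expects an n

# A formula is a function of the dependencies list, as in object@formula(object@dependencies).
formula_det <- function(dependencies) {
  dependencies$k * dependencies$x + dependencies$b          # y <- k*x + b
}
formula_prob <- function(dependencies) {
  rnorm(n, mean = dependencies$x, sd = dependencies$sd)     # y ~ dnorm(x, sd)
}

deps <- list(k = 2, x = 3, b = 1, sd = 0.5)  # hypothetical dependencies

# Mimic update(): one block of rows per source, told apart by the Source column.
data_rows <- data.frame(Source = "Data", Iter = 1:n, Result = rep(7, n),
                        stringsAsFactors = FALSE)
formula_rows <- data.frame(Source = "Formula", Iter = 1:n,
                           Result = formula_prob(deps),
                           stringsAsFactors = FALSE)
# Plain rbind works here because both blocks have the same columns;
# orbind() additionally pads columns that exist in only one of them.
combined <- rbind(data_rows, formula_rows)
print(formula_det(deps))
print(combined)
```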
====Example codes====
https://docs.google.com/document/d/1CAURYlKFmx-cBUZVAWFM2e6N7crdZ1KcO62T5nvyDaE/edit
'''Simple code
<rcode
Line 620:
'''Making an ovariable
<rcode include="page:OpasnetBaseUtils|name:generic|page:Object-oriented_programming_in_Opasnet|name:answer" graphics="1">
Line 647:
print(xtable(d@sample[d@sample$Iter < 4, ]), type = 'html')
</rcode>
'''Making several different ovariables.
<rcode
include="
page:OpasnetBaseUtils|name:generic|
page:Object-oriented_programming_in_Opasnet|name:answer
"
variables="name:population|description:What is the size of the population|default:100000"
graphics="1">
############################################################
cat("Initiation successful. Now starting the model.\n")
library(xtable)
make.ovariable("1-2") * 5
make.ovariable(data.frame(Result = "1-2"))
dependencies.Op_en5675 <- list(
  exposure = "Op_en5674", # formula.Op_en5674(dependencies.Op_en5674), # Training exposure
  erf = data.frame(Unit = "RR per ug/m3", Result = 1.5),
  population = population,
  background = 100 / 100000 # cases per 100000 person-years
)
exposure <- op_baseGetData("opasnet_base", dependencies.Op_en5675$exposure)
colnames(exposure)[colnames(exposure) == "obs.1"] <- "Obs"
exposure <- tidy(exposure, direction = "wide")
colnames(exposure)[colnames(exposure) == "result"] <- "Result"
exposure <- make.ovariable(exposure)
print(exposure)
formula.Op_en5675 <- function(x) {
  population <- make.ovariable(x$population)
  background <- make.ovariable(x$background)
  exposure <- op_baseGetData("opasnet_base", x$exposure)
  colnames(exposure)[colnames(exposure) == "obs.1"] <- "Obs"
  exposure <- tidy(exposure, direction = "wide")
  colnames(exposure)[colnames(exposure) == "result"] <- "Result"
  exposure <- make.ovariable(exposure)
  erf <- make.ovariable(x$erf)
  cases <- population * background * exp(exposure * log(erf))
  return(cases)
}
formula.Op_en5675(dependencies.Op_en5675)
make.ovariable(dependencies.Op_en5675)
out <- make.ovariable(
  data = "0 - 100000",
  formula = formula.Op_en5675,
  dependencies = dependencies.Op_en5675)
print(out)
out <- update(out)
print(out)
plot(out)
</rcode>
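The examples above rely on interpret() turning text such as "1-2" or "0 - 100000" into a uniform sample of n iterations, while plain numbers are repeated n times. The function below is a hypothetical, simplified re-implementation of that rule for a single Result vector. It is written to run in plain R without OpasnetBaseUtils. Unlike interpret(), it returns NA for text that is neither a number nor a range, instead of keeping the text.

```r
# Hypothetical simplified version of the range rule in interpret().
interpret_sketch <- function(result, n) {
  draws <- lapply(as.character(result), function(r) {
    num <- suppressWarnings(as.numeric(r))
    if (!is.na(num)) return(rep(num, n))            # a number: constant over iterations
    bounds <- suppressWarnings(
      as.numeric(strsplit(gsub(" ", "", r), "-")[[1]]))
    if (length(bounds) == 2 && !anyNA(bounds)) {
      return(runif(n, bounds[1], bounds[2]))        # "a-b": uniform between a and b
    }
    rep(NA_real_, n)                                # anything else (interpret() keeps the text)
  })
  data.frame(Row = rep(seq_along(result), each = n),
             Iter = rep(seq_len(n), times = length(result)),
             Result = unlist(draws))
}

set.seed(2)
s <- interpret_sketch(c("5", "1-2", "0 - 100000"), n = 100)
summary(s$Result[s$Row == 2])
```

Because "-" doubles as the range separator, a negative bound such as "-1-2" is not parsed as a range by either version.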
Latest revision as of 11:01, 26 August 2013
Moderator: Jouni
This page is a stub. You may improve it into a full page.
Object-oriented programming is an approach where programs (or, in Opasnet, typically assessment models) have a modular structure in such a way that each part is considered as a separate object that has specific properties and interacts with other objects in standard ways.
Home work.2
1. What are S4 class objects
Learning to work with modelling objects in Opasnet
- WATCH recorded shows online
- Objective: To learn the structure and use of S4 class objects oassessment and ovariable.
- How are they created in the wiki?
- How are they used in R?
- Time: 16.5.2012 9.00 - 15.00
- Place: Kielo?, THL, Kuopio
- Language: In Finnish, unless foreigners show up.
- Content (most material can be found on Object-oriented programming in Opasnet):
- Basic ideas of open modelling
- Structure of assessments and variables
- S4 class objects
- How S4 class objects are used to implement the structure and functionalities of assessments and variables
- Structure of oassessment
- Structure of ovariable
- What do the most important functions do (interpret, make.ovariable, init.vars, ...)
- Plans for summer work: which projects to use as playgrounds first
- Further training: How to organise?
- My knowledge is our knowledge
- EDS Airplane
- Syvämiete
- The Greek situation: what would it be if we had had an open assessment about it 10 years ago?
- In the training, there are both very advanced people and beginners. This is meant to force us to focus on the big picture and the ideas, not so much on the details.
- It's cool to make mistakes: you get guidance on exactly the things you need it for, and other people learn what the difficult things are. This is the strength of open assessment.
Overall picture:
- Assessment:
- List of variables, e.g. Training assessment#Rationale
- Decisions, e.g. Climate change policies in Kuopio#Formula
- Operate with dependencies: Training health impact#Formula
How to perform an assessment
1. Get the variable list. Make sure that upstream variables are first on the list.
2. Go through the variables one at a time.
   - Based on the data given, if any, bring in the data from Opasnet Base or make a data.frame out of the vector. Store the resulting data.frame as @data.
   - Store the formula.objectname, if available, as @formula. This comes from including the formula named "formula" from the object page.
   - Dependencies are defined in the same formula. Store the dependencies.objectname, if available, as @dependencies.
   - Interpret the data and run the formulas with a systematic method such as hierarchical Bayes or another method. So far, only this simplistic approach is available:
     - Interpret the data and make a data.frame with column Source and location Data.
     - Run the formula and make a data.frame with column Source and location Formula.
     - Combine the two data.frames with orbind and store the result as @sample.
   - Apply the decision variable to the variable, if relevant.
3. Go back to #2 until all variables have been calculated.
4. Apply the stakeholder table to the assessment. Go through each decision.
   - For each decision, integrate over all other indices to find the best decision option based on the outcome of interest for the stakeholder. Use the probabilities described on page /Probabilities if available; otherwise assume that all locations are equally likely.
   - Calculate the total VOI for that decision.
5. Go back to #4 until all stakeholders and decisions have been handled.
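The procedure above can be sketched in plain R as follows. This is a hypothetical illustration, not Opasnet code, and it makes several assumptions:

* variables are plain functions evaluated in upstream-first order;
* a decision acts as a multiplier on exposure;
* every iteration is treated as equally likely, which is the fallback when no /Probabilities page exists;
* "total VOI" means the expected value of perfect information (EVPI), i.e. the mean over iterations of the best option's utility minus the best expected utility.

All numbers are made up.

```r
# Steps 1-3: hypothetical variables in upstream-first order. Each formula gets
# the list of results computed so far, which plays the role of @dependencies.
variables <- list(
  exposure = function(res) runif(res$n, 5, 15) * res$multiplier,  # decision applied here
  slope    = function(res) rnorm(res$n, mean = 0.02, sd = 0.01),   # hypothetical risk slope
  cases    = function(res) 1000 * res$slope * res$exposure
)

evaluate <- function(variables, multiplier, n = 1000, seed = 42) {
  set.seed(seed)  # identical draws for every option, so iterations are comparable
  res <- list(n = n, multiplier = multiplier)
  for (v in names(variables)) res[[v]] <- variables[[v]](res)
  res
}

# Step 4: hypothetical decision options (exposure multipliers) and their costs,
# expressed in the same units as cases.
decision_options <- c(BAU = 1.0, Control = 0.6)
cost             <- c(BAU = 0,   Control = 40)

# Utility matrix: one row per iteration, one column per option (higher is better).
U <- sapply(names(decision_options), function(d) {
  -(evaluate(variables, decision_options[[d]])$cases + cost[[d]])
})

expected <- colMeans(U)            # every iteration treated as equally likely
best     <- names(which.max(expected))
evpi     <- mean(apply(U, 1, max)) - max(expected)  # assumed meaning of "total VOI"
print(expected)
print(best)
print(evpi)
```

EVPI is never negative. It is zero only when the same option is best in every iteration.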
Question
How should object-oriented programming be utilised in Opasnet in such a way that
- it has seamless connections to R-tools,
- it is easy to understand by non-expert users and contributors,
- it uses the variable structure and other information structures (e.g. universal object) used in open assessment, and
- it enables standards for typical processes in environmental health assessments (such as distribution modelling, life tables, decision optimising, etc.).
Answer
Structure of objects
Objects have two different implementations: a wiki page in Opasnet, and an S4 class object called ovariable (open assessment variable) in R-tools. The wiki page is the user-friendly interface for users, and ovariable is the versatile format for efficient, standardised modelling. The default direction for data is long (using the terminology in the merge function).
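An illustration of what "long" means here. In a wide table, each location of an index gets its own column. In the long format used for ovariable data, each row is one combination of index locations, and the value sits in a single Result column. The conversion below uses base R and made-up population numbers. On Opasnet itself, the code on this page does such conversions with tidy(..., direction = ...), which is not reproduced here.

```r
# Made-up wide table: one column per location of the Sex index.
wide <- data.frame(Age = c("0-14", "15-64"),
                   Male = c(120, 340),
                   Female = c(115, 350),
                   stringsAsFactors = FALSE)

# Long format: index columns Age and Sex, one Result column.
locations <- c("Male", "Female")
long <- data.frame(
  Age = rep(wide$Age, times = length(locations)),
  Sex = rep(locations, each = nrow(wide)),
  Result = unlist(wide[locations], use.names = FALSE),  # column by column
  stringsAsFactors = FALSE
)
print(long)
```

The locations attribute in the table below lists exactly such columns (Male and Female here), in the same role as measure.vars in melt. Base R's reshape(direction = "long") does the same conversion more generally.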
←--#: . Should we have attribute "target" that defines the target of the variable estimate. For example, "height" may estimate the whole variation of heights of individuals in a population, or it may estimate the mean height of the population. Somehow the population, the target that is the basic unit (individual in this case) and the statistical parameter should be explicitly described. Can this be done by using vector attributes that have a value for each index column in the sample? Is this an index-specific issue, or variable-specific? --Jouni 07:50, 9 April 2012 (EEST) (type: truth; paradigms: science: defence)
- ⇤--#: . I don't understand what I am talking about. Maybe this should be removed. --Jouni 17:07, 16 October 2012 (EEST) (type: truth; paradigms: science: attack)
Attribute | What it contains | How implemented in the wiki | How implemented in the R-tools as an S4 class object ovariable |
---|---|---|---|
These attributes are needed in R-tools. | |||
name | The name of the object. | The name of the wiki page. | The Name slot in ovariable; it must be the same as the name of the ovariable object. |
data | Observations, expert judgement, discussions, and other pieces of information. | Subheading under Rationale | Slot data = "data.frame". The data frame can contain Obs and Unit columns, at least one index column, and Result as the observation column. However, it must not contain Iter column. |
output | The output of the calculations. | Not shown | Slot output = "data.frame". The data frame may contain columns Iter, Obs, Unit, one column for each index, and NameResult (where Name is the name of the object). |
marginal | A Boolean vector with the size = number of indices = ncol(data) - 1. TRUE if an index is indexing a marginal distribution in sample, FALSE if joint distribution. The difference is that in a marginal distribution there are n iterations for each location of the index, while in a joint distribution there are altogether n iterations in such a way that the frequencies of locations match their probabilities. | Not implemented in wiki. | Slot marginal = "vector". Especially for indices with many locations, a joint distribution needs much less memory. |
dependencies | A list of dependencies, i.e. objects that are causally upstream. | List of links to other pages under subheading Dependencies. | Slot dependencies = "data.frame". It contains a column Name with object names and a column Key with keys for model runs. |
formula | A computer code or algorithm to derive the answer from rationale and objects listed in dependencies. The formula may assume a deterministic dependency (e.g. y <- k*x + b), a conditional probability structure (y ~ dnorm(x, sd)), or a rank correlation matrix. | Subheading under Rationale, often using <rcode> tags. | Slot formula = "function". Formula contains the R code that is needed for calculating the output. |
These attributes are not (yet) implemented in R-tools. ⇤--#: . The text below is outdated. Don't believe all details. --Jouni 17:07, 16 October 2012 (EEST) (type: truth; paradigms: science: attack) | |||
question | A research question that defines the topic of the object | First main heading | Slot question = "character". Contains the question as text. |
answer | The current best answer to the question, shown as text, data table, or distribution. | Second main heading; contains a single data table. NOTE! The data table is actually under Rationale/Data, but it is often the same as the answer. The actual answer is precisely described by distribution and sample (see below). | Only sub-attributes are implemented. |
locations | List of names for observation columns in the wide format (columns that are not indices), i.e. the same as measure.vars parameter in the function melt. | The same as locations parameter in t2b tag. | Slot locations = "vector". A vector with all observation column names. Can be integer (observation column position) or string (observation column name). By default, the column name for locations is "Parameter" and the column name for the actual observations is "Result". ⇤--#: . Do we actually need this in R, if always a variable is molten using melt before actual use? --Jouni 14:54, 4 April 2012 (EEST) (type: truth; paradigms: science: attack) |
observation | An identifier of an individual when the answer consists of a group of individuals. | Obs column (usually implicit because the default is that each row is an observation) in the data table. | Obs column in data.frames data and sample. Not explicitly needed as a slot in S4 object. ----#: . How do we operate, if there are different Obs in different variables and they are merged? a) Repeat shorter Obs's until they reach the length of the longest. b) Refuse to operate unless the user renames or removes all but one Obs column; however if Obs only has 1 observation, it can be temporarily removed during merge. --Jouni 14:54, 4 April 2012 (EEST) (type: truth; paradigms: science: comment) ←--#: . I prefer b). a) is too abstract so that the user is just confused. --Jouni 14:54, 4 April 2012 (EEST) (type: truth; paradigms: science: defence) |
iteration | An identifier of a probabilistic run or iteration. Sometimes it is also called a possible world or realisation. | Iter column in the data table (data usually not shown probabilistically in wiki). | Iter column in the data.frames sample (and rarely in data). Not explicitly needed as a slot in S4 object. |
distribution | A joint probability distribution (with indices as dimensions) describing the answer mathematically. | Not shown | Slot distribution = "distribution?". A distribution created with e.g. dnorm(0,1). ----#: . We don't know yet how to actually implement this and how the indices are included. --Jouni 14:54, 4 April 2012 (EEST) (type: truth; paradigms: science: comment) |
rationale | Any information that is needed to convince a critical reader that the answer is good. | Third main heading | Only sub-attributes are implemented. |
unit | The measurement unit(s) used in the answer to measure the topic. Units are written in the format kg m^2 /s^2, where a space denotes multiplication. | Subheading under Rationale with plain text. Also mentioned in the data table with parameter unit. If data table rows differ in units, there must be a Unit index. | There is no separate slot for unit. The unit is merged with data and subsequently with sample. This is necessary because if the rows are reordered, units stored separately could no longer be attached to the right rows. |
formula.prob | A list of probabilities assigned to the competing algorithms in formula. The default is that each has an equal probability. | A detail in <rcode> code. | Slot formula.prob = "vector". Should have the same size as formula. |
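The R-side attributes in the table above can be sketched as an S4 class. This is a minimal, hypothetical definition (the class name ovariableSketch is made up for illustration) that uses only the slot names and types listed in the table; the actual ovariable class in R-tools may be defined differently. The validity function encodes two rules stated on this page: the data slot needs a Result column and must not have an Iter column.

```r
library(methods)

# Hypothetical sketch of the R-side attributes in the table above;
# slot names and types follow the table, but this is not the R-tools class.
setClass(
  "ovariableSketch",
  slots = c(
    name         = "character",   # must equal the name of the R object
    data         = "data.frame",  # Obs, Unit, index columns and Result; no Iter
    output       = "data.frame",  # Iter, Obs, Unit, indices and <name>Result
    marginal     = "logical",     # one flag per index: TRUE marginal, FALSE joint
    dependencies = "data.frame",  # columns Name and Key
    formula      = "function"     # code that calculates the output
  ),
  prototype = list(
    data         = data.frame(Result = numeric(0)),
    output       = data.frame(),
    dependencies = data.frame(Name = character(0), Key = character(0)),
    formula      = function(...) NULL
  ),
  validity = function(object) {
    if (!"Result" %in% names(object@data)) return("data must have a Result column")
    if ("Iter" %in% names(object@data)) return("data must not have an Iter column")
    TRUE
  }
)

x <- new("ovariableSketch", name = "exposure",
         data = data.frame(Area = c("N", "S"), Result = c(1.5, 2.5)))
```

Because new() runs the validity function whenever slots are supplied, an object whose data contains an Iter column is rejected at creation time.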
Methods
R code should be developed so that critical functions have object-specific implementations. The user should see straightforward content, and all messy indexing etc. should happen behind the scenes.
These methods should be implemented for ovariable objects.
- show, print: show the data slot.
- plot: plot the sample, showing one (the first by default) marginal index with all locations and all other marginal indices with the first location only.
- tidy: applies to data: remove the id column; add Obs and Iter columns if they do not exist; change the direction from wide to long.
- createSample: create sample directly from data using interp.input.
- GetSample, GetData: extract sample and data from the object, respectively.
- Ops: applies to sample: merge two ovariables based on index columns, then apply the Ops operation to the result columns.
- standardUnits: based on units, transform the result column of data to SI units using the Unit transformations table; then update unit.
- demarginalize: turn one specified index from marginal to joint format. This function has parameter jointlimit: if length(index_i) * n > jointlimit, then index_i is demarginalized. The default for jointlimit is 1000000. n is the number of iterations (the length of Iter).
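The tidy step (remove the id column, add Obs, change from wide to long) can be sketched in base R as follows. The function tidy_sketch and its arguments are hypothetical; the page describes the wide-to-long step in terms of melt and its measure.vars parameter, while this version avoids that dependency. Because the data slot must not have an Iter column, the sketch adds only Obs, numbering the original rows so that each row is treated as a separate individual. Locations go into a Parameter column, following the default column names given for locations.

```r
# Hypothetical helper, not the R-tools tidy() method.
# wide:      data table in wide format
# locations: names of the observation columns (like measure.vars in melt)
# id_col:    name of an id column to remove, if present
tidy_sketch <- function(wide, locations, id_col = "id") {
  wide <- wide[, setdiff(names(wide), id_col), drop = FALSE]
  # If Obs is omitted, each row is treated as a separate individual.
  if (!"Obs" %in% names(wide)) wide$Obs <- seq_len(nrow(wide))
  index_cols <- setdiff(names(wide), locations)
  # Repeat the index columns once per location column ...
  long <- wide[rep(seq_len(nrow(wide)), times = length(locations)),
               index_cols, drop = FALSE]
  # ... and stack the location columns one after another.
  long$Parameter <- rep(locations, each = nrow(wide))
  long$Result <- unlist(wide[locations], use.names = FALSE)
  rownames(long) <- NULL
  long
}

wide <- data.frame(id = 1:2, Area = c("N", "S"), Men = c(10, 20), Women = c(11, 21))
long <- tidy_sketch(wide, c("Men", "Women"))  # columns Area, Obs, Parameter, Result
```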
Ovariable functions available: make, update, plot, print, orbind, merge, callGeneric Ops
----#: . Function merge only uses indices that are of type "marginal" and not indices that are of type "joint". For a non-ovariable data.frame, by default all indices are marginals. --Jouni 14:58, 30 April 2012 (EEST) (type: truth; paradigms: science: comment)
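The Ops behaviour (merge on shared index columns, then operate on the Result columns) can be sketched with the Ops group generic and callGeneric from the methods package. The class ovarSketch, which has only a name and an output table, is a hypothetical simplification. Unlike the merge described in the comment above, it treats every shared non-Result column as an index and ignores the marginal/joint distinction.

```r
library(methods)

# Hypothetical, stripped-down class: a name and an output table with
# index columns plus a Result column.
setClass("ovarSketch", slots = c(name = "character", output = "data.frame"))

# One method covers the whole Ops group (+, -, *, /, ^, ==, <, &, ...).
setMethod("Ops", signature(e1 = "ovarSketch", e2 = "ovarSketch"),
  function(e1, e2) {
    # Merge on the index columns both outputs share; with no shared
    # index, merge() returns the Cartesian product.
    idx <- intersect(setdiff(names(e1@output), "Result"),
                     setdiff(names(e2@output), "Result"))
    m <- merge(e1@output, e2@output, by = idx)  # creates Result.x, Result.y
    # callGeneric() re-dispatches to the operator actually used (e.g. "*").
    m$Result <- callGeneric(m$Result.x, m$Result.y)
    m$Result.x <- NULL
    m$Result.y <- NULL
    new("ovarSketch", name = paste(e1@name, e2@name, sep = "_"), output = m)
  })

a <- new("ovarSketch", name = "exposure",
         output = data.frame(Area = c("N", "S"), Result = c(2, 3)))
b <- new("ovarSketch", name = "erf",
         output = data.frame(Area = c("S", "N"), Result = c(100, 10)))
r <- a * b  # rows matched by Area before multiplying
```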
Formula
Example codes
https://docs.google.com/document/d/1CAURYlKFmx-cBUZVAWFM2e6N7crdZ1KcO62T5nvyDaE/edit
Simple code
Making an ovariable
Making several different ovariables.
Important rules:
- Never use ".x" in the name of an ovariable.
- The observation column always must have name "Result".
- The iteration (aka. sample, run) column always must have name "Iter".
- In the data slot, there is always a column "Obs". It identifies observations that come from the same individual. If omitted, each row is assumed to come from a different individual.
- The data slot must not have an Iter column; the sample may or may not have one, and typically does.
- Variable n is reserved for the number of iterations.
- Column name Source is reserved for the source of the result (either Data or Formula).
- After using make.ovariable(), use update() to update the sample of the ovariable based on both Data and Formula. This is not done automatically, because the formula part of the code often causes problems when run.
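Some of these rules can be checked mechanically. The function check_ovariable_rules below is a hypothetical helper, not part of R-tools; given an ovariable name and a sample (or data) table, it returns a character vector of rule violations, empty when none are found. It covers the ".x" rule, the Result column name, and the allowed values of Source.

```r
# Hypothetical helper (not part of R-tools) that checks a few of the rules above.
# name: name of the ovariable
# tab:  data frame of the sample (or data) slot
check_ovariable_rules <- function(name, tab) {
  problems <- character(0)
  if (grepl(".x", name, fixed = TRUE))
    problems <- c(problems, "name must not contain '.x'")
  if (!"Result" %in% names(tab))
    problems <- c(problems, "the observation column must be named 'Result'")
  if ("Source" %in% names(tab) &&
      !all(as.character(tab$Source) %in% c("Data", "Formula")))
    problems <- c(problems, "Source may only contain 'Data' or 'Formula'")
  problems
}

# Two problems: the '.x' in the name and the missing Result column.
check_ovariable_rules("dose.x", data.frame(Iter = 1, Value = 3))
```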
About oassessment (open assessments)
- They are collections of information used at the assessment level.
- They contain the following parts:
- A data frame about the variables included. It has columns name (wiki page name), identifier (page identifier such as Op_fi2898), and alias (a name that is short but descriptive in its context and that is used in rcode about the variable, such as exposure, erf.radon etc.). Also, R objects saved with object.put should be downloadable using this data. Should we also have hashtags available? How?
- a list of ovariables (the variables themselves, not just their names)
- If a variable is in the dependencies of a variable used in an assessment, the first variable is only partially included. In practice, its sample is used if available, otherwise its data. However, its formula and dependencies are NOT used because they are outside the boundaries of the assessment. Otherwise the assessment could grow into an endless web.
- There are several ways to treat dependencies.
- Upstream functions: deterministic functions that take upstream variables as inputs and calculate outputs. This is implemented first.
- JAGS or other hierarchical Bayesian approaches. What does this require from oassessment? Should there be "model" or other slots for this?
- BBN. What does this require from oassessment? At least conditional probabilities.
- We need a wrapper to op_baseGetData so that the code may give a wiki page name, an identifier, or an alias, and the function always knows what data to download. This makes the rcode more user-friendly. But does it always need an oassessment, or can it get the identifier for a wiki page name directly from the wiki database? The latter would be more useful.
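The wrapper idea in the last item can be sketched as a lookup in the variable table of an oassessment. The table contents (page names and Op_xx identifiers) are placeholders, and resolve_identifier is a hypothetical helper; it only returns the identifier and does not call op_baseGetData, whose interface is not described here.

```r
# Placeholder variable table of an oassessment (names and identifiers made up).
variables <- data.frame(
  name       = c("Example exposure page", "Example exposure-response page"),
  identifier = c("Op_xx0001", "Op_xx0002"),
  alias      = c("exposure", "erf.radon"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: accept a wiki page name, an identifier, or an alias
# and return the single matching page identifier.
resolve_identifier <- function(key, variables) {
  hit <- variables$name == key | variables$identifier == key |
    variables$alias == key
  if (sum(hit) != 1) stop("'", key, "' matches ", sum(hit), " variables")
  variables$identifier[hit]
}

resolve_identifier("erf.radon", variables)  # returns "Op_xx0002"
```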
There should be a function that automatically drops dimensions of an ovariable. This is the priority list:
- First drop dimensions that are explicitly mentioned in parameter "drop" (a vector of index names).
- Drop dimensions that cause the size of the ovariable to grow larger than dropsize = 1000000.
- If there is enough information available to perform a VOI analysis, drop all dimensions except those VOIsize = 3 indices that have the largest VOI. (But VOI is attached to ovariables, not indices; how does this work out? It works out in the way that P(B) is thought as a variable and its VOI can be estimated. The VOI of P(A) or P(A|B) is calculated by giving the original ovariable as parameter.)
- Never drop indices that are mentioned in parameter keep (a vector of index names).
- "Drop" function changes an index from marginal to joint. In practice, the size changes from size(index)*size(Iter)*size(other indices combined) to size(Iter)*size(other indices combined). The variable can also be thought as a conditional probability P(A|B) which is then changed to joint by sampling from index B using its probability: P(A|B)*P(B) = P(A).
- If P(B) is omitted, all instances of B are assumed to be equally likely. Typically, P(B) is assessment-specific, and therefore there is a slot for all P(B)s in oassessment. This slot is a data.frame with columns Variable, Index, Location, P, Description.
- However, even if an index is changed from marginal to joint, the column that contains information about the location of the index on each iteration is still kept.
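The drop step can be sketched for a sample in long format with an Iter column. In this hypothetical helper, drop_index keeps, for each iteration, the rows of one location of the index, drawn with probabilities P(B) (equal probabilities if none are given). This follows P(A|B)*P(B) = P(A): the size shrinks by a factor of size(index), and the index column is kept so that each row still records its location. Choosing which indices to drop (the drop, dropsize, VOI and keep rules) is not included.

```r
# Hypothetical sketch of "drop": change `index` from marginal to joint.
# df has columns Iter, the index, possibly other indices, and Result, with
# every location of the index present for every iteration (marginal form).
# p: optional named vector of probabilities P(B), named by location.
drop_index <- function(df, index, p = NULL) {
  locs <- unique(df[[index]])
  # Equal weights if P(B) is omitted; sample.int normalises the weights.
  prob <- if (is.null(p)) rep(1, length(locs)) else p[as.character(locs)]
  iters <- unique(df$Iter)
  # Draw one location per iteration.
  pick <- data.frame(Iter = iters)
  pick[[index]] <- locs[sample.int(length(locs), length(iters),
                                   replace = TRUE, prob = prob)]
  # Keep only the rows whose (Iter, location) pair was drawn;
  # the index column itself stays in the result.
  merge(df, pick, by = c("Iter", index))
}

set.seed(1)
s <- expand.grid(Iter = 1:1000, B = c("b1", "b2"), stringsAsFactors = FALSE)
s$Result <- ifelse(s$B == "b1", 1, 10)
joint <- drop_index(s, "B", p = c(b1 = 0.9, b2 = 0.1))
nrow(joint)  # 1000 rows instead of 2000
```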
Rationale
See also
Help pages | Wiki editing • How to edit wikipages • Quick reference for wiki editing • Drawing graphs • Opasnet policies • Watching pages • Writing formulae • Word to Wiki • Wiki editing Advanced skills |
Training assessment (examples of different objects) | Training assessment • Training exposure • Training health impact • Training costs • Climate change policies and health in Kuopio • Climate change policies in Kuopio |
Methods and concepts | Assessment • Variable • Method • Question • Answer • Rationale • Attribute • Decision • Result • Object-oriented programming in Opasnet • Universal object • Study • Formula • OpasnetBaseUtils • Open assessment • PSSP |
Terms with changed use | Scope • Definition • Result • Tool |
References
Related files