Ovariable
Revision as of 13:07, 30 June 2014
- For updates about modelling instructions, see Portal:Modelling with Opasnet and Modelling in Opasnet.
An ovariable is an object in the R software. It is the basic building block of an open assessment.
Question
What is the structure of an ovariable?
Answer
The ovariable is a class (S4 object) defined by OpasnetUtils. It has eight separate slots that can be accessed using X@slot:
- @name
- The name of <self> is required because R does not support self-reference.
- @output
- Current definition of <self>.
- A single data.frame (a 2D table type in R)
- Not defined until <self> is evaluated.
- @data
- A single data.frame that defines <self> as such.
- May include textual regular expressions that describe probability distributions which can be interpreted by OpasnetUtils/Interpret.
- @marginal
- A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row specific descriptions) of @output.
- @formula
- A function that defines <self>.
- Should return either a data.frame or an ovariable.
- @dependencies
- A data.frame that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents).
- A way of enabling references in R (for ovariables, at least) by virtue of OpasnetUtils/ComputeDependencies, which creates variables in .GlobalEnv so that they are available to expressions in @formula.
- Variables are fetched and evaluated (only once by default) upon <self> evaluation.
- @ddata
- A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
- This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.
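The slot list above can be mirrored in a short Python sketch that shows how the slots interact during evaluation. This is not OpasnetUtils code: the `Ovariable` dataclass, the `evaluate` function, the `registry` dictionary (standing in for `.GlobalEnv`) and the `fetch` callback (standing in for the Opasnet database download) are all hypothetical. It also simplifies by letting @formula, when present, take precedence over @data, and it leaves @marginal unused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]  # one table row: column name -> value


@dataclass
class Ovariable:
    """Hypothetical Python analogue of the ovariable slots (not OpasnetUtils)."""
    name: str                                              # @name
    output: List[Row] = field(default_factory=list)        # @output, empty until evaluated
    data: List[Row] = field(default_factory=list)          # @data
    marginal: List[bool] = field(default_factory=list)     # @marginal (not used here)
    formula: Optional[Callable[[Dict[str, "Ovariable"]], List[Row]]] = None  # @formula
    dependencies: List[str] = field(default_factory=list)  # @dependencies (parent names)
    ddata: str = ""                                        # @ddata, e.g. "Op_en1000"


def evaluate(var: Ovariable, registry: Dict[str, Ovariable],
             fetch: Callable[[str], List[Row]]) -> Ovariable:
    """Evaluate var and, recursively, its parents; registry stands in for .GlobalEnv."""
    if var.output:                  # already evaluated: reuse the existing output
        return var
    if not var.data and var.ddata:  # download data only if the data slot is empty
        var.data = fetch(var.ddata)
    parents = {dep: evaluate(registry[dep], registry, fetch) for dep in var.dependencies}
    # Simplification: a formula, if present, takes precedence over data.
    var.output = var.formula(parents) if var.formula else list(var.data)
    registry[var.name] = var
    return var


# Usage: b = 3 * a, where a comes from a fake database entry.
db = {"Op_en1000": [{"Result": 2.0}]}
a = Ovariable(name="a", ddata="Op_en1000")
b = Ovariable(name="b", dependencies=["a"],
              formula=lambda p: [{"Result": 3 * r["Result"]} for r in p["a"].output])
registry = {"a": a, "b": b}
evaluate(b, registry, fetch=lambda ident: db[ident])
print(b.output)  # [{'Result': 6.0}]
```

Because `evaluate` returns early when `output` is already filled, a parent shared by several variables is computed only once, which mirrors the "only once by default" behaviour described for @dependencies.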
The general idea of ovariables is that they should not be modified to match a specific model but should rather define the variable in question as extensively as possible within its scope. To match the scope of a specific model, variables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example, decisions in decision analysis can be supplied this way:
- pick an endpoint
- make decision variables for any upstream variables
- evaluate endpoint
- optimize between options defined in decisions.
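The four decision steps can be illustrated with a minimal Python sketch. All names and numbers are hypothetical: an upstream variable `exposure`, two options `BAU` and `Reduce`, and an endpoint defined as a total cost. The decision adds an `Option` index to the upstream variable, so evaluating the endpoint yields one result per option, and the optimization simply picks the cheapest option.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def apply_decision(rows: List[Row], options: Dict[str, Callable[[float], float]]) -> List[Row]:
    """Step 2: turn an upstream variable into a decision variable with an Option index."""
    out = []
    for option, effect in options.items():
        for row in rows:
            out.append({**row, "Option": option, "Result": effect(row["Result"])})
    return out


def endpoint(rows: List[Row]) -> Dict[str, float]:
    """Step 1: the chosen endpoint, total cost = health cost of exposure + option cost."""
    option_cost = {"BAU": 0.0, "Reduce": 50.0}  # hypothetical implementation costs
    totals: Dict[str, float] = {}
    for row in rows:
        option = row["Option"]
        totals[option] = totals.get(option, option_cost[option]) + 10.0 * row["Result"]
    return totals


exposure = [{"Area": "Downtown", "Result": 10.0}, {"Area": "Suburb", "Result": 4.0}]
options = {"BAU": lambda x: x, "Reduce": lambda x: 0.5 * x}

decided = apply_decision(exposure, options)  # step 2: decision on an upstream variable
totals = endpoint(decided)                   # step 3: evaluate the endpoint per option
best = min(totals, key=totals.get)           # step 4: optimize (lowest total cost)
print(totals, best)  # {'BAU': 140.0, 'Reduce': 120.0} Reduce
```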
Other orders include collapsing marginal columns by sums, means, or sampling to reduce data size, and passing input from the model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist, to avoid unnecessary computation).
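Collapsing a marginal column can be sketched as dropping that index and aggregating Result over the rows that agree on all remaining columns. The Python function `collapse` below is a hypothetical illustration, not the OpasnetUtils implementation; it covers sums and means but not collapsing by sampling, and the example table with `Sex` and `Iter` indices is invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, object]


def collapse(rows: List[Row], column: str,
             fun: Callable[[List[float]], float] = mean) -> List[Row]:
    """Drop one marginal index column and aggregate Result over it with fun."""
    groups = defaultdict(list)
    for row in rows:
        # Group by every column except the collapsed index and the Result itself.
        key = tuple((k, v) for k, v in row.items() if k not in (column, "Result"))
        groups[key].append(row["Result"])
    return [dict(key, Result=fun(values)) for key, values in groups.items()]


samples = [
    {"Sex": "Male", "Iter": 1, "Result": 2.0},
    {"Sex": "Male", "Iter": 2, "Result": 4.0},
    {"Sex": "Female", "Iter": 1, "Result": 1.0},
    {"Sex": "Female", "Iter": 2, "Result": 3.0},
]
print(collapse(samples, "Iter"))          # [{'Sex': 'Male', 'Result': 3.0}, {'Sex': 'Female', 'Result': 2.0}]
print(collapse(samples, "Sex", fun=sum))  # [{'Iter': 1, 'Result': 3.0}, {'Iter': 2, 'Result': 7.0}]
```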
Basic ideas and an example
The objectives of ovariable modelling are to
- Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
- Enable straightforward equations with multidimensional objects in the same way as Analytica™ "Intelligent arrays".
- Enable uncertainty propagation using standard Monte Carlo.
- Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
- Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
- Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled 1000 times by default.
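In R, such text is interpreted by OpasnetUtils/Interpret, whose full syntax is not reproduced here. The following Python sketch is a simplified analogue that understands only plain numbers and "min - max" ranges; the regular expression, the `interpret` function, and the `Iter` column that numbers the samples are assumptions made for illustration.

```python
import random
import re
from typing import List, Optional

# "min - max" with optional minus signs and decimals, e.g. "5 - 7" or "0.5-1.5".
RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")


def interpret(text: str, n: int = 1000, rng: Optional[random.Random] = None) -> List[float]:
    """Return n samples: uniform(min, max) for a range, n copies of a plain number otherwise."""
    rng = rng or random.Random()
    match = RANGE.match(text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return [rng.uniform(low, high) for _ in range(n)]
    return [float(text)] * n


samples = interpret("5 - 7", rng=random.Random(1))
# One row per Monte Carlo iteration, numbered by a hypothetical Iter index.
rows = [{"Iter": i + 1, "Result": x} for i, x in enumerate(samples)]
print(len(rows), min(samples) >= 5, max(samples) <= 7)  # 1000 True True
```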
Alex and Berta are choosing a restaurant. There is one restaurant that is very good if the main chef is working, but only mediocre if the other cook is working.
Structure, slots
- output: the main content of an ovariable
- Result is the column that contains the actual values of the answer to the question.
- Indices are columns that restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column Sex, which contains values Male and Female. So, the Result contains one row for males and one for females.
- Unit contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another.
- Other columns can exist. Typically, they contain information that was used for some purpose during the evolution of the ovariable but may be useless in the current ovariable. Because of these other columns, the output may sometimes be a very wide data.frame.
- data slot answers this question: What measurements are there about the topic? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
- dependencies and formula are slots that are always used together. They answer this question: How can we estimate the topic indirectly? This applies when we know how the result of this variable depends on the results of other variables (called parents). The dependencies slot is a table of parent variables and their identifiers, and formula is a function that takes the results of those parents, applies the defined code to them, and thereby produces a result for this variable.
- name slot is just a technical way to handle several objects systematically, so that proper references can be used to any ovariable needed.
- ddata slot is used for data that users can update dynamically. The data slot is evaluated when the ovariable is created, and this is typically long before a case-specific model using this ovariable is run. Therefore, if data is updated after the ovariable is created, this will not be reflected in the result, and it may be confusing to the reader. Instead, if ddata is used, the data is downloaded only when the ovariable is used in a model during a case-specific model run. In this way, the data reflects the most recent version available.
- description (does not exist yet): A way to explain the logic of the ovariable to the user. We are considering adding this slot so that the modeller can store textual descriptions of the ovariable within the ovariable for later users to read. It would not have any modelling functionality.
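The practical difference between data and ddata is timing, which a small Python sketch can show. The in-memory `DATABASE` dictionary stands in for the Opasnet database, and the `EagerVar` and `LazyVar` classes are hypothetical: one copies the table when it is created, like data, and the other stores only the identifier and fetches the table at evaluation, like ddata.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical stand-in for the Opasnet database: identifier -> table rows.
DATABASE: Dict[str, List[Row]] = {"Op_en1000": [{"Result": 1.0}]}


class EagerVar:
    """Like the data slot: the table is copied when the object is created."""
    def __init__(self, ident: str):
        self.data = list(DATABASE[ident])

    def evaluate(self) -> List[Row]:
        return self.data


class LazyVar:
    """Like the ddata slot: only the identifier is stored; data is fetched at evaluation."""
    def __init__(self, ident: str):
        self.ddata = ident

    def evaluate(self) -> List[Row]:
        return list(DATABASE[self.ddata])


eager, lazy = EagerVar("Op_en1000"), LazyVar("Op_en1000")
DATABASE["Op_en1000"] = [{"Result": 2.0}]  # a user updates the data afterwards
print(eager.evaluate(), lazy.evaluate())   # [{'Result': 1.0}] [{'Result': 2.0}]
```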
Structure, output
The output is the essence of an ovariable. Technically, it is a data.frame, meaning that it is a two-dimensional table with columns expressing different properties of the output and rows being observations of those properties. There are different kinds of columns, and it is crucial to understand the functionalities of each kind.
Result is the most important column. Each ovariable answers a question that is stated on the Opasnet page of that variable, and the actual answer is given in the Result column. For example, the question could be "What is the annual average concentration of PM2.5 in Kuopio downtown?" and the Result column could contain 9.4, meaning 9.4 µg/m³ (the unit can be found in the column called Unit).
It is useful to clarify terms here. Answer is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. The answer may contain text, tables, or graphs on the web page, R code that produces an ovariable when run, or objects stored on the server or in a database. The actual format depends on the intended use of the page. Output is the part of an ovariable that is used as a key part of the answer. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, Result contains the actual numerical values for the answer. (The Result may also be text in some cases.)
Operations with ovariables and their "intelligent array" properties
- Merge
- Ops
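As an illustration of the "intelligent array" property, an arithmetic operation between two outputs can be sketched as matching rows on the index columns the two tables share, crossing indices that are not shared, and applying the operator to the two Result values. The Python function `ops` is hypothetical; it ignores units, and it is only assumed to behave roughly like R's `merge()` on common columns followed by arithmetic on Result. The example tables are invented.

```python
import operator
from typing import Callable, Dict, List

Row = Dict[str, object]
NON_INDEX = ("Result", "Unit")


def ops(x: List[Row], y: List[Row], op: Callable = operator.mul) -> List[Row]:
    """Join rows on shared index columns (inner join, Cartesian over the rest), then op Results."""
    out = []
    for a in x:
        for b in y:
            shared = (set(a) & set(b)) - set(NON_INDEX)
            if all(a[k] == b[k] for k in shared):
                row = {k: v for k, v in a.items() if k not in NON_INDEX}
                row.update({k: v for k, v in b.items() if k not in NON_INDEX})
                row["Result"] = op(a["Result"], b["Result"])
                out.append(row)
    return out


conc = [{"Area": "A", "Result": 10.0}, {"Area": "B", "Result": 5.0}]
pop = [{"Area": "A", "Sex": "Male", "Result": 100},
       {"Area": "A", "Sex": "Female", "Result": 120},
       {"Area": "B", "Sex": "Male", "Result": 50}]
result = ops(conc, pop)
print(result)
# [{'Area': 'A', 'Sex': 'Male', 'Result': 1000.0}, {'Area': 'A', 'Sex': 'Female', 'Result': 1200.0},
#  {'Area': 'B', 'Sex': 'Male', 'Result': 250.0}]
```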
Uncertainties and Iter
Dependencies and recursive fetching of parents
Odecisions and applying decisions
Interpreting results
Value of information
Calculating VOI on a routine basis for all marginal indices.
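A common way to compute value of information from Monte Carlo output is the expected value of perfect information, EVPI = E[max_d U(d)] − max_d E[U(d)]: the mean over iterations of the best utility in each iteration, minus the best mean utility across decision options. The Python sketch below computes total EVPI from per-iteration utilities; the `evpi` function and the example utilities are hypothetical, and computing VOI separately for each marginal index would need additional steps that are not shown here.

```python
from statistics import mean
from typing import Dict, List


def evpi(utility: Dict[str, List[float]]) -> float:
    """Expected value of perfect information from Monte Carlo samples.

    utility[option][i] is the utility of a decision option in iteration i;
    every option must be evaluated over the same iterations.
    """
    options = list(utility)
    n = len(utility[options[0]])
    # With perfect information, the best option is chosen separately in each iteration.
    with_info = mean(max(utility[o][i] for o in options) for i in range(n))
    # Without it, a single option is chosen based on expected utility.
    without_info = max(mean(utility[o]) for o in options)
    return with_info - without_info


u = {"A": [10.0, 0.0, 6.0, 4.0], "B": [5.0, 5.0, 5.0, 5.0]}
print(evpi(u))  # 1.5
```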