Ovariable: Difference between revisions

From Opasnet
m (→‎Answer: wording)
(restructured, edited and updated)
Line 6: Line 6:
'''Ovariable''' is an object in [[R]] software. It is the basic building block of an open assessment.
'''Ovariable''' is an object in [[R]] software. It is the basic building block of an open assessment.


==Question==
== Question ==


What is the structure of an ovariable?
What is the structure of an ovariable such that
* it complies with the requirements of [[variable]] and
* it is able to implement probabilistic descriptions of multidimensional variables and
* it is able to implement different [[scenario]]s?


==Answer==
== Answer ==
The ovariable is a class (S4 object) defined by OpasnetUtils. It has eight separate ''slots'' that can be accessed using X@slot:
*@name
**Name of <self> is a requirement since R doesn't support self reference.
*@output
**Current definition of <self>.
**A single ''data.frame'' (a 2D table type in R)
**Not defined until <self> is evaluated.
*@data
**A single ''data.frame'' that defines <self> as such.
**May include textual regular expressions that describe probability distributions which can be interpreted by [[OpasnetUtils/Interpret]].
*@marginal
**A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row specific descriptions) of @output.
*@formula
**A function that defines <self>.
**Should return either a ''data.frame'' or an ''ovariable''.
*@dependencies
**A ''data.frame'' that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents).
**A way of enabling references in R (for ''ovariables'' at least) by virtue of [[OpasnetUtils/ComputeDependencies]] which creates variables in ''.GlobalEnv'' so that they are available to expressions in @formula.
**Variables are fetched and evaluated (only once by default) upon <self> evaluation.
*@ddata
**A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
**This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.


The general idea of ''ovariables'' is such that they should not be modified to match a specific model but rather define the variable in question as extensively as possible under its scope. To match the scope of specific models variables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example, decisions in decision analysis can be supplied this way:
The ovariable is a class (S4 object) defined by OpasnetUtils in the [[R]] software system. Its purpose is to contain the current best answer, in a machine-readable format (including uncertainties when relevant), to the question asked by the respective variable. In addition, it contains information about how to derive the current best answer. The respective variable may have its own page in Opasnet, or it may be implicit so that it is only represented by the ovariable and descriptive comments within the code.
#pick an endpoint
#make decision variables for any upstream variables
#evaluate endpoint
#optimize between options defined in decisions.
Other orders include: collapse of marginal columns by sums, means or sampling to reduce data size and passing input from model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist to avoid unnecessary computation).


==Basic ideas and an example==
It is useful to clarify terms here. ''Answer'' is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. This is why it is typically located near the top of an Opasnet page. The answer may contain text, tables, or graphs on the web page. It typically also contains an R code with a respective ovariable, and the code produces these representations of the answer when run. (However, the ovariable is typically defined and stored under Rationale/Calculations, and the code under Answer only evaluates and plots the result.) ''Output'' is the key part (or slot) of the answer within an ovariable. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, ''Result'' is the key column of the Output table (or data.frame) and contains the actual numerical values for the answer.


The objectives of ovariable modelling are to
=== Slots ===
* Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
* Enable straightforward equations with multidimensional objects in the same way as in Analytica<sup>TM</sup> "Intelligent arrays".
* Enable uncertainty propagation using standard Monte Carlo.
* Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
* Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
* Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with  possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled by default 1000 times.


Alex and Berta  are  choosing a restaurant. There is one restaurant which is very good, if the main chef is working, but only mediocre if the other cook is working.
The ovariable has seven separate ''slots'' that can be accessed using X@slot:


==Structure, slots==
;@name
*Name of <self> (the ovariable object) is a requirement since R doesn't support self reference.


* output: the main content of an ovariable
;@output
** ''Result'' is the column that contains the actual values for the answer for the question.
* The current best answer to the question asked.
** ''Indices'' are columns that restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column ''Sex'', which contains values ''Male'' and ''Female''. So, the Result contains one row for males and one for females.
* A single ''data.frame'' (a 2D table type in R)
** ''Unit'' contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another.
* Not defined until <self> is evaluated.
** Other columns can exist. Typically, they hold information that was used for some purpose during the evolution of the ovariable, but they may be useless in the current ovariable. Due to these other columns, the output may sometimes be a very wide data.frame.
* Possible types of columns:
* ''data'' slot answers this question: What measurements are there about the topic? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
** ''Result'' is the column that contains the actual values of the answer to the question of the variable. There is '''always''' a result column, but its name may vary; it is of type ovariablenameResult.
* ''dependencies'' and ''formula'' are slots that are always used together. They answer this question: How can we estimate the topic indirectly? This is the case if we have knowledge about how the result of this variable depends on the results of other variables (called parents). The dependencies are a table of parent variables and their identifiers, and formula is a function that takes the results of those parents, applies the defined code to them, and in this way produces a result for this variable.
** ''Indices'' are columns that define or restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column ''Sex'', which contains values ''Male'' and ''Female''. So, the Result contains one row for males and one for females.
* ''name'' slot is just a technical way to handle several objects systematically, so that proper references can be used to any ovariable needed.
** ''Iter'' is a specific kind of index. In Monte Carlo simulation, Iter is the number of the iteration.
* ''ddata'' slot is used for data that users can update dynamically. The data slot is evaluated when the ovariable is created, and this is typically much before a case-specific model using this ovariable is run. Therefore, if data is updated after the ovariable is created, this will not be reflected in the result, and it may be confusing to the reader. Instead, if ddata is used, the data is downloaded only when the ovariable is used in a model during a case-specific model run. In this way, the data reflects the most recent version of the data available.
** ''Unit'' contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another. Unit is not an index.
* description (does not exist yet): A way to explain the user about the logic of the ovariable. We are considering to add this slot so that the modeller can store any textual descriptions about the ovariable within the ovariable for further users to read. So, it would not have any modelling functionalities.
** Other, non-index columns can exist. Typically, they hold information that was used for some purpose during the evolution of the ovariable, but they may be useless in the current ovariable. Due to these other columns, the output may sometimes be a very wide data.frame.


;@data
* A single ''data.frame'' that defines <self> as such.
* ''data'' slot answers this question: What measurements are there to answer the question? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
* May include textual regular expressions that describe probability distributions which can be interpreted by [[OpasnetUtils/Interpret]].


==Structure, output==
;@marginal
*A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row-specific descriptions) of @output.


The output is the essence of an ovariable. Technically, it is a data.frame meaning that it is a two-dimensional table with columns expressing different properties of the output and rows being observations of those properties. There are different kinds of columns, and it is crucial to understand the functionalities of each kind.
;@formula
* A function that defines <self>.  
* Should return either a ''data.frame'' or an ''ovariable''.
* ''@formula'' and ''@dependencies'' slots are always used together. They answer this question: How can we estimate the answer indirectly? This is the case if we have knowledge about how the result of this variable depends on the results of other variables (called parents). The @dependencies slot is a table of parent variables and their identifiers, and @formula is a function that takes the results of those parents, applies the defined code to them, and in this way produces the @output for this variable.


Result is the most important column. Each ovariable answers a question that is told in the Opasnet page of that variable, and the actual answer is given in the Result column. For example, the question could be "What is the annual average concentration of PM2.5 in Kuopio downtown?" and the Result column could contain 9.4, meaning 9.4 µg/m<sup>3</sup> (the unit can be found from the column called Unit).
;@dependencies
*A ''data.frame'' that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents).
*A way of enabling references in R (for ''ovariables'' at least) by virtue of [[OpasnetUtils/ComputeDependencies]] which creates variables in ''.GlobalEnv'' so that they are available to expressions in @formula.
*Variables are fetched and evaluated (only once by default) upon <self> evaluation.


It is useful to clarify terms here. ''Answer'' is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. The answer may contain text, tables, or graphs on the web page, R code that produces an ovariable when run, or objects stored on the server or a database. The actual format depends on the use purpose of the page. ''Output'' is a part of an ovariable that is used as a key part of the answer. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, ''Result'' contains the actual numerical values for the answer. (The Result may also be text in some cases.)
;@ddata
* A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
* This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.  
* By default, the data defined by ''@ddata'' is downloaded when an ovariable is created. However, it is also possible to create and save an ovariable in such a way that the data is downloaded only when the ovariable is evaluated.


=== Decisions and other upstream orders ===


The general idea of ''ovariables'' is such that they should not be modified to match a specific model but rather define the variable in question as extensively as possible under its scope. In other words, it should answer its question in a re-usable way so that the question and answer would be useful in many different situations. (Of course, this should be kept in mind already when the question is defined.) To match the scope of specific models, ovariables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example, decisions in decision analysis can be supplied this way:
#pick an endpoint
#make decision variables for any upstream variables (this means that you create new scenarios with particular deviations from the actual or business-as-usual answer of that variable)
#evaluate endpoint
#optimize between options defined in decisions.


==Operations with ovariables and their "intelligent array" properties==
Other orders include: collapse of marginal columns by sums, means or sampling to reduce data size and passing input from model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist; this is to avoid unnecessary computation).


* Merge
== Rationale ==
* Ops


==Uncertainties and Iter==
=== Basic properties and ideas ===


==Dependencies and recursive fetching of parents==
The objectives of ovariable modelling are to
* Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
* Enable straightforward equations with multidimensional objects in the same way as in Analytica<sup>TM</sup> "Intelligent arrays".
* Enable uncertainty propagation using standard Monte Carlo.
* Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
* Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
* Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with  possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled by default 1000 times.


==Odecisions and applying decisions==
== See also ==


==Interpreting results==
* [[Variable]]
 
* [[Portal:Modelling with Opasnet]]
==Value of information==
* [[Modelling in Opasnet]]
 
* [[Operations with ovariables]] and their "intelligent array" properties: Merge, Ops
Calculating VOI on a routine basis for all marginal indices.
* [[Uncertainty]] and Iter
 
* [[Dependencies]] and recursive fetching of parents
==Optimising==
* [[Odecision]] and applying decisions
* [[OpasnetUtils/Interpret]]: Interpreting results
* [[Value of information]]: Calculating VOI on a routine basis for all marginal indices.
* [[Optimising]]


==See also==
== Related files ==
 
* [[Variable]]

Revision as of 09:59, 4 December 2014

For updates about modelling instructions, see Portal:Modelling with Opasnet and Modelling in Opasnet.

Ovariable is an object in R software. It is the basic building block of an open assessment.

Question

What is the structure of an ovariable such that

  • it complies with the requirements of variable and
  • it is able to implement probabilistic descriptions of multidimensional variables and
  • it is able to implement different scenarios?

Answer

The ovariable is a class (S4 object) defined by OpasnetUtils in the R software system. Its purpose is to contain the current best answer, in a machine-readable format (including uncertainties when relevant), to the question asked by the respective variable. In addition, it contains information about how to derive the current best answer. The respective variable may have its own page in Opasnet, or it may be implicit so that it is only represented by the ovariable and descriptive comments within the code.

It is useful to clarify terms here. Answer is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. This is why it is typically located near the top of an Opasnet page. The answer may contain text, tables, or graphs on the web page. It typically also contains an R code with a respective ovariable, and the code produces these representations of the answer when run. (However, the ovariable is typically defined and stored under Rationale/Calculations, and the code under Answer only evaluates and plots the result.) Output is the key part (or slot) of the answer within an ovariable. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, Result is the key column of the Output table (or data.frame) and contains the actual numerical values for the answer.

Slots

The ovariable has seven separate slots that can be accessed using X@slot:

@name
  • Name of <self> (the ovariable object) is a requirement since R doesn't support self reference.
@output
  • The current best answer to the question asked.
  • A single data.frame (a 2D table type in R)
  • Not defined until <self> is evaluated.
  • Possible types of columns:
    • Result is the column that contains the actual values of the answer to the question of the variable. There is always a result column, but its name may vary; it is of type ovariablenameResult.
    • Indices are columns that define or restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column Sex, which contains values Male and Female. So, the Result contains one row for males and one for females.
    • Iter is a specific kind of index. In Monte Carlo simulation, Iter is the number of the iteration.
    • Unit contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another. Unit is not an index.
    • Other, non-index columns can exist. Typically, they hold information that was used for some purpose during the evolution of the ovariable, but they may be useless in the current ovariable. Due to these other columns, the output may sometimes be a very wide data.frame.
@data
  • A single data.frame that defines <self> as such.
  • data slot answers this question: What measurements are there to answer the question? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
  • May include textual regular expressions that describe probability distributions which can be interpreted by OpasnetUtils/Interpret.
@marginal
  • A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row-specific descriptions) of @output.
@formula
  • A function that defines <self>.
  • Should return either a data.frame or an ovariable.
  • @formula and @dependencies slots are always used together. They answer this question: How can we estimate the answer indirectly? This is the case if we have knowledge about how the result of this variable depends on the results of other variables (called parents). The @dependencies slot is a table of parent variables and their identifiers, and @formula is a function that takes the results of those parents, applies the defined code to them, and in this way produces the @output for this variable.
@dependencies
  • A data.frame that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents).
  • A way of enabling references in R (for ovariables at least) by virtue of OpasnetUtils/ComputeDependencies which creates variables in .GlobalEnv so that they are available to expressions in @formula.
  • Variables are fetched and evaluated (only once by default) upon <self> evaluation.
@ddata
  • A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
  • This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.
  • By default, the data defined by @ddata is downloaded when an ovariable is created. However, it is also possible to create and save an ovariable in such a way that the data is downloaded only when the ovariable is evaluated.
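
The slot layout above can be illustrated with a small stand-in class. This is only a sketch in base R: the class name toy_ovariable, the slot types, the column name exposureResult and the numbers are all made up for illustration and are not taken from OpasnetUtils, whose actual class definition may differ.

```r
library(methods)

# Hypothetical stand-in for the ovariable class (NOT the OpasnetUtils definition).
# The slot types are assumptions based on the descriptions in the list above.
setClass(
  "toy_ovariable",
  slots = c(
    name = "character",
    output = "data.frame",
    data = "data.frame",
    marginal = "logical",
    formula = "function",
    dependencies = "data.frame",
    ddata = "character"
  )
)

# An output table with one index column (Sex), a result column named
# <name>Result, and a Unit column. The values are invented.
out <- data.frame(
  Sex = c("Male", "Female"),
  exposureResult = c(9.4, 8.7),
  Unit = "ug/m3",
  stringsAsFactors = FALSE
)

exposure <- new(
  "toy_ovariable",
  name = "exposure",
  output = out,
  data = out[, c("Sex", "exposureResult")],
  marginal = c(TRUE, FALSE, FALSE),        # only Sex is a marginal index
  formula = function(...) NULL,            # placeholder; no parents here
  dependencies = data.frame(Name = character(0)),
  ddata = character(0)
)

# Slots are accessed with X@slot
slot_names <- slotNames(exposure)
marginal_columns <- names(exposure@output)[exposure@marginal]
print(slot_names)
print(marginal_columns)
```

Here @marginal is a logical vector with one element per column of @output, so indexing the column names with it returns only the full marginal indices.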

Decisions and other upstream orders

The general idea of ovariables is such that they should not be modified to match a specific model but rather define the variable in question as extensively as possible under its scope. In other words, it should answer its question in a re-usable way so that the question and answer would be useful in many different situations. (Of course, this should be kept in mind already when the question is defined.) To match the scope of specific models, ovariables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example, decisions in decision analysis can be supplied this way:

  1. pick an endpoint
  2. make decision variables for any upstream variables (this means that you create new scenarios with particular deviations from the actual or business-as-usual answer of that variable)
  3. evaluate endpoint
  4. optimize between options defined in decisions.

Other orders include: collapse of marginal columns by sums, means or sampling to reduce data size and passing input from model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist; this is to avoid unnecessary computation).
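
The rule that recursion stops at an already defined variable can be sketched in base R as follows. This is an illustration under assumptions, not the OpasnetUtils implementation: the registry definitions, the function evaluate and the variable names are hypothetical, and the sketch stores results in a separate environment per run rather than in .GlobalEnv so that the baseline and the scenario do not interfere.

```r
# Hypothetical registry: each variable lists its parents and a formula.
definitions <- list(
  emission = list(parents = character(0), formula = function() 10),
  population = list(parents = character(0), formula = function() 2),
  exposure = list(
    parents = c("emission", "population"),
    formula = function(emission, population) emission * population
  )
)

# Evaluate `name` recursively. A variable that already exists in `env`
# is reused as-is, so the recursion stops there.
evaluate <- function(name, env) {
  if (exists(name, envir = env, inherits = FALSE)) {
    return(get(name, envir = env))
  }
  def <- definitions[[name]]
  args <- lapply(def$parents, evaluate, env = env)
  names(args) <- def$parents
  value <- do.call(def$formula, args)
  assign(name, value, envir = env)
  value
}

# Baseline: every parent is fetched and evaluated.
baseline <- evaluate("exposure", new.env())      # 10 * 2 = 20

# Scenario: emission is redefined before the evaluation starts,
# so its own definition is never evaluated.
scenario_env <- new.env()
assign("emission", 5, envir = scenario_env)
scenario <- evaluate("exposure", scenario_env)   # 5 * 2 = 10

print(c(baseline = baseline, scenario = scenario))
```

Comparing baseline and scenario is the simplest form of the scenario analysis described above; a decision would supply several such deviations for one upstream variable.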

Rationale

Basic properties and ideas

The objectives of ovariable modelling are to

  • Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
  • Enable straightforward equations with multidimensional objects in the same way as in Analytica™ "Intelligent arrays".
  • Enable uncertainty propagation using standard Monte Carlo.
  • Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
  • Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
  • Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled by default 1000 times.
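
Two of these objectives, the "5 - 7" notation and index-aware arithmetic, can be sketched with base R alone. The helpers sample_range and multiply are hypothetical stand-ins written for this example; in OpasnetUtils the text interpretation is handled by OpasnetUtils/Interpret, whose behaviour is not reproduced here, and the intake values are invented.

```r
set.seed(1)
n_iter <- 1000   # default number of iterations mentioned above

# Read "min - max" as a uniform distribution and sample it,
# giving one row per Monte Carlo iteration (index column Iter).
sample_range <- function(text, n) {
  bounds <- as.numeric(trimws(strsplit(text, "-", fixed = TRUE)[[1]]))
  data.frame(Iter = seq_len(n), Result = runif(n, min = bounds[1], max = bounds[2]))
}

# Multiply two tables row by row after matching them on shared index columns.
# With no shared indices, merge() returns the full cross product.
multiply <- function(a, b) {
  shared <- setdiff(intersect(names(a), names(b)), "Result")
  names(a)[names(a) == "Result"] <- "Result.a"
  names(b)[names(b) == "Result"] <- "Result.b"
  m <- merge(a, b, by = shared)
  m$Result <- m$Result.a * m$Result.b
  m[, setdiff(names(m), c("Result.a", "Result.b")), drop = FALSE]
}

concentration <- sample_range("5 - 7", n_iter)
intake <- data.frame(Sex = c("Male", "Female"), Result = c(2, 1.5))

# The product is indexed by both Iter and Sex: 1000 iterations x 2 sexes.
dose <- multiply(concentration, intake)
print(aggregate(Result ~ Sex, data = dose, FUN = mean))
```

The uncertainty in concentration carries over to dose simply because every iteration is kept as its own row, which is standard Monte Carlo propagation.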

See also

Related files