Ovariable

From Opasnet
Revision as of 14:37, 27 November 2019 by Jouni (talk | contribs) (→‎Decformula)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
For updates about modelling instructions, see Portal:Modelling with Opasnet and Modelling in Opasnet.

Ovariable is an object in R software. It is the basic building block of an open assessment model.

Question

What is the structure of an ovariable such that

  • it complies with the requirements of variable and
  • it is able to implement probabilistic descriptions of multidimensional variables and
  • it is able to implement different scenarios?

Answer

The ovariable is a class (S4 object) defined by OpasnetUtils in R software system. Its purpose is to contain the current best answer in a machine-readable format (including uncertainties when relevant) to the question asked by the respective variable. In addition, it contains information about how to derive the current best answer. The respective variable may have an own page in Opasnet, or it may be implicit so that it is only represented by the ovariable and descriptive comments within a code.

It is useful to clarify terms here. Answer is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. This is why it is typically located near the top of an Opasnet page. The answer may contain text, tables, or graphs on the web page. It typically also contains an R code with a respective ovariable, and the code produces these representations of the answer when run. (However, the ovariable is typically defined and stored under Rationale/Calculations, and the code under Answer only evaluates and plots the result.) Output is the key part (or slot) of the answer within an ovariable. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, Result is the key column of the Output table (or data.frame) and contains the actual numerical values for the answer.

Slots

The ovariable has seven separate slots that can be accessed using X@slot:

@name
  • Name of <self> (the ovariable object) is a requirement since R doesn't support self reference.
@output
  • The current best answer to the question asked.
  • A single data.frame (a 2D table type in R)
  • Not defined until <self> is evaluated.
  • Possible types of columns:
    • Result is the column that contains the actual values of the answer to the question of the variable. There is always a result column, but its name may vary; it is of type ovariablenameResult.
    • Indices are columns that define or restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column Sex, which contains values Male and Female. So, the Result contains one row for males and one for females.
    • Iter is a specific kind of index. In Monte Carlo simulation, Iter is the number of the iteration.
    • Unit contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another. Unit is not an index.
    • Other, non-index columns can exist. Typically, they are information that were used for some purpose during the evolution of the ovariable, but they may be useless in the current ovariable. Due to these other columns, the output may sometimes be a very wide data.frame.
@data
  • A single data.frame that defines <self> as such.
  • data slot answers this question: What measurements are there to answer the question? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
  • May include textual regular expressions that describe probability distributions which can be interpreted by OpasnetUtils/Interpret.
@marginal
  • A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row-specific descriptions) of @output.
@formula
  • A function that defines <self>.
  • Should return either a data.frame or an ovariable.
  • @formula and @dependencies slots are always used together. They answer this question: How can we estimate the answer indirectly? This is the case if we have knowledge about how the result of this variable depends on the results of other variables (called parents). The @dependencies is a table of parent variables and their identifiers, and @formula is a function that takes the results of those parents, applies the defined code to them, and in this way produces the @output for this variable.
@dependencies
  • A data.frame that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents). The following columns may be used:
    • Name: name of an ovariable or a constant (one-row numerical vector) found in the global environment (.GlobalEnv).
    • Key: the run key (typically a 16-character alphanumeric string) to be used in objects.get() function where the run contains the dependent object.
    • Ident: Page identifier and rcode name to be used in objects.latest() function where the newest run contains the dependent object. Syntax: "Op_en6007/answer".
  • A way of enabling references in R (for in ovariables at least) by virtue of OpasnetUtils/ComputeDependencies which creates variables in .GlobalEnv so that they are available to expressions in @formula.
  • Variables are fetched and evaluated (only once by default) upon <self> evaluation.
@ddata
  • A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
  • This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.
  • By default, the data defined by @ddata is downloaded when an ovariable is created. However, it is also possible to create and save an ovariable in such a way that the data is downloaded only when the ovariable is evaluated.

Decisions and other upstream orders

The general idea of ovariables is such that they should not be modified to match a specific model but rather define the variable in question as extensively as possible under it's scope. In other words, it should answer its question in a re-usable way so that the question and answer would be useful in many different situations. (Of course, this should be kept in mind already when the question is defined.) To match the scope of specific models, ovariables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example decisions in decision analysis can be supplied this way:

  1. pick an endpoint
  2. make decision variables for any upstream variables (this means that you create new scenarios with particular deviations from the actual or business-as-usual answer of that variable)
  3. evaluate endpoint
  4. optimize between options defined in decisions.

Other orders include: collapse of marginal columns by sums, means or sampling to reduce data size and passing input from model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist; this is to avoid unnecessary computation).

Decformula

There is a need to develop online models where a user can adjust ovariable values using sliders. However, all ovariables within a model should stay consistent even if one or several ovariables are changed by hand. Here we describe how that is done.

We use odecision objects for this, just like for developing scenarios. An online slider simply changes the value in the dectable of that ovariable, so that the BAU value gets replaced by the user-defined value. So, there must be an odecision for each ovariable that has a slider on a user interface. These OpasnetUtils functionalities have existed since version 1.0. (Here, we are not describing the code-level front-end functionality on a web page that actually changes the value within a dectable; this is probably straightforward in Shiny. This description discusses the approach on a conceptual level.)

One new functionality is needed: decformula that describes changes in upstream of an ovariable that is changed by the user. For example, if ovariable C is defined as A * B, it must be unambiguous what happens if the value of C is manipulated. One solution is to define decformula A = C / B implying that A changes with C, but B stays intact because it has no decformula. In practice, this procedure is used in this example:

  1. When a user pulls the slider C on a user interface, the respective value in the dectable of decC object is changed from Identity= <empty> to Replace=slidervalue.
  2. C is re-evaluated even if it has been evaluated already, so the openv$C$dec_check is ignored. Note that if dectable has rows that Add or Multiply, problems will occur.
  3. Everything directly upstream from C is checked for decformulas. If a decformula containing C is found, the decformula is evaluated using the values. In this example, the value in the decA is changed to C / B. decB remains unchanged.
    1. Note that decformulas may define plausible ranges for variables, e.g. that the value cannot be negative. These non-linearities may lead to situations where the new value (of A) does not lead to the value (of C) set by the user.
  4. All ovariables that were changed (in this case, A) are checked for their upstream by repeating steps 2 and 3.
  5. The model is re-evaluated, to ensure that all values are now plausible and, if possible, according to the new scenario based on user-defined values. However, if the user defined an implausible combination of values, the result (of C) will change during re-evaluation from the used-defined value to a plausible one, and the slider will jump to that place.

A completely different approach can be taken if time is not an issue: the whole model can be described as Bayesian belief network with enough iterations to enable user-defined conditioning on variables. However, the model becomes very heavy quickly if the number of sliders increase, as the number of iterations needed increases exponentially.

A third approach is to develop a Unicorn-type Bayesian belief network where all nodes are converted to multinormal distributions, and conditioning is solved analytically.

Rationale

Basic properties and ideas

The objectives of ovariable modelling are to

  • Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
  • Enable straightforward equations with multidimensional objects in the same was as in AnalyticaTM "Intelligent arrays".
  • Enable uncertainty propagation using standard Monte Carlo.
  • Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
  • Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
  • Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled by default 1000 times.

See also

Related files