Variable

From Opasnet
Revision as of 11:35, 7 June 2008 by Jouni (definition updated)

<section begin=glossary />

A variable is a description of a particular piece of reality. It can describe physical phenomena or value judgments. Decisions included in an assessment are also described as variables. Variables are continuously existing descriptions of reality, which develop over time as knowledge about them increases. Variables are therefore not tied to any single assessment but can be included in other assessments. The variable is the basic building block for describing reality.<section end=glossary />

In order to make coherent descriptions of reality in assessments, the assessments must have a clear structure. Because we also want descriptions that are coherent between assessments, there must be a universal structure for all assessments. This universal structure consists of variables with a specified set of attributes and the links between these variables. For further details, see. The universal assessment structure is essential for including causality coherently in assessments, for enabling collective structured learning and collaborative work, and for combining value judgments with descriptions of physical reality.

Variable structure

In the new risk assessment method, variables have a specified structure with four basic attributes (and possibly some sub-attributes). The attributes of variables are the same as those of other objects in the information structure of the pyrkilo method, i.e. assessments and classes.


Name attribute is the identifier of the variable, and it already more or less describes the real-world entity that the variable describes. Variable names should be chosen so that they are descriptive, unambiguous and not easily confused with other variables. An example of a good variable name is daily average of PM2.5 concentration in Helsinki.

Scope attribute defines the boundaries of the variable: what does it describe, and what does it not? The boundaries can be e.g. spatial, temporal or abstract. In the example variable above, the geographical boundary restricts the coverage of the variable to Helsinki, and the phenomena considered are restricted to daily averages of PM2.5 concentration. The scope may also define further boundaries that are not explicitly mentioned in the name of the variable.

Definition attribute of a variable describes how the result of the variable is derived. It consists of sub-attributes that describe the causal relations, the data used to estimate the result, and the mathematical formula used to calculate the result. Alternative ways of deriving the result that have been identified can also be described in the definition attribute for reference. The minimum requirement for defining causality in any variable is to express the potential existence of a causal relation, i.e. that a change in an upstream variable may affect the variables downstream of it.

Definition has four sub-attributes that have particular purposes in the method:

Causality
Causality tells what we know about how upstream values affect our variable. This sub-attribute lists the upstream variables (i.e. causal parents) of the variable. It expresses either their functional relationships (this variable as a function of its parents) or their probabilistic relationships (the conditional probability of this variable given its parents). The expression of causality is independent of any data that exist about the magnitude of the result of this variable.
Data
Data tells what we know about the magnitude of the result of this variable. This sub-attribute describes any non-causal information about the variable, such as measured data about the variable itself, measured data about an analogous situation (this requires some kind of error model), or expert judgments about the result.
Formula
Formula is the actual computer code that calculates what is described under the titles Causality and Data, making a synthesis of the two. In general form, the formula can be described as
result = formula(parent parameters, data parameters), 

where formula is the function (expressed as computer code for a specified software) for calculating the result using the parent parameters (information from causally upstream variables) and the data parameters (information from observed data) as input.

Unit
Unit attribute describes the units in which the result is presented. The units of interconnected variables need to be coherent with each other in a causal network description. The units of variables can be used to check the coherence of the causal network description with the unit test (explained below).
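As a minimal sketch of the general form result = formula(parent parameters, data parameters), the following Python code bundles the four definition sub-attributes into one object. The class name `Definition`, its fields and the intake example are hypothetical illustrations, not part of the pyrkilo method or of any Opasnet software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Definition:
    """Hypothetical container for the four sub-attributes of a definition."""
    causality: List[str]           # names of causal parents (upstream variables)
    data: Dict[str, float]         # non-causal information, e.g. measured values
    formula: Callable[[Dict[str, float], Dict[str, float]], float]
    unit: str                      # unit in which the result is presented

    def compute(self, parent_results: Dict[str, float]) -> float:
        # Only the declared parents are passed to the formula, so a formula
        # that uses an undeclared upstream variable fails with a KeyError.
        parents = {name: parent_results[name] for name in self.causality}
        return self.formula(parents, self.data)

# Hypothetical example: intake (ug/d) = concentration (ug/m3) * breathing rate (m3/d)
intake = Definition(
    causality=["concentration"],
    data={"breathing_rate": 20.0},
    formula=lambda p, d: p["concentration"] * d["breathing_rate"],
    unit="ug/d",
)
print(intake.compute({"concentration": 10.0}))  # 200.0
```

Keeping causality (the parent list) separate from data mirrors the text's point that the expression of causality is independent of the data about the result's magnitude.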

Figure: Variable definition.PNG

Result attribute is an answer to the question presented in the scope of the variable. A result is preferably a probability distribution (which can in a special case be a single number), but a result can also be non-numerical such as "very good". It should be noted that the result is the distribution itself, although it can be expressed as some kind of description of the distribution, such as mean and standard deviation. The result should be described in such a detailed way that the full distribution can be reproduced from the information presented under this attribute. A technically straightforward way to do this is to provide a large random sample from the distribution.
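The sample-based representation can be sketched in Python as follows, assuming NumPy is available. The lognormal distribution and its parameters are arbitrary choices made for illustration. The point is that summaries such as the mean and standard deviation can be derived from the stored sample, while a mean and standard deviation alone do not in general determine the distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical result: a large random sample from a lognormal distribution.
result = rng.lognormal(mean=np.log(12.0), sigma=0.5, size=10_000)

# Summaries are derived from the sample; the sample itself is the result.
mean = result.mean()
sd = result.std(ddof=1)
p5, p50, p95 = np.percentile(result, [5, 50, 95])
print(f"mean={mean:.2f} sd={sd:.2f} median={p50:.2f} 90% interval=({p5:.2f}, {p95:.2f})")

# Probability statements can be read directly from the sample, e.g. P(result > 20).
print((result > 20.0).mean())
```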

The result may be a different number for different locations, such as geographical positions, population subgroups or other determinants. The result is then described as

  R|x1,x2,... 

where R is the result and x1 and x2 define the location. A dimension is a property along which there are multiple locations, so that the result of the variable may have different values when the location changes. In this case, x1 and x2 are dimensions, and particular values of x1 and x2 are locations. A variable can have zero, one, or more dimensions. Even if a dimension is continuous, it is usually operationalised in practice as a list of discrete locations. Such a list is called an index, and each location is called a row of the index.

Uncertainty about the true value of the variable is one dimension. The index of the uncertainty dimension is called the Sample index, and it contains a list of integers 1, 2, 3, ... Uncertainty is operationalised as a sequence of random samples from the probability distribution of the result. The ith random sample is located in the ith row of the Sample index.
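The following Python sketch (assuming NumPy) shows one way to store a result that has one spatial dimension plus the uncertainty dimension. The index names, locations and numbers are hypothetical. The layout follows the text: the spatial dimension is operationalised as a list of discrete locations, and the ith random sample sits in the ith row of the Sample index.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical indices: a spatial index of discrete locations and the
# Sample index 1, 2, ..., n for the uncertainty dimension.
city = ["Helsinki", "Kuopio", "Oulu"]
n = 1000
sample = np.arange(1, n + 1)

# Result R|city, Sample: one array row per Sample row, one column per city.
# The central values are made up for illustration.
central = np.array([10.0, 7.0, 6.0])
R = central * rng.lognormal(0.0, 0.3, size=(n, len(city)))

# The i-th random sample is in the i-th row of the Sample index.
i = 5
print(sample[i - 1], R[i - 1])

# Summarising over the uncertainty dimension gives one value per location.
for name, col in zip(city, R.T):
    print(name, round(float(np.median(col)), 2))
```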


General attribute structure

Each attribute may contain three kinds of information:

  • Actual content (only this will have an impact on other objects)
  • Narrative description (to help in understanding the actual content). Includes uncertainty analysis.
  • Discussion (argumentation about issues in the actual content)

For a detailed description of discussions, see Dealing with disputes.

Connection to the PSSP structure

A universal information structure called PSSP (Purpose, Structure, State, Performance) has been suggested. PSSP describes the attributes of universal objects, whereas the pyrkilo method is intended for describing particular objects in the context of risk assessment. The variable structure is closely connected to PSSP, and the relationships can be described as follows.

PSSP attribute | Variable structure
Purpose | The purpose of a variable is to describe a particular piece of reality.
Structure | Scope, Unit, and Definition describe the structure of the variable.
State | Result is an expression of the state of the variable.
Performance | Performance is an expression of the uncertainty of the variable, i.e. how well the variable fulfils its purpose of describing the piece of reality defined in its scope. At the variable level, performance is evaluated separately for the result (parameter uncertainty) and the definition (model uncertainty). However, the performance of the scope of a variable cannot be evaluated at the variable level, only at the assessment level.

There are different kinds of variables

Although all variables share the same basic structure, it is useful to distinguish different kinds of variables based on their use or position in a risk assessment.

  • Endpoint variables are variables that describe phenomena which are outcomes of the assessed causal network, i.e. there are no variables downstream of an endpoint variable within the scope of the assessment. In practice, endpoint variables are most often also chosen as indicators.
  • Intermediate variables include all other variables besides endpoint variables.
  • Key variable is a variable which is particularly important for carrying out the assessment successfully and/or assessing the endpoints adequately.
  • Indicator is a variable that is particularly important in relation to the interests of the intended users of the assessment output or other stakeholders. Indicators are used as a means of effective communication of the assessment results. Communication here refers to conveying information about certain phenomena of interest to the intended target audience of the assessment output, but also to monitoring the status of certain phenomena, e.g. in evaluating the effectiveness of actions taken to influence those phenomena. In the context of integrated assessment, indicators can generally be considered as pieces of information that communicate the most essential aspects of a particular risk assessment to meet the needs of the uses of the assessment. Indicators can be endpoint variables, but also any other variables located anywhere in the causal network.
  • Decision variables are possible decisions that are under consideration within a risk assessment. The main interest of the assessment is then the comparison of the outcomes resulting from the different decision options. More about decision variables can be found on a separate page.
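Endpoint and intermediate variables are defined purely by their position in the causal network, so they can be found mechanically. The following is a minimal Python sketch, assuming the network is stored as a hypothetical mapping from each variable to its causal parents. Key variables, indicators and decision variables, in contrast, are choices made in the assessment and cannot be derived this way.

```python
from typing import Dict, List, Set

# Hypothetical causal network: each variable maps to its causal parents.
parents: Dict[str, List[str]] = {
    "emission": [],
    "concentration": ["emission"],
    "exposure": ["concentration"],
    "mortality": ["exposure"],
    "cost of mortality": ["mortality"],
}

def endpoints(parents: Dict[str, List[str]]) -> Set[str]:
    """Variables that are nobody's parent, i.e. have nothing downstream."""
    has_children = {p for ps in parents.values() for p in ps}
    return set(parents) - has_children

end = endpoints(parents)
intermediate = set(parents) - end
print(sorted(end))           # ['cost of mortality']
print(sorted(intermediate))
```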


Variables are versatile objects. They are able to describe all of the following aspects of reality:

  • Causal relationships linking variables in the different steps in the causal chain from source to impact (mainly in the definition/causality attribute);
  • Different environmental, social, economic and infrastructural contexts in which risks might arise and play out (mainly in the scope attribute);
  • Physical and chemical processes that generate, transform and transport the hazards (agents) from source to the target organs in the human body (mainly as variables that are defined as functions);
  • Indicators to describe and communicate the causal chain and impacts (variables selected for reporting);
  • Different policy measures that might be taken to address the risks, and thus different assessment scenarios that might be compared (decision variables);
  • Appraisal of the impacts (and the policy scenarios to which they relate), in the light of agreed value systems and rules for evaluation (variables describing value judgements or derived from value judgement variables).
  • Adaptation and feedback loops arising as a result of adaptation to the risks, at both individual and institutional level. A feedback loop is described as a variable that is indirectly dependent on the result of itself at a previous time point.
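A feedback loop can be sketched by indexing variables by time, so that no variable depends on its own current result. In the hypothetical Python example below, adaptation at time t depends on the risk at t-1, and the risk at t depends on the adaptation at t. Risk is therefore indirectly dependent on its own result at the previous time point. The functional forms and numbers are invented for illustration.

```python
def simulate(steps: int, base_risk: float = 1.0, response: float = 0.5) -> list[float]:
    """Hypothetical feedback loop: risk(t-1) -> adaptation(t) -> risk(t)."""
    risk = [base_risk]
    for t in range(1, steps):
        # Adaptation responds to the risk at the previous time point.
        adaptation = response * risk[t - 1]
        # Risk at time t is reduced by the current adaptation (floored at zero).
        risk.append(max(base_risk - adaptation, 0.0))
    return risk

print(simulate(5))  # [1.0, 0.5, 0.75, 0.625, 0.6875]
```

With these parameters the loop damps towards a stable level instead of diverging; other response values could produce different dynamics.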

Ideally, all variables in the full-chain can be expressed quantitatively. In order to use the full chain approach quantitatively in an integrated assessment, it is necessary to acquire data for the variables, or to estimate these variables by modelling the underlying causal processes.

Proxies are not indicators

The term indicator is sometimes also used (mistakenly, in the view of the new risk assessment method) to mean a proxy. Proxies are used as replacements for the actual objects of interest in a description if adequate information about the actual object of interest is not available. Proxies are indirect representations of the object of interest that usually have some identified correlation with it. At least within the context of the new risk assessment method, proxy and indicator have clearly different meanings, and they should not be confused with each other. The figure below attempts to clarify the difference between proxies and indicators:


Figure: Indicators and proxies.PNG


In the example, a proxy (PM10 site concentration) is used to indirectly represent and replace the actual object of interest (exposure to traffic PM2.5). Mortality due to traffic PM2.5 is identified as a variable of specific interest to be reported to the target audience, i.e. it is selected as an indicator. The other two nodes in the graph are ordinary variables. The graph above was made with Analytica; the model is available as File:Indicators and proxies.ANA.

Specifying indicators and other variables

When the endpoints, indicators and key variables have been identified, they should be specified in more detail. Other variables are created and specified as necessary to complete the causal network. Specifying these variables means defining the contents of the attributes of each variable. The four plausibility tests are very useful in specifying variables.

Plausibility tests are procedures that clarify the goodness of variables with respect to some important properties, such as measurability, coherence and clarity. The four plausibility tests are the clairvoyant test, the causality test, the unit test and the Feynman test.

  1. Clairvoyant test (about the ambiguity of a variable): If a putative clairvoyant (a person that knows everything) is able to answer the question defined in the scope attribute in an unambiguous way, the variable is said to pass this test. The answer to the question is equal to the contents of the result attribute.
  2. Causality test (about the nature of the relation between two variables): If you alter the value of a particular variable (all else being equal), the variables whose values change as a result are said to be causally linked to it. In other words, they are directly downstream in the causal chain, i.e. children of the particular variable.
  3. Unit test (the coherence of the variable definitions throughout the network): The function defining a particular variable must result (when the upstream variables are used as inputs of the function) in the same unit as implied in the scope attribute and defined in the unit attribute.
  4. Feynman test (about the clarity of description): If you cannot explain it to your grandmother, you don't understand it well enough yourself. (According to the quantum physicist and Nobel laureate Richard Feynman.)
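The unit test can be mechanised with simple dimensional bookkeeping. The sketch below is a minimal Python illustration, assuming units are represented as hypothetical dictionaries of base-unit exponents (an existing unit library would be a more robust choice in practice). It checks that multiplying a concentration by a breathing rate gives the unit declared for an intake variable.

```python
from collections import Counter

Unit = dict[str, int]  # base unit -> exponent, e.g. {"ug": 1, "m": -3} for ug/m3

def multiply(a: Unit, b: Unit) -> Unit:
    """Unit of a product: exponents of matching base units are added."""
    total = Counter(a)
    total.update(b)  # Counter.update adds values instead of replacing them
    return {k: v for k, v in total.items() if v != 0}

def unit_test(computed: Unit, declared: Unit) -> bool:
    """Pass if the unit produced by the formula equals the declared unit."""
    return computed == declared

concentration = {"ug": 1, "m": -3}   # ug/m3
breathing_rate = {"m": 3, "d": -1}   # m3/d
intake_declared = {"ug": 1, "d": -1} # ug/d

print(unit_test(multiply(concentration, breathing_rate), intake_declared))  # True
print(unit_test(concentration, intake_declared))                            # False
```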

The specification of variables proceeds in iterative steps, going into more detail as the overall understanding of the assessed phenomena increases. First, it is most crucial to specify the scopes (and names) of the variables and their causal relations. When specifying the name and scope attributes in particular, the clairvoyant test can be applied. The test helps to ensure that the variable scope is clear and unambiguous.

Addressing causality means in practice that every change in any variable description should be reflected in all the variables that the particular variable is causally linked to. At this point, the causality test can be used, although not necessarily quantitatively. In the early phases of the process, it is probably most convenient to describe causal networks as diagrams, representing the indicators, endpoints, key variables and other variables as nodes (or boxes) and causal relations as arrows pointing from upstream variables to downstream variables. In such diagrams, the arrows only state that a causal relation exists between particular variables. More detailed definitions of the relations should be described within the definition attribute of each variable, to the extent that the causal relation is known or understood.
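When parts of the network are already quantitative, the causality test can be run by perturbation: change one variable, hold everything else fixed, and see which recomputed results change. The Python sketch below uses a hypothetical three-step network with made-up formulas. Note that exact floating-point comparison is used, so effects smaller than rounding error would go undetected.

```python
from typing import Callable, Dict

Values = Dict[str, float]

# Hypothetical formulas: each variable is computed from current values of others.
formulas: Dict[str, Callable[[Values], float]] = {
    "concentration": lambda v: 2.0 * v["emission"],
    "exposure": lambda v: 0.5 * v["concentration"],
    "mortality": lambda v: 0.01 * v["exposure"],
}

def baseline(emission: float) -> Values:
    v = {"emission": emission}
    for name in ["concentration", "exposure", "mortality"]:  # upstream first
        v[name] = formulas[name](v)
    return v

def causality_test(values: Values, altered: str, delta: float = 1.0) -> set[str]:
    """Variables whose recomputed value changes when only `altered` changes."""
    perturbed = dict(values)
    perturbed[altered] += delta
    # All else equal: every formula sees baseline values except `altered`.
    return {name for name, f in formulas.items()
            if name != altered and f(perturbed) != f(values)}

v0 = baseline(emission=10.0)
print(causality_test(v0, "emission"))       # {'concentration'}
print(causality_test(v0, "concentration"))  # {'exposure'}
```

Because every other variable is held at its baseline value, only direct children respond, which matches the "all else being equal" wording of the test.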

Once a relatively complete and coherent graphical representation of the causal network has been created, the specification of the identified indicators may continue in more detail. The indicators, as the leading variables, are of crucial importance in the assessment process. If, during specification, it turns out that an indicator conflicts with one or more of the properties of good indicators, such as calibration, it may be necessary to revise the scope of the indicator or to choose another leading variable in the source-impact chain to replace it. This may in turn require a partial revision of the whole causal network, affecting several key variables, endpoints and indicators. For example, it may happen that no applicable exposure-response function is available for calculating the health impact of ozone intake. In this case, the exposure-response indicator may be replaced with an intake fraction indicator, which affects both downstream and upstream variables in the causal network, e.g. by requiring a change in the units in which the variables are described.

The description, unit and definition attributes are specified as explained in the previous section. The unit test can be applied to check the calculability, and thus the descriptive coherence, of the causal network. When all variables in the network appear to pass the required tests, the indicator and variable results can be computed across the network, and the first round of iteration is done. The description is then improved through deliberation and re-specification of the variables, especially their definition and result attributes, until an adequate quality of description has been reached throughout the network. The discussion attribute provides the place for deliberating and documenting the deliberation throughout the process.
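Computing results across the network requires evaluating each variable only after all of its causal parents. The following Python sketch does this with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The network, the formulas and the numbers are hypothetical. A cyclic network raises `graphlib.CycleError`, which is one reason to model feedback through time-indexed variables.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical network: variable -> (causal parents, formula over parent results).
network: Dict[str, tuple[List[str], Callable[..., float]]] = {
    "emission": ([], lambda: 100.0),
    "concentration": (["emission"], lambda e: 0.1 * e),
    "exposure": (["concentration"], lambda c: 0.8 * c),
    "mortality": (["exposure"], lambda x: 0.002 * x),
}

def compute_all(network) -> Dict[str, float]:
    """One round of computation: every variable is evaluated after its parents."""
    graph = {name: deps for name, (deps, _) in network.items()}
    results: Dict[str, float] = {}
    for name in TopologicalSorter(graph).static_order():
        deps, formula = network[name]
        results[name] = formula(*(results[p] for p in deps))
    return results

print(compute_all(network))
```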

Importance of indicators in the assessment process

Indicators have a special role in making the assessment. As mentioned above, indicators are the variables of most interest from the point of view of the use, users and other audiences of the assessment. The idea behind selecting, specifying and using indicators is thus to highlight the most important and/or significant parts of the source-impact chain that are to be assessed and subsequently reported. The selected set of indicators guides the assessment process to address the relevant issues within the assessment scope, according to the purpose of the assessment. It could be said that indicators are the leading variables in carrying out the assessment, while other variables are subsidiary to specifying the indicators.

However, within the context of integrated risk assessment, selecting and specifying indicators may sound more straightforward than it actually is. Perhaps identifying indicators, and specifying the causal network in line with the identified indicators, captures the essence of the process better. Rather than merely picking from a predefined set of indicators, selection here means identifying the most interesting phenomena within the scope of the assessment in order to describe and report them as indicators. Specifying indicators is then similar to specifying any other variable, except that indicators are considered first, while other variables are considered secondarily and mainly in relation to the indicators.

In principle, any variable could be chosen as an indicator, and the set(s) of indicators could be composed of any types of indicators across the full-chain description. In practice, the generally relevant types of indicators, such as performance indicators, can be somewhat predefined, and even some detailed indicators can be defined in relation to commonly existing purposes and user needs. This kind of generality also helps to bring coherence between assessments.

On the generalizability of variables

Aim: Variables must be generalizable so that they can be used without additional knowledge of the context. In other words, the context must be described well enough inside the variable.

→ Because of this, variables must be estimates of the truth, not deliberate under- or overestimates. Biased estimates are common in risk assessment because assessments usually aim to avoid false negative results much more than false positive results. In other words, it is considered much worse to miss a risk that exists than to believe there is a risk when there is none.

→ Decisions may be based on risk aversion, but the estimates of variables must be best estimates, because you cannot know which decisions will be based on the variable.
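To illustrate this separation, the hypothetical Python sketch below (assuming NumPy) stores an unbiased sample as the variable's result and applies risk aversion only in a decision rule, here by comparing a high percentile with a limit. The distribution, the 95th-percentile rule and the limit are assumptions made for illustration. Different decision makers could apply different rules to the same unbiased variable.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# The variable itself: a best (unbiased) estimate, stored as a sample.
exposure = rng.normal(loc=8.0, scale=2.0, size=10_000)

def cautious_decision(sample: np.ndarray, limit: float, percentile: float = 95.0) -> bool:
    """Hypothetical risk-averse rule: act if a high percentile exceeds the limit."""
    return bool(np.percentile(sample, percentile) > limit)

def neutral_decision(sample: np.ndarray, limit: float) -> bool:
    """Hypothetical risk-neutral rule: act if the expected value exceeds the limit."""
    return bool(sample.mean() > limit)

limit = 10.0
print(neutral_decision(exposure, limit))   # False: the mean is about 8
print(cautious_decision(exposure, limit))  # True: the 95th percentile is about 11.3
```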

Function in the pyrkilo method is a special case of a variable whose parameters are defined outside the variable itself. A simple example is the variable Area of a rectangle, which is defined as Width*Height. This function can be used within another variable, e.g. Area of Jouni's table, which is defined as Area of a rectangle(1.5 m, 0.8 m); the result is 1.2 m².

Note that any variable can be used as a function by replacing its original input parameters with other parameters.
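In Python terms, the rectangle example and the note above might be sketched as follows. The function names are hypothetical, and the second part assumes that a variable's original inputs can be represented as default arguments, which another variable may replace with its own.

```python
def area_of_rectangle(width: float, height: float) -> float:
    """Function-type variable: its parameters are supplied from outside."""
    return width * height

# Variable that uses the function with its own inputs (metres).
area_of_jounis_table = area_of_rectangle(1.5, 0.8)
print(round(area_of_jounis_table, 2))  # 1.2 (m2)

# Any variable can act as a function: its original inputs are defaults here.
def intake(concentration: float = 10.0, breathing_rate: float = 20.0) -> float:
    """Hypothetical variable: intake (ug/d) from concentration and breathing rate."""
    return concentration * breathing_rate

print(intake())                    # 200.0: the variable's own result
print(intake(concentration=25.0))  # 500.0: reused as a function with a new input
```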


Dimensions of a variable

Several variables may share the same dimension. One variable may use several indexes along the same dimension. Therefore, several variables may share the same index or use different indexes along a particular dimension. This creates a need to handle index conversions within dimensions. Although the following goes into technical details that have not yet really been sorted out and tested, it is discussed here for completeness.

  • A variable may have an interpolation function for a dimension. This is a function that defines how a value can be calculated for a new location based on values of other locations along that dimension. The function is used to transform the variable from one index to another index along a particular dimension.
    • The function may utilise several dimensions at the same time, such as in two-dimensional spatial transformations.
    • The interpolation function may be deterministic or probabilistic. If it is probabilistic, it is enough to take one sample for each row of the Sample index.
  • There may be correlation functions. These functions tell how two or more variables are related to each other across the uncertainty dimension. Vines (hierarchical rank correlations) are examples of these functions.
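A minimal Python sketch of a deterministic interpolation function follows. It assumes NumPy and linear interpolation, which is only one possible choice. The distance dimension, both indices and the values are hypothetical. The interpolation is applied row by row along the Sample index, so the uncertainty of the original result is carried over to the new index.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 500

# Hypothetical result on an original index along a distance dimension (km),
# with one row per Sample-index row.
old_index = np.array([0.0, 5.0, 10.0, 20.0])
result_old = np.array([40.0, 25.0, 18.0, 12.0]) * rng.lognormal(0.0, 0.2, size=(n, 1))

# A different index along the same dimension, used by another variable.
new_index = np.array([2.5, 7.5, 15.0])

def interpolate(values: np.ndarray, old: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Linear interpolation from `old` to `new`, done separately for each Sample row."""
    return np.vstack([np.interp(new, old, row) for row in values])

result_new = interpolate(result_old, old_index, new_index)
print(result_new.shape)  # (500, 3)
```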


Technical issues in Mediawiki

  • Each variable is a page in the Variable namespace. The name of the variable is also the name of the page. However, draft variables may be parts of other pages.
  • The scope is the first paragraph(s) on the page, before the first sub-title. Scope starts with the word Scope on the previous line (wiki code '''Scope'''<br>). The name should be repeated in bold at the beginning of the scope, followed by the text "describes..." and then a description of the scope (whenever the content fits this format). Subtitles are NOT used with Scope; this way, the scope is located above the table of contents.
  • All other attributes (unit, definition, result) are second-level (==) sub-titles on the page.
  • Description of the attribute content is added at the end of that content; discussions on the content are added to the Talk page, each discussion under an own descriptive title.
  • References to external sources are added to the text with the <ref>Reference information</ref> tag. The references are located at the end of the page under the subtitle References. However, reference is not an attribute of the variable, even though it is technically similar.
  • In the formula, computer code for a specific software may be used. The following are in use.
    • Analytica_id: Identifier of the respective node in an Analytica model. <anacode>Place your Analytica code here. Use double Enter to make a line break.</anacode>
    • <rcode>Place your R code here. Use double Enter to make a line break.</rcode>

Event-substance

Variables are objects of the event-medium composite type. They thus describe both the events that occur within the scope of the variable and the medium where these events take place. In practice, the events can only be observed through changes in the state of the medium, and it is therefore reasonable to describe events and media as such composites rather than separately.

In the pyrkilo method, all variables included in an assessment must be causally related, directly or indirectly, to the endpoints of the assessment, and the causal relations must be defined. The event-medium structure is the carrier of the cause-effect relations between the variables. An event occurring in a medium causes a change in the state of that medium, which leads to another event that changes the state of the medium, which in turn causes yet another event, and so on. In addition to variables, classes, as generalizations of properties possessed by variables, can also be causally related to each other.
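The requirement that every variable be causally related, directly or indirectly, to the endpoints can be checked mechanically. In this hypothetical Python sketch, the assessment is a mapping from each variable to its causal parents, and the endpoints are given by the assessment scope. A breadth-first walk upstream from the endpoints finds every variable that can affect them; any variable left over violates the requirement.

```python
from collections import deque

# Hypothetical assessment: variable -> causal parents.
parents = {
    "emission": [],
    "weather": [],
    "concentration": ["emission", "weather"],
    "exposure": ["concentration"],
    "mortality": ["exposure"],
    "tram schedule": [],  # not linked to the endpoint
}
endpoints = {"mortality"}

def unlinked_variables(parents: dict[str, list[str]], endpoints: set[str]) -> set[str]:
    """Variables with no directed path to any endpoint (found by walking upstream)."""
    reached = set(endpoints)
    queue = deque(endpoints)
    while queue:
        node = queue.popleft()
        for p in parents.get(node, []):
            if p not in reached:
                reached.add(p)
                queue.append(p)
    return set(parents) - reached

print(unlinked_variables(parents, endpoints))  # {'tram schedule'}
```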


See also