Modelling in Opasnet
The Opasnet modelling environment is an open, web-based platform for collaboratively developing numerical models.
Opasnet wiki-R modelling platform
Introduction
Opasnet has integrated tools for building and running easily accessible statistical models in the wiki. The platform is completely modular, and individual variables are fully reusable.
Main features
- Wiki - pages provide a natural analogy to variables in a statistical model. They contain descriptive information as well as the necessary metadata (e.g. scope).
- R - is an open-source statistical programming language comparable to, for example, MATLAB. The Opasnet wiki has an extension (R-tools) for including executable R scripts on any page; the output is displayed in HTML format as an applet or in a separate tab (see the sketch after this list).
- Database - MongoDB is used to store variable-related data.
- Interfaces - All these components need to work together, and we have built interface solutions for each combination: R-tools for running wiki-integrated R scripts, table2base for uploading wiki tables to the database, the OpasnetBase wiki extension for showing database entries in the wiki, and the opbase script family for communication between R and the database.
- OpasnetUtils - is an R package (library) that contains tools for building mathematical models within our modelling framework, which is described in detail below. OpasnetUtils is completely:
- modular
- recursive
- customizable, a knowledgeable user can take over any automated model part and resume automation for the rest of the model
- All aspects of the Opasnet Modelling environment are free and open source.
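As an illustration of the R-tools extension mentioned above, an executable script can be embedded on a wiki page; the sketch below assumes an rcode tag (the tag name and attributes are assumptions here; the exact syntax is documented on the R-tools page):
  <rcode>
  # A small R script embedded on a wiki page. When a user presses the run
  # button, the code is executed on the Opasnet server and the output
  # (text and graphics) is shown in the browser.
  x <- rnorm(1000, mean = 0, sd = 1)  # sample some data
  summary(x)                          # print a summary table
  hist(x)                             # draw a histogram
  </rcode>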
Usage
Mathematical models consist of variables, which may be known or unknown and/or can be derived from other variables using further models. Modelling in Opasnet is variable-centric. Since variables are defined universally, they should be reusable (partly or wholly) in all other models, although more complex models with extremely large datasets will naturally need more customised and static definitions to run efficiently. In practice, known variables can be defined by writing parseable tables (t2b) on wiki pages, by uploading datasets directly to the Opasnet Base and downloading them in the R code that defines the variable, or by using existing data tools within packages installed on the Opasnet server (e.g. Scraper). Latent variables are usually defined using R code that depends on other defined variables, which should be listed under the dependencies of that variable. Variables can be defined both latently and by data; both definitions are stored and can be compared.
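As a minimal sketch of these two ways of defining variables (the page identifier and variable names below are hypothetical, and the exact OpasnetUtils argument names should be checked against the package documentation):
  library(OpasnetUtils)
  # A known variable whose data is downloaded from Opasnet Base by page identifier.
  population <- Ovariable("population", ddata = "Op_en0000")
  # A latent variable defined by a formula; its dependencies are listed by name
  # and are fetched and evaluated recursively when the variable is evaluated.
  exposure <- Ovariable("exposure",
    dependencies = data.frame(Name = c("population", "concentration")),
    formula = function(...) {
      population * concentration  # ovariable arithmetic merges outputs by common indices
    }
  )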
OpasnetUtils features
The modelling itself is done using R, and the OpasnetUtils package provides most of the actual tools.
Ovariables
The ovariable is a class defined by OpasnetUtils. It has seven separate "slots", which can be accessed using X@slot (see the sketch after this list):
- name
- The name of <self>; required because R does not support self-reference.
- output
- The current result (evaluated output) of <self>.
- A single data.frame (a 2D table type in R)
- Not defined until <self> is evaluated.
- data
- A single data.frame that defines <self> as such.
- May include textual expressions describing probability distributions, which can be interpreted by OpasnetUtils/Interpret.
- marginal
- A logical vector that indicates which columns of @output are full marginal indices.
- formula
- A function that defines <self>.
- Should return either a data.frame or an ovariable.
- dependencies
- A data.frame that contains the names and the R-tools or Opasnet identifiers of the variables required by @formula.
- Dependencies will be fetched and evaluated upon <self> evaluation.
- ddata
- A string containing an Opasnet identifier (e.g. Op_en1000).
- This identifier is used to download data from the Opasnet database for the @data slot upon <self> evaluation.
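For illustration, a small ovariable defined directly by data (the variable name and values are made up); the @ operator gives access to the slots listed above:
  library(OpasnetUtils)
  # A hypothetical known variable defined by its data slot. The Result column
  # may contain distribution shorthand such as "2-5", which is expanded into
  # Monte Carlo samples by OpasnetUtils/Interpret when the variable is evaluated.
  emission <- Ovariable("emission",
    data = data.frame(
      Fuel   = c("coal", "gas"),
      Result = c("2-5", "1.2")
    )
  )
  emission@name    # "emission"
  emission@data    # the defining data.frame
  emission@output  # empty until the ovariable is evaluated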
The general nature of ovariables means that they should not be modified to match a specific model, but should rather define the variable in question as extensively as possible within its scope. To match the scope of a specific model, variables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example, decisions in decision analysis can be supplied this way:
- pick an endpoint
- make decision variables for any upstream variables
- evaluate endpoint
- optimize between options defined in decisions.
Other orders include collapsing marginal columns by sums, means or sampling to reduce data size, and passing input from the model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the redefined variable (dependencies are only fetched if they do not already exist, to avoid unnecessary computation), as in the sketch below.
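A minimal sketch of that last point (variable names are hypothetical and continue the earlier sketch): because dependencies are fetched only if they do not already exist, redefining a variable in the workspace before evaluation stops the recursion at that variable:
  # Replace the stored "concentration" variable with a simple scenario
  # before evaluating the endpoint.
  concentration <- Ovariable("concentration",
    data = data.frame(Area = c("north", "south"), Result = c(10, 20))
  )
  # The recursive evaluation now uses the redefined concentration instead of
  # fetching its stored definition.
  exposure <- EvalOutput(exposure)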
Utilising ovariables
Defining and analysing the endpoints of a model can be as easy as fetching a relevant variable, evaluating it (using EvalOutput) and applying some of the available functions (e.g. summary() for ovariables).
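A minimal sketch (assuming an ovariable called exposure exists in the workspace, for example fetched with Fetch or objects.latest, or defined as in the earlier sketches):
  # Evaluate the variable: dependencies are fetched and evaluated recursively,
  # data and formula are combined, and the @output slot is filled in.
  exposure <- EvalOutput(exposure)
  # Inspect the result.
  summary(exposure)       # ovariable summary over the Monte Carlo iterations
  head(exposure@output)   # the underlying data.frame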
...
Page structure for modelling pages
This is a plan for an improved page structure for pages related to modelling, databases, and codes in Opasnet.
Portal:Modelling with Opasnet Main page. Contains a brief introduction and links to the content.
- Practices
- Modelling in Opasnet Primary page of modelling instructions.
- Using Opasnet in an assessment project Motivation page, not really a method.
- Producing result from rationale An attempt at a systematic approach to info and data management.
- Object-oriented programming in Opasnet A page with an outdated training session from May 2012, and a table that should be moved to Ovariable. ----#: . Merge with Modelling in Opasnet? --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: comment)
- Open assessment An encyclopedia page about open assessment.
- Open assessment method A method page about open assessment. Should be updated with fi:Tekaisu-menetelmä.
- Open Assessors' Network An encyclopedia page about Avary.
- Contributing to a discussion A method of pragma-dialectic discussion.
- Operating intelligently with multidimensional arrays in R A redirected page.
- A Tutorial on R A generic R guidance for beginners. Does not contain Opasnet-specific stuff.
- Tools
- R-tools Technical description of the R-tools structure and parameters.
- Opasnet base 2 Technical description of the Opasnet Base 2 structure.
- Opasnet base · Opasnet base structure Description and technical description of the old Opasnet Base, respectively. ⇤--#: . These should be redirected to new pages and permanent links put to the archive link page. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- OpasnetUtils ----#: . Should contain a brief description of the package and its functionalities. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: comment)
- OpasnetBaseUtils ⇤--#: . Should be archived after relevant parts have been merged with OpasnetUtils. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- Saved R objects A technical list of objects saved. Needed for the function objects.latest().
- Opasnet Base Connection for R ⇤--#: . Should be archived. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- Opasnet Basic idea of Opasnet. Some parts should be moved to Contributing to Opasnet.
- Welcome to Opasnet A welcome page that should be urgently updated.
- Contributing to Opasnet Practical guidance for participation ⇤--#: . Actually, not very practical. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- Frequently asked questions about Opasnet A part of the general description.
- What is improved by Opasnet and open assessment? ----#: . Should be merged with some page? --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: comment)
- Opasnet structure Describes different parts of Opasnet ⇤--#: . Should be updated. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- Help:Opasnet policies Part of Help pages. Update in line with them, not in this process.
- Task list for Opasnet An automatic list of tasks related to Opasnet. ⇤--#: . Check and clean, mostly outdated tasks. --Jouni 17:30, 8 July 2013 (EEST) (type: truth; paradigms: science: attack)
- Attributes Technical variable. Add performance into the table, related to rating bar, discussion page, comments etc.
- Universal objects Update to a more user-friendly format. · PSSP Update. Not essential reading.
- Assessment Guidance for assessment structure. Update. · Category:Assessments
- Variable Update lists like in assessment. · Category:Variables Basically just describes the structure. Not essential reading.
- Study Update lists like in assessment. · Category:Study
- Ovariable Describes the structure of an ovariable and the main use of each attribute. Essential reading for those who want to understand the concept.
- Method Update lists like in assessment. · Category:Method
Question
How should modelling be done in Opasnet in practice? This page should provide general guidance on principles, not a technical manual for using different tools.
What should the main functionalities of the Opasnet modelling environment be, such that
- it supports decision analysis,
- it supports BBNs and Bayesian inference,
- it mainly contains modelling functionalities for numerically describing reality, but
- it is also possible to numerically describe scenarios (i.e., deliberate deviations from the truth in order to be able to compare two alternative worlds that are the same in all other respects than the deliberate deviation).
Answer
For a general instruction about contributing, see Contributing to Opasnet.
Obs | Property | Guidance |
---|---|---|
1 | Structure | Answer should be a data table either on the page or uploaded to Opasnet Base using R code. |
2 | Structure | The indices should logically match those of parent objects. |
3 | Applicability | The question of an object should primarily be tailored according to the particular needs of the assessment under work, and only secondarily to general use. |
4 | Coherence | The Answer of an object should be coherent with all information used in the object. In addition, it should be coherent with all other objects. If some information in another object affects the answer of this object, a link to the other object should be placed under Rationale, and specifically under Dependencies if there is a causal connection. |
5 | Coherence | Ensuring coherence is a huge task. Therefore, simple things should be done first and more laborious only if there is a need. The order in which things should be done is usually this: a) Search for similar objects and topics in Opasnet. b) If found, make links to them in both directions. c) Discuss the related info in Rationale. d) Include the info in calculations of the Answer. e) Merge the two related objects into one larger object that contains all information from the two objects and that is internally coherent. |
6 | Coherence | When you find two (or more) pieces of information about one topic, but the pieces are inconsistent, describe the Answer in this way (from simple to complex): a) Describe qualitatively what was found. b) Describe the Answer quantitatively as a list of possible hypotheses, one hypothesis for each piece of information. c) Describe the hypotheses probabilistically by giving the same probability to each hypothesis. d) Using expert judgement and/or open critical discussion, adjust probabilities to give less weight to less convincing hypotheses. e) Develop a probabilistic model that explicitly describes the hypotheses based on our understanding of the topic itself and the quality of the info, and use the info as input data. |
7 | Multi-site assessment | When several similar assessments are to be performed for several sites, the structure of the assessments should contain a) a single page for the multi-site assessment, including code that has the site name as input, b) a single summary page containing a list of all sites and their individual pages, structured as a data table, c) an individual page for each site containing a data table with all site-specific parameter values needed in the assessment. |
8 | Formula | Whenever possible, all computing code should be written in R. |
9 | Formula | Instead of copying the same code to several pages, the multi-site assessment approach should be used. Alternatively, #include_code function should be used (after it has been finalised and functional). |
10 | Formula | Some procedures repeat themselves over and over again in impact assessments. These can be written as functions. Common or important functions can be included in libraries that are available in R-tools. Search for R-tools libraries so that you learn to use the same functions as others do. |
11 | Formula | When you develop general-purpose functions of your own, you should suggest that they be added to an R-tools library. |
12 | Preferred R code | Objects should be described as data.frames. The use of arrays is discouraged. |
13 | Preferred R code | Probabilistic information is incorporated in a data.frame using the index Run, which contains the Monte Carlo iteration number from 1 to n (the sample size); see the sketch after this table. |
14 | Preferred R code | Graphs are drawn using the ggplot2 graphics package. |
15 | Preferred R code | Uploading to and downloading from Opasnet Base is done using the OpasnetBaseUtils package. Uploading is only possible from some computers with specific IP addresses. |
16 | Preferred R code | When possible and practical, summary parameters and sub-object computations are performed with functions of the tapply family. |
17 | Preferred R code | Two data.frames with one or more common indices are put together using the merge function. |
18 | Preferred R code | When two data.frames have identical rows (or columns), they are put together with cbind (which adds more columns) or rbind (which adds more rows). |
Links related to the answer: Data table · Opasnet Base · R · Parent object · Child object · R-tools · OpasnetBaseUtils · ggplot2 · tapply · merge · data.frame · rbind · cbind
Note! The text above talks about objects, meaning any information objects. The most common objects are variables.
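A minimal sketch that follows the preferred-code rows above (all numbers are made up): probabilistic results carry a Run index, tables are combined with merge, summaries use tapply, and graphs are drawn with ggplot2:
  library(ggplot2)
  n <- 1000  # sample size (number of Monte Carlo iterations)
  # Two probabilistic objects as data.frames with a Run index and a shared
  # explanatory index Area.
  concentration <- data.frame(
    Area = rep(c("north", "south"), each = n),
    Run  = rep(1:n, times = 2),
    Conc = rlnorm(2 * n, meanlog = 1, sdlog = 0.5)
  )
  intake <- data.frame(Run = 1:n, Intake = rnorm(n, mean = 20, sd = 5))
  # Combine by the common index with merge and compute a product.
  exposure <- merge(concentration, intake, by = "Run")
  exposure$Result <- exposure$Conc * exposure$Intake
  # Summarise by Area with tapply and draw the distributions with ggplot2.
  tapply(exposure$Result, exposure$Area, mean)
  ggplot(exposure, aes(x = Result, colour = Area)) + geom_density()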
Relationship of Answer and Rationale
All variable pages should have a clear question and a clear answer. The answer should typically be in a form of a data table that has all indices (explanatory columns) needed to make the answer unambiguous and detailed enough. If the answer table is very large, it might be a bad idea to show it on the page; instead, a description is shown about how to calculate the answer based on Dependencies and Rationale, and only a summary of the result is shown on the page; the full answer is saved into Opasnet Base.
The answer should be a clear and concise answer to the specific question, not a general description or discussion of the topic. The answer should be understandable to anyone who has general knowledge and has read the question.
In addition, the answer should be convincing to a critical reader who reads the following data and believes it is correct:
- The Rationale section of the page.
- The Answer sections of all upstream variables listed in the Dependencies section.
- In some cases, also downstream variables may be used in inference (e.g. in hierarchical Bayes models).
It should be noted that the data mentioned above should itself be backed up by original research from several independent sources, good rationale etc. It should also be noted that ALL information that is needed to convince the reader should be put into the places mentioned and not somewhere else. In other words, when the reader has read the rationale and the relevant results, (s)he should be able to trust that (s)he is now aware of all such major points related to the specific topic that have been described in Opasnet.
This results in guidance for info producers: if there is a relevant piece of information that you are aware of but it is not mentioned, you should add it.
Indices of the data table
The indices, i.e. explanatory columns, should match in variables that are causally connected by a causal diagram (i.e., mentioned in Dependencies). This does not mean that they must be the same (as not all explanations are relevant for all variables), but it should be possible to see which parts of the results of two variables belong together. An example is a geographical grid for two connected variables such as the concentration field of a pollutant and an exposure variable for a population. If the concentration and population use the same grid, the exposure is easy to compute. They can also be used together with different grids, but then there is a need to explain how the data can be converted from one grid to the other for calculating exposures, as sketched below.
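A minimal sketch with made-up grid cells and values: when both variables share the grid index, a merge gives the exposure directly; with different grids, an explicit conversion table is needed first:
  # Concentration and population on the same grid: merge by the cell index.
  concentration <- data.frame(Cell = c("A1", "A2", "B1"), Conc = c(5, 8, 3))
  population    <- data.frame(Cell = c("A1", "A2", "B1"), Pop  = c(100, 250, 40))
  exposure <- merge(concentration, population, by = "Cell")
  exposure$PopWeightedConc <- exposure$Conc * exposure$Pop
  # With different grids, a conversion table (e.g. the fraction of each
  # population zone falling inside each concentration cell) must be merged in
  # before the same calculation can be done.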
Increasing preciseness of the answer
This is a rough order of emphasis that could guide the work when starting from scratch and proceeding to highly sophisticated and precise answers. The first step always requires careful thinking, but if there are lots of easily available data, you may proceed through steps 2 - 4 quickly; with very little data it might be impossible to get beyond step 3.
- Describe the variables, their dependencies and their indices (explanations) to develop a coherent and understandable structure and causal diagram.
- Describe the variables as point estimates and simple (typically linear) relations to make the first runnable model. Check that all parts make sense. Check that all units are consistent. Plausibility of all values and results is desirable but not yet critical.
- Describe the variables as ranges in such a way that the true value is most likely within the range. This is more important than having a very precise range (and thus a higher probability of not covering the truth). This may result in vague conclusions (like: it might be a good idea to do this, but on the other hand, it might be a bad idea). But that is exactly how it should be: in the beginning, we should be uncertain about conclusions. Only later, when we collect more data and things become more precise, do the conclusions become clearer. At this step, you can use sensitivity analyses to see where the most critical parts of your model are.
- The purpose of an assessment model is to find recommendations for actions. Except in the clearest cases, this is not possible by using variable ranges. Instead, probability distributions are needed. Then the model can be used for optimisation, i.e., finding optimal decision combinations.
- When you have developed your model this far, you can use value of information analysis (VOI analysis) to find the critical parts of your model. The difference from a sensitivity analysis is that a VOI analysis tests which parts would change your recommendation, not which parts would change your estimate of the outcome. Often the two analyses point in the same direction, but a VOI analysis is more about what you care about, while a sensitivity analysis can be performed even if no explicit decision has yet been clarified.
Rationale
A draft based on the author's own thinking. Not even the topics are clear yet.
How many high school students are needed to replace one expert? Calculate the expert's effective study time and the fraction of it that is needed to solve the problem in question.
A guess: 10. Experts look down on superficial knowledge and have deep knowledge. What is the difference? Connections. If two things are each possible but not at the same time, an expert recognises this but a layperson does not. High school students can be turned into experts by teaching them a method for describing connections. After that, all the knowledge no longer needs to be in one person's head.
It is difficult for people to grasp that numerous problems can be solved at once with the same method. On the other hand, large crowds can be motivated to solve a single problem if the topic is important to them. Should we therefore find that one important thing? The others would then start to get solved as a side effect.
It is also difficult to see meta-level questions, i.e. the system or oneself as part of a larger structure that also contains the possible worlds and within which the solutions are found.
The noblest kind of imagination is to imagine good things that could exist but do not, and the path between their non-existence and their existence.
Scientific science is like the American dream: the methods of science produce enough breakthroughs that every generation gets its own success stories and idols, but in practice the scientific method is too far removed from a researcher's everyday work to really influence it. So researchers live on an illusion, just like Americans, and toil without real opportunities to reach their true goals, which are greater and more influential than what the scientific system allows. Researchers' time and resources go into thinking about two things: where do I get funding, and how do I get my ideas published. There is always too little time for developing the ideas themselves. So not only the goals but also the abilities are greater than what the system bends to. Those who do best are the CEO types who know how to organise fundraising, publishing and institutions around their own interests.
A meta-analysis produces an estimate (with confidence intervals) for the average exposure-response. However, we are interested in the distribution of the individual exposure-response functions. If we know the distribution of the individual functions, we can always produce the distribution for the average. In addition, if we assume that our sample of individuals is not a random sample from the whole population of interest, we can try to assess the subgroup's bias relative to a random sample, and in this way produce sub-population-specific exposure-response distributions. In other words, we can mimic a real study using bootstrapping (biased bootstrapping, if we have an estimate of how the study is biased) from our individual-level whole-population distribution; in this way, we can test whether it is plausible that the study data actually came from such a whole-population distribution in the way we think. This is a difficult task, but if it works, it offers a method to systematically include any studies in the same examination of individual-level (not average) exposure-response functions.
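A minimal sketch of the biased-bootstrap idea (all distributions, sample sizes and weights below are made up for illustration):
  set.seed(1)
  # Assumed individual-level distribution of exposure-response slopes
  # in the whole population of interest.
  n <- 10000
  slopes <- rlnorm(n, meanlog = log(0.01), sdlog = 0.5)
  # Assume the study over-sampled sensitive individuals: give larger
  # resampling weights to the upper end of the distribution.
  w <- rank(slopes) / n
  # Biased bootstrap: mimic a study of 500 subjects, repeated 1000 times,
  # and look at the distribution of the simulated study-average slopes.
  study_means <- replicate(1000, mean(sample(slopes, 500, replace = TRUE, prob = w)))
  quantile(study_means, c(0.025, 0.5, 0.975))
  # If the reported study estimate and its confidence interval are implausible
  # under this distribution, the assumed whole-population distribution or the
  # assumed bias should be revised.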
See also
- Contributing to Opasnet
- Welcome to Opasnet
- Open assessment
- Frequently asked questions about Opasnet
- Modelling in Opasnet
- What is improved by Opasnet and open assessment?
- Open Assessors' Network
- Opasnet structure
- Assessment
- Variable
- Ovariable
- R-tools
- Opasnet base 2
- Contributing to a discussion
- Using Opasnet in an assessment project
- Help:Opasnet policies
- Task list for Opasnet
- Producing result from rationale
- Opasnet
- Opasnet Base Connection for R
- Operating intelligently with multidimensional arrays in R
- Object-oriented programming in Opasnet
Keywords
Modelling, Opasnet Base, scenario
References
Related files