Opasnet base structure: Difference between revisions

(→‎All tables: Overview: added comment)
(→‎Table structure: Formula added)
Revision as of 21:50, 30 December 2009



This page is about the structure of Opasnet Base. For a general description, see Opasnet base.

Scope

Opasnet base is a storage and retrieval system for results of variables and data from studies. What is the structure of Opasnet base such that it enables the following functionalities?

  1. Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.
  2. Automatic retrieval of results when called from Opasnet wiki or other platforms or modelling systems.
  3. Description and handling of the dimensions that a variable may take.
  4. Protection of some results and data from reading by unauthorised persons.
  5. Building of user interfaces for easily entering observations into the Base.


Definition

Data

Software

Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is an SQL database. Because of its flexibility, ease of use, and cost, MySQL is an optimal choice among SQL software. In addition to the database software, a variable transfer protocol is needed on top of it, so that the results of variables can be retrieved and new results stored either automatically by calculation software or manually by the user. Fancy presentation software can be built on top of the database, but that is not the topic of this page.

Storage and retrieval of results of variables

The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as a natural constant to an uncertain spatio-temporal concentration field over the whole of Europe), the database must be very flexible. The basic solution is described on the variable page, and it is only briefly summarised here. The result is described as

  P(R|x1,x2,...) 

where P(R) is the probability distribution of the result and x1, x2, ... are the locations along each dimension where a particular P(R) applies. Typically locations are operationalised as discrete indices. A variable must have at least one dimension. Uncertainty about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index Sample, which is a list of integers 1, 2, 3, ..., n, where n is the number of samples.
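
The idea of samples located along the Sample index can be illustrated with a minimal Python sketch. The index names (Year, City), the locations, and the normal distribution are assumptions made only for this illustration; they do not come from Opasnet base.

```python
import random
import statistics

random.seed(1)
n = 5  # number of samples per cell; the Sample index runs 1..n

# Hypothetical indices and locations: each combination defines one cell.
locations = [(year, city) for year in (2008, 2009) for city in ("Helsinki", "Kuopio")]

# P(R|x1,x2) for each cell is represented by n random draws keyed by Sample.
result = {
    loc: {sample: random.gauss(10.0, 2.0) for sample in range(1, n + 1)}
    for loc in locations
}

def summarise(cell):
    """Return (mean, N) of one cell, i.e. the mean and the sample size."""
    values = list(cell.values())
    return statistics.mean(values), len(values)

mean, size = summarise(result[(2009, "Kuopio")])
print(f"mean={mean:.2f}, N={size}")
```

A deterministic variable is simply the special case n = 1, where the Sample index has a single location.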


Dependencies

Result

Opasnet base is a MySQL database located at http://base.opasnet.org.

Table structure

Formula structure

Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the Opasnet Base. There are different approaches:

  • Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
  • Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.


These approaches require new tables, namely:

  • Formula
    • id (automatic incremental integer)
    • Obj_id_v
    • Obj_id_r
    • When (What is the relationship between upload and formula/When? Is there always a new formula for a new upload? No, because the upload may change even if the formula doesn't, if the parents change. Is there always a new upload for a new formula? Yes, because it is necessary to make a new upload.)
    • Language (of the formula code)
    • Code (a large text or memo field for the formula)
----#: . Do we need tables DIF and DIP like Uninet? --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: comment)
  • DIP
    • DIP_node_id
    • DIP_parent_node_id
    • DIP_corr_coeff
    • DIP_parent_index
  • DIF
    • DIF_node_id
    • DIF_formula
    • DIF_varnames_in_formula
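
The proposed Formula table could be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL, the column types are guesses, and the helper functions add_formula and latest_formula as well as the example formulas are hypothetical. Note that When is a reserved word in SQL, so the column name has to be quoted (with backticks in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("""
    CREATE TABLE Formula (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- automatic incremental integer
        Obj_id_v  INTEGER NOT NULL,                   -- variable id
        Obj_id_r  INTEGER NOT NULL,                   -- run (upload) id
        "When"    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        Language  VARCHAR(30),                        -- assumed type
        Code      TEXT                                -- large text field for the formula
    )
""")

def add_formula(conn, var_id, run_id, language, code):
    """Store a new formula row and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO Formula (Obj_id_v, Obj_id_r, Language, Code) VALUES (?, ?, ?, ?)",
        (var_id, run_id, language, code),
    )
    return cur.lastrowid

def latest_formula(conn, var_id):
    """Return (Language, Code) of the most recently added formula of a variable."""
    return conn.execute(
        "SELECT Language, Code FROM Formula WHERE Obj_id_v = ? ORDER BY id DESC LIMIT 1",
        (var_id,),
    ).fetchone()

# Two uploads of the same variable, each with its own formula (made-up content).
add_formula(conn, 1, 10, "R", "Parent1 + Parent2")
add_formula(conn, 1, 11, "R", "Parent1 * Parent2")
latest = latest_formula(conn, 1)
print(latest)
```

Keeping Obj_id_r in the row ties each formula to the upload that introduced it, which matches the reasoning that a new formula always requires a new upload.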


Objinfo: new structure

The structure of Objinfo should be changed. The original plan was that there is at most one row of Objinfo per object. It is now clear that this does not provide all the functionalities we need. Instead, it should be possible to add any number of actions per object. Therefore, even the name of the table should be changed to Act. The structure should be changed accordingly:

  • The field id is the primary key of the table. It is NOT the Obj.id any longer.
  • A new field Obj_id should be added. This is the old field id.
  • The End field should be removed; it is not used.
  • Url should be changed to Comment, as it may also contain other information.
  • The length of Comment should be at least 250 characters.
  • Begin should be replaced by When, which is the timestamp of the moment the row was added.
  • A new field Act_id should be added.
  • A new table Acttype for actions should be added. It would contain only fields id and Act, and the following rows:
    1. Start the object
    2. Finish the assessment
    3. Add a reference
    4. Add a URL
    5. Peer review the object definition: accept based on the discussion
    6. Peer review the object definition: reject based on the discussion
    7. Peer review the object definition: accept (personal opinion)
    8. Peer review the object definition: reject (personal opinion)
    9. Clairvoyant test for the scope: pass
    10. Clairvoyant test for the scope: fail
    11. Save a run of the object.
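
A minimal sketch of the proposed Act and Acttype tables is given below. It assumes sqlite3 as a stand-in for MySQL, guessed column types, and that Act_id refers to Acttype.id; the helper log_action, the object id 42 and the user name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE Acttype (
        id  INTEGER PRIMARY KEY,
        Act VARCHAR(100)               -- assumed length
    );
    CREATE TABLE Act (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- no longer equal to Obj.id
        Obj_id  INTEGER NOT NULL,                   -- the old field id
        Act_id  INTEGER NOT NULL REFERENCES Acttype(id),
        Who     VARCHAR(50),
        "When"  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        Comment VARCHAR(250)
    );
""")

# The action types listed above.
ACTIONS = [
    "Start the object",
    "Finish the assessment",
    "Add a reference",
    "Add a URL",
    "Peer review the object definition: accept based on the discussion",
    "Peer review the object definition: reject based on the discussion",
    "Peer review the object definition: accept (personal opinion)",
    "Peer review the object definition: reject (personal opinion)",
    "Clairvoyant test for the scope: pass",
    "Clairvoyant test for the scope: fail",
    "Save a run of the object",
]
conn.executemany("INSERT INTO Acttype (id, Act) VALUES (?, ?)", list(enumerate(ACTIONS, start=1)))

def log_action(conn, obj_id, act_id, who, comment=""):
    """Add one action row for an object (hypothetical helper)."""
    conn.execute(
        "INSERT INTO Act (Obj_id, Act_id, Who, Comment) VALUES (?, ?, ?, ?)",
        (obj_id, act_id, who, comment),
    )

# Any number of actions can now be attached to the same object.
log_action(conn, 42, 1, "example_user")
log_action(conn, 42, 4, "example_user", "http://base.opasnet.org")
log_action(conn, 42, 11, "example_user")

history = [row[0] for row in conn.execute(
    "SELECT Acttype.Act FROM Act JOIN Acttype ON Act.Act_id = Acttype.id "
    "WHERE Act.Obj_id = ? ORDER BY Act.id",
    (42,),
)]
print(history)
```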

Merging the Res and Resinfo tables

These tables should be merged (see the related discussion).

All tables: Overview

  • We need Ressec (Result secure) and Resinfosec (Result info secure) tables for secure information. All other tables are openly readable except these two. They have the same structure as Res and Resinfo tables, respectively.
Obj
Describes all objects
  FIELD        TYPE          EXTRA
  id           int(10)       primary
  Ident        varchar(20)   unique
  Name         varchar(200)
  Unit         varchar(16)
  Objtype_id   tinyint(3)
  Page         int(10)
  Wiki_id      tinyint(3)

⇤--1: . Unit should have at least 32 characters. --Jouni 19:29, 17 September 2009 (EEST) (type: truth; paradigms: science: attack)

----2: . We can increase it to 64 at once. --Juha Villman 07:52, 18 September 2009 (EEST) (type: truth; paradigms: science: comment)

Cell (previously Res)
Cells of an object
  FIELD                     TYPE      EXTRA
  id                        int(12)   primary
  Obj_id_v (variable id)    int(10)
  Obj_id_r (run id)         int(10)
  Mean (mean of the cell)   float
  N (samplesize)            int(10)

Loc
Location information
  FIELD                   TYPE            EXTRA
  id                      int(10)         primary
  Obj_id_i (index id)     int(10)
  Location                varchar(1000)
  Roww (row # of index)   mediumint(8)
  Description             varchar(150)

Item
Items of a set
  FIELD                                     TYPE         EXTRA
  id                                        int(10)      primary
  Sett_id (set to which the item belongs)   int(10)
  Obj_id (item id)                          int(10)
  Fail (membership not valid?)              tinyint(1)

Loccell (previously Locres)
Locations of a cell
  FIELD     TYPE      EXTRA
  id        int(10)   primary
  Cell_id   int(10)
  Loc_id    int(10)

Res (previously Sam)
Result distribution (actual values)
  FIELD                     TYPE         EXTRA
  id                        bigint(20)   primary
  Cell_id                   int(12)
  Obs (previously Sample)   int(10)
  Result                    float

Sett
List of sets
  FIELD        TYPE         EXTRA
  id           int(10)      primary
  Obj_id       int(10)
  Settype_id   tinyint(3)

Settype (previously Sty)
Types of set-item memberships
  FIELD                        TYPE          EXTRA
  id                           tinyint(3)
  Settype (previously Stype)   varchar(30)

Objtype (previously Typ)
Types of objects
  FIELD                       TYPE          EXTRA
  id                          tinyint(3)    primary
  Objtype (previously Type)   varchar(30)

Wiki (previously Wik)
Wiki information
  FIELD   TYPE           EXTRA
  id      tinyint(3)     primary
  Url     varchar(255)
  Wname   varchar(20)

Resinfo (previously Descr)
Additional description of the result
  FIELD                              TYPE           EXTRA
  id                                 bigint(20)     primary
  Restext (previously Description)   varchar(250)
  Who                                varchar(50)
  When                               timestamp
←--#: . We should add Formula_id to Res. --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: defence)
Objinfo (previously Inf)
Additional information about the object
  FIELD   TYPE           EXTRA
  id      int(10)        primary
  Begin   date
  End     date
  Who     varchar(50)
  Url     varchar(250)
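
How the Obj, Cell and Res tables work together can be tried out with the following sketch. It assumes sqlite3 in place of MySQL and simplified column types; the helper store_cell, the identifiers Var_example and Run_example, the unit and the sample values are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE Obj  (id INTEGER PRIMARY KEY, Ident VARCHAR(20) UNIQUE,
                       Name VARCHAR(200), Unit VARCHAR(16));
    CREATE TABLE Cell (id INTEGER PRIMARY KEY, Obj_id_v INTEGER, Obj_id_r INTEGER,
                       Mean FLOAT, N INTEGER);
    CREATE TABLE Res  (id INTEGER PRIMARY KEY, Cell_id INTEGER, Obs INTEGER, Result FLOAT);
""")

# One variable and one run; both are objects in Obj (hypothetical content).
conn.execute("INSERT INTO Obj (id, Ident, Name, Unit) VALUES (1, 'Var_example', 'Example variable', 'kg')")
conn.execute("INSERT INTO Obj (id, Ident, Name) VALUES (2, 'Run_example', 'Example run')")

def store_cell(conn, var_id, run_id, samples):
    """Store one cell: its summary in Cell and every sample in Res."""
    mean = sum(samples) / len(samples)
    cur = conn.execute(
        "INSERT INTO Cell (Obj_id_v, Obj_id_r, Mean, N) VALUES (?, ?, ?, ?)",
        (var_id, run_id, mean, len(samples)),
    )
    cell_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO Res (Cell_id, Obs, Result) VALUES (?, ?, ?)",
        [(cell_id, obs, value) for obs, value in enumerate(samples, start=1)],
    )
    return cell_id

cell_id = store_cell(conn, 1, 2, [1.0, 2.0, 6.0])

# Retrieve the full sample of the variable, one row per observation.
rows = conn.execute("""
    SELECT Var.Ident, Cell.id, Obs, Result, Var.Unit
    FROM Obj AS Var
    JOIN Cell ON Var.id = Cell.Obj_id_v
    JOIN Res ON Cell.id = Res.Cell_id
    WHERE Var.Ident = ?
    ORDER BY Cell.id, Obs
""", ("Var_example",)).fetchall()
summary = conn.execute("SELECT Mean, N FROM Cell WHERE id = ?", (cell_id,)).fetchone()
print(rows, summary)
```

Storing Mean and N in Cell duplicates information that could be computed from Res, but it allows summaries to be listed without reading every sample.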

See also

Some useful syntax

<sql-query display=1>
SELECT Obj.id, Obj.Ident, Obj.Name, Obj.Typ_id, Sty_id, Itemm.Ident AS Iident, Itemm.Name AS Iname
FROM Obj
LEFT JOIN Sett ON Obj.id = Sett.Obj_id
LEFT JOIN Item ON Sett.id = Item.Sett_id
LEFT JOIN Obj AS Itemm ON Item.Obj_id = Itemm.id
</sql-query>


NOTE! The queries below work in the new database "opasnet_base", not in "resultdb" as the old versions did.

{{#sql-query:
SELECT Var.Ident, Var.Name, Var.Unit, Run.Ident, Begin, Who, Run.Name AS Method
FROM Obj AS Var, Obj AS Run, Cell, Objinfo
WHERE Var.Ident = "Op_en1913"
  AND Var.id = Cell.Obj_id_v
  AND Run.id = Cell.Obj_id_r
  AND Run.id = Objinfo.id
GROUP BY Var.id, Run.id
|Runs}}

{{#sql-query:
SELECT Var.Ident, Var.Name, Cell.id, N, Begin, Mean, Var.Unit
FROM Obj AS Var, Obj AS Run, Cell, Objinfo
WHERE Var.Ident = "Op_en1913"
  AND Var.id = Cell.Obj_id_v
  AND Run.id = Cell.Obj_id_r
  AND Run.id = Objinfo.id
GROUP BY Cell.id
ORDER BY Run.id DESC, Var.Ident
|Means and samplesizes (N)}}

{{#sql-query:
SELECT Var.Ident, Cell.id, Cell.Obj_id_r AS Run, Obs, Result, Var.Unit
FROM Obj AS Var, Cell, Res
WHERE Var.Ident = "Op_en1913"
  AND Var.id = Cell.Obj_id_v
  AND Cell.id = Res.Cell_id
ORDER BY Cell.Obj_id_r, Var.Ident, Cell.id
|Full sample}}


List all dimensions that have indices, and the indices concatenated:

<sql-query display="1">
SELECT Dim.Ident, Dim.Name, Dim.Unit, GROUP_CONCAT(Ind.Ident ORDER BY Ind.Name SEPARATOR ', ') AS Indices
FROM Obj AS Dim, Obj AS Ind, Sett, Item
WHERE Dim.id = Sett.Obj_id
  AND Sett.Settype_id = 1
  AND Sett.id = Item.Sett_id
  AND Item.Obj_id = Ind.id
GROUP BY Dim.Name
ORDER BY Dim.id
</sql-query>


List all indices, and their locations concatenated:

<sql-query display="1">
SELECT Ident, Name, Unit, GROUP_CONCAT(Location ORDER BY Roww SEPARATOR ', ') AS Locations
FROM Obj AS Ind, Loc
WHERE Ind.id = Loc.Obj_id_i
GROUP BY Name
ORDER BY Name
</sql-query>


List all variables and their runs, and also list all indices (concatenated) used for each variable for each run.

<sql-query display="1"> SELECT Var_id, Run_id, Ident, Name, GROUP_CONCAT(Indic SEPARATOR ', ') AS Indices, N, Method FROM

  (SELECT Var.id as Var_id, Run.id as Run_id, Var.Ident AS Ident, Var.Name as Name, Ind.Ident AS Indic, N, Run.Name AS Method
  FROM Obj AS Var, Obj AS Run, Obj AS Ind, Loccell, Loc, Cell
  WHERE Var.id = Cell.Obj_id_v
  AND Run.id = Cell.Obj_id_r
  AND Cell.id = Loccell.Cell_id
  AND Loc.id = Loccell.Loc_id
  AND Ind.id = Loc.Obj_id_i
  GROUP BY Var_id, Run_id, Ind.Ident ) AS Temp1

GROUP BY Var_id, Run_id </sql-query>
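
The join path used in the last query (Cell, Loccell, Loc, Obj) can be checked on toy data with the sketch below. It assumes sqlite3 in place of MySQL, so the MySQL-specific GROUP_CONCAT ... SEPARATOR syntax is replaced by grouping in Python; all identifiers and rows (Var1, Run1, Year, City) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE Obj     (id INTEGER PRIMARY KEY, Ident VARCHAR(20) UNIQUE, Name VARCHAR(200));
    CREATE TABLE Cell    (id INTEGER PRIMARY KEY, Obj_id_v INTEGER, Obj_id_r INTEGER,
                          Mean FLOAT, N INTEGER);
    CREATE TABLE Loc     (id INTEGER PRIMARY KEY, Obj_id_i INTEGER, Location VARCHAR(1000),
                          Roww INTEGER);
    CREATE TABLE Loccell (id INTEGER PRIMARY KEY, Cell_id INTEGER, Loc_id INTEGER);

    -- One variable, one run and two indices, all stored as objects.
    INSERT INTO Obj VALUES (1, 'Var1', 'Example variable'), (2, 'Run1', 'Example run'),
                           (3, 'Year', 'Year'), (4, 'City', 'City');
    -- Locations of the indices.
    INSERT INTO Loc VALUES (1, 3, '2008', 1), (2, 3, '2009', 2), (3, 4, 'Helsinki', 1);
    -- Two cells of the variable, each located by one Year and one City.
    INSERT INTO Cell VALUES (1, 1, 2, 10.0, 3), (2, 1, 2, 12.0, 3);
    INSERT INTO Loccell VALUES (1, 1, 1), (2, 1, 3), (3, 2, 2), (4, 2, 3);
""")

rows = conn.execute("""
    SELECT Var.Ident, Run.Ident, Ind.Ident
    FROM Cell
    JOIN Obj AS Var ON Var.id = Cell.Obj_id_v
    JOIN Obj AS Run ON Run.id = Cell.Obj_id_r
    JOIN Loccell ON Loccell.Cell_id = Cell.id
    JOIN Loc ON Loc.id = Loccell.Loc_id
    JOIN Obj AS Ind ON Ind.id = Loc.Obj_id_i
    GROUP BY Var.id, Run.id, Ind.id
    ORDER BY Var.id, Run.id, Ind.Ident
""").fetchall()

# Collect the indices used by each (variable, run) pair.
indices_by_run = {}
for var_ident, run_ident, index_ident in rows:
    indices_by_run.setdefault((var_ident, run_ident), []).append(index_ident)
print(indices_by_run)
```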