Opasnet base structure: Difference between revisions

From Opasnet
Jump to navigation Jump to search
(marked outdated)
 
(7 intermediate revisions by 2 users not shown)
Line 13: Line 13:
}}
}}


This page is about the '''structure of Opasnet Base'''. For a general description, see [[Opasnet base]].
:''This page is about the '''old structure of Opasnet Base''' used in ca. 2008-2011. For a description about the current database, see [[Opasnet base 2]].


==Scope==
==Question==


[[image:Opasnet Base structure.png|thumb|400px|Structure and connections (lines) of the tables (boxes) in the [[Opasnet Base]]. All table identifiers are called id (so they can be called by like obj.id). When obj.id is referred to in another table such as actobj, it is called actobj.obj_id. The latter end of a one-to-many relationship is marked with a ring. Important substantive fields are listed inside table boxes.]]
[[image:Opasnet Base structure.png|thumb|400px|Structure and connections (lines) of the tables (boxes) in the [[Opasnet Base]]. All table identifiers are called id (so they can be called by like obj.id). When obj.id is referred to in another table such as actobj, it is called actobj.obj_id. The latter end of a one-to-many relationship is marked with a ring. Important substantive fields are listed inside table boxes.]]


'''Opasnet base''' is a storage and retrieval system for [[result]]s of [[variable]] and [[data]] from [[study|studies]]. What is the structure of [[Opasnet base]] such that it enables the following functionalities?
'''Opasnet Base''' is a storage and retrieval system for [[result]]s of [[variable]] and [[data]] from [[study|studies]]. What is the structure of [[Opasnet Base]] such that it enables the following functionalities?
# Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.{{reslink|Should all variables go to result distribution database?}}
# Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.{{reslink|Should all variables go to result distribution database?}}
# Automatic retrieval of results when called from [[Opasnet wiki]] or other platforms or modelling systems.
# Automatic retrieval of results when called from [[Opasnet wiki]] or other platforms or modelling systems.
# Description and handling of the [[dimension]]s that a [[variable]] may take.
# Description and handling of the [[index|indices]]s that a [[variable]] may take.
# It is possible to protect some results and data from reading by unauthorised persons.
# It is possible to protect some results and data from reading by unauthorised persons.
# If is possible to build user interfaces for easily entering observations into the Base.
# If is possible to build user interfaces for easily entering observations into the Base.


 
==Answer==
==Definition==
 
===Data===
 
====Software====
 
Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a [[:en:SQL|SQL]] database. Because of its flexibility, ease of use, and cost, [[:en:MySQL|MySQL]] is an optimal choice among SQL software. In addition to the database software, a [[variable transfer protocol]] is needed on top of that so that the results of variables can be retrieved and  new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.
 
====Storage and retrieval of results of variables====
 
The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the [[variable]] page, and it is only briefly summarised here. The result is described as
 
  P(R|x<sub>1</sub>,x<sub>2</sub>,...)
 
where P(R) is the probability distribution of the result and x<sub>1</sub> and x<sub>2</sub> are defining [[location]]s of a [[dimension]] where a particular P(R) applies. Typically locations are operationalised as discrete [[Index|indices]]. A variable must have at least one dimension. [[Uncertainty]] about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index ''Sample'', which is a list of integers 1,2,3...n, where n=number of samples.
 
 
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7856 Old description of the structure]
 
===Dependencies===
 
* [[Opasnet structure]]
* [[Open assessment]]
 
==Result==


Opasnet base is a [[:en:MySQL|MySQL]] database located at http://base.opasnet.org.
Opasnet base is a [[:en:MySQL|MySQL]] database located at http://base.opasnet.org.


===Data structure===
===Data structure===
:''Main article: '''[[Data structures in Opasnet]]'''''


All data should be convertible into the following format:
All data should be convertible into the following format:


{| {{prettytable}}
{| {{prettytable}}
| colspan="3"| || colspan="2"  style="background-color: #FFD8F0;"|Personal measurements
! colspan="3"| || colspan="3"  style="background-color: #FFD8F0;"|Observation
|-----
|-----
| style="background-color: #CCDFC8;"|Year ||  style="background-color: #CCDFC8;"|Sex ||  style="background-color: #CCDFC8;"|Age ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #DFB8D0;"|Weight
! style="background-color: #CCDFC8;"|Year ||  style="background-color: #CCDFC8;"|Sex ||  style="background-color: #CCDFC8;"|Age ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #DFB8D0;"|Description
|-----
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #ECDFE8;"|178 ||  style="background-color: #ECDFE8;"|70
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #ECDFE8;"|178 ||  style="background-color: #ECDFE8;"|70 ||  style="background-color: #ECDFE8;"|An optional column for descriptive text about each row.
|-----
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #ECDFE8;"|174 ||  style="background-color: #ECDFE8;"|79
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #ECDFE8;"|174 ||  style="background-color: #ECDFE8;"|79 ||  style="background-color: #ECDFE8;"|
|-----
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #ECDFE8;"|183 ||  style="background-color: #ECDFE8;"|84
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #ECDFE8;"|183 ||  style="background-color: #ECDFE8;"|84 ||  style="background-color: #ECDFE8;"|
|-----
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #ECDFE8;"|168 ||  style="background-color: #ECDFE8;"|65
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #ECDFE8;"|168 ||  style="background-color: #ECDFE8;"|65 ||  style="background-color: #ECDFE8;"|
|}
|}


Line 76: Line 53:


{| {{prettytable}}
{| {{prettytable}}
|  style="background-color: #CCDFC8;"|Name for explanation column(s).
|  style="background-color: #CCDFC8;"|Names of explanation columns, also known as indices.
|-----
|-----
|  style="background-color: #ECFFE8;"|Explanation data. These are determined or decided before the the actual observations are done.
|  style="background-color: #ECFFE8;"|Explanation data, also known as locations. You can use these columns as search criteria.
|-----
|-----
|  style="background-color: #FFD8F0;"|Observation index. Common name for all observations
|  style="background-color: #FFD8F0;"|Observation index, typically called "Observation". Common name for all observation columns
|-----
|-----
|  style="background-color: #DFB8D0;"|Name for observation column(s). These are the parameters studied.
|  style="background-color: #DFB8D0;"|Names of observation columns. These are the parameters of interest.
|-----
|-----
|  style="background-color: #ECDFE8;"|Observation data. These are the actual measurements.
|  style="background-color: #ECDFE8;"|Observation data. These are the actual measurements.
|-----
|-----
|}
|}
This is the "Standard data" that is entered as a Data table. The observation index is given separately in Object info and does not yet show up in the table.
{| {{prettytable}}
|  style="background-color: #CCDFC8;"|Year ||  style="background-color: #CCDFC8;"|Sex ||  style="background-color: #CCDFC8;"|Age ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #DFB8D0;"|Weight
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #ECDFE8;"|178 ||  style="background-color: #ECDFE8;"|70
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #ECDFE8;"|174 ||  style="background-color: #ECDFE8;"|79
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #ECDFE8;"|183 ||  style="background-color: #ECDFE8;"|84
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #ECDFE8;"|168 ||  style="background-color: #ECDFE8;"|65
|}
This is Object information. It slightly varies depending the format you use for uploading data.
{| {{prettytable}}
|+ '''Info_table
|-----
| ident || Op_en2693
|-----
| name || Testvariable
|-----
| unit || #
|-----
| # explanation cols || 3
|-----
| observation index || health impact
|-----
| probabilistic? || No
|}
This is the indexified table where all observations have been put into a single column. The next step is to replace all explanatory data text (columns 1-4) with identifiers (from the Loc table in the Opasnet Base).
{| {{prettytable}}
|  style="background-color: #CCDFC8;"|Year ||  style="background-color: #CCDFC8;"|Sex ||  style="background-color: #CCDFC8;"|Age ||  style="background-color: #FFD8F0;"|Personal measurements || result
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #ECDFE8;"|178
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #ECDFE8;"|174
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #ECDFE8;"|183
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #ECDFE8;"|168
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #ECDFE8;"|70
|-----
|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #ECDFE8;"|79
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #ECDFE8;"|84
|-----
|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #ECDFE8;"|65
|}
(The tables above have been created with [[:File:Opasnet base explanation.ods]].)


===Table structure in the database===
===Table structure in the database===


====All tables====
==== All tables ====


'''NOTE! Update this table when table structure is changed!'''
{| VALIGN="top" BORDER="0"
{| VALIGN="top" BORDER="0"
|-
|-
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''act'''
|colspan=5|'''act'''
|-
|-
|Field||Type||Null||Extra
|colspan=5|'''Uploads, updates, and other actions'''
|-
|-
|id||int(10) unsigned|| ||auto_increment
|Field||Type||Null||Extra||Key
|-
|-
|series_id||int(10) unsigned||
|id||int(10) unsigned||NO||auto_increment||PRI
|-
|-
|acttype_id||tinyint(3) unsigned||
|acttype_id||tinyint(3) unsigned||NO||||MUL
|-
|-
|who||varchar(50)||
|who||varchar(50)||NO||||
|-
|-
|comments||varchar(250)||Yes||
|comments||varchar(250)||YES||||
|-
|-
|time||timestamp||
|time||timestamp||NO||||
|-
|-
|temp_id||int(10) unsigned||
|temp_id||int(10) unsigned||NO||||MUL
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''actobj'''
|colspan=5|'''actloc'''
|-
|Field||Type||Null||Extra
|-
|id||int(10) unsigned|| ||auto_increment
|-
|-
|act_id||int(10) unsigned||
|colspan=5|'''Locations of an act'''
|-
|-
|obj_id||int(10) unsigned||
|Field||Type||Null||Extra||Key
|-
|-
|series_id||int(10) unsigned||
|actobj_id||int(10) unsigned||NO||||PRI
|-
|-
|unit||varchar(64)||
|loc_id||int(10) unsigned||NO||||PRI
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''acttype'''
|colspan=5|'''actobj'''
|-
|-
|Field||Type||Null||Extra
|colspan=5|'''Acts of an object'''
|-
|-
|id||int(10) unsigned|| ||auto_increment
|Field||Type||Null||Extra||Key
|-
|-
|acttype||varchar(250)||
|id||int(10) unsigned||NO||auto_increment||PRI
|}
|-
|-
|
|act_id||int(10) unsigned||NO||||MUL
{| {{prettytable}}
|colspan=4|'''cell'''
|-
|-
|Field||Type||Null||Extra
|obj_id||int(10) unsigned||NO||||MUL
|-
|-
|id||int(12) unsigned|| ||auto_increment
|series_id||int(10) unsigned||NO||||MUL
|-
|-
|obj_id_v||int(10) unsigned||
|unit||varchar(64)||YES||||
|}
 
|-
|-
|obj_id_r||int(10) unsigned||
|
|-
|actobj_id||int(10) unsigned||
|-
|mean||float||Yes||
|-
|sd||float||
|-
|n||int(10)||
|}
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''formula'''
|colspan=5|'''acttype'''
|-
|Field||Type||Null||Extra
|-
|id||int(10) unsigned|| ||auto_increment
|-
|obj_id_v||int(10) unsigned||
|-
|-
|act_id||int(10) unsigned||
|colspan=5|'''List of action types'''
|-
|-
|actobj_id||int(10) unsigned||
|Field||Type||Null||Extra||Key
|-
|-
|language||smallint(5) unsigned||
|id||int(10) unsigned||NO||auto_increment||PRI
|-
|-
|code||longtext||Yes||
|acttype||varchar(250)||NO||||UNI
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''item'''
|colspan=5|'''cell'''
|-
|Field||Type||Null||Extra
|-
|-
|id||int(10) unsigned|| ||auto_increment
|colspan=5|'''Cells of an object'''
|-
|-
|sett_id||int(10) unsigned||
|Field||Type||Null||Extra||Key
|-
|-
|obj_id||int(10) unsigned||
|id||int(12) unsigned||NO||auto_increment||PRI
|-
|-
|fail||tinyint(1) unsigned||
|actobj_id||int(10) unsigned||NO||||MUL
|}
|-
|-
|VALIGN="top"|
|mean||float||YES||||
{| {{prettytable}}
|colspan=4|'''language'''
|-
|-
|Field||Type||Null||Extra
|sd||float||NO||||
|-
|-
|id||tinyint(3) unsigned|| ||auto_increment
|n||int(10)||NO||||
|-
|-
|language||varchar(250)||
|sip||varchar(2000)||YES||||
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''loc'''
|colspan=5|'''loc'''
|-
|-
|Field||Type||Null||Extra
|colspan=5|'''Location information'''
|-
|-
|id||int(10) unsigned|| ||auto_increment
|Field||Type||Null||Extra||Key
|-
|-
|std_id||int(10) unsigned||
|id||int(10) unsigned||NO||auto_increment||PRI
|-
|-
|obj_id_i||int(10) unsigned||
|std_id||int(10) unsigned||NO||||MUL
|-
|-
|location||varchar(100)||
|obj_id_i||int(10) unsigned||NO||||MUL
|-
|-
|roww||mediumint(8) unsigned||
|location||varchar(100)||NO||||
|-
|-
|description||varchar(150)||
|roww||mediumint(8) unsigned||NO||||
|}
|VALIGN="top"|
{| {{prettytable}}
|colspan=4|'''loccell'''
|-
|Field||Type||Null||Extra
|-
|id||int(10) unsigned|| ||auto_increment
|-
|cell_id||int(10) unsigned||
|-
|-
|loc_id||int(10) unsigned||
|description||varchar(150)||NO||||
|}
|}
|-
|-
|VALIGN="top"|
|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''obj'''
|colspan=5|'''loccell'''
|-
|-
|Field||Type||Null||Extra
|colspan=5|'''Locations of a cell'''
|-
|-
|id||int(10) unsigned|| ||auto_increment
|Field||Type||Null||Extra||Key
|-
|-
|ident||varchar(20)||
|cell_id||int(10) unsigned||NO||||PRI
|-
|-
|name||varchar(200)||
|loc_id||int(10) unsigned||NO||||PRI
|-
|objtype_id||tinyint(3) unsigned||
|-
|page||int(10) unsigned||
|-
|wiki_id||tinyint(3) unsigned||
|-
|newest||int(10) unsigned||
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''objinfo'''
|colspan=5|'''obj'''
|-
|Field||Type||Null||Extra
|-
|id||int(10) unsigned|| ||auto_increment
|-
|-
|obj_id||int(10) unsigned||
|colspan=5|'''Object information (all objects)'''
|-
|-
|acttype_id||tinyint(3) unsigned||
|Field||Type||Null||Extra||Key
|-
|-
|who||varchar(50)||
|id||int(10) unsigned||NO||auto_increment||PRI
|-
|-
|comments||varchar(250)||
|ident||varchar(20)||NO||||UNI
|-
|-
|time||timestamp||
|name||varchar(200)||NO||||
|}
|VALIGN="top"|
{| {{prettytable}}
|colspan=4|'''objtype'''
|-
|-
|Field||Type||Null||Extra
|objtype_id||tinyint(3) unsigned||NO||||MUL
|-
|-
|id||tinyint(3)||
|page||int(10) unsigned||NO||||
|-
|-
|objtype||varchar(30)||
|wiki_id||tinyint(3) unsigned||NO||||
|}
|}
|-
 
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''res'''
|colspan=5|'''objtype'''
|-
|Field||Type||Null||Extra
|-
|id||bigint(20) unsigned|| ||auto_increment
|-
|-
|cell_id||int(20) unsigned||
|colspan=5|'''Types of objects'''
|-
|-
|obs||int(10) unsigned||
|Field||Type||Null||Extra||Key
|-
|-
|result||float||
|id||tinyint(3)||NO||||PRI
|-
|-
|restext||varchar(250)||Yes||
|objtype||varchar(30)||NO||||
|}
|}
|VALIGN="top"|
 
{| {{prettytable}}
|colspan=4|'''resinfo'''
|-
|-
|Field||Type||Null||Extra
|
|-
|id||bigint(20) unsigned|| ||auto_increment
|-
|restext||varchar(250)||
|-
|who||varchar(50)||
|-
|time||timestamp||
|}
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''resinfosec'''
|colspan=5|'''res'''
|-
|Field||Type||Null||Extra
|-
|-
|id||bigint(20) unsigned|| ||auto_increment
|colspan=5|'''Result distribution (actual values)'''
|-
|-
|restext||varchar(250)||
|Field||Type||Null||Extra||Key
|-
|-
|who||varchar(50)||
|id||bigint(20) unsigned||NO||auto_increment||PRI
|-
|-
|time||timestamp||
|cell_id||int(12) unsigned||NO||||MUL
|}
|-
|-
|VALIGN="top"|
|obs||int(10) unsigned||NO||||
{| {{prettytable}}
|colspan=4|'''ressec'''
|-
|-
|Field||Type||Null||Extra
|result||float||NO||||
|-
|-
|id||bigint(20) unsigned|| ||auto_increment
|restext||varchar(250)||YES||||
|-
|-
|cell_id||int(20) unsigned||
|implausible||binary(1)||YES||||
|-
|obs||int(10) unsigned||
|-
|result||float||
|-
|restext||varchar(250)||Yes||
|}
|}
|VALIGN="top"|
|VALIGN="top"|
{| {{prettytable}}
{| {{prettytable}}
|colspan=4|'''sett'''
|colspan=5|'''wiki'''
|-
|Field||Type||Null||Extra
|-
|id||int(10) unsigned|| ||auto_increment
|-
|-
|obj_id||int(10) unsigned||
|colspan=5|'''Wiki information'''
|-
|-
|settype_id||tinyint(3) unsigned||
|Field||Type||Null||Extra||Key
|}
|VALIGN="top"|
{| {{prettytable}}
|colspan=4|'''settype'''
|-
|-
|Field||Type||Null||Extra
|id||tinyint(3)||NO||||PRI
|-
|-
|id||tinyint(3) unsigned|| ||auto_increment
|url||varchar(255)||NO||||
|-
|-
|settype||varchar(30)||
|wname||varchar(20)||NO||||
|}
|-
|VALIGN="top"|
{| {{prettytable}}
|colspan=4|'''wiki'''
|-
|Field||Type||Null||Extra
|-
|id||tinyint(3)||
|-
|url||varchar(255)||
|-
|wname||varchar(20)||
|}
|}
|}
|}


====Replacing some cells====
====Contents of selected tables====
 
It is possible that there is a large data, where there is a need to update only a few cells while all others remain the same. How should this be done?
# Use the current replace functionality. Replace all cells but most of them with the original value.
# Use a new act_type that is similar to the current append functionality. This should be understood in a way that if there are two (or more) identical cells (based on cell indices and locations), then the newest result is used and all older ones are discarded. (If the old ''append'' is used, then new info is just seen as a new row in the data table, not a replacement of an existing row.
# Add a new field into the cell (?) table with an updated cell_id (in a similar way than act_id and series_id are used in the actobj table). This way, the new cell can automatically inherit all locations of the old cell.
 
====Formula structure====
 
Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the [[Opasnet Base]]. There are different approaches:
* Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.
 
This approach requires new tables, namely Formula and Language.
 
 
: {{comment|11|Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
* DIP
** DIP_node_id
** DIP_parent_node_id
** DIP_corr_coeff
** DIP_parent_index
* DIF
** DIF_node_id
** DIF_formula
** DIF_varnames_in_formula
 
====All tables: Overview====
 
* We need '''Ressec''' (Result secure) and '''Resinfosec''' (Result info secure) tables for secure information. All other tables are openly readable except these two. They have the same structure as Res and Resinfo tables, respectively.
 


{|
|
{| {{prettytable}}
{| {{prettytable}}
|+'''Tables_in_opasnet_base
|+ Table objtype
!Table
! id!! objtype
!Description
|----
|Acttype
|List of action types
|----
|Cell
|Cells of an object
|----
|Formula
|Formulas for computing variable results
|----
|Item
|
|----
|Language
|List of languages understood by the formula
|----
|Loc
|Location information
|----
|Loccell
|Locations of a cell
|----
|Log
|
|----
|Obj
|Object information (all objects)
|----
|----
|Objinfo
|| 1|| Variable
|Additional information about the objects
|----
|----
|Objtype
|| 2|| Study
|Types of objects
|----
|----
|Res
|| 3|| Method
|Result distribution (actual values)
|----
|----
|Resinfo
|| 4|| Assessment
|Additional description of the result
|----
|----
|Resinfosec
|| 5|| Class
|Additional description of the result
|----
|----
|Ressec
|| 6|| Index
|Result distribution (actual values)
|----
|----
|Sett
|| 7|| Nugget
|Memberships of items in sets
|----
|----
|Settype
|| 8|| Encyclopedia article
|Types of set-item memeberships
|----
|----
|Wiki
|| 9|| Run
|Wiki information
|----
|----
|}
|}


:{{attack|3 |Res and Resinfo should be merged. Similarly, Ressec and Resinfosec should be merged.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}{{reslink|Res and Resinfo -tables should be merged}}
|
 
:{{attack|4 |We must divide Obj into Obj and Act. Act table (previously Objinfo) would contain uploads (previously Runs) and formula updates.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}
::{{defend|12|Objinfo should be renamed Act.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
::{{defend|13|Uploads (previously "runs") should no longer be called objects. They belong to table Act. As a side effect, one upload refers to exactly one object (i.e., when uploading a full model, all object will have different upload numbers).|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
 
====Main tables====


{| {{prettytable}}
{| {{prettytable}}
|+'''Tables with primary content
|+ Table acttype
!Table
! id|| acttype
!Field
!Type
!Null
!Key
!Default
!Extra
|----
|----
!Cell
|| 1|| Start object
|id
|int(12) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|----
|Cell
|| 2|| Finish assessment
|Obj_id_v
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|----
|Cell
|| 3|| Update formula
|Obj_id_r
|int(10) unsigned
|NO
|
|NULL
|
|----
|----
|Cell
|| 4|| Upload data (replace)
|Mean
|float
|YES
|
|NULL
|
|----
|----
|Cell
|| 5|| Upload data (append)
|SD
|float
|NO
|
|NULL
|
|----
|----
|Cell
|| 6|| Review scope
|N
|int(10)
|NO
|
|NULL
|
|----
|----
!Formula
|| 7|| Review definition
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|----
|Formula
|| 8|| Add object info
|Obj_id_v
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|----
|Formula
|}
|Act_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Formula
|Language
|smallint(5) unsigned
|NO
|
|NULL
|
|----
|Formula
|Code
|longtext
|YES
|
|NULL
|
|----
!Loc
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Loc
|Std_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Loc
|Obj_id_i
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Loc
|Location
|varchar(100)
|NO
|
|NULL
|
|----
|Loc
|Roww
|mediumint(8) unsigned
|NO
|
|NULL
|
|----
|Loc
|Description
|varchar(150)
|NO
|
|NULL
|
|----
!Loccell
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Loccell
|Cell_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Loccell
|Loc_id
|int(10) unsigned
|NO
|
|NULL
|
|----
!Obj
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Obj
|Ident
|varchar(20)
|NO
|UNI
|NULL
|
|----
|Obj
|Name
|varchar(200)
|NO
|
|NULL
|
|----
|Obj
|Unit
|varchar(64)
|NO
|
|NULL
|
|----
|Obj
|Objtype_id
|tinyint(3) unsigned
|NO
|MUL
|NULL
|
|----
|Obj
|Page
|int(10) unsigned
|NO
|
|NULL
|
|----
|Obj
|Wiki_id
|tinyint(3) unsigned
|NO
|
|NULL
|
|----
|Obj
|Newest
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
!Objinfo
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Objinfo
|Obj_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Objinfo
|Acttype_id
|tinyint(3) unsigned
|NO
|
|NULL
|
|----
|Objinfo
|Who
|varchar(50)
|NO
|
|NULL
|
|----
|Objinfo
|Comments
|varchar(250)
|NO
|
|NULL
|
|----
|Objinfo
|Moment
|timestamp
|NO
|
|CURRENT_TIMESTAMP
|
|----
!Res
|id
|bigint(20) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Res
|Cell_id
|int(12) unsigned
|NO
|MUL
|NULL
|
|----
|Res
|Obs
|int(10) unsigned
|NO
|
|NULL
|
|----
|Res
|Result
|float
|NO
|
|NULL
|
|----
!Resinfo
|id
|bigint(20) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Resinfo
|Restext
|varchar(250)
|NO
|
|NULL
|
|----
|Resinfo
|Who
|varchar(50)
|NO
|
|NULL
|
|----
|Resinfo
|When
|timestamp
|NO
|
|CURRENT_TIMESTAMP
|
|----
!Resinfosec
|colspan="6"|Resinfosec must have the identical structure to Resinfo.
|----
!Ressec
|colspan="6"|Ressec must have the identical structure to Res.
|}
|}


:{{attack|5 |We should add Res.Formula_id.|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
==Rationale==


:{{attack|9 |Cell.Obj_id_r should be renamed Cell.Act_id.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
===Data===


:{{comment|14 |Should we move Obj.Unit to Cell? This may be useful, because different columns in an object (especially a study) may have different units. On the other hand, it will make things more complicated. This does not work unless we develop a good interface for entering units to different columns.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
====Software====


:{{attack|15 |Do we really need Obj.Newest? It is just redundant, although it may save computing time at some point.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a [[:en:SQL|SQL]] database. Because of its flexibility, ease of use, and cost, [[:en:MySQL|MySQL]] is an optimal choice among SQL software. In addition to the database software, a [[variable transfer protocol]] is needed on top of that so that the results of variables can be retrieved and  new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.
::{{defend| |Yes we need it.|--[[User:Juha Villman|Juha Villman]] 11:12, 4 January 2010 (UTC)}}


:{{attack|16 |I think that Resinfo.Who and Resinfo.When should rather be merged with Act. Then, when a user uploads new data (even a single line with a wiki form), the upload would always have a new act_id with a timestamp.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
====Storage and retrieval of results of variables====


:{{attack|17 |Resinfo.Restext should rather be a text field, not varchar with a fixed length. Is there a difference?|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the [[variable]] page, and it is only briefly summarised here. The result is described as
::{{defend|18 |Varchar is almost always faster. We should stick to it|--[[User:Juha Villman|Juha Villman]] 11:12, 4 January 2010 (UTC)}}


:{{attack|18|The Log table must be read protected. We must give the users an opportunity to use the data without other users being able to know about that. In contrast, uploading data is something that should be open information.|--[[User:Jouni|Jouni]] 06:27, 5 January 2010 (UTC)}}
  P(R|x<sub>1</sub>,x<sub>2</sub>,...)  


:{{attack|19 |Loc.Roww should be removed, because it is based on old thinking that did not work out.|--[[User:Jouni|Jouni]] 21:00, 5 January 2010 (UTC)}}
where P(R) is the probability distribution of the result and x<sub>1</sub> and x<sub>2</sub> are defining [[location]]s of an [[index]] where a particular P(R) applies. Typically locations are operationalised as discrete [[Index|indices]]. A variable must have at least one [[index]]. [[Uncertainty]] about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index ''Sample'', which is a list of integers 1,2,3...n, where n=number of samples.


:{{attack|20 |A problem: The data of one object can be uploaded at several times, e.g. if the object is the result of an online survey. Therefore, the act.id is not an unique identifier for a full data. This can be clarified by adding two different act types: upload data (replace) and upload data (append). The previous starts a new dataset that replaces the previous one, while the latter simply adds new data points to the existing data. Two problems still remain:
:# The nomenclature should be clarified: what is upload, act, and what is the collection of data?
:# There should be a unambiguous (and hopefully simple) way of performing queries so that all appends of the same data are included. |--[[User:Jouni|Jouni]] 10:34, 6 January 2010 (UTC)}}
::{{comment|21 |Solution: A new field act.bunch_id is the id of the latest act.id of upload data (replace) for that object. This can be updated automatically. All uploads are used where obj.newest eq act.bunch_id and act.id eq cell.act_id.|--[[User:Jouni|Jouni]] 10:34, 6 January 2010 (UTC)}}


We need a query for updating act.series_id based on information from act, actobj and obj tables. This is the first attempt, but does not work yet:
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7856 Old description of the structure]


SELECT *
===Dependencies===
FROM actobj AS appobj
 
LEFT JOIN act AS appends ON appobj.act_id = appends.id
* [[Opasnet structure]]
LEFT JOIN actobj AS repobj ON repobj.obj_id = appobj.obj_id
* [[Open assessment]]
LEFT JOIN act AS replaces ON repobj.act_id = replaces.id
WHERE appends.acttype_id = 5
AND replaces.acttype_id = 4;


====Tables with additional information====
====Replacing some cells====


{| {{prettytable}}
It is possible that there is a large data, where there is a need to update only a few cells while all others remain the same. How should this be done? There are a few potential alternatives.
|+'''Tables with additional information
# Use the current replace functionality. Replace all cells but most of them with the original value.
!Table
# Use a new act_type that is similar to the current append functionality. This should be understood in a way that if there are two (or more) identical cells (based on cell indices and locations), then the newest result is used and all older ones are discarded. (If the old ''append'' is used, then new info is just seen as a new row in the data table, not a replacement of an existing row.
!Field
# Add a new field into the cell (?) table with an updated cell_id (in a similar way than act_id and series_id are used in the actobj table). This way, the new cell can automatically inherit all locations of the old cell.
!Type
!Null
!Key
!Default
!Extra
|----
!Item
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Item
|Sett_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Item
|Obj_id
|int(10) unsigned
|NO
|
|NULL
|
|----
|Item
|Fail
|tinyint(1) unsigned
|NO
|MUL
|NULL
|
|----
!Log
|id
|int(10)
|NO
|PRI
|NULL
|auto_increment
|----
|Log
|ip
|varchar(15)
|NO
|
|NULL
|
|----
|Log
|wiki_uid
|varchar(30)
|YES
|
|NULL
|
|----
|Log
|wiki_page
|varchar(30)
|YES
|
|NULL
|
|----
|Log
|time
|timestamp
|NO
|
|0000-00-00 00:00:00
|
|----
|Log
|query
|varchar(256)
|NO
|
|NULL
|
|----
!Sett
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Sett
|Obj_id
|int(10) unsigned
|NO
|MUL
|NULL
|
|----
|Sett
|Settype_id
|tinyint(3) unsigned
|NO
|
|NULL
|
|----
|}


===Formula structure===


====Tables with lists====
Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the [[Opasnet Base]]. There are different approaches:
* Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.


{| {{prettytable}}
This approach requires new tables, namely Formula and Language.
|+'''Tables with lists
!Table
!Field
!Type
!Null
!Key
!Default
!Extra
|----
!Acttype
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Acttype
|Acttype
|varchar(250)
|NO
|UNI
|NULL
|
|----
!Language
|id
|tinyint(3) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Language
|Language
|varchar(250)
|NO
|UNI
|NULL
|
|----
!Objtype
|id
|tinyint(3)
|NO
|PRI
|NULL
|
|----
|Objtype
|Objtype
|varchar(30)
|NO
|
|NULL
|
|----
!Settype
|id
|tinyint(3) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Settype
|Settype
|varchar(30)
|NO
|
|NULL
|
|----
!Wiki
|id
|tinyint(3)
|NO
|PRI
|NULL
|
|----
|Wiki
|Url
|varchar(255)
|NO
|
|NULL
|
|----
|Wiki
|Wname
|varchar(20)
|NO
|
|NULL
|
|----
|}


:{{attack|10 |Objtype.id and Wiki.id should be auto increments.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}


Show contents of
: {{comment|11|Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
{{#sql-query:
* DIP
SELECT * FROM objtype
** DIP_node_id
|objtype}} |  
** DIP_parent_node_id
{{#sql-query:
** DIP_corr_coeff
SELECT * FROM acttype
** DIP_parent_index
|acttype}} |
* DIF
{{#sql-query:
** DIF_node_id
SELECT * FROM wiki
** DIF_formula
|wiki}} |
** DIF_varnames_in_formula
{{#sql-query:
SELECT * FROM language
|language}} |
{{#sql-query:
SELECT * FROM settype
|settype}}


===Universal Opasnet Base===
===Universal Opasnet Base===


The idea of universal Opasnet Base says that it should be possible to store results in such a way that the results themselves are public but their interpretation is limited. For example, patient symptoms and clinical test results should be openly available for research, but information about whose results they are should private. This can be achieved with the following database structure.
The idea of universal Opasnet Base says that it should be possible to store results in such a way that the results themselves are public but their interpretation is limited. For example, patient symptoms and clinical test results should be openly available for research, but information about whose results they are should be private. This can be achieved with the following database structure.


[[File:Universal Opasnet Base structure.png|thumb|500px|Universal Opasnet Base has some parts that exist in different versions depending on the privacy level. The yellow areas are e.g. a public area and a private area. The parts that are white are public.]]
[[File:Universal Opasnet Base structure.png|thumb|400px|Universal Opasnet Base has some parts that exist in different versions depending on the privacy level. The yellow areas are e.g. a public area and a private area. The parts that are white are public.]]


Let's say that it is enough to have two security levels, public and private. A person wants to record personal health information into the database. She logs in with her personal user name. The private profile gives the name (say, Liisa) and social security number of the person, while the public profile says only "40-50-year-old woman in Finland". Liisa writes down her symptoms and saves them. This is what is stored in the databases:
Let's say that it is enough to have two security levels, public and private. A person wants to record personal health information into the database. She logs in with her personal user name. The private profile gives the name (say, Liisa) and social security number of the person, while the public profile says only "30-40-year-old woman in Finland". Liisa writes down her symptoms or medical information and saves them. This is what is stored in the databases:


{| {{prettytable}}
{| {{prettytable}}
Line 1,182: Line 385:
|----
|----
| act.who
| act.who
| Liisa, 010160-1024
| Liisa, 010175-1024
| Woman, 40-50 a
| Woman, 30-40 a
|----
|----
| act.when
| act.when
Line 1,190: Line 393:
|----
|----
| obj.name
| obj.name
| Personal reporting of health symptoms
| N/A. Data is taken from public side.
| Personal reporting of health symptoms
| Pregnancy test
|----
|----
| loccell.loc_id (locations and indices explained)
| loccell.loc_id (locations and indices explained)
| Person = 010160-1024 <br>Time = 2011-03-09 <br>Severity = Moderate <br>ICD-10 = Headache
| Person = 010175-1024 <br>Time = 2011-03-09 <br>Test = Clearblue digital test
| Age = 40-50 <br>Sex = Female <br>Country = Finland <br>Time = 2011-03 <br>Severity = Moderate <br>ICD-10 = Headache
| Age = 30-40 <br>Sex = Female <br>Country = Finland <br>Time = 2011-03 <br>Test = Clearblue digital test
|----
|----
| res.restext
| res.restext
| Nothing. res table does not exist in the private part.
| N/A. Data is taken from public side.
| I had headache all morning, but it went away after I took ibuprofen.
| Pregnant 1-2 weeks.
|----
|----
|}
|}


Based on the information, anyone can see that there is a woman in Finland who has used a Clearblue pregnancy test and the result was positive. But there is no way an outsider could connect this information to any particular person, because all information that could be used for linking is located in the private website. However, an authorised person from health case could see the data in the private database and connect Liisa and the test result.


===Tasks performed===
==See also==
 
:{{attack_invalid|6 |All timestamps (Resinfo.When, Objinfo.Moment) should be renamed to .Time.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
::{{comment||DONE|--[[User:Juha Villman|Juha Villman]] 11:23, 5 January 2010 (UTC)}}


:{{attack_invalid|7 |All column names should start with either a Capital or small letter.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#All_tables:_Overview A previous discussion about the structure]
::{{comment| |DONE. All table and column names start with small letter.|--[[User:Juha Villman|Juha Villman]] 08:58, 5 January 2010 (UTC)}}
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#Main_tables A previous structure and related discussions]
 
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#Tasks_performed Previous tasks performed]
I restructured the Base by adding a few tables (act, actobj, language)  and columns (act.temp_id, cell.actobj_id, formula.actobj_id, res.restext, ressec.restext).  
 
The following queries were run to make new connections possible with the new strucure (obj-actobj-cell-act instead of the old poorer way obj-cell-obj as run-objinfo).
 
; This query was used to fetch all the object - upload (run) pairs:
 
SELECT cell.obj_id_v as obj_id, act.id as act_id, cell.obj_id_r as run_id, objinfo.id, act.temp_id
FROM objinfo, cell, act
WHERE cell.obj_id_r = objinfo.obj_id
AND objinfo.id = act.temp_id
GROUP BY cell.obj_id_r, cell.obj_id_v;
 
 
; This query was used to fill the actobj table:
 
INSERT IGNORE INTO actobj (act_id, obj_id)
SELECT act.id as act_id, cell.obj_id_v as obj_id
FROM objinfo, cell, act
WHERE cell.obj_id_r = objinfo.obj_id
AND objinfo.id = act.temp_id
GROUP BY cell.obj_id_r, cell.obj_id_v ;
 
 
;This query was used to update the actobj column in the cell table:
 
UPDATE objinfo, cell, act, actobj
SET cell.actobj_id = actobj.id
WHERE cell.obj_id_r = objinfo.obj_id
AND objinfo.id = act.temp_id
AND act.id = actobj.act_id
AND cell.obj_id_v = actobj.obj_id;
 
As a result, the Base seemed to work exactly like it should. Note! Existing columns were NOT changed, so that all previous queries should work just like before. However, with the new structure, it is now possible to start updating the queries.
 
==See also==


; A basic query for retrieving the full result of a variable upload (an example):
; A basic query for retrieving the full result of a variable upload (an example):

Latest revision as of 12:01, 10 January 2014



This page is about the old structure of Opasnet Base used in ca. 2008-2011. For a description about the current database, see Opasnet base 2.

Question

Structure and connections (lines) of the tables (boxes) in the Opasnet Base. All table identifiers are called id (so they can be called by like obj.id). When obj.id is referred to in another table such as actobj, it is called actobj.obj_id. The latter end of a one-to-many relationship is marked with a ring. Important substantive fields are listed inside table boxes.

Opasnet Base is a storage and retrieval system for results of variable and data from studies. What is the structure of Opasnet Base such that it enables the following functionalities?

  1. Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.R↻
  2. Automatic retrieval of results when called from Opasnet wiki or other platforms or modelling systems.
  3. Description and handling of the indicess that a variable may take.
  4. It is possible to protect some results and data from reading by unauthorised persons.
  5. If is possible to build user interfaces for easily entering observations into the Base.

Answer

Opasnet base is a MySQL database located at http://base.opasnet.org.

Data structure

Main article: Data structures in Opasnet

All data should be convertible into the following format:

Observation
Year Sex Age Height Weight Description
2009 Male 20 178 70 An optional column for descriptive text about each row.
2009 Male 30 174 79
2010 Male 25 183 84
2010 Female 22 168 65

where

Names of explanation columns, also known as indices.
Explanation data, also known as locations. You can use these columns as search criteria.
Observation index, typically called "Observation". Common name for all observation columns
Names of observation columns. These are the parameters of interest.
Observation data. These are the actual measurements.

Table structure in the database

All tables

act
Uploads, updates, and other actions
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
acttype_id tinyint(3) unsigned NO MUL
who varchar(50) NO
comments varchar(250) YES
time timestamp NO
temp_id int(10) unsigned NO MUL
actloc
Locations of an act
Field Type Null Extra Key
actobj_id int(10) unsigned NO PRI
loc_id int(10) unsigned NO PRI
actobj
Acts of an object
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
act_id int(10) unsigned NO MUL
obj_id int(10) unsigned NO MUL
series_id int(10) unsigned NO MUL
unit varchar(64) YES
acttype
List of action types
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
acttype varchar(250) NO UNI
cell
Cells of an object
Field Type Null Extra Key
id int(12) unsigned NO auto_increment PRI
actobj_id int(10) unsigned NO MUL
mean float YES
sd float NO
n int(10) NO
sip varchar(2000) YES
loc
Location information
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
std_id int(10) unsigned NO MUL
obj_id_i int(10) unsigned NO MUL
location varchar(100) NO
roww mediumint(8) unsigned NO
description varchar(150) NO
loccell
Locations of a cell
Field Type Null Extra Key
cell_id int(10) unsigned NO PRI
loc_id int(10) unsigned NO PRI
obj
Object information (all objects)
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
ident varchar(20) NO UNI
name varchar(200) NO
objtype_id tinyint(3) unsigned NO MUL
page int(10) unsigned NO
wiki_id tinyint(3) unsigned NO
objtype
Types of objects
Field Type Null Extra Key
id tinyint(3) NO PRI
objtype varchar(30) NO
res
Result distribution (actual values)
Field Type Null Extra Key
id bigint(20) unsigned NO auto_increment PRI
cell_id int(12) unsigned NO MUL
obs int(10) unsigned NO
result float NO
restext varchar(250) YES
implausible binary(1) YES
wiki
Wiki information
Field Type Null Extra Key
id tinyint(3) NO PRI
url varchar(255) NO
wname varchar(20) NO

Contents of selected tables

Table objtype
id objtype
1 Variable
2 Study
3 Method
4 Assessment
5 Class
6 Index
7 Nugget
8 Encyclopedia article
9 Run
Table acttype
id acttype
1 Start object
2 Finish assessment
3 Update formula
4 Upload data (replace)
5 Upload data (append)
6 Review scope
7 Review definition
8 Add object info

Rationale

Data

Software

Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a SQL database. Because of its flexibility, ease of use, and cost, MySQL is an optimal choice among SQL software. In addition to the database software, a variable transfer protocol is needed on top of that so that the results of variables can be retrieved and new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.

Storage and retrieval of results of variables

The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the variable page, and it is only briefly summarised here. The result is described as

  P(R|x1,x2,...) 

where P(R) is the probability distribution of the result and x1 and x2 are defining locations of an index where a particular P(R) applies. Typically locations are operationalised as discrete indices. A variable must have at least one index. Uncertainty about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index Sample, which is a list of integers 1,2,3...n, where n=number of samples.


Dependencies

Replacing some cells

It is possible that there is a large data, where there is a need to update only a few cells while all others remain the same. How should this be done? There are a few potential alternatives.

  1. Use the current replace functionality. Replace all cells but most of them with the original value.
  2. Use a new act_type that is similar to the current append functionality. This should be understood in a way that if there are two (or more) identical cells (based on cell indices and locations), then the newest result is used and all older ones are discarded. (If the old append is used, then new info is just seen as a new row in the data table, not a replacement of an existing row.
  3. Add a new field into the cell (?) table with an updated cell_id (in a similar way than act_id and series_id are used in the actobj table). This way, the new cell can automatically inherit all locations of the old cell.

Formula structure

Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the Opasnet Base. There are different approaches:

  • Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
  • Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.

This approach requires new tables, namely Formula and Language.


----11: . Do we need tables DIF and DIP like Uninet? --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: comment)
  • DIP
    • DIP_node_id
    • DIP_parent_node_id
    • DIP_corr_coeff
    • DIP_parent_index
  • DIF
    • DIF_node_id
    • DIF_formula
    • DIF_varnames_in_formula

Universal Opasnet Base

The idea of universal Opasnet Base says that it should be possible to store results in such a way that the results themselves are public but their interpretation is limited. For example, patient symptoms and clinical test results should be openly available for research, but information about whose results they are should be private. This can be achieved with the following database structure.

Universal Opasnet Base has some parts that exist in different versions depending on the privacy level. The yellow areas are e.g. a public area and a private area. The parts that are white are public.

Let's say that it is enough to have two security levels, public and private. A person wants to record personal health information into the database. She logs in with her personal user name. The private profile gives the name (say, Liisa) and social security number of the person, while the public profile says only "30-40-year-old woman in Finland". Liisa writes down her symptoms or medical information and saves them. This is what is stored in the databases:

Information stored in the public and private databases. The private database can read tables from the public one but not vice versa.
Table, field Private database Public database
act.who Liisa, 010175-1024 Woman, 30-40 a
act.when 2011-03-09 22:09:10 2011-03
obj.name N/A. Data is taken from public side. Pregnancy test
loccell.loc_id (locations and indices explained) Person = 010175-1024
Time = 2011-03-09
Test = Clearblue digital test
Age = 30-40
Sex = Female
Country = Finland
Time = 2011-03
Test = Clearblue digital test
res.restext N/A. Data is taken from public side. Pregnant 1-2 weeks.

Based on the information, anyone can see that there is a woman in Finland who has used a Clearblue pregnancy test and the result was positive. But there is no way an outsider could connect this information to any particular person, because all information that could be used for linking is located in the private website. However, an authorised person from health case could see the data in the private database and connect Liisa and the test result.

See also

A basic query for retrieving the full result of a variable upload (an example)

{{#sql-query: SELECT obj.ident, obj.name, obj.unit, obj.page, obj.wiki_id, comments, mean, sd, n, location, ind.ident, obs, result, restext FROM obj LEFT JOIN actobj ON actobj.obj_id = obj.id LEFT JOIN act ON actobj.act_id = act.id LEFT JOIN cell ON cell.actobj_id = actobj.id LEFT JOIN loccell ON loccell.cell_id = cell.id LEFT JOIN loc on loccell.loc_id = loc.id LEFT JOIN obj AS ind ON loc.obj_id_i = ind.id LEFT JOIN res ON res.cell_id = cell.id WHERE obj.ident = "Op_en1912" AND actobj.series_id = 190 LIMIT 0,100 }}

Some useful syntax