Opasnet base structure: Difference between revisions

From Opasnet
Jump to navigation Jump to search
(→‎All tables: Overview: structure info updated from the Base; structural arguments added)
(updated and commented; Opasnet Base structure also updated to reflect this description.)
Line 52: Line 52:
* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.
* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.


This approach requires new tables, namely Formula and Language.


This approach requires new tables, namely:
 
* Formula
: {{comment|11|Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
** id (automatic incremental integer)
** Obj_id_v
** Obj_id_r
** When (what is the relationship between upload and formula/When? Is there always a new formula for a new upload? No, because the upload may change even if formula doesn't, if the parent change. Is there always a new upload for a new formula? Yes, because it is necessary to make a new upload.
** Language (of the formula code)
** Code (a large text or memo field for the formula)
: {{comment|# |Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
* DIP
* DIP
** DIP_node_id
** DIP_node_id
Line 71: Line 65:
** DIF_formula
** DIF_formula
** DIF_varnames_in_formula
** DIF_varnames_in_formula
====Objinfo: new structure====
The structure of Objinfo should be changed. The original plan was that there is at most one row of Objinfo per Object. Now it is clear that this does not have all functionalities we need. Instead, there should be a possibility to add any number of actions per object. Therefore, even the name of the table should be changed to Act. The structure should be changed accordingly:
* The field id is the primary field for the table. It is NOT the Obj.id any longer.
* A new field Obj_id should be added. This is the old field id.
* End field should be removed, it is not used.
* Url should be changed to Comment, as it may contain also other info.
* The length of Comment should be 250 characters (at least).
* Begin should be replaced by When, which is the current timestamp of the row addition.
* A new field Act_id should be added.
* A new table Acttype for actions should be added. It would contain only fields id and Act, and the following rows:
*# Start the object
*# Finish the assessment
*# Add a reference
*# Add an URL
*# Peer review the object definition: accept based on the discussion
*# Peer review the object definition: reject based on the discussion
*# Peer review the object definition: accept (personal opinion)
*# Peer review the object definition: reject (personal opinion)
*# Clairvoyant test for the scope: pass
*# Clairvoyant test for the scope: fail
*# Save a run of the object.
====Merging Res and Resinfo -tables====
These tables should be merged. Discussion is here {{disclink|Res and Resinfo -tables should be merged}}.


====All tables: Overview====
====All tables: Overview====
Line 109: Line 75:
!Table
!Table
!Description
!Description
|----
|Acttype
|List of action types
|----
|----
|Cell
|Cell
Line 118: Line 87:
|Item
|Item
|
|
|----
|Language
|List of languages understood by the formula
|----
|----
|Loc
|Loc
Line 160: Line 132:
|}
|}


:{{attack|3 |Res and Resinfo should be merged. Similarly, Ressec and Resinfosec should be merged.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}
:{{attack|3 |Res and Resinfo should be merged. Similarly, Ressec and Resinfosec should be merged.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}{{reslink|Res and Resinfo -tables should be merged}}


:{{attack|4 |We may need to divide Obj into Obj and Task. Task table would contain uploads (previously Runs) and formula updates.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}
:{{attack|4 |We must divide Obj into Obj and Act. Act table (previously Objinfo) would contain uploads (previously Runs) and formula updates.|--[[User:Jouni|Jouni]] 09:48, 31 December 2009 (UTC)}}
::{{defend|12|Objinfo should be renamed Act.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
::{{defend|13|Uploads (previously "runs") should no longer be called objects. They belong to table Act. As a side effect, one upload refers to exactly one object (i.e., when uploading a full model, all object will have different upload numbers).|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}


====Columns in tables====


{| {{prettytable}}
{| {{prettytable}}
Line 240: Line 215:
|----
|----
|Formula
|Formula
|When
|Act_id
|timestamp
|int(10) unsigned
|NO
|NO
|
|MUL
|CURRENT_TIMESTAMP
|NULL
|
|
|----
|----
Line 528: Line 503:
|----
|----
!Resinfosec
!Resinfosec
|id
|colspan="6"|Resinfosec must have the identical structure to Resinfo.
|bigint(20) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Resinfosec
|Restext
|varchar(250)
|NO
|
|NULL
|
|----
|Resinfosec
|Who
|varchar(50)
|NO
|
|NULL
|
|----
|Resinfosec
|When
|timestamp
|NO
|
|CURRENT_TIMESTAMP
|
|----
|Ressec
|id
|bigint(20) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Ressec
|Cell_id
|int(12) unsigned
|NO
|MUL
|NULL
|
|----
|Ressec
|Obs
|int(10) unsigned
|NO
|
|NULL
|
|----
|Ressec
|Result
|float
|NO
|
|NULL
|
|----
|----
!Ressec
|colspan="6"|Ressec must have the identical structure to Res.
|}
|}


Line 598: Line 514:
:{{attack|5 |We should add Res.Formula_id.|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
:{{attack|5 |We should add Res.Formula_id.|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}


:{{attack|6 |All timestamps (Resinfo.When, Objinfo.Moment) should be renamed to .Time.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|7 |All column names should start with either a Capital or small letter.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|8 |Objinfo.Begin and Objinfo.End should be removed.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|9 |Cell.Obj_id_r should be renamed Cell.Act_id.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{comment|14 |Should we move Obj.Unit to Cell? This may be useful, because different columns in an object (especially a study) may have different units. On the other hand, it will make things more complicated. This does not work unless we develop a good interface for entering units to different columns.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|15 |Do we really need Obj.Newest? It is just redundant, although it may save computing time at some point.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|16 |I think that Resinfo.Who and Resinfo.When should rather be merged with Act. Then, when a user uploads new data (even a single line with a wiki form), the upload would always have a new act_id with a timestamp.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}
:{{attack|17 |Resinfo.Restext should rather be a text field, not varchar with a fixed length. Is there a difference?|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}


{| {{prettytable}}
{| {{prettytable}}
Line 727: Line 658:
!Default
!Default
!Extra
!Extra
|----
!Acttype
|id
|int(10) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Acttype
|Acttype
|varchar(250)
|NO
|UNI
|NULL
|
|----
!Language
|id
|tinyint(3) unsigned
|NO
|PRI
|NULL
|auto_increment
|----
|Language
|Language
|varchar(250)
|NO
|UNI
|NULL
|
|----
|----
!Objtype
!Objtype
Line 785: Line 748:
|----
|----
|}
|}
:{{attack|10 |Objtype.id and Wiki.id should be auto increments.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}}


==See also==
==See also==


===Some useful syntax===
;Some useful syntax
 
* http://www.baycongroup.com/sql_join.htm
* http://www.baycongroup.com/sql_join.htm
* [[:image:Opasnet base connection.ANA|Opasnet base connection.ANA]] for Analytica: for writing and reading variable results into and from the database. Writing requires a password. For SQL used in the model, see the model page.
* [[:image:Opasnet base connection.ANA|Opasnet base connection.ANA]] for Analytica: for writing and reading variable results into and from the database. Writing requires a password. For SQL used in the model, see the model page.
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7181#Other_queries Some historical queries]
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7181#Other_queries Some historical queries]
 
* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=14214#Some_useful_syntax Some historical queries 2]
<sql-query display=1>
SELECT Obj.id, Obj.Ident, Obj.Name, Obj.Typ_id, Sty_id, Itemm.Ident as Iident, Itemm.Name as Iname
FROM Obj
LEFT JOIN Sett ON Obj.id = Sett.Obj_id
LEFT JOIN Item ON Sett.id = Item.Sett_id
LEFT JOIN Obj AS Itemm ON Item.Obj_id = Itemm.id
</sql-query>
 
 
'''NOTE! The queries below work in the new database "opasnet_base", not "resultdb" as the old versions.
 
{{#sql-query:
SELECT Var.Ident, Var.Name, Var.Unit, Run.Ident, Begin, Who, Run.Name as Method
FROM Obj as Var, Obj as Run, Cell, Objinfo
WHERE Var.Ident = "Op_en{{PAGEID}}"
AND Var.id = Cell.Obj_id_v
AND Run.id = Cell.Obj_id_r
AND Run.id = Objinfo.id
GROUP BY Var.id, Run.id
|Runs}}
 
{{#sql-query:
SELECT Var.Ident, Var.Name, Cell.id, N, Begin, Mean, Var.Unit
FROM Obj as Var, Obj as Run, Cell, Objinfo
WHERE Var.Ident = "Op_en{{PAGEID}}"
AND Var.id = Cell.Obj_id_v
AND Run.id = Cell.Obj_id_r
AND Run.id = Objinfo.id
GROUP BY Cell.id
ORDER BY Run.id DESC, Var.Ident
|Means and samplesizes (N)}}
{{#sql-query:
SELECT Var.Ident, Cell.id, Cell.Obj_id_r as Run, Obs, Result, Var.Unit
FROM Obj as Var, Cell, Res
WHERE Var.Ident = "Op_en{{PAGEID}}"
AND Var.id = Cell.Obj_id_v
AND Cell.id = Res.Cell_id
ORDER BY Cell.Obj_id_r, Var.Ident, Cell.id
|Full sample}}
 
 
'''List all dimensions that have indices, and the indices concatenated:
 
<sql-query display="1">
SELECT Dim.Ident, Dim.Name, Dim.Unit, Group_concat(Ind.Ident
ORDER BY Ind.Name SEPARATOR ', ') as Indices
FROM Obj AS Dim, Obj as Ind, Sett, Item
WHERE Dim.id = Sett.Obj_id
AND Sett.Settype_id=1
AND Sett.id = Item.Sett_id
AND Item.Obj_id = Ind.id
GROUP BY Dim.Name
ORDER BY Dim.id
</sql-query>
 
 
'''List all indices, and their locations concatenated:
 
<sql-query display="1">
SELECT Ident, Name, Unit, GROUP_CONCAT(Location ORDER BY Roww SEPARATOR ', ') AS Locations
FROM Obj AS Ind, Loc
WHERE Ind.id = Loc.Obj_id_i
GROUP BY Name
ORDER BY Name
</sql-query>
 
 
'''List all variables and their runs, and also list all indices (concatenated) used for each variable for each run.
 
<sql-query display="1">
SELECT Var_id, Run_id, Ident, Name, GROUP_CONCAT(Indic SEPARATOR ', ') AS Indices, N, Method
FROM
  (SELECT Var.id as Var_id, Run.id as Run_id, Var.Ident AS Ident, Var.Name as Name, Ind.Ident AS Indic, N, Run.Name AS Method
  FROM Obj AS Var, Obj AS Run, Obj AS Ind, Loccell, Loc, Cell
  WHERE Var.id = Cell.Obj_id_v
  AND Run.id = Cell.Obj_id_r
  AND Cell.id = Loccell.Cell_id
  AND Loc.id = Loccell.Loc_id
  AND Ind.id = Loc.Obj_id_i
  GROUP BY Var_id, Run_id, Ind.Ident ) AS Temp1
GROUP BY Var_id, Run_id
</sql-query>

Revision as of 12:55, 31 December 2009



This page is about the structure of Opasnet Base. For a general description, see Opasnet base.

Scope

Opasnet base is a storage and retrieval system for results of variable and data from studies. What is the structure of Opasnet base such that it enables the following functionalities?

  1. Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.R↻
  2. Automatic retrieval of results when called from Opasnet wiki or other platforms or modelling systems.
  3. Description and handling of the dimensions that a variable may take.
  4. It is possible to protect some results and data from reading by unauthorised persons.
  5. If is possible to build user interfaces for easily entering observations into the Base.


Definition

Data

Software

Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a SQL database. Because of its flexibility, ease of use, and cost, MySQL is an optimal choice among SQL software. In addition to the database software, a variable transfer protocol is needed on top of that so that the results of variables can be retrieved and new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.

Storage and retrieval of results of variables

The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the variable page, and it is only briefly summarised here. The result is described as

  P(R|x1,x2,...) 

where P(R) is the probability distribution of the result and x1 and x2 are defining locations of a dimension where a particular P(R) applies. Typically locations are operationalised as discrete indices. A variable must have at least one dimension. Uncertainty about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index Sample, which is a list of integers 1,2,3...n, where n=number of samples.


Dependencies

Result

Opasnet base is a MySQL database located at http://base.opasnet.org.

Table structure

Formula structure

Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the Opasnet Base. There are different approaches:

  • Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
  • Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.

This approach requires new tables, namely Formula and Language.


----11: . Do we need tables DIF and DIP like Uninet? --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: comment)
  • DIP
    • DIP_node_id
    • DIP_parent_node_id
    • DIP_corr_coeff
    • DIP_parent_index
  • DIF
    • DIF_node_id
    • DIF_formula
    • DIF_varnames_in_formula

All tables: Overview

  • We need Ressec (Result secure) and Resinfosec (Result info secure) tables for secure information. All other tables are openly readable except these two. They have the same structure as Res and Resinfo tables, respectively.


Tables_in_opasnet_base
Table Description
Acttype List of action types
Cell Cells of an object
Formula Formulas for computing variable results
Item
Language List of languages understood by the formula
Loc Location information
Loccell Locations of a cell
Log
Obj Object information (all objects)
Objinfo Additional information about the objects
Objtype Types of objects
Res Result distribution (actual values)
Resinfo Additional description of the result
Resinfosec Additional description of the result
Ressec Result distribution (actual values)
Sett Memberships of items in sets
Settype Types of set-item memeberships
Wiki Wiki information
⇤--3: . Res and Resinfo should be merged. Similarly, Ressec and Resinfosec should be merged. --Jouni 09:48, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)R↻
⇤--4: . We must divide Obj into Obj and Act. Act table (previously Objinfo) would contain uploads (previously Runs) and formula updates. --Jouni 09:48, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
←--12: . Objinfo should be renamed Act. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: defence)
←--13: . Uploads (previously "runs") should no longer be called objects. They belong to table Act. As a side effect, one upload refers to exactly one object (i.e., when uploading a full model, all object will have different upload numbers). --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: defence)

Columns in tables

Tables with primary content
Table Field Type Null Key Default Extra
Cell id int(12) unsigned NO PRI NULL auto_increment
Cell Obj_id_v int(10) unsigned NO MUL NULL
Cell Obj_id_r int(10) unsigned NO NULL
Cell Mean float YES NULL
Cell SD float NO NULL
Cell N int(10) NO NULL
Formula id int(10) unsigned NO PRI NULL auto_increment
Formula Obj_id_v int(10) unsigned NO MUL NULL
Formula Act_id int(10) unsigned NO MUL NULL
Formula Language smallint(5) unsigned NO NULL
Formula Code longtext YES NULL
Loc id int(10) unsigned NO PRI NULL auto_increment
Loc Std_id int(10) unsigned NO MUL NULL
Loc Obj_id_i int(10) unsigned NO MUL NULL
Loc Location varchar(100) NO NULL
Loc Roww mediumint(8) unsigned NO NULL
Loc Description varchar(150) NO NULL
Loccell id int(10) unsigned NO PRI NULL auto_increment
Loccell Cell_id int(10) unsigned NO MUL NULL
Loccell Loc_id int(10) unsigned NO NULL
Obj id int(10) unsigned NO PRI NULL auto_increment
Obj Ident varchar(20) NO UNI NULL
Obj Name varchar(200) NO NULL
Obj Unit varchar(16) NO NULL
Obj Objtype_id tinyint(3) unsigned NO MUL NULL
Obj Page int(10) unsigned NO NULL
Obj Wiki_id tinyint(3) unsigned NO NULL
Obj Newest int(10) unsigned NO MUL NULL
Objinfo id int(10) unsigned NO PRI NULL auto_increment
Objinfo Obj_id int(10) unsigned NO MUL NULL
Objinfo Acttype_id tinyint(3) unsigned NO NULL
Objinfo Begin date NO MUL NULL
Objinfo End date NO NULL
Objinfo Who varchar(50) NO NULL
Objinfo Comments varchar(250) NO NULL
Objinfo Moment timestamp NO CURRENT_TIMESTAMP
Res id bigint(20) unsigned NO PRI NULL auto_increment
Res Cell_id int(12) unsigned NO MUL NULL
Res Obs int(10) unsigned NO NULL
Res Result float NO NULL
Resinfo id bigint(20) unsigned NO PRI NULL auto_increment
Resinfo Restext varchar(250) NO NULL
Resinfo Who varchar(50) NO NULL
Resinfo When timestamp NO CURRENT_TIMESTAMP
Resinfosec Resinfosec must have the identical structure to Resinfo.
Ressec Ressec must have the identical structure to Res.
⇤--1: . Obj.Unit should have at least 32 characters. --Jouni 19:29, 17 September 2009 (EEST) (type: truth; paradigms: science: attack)
←--2: . We can increase it to 64 at once. --Juha Villman 07:52, 18 September 2009 (EEST) (type: truth; paradigms: science: defence)
⇤--5: . We should add Res.Formula_id. --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--6: . All timestamps (Resinfo.When, Objinfo.Moment) should be renamed to .Time. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--7: . All column names should start with either a Capital or small letter. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--8: . Objinfo.Begin and Objinfo.End should be removed. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--9: . Cell.Obj_id_r should be renamed Cell.Act_id. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
----14: . Should we move Obj.Unit to Cell? This may be useful, because different columns in an object (especially a study) may have different units. On the other hand, it will make things more complicated. This does not work unless we develop a good interface for entering units to different columns. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: comment)
⇤--15: . Do we really need Obj.Newest? It is just redundant, although it may save computing time at some point. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--16: . I think that Resinfo.Who and Resinfo.When should rather be merged with Act. Then, when a user uploads new data (even a single line with a wiki form), the upload would always have a new act_id with a timestamp. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
⇤--17: . Resinfo.Restext should rather be a text field, not varchar with a fixed length. Is there a difference? --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
Tables with additional information
Table Field Type Null Key Default Extra
Item id int(10) unsigned NO PRI NULL auto_increment
Item Sett_id int(10) unsigned NO MUL NULL
Item Obj_id int(10) unsigned NO NULL
Item Fail tinyint(1) unsigned NO MUL NULL
Log id int(10) NO PRI NULL auto_increment
Log ip varchar(15) NO NULL
Log wiki_uid varchar(30) YES NULL
Log wiki_page varchar(30) YES NULL
Log time timestamp NO 0000-00-00 00:00:00
Log query varchar(256) NO NULL
Sett id int(10) unsigned NO PRI NULL auto_increment
Sett Obj_id int(10) unsigned NO MUL NULL
Sett Settype_id tinyint(3) unsigned NO NULL



Tables with lists
Table Field Type Null Key Default Extra
Acttype id int(10) unsigned NO PRI NULL auto_increment
Acttype Acttype varchar(250) NO UNI NULL
Language id tinyint(3) unsigned NO PRI NULL auto_increment
Language Language varchar(250) NO UNI NULL
Objtype id tinyint(3) NO PRI NULL
Objtype Objtype varchar(30) NO NULL
Settype id tinyint(3) unsigned NO PRI NULL auto_increment
Settype Settype varchar(30) NO NULL
Wiki id tinyint(3) NO PRI NULL
Wiki Url varchar(255) NO NULL
Wiki Wname varchar(20) NO NULL
⇤--10: . Objtype.id and Wiki.id should be auto increments. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)

See also

Some useful syntax