Opasnet base structure: Difference between revisions
Juha Villman (talk | contribs) (→Columns in tables: some commments added) |
Juha Villman (talk | contribs) |
||
Line 519: | Line 519: | ||
:{{attack|7 |All column names should start with either a Capital or small letter.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}} | :{{attack|7 |All column names should start with either a Capital or small letter.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}} | ||
:{{ | :{{attack_invalid|8 |Objinfo.Begin and Objinfo.End should be removed.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}} | ||
::{{comment| |DONE|--[[User:Juha Villman|Juha Villman]] 12:54, 4 January 2010 (UTC)}} | |||
:{{attack|9 |Cell.Obj_id_r should be renamed Cell.Act_id.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}} | :{{attack|9 |Cell.Obj_id_r should be renamed Cell.Act_id.|--[[User:Jouni|Jouni]] 12:55, 31 December 2009 (UTC)}} |
Revision as of 12:54, 4 January 2010
Moderator:Jouni (see all) |
|
Upload data
|
This page is about the structure of Opasnet Base. For a general description, see Opasnet base.
Scope
Opasnet base is a storage and retrieval system for results of variable and data from studies. What is the structure of Opasnet base such that it enables the following functionalities?
- Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.R↻
- Automatic retrieval of results when called from Opasnet wiki or other platforms or modelling systems.
- Description and handling of the dimensions that a variable may take.
- It is possible to protect some results and data from reading by unauthorised persons.
- If is possible to build user interfaces for easily entering observations into the Base.
Definition
Data
Software
Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a SQL database. Because of its flexibility, ease of use, and cost, MySQL is an optimal choice among SQL software. In addition to the database software, a variable transfer protocol is needed on top of that so that the results of variables can be retrieved and new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.
Storage and retrieval of results of variables
The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the variable page, and it is only briefly summarised here. The result is described as
P(R|x1,x2,...)
where P(R) is the probability distribution of the result and x1 and x2 are defining locations of a dimension where a particular P(R) applies. Typically locations are operationalised as discrete indices. A variable must have at least one dimension. Uncertainty about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index Sample, which is a list of integers 1,2,3...n, where n=number of samples.
Dependencies
Result
Opasnet base is a MySQL database located at http://base.opasnet.org.
Table structure
Formula structure
Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the Opasnet Base. There are different approaches:
- Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
- Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.
This approach requires new tables, namely Formula and Language.
- ----11: . Do we need tables DIF and DIP like Uninet? --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: comment)
- DIP
- DIP_node_id
- DIP_parent_node_id
- DIP_corr_coeff
- DIP_parent_index
- DIF
- DIF_node_id
- DIF_formula
- DIF_varnames_in_formula
All tables: Overview
- We need Ressec (Result secure) and Resinfosec (Result info secure) tables for secure information. All other tables are openly readable except these two. They have the same structure as Res and Resinfo tables, respectively.
Table | Description |
---|---|
Acttype | List of action types |
Cell | Cells of an object |
Formula | Formulas for computing variable results |
Item | |
Language | List of languages understood by the formula |
Loc | Location information |
Loccell | Locations of a cell |
Log | |
Obj | Object information (all objects) |
Objinfo | Additional information about the objects |
Objtype | Types of objects |
Res | Result distribution (actual values) |
Resinfo | Additional description of the result |
Resinfosec | Additional description of the result |
Ressec | Result distribution (actual values) |
Sett | Memberships of items in sets |
Settype | Types of set-item memeberships |
Wiki | Wiki information |
- ⇤--3: . Res and Resinfo should be merged. Similarly, Ressec and Resinfosec should be merged. --Jouni 09:48, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)R↻
- ⇤--4: . We must divide Obj into Obj and Act. Act table (previously Objinfo) would contain uploads (previously Runs) and formula updates. --Jouni 09:48, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ←--12: . Objinfo should be renamed Act. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: defence)
- ←--13: . Uploads (previously "runs") should no longer be called objects. They belong to table Act. As a side effect, one upload refers to exactly one object (i.e., when uploading a full model, all object will have different upload numbers). --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: defence)
Columns in tables
Table | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
Cell | id | int(12) unsigned | NO | PRI | NULL | auto_increment |
Cell | Obj_id_v | int(10) unsigned | NO | MUL | NULL | |
Cell | Obj_id_r | int(10) unsigned | NO | NULL | ||
Cell | Mean | float | YES | NULL | ||
Cell | SD | float | NO | NULL | ||
Cell | N | int(10) | NO | NULL | ||
Formula | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Formula | Obj_id_v | int(10) unsigned | NO | MUL | NULL | |
Formula | Act_id | int(10) unsigned | NO | MUL | NULL | |
Formula | Language | smallint(5) unsigned | NO | NULL | ||
Formula | Code | longtext | YES | NULL | ||
Loc | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Loc | Std_id | int(10) unsigned | NO | MUL | NULL | |
Loc | Obj_id_i | int(10) unsigned | NO | MUL | NULL | |
Loc | Location | varchar(100) | NO | NULL | ||
Loc | Roww | mediumint(8) unsigned | NO | NULL | ||
Loc | Description | varchar(150) | NO | NULL | ||
Loccell | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Loccell | Cell_id | int(10) unsigned | NO | MUL | NULL | |
Loccell | Loc_id | int(10) unsigned | NO | NULL | ||
Obj | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Obj | Ident | varchar(20) | NO | UNI | NULL | |
Obj | Name | varchar(200) | NO | NULL | ||
Obj | Unit | varchar(64) | NO | NULL | ||
Obj | Objtype_id | tinyint(3) unsigned | NO | MUL | NULL | |
Obj | Page | int(10) unsigned | NO | NULL | ||
Obj | Wiki_id | tinyint(3) unsigned | NO | NULL | ||
Obj | Newest | int(10) unsigned | NO | MUL | NULL | |
Objinfo | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Objinfo | Obj_id | int(10) unsigned | NO | MUL | NULL | |
Objinfo | Acttype_id | tinyint(3) unsigned | NO | NULL | ||
Objinfo | Begin | date | NO | MUL | NULL | |
Objinfo | End | date | NO | NULL | ||
Objinfo | Who | varchar(50) | NO | NULL | ||
Objinfo | Comments | varchar(250) | NO | NULL | ||
Objinfo | Moment | timestamp | NO | CURRENT_TIMESTAMP | ||
Res | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
Res | Cell_id | int(12) unsigned | NO | MUL | NULL | |
Res | Obs | int(10) unsigned | NO | NULL | ||
Res | Result | float | NO | NULL | ||
Resinfo | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
Resinfo | Restext | varchar(250) | NO | NULL | ||
Resinfo | Who | varchar(50) | NO | NULL | ||
Resinfo | When | timestamp | NO | CURRENT_TIMESTAMP | ||
Resinfosec | Resinfosec must have the identical structure to Resinfo. | |||||
Ressec | Ressec must have the identical structure to Res. |
- ⇤--1: . Obj.Unit should have at least 32 characters. --Jouni 19:29, 17 September 2009 (EEST) (type: truth; paradigms: science: attack)
- ←--2: . We can increase it to 64 at once. --Juha Villman 07:52, 18 September 2009 (EEST) (type: truth; paradigms: science: defence)
- ----3: . DONE --Juha Villman 10:56, 4 January 2010 (UTC) (type: truth; paradigms: science: comment)
- ⇤--5: . We should add Res.Formula_id. --Jouni 21:50, 30 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ⇤--6: . All timestamps (Resinfo.When, Objinfo.Moment) should be renamed to .Time. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ⇤--7: . All column names should start with either a Capital or small letter. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ⇤--8: . Objinfo.Begin and Objinfo.End should be removed. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ----': . DONE --Juha Villman 12:54, 4 January 2010 (UTC) (type: truth; paradigms: science: comment)
- ⇤--9: . Cell.Obj_id_r should be renamed Cell.Act_id. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ----14: . Should we move Obj.Unit to Cell? This may be useful, because different columns in an object (especially a study) may have different units. On the other hand, it will make things more complicated. This does not work unless we develop a good interface for entering units to different columns. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: comment)
- ⇤--15: . Do we really need Obj.Newest? It is just redundant, although it may save computing time at some point. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ←--': . Yes we need it. --Juha Villman 11:12, 4 January 2010 (UTC) (type: truth; paradigms: science: defence)
- ⇤--16: . I think that Resinfo.Who and Resinfo.When should rather be merged with Act. Then, when a user uploads new data (even a single line with a wiki form), the upload would always have a new act_id with a timestamp. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ⇤--17: . Resinfo.Restext should rather be a text field, not varchar with a fixed length. Is there a difference? --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
- ←--18: . Varchar is almost always faster. We should stick to it --Juha Villman 11:12, 4 January 2010 (UTC) (type: truth; paradigms: science: defence)
Table | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
Item | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Item | Sett_id | int(10) unsigned | NO | MUL | NULL | |
Item | Obj_id | int(10) unsigned | NO | NULL | ||
Item | Fail | tinyint(1) unsigned | NO | MUL | NULL | |
Log | id | int(10) | NO | PRI | NULL | auto_increment |
Log | ip | varchar(15) | NO | NULL | ||
Log | wiki_uid | varchar(30) | YES | NULL | ||
Log | wiki_page | varchar(30) | YES | NULL | ||
Log | time | timestamp | NO | 0000-00-00 00:00:00 | ||
Log | query | varchar(256) | NO | NULL | ||
Sett | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Sett | Obj_id | int(10) unsigned | NO | MUL | NULL | |
Sett | Settype_id | tinyint(3) unsigned | NO | NULL |
Table | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
Acttype | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
Acttype | Acttype | varchar(250) | NO | UNI | NULL | |
Language | id | tinyint(3) unsigned | NO | PRI | NULL | auto_increment |
Language | Language | varchar(250) | NO | UNI | NULL | |
Objtype | id | tinyint(3) | NO | PRI | NULL | |
Objtype | Objtype | varchar(30) | NO | NULL | ||
Settype | id | tinyint(3) unsigned | NO | PRI | NULL | auto_increment |
Settype | Settype | varchar(30) | NO | NULL | ||
Wiki | id | tinyint(3) | NO | PRI | NULL | |
Wiki | Url | varchar(255) | NO | NULL | ||
Wiki | Wname | varchar(20) | NO | NULL |
- ⇤--10: . Objtype.id and Wiki.id should be auto increments. --Jouni 12:55, 31 December 2009 (UTC) (type: truth; paradigms: science: attack)
See also
- Some useful syntax
- http://www.baycongroup.com/sql_join.htm
- Opasnet base connection.ANA for Analytica: for writing and reading variable results into and from the database. Writing requires a password. For SQL used in the model, see the model page.
- Some historical queries
- Some historical queries 2