Opasnet Base: Difference between revisions
(23 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
[[Category: | [[Category:Opasnet Base]] | ||
{{ | [[Category:Data]] | ||
[[Category:Opasnet]] | |||
{{method|moderator = Jouni | |||
| reference = {{publication | | reference = {{publication | ||
| authors = Juha Villman, Einari Happonen, Jouni T. Tuomisto | | authors = Juha Villman, Einari Happonen, Jouni T. Tuomisto | ||
| page = Opasnet Base | | page = Opasnet Base | ||
| explanation = | | explanation = | ||
| publishingyear = 2012 | | publishingyear = 2012 | ||
Line 11: | Line 13: | ||
}} | }} | ||
This page is about the '''Opasnet Base | ''This page is about the '''Opasnet Base''' database. The previous version is described on page [[Opasnet Base (2008-2011)]]. | ||
==Question== | ==Question== | ||
Line 43: | Line 45: | ||
|id||int(10) unsigned||NO||auto_increment||PRI | |id||int(10) unsigned||NO||auto_increment||PRI | ||
|- | |- | ||
|type||ENUM( | |obj_id||int(10) unsigned||NO|||| | ||
|- | |||
|series_id||int(10) unsigned||NO|||| | |||
|- | |||
|unit||varchar(64)||YES|||| | |||
|- | |||
|type||ENUM('replace','append')||NO|||| | |||
|- | |- | ||
|who||varchar( | |who||varchar(255)||NO|||| | ||
|- | |- | ||
|when||timestamp||NO|||| | |when||timestamp||NO|||| | ||
|- | |- | ||
|comments||varchar( | |comments||varchar(255)||YES|||| | ||
|- | |- | ||
|lang||char(3)||NO||ISO 639-2, default:'eng'|| | |lang||char(3)||NO||ISO 639-2, default:'eng'|| | ||
|} | |} | ||
{{comment|1 |I guess this should have field obj_id to identify which object we are updating. In the previous version, act - obj was many-to-many relationship, but it can as well be many-to-one which makes life easier; it is not important to know that two variables were uploaded in the same model run. Also series_id and unit could be in this table.|--[[User:Jouni|Jouni]] 22:56, 22 February 2012 (EET)}} | |||
{{defend|2 |I have now altered the table structure to implement Jouni's idea.|--[[User:Ehac|Einari]] 12:31, 19 March 2012 (EET)}} | |||
|VALIGN="top"| | |VALIGN="top"| | ||
{| | |||
|} | |||
|- | |- | ||
Line 85: | Line 82: | ||
|id||int(10) unsigned||NO||auto_increment||PRI | |id||int(10) unsigned||NO||auto_increment||PRI | ||
|- | |- | ||
|ident||varchar( | |ident||varchar(64)||NO||||UNI | ||
|- | |||
|name||varchar(255)||NO|||| | |||
|- | |- | ||
| | |subset||varchar(255)||NO||||UNI(with ident) | ||
|- | |- | ||
|type||ENUM('variable','study','method','assessment','class','nugget','encyclopedia')||NO|||| | |type||ENUM('variable','study','method','assessment','class','nugget','encyclopedia')||NO|||| | ||
Line 107: | Line 106: | ||
|id||int(10) unsigned||NO||auto_increment||PRI | |id||int(10) unsigned||NO||auto_increment||PRI | ||
|- | |- | ||
| | |series_id||int(10) unsigned||NO||||UNI(with ident) | ||
|- | |- | ||
|ident||varchar(64)||NO||||UNI(with | |ident||varchar(64)||NO||||UNI(with series_id) | ||
|- | |- | ||
|type||ENUM('entity','number','time')||NO|||| | |type||ENUM('entity','number','time')||NO|||| | ||
|- | |- | ||
| | |name||varchar(255)||NO|||| | ||
|- | |||
|unit||varchar(64)||YES|||| | |||
|- | |- | ||
|page||int(10) unsigned||NO|||| | |page||int(10) unsigned||NO|||| | ||
Line 123: | Line 124: | ||
|hidden||boolean||NO||false|| | |hidden||boolean||NO||false|| | ||
|} | |} | ||
|- | |- | ||
|VALIGN="top"| | |VALIGN="top"| | ||
{| {{prettytable}} | {| {{prettytable}} | ||
Line 153: | Line 138: | ||
|url||varchar(255)||NO|||| | |url||varchar(255)||NO|||| | ||
|- | |- | ||
|wname||varchar( | |wname||varchar(255)||NO|||| | ||
|} | |} | ||
| | |||
|} | |} | ||
==MongoDB== | ===MongoDB=== | ||
==== Tables to store all locations and results ==== | |||
'''db.<objs.ident>(.<objs.subset>).dat''' | |||
Column names:<br/> | |||
sid(series.id), aid (act.id), <inds.ident>, <inds.ident>, ..., <inds.ident>, res (result)<br/> | |||
<br/> | |||
Data:<br/> | |||
<acts.series_id>, <acts.id>, <locs.id | data>, <locs.id | data>, ..., <locs.id | data>, <data> | |||
Index:<br/> | |||
db.<objs.ident>(.<objs.subset>).dat.ensureIndex({_sid:1, _aid:1}); | |||
{{comment|# |I guess this is about loc.id's, not about ind.id's. Loc.id's are either identifiers explained below or values of continuous indices.|--[[User:Jouni|Jouni]] 22:56, 22 February 2012 (EET)}}<br/> | |||
{{attack|# |It was actually about ind.id, but the line described column names, not the data itself. To avoid future misleads I added data describing line as well. Object data can consist of location identifiers or e.g. real numbers, depending of the type of the column index.|--[[User:Ehac|Einari]] 13:33, 19 March 2012 (EET)}} | |||
== | ==== Tables to store real location values for entity type indices ==== | ||
'''db.<objs.ident>(.<objs.subset>).locs''' | |||
Column names:<br/> | |||
iid (index id), lid (location id), val | |||
Index:<br/> | |||
ensureIndex(array("iid" => 1, "lid" => 1)); | |||
{{defend|# |If the index is entity type, inds.id is the identifier of the index (whose details are in MySQL) and value is the name of the location. If the index is continuous, it does not need rows in this table, as everything necessary is described in the tables inds and db.<object.ident>.dat.|--[[User:Jouni|Jouni]] 22:56, 22 February 2012 (EET)}} | |||
{{comment|# |What is key?|--[[User:Jouni|Jouni]] 22:56, 22 February 2012 (EET)}}<br/> | |||
{{defend|# |Key was something that I cannot really confirm. I altered the table description so that index id and location id together make the key (indexed in that order). This means that location values must always be accessed through index id. I believe that this is the fastest way in practice.|--[[User:Ehac|Einari]] 14:35, 19 March 2012 (EET)}} | |||
{{comment|Toinen asia OB2:een liittyen: Indekseille pitäisi pystyä antamaan yksiköt, koska ne eivät ole itsestäänselviä (Erkki huomasi tämän). Onko tämän toteutus suunniteltu? Seka kannassa pitää olla paikka, ja lisäksi esim. t2b:ssa olisi syytä olla parametri yksiköiden antamista varten (tai pitäisikö nykyistä unit-parametrin käyttöä laajentaa?).|--[[User:Jouni|Jouni]] 16:25, 1 January 2013 (EET)}} | |||
====JSON interface==== | |||
The interface is based on a PHP script that outputs JSON, so data can be transferred using ordinary HTTP requests. One could even read through most of the meta-data in a web browser with a plugin that displays json better: | |||
* firefox addon - https://addons.mozilla.org/en-US/firefox/addon/jsonview/ | |||
* list of objects http://cl1.opasnet.org/opasnet_base_2/index.php | |||
* most recent upload of an object - http://cl1.opasnet.org/opasnet_base_2/index.php?ident=op_en5599&act=0 | |||
* data (key is unique to every download instance and is generated when the above link is opened) - http://cl1.opasnet.org/opasnet_base_2/index.php?key=8dYqLbtppz4HM7sYaAyj4t6S49jQNRWJ | |||
==Rationale== | |||
* [[Opasnet Base structure]] | |||
* [[Opasnet Base]] | |||
* [[Opasnet structure]] | * [[Opasnet structure]] | ||
* [[Open assessment]] | * [[Open assessment]] | ||
==See also== | |||
'''Other databases with (ot without) R connectivity | |||
[[ | * [http://docs.ckan.org/en/ckan-2.3/api/index.html API Guide] for http://www.avoindata.fi | ||
[[ | * [http://ropengov.github.io/tutorials/statfi_tutorial/ statfi] R package for Statistics Finland [http://www.stat.fi] | ||
[[ | * [http://ropengov.github.io/r/2015/05/01/eurostat-package-examples/ eurostat] R package for Eurostat data [http://Fec.europa.eu/eurostat] | ||
* [http://ropengov.github.io/general/2014/09/30/fmi/ fmi] R package for Finnish Meteorological Institute [http://www.fmi.fi] | |||
* [http://ropengov.github.io/ rOpenGov] project for useful data and computing for decisions | |||
* [https://cran.r-project.org/web/packages/sorvi/index.html sorvi] R package for some basic functionalities such as social security number functions | |||
{{Opasnet Base}} |
Latest revision as of 14:59, 29 November 2015
Moderator:Jouni (see all) |
|
Upload data
|
This page is about the Opasnet Base database. The previous version is described on page Opasnet Base (2008-2011).
Question
How to improve the existing Opasnet Base? Following issues of Opasnet Base 1 must be resolved:
- Opasnet Base 1 structure makes filtering by location very slow on big data
- Opasnet Base 1 structure is perhaps unnecessarily complex = queries are hard to adapt and read
- MySQL is not ideal for storing huge amounts of data, or at least this is how we assume?
- MySQL tables have fixed column types = difficult to store data objects with varying types of indices into one data table
- Multiple languages (localization) not supported
Answer
MySQL is good on relations but weak on dynamic big data. Let's keep the basic "scaffold" in MySQL and store the big data (locations and results) into noSQL-base. After few days of research the best candidate for noSQL-base seems to be MongoDB. Combining relational MySQL and non-relational MongoDB will be the foundation for the new Opasnet Base 2.
Table structure in the database
All tables
----1: . I guess this should have field obj_id to identify which object we are updating. In the previous version, act - obj was many-to-many relationship, but it can as well be many-to-one which makes life easier; it is not important to know that two variables were uploaded in the same model run. Also series_id and unit could be in this table. --Jouni 22:56, 22 February 2012 (EET) (type: truth; paradigms: science: comment) ←--2: . I have now altered the table structure to implement Jouni's idea. --Einari 12:31, 19 March 2012 (EET) (type: truth; paradigms: science: defence) |
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
MongoDB
Tables to store all locations and results
db.<objs.ident>(.<objs.subset>).dat
Column names:
sid(series.id), aid (act.id), <inds.ident>, <inds.ident>, ..., <inds.ident>, res (result)
Data:
<acts.series_id>, <acts.id>, <locs.id | data>, <locs.id | data>, ..., <locs.id | data>,
Index:
db.<objs.ident>(.<objs.subset>).dat.ensureIndex({_sid:1, _aid:1});
----#: . I guess this is about loc.id's, not about ind.id's. Loc.id's are either identifiers explained below or values of continuous indices. --Jouni 22:56, 22 February 2012 (EET) (type: truth; paradigms: science: comment)
⇤--#: . It was actually about ind.id, but the line described column names, not the data itself. To avoid future misleads I added data describing line as well. Object data can consist of location identifiers or e.g. real numbers, depending of the type of the column index. --Einari 13:33, 19 March 2012 (EET) (type: truth; paradigms: science: attack)
Tables to store real location values for entity type indices
db.<objs.ident>(.<objs.subset>).locs
Column names:
iid (index id), lid (location id), val
Index:
ensureIndex(array("iid" => 1, "lid" => 1));
←--#: . If the index is entity type, inds.id is the identifier of the index (whose details are in MySQL) and value is the name of the location. If the index is continuous, it does not need rows in this table, as everything necessary is described in the tables inds and db.<object.ident>.dat. --Jouni 22:56, 22 February 2012 (EET) (type: truth; paradigms: science: defence)
----#: . What is key? --Jouni 22:56, 22 February 2012 (EET) (type: truth; paradigms: science: comment)
←--#: . Key was something that I cannot really confirm. I altered the table description so that index id and location id together make the key (indexed in that order). This means that location values must always be accessed through index id. I believe that this is the fastest way in practice. --Einari 14:35, 19 March 2012 (EET) (type: truth; paradigms: science: defence)
----Toinen asia OB2:een liittyen: Indekseille pitäisi pystyä antamaan yksiköt, koska ne eivät ole itsestäänselviä (Erkki huomasi tämän). Onko tämän toteutus suunniteltu? Seka kannassa pitää olla paikka, ja lisäksi esim. t2b:ssa olisi syytä olla parametri yksiköiden antamista varten (tai pitäisikö nykyistä unit-parametrin käyttöä laajentaa?).: . --Jouni 16:25, 1 January 2013 (EET) {{{3}}} (type: truth; paradigms: science: comment)
JSON interface
The interface is based on a PHP script that outputs JSON, so data can be transferred using ordinary HTTP requests. One could even read through most of the meta-data in a web browser with a plugin that displays json better:
- firefox addon - https://addons.mozilla.org/en-US/firefox/addon/jsonview/
- list of objects http://cl1.opasnet.org/opasnet_base_2/index.php
- most recent upload of an object - http://cl1.opasnet.org/opasnet_base_2/index.php?ident=op_en5599&act=0
- data (key is unique to every download instance and is generated when the above link is opened) - http://cl1.opasnet.org/opasnet_base_2/index.php?key=8dYqLbtppz4HM7sYaAyj4t6S49jQNRWJ
Rationale
See also
Other databases with (ot without) R connectivity
- API Guide for http://www.avoindata.fi
- statfi R package for Statistics Finland [2]
- eurostat R package for Eurostat data [3]
- fmi R package for Finnish Meteorological Institute [4]
- rOpenGov project for useful data and computing for decisions
- sorvi R package for some basic functionalities such as social security number functions
Pages related to Opasnet Base |
Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll · Working with sensitive data · Saved R objects |
Pages related to the 2008-2011 version of Opasnet Base |
Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils |