Uploading to Opasnet Base: Difference between revisions

From Opasnet
(instructions from file:Opasnet base connection.ANA moved here without editing (yet))
 
(Data uploaded using Opasnet Base Import)
 
(12 intermediate revisions by 3 users not shown)
[[Category:Opasnet Base]]
{{method|moderator=Jouni}}
'''Uploading to [[Opasnet Base]]''' helps you understand what data could and should be updated to [[Opasnet Base]] and what the recommended data structures and formats are. For technical instructions how to use the current upload software, see [[Opasnet Base connection]]. For a general description about the database, see [[Opasnet Base]] and for technical details about the database, see [[Opasnet Base structure]].
'''Uploading to [[Opasnet Base]]''' helps you understand what data could and should be uploaded to [[Opasnet Base]] and what the recommended data structures and formats are. For technical instructions on how to use the current upload software, see [[Opasnet Base connection]]. For a general description of the database, see [[Opasnet Base]], and for technical details about the database, see [[Opasnet Base structure]]. For details about downloading data, see [[Opasnet Base UI]].


==Scope==
== Question ==


What data could and should be uploaded to [[Opasnet Base]] and what are the recommended data structures and formats?


==Definition==
== Answer ==


8.7.2010 Jouni Tuomisto
There are several different tools to upload data to [[Opasnet Base]]. These are first listed and described here, and then more details are given either here or on a separate page.
If the variable is deterministic, Obs should be 0. This must be in all upload methods. They are corrected accordingly, see Indexify.
# '''If you have a small table that you want to show on a wiki page''': [[Table2Base]] is a functionality that uploads a table from an [[Opasnet]] page to [[Opasnet Base]]. The <nowiki><t2b></nowiki> tag is used to make the table. By default, the table content replaces any previous content in [[Opasnet Base]] every time the table is changed and the page is saved.
# '''If you have your data in R''': [[Opasnet Base Connection for R]] is a group of [[R]] functions that can easily upload data from models run in the [[R]] environment.
# '''If you have a large data set in a file''': [[Special:Opasnet_Base_Import|Opasnet Base Import]] is a tool for uploading data in CSV or XLS files to the Opasnet Base. Registered Opasnet users can use this tool by following the "Upload data" link in the top right corner box on any variable (or other data storing) page. This is explained in more detail on this page.
# '''If you want to collect data from several Opasnet users''': An interface for R code can also be extended to upload data from a questionnaire form on an Opasnet page. This way, the data itself does not stay on the page and can therefore only be seen from the Base. See [[R-tools]] for instructions to build an interface.


; Findid: This function gets an id from a table.
=== Uploading data with Opasnet Base Import ===
in: the property for which the id is needed. In MUST be unique in cond and it must contain index i.
table: the table from where the id is brought. The table MUST have .j as the column index, .i as the row index, and a column named 'id'.
cond: the name of the field that is compared with in. Cond must be text.


;Textify: Changes a number to a text value with up to 15 significant numbers. This bypasses the number formatting problem that tends to convert e.g. 93341 to '93.34K'. If the input is null, the result is ''.
*First, the data was converted from a text file to an Excel file. At the same time, the first ID column was removed (see Figs 1 and 2).
[[File:Example text data.JPG|thumb|500px|center| Fig 1. Example raw data in text format.]]
[[File:Excel data.JPG|thumb|500px|center|Fig 2. Text data transformed to Excel format. The first ID column has been removed.]]


This module saves original data or model results (a study or a variable, respectively) into the Opasnet Base. You need your Opasnet username and password to do that. You must fill in all tables and fields below before the process can be completed. Fill in the data below from top to bottom.
*Second, we went to the Opasnet page where the table should be uploaded and selected "Upload data". (The page must be a variable or method page.)
If an object with the same Ident already exists in the Opasnet Base, the information will be added to that object.
*On the Opasnet Base Import page, all the inputs depend on the data.
Before you start, make sure that you have created an object page in the Opasnet wiki for each object (study or variable) you want to upload.
**Define Unit. (If you have several units, write e.g. "misc".)
**If the data has only one column with results, tick "First row contains the indices".
**If the data has many columns with results, you have to define Indices and Locations (see Fig 3). Indices lists all indices of the data in the table (the last one is the index for Locations), and Locations lists the names of the result columns. All names must be separated with commas.
**If the data is in CSV format, define the CSV inputs.
**Browse for the data file.
**Upload.
[[File:Opasnetbase import example.JPG|thumb|500px|center|Fig 3. Inputs of example data.]]


Data structure:
*Confirm the upload (Fig 4).
* Data must be uploaded in the format of a two-dimensional table. The table has rows, one observation in each row, and columns (fields).  
[[File:Opbase import.JPG|thumb|500px|center|Fig 4. Preview of data rows and confirming of uploading.]]
* There are two kinds of columns. A) Index columns (also called independent variables in statistics) contain determinants of the actual data, such as sex of the observed individuals, or the observation year. B) Parameter columns (also called dependent variables) contain the actual data about the observations, given the index information.
* The first row must contain the names of the columns, i.e. the indices and parameters. These names are used when creating indices in the Opasnet Base.


Object info:
*Browse the uploaded data with Opasnet Base UI (see Fig 5).
* You must give your Opasnet username and password to upload data. The username will be stored together with the upload information.
[[File:Data uploaded.JPG|thumb|500px|center|Fig 5. Data uploaded successfully.]]
*Object info contains the most important metadata about your data.
- Data source must be 1 when using AWP.
- Analytica identifier is ignored when using AWP.
- Ident is the page identifier in Opasnet. If your study or variable does not already have a page, you must create one. The identifier is found in the metadata box in the top right corner of the Opasnet page.
- Number of indices is the number of columns that contain explanatory information (see below).
- Parameter name is a common name for all data columns. If omitted, 'Parameter' is used. See below for more details.
- If "Probabilistic?" is 1, then each row of the data table is considered a random draw from a data pool. Note that the index values are assumed to be the same in all rows, and only the index values of the first row are stored.
- Append to upload: Typically, each data upload event is given a separate identifier. If you want to continue an existing upload of the same object, you can give the number of that upload, and the new data will be appended.


Observations:
*On the data page, "Show the result" shows the data from Opasnet Base (Fig 6).
* The data are copy-pasted into the field 'Observations'. The source of the data can be any spreadsheet or text processor, as long as each column is separated by a tab, and each row by a line break. Note that the pasted data should be between 'quotation marks'.
[[File:Data page.JPG|thumb|500px|center|Fig 6. Data page is ready.]]


Data info:
== Rationale ==
Fill in the additional information about the data. These are asked for the object, and also for all the indices and the parameter. Note that if an entry with the identical Ident already exists in the Opasnet Base, this information will NOT be uploaded but the existing information will be used instead. All information should be between 'quotation marks' so that it is not mistakenly interpreted as Analytica node identifiers.
- Name: a description that may be longer than an identifier. This is typically identical to the respective page in Opasnet.
- Unit: unit of measurement.


Uploading:
This section describes what the inputs are and the different ways they can be formatted for a successful upload.
* There are two ways of uploading data. A) 'Upload data' is a public format, and all details are openly available. B) 'Upload non-public data' stores the actual data (the values in the parameter columns) in a database that requires a password for reading. However, all other information (including upload metadata and the data in the index fields) is openly available.


Follow these instructions if you have Analytica Enterprise and have an ODBC connection to the Opasnet Base. Read also the simplified help; not everything is repeated here.
What are the topics about which data can be uploaded into the Base? Well, basically any topic that provides useful information for any decision-making situation that has societal relevance. This sounds like a very wide definition, and it is. The data may be about which car models are environmentally friendly. It may be about pollutants in food. It may be new ideas about a societally just value-added tax.  


Platform:
What are, then, the data structures that are allowed? Although not all structures are allowed, almost any data can be easily transformed into the structure that is used in [[Opasnet Base]]. The data must be formatted as a two-dimensional table with one observation in each row. Cells of the table may contain either text or numerical values. Columns contain either explanations (or ''independent variables'' in statistics) or the actual observations (or ''dependent variables'' in statistics). This difference is important. Explanations are things that are fixed before the actual observation, while observations are those that are actually measured or observed. See the following table as an example.
You must choose THL computer if you are not using the AWP web interface.


Writerpsswd:
{| {{prettytable}}
You must know the writer password for the Opasnet Base if you are not using the AWP web interface.
|+An example of a data table.
! City
! Year
! Sex
! Body mass index BMI (kg/m<sup>2</sup>)
! Blood cholesterol (mM)
|----
|| London|| 2010|| Male|| 20|| 3.31
|----
|| London|| 2010|| Male|| 25|| 6.83
|----
|| London|| 2010|| Female|| 20|| 5.55
|----
|| London|| 2010|| Female|| 30|| 5.42
|----
|| London|| 2010|| Female|| 25|| 4.19
|----
|| New York|| 2010|| Male|| 22|| 3.33
|----
|| New York|| 2010|| Male|| 26|| 5.84
|----
|| New York|| 2010|| Female|| 28|| 5.67
|----
|| New York|| 2010|| Female|| 26|| 4.52
|----
|| New York|| 2010|| Female|| 24|| 5.67
|----
|}


Object info:
The table seems unambiguous at first glance, but it is impossible to interpret it correctly without knowing which columns are explanations and which are observations. This may be a study performed in London and New York in 2010, where randomly selected people were asked for a blood test. In this case, city and year are explanations, and the other columns are observations. However, the data may just as well be summary statistics from a larger study, where the studied individuals were grouped in these cities based on their sex and body mass index (BMI), and the mean cholesterol in each group is the only observation column. (The data implies that London used BMI groups 5 kg/m<sup>2</sup> wide, while New York used BMI groups 2 kg/m<sup>2</sup> wide, but you cannot know based on the data only.)
- Data source:
 
1 means that you are copy-pasting data to the 'Observations' field.  
However, if you know which columns are explanations and which are observations, you can actually deduce many important aspects of the design of the study from which the data came. Of course, many things must be explained elsewhere, such as how people were selected and what the studied base population was.
2 means that you have a 2D table in an Analytica node. The node must have column index .j (note: it is a local index!) and row index .i. The names of the columns must be in the index .j, and the first row must contain data.  
 
3 means that you have a typical Analytica node with n indices; one of the indices may be Run. The node is transformed into a 2D table using MDArrayToTable.
Another important feature of the data is whether it is deterministic or probabilistic. With deterministic data, it is assumed that each row is an independent piece of information. With probabilistic data, it is assumed that the rows are random draws from a pool of potential observations (like an urn full of balls with different colours, each having the same probability of being picked). However, not all observations are picked from the same pool; they come from several pools, each uniquely defined by the explanation columns.
- Analytica identifier is the identifier of the node to be used. The name must be given between 'quotation marks', i.e. as text.
 
- Ident: like in the simplified upload.
Let's assume that we look at the table above and learn that it is probabilistic data with two explanation columns. We then know that people were randomly picked from either London or New York in 2010 (explanation columns are always the first ones on the left side). In London, two happened to be males and three females, with measured BMIs and cholesterol levels. Thus, there are five observations from the pool defined by "London 2010", and also five observations from "New York 2010". Within each pool, the observations are numbered 1..5. If the data is deterministic, the observation number is 0 for all rows.
- Number of indices: like in the simplified upload if data source 2 is used; for 3, the number of indices comes from the node, and this entry is ignored.
 
- Parameter name:  like in the simplified upload if data source 2 is used; for 3, the parameter is implicit, and this entry is ignored.
== See also ==
- Probabilistic?: like in the simplified upload if data source 2 is used; for 3, if this entry is 1, the sample mode is used and the full distribution is saved, if the entry is not 1, the mid mode is used.
 
- Append to upload: like in the simplified upload.
{{Opasnet Base}}<!-- __OBI_TS:1430934755 -->

Latest revision as of 17:52, 6 May 2015


Uploading to Opasnet Base helps you understand what data could and should be uploaded to Opasnet Base and what the recommended data structures and formats are. For technical instructions on how to use the current upload software, see Opasnet Base connection. For a general description of the database, see Opasnet Base, and for technical details about the database, see Opasnet Base structure. For details about downloading data, see Opasnet Base UI.

Question

What data could and should be uploaded to Opasnet Base and what are the recommended data structures and formats?

Answer

There are several different tools to upload data to Opasnet Base. These are first listed and described here, and then more details are given either here or on a separate page.

  1. If you have a small table that you want to show on a wiki page: Table2Base is a functionality that uploads a table from an Opasnet page to Opasnet Base. The <t2b> tag is used to make the table. By default, the table content replaces any previous content in Opasnet Base every time the table is changed and the page is saved.
  2. If you have your data in R: Opasnet Base Connection for R is a group of R functions that can easily upload data from models run in the R environment.
  3. If you have a large data set in a file: Opasnet Base Import is a tool for uploading data in CSV or XLS files to the Opasnet Base. Registered Opasnet users can use this tool by following the "Upload data" link in the top right corner box on any variable (or other data storing) page. This is explained in more detail on this page.
  4. If you want to collect data from several Opasnet users: An interface for R code can also be extended to upload data from a questionnaire form on an Opasnet page. This way, the data itself does not stay on the page and can therefore only be seen from the Base. See R-tools for instructions to build an interface.

Uploading data with Opasnet Base Import

  • First, the data was converted from a text file to an Excel file. At the same time, the first ID column was removed (see Figs 1 and 2).
Fig 1. Example raw data in text format.
Fig 2. Text data transformed to Excel format. The first ID column has been removed.
  • Second, we went to the Opasnet page where the table should be uploaded and selected "Upload data". (The page must be a variable or method page.)
  • On the Opasnet Base Import page, all the inputs depend on the data.
    • Define Unit. (If you have several units, write e.g. "misc".)
    • If the data has only one column with results, tick "First row contains the indices".
    • If the data has many columns with results, you have to define Indices and Locations (see Fig 3). Indices lists all indices of the data in the table (the last one is the index for Locations), and Locations lists the names of the result columns. All names must be separated with commas.
    • If the data is in CSV format, define the CSV inputs.
    • Browse for the data file.
    • Upload.
Fig 3. Inputs of example data.
  • Confirm the upload (Fig 4).
Fig 4. Preview of data rows and confirming of uploading.
  • Browse the uploaded data with Opasnet Base UI (see Fig 5).
Fig 5. Data uploaded successfully.
  • On the data page, "Show the result" shows the data from Opasnet Base (Fig 6).
Fig 6. Data page is ready.
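
The file preparation in the first step, and the meaning of the Indices and Locations inputs, can be sketched in Python. This is only an illustration under stated assumptions: the sample rows, the function names and the index name "Measurement" are hypothetical, and the code does not call Opasnet Base Import itself. It drops the first ID column, writes the table as CSV text, and shows one reading of how several result columns map to a last index whose locations are the result column names.

```python
import csv
import io


def drop_first_column(rows):
    """Remove the first (ID) column from every row, header included."""
    return [row[1:] for row in rows]


def to_long_format(rows, n_index_columns, locations_index_name, result_name="Result"):
    """Turn a table with several result columns into one result per row.

    rows[0] is the header. The first n_index_columns columns are indices;
    the names of the remaining columns become locations of a new, last index.
    """
    header, data = rows[0], rows[1:]
    locations = header[n_index_columns:]
    long_rows = [header[:n_index_columns] + [locations_index_name, result_name]]
    for row in data:
        for offset, location in enumerate(locations):
            value = row[n_index_columns + offset]
            long_rows.append(row[:n_index_columns] + [location, value])
    return long_rows


# Hypothetical tab-separated raw data with a leading ID column.
RAW_TEXT = (
    "ID\tCity\tYear\tBMI\tCholesterol\n"
    "1\tLondon\t2010\t20\t3.31\n"
    "2\tNew York\t2010\t22\t3.33\n"
)

rows = list(csv.reader(io.StringIO(RAW_TEXT), delimiter="\t"))
table = drop_first_column(rows)

# Write the cleaned table as CSV text (this could be saved to a file).
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
csv_text = buffer.getvalue()

# Two index columns (City, Year); BMI and Cholesterol are result columns.
long_table = to_long_format(table, 2, "Measurement")
indices_field = ",".join(long_table[0][:-1])  # last index is the one for Locations
locations_field = ",".join(table[0][2:])      # names of the result columns

print(csv_text)
print("Indices:", indices_field)
print("Locations:", locations_field)
```

The two comma-separated strings correspond to what would be typed into the Indices and Locations inputs for this hypothetical table.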

Rationale

This section describes what the inputs are and the different ways they can be formatted for a successful upload.

What are the topics about which data can be uploaded into the Base? Well, basically any topic that provides useful information for any decision-making situation that has societal relevance. This sounds like a very wide definition, and it is. The data may be about which car models are environmentally friendly. It may be about pollutants in food. It may be new ideas about a societally just value-added tax.

What are, then, the data structures that are allowed? Although not all structures are allowed, almost any data can be easily transformed into the structure that is used in Opasnet Base. The data must be formatted as a two-dimensional table with one observation in each row. Cells of the table may contain either text or numerical values. Columns contain either explanations (or independent variables in statistics) or the actual observations (or dependent variables in statistics). This difference is important. Explanations are things that are fixed before the actual observation, while observations are those that are actually measured or observed. See the following table as an example.

An example of a data table.
City      Year  Sex     Body mass index BMI (kg/m2)  Blood cholesterol (mM)
London    2010  Male    20                           3.31
London    2010  Male    25                           6.83
London    2010  Female  20                           5.55
London    2010  Female  30                           5.42
London    2010  Female  25                           4.19
New York  2010  Male    22                           3.33
New York  2010  Male    26                           5.84
New York  2010  Female  28                           5.67
New York  2010  Female  26                           4.52
New York  2010  Female  24                           5.67

The table seems unambiguous at first glance, but it is impossible to interpret it correctly without knowing which columns are explanations and which are observations. This may be a study performed in London and New York in 2010, where randomly selected people were asked for a blood test. In this case, city and year are explanations, and the other columns are observations. However, the data may just as well be summary statistics from a larger study, where the studied individuals were grouped in these cities based on their sex and body mass index (BMI), and the mean cholesterol in each group is the only observation column. (The data implies that London used BMI groups 5 kg/m2 wide, while New York used BMI groups 2 kg/m2 wide, but you cannot know based on the data only.)

However, if you know which columns are explanations and which are observations, you can actually deduce many important aspects of the design of the study from which the data came. Of course, many things must be explained elsewhere, such as how people were selected and what the studied base population was.
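
The two readings of the example table can be made concrete with a small Python sketch. It is an illustration only, not part of any Opasnet tool; the function name and the tuple layout are assumptions. The same rows are grouped first by two explanation columns (city, year) and then by four (city, year, sex, BMI).

```python
from collections import defaultdict

# The example table: City, Year, Sex, BMI (kg/m2), Blood cholesterol (mM).
ROWS = [
    ("London", 2010, "Male", 20, 3.31),
    ("London", 2010, "Male", 25, 6.83),
    ("London", 2010, "Female", 20, 5.55),
    ("London", 2010, "Female", 30, 5.42),
    ("London", 2010, "Female", 25, 4.19),
    ("New York", 2010, "Male", 22, 3.33),
    ("New York", 2010, "Male", 26, 5.84),
    ("New York", 2010, "Female", 28, 5.67),
    ("New York", 2010, "Female", 26, 4.52),
    ("New York", 2010, "Female", 24, 5.67),
]


def group_by_explanations(rows, n_explanations):
    """Group the observation part of each row by its explanation values.

    The first n_explanations columns are treated as explanations,
    the remaining columns as observations.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[:n_explanations]].append(row[n_explanations:])
    return dict(groups)


# Reading 1: individuals sampled in each city and year.
sample_reading = group_by_explanations(ROWS, 2)
# Reading 2: summary statistics per city, year, sex and BMI group.
summary_reading = group_by_explanations(ROWS, 4)

for key, observations in sample_reading.items():
    print(key, len(observations), "observations")
print(len(summary_reading), "groups with one mean cholesterol value each")
```

With two explanation columns the table holds two groups of five individuals; with four it holds ten groups with a single cholesterol value each, which is why the number of explanation columns has to be stated at upload.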

Another important feature of the data is whether it is deterministic or probabilistic. With deterministic data, it is assumed that each row is an independent piece of information. With probabilistic data, it is assumed that the rows are random draws from a pool of potential observations (like an urn full of balls with different colours, each having the same probability of being picked). However, not all observations are picked from the same pool; they come from several pools, each uniquely defined by the explanation columns.

Let's assume that we look at the table above and learn that it is probabilistic data with two explanation columns. We then know that people were randomly picked from either London or New York in 2010 (explanation columns are always the first ones on the left side). In London, two happened to be males and three females, with measured BMIs and cholesterol levels. Thus, there are five observations from the pool defined by "London 2010", and also five observations from "New York 2010". Within each pool, the observations are numbered 1..5. If the data is deterministic, the observation number is 0 for all rows.
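
The numbering rule described above can be sketched as follows. This is a minimal Python illustration, assuming that the observation number is simply a running count within each pool; the function name is hypothetical and the actual upload tools may assign the numbers differently.

```python
from collections import Counter

# The example table: City, Year, Sex, BMI (kg/m2), Blood cholesterol (mM).
ROWS = [
    ("London", 2010, "Male", 20, 3.31),
    ("London", 2010, "Male", 25, 6.83),
    ("London", 2010, "Female", 20, 5.55),
    ("London", 2010, "Female", 30, 5.42),
    ("London", 2010, "Female", 25, 4.19),
    ("New York", 2010, "Male", 22, 3.33),
    ("New York", 2010, "Male", 26, 5.84),
    ("New York", 2010, "Female", 28, 5.67),
    ("New York", 2010, "Female", 26, 4.52),
    ("New York", 2010, "Female", 24, 5.67),
]


def number_observations(rows, n_explanations, probabilistic):
    """Prefix each row with its observation number.

    Probabilistic data: rows are counted 1, 2, 3, ... within the pool
    defined by the first n_explanations columns.
    Deterministic data: the observation number is 0 for every row.
    """
    seen = Counter()
    numbered = []
    for row in rows:
        if probabilistic:
            pool = row[:n_explanations]
            seen[pool] += 1
            obs = seen[pool]
        else:
            obs = 0
        numbered.append((obs,) + row)
    return numbered


probabilistic_rows = number_observations(ROWS, 2, probabilistic=True)
deterministic_rows = number_observations(ROWS, 2, probabilistic=False)

for row in probabilistic_rows:
    print(row)
```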

See also

Pages related to Opasnet Base

Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll · Working with sensitive data · Saved R objects

Pages related to the 2008-2011 version of Opasnet Base

Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils