Data structures in Opasnet

Latest revision as of 20:28, 10 April 2015

This page is a knowledge crystal of subtype method. The page identifier is Op_en5459
Moderator: Jouni
This page is a stub. You may improve it into a full page.

Question

What data structures are there in Opasnet and how should they be used?

Answer

This topic was previously covered in Opasnet base structure. Only a brief description of it remains here.

All data should be convertible into the following format (shown here in the wide format, i.e. each observation type as one column):

			Observation
Year	Sex	Age	Height	Weight	Description
2009	Male	20	178	70	An optional column for descriptive text about each row.
2009	Male	30	174	79
2010	Male	25	183	84
2010	Female	22	168	65

where

Names of explanation columns, also known as indices (here Year, Sex and Age).

Explanation data, also known as locations (here e.g. 2009, Male, 20). You can use these columns as search criteria.

Observation index, typically called "Observation". This is the common name for all observation columns.

Names of observation columns (here Height, Weight and Description). These are the parameters of interest.

Observation data (here e.g. 178 and 70). These are the actual measurements.

Other object information (see the info table). It varies slightly depending on the format you use for uploading data.

**Info table**
Parameter	Example	How entered in Table2Base
ident	Op_en2693	Automatically taken from the wiki page id.
name	Testvariable	Automatically taken from the wiki page name.
unit	cm,kg	unit="cm,kg"
explanation cols	Year, Sex, Age, Observation	index="Year,Sex,Age,Observation"
observation index	Observation	Given as the last index.
observation #	1, 2, 3	If indices together don't uniquely identify the row, use an additional index column "obs" with row numbers.
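
The rule in the last row of the info table can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name add_obs_if_needed and the duplicated measurement are hypothetical, and the code is not part of Table2Base or Opasnet Base.

```python
from collections import Counter


def add_obs_if_needed(rows, index_cols):
    """Return rows unchanged if index_cols identify every row uniquely.

    Otherwise return copies of the rows with an extra "obs" column that
    holds running row numbers starting from 1.
    """
    keys = [tuple(row[col] for col in index_cols) for row in rows]
    if max(Counter(keys).values(), default=1) == 1:
        return rows
    return [dict(row, obs=number) for number, row in enumerate(rows, start=1)]


# Hypothetical data: the first two rows share the same Year, Sex and Age,
# so the indices alone cannot tell them apart.
rows = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178},
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 181},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168},
]

numbered = add_obs_if_needed(rows, ["Year", "Sex", "Age"])
```

Here numbered gets the obs values 1, 2 and 3, whereas a table whose indices are already unique is returned without an obs column.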

The table below is in long format: all observations have been put into a single column, Result. An additional column, "Observation", tells which parameter is on which row. In this example, the indices Year, Sex and Age uniquely define a row within each parameter, so there is no need for an "obs" column. When a table is used in calculations, all rows where the index Observation has the value Description are removed first.

Year	Sex	Age	Observation	Result
2009	Male	20	Height	178
2009	Male	30	Height	174
2010	Male	25	Height	183
2010	Female	22	Height	168
2009	Male	20	Weight	70
2009	Male	30	Weight	79
2010	Male	25	Weight	84
2010	Female	22	Weight	65
2009	Male	20	Description	An optional column for descriptive text about each row.
2009	Male	30	Description
2010	Male	25	Description
2010	Female	22	Description
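
The conversion from the wide table to the long table above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names wide_to_long and drop_descriptions are assumptions made for this example and do not refer to any Opasnet tool.

```python
def wide_to_long(rows, index_cols, observation_cols):
    """Stack the observation columns of a wide table into a single
    Result column, with an Observation column naming the parameter."""
    long_rows = []
    for obs_col in observation_cols:      # one block of rows per parameter
        for row in rows:
            long_row = {col: row[col] for col in index_cols}
            long_row["Observation"] = obs_col
            long_row["Result"] = row.get(obs_col, "")
            long_rows.append(long_row)
    return long_rows


def drop_descriptions(long_rows):
    """Remove Description rows, as is done before calculations."""
    return [row for row in long_rows if row["Observation"] != "Description"]


# The wide example table from this page.
wide = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178, "Weight": 70,
     "Description": "An optional column for descriptive text about each row."},
    {"Year": 2009, "Sex": "Male", "Age": 30, "Height": 174, "Weight": 79},
    {"Year": 2010, "Sex": "Male", "Age": 25, "Height": 183, "Weight": 84},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168, "Weight": 65},
]

long_table = wide_to_long(wide, ["Year", "Sex", "Age"],
                          ["Height", "Weight", "Description"])
for_calculation = drop_descriptions(long_table)
```

The result long_table has twelve rows in the same order as the long table above, and for_calculation keeps only the eight Height and Weight rows.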

(The tables above have been created with File:Opasnet base explanation.ods.)

Protecting non-public data

Go down this list in this order until you have reached a proper level of protection.

Remove personal information (names, social security numbers etc.) from the data and use person-specific identifiers instead. Keep the key linking names and identifiers in a safe place.
Remove other sensitive information (the name of an endangered species or a drug studied) and use an identifier instead.
Make the information coarser, e.g. by giving relative values instead of absolute ones. Instead of the exact operation date, give the time elapsed from a reference date, and disclose the reference date only at the precision of one year. Similarly, instead of giving the exact location of an endangered species, give its location relative to a reference point that is not exactly revealed.
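
The steps in the list above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the field names, the identifier prefixes and the reference date are all hypothetical and are not part of Opasnet.

```python
from datetime import date


def replace_with_identifier(records, field, id_field, prefix):
    """Replace the values of a sensitive field with neutral identifiers.

    Returns the cleaned records and the key that links original values to
    identifiers. The key must be stored in a safe place, not published.
    """
    key = {}
    cleaned_records = []
    for record in records:
        value = record[field]
        if value not in key:
            key[value] = f"{prefix}{len(key) + 1}"
        cleaned = {k: v for k, v in record.items() if k != field}
        cleaned[id_field] = key[value]
        cleaned_records.append(cleaned)
    return cleaned_records, key


def coarsen_date(records, date_field, reference):
    """Replace an exact date with the number of days from a reference
    date, disclosing only the year of the reference date."""
    cleaned_records = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k != date_field}
        cleaned["days_from_reference"] = (record[date_field] - reference).days
        cleaned["reference_year"] = reference.year
        cleaned_records.append(cleaned)
    return cleaned_records


records = [{"name": "John Doe", "method": "experimental method",
            "operation_date": date(2012, 1, 6)}]

# Step 1: personal information, step 2: other sensitive information.
step1, person_key = replace_with_identifier(records, "name", "person_id", "P")
step2, method_key = replace_with_identifier(step1, "method", "method_id", "M")
# Step 3: coarser dates; the exact reference date stays undisclosed.
public = coarsen_date(step2, "operation_date", reference=date(2011, 11, 15))
```

In practice you would stop after the first step that gives a proper level of protection. The dictionaries person_key and method_key correspond to the key linking names and identifiers.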

Rationale

Protecting data

Opasnet is a workspace for openly sharing and using data. However, sometimes it is necessary to restrict the use of data. For example, the data may contain personal patient information or unpublished research data.

The guiding principle is always to open as much data as possible.

The question is not whether a piece of data can be opened, but which parts can be and how the other parts should be handled. For example, "John Doe has lung cancer, which was operated using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012" is a piece of data that clearly must not be published by the principal investigator. However, a simple change in the information content turns this sensitive patient information into neutral medical information: "A lung cancer was operated using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012". In addition, the investigator may not want to tell which specific treatments they are testing, but they may want to release as much of the data as possible to get comments about the study design and statistical data analysis from colleagues: "A lung cancer was operated using method B in Mass General Hospital on 6 Jan 2012."

With these two changes, the data can be released in a machine-readable format on the Internet. It is still very useful for health economists and possibly for many other people as well.

See also

Pages related to Opasnet Base	Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll · Working with sensitive data · Saved R objects
Pages related to the 2008-2011 version of Opasnet Base	Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils