Data structures in Opasnet
Moderator: Jouni
Question
What data structures are there in Opasnet, and how should they be used?
Answer
- This topic was previously in Opasnet base structure. Now there is only a brief description of the topic.
All data should be convertible into the following format (shown here in the wide format, i.e. each observation type as one column). The columns Height, Weight and Description are grouped under the common header "Observation":
Year | Sex | Age | Height | Weight | Description |
---|---|---|---|---|---|
2009 | Male | 20 | 178 | 70 | An optional column for descriptive text about each row. |
2009 | Male | 30 | 174 | 79 | |
2010 | Male | 25 | 183 | 84 | |
2010 | Female | 22 | 168 | 65 | |
where
- Year, Sex and Age are the names of explanation columns, also known as indices.
- The values in those columns (e.g. 2009, Male, 20) are explanation data, also known as locations. You can use these columns as search criteria.
- "Observation" is the observation index. It is the common name for all observation columns.
- Height, Weight and Description are the names of observation columns. These are the parameters of interest.
- The values in the observation columns (e.g. 178, 70) are observation data. These are the actual measurements.
Other object information. It varies slightly depending on the format you use for uploading data.
Parameter | Example | How entered in Table2Base |
---|---|---|
ident | Op_en2693 | Automatically taken from the wiki page id. |
name | Testvariable | Automatically taken from the wiki page name. |
unit | cm,kg | unit="cm,kg" |
explanation cols | Year, Sex, Age, Observation | index="Year,Sex,Age,Observation" |
observation index | Observation | Given as the last index. |
observation # | 1, 2, 3 | If indices together don't uniquely identify the row, use an additional index column "obs" with row numbers. |
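The rule in the last row of the table (add an "obs" index only when the other indices do not uniquely identify a row) can be sketched in Python as follows. This is a minimal illustration, not Opasnet's own upload code: the function name `add_obs_if_needed` and the sample rows are hypothetical.

```python
from collections import Counter

def add_obs_if_needed(rows, indices):
    """Return copies of rows, adding an "obs" index if `indices` are not unique.

    `rows` is a list of dicts; `indices` lists the index column names.
    """
    keys = [tuple(row[i] for i in indices) for row in rows]
    if max(Counter(keys).values(), default=1) == 1:
        # Every index combination occurs once, so no obs column is needed.
        return [dict(row) for row in rows]
    # Otherwise number the rows 1, 2, 3, ... in an extra "obs" column.
    return [{**row, "obs": n} for n, row in enumerate(rows, start=1)]

# Hypothetical data: Year and Sex alone do not identify the first two rows.
rows = [
    {"Year": 2009, "Sex": "Male", "Result": 178},
    {"Year": 2009, "Sex": "Male", "Result": 174},
    {"Year": 2010, "Sex": "Female", "Result": 168},
]
with_obs = add_obs_if_needed(rows, ["Year", "Sex"])
```

Here the first two rows share the same Year and Sex, so every row receives a row number in "obs". If the indices were already unique, the rows would be returned unchanged.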
The table below shows the same data in long format, where all observations have been put into a single column. There is an additional column "Observation" explaining which parameter is in which row. In this example, the indices Year, Sex and Age uniquely define a row, and therefore there is no need for an obs column. When a table is used in calculations, all rows where the index Observation has the value Description are removed first.
Year | Sex | Age | Observation | Result |
---|---|---|---|---|
2009 | Male | 20 | Height | 178 |
2009 | Male | 30 | Height | 174 |
2010 | Male | 25 | Height | 183 |
2010 | Female | 22 | Height | 168 |
2009 | Male | 20 | Weight | 70 |
2009 | Male | 30 | Weight | 79 |
2010 | Male | 25 | Weight | 84 |
2010 | Female | 22 | Weight | 65 |
2009 | Male | 20 | Description | An optional column for descriptive text about each row. |
2009 | Male | 30 | Description | |
2010 | Male | 25 | Description | |
2010 | Female | 22 | Description |
(The tables above have been created with File:Opasnet base explanation.ods.)
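The relationship between the wide and long tables above can be sketched in plain Python. This is an illustrative sketch only, not the code Opasnet uses; the function names `wide_to_long` and `for_calculation` are hypothetical, and the data are copied from the example tables.

```python
# The wide-format example table, one dict per row.
wide = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178, "Weight": 70,
     "Description": "An optional column for descriptive text about each row."},
    {"Year": 2009, "Sex": "Male", "Age": 30, "Height": 174, "Weight": 79, "Description": ""},
    {"Year": 2010, "Sex": "Male", "Age": 25, "Height": 183, "Weight": 84, "Description": ""},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168, "Weight": 65, "Description": ""},
]

INDICES = ["Year", "Sex", "Age"]                     # explanation columns
OBSERVATIONS = ["Height", "Weight", "Description"]   # observation columns

def wide_to_long(rows, indices, observations):
    """Stack all observation columns into one "Result" column."""
    long_rows = []
    for obs_name in observations:      # one block of rows per observation column
        for row in rows:
            long_row = {i: row[i] for i in indices}
            long_row["Observation"] = obs_name   # which parameter this row holds
            long_row["Result"] = row[obs_name]
            long_rows.append(long_row)
    return long_rows

def for_calculation(long_rows):
    """Drop the Description rows, as is done before calculations."""
    return [r for r in long_rows if r["Observation"] != "Description"]

long_table = wide_to_long(wide, INDICES, OBSERVATIONS)
numeric = for_calculation(long_table)
```

With four wide rows and three observation columns, `long_table` has twelve rows in the same order as the long table above, and `numeric` keeps the eight Height and Weight rows.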
Protecting non-public data
Work through this list in order until you have reached an adequate level of protection.
- Remove personal information (names, social security numbers etc.) from the data and use person-specific identifiers instead. Keep the key linking names and identifiers in a safe place.
- Remove other sensitive information (the name of an endangered species or a drug studied) and use an identifier instead.
- Make information coarser, e.g. by giving relative values instead of absolute ones. Do not give the exact operation date; give the time elapsed from a reference date, and disclose the reference date only at the precision of one year. Similarly, instead of giving the exact location of an endangered species, give its location relative to a reference point that is not revealed exactly.
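The three steps above could be applied to a set of records roughly as follows. This is a sketch under stated assumptions, not an Opasnet tool: the function `protect`, the field names, the identifier formats and the sample record (including its reference date) are all hypothetical.

```python
from datetime import date

def protect(records, reference_date):
    """Apply the three protection steps to a list of record dicts."""
    person_key = {}   # name -> identifier; keep this key in a safe place
    method_key = {}   # sensitive method name -> neutral label
    protected = []
    for rec in records:
        # Step 1: replace the name with a person-specific identifier.
        pid = person_key.setdefault(rec["name"], f"P{len(person_key) + 1:03d}")
        # Step 2: replace other sensitive information with an identifier.
        label = method_key.setdefault(
            rec["method"], "method " + chr(ord("A") + len(method_key)))
        # Step 3: give the date relative to a reference date.
        days = (rec["operation_date"] - reference_date).days
        protected.append({"person_id": pid, "method": label,
                          "days_from_reference": days})
    # Only the year of the reference date is meant to be disclosed.
    return protected, person_key, method_key, reference_date.year

# Hypothetical input record and reference date.
records = [{"name": "John Doe", "method": "experimental method",
            "operation_date": date(2012, 1, 6)}]
protected, person_key, method_key, reference_year = protect(records, date(2011, 10, 1))
```

The returned `protected` list contains no names, method names or absolute dates. The two keys and the exact reference date stay with the data owner.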
Rationale
Protecting data
Opasnet is a workspace for openly sharing and using data. However, sometimes it is necessary to restrict the use of data. For example, the data may contain personal patient information or unpublished research data.
The guiding principle is always to open as much data as possible.
The question is not whether a piece of data can be opened, but which parts can be and how the remaining parts should be handled. For example, "John Doe has lung cancer, which was operated using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012" is a piece of data that clearly must not be published by the principal investigator. However, a simple change in the information content turns this sensitive patient information into neutral medical information: "A lung cancer was operated using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012". In addition, the investigator may not want to reveal which specific treatments are being tested, but may still want to release as much of the data as possible to get comments on the study design and statistical analysis from colleagues: "A lung cancer was operated using a method B in Mass General Hospital on 6 Jan 2012."
After these two changes, the data can be released in a machine-readable format on the Internet. It is still very useful for health economists and possibly for many other people as well.