WHO mortality data: Difference between revisions

From Opasnet
(→‎Definition: links added)
(index descriptions added)
Line 9: Line 9:
==Definition==


Based on WHO statistics. NOTE! The actual work done by WHO should also be documented here. However, the current version is for illustration of a [[study]] in [[Opasnet]], and therefore we only describe the work done to manipulate the existing info from the WHO website into Opasnet.
The mortality data is actually quite complex. One could assume that it is country*ICD code*age group*year, but
* different ICD code groupings have been used in different countries
* different age group categories have been used in different countries
* different observation years are available for different countries.


The first, pilot run of the study was performed by Jouni Tuomisto and Marko Tainio based on existing WHO data for Finland (1996 statistics), retrieved in 2003. [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R20_Mallit/WHO%20mortality3.ANA Original model].
Therefore, this is not a nice 4D cube. Instead, there are lots of merged and empty cells in the cube. There should be a plan for how this is organised. The current idea:
* Analyse the data for each country to identify the age, ICD, and year locations used.
* Create indices for all different variations.
* On the database level,
** describe which locations in which indices are equal.
** describe which locations are mutually exhaustive subsets of another location.
 
The data contains the following locations:
 
Country (104 countries available): 1125 1300 1360 1365 1400 1430 2005 2010 2020 2025 2030 2040 2045 2050 2070 2085 2090 2110 2120 2130 2140 2150 2160 2170 2180 2190 2210 2230 2240 2260 2270 2300 2310 2320 2340 2350 2360 2370 2380 2385 2400 2410 2420 2430 2440 2445 2450 2455 2460 2470 3020 3030 3080 3090 3150 3160 3190 3255 3320 3325 3380 4010 4012 4018 4038 4045 4050 4055 4070 4080 4084 4085 4150 4160 4180 4182 4184 4186 4188 4190 4200 4210 4220 4230 4240 4260 4270 4272 4273 4274 4276 4280 4290 4300 4303 4308 4310 4320 4330 4335 4350 5020 5105 5150
 
 
Admin1, Subdiv: None
 
Year: 1996..2007
 
List: 101 103 104 10M
 
ICD 10 codes: 10584 different codes. Some of these are combinations of several ICD codes, listed as numbers.
 
Sex: 1,2,9 (Male, Female, Unknown)
 
Format (for age groupings): 0,1,2,4,7
 
IM format (for infant mortality): 1,2,8
 
 
===Structuring the data in the database===
 
For one country, it is straightforward to use indices that have only rows that contain data for that particular country. This will lead to 1 sex index, several year indices, 3 infant mortality indices, 5 non-infant mortality indices, and 4 different ICD indices. However, the whole study will be extremely complex with >= 14 dimensions and a huge number of empty cells.
 
What we need is a system that is able to aggregate and disaggregate data from one index to another. Aggregation is straightforward, but disaggregation requires data from other countries; this is used if no disaggregation data is available for the particular country. Should we use Dirichlet for disaggregation?
 
Can the aggregation and disaggregation be done at the Base level? Or maybe we need an Analytica model that is uploaded to AWP, and that takes care of the (dis)aggregation. This sounds better.


Log started on 23 January 2003.
*File: TainioMDTable kokeilu.ANA
*The WHO data used in the model was copied from the WHO website on 23 January 2003 into the file WHO_Cause of Death.xls. The pages are at [http://www3.who.int/whosis/] (follow the links: Cause of death statistics -> Table 1 -> Finland - 1996).
*A Variable column was added to the file.
*Created the WHO data function, into which the WHO data was copied from the file WHO_Cause of Death.xls. Created the Var and Item indices to index the table.
*Created the ICD-10 function. Created the Diagnos index.
*Created the Age groups function to separate mortality from the other headings in the table of the WHO data function. Created the Age groups index.
*Created the function 2D Age groups, in which mortality by age group is separated from the other headings of the WHO data function.
*Created the Mortality in Finland function, in which all the data in the table was converted to multidimensional form. Created the Unit and Sex indices.
*Saved the model as WHO mortality.ANA. Moved the file TainioMDTable kokeilu.ANA to the Vanhat subdirectory.




Line 38: Line 170:


* [[:Image:WHO mortality data.ANA|WHO mortality data.ANA]]
* [http://en.opasnet.org/en-opwiki/index.php?title=Who_Mortality_Data&oldid=7746 Year 2003 version of the model].

Revision as of 06:01, 5 March 2009


WHO mortality data is a study by WHO to collect information about mortality rates in different countries. See the model file.

Scope

What are the mortality rates per country, sex, age group, and diagnosis?

Definition

The mortality data is actually quite complex. One could assume that it is country*ICD code*age group*year, but

  • different ICD code groupings have been used in different countries
  • different age group categories have been used in different countries
  • different observation years are available for different countries.

Therefore, this is not a nice 4D cube. Instead, there are lots of merged and empty cells in the cube. There should be a plan for how this is organised. The current idea:

  • Analyse the data for each country to identify the age, ICD, and year locations used.
  • Create indices for all different variations.
  • On the database level,
    • describe which locations in which indices are equal.
    • describe which locations are mutually exhaustive subsets of another location.
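
The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in Opasnet: the rows are made up, age groups are assumed to be inclusive integer ranges in years, and the names `ROWS`, `locations_used` and `partition_of` are hypothetical.

```python
from collections import defaultdict

# Made-up rows for illustration only:
# (country code, year, ICD code, age range in years, deaths).
ROWS = [
    ("4070", 1996, "I21", (0, 4), 1),
    ("4070", 1996, "I21", (5, 9), 2),
    ("2450", 1997, "I21", (0, 9), 5),
]

def locations_used(rows):
    """Collect the year, ICD and age locations that each country actually uses."""
    used = defaultdict(lambda: {"year": set(), "icd": set(), "age": set()})
    for country, year, icd, age, _deaths in rows:
        used[country]["year"].add(year)
        used[country]["icd"].add(icd)
        used[country]["age"].add(age)
    return dict(used)

def partition_of(coarse, groups):
    """Return the finer age ranges that tile `coarse` exactly, or None."""
    lo, hi = coarse
    inside = sorted(
        g for g in groups if g != coarse and lo <= g[0] and g[1] <= hi
    )
    expected = lo
    for g_lo, g_hi in inside:
        if g_lo != expected:
            return None  # a gap or an overlap: not a clean partition
        expected = g_hi + 1
    return inside if inside and expected == hi + 1 else None

used = locations_used(ROWS)
all_ages = set().union(*(u["age"] for u in used.values()))
# Locations that are equal in both countries' age indices (none in this toy data).
equal_ages = used["4070"]["age"] & used["2450"]["age"]
# Fine locations that together make up the coarse location (0, 9).
subsets = partition_of((0, 9), all_ages)
```

Here `equal_ages` corresponds to "locations that are equal" and `subsets` to "locations that are subsets of another location"; the same idea would apply to ICD code groupings if their membership were known.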

The data contains the following locations:

Country (104 countries available): 1125 1300 1360 1365 1400 1430 2005 2010 2020 2025 2030 2040 2045 2050 2070 2085 2090 2110 2120 2130 2140 2150 2160 2170 2180 2190 2210 2230 2240 2260 2270 2300 2310 2320 2340 2350 2360 2370 2380 2385 2400 2410 2420 2430 2440 2445 2450 2455 2460 2470 3020 3030 3080 3090 3150 3160 3190 3255 3320 3325 3380 4010 4012 4018 4038 4045 4050 4055 4070 4080 4084 4085 4150 4160 4180 4182 4184 4186 4188 4190 4200 4210 4220 4230 4240 4260 4270 4272 4273 4274 4276 4280 4290 4300 4303 4308 4310 4320 4330 4335 4350 5020 5105 5150


Admin1, Subdiv: None

Year: 1996..2007

List: 101 103 104 10M

ICD 10 codes: 10584 different codes. Some of these are combinations of several ICD codes, listed as numbers.

Sex: 1,2,9 (Male, Female, Unknown)

Format (for age groupings): 0,1,2,4,7

IM format (for infant mortality): 1,2,8


Structuring the data in the database

For one country, it is straightforward to use indices that have only rows that contain data for that particular country. This will lead to 1 sex index, several year indices, 3 infant mortality indices, 5 non-infant mortality indices, and 4 different ICD indices. However, the whole study will be extremely complex with >= 14 dimensions and a huge number of empty cells.

What we need is a system that is able to aggregate and disaggregate data from one index to another. Aggregation is straightforward, but disaggregation requires data from other countries; this is used if no disaggregation data is available for the particular country. Should we use Dirichlet for disaggregation?
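
As one way to explore the Dirichlet question, the sketch below aggregates by summing and disaggregates a coarse count using shares drawn from a Dirichlet distribution whose parameters are reference counts from other countries. Everything here is an assumption made for illustration: the function names, the uniform prior of 1.0, and the numbers are hypothetical and do not come from the WHO data.

```python
import random

def aggregate(fine_counts):
    """Aggregate: sum the counts of the fine locations."""
    return sum(fine_counts.values())

def dirichlet_shares(reference_counts, rng, prior=1.0):
    """Draw one vector of shares from Dirichlet(reference_counts + prior).

    A Dirichlet sample is obtained by normalising independent gamma draws.
    """
    draws = {
        key: rng.gammavariate(count + prior, 1.0)
        for key, count in reference_counts.items()
    }
    total = sum(draws.values())
    return {key: value / total for key, value in draws.items()}

def disaggregate(coarse_count, reference_counts, rng):
    """Split a coarse count into fine locations using sampled shares."""
    shares = dirichlet_shares(reference_counts, rng)
    return {key: coarse_count * share for key, share in shares.items()}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical counts by age range pooled from other countries.
reference = {(0, 4): 30, (5, 9): 10}
parts = disaggregate(50, reference, rng)
```

Repeating the draw many times would give a distribution for each disaggregated cell rather than a single point estimate, which is the main reason to consider Dirichlet over fixed proportions.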

Can the aggregation and disaggregation be done at the Base level? Or maybe we need an Analytica model that is uploaded to AWP, and that takes care of the (dis)aggregation. This sounds better.


Result

{{#opasnet_base_link:Op_en2778}}


See also