KTL Sarcoma study: Difference between revisions

From Opasnet
(→‎Rationale: assumptions table added)
Line 62: Line 62:
==Rationale==
==Rationale==


===Study population===
=== Methods ===
====Study population====


The majority of sarcoma patients in southern Finland are treated
The majority of sarcoma patients in southern Finland are treated
Line 128: Line 129:
age and area could be found.
age and area could be found.


===Exposure assessment===
====Exposure assessment====


From the matched 337 patients, concentrations of the 17 toxic
From the matched 337 patients, concentrations of the 17 toxic
Line 209: Line 210:
The laboratory has successfully participated in several international quality control studies for the analysis of PCDD/Fs and PCBs. Matrices in these studies have included cow milk, human milk and human serum (Yrjänheikki, 1991; Rymen, 1994; WHO, 1996; Lindström et al., 2000). The laboratory of chemistry in the National Public Health Institute is an accredited testing laboratory (No T077) in Finland (EN ISO/IEC 17025). The scope of accreditation includes PCDD/Fs, non-ortho PCBs, and other PCBs from human tissue samples.
The laboratory has successfully participated in several international quality control studies for the analysis of PCDD/Fs and PCBs. Matrices in these studies have included cow milk, human milk and human serum (Yrjänheikki, 1991; Rymen, 1994; WHO, 1996; Lindström et al., 2000). The laboratory of chemistry in the National Public Health Institute is an accredited testing laboratory (No T077) in Finland (EN ISO/IEC 17025). The scope of accreditation includes PCDD/Fs, non-ortho PCBs, and other PCBs from human tissue samples.


===Statistical analyses===
====Statistical analyses====


Conditional logistic regression analysis was performed with
Conditional logistic regression analysis was performed with
Line 246: Line 247:
preservatives, strong detergents, heavy metals, other chemicals.
preservatives, strong detergents, heavy metals, other chemicals.


===Simulated data===
===Data===


; This code was used to create a csv file that contains simulated data from this study. Compared with the original data, the simulated data
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/Analyysi020712.xls The original data], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R80_Sarkooma2/Analyysi020712b.csv in csv file]
* has the same number of observations,
* [[:Image:KTL Sarcoma study statistical analyses.txt|KTL Sarcoma study statistical analyses.txt]]
* has the same range of values in each variable,
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712_osa2.txt Part2]
* has approximately the same correlation structure between all variables.
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysi020712Tulokset.xls Compilation of the results of statistical analyses]
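The three properties listed for the simulated data (same number of observations, values within the same range, approximately the same correlation structure) can be checked without access to the password-protected data. The following is a minimal sketch only: the data frames <code>orig</code> and <code>simu</code>, their column names and the bootstrap used as a stand-in for the simulation are all assumptions made for illustration and are not taken from the study.

<pre>
# Toy stand-in data; column names and values are invented for illustration.
set.seed(1)
n <- 200
orig <- data.frame(age = rnorm(n, mean = 60, sd = 10))
orig$bmi <- 20 + 0.1 * orig$age + rnorm(n, sd = 3)
orig$teq <- exp(0.03 * orig$age + rnorm(n, sd = 0.5))

# Placeholder "simulation": a bootstrap resample of the rows (an assumption,
# not the method used on this page).
simu <- orig[sample(n, replace = TRUE), ]

# 1. Same number of observations.
same_n <- nrow(simu) == nrow(orig)

# 2. Every simulated value lies within the original range of its variable.
in_range <- all(vapply(names(orig), function(v) {
  min(simu[[v]]) >= min(orig[[v]]) && max(simu[[v]]) <= max(orig[[v]])
}, logical(1)))

# 3. Largest absolute difference between the Spearman correlation matrices.
max_cor_diff <- max(abs(
  cor(orig, method = "spearman") - cor(simu, method = "spearman")
))

print(c(same_n = same_n, in_range = in_range))
print(max_cor_diff)
</pre>

A small value of <code>max_cor_diff</code> suggests that the rank correlations are similar; what counts as small enough is a judgment call that depends on the intended use of the simulated data.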
 
The code below analyses the main fish consumption and PCDD/F variables, but because these are individual-level data, you need a password to run it. However, you can see ready-made results [http://en.opasnet.org/en-opwiki/index.php/Special:R-tools?id=0CrH05oF1BEBRYcW].
 
For variable descriptions, see {{disclink|Variable description for reduced data}}
 
<rcode graphics="1" variables="name:password|description:Password|type:password">


<rcode>
library(OpasnetUtils)
library(OpasnetUtils)
library(MASS)
library(mc2d)
library(reshape2)
library(ggplot2)
objects.get("isqT7nvhd0ViUR7d")
objects.get("isqT7nvhd0ViUR7d")


Line 266: Line 267:
data <- data[2:nrow(data), 2:ncol(data)]
data <- data[2:nrow(data), 2:ncol(data)]


data2 <- data
for(i in 1:ncol(data)) {
fun <- c(rep("normal", 5), rep("poisson", 12), rep("lognormal", 19))
data[[i]] <- as.numeric(as.character(data[[i]]))
}


params <- list()
colnames(data)[colnames(data) == "aluenro"] <- "BMI" # Drop the aluenro (study area number) column and replace it with BMI.
data$BMI <- data$Paino / (data$Pituus / 100)^2


for(i in 1:ncol(data2)) {
# oprint(head(data))
data2[[i]] <- as.numeric(as.character(data2[[i]]))
if(i > 17) data2[[i]] <- ifelse(data2[[i]] == 0, 0.01, data2[[i]])
params[[i]] <- fitdistr(data2[[i]][!is.na(data2[[i]])], fun[i])$estimate # keep only the fitted parameter estimates
}


simu <- data.frame(temp = rep(NA, 968))
cat("Data from P:\\huippuyksikko\\Tutkimus\\R16_sarkooma\\Data\\Panulle20031216\\Analyysi020712_typistetty.xls.", nrow(data), "observations.\n")


for(i in 1:5) {
oprint(cor(x = data, use = "pairwise.complete.obs", method = "pearson"))
simu[[i]] <- rnorm(968, params[[i]][1], params[[i]][2])
}
for(i in 6:17) {
simu[[i]] <- rpois(968, params[[i]])
}
for(i in 18:36) {
simu[[i]] <- rlnorm(968, params[[i]][1], params[[i]][2])
}
simu[[3]] <- rbern(968, 0.5) + 1


colnames(simu) <- colnames(data)
# Basic Scatterplot Matrix
pairs(~ika+BMI+Kalaa+Silakkaa+PF23478+WHOTEQ, data = data,
  main="Simple Scatterplot Matrix")


korre <- cor(x = data2, use = "pairwise.complete.obs", method = "spearman")
</rcode>


simu <- as.data.frame(cornode(as.matrix(simu), target = korre))
====Variable information====


korre2 <- cor(x = simu, use = "pairwise.complete.obs", method = "spearman")
The variable information was originally documented in [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], but unfortunately mostly in Finnish.


qplot(melt(korre)$value, melt(korre2)$value)
{{hidden|
 
1 =
for(i in 1:ncol(simu)) {
The variable list should also be recorded here, because it has not been documented properly anywhere else.
simu[[i]] <- ifelse(
      LEIKKAUS    leikkauspvm, SASin oma formaatti (päivää jostakin kiintopisteestä?)
simu[[i]] > max(data[[i]], na.rm = TRUE) |
      IKA        ikä vuosina leikkauspäivänä
simu[[i]] < min(data[[i]], na.rm = TRUE),
      ALUENRO    aluenro tutkimusalue (potilaan kotiosoitteen postinumeron perusteella)
NA, simu[[i]]
1 -
)
2 Espoo
}
3 Helsinki
 
4 Hyvinkää
for(i in 1:ncol(data2)) {print(paste(
5 Hämeenlinna
min(data2[[i]], na.rm = TRUE),
6 Joensuu
max(data2[[i]], na.rm = TRUE),
7 Jyväskylä
min(simu[[i]], na.rm = TRUE),
8 Kotka
max(simu[[i]], na.rm = TRUE)
9 Kuopio
))}
10 Lahti
 
11 Lappeenranta
</rcode>
12 Pori
 
13 Seinäjoki
===Other data===
14 Tampere
 
15 Turku
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/Analyysi020712.xls The original data], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R80_Sarkooma2/Analyysi020712b.csv in csv file]
16 Vaasa
* [[:Image:KTL Sarcoma study statistical analyses.txt|KTL Sarcoma study statistical analyses.txt]]
      PNRO        postinumero K3 (K#=kyselylomakkeen kysymys nro #)
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712_osa2.txt Part2]
      STRATUM    stratum eli tapauksen numero ilman S-etuliitettä
* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysi020712Tulokset.xls Compilation of the results of statistical analyses]
      STRATASE    strataset eli tässä analyysissä 9
 
      DGLUOKKA    diagnoosiluokka tapauksilla
The code below analyses the main fish consumption and PCDD/F variables, but because these are individual-level data, you need a password to run it. However, you can see ready-made results [http://en.opasnet.org/en-opwiki/index.php/Special:R-tools?id=0CrH05oF1BEBRYcW].
DG_id DGnimi
 
1 MFH
For variable descriptions, see {{disclink|Variable description for reduced data}}
2 Liposarcoma
 
3 Leiomyosarcoma
<rcode graphics="1" variables="name:password|description:Password|type:password">
4 Angiosarcoma
 
5 Chondrosarcoma
library(OpasnetUtils)
6 Sarcoma synoviale
objects.get("isqT7nvhd0ViUR7d")
7 Sarcoma Ewing
 
8 Dermatofibrosarcoma
data <- objects.decode(etable, password)
9 Sarcoma alia
colnames(data) <- t(data[1, ])
10 Sarcoma NUD
data <- data[2:nrow(data), 2:ncol(data)]
11 Ei tietoa
 
12 Osteosarcoma extrasceletale
for(i in 1:ncol(data)) {
21 Lipoma
data[[i]] <- as.numeric(as.character(data[[i]]))
22 Tumor Desmoides
}
23 Myxoma
 
24 Muu benigni tuumori
colnames(data)[colnames(data) == "aluenro"] <- "BMI" # Drop the aluenro (study area number) column and replace it with BMI.
25 Melanoma
data$BMI <- data$Paino / (data$Pituus / 100)^2
26 Muu kuin tuumori
 
27 Tuplanäyte
# oprint(head(data))
      SP          sukupuoli 1: mies, 2: nainen K6
 
      KOULUV      kouluvuodet K8
cat("Data from P:\\huippuyksikko\\Tutkimus\\R16_sarkooma\\Data\\Panulle20031216\\Analyysi020712_typistetty.xls.", nrow(data), "observations.\n")
      PITUUS      pituus, cm K15
 
      PAINO      paino nyt, kg K16
oprint(cor(x = data, use = "pairwise.complete.obs", method = "pearson"))
      KALAA      kalansyönti K21
 
1 Harvemmin kuin kerran kuukaudessa tai en lainkaan
# Basic Scatterplot Matrix
2 Kerran tai pari kuukaudessa
pairs(~ika+BMI+Kalaa+Silakkaa+PF23478+WHOTEQ, data = data,
3 Kerran viikossa
   main="Simple Scatterplot Matrix")
4 Pari kertaa viikossa
 
5 Lähes joka päivä
</rcode>
6 Kerran päivässä tai useammin
 
      PETOKALA    K22 (muuttujaan _YRI_ISI asti)
===Variable information===
0 En lainkaan
 
1 Harvemmin kuin kerran kuukaudessa
The variable information was originally documented in [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], but unfortunately mostly in Finnish.
2 Kerran tai pari kuukaudessa
 
3 Kerran viikossa
{{hidden|
4 Pari kertaa viikossa
1 =
5 Lähes joka päivä
The variable list should also be recorded here, because it has not been documented properly anywhere else.
      MUIKKUA   
       LEIKKAUS   leikkauspvm, SASin oma formaatti (päivää jostakin kiintopisteestä?)
      SIS_VESI   
       IKA        ikä vuosina leikkauspäivänä
      KIRJOLOH   
       ALUENRO    aluenro tutkimusalue (potilaan kotiosoitteen postinumeron perusteella)
      SILAKKAA   
1 -
      IT_MEREN   
2 Espoo
      MUUTA_IT   
3 Helsinki
      PAKASTEK   
4 Hyvinkää
      KALAS_IL   
5 Hämeenlinna
      VALTAMER   
6 Joensuu
      _YRI_ISI   
7 Jyväskylä
      SADE        sädehoito K47
8 Kotka
      VASTATTU   onko kysely palautettu? YES/NO
9 Kuopio
      SADEAIK    aikaisempi sädehoito (yhdistelty kyselyn ja sairaalan tiedoista)
10 Lahti
      TCDF2378    kongeneerispesifinen pitoisuus ng/kg (absoluuttiyksiköt rasvassa)
11 Lappeenranta
      TCDD2378   
  12 Pori
      PF12378   
13 Seinäjoki
      PF23478   
14 Tampere
      PD12378   
15 Turku
      HF123478   
16 Vaasa
      HF123678   
       PNRO       postinumero K3 (K#=kyselylomakkeen kysymys nro #)
      HF234678   
       STRATUM    stratum eli tapauksen numero ilman S-etuliitettä
      HF123789   
       STRATASE   strataset eli tässä analyysissä 9
      HD123478   
       DGLUOKKA   diagnoosiluokka tapauksilla
      HD123678   
DG_id DGnimi
       HD123789   
1 MFH
      F1234678    
2 Liposarcoma
       F1234789   
3 Leiomyosarcoma
       D1234678   
  4 Angiosarcoma
      OCDF       
5 Chondrosarcoma
      OCDD       
6 Sarcoma synoviale
      TOXSUM      edellä mainittujen 17 kongeneerin summa
7 Sarcoma Ewing
      WHOTEQ      edellä mainittujen 17 kongeneerin WHO-TEF-kertoimella painotettu summa
8 Dermatofibrosarcoma
      TUPAK      oletko koskaan tupakoinut 1=ei, 2=kyllä K30
  9 Sarcoma alia
      TSTATUS    tupakointi: 1= < 6 kk sitten, 2= > 6 kk sitten, 3=ei koskaan K30+K32
  10 Sarcoma NUD
      PACKYEAR    askivuodet: poltetut vuodet K31*( eri valmisteiden summa kpl/pvä K33)/20
  11 Ei tietoa
  NULL, jos K30=2 mutta K32=NULL
  12 Osteosarcoma extrasceletale
      KEMIK      monelleko kemikaaliryhmälle on altistunut? K42
  21 Lipoma
  Onko altistunut 1=en, 2=kyllä (K42 muuttujaan MUU2 asti)
  22 Tumor Desmoides
      LIUOTTI    liuottimet
  23 Myxoma
      MAALIT      liuotinpohjaiset maalit
  24 Muu benigni tuumori
       FORMALDE    formaldehydi
  25 Melanoma
       HYONTEIS    hyönteismyrkyt
  26 Muu kuin tuumori
       KASVINSU    kasvinsuojeluaineet
27 Tuplanäyte
       KYLLASTE   puunkyllästeet
      SP          sukupuoli 1: mies, 2: nainen K6
       PESUAINE   voimakkaat pesuaineet
       KOULUV      kouluvuodet K8
      METALLIT    raskasmetallit
      PITUUS      pituus, cm K15
      MUU1        muu
      PAINO      paino nyt, kg K16
      MUU2        muu
      KALAA      kalansyönti K21
  K42 kuvauksen perusteella 0=ei altistusta 4=pahin altistus
1 Harvemmin kuin kerran kuukaudessa tai en lainkaan
      HYONLUOK    hyönteismyrkyt
2 Kerran tai pari kuukaudessa
      KASVLUOK    kasvinsuojeluaineet
3 Kerran viikossa
      KYLLLUOK    puunkyllästeet
4 Pari kertaa viikossa
      ALKO        kuinka usein alkoholia K35
5 Lähes joka päivä
  1 päivittäin
6 Kerran päivässä tai useammin
  2 muutaman kerran viikossa
       PETOKALA   K22 (muuttujaan _YRI_ISI asti)
  3 noin kerran viikossa
0 En lainkaan
  4 pari kertaa kuukaudessa
1 Harvemmin kuin kerran kuukaudessa
  5 noin kerran kuukaudessa
2 Kerran tai pari kuukaudessa
  6 noin kerran parissa kuukaudessa
3 Kerran viikossa
  7 3-4 kertaa vuodessa
4 Pari kertaa viikossa
  8 pari kertaa vuodessa
5 Lähes joka päivä
  9 kerran vuodessa tai harvemmin
       MUIKKUA      
  10 en koskaan
       SIS_VESI   
      ALKOUS      K35 ilmoitettuna kertaa/viikko (7, 3, 1, 0.5, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
      KIRJOLOH   
       ALKOKULU    alkoholin kulutus annosta/viikko. Lasketaan K36=pal ja K35 avulla
      SILAKKAA   
alkokulu: IIf([pal] Is Null And [alko]>8,0,[alkous]*[alkomaar])
      IT_MEREN   
alkomaar  K36 ilmoitettuna annosta/kerta: 0,1,2,3,5,8,12 annosta
      MUUTA_IT   
       ASUKKAIT   potilaan kotikunnan asukasluku (vuonna 1996? Pitäisikö tarkistaa mistä tilasto on?)
      PAKASTEK   
      bmi        paino / pituus / pituus * 10000
      KALAS_IL   
      mkala      Nämä kolme kalamuuttujaa on saatu regressiomallilla sovittamalla luokittelu-
       VALTAMER   
      mkaladio    muuttujat suoraan regressiomalliin. Yksiköt ovat siis epämääräisiä. Ehkä pitäisi
      _YRI_ISI   
       mkalamuu     muuttaa ensin kalamuuttujat muotoon annosta/kk niin olisi mielekäs tulkinta?
      SADE        sädehoito K47
       muu        if muu1 = 1 or muu2 = 1 then muu = 1 (muuten muu=0) HUOM! Tämä muu on laskettu
       VASTATTU    onko kysely palautettu? YES/NO
                  päin mäntyä: 1 tarkoittaa ei altistusta eli jos ei ole kahta altistusta, merkitään
       SADEAIK    aikaisempi sädehoito (yhdistelty kyselyn ja sairaalan tiedoista)
                  altistuneeksi ja jos ei ole tietoa tai on kaksi altistusta, merkitään
      TCDF2378    kongeneerispesifinen pitoisuus ng/kg (absoluuttiyksiköt rasvassa)
                  altistumattomaksi. ÄLÄ KÄYTÄ SIIS TÄTÄ MUUTTUJAA MISSÄÄN!
      TCDD2378   
       smoking    3-tstatus; 0=ei ikinä tupakoinut, 1= > 6 kk sitten, 2= < 6 kk sitten
      PF12378   
       never      smoking jaetaan 0->never=1, 1->former=1, 2->current=1
      PF23478   
      former
      PD12378   
       current
      HF123478   
mkala = 1.180840432 +
      HF123678   
      PETOKALA * 0.285437176 +
      HF234678   
      MUIKKUA  * 0.120796063 +
      HF123789   
      SIS_VESI * 0.081192631 +
      HD123478   
      KIRJOLOH * 0.432524410 +
      HD123678   
      SILAKKAA * 0.201481292 +
      HD123789   
      IT_MEREN * 0.186820358 +
      F1234678   
      MUUTA_IT *-0.096115539 +
      F1234789   
      PAKASTEK * 0.105839909 +
      D1234678   
      KALAS_IL *-0.059836406 +
      OCDF       
      VALTAMER * 0.095303307 +
      OCDD       
      _YRI_ISI * 0.033074683;
      TOXSUM      edellä mainittujen 17 kongeneerin summa
mkaladio = 1.180840432 +
      WHOTEQ      edellä mainittujen 17 kongeneerin WHO-TEF-kertoimella painotettu summa
      SILAKKAA * 0.201481292 +
      TUPAK       oletko koskaan tupakoinut 1=ei, 2=kyllä K30
      IT_MEREN * 0.186820358;
      TSTATUS    tupakointi: 1= < 6 kk sitten, 2= > 6 kk sitten, 3=ei koskaan K30+K32
mkalamuu = 1.180840432 +
      PACKYEAR    askivuodet: poltetut vuodet K31*( eri valmisteiden summa kpl/pvä K33)/20
      PETOKALA * 0.285437176 +
NULL, jos K30=2 mutta K32=NULL
       MUIKKUA  * 0.120796063 +
      KEMIK       monelleko kemikaaliryhmälle on altistunut? K42
      SIS_VESI * 0.081192631 +
  Onko altistunut 1=en, 2=kyllä (K42 muuttujaan MUU2 asti)
      KIRJOLOH * 0.432524410 +
      LIUOTTI    liuottimet
      MUUTA_IT *-0.096115539 +
      MAALIT      liuotinpohjaiset maalit
       PAKASTEK * 0.105839909 +
      FORMALDE    formaldehydi
      KALAS_IL *-0.059836406 +
      HYONTEIS    hyönteismyrkyt
      VALTAMER * 0.095303307 +
      KASVINSU    kasvinsuojeluaineet
      _YRI_ISI * 0.033074683;
      KYLLASTE    puunkyllästeet
}}
      PESUAINE    voimakkaat pesuaineet
 
      METALLIT    raskasmetallit
====Data management====
      MUU1        muu
 
      MUU2        muu
'''Code to manage the data.''' It takes the original data files and merges them. It works only if the files are available.
K42 kuvauksen perusteella 0=ei altistusta 4=pahin altistus
 
      HYONLUOK    hyönteismyrkyt
{{hidden|
      KASVLUOK    kasvinsuojeluaineet
<pre>
      KYLLLUOK    puunkyllästeet
# Sarcoma epidemiological data
      ALKO        kuinka usein alkoholia K35
# Data was obtained from this data file (saved 23.9.2002, version 12.7.2002)
1 päivittäin
# It contains all data that was used in the publication but not e.g. all questionnaire data.
2 muutaman kerran viikossa
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\Analyysit\Analyysi020712\Analyysi020712.xls
3 noin kerran viikossa
 
4 pari kertaa kuukaudessa
library(OpasnetUtils)
5 noin kerran kuukaudessa
 
6 noin kerran parissa kuukaudessa
sarc <- read.table(
7 3-4 kertaa vuodessa
"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Sarkooma/Analyysi020712_copy.csv",
8 pari kertaa vuodessa
sep = ",", header = TRUE
9 kerran vuodessa tai harvemmin
)
10 en koskaan
 
      ALKOUS      K35 ilmoitettuna kertaa/viikko (7, 3, 1, 0.5, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
colnames(sarc)[c(24, 27, 28, 30, 32)] <- c(
      ALKOKULU    alkoholin kulutus annosta/viikko. Lasketaan K36=pal ja K35 avulla
"Sisävesikalaa",
alkokulu: IIf([pal] Is Null And [alko]>8,0,[alkous]*[alkomaar])
"Itämeren.lohta",
alkomaar  K36 ilmoitettuna annosta/kerta: 0,1,2,3,5,8,12 annosta
"Muuta.Itämerestä",
      ASUKKAIT    potilaan kotikunnan asukasluku (vuonna 1996? Pitäisikö tarkistaa mistä tilasto on?)
"Kalasäilykkeitä",
      bmi        paino / pituus / pituus * 10000
"Äyriäisiä"
      mkala      Nämä kolme kalamuuttujaa on saatu regressiomallilla sovittamalla luokittelu-
)
      mkaladio    muuttujat suoraan regressiomalliin. Yksiköt ovat siis epämääräisiä. Ehkä pitäisi
 
      mkalamuu    muuttaa ensin kalamuuttujat muotoon annosta/kk niin olisi mielekäs tulkinta?
########################################### Questionnaire
      muu        if muu1 = 1 or muu2 = 1 then muu = 1 (muuten muu=0) HUOM! Tämä muu on laskettu
# The questions come from the questionnaire form
                  päin mäntyä: 1 tarkoittaa ei altistusta eli jos ei ole kahta altistusta, merkitään
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\KTL_sarcoma_study_questionnaire.odt
                  altistuneeksi ja jos ei ole tietoa tai on kaksi altistusta, merkitään
# The questionnaire data comes from
                  altistumattomaksi. ÄLÄ KÄYTÄ SIIS TÄTÄ MUUTTUJAA MISSÄÄN!
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R80_Sarkooma2\Data\Kyselyt.xls (Kyselyt.csv does not contain åäö)
      smoking    3-tstatus; 0=ei ikinä tupakoinut, 1= > 6 kk sitten, 2= < 6 kk sitten
# Kyselyt.xls was saved to N:\YMAL\Projects\Silakan riskiarvio\Data\Salaiset\Kyselyt.csv
      never      smoking jaetaan 0->never=1, 1->former=1, 2->current=1
 
      former
ques <- read.table(
      current
"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Kyselyt.csv",
mkala = 1.180840432 +
sep = ",", header = TRUE
      PETOKALA * 0.285437176 +
)
      MUIKKUA  * 0.120796063 +
 
      SIS_VESI * 0.081192631 +
poista <- c(
      KIRJOLOH * 0.432524410 +
"Pituus",
      SILAKKAA * 0.201481292 +
"Paino",
      IT_MEREN * 0.186820358 +
"Kalaa",
      MUUTA_IT *-0.096115539 +
"Petokalaa",
      PAKASTEK * 0.105839909 +
"Muikkua",
      KALAS_IL *-0.059836406 +
"Sisävesikalaa",
      VALTAMER * 0.095303307 +
"Kirjolohta",
      _YRI_ISI * 0.033074683;
"Silakkaa",
mkaladio = 1.180840432 +
"Itämeren.lohta",
      SILAKKAA * 0.201481292 +
"Muuta.itämerestä",
      IT_MEREN * 0.186820358;
"Pakastekalaa",
mkalamuu = 1.180840432 +
"Kalasäilykkeitä",
      PETOKALA * 0.285437176 +
"Valtamerikalaa",
      MUIKKUA  * 0.120796063 +
"Äyriäisiä",
      SIS_VESI * 0.081192631 +
"Vastattu",
      KIRJOLOH * 0.432524410 +
"AlkoKuinka.usein",
      MUUTA_IT *-0.096115539 +
"Onko1",
      PAKASTEK * 0.105839909 +
"Onko2",
      KALAS_IL *-0.059836406 +
"Onko3",
      VALTAMER * 0.095303307 +
"Onko4",
      _YRI_ISI * 0.033074683;
"Onko5",
}}
"Onko6",
"Onko7",
"Onko8",
"Onko9",
"Onko10"
)
 
# The tests below show that the questionnaire columns in sarc and ques are actually identical. Therefore
# we remove them from ques and keep those in sarc, the data file that was used in the publication.


===Data management===
#for(i in testaa) {
# x <- dat[[paste(i, ".x", sep = "")]]
# if(is.factor(x)) x <- as.numeric(x) # levels(x)[x]
# y <- dat[[paste(i, ".y", sep = "")]]
# if(is.factor(y)) y <- as.numeric(y) # levels(y)[y]
# print(paste(i, sum( x != y, na.rm = TRUE)))
#}


'''Code to manage the data.''' It takes the original data files and merges them. It works only if the files are available.
#sum(as.numeric(dat$alko) != as.numeric(dat$AlkoKuinka.usein), na.rm = TRUE)
 
#sum(as.numeric(dat$liuotti) != as.numeric(dat$Onko1), na.rm = TRUE)
{{hidden|
#sum(as.numeric(dat$maalit) != as.numeric(dat$Onko2), na.rm = TRUE)
<pre>
#sum(as.numeric(dat$formalde) != as.numeric(dat$Onko3), na.rm = TRUE)
# Sarcoma epidemiological data
#sum(as.numeric(dat$hyonteis) != as.numeric(dat$Onko4), na.rm = TRUE)
# Data was obtained from this data file (saved 23.9.2002, version 12.7.2002)
#sum(as.numeric(dat$Kasvinsu) != as.numeric(dat$Onko5), na.rm = TRUE)
# It contains all data that was used in the publication but not e.g. all questionnaire data.
#sum(as.numeric(dat$Kyllate) != as.numeric(dat$Onko6), na.rm = TRUE)
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\Analyysit\Analyysi020712\Analyysi020712.xls
#sum(as.numeric(dat$Pesuaine) != as.numeric(dat$Onko7), na.rm = TRUE)
#sum(as.numeric(dat$Metallit) != as.numeric(dat$Onko8), na.rm = TRUE)
#sum(as.numeric(dat$Muu1) != as.numeric(dat$Onko9), na.rm = TRUE)
#sum(as.numeric(dat$Muu2) != as.numeric(dat$Onko10), na.rm = TRUE)


library(OpasnetUtils)
ques <- ques[!colnames(ques) %in% poista]
dat <- merge(sarc, ques, by = "kysely_id", all.x = TRUE)


sarc <- read.table(
ruoat <- list(c(
"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Sarkooma/Analyysi020712_copy.csv",
"Harvemmin kuin kerran kuukaudessa tai en lainkaan",
sep = ",", header = TRUE
"Kerran tai pari kuukaudessa",
"Kerran viikossa",
"Pari kertaa viikossa",
"Lähes joka päivä",
"Kerran päivässä tai useammin"
),
NA,
TRUE
)
)


colnames(sarc)[c(24, 27, 28, 30, 32)] <- c(
kalat <- list(
"Sisävesikalaa",
c(
"Itämeren.lohta",
"En lainkaan",
"Muuta.Itämerestä",
"Harvemmin kuin kerran kuukaudessa",
"Kalasäilykkeitä",
"Kerran tai pari kuukaudessa",
"Äyriäisiä"
"Kerran viikossa",
"Pari kertaa viikossa",
"Lähes joka päivä"
),
0:5,
TRUE
)
)


########################################### Questionnaire
yn <- list(c("No", "Yes"), NA, TRUE)
# The questions come from the questionnaire form
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\KTL_sarcoma_study_questionnaire.odt
# The questionnaire data comes from
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R80_Sarkooma2\Data\Kyselyt.xls (Kyselyt.csv does not contain åäö)
# Kyselyt.xls was saved to N:\YMAL\Projects\Silakan riskiarvio\Data\Salaiset\Kyselyt.csv


ques <- read.table(
locs <- list(
"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Kyselyt.csv",
aluenro = c(
sep = ",", header = TRUE
"-",
)
"Espoo",
 
"Helsinki",
poista <- c(
"Hyvinkää",
"Pituus",
"Hämeenlinna",
"Paino",
"Joensuu",
"Kalaa",
"Jyväskylä",
"Petokalaa",
"Kotka",
"Muikkua",
"Kuopio",
"Sisävesikalaa",
"Lahti",
"Kirjolohta",
"Lappeenranta",
"Silakkaa",
"Pori",
"Itämeren.lohta",
"Seinäjoki",
"Muuta.itämerestä",
"Tampere",
"Pakastekalaa",
"Turku",
"Kalasäilykkeitä",
"Vaasa"
"Valtamerikalaa",
),
"Äyriäisiä",
dgluokka = list(
"Vastattu",
c(
"AlkoKuinka.usein",
"MFH",
"Onko1",
"Liposarcoma",
"Onko2",
"Leiomyosarcoma",
"Onko3",
"Angiosarcoma",
"Onko4",
"Chondrosarcoma",
"Onko5",
"Sarcoma synoviale",
"Onko6",
"Sarcoma Ewing",
"Onko7",
"Dermatofibrosarcoma",
"Onko8",
"Sarcoma alia",
"Onko9",
"Sarcoma NUD",
"Onko10"
"Ei tietoa",
)
"Osteosarcoma extrasceletale",
 
"Lipoma",
# The tests below show that the questionnaire columns in sarc and ques are actually identical. Therefore
"Tumor Desmoides",
# we remove them from ques and keep those in sarc, the data file that was used in the publication.
"Myxoma",
 
"Muu benigni tuumori",
#for(i in testaa) {
"Melanoma",
# x <- dat[[paste(i, ".x", sep = "")]]
"Muu kuin tuumori",
# if(is.factor(x)) x <- as.numeric(x) # levels(x)[x]
"Tuplanäyte"
# y <- dat[[paste(i, ".y", sep = "")]]
),
# if(is.factor(y)) y <- as.numeric(y) # levels(y)[y]
c(1:12, 21:27)
# print(paste(i, sum( x != y, na.rm = TRUE)))
),
#}
sp = c("Male", "Female"),
 
Kalaa = ruoat,
#sum(as.numeric(dat$alko) != as.numeric(dat$AlkoKuinka.usein), na.rm = TRUE)
Petokalaa = kalat,
#sum(as.numeric(dat$liuotti) != as.numeric(dat$Onko1), na.rm = TRUE)
Muikkua = kalat,
#sum(as.numeric(dat$maalit) != as.numeric(dat$Onko2), na.rm = TRUE)
Sisävesikalaa = kalat,
#sum(as.numeric(dat$formalde) != as.numeric(dat$Onko3), na.rm = TRUE)
Kirjolohta = kalat,
#sum(as.numeric(dat$hyonteis) != as.numeric(dat$Onko4), na.rm = TRUE)
Silakkaa = kalat,
#sum(as.numeric(dat$Kasvinsu) != as.numeric(dat$Onko5), na.rm = TRUE)
Itämeren.lohta = kalat,
#sum(as.numeric(dat$Kyllate) != as.numeric(dat$Onko6), na.rm = TRUE)
Muuta.Itämerestä = kalat,
#sum(as.numeric(dat$Pesuaine) != as.numeric(dat$Onko7), na.rm = TRUE)
Pakastekalaa = kalat,
#sum(as.numeric(dat$Metallit) != as.numeric(dat$Onko8), na.rm = TRUE)
Kalasäilykkeitä = kalat,
#sum(as.numeric(dat$Muu1) != as.numeric(dat$Onko9), na.rm = TRUE)
Valtamerikalaa = kalat,
#sum(as.numeric(dat$Muu2) != as.numeric(dat$Onko10), na.rm = TRUE)
Äyriäisiä = kalat,
 
sade = yn,
ques <- ques[!colnames(ques) %in% poista]
tupak = yn,
dat <- merge(sarc, ques, by = "kysely_id", all.x = TRUE)
tstatus = c("< 6 mo ago", "> 6 mo ago", "Never"),
 
liuotti = yn,
ruoat <- list(c(
maalit = yn,
"Harvemmin kuin kerran kuukaudessa tai en lainkaan",
formalde = yn,
"Kerran tai pari kuukaudessa",
hyonteis = yn,
"Kerran viikossa",
Kasvinsu = yn,
"Pari kertaa viikossa",
Kyllaste = yn,
"Lähes joka päivä",
Pesuaine = yn,
"Kerran päivässä tai useammin"
Metallit = yn,
),
Muu1 = yn,
NA,
Muu2 = yn,
TRUE
Hyonluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
)
Kasvluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
 
Kyllluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
kalat <- list(
alko = list(c(
c(
"en koskaan",
"En lainkaan",
"kerran vuodessa tai harvemmin",
"Harvemmin kuin kerran kuukaudessa",
"pari kertaa vuodessa",
"Kerran tai pari kuukaudessa",
"3-4 kertaa vuodessa",
"Kerran viikossa",
"noin kerran parissa kuukaudessa",
"Pari kertaa viikossa",
"noin kerran kuukaudessa",
"Lähes joka päivä"
"pari kertaa kuukaudessa",
"noin kerran viikossa",
"muutaman kerran viikossa",
"päivittäin"
),
),
0:5,
10:1,
TRUE
TRUE
)
yn <- list(c("No", "Yes"), NA, TRUE)
locs <- list(
aluenro = c(
"-",
"Espoo",
"Helsinki",
"Hyvinkää",
"Hämeenlinna",
"Joensuu",
"Jyväskylä",
"Kotka",
"Kuopio",
"Lahti",
"Lappeenranta",
"Pori",
"Seinäjoki",
"Tampere",
"Turku",
"Vaasa"
),
),
dgluokka = list(
Koulutus = list(c(
c(
"Kansakoulu tai peruskoulu",
"MFH",
"Keskikoulu",
"Liposarcoma",
"Ammattikoulu tai vastaava",
"Leiomyosarcoma",
"Opistotutkinto ja/tai lukio",
"Angiosarcoma",
"Akateeminen tutkinto"
"Chondrosarcoma",
"Sarcoma synoviale",
"Sarcoma Ewing",
"Dermatofibrosarcoma",
"Sarcoma alia",
"Sarcoma NUD",
"Ei tietoa",
"Osteosarcoma extrasceletale",
"Lipoma",
"Tumor Desmoides",
"Myxoma",
"Muu benigni tuumori",
"Melanoma",
"Muu kuin tuumori",
"Tuplanäyte"
),
c(1:12, 21:27)
),
),
sp = c("Male", "Female"),
NA,
Kalaa = ruoat,
TRUE
Petokalaa = kalat,
),
Muikkua = kalat,
Työntekijäryhmä = c(
Sisävesikalaa = kalat,
"Ylempi toimihenkilö",
Kirjolohta = kalat,
"Alempi toimihenkilö",
Silakkaa = kalat,
"Työntekijä",
Itämeren.lohta = kalat,
"Maanviljelijä",
Muuta.Itämerestä = kalat,
"Yrittäjä",
Pakastekalaa = kalat,
"Opiskelija",
Kalasäilykkeitä = kalat,
"Eläkeläinen",
Valtamerikalaa = kalat,
"Kotirouva",
Äyriäisiä = kalat,
"Työtön"
sade = yn,
),
tupak = yn,
Painonmuutos = list(c(
tstatus = c("< 6 mo ago", "> 6 mo ago", "Never"),
"Olen laihtunut",
liuotti = yn,
"Painoni ei ole juuri muuttunut",
maalit = yn,
"Olen lihonut ja laihtunut",
formalde = yn,
"Olen lihonut"
hyonteis = yn,
),
Kasvinsu = yn,
c(3, 2, 13, 1),
Kyllaste = yn,
TRUE
Pesuaine = yn,
),
Metallit = yn,
Ruokavalio = c(
Muu1 = yn,
"ei erityisruokavaliota",
Muu2 = yn,
"kasvisruoka sekä maito- ja munatuotteet",
Hyonluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
"ainoastaan kasvisruokavalio",
Kasvluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
"gluteeniton",
Kyllluok = list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
"maidoton (ei edes hyla-tuotteita)",
alko = list(c(
"muu, tarkempi kuvaus"
"en koskaan",
"kerran vuodessa tai harvemmin",
"pari kertaa vuodessa",
"3-4 kertaa vuodessa",
"noin kerran parissa kuukaudessa",
"noin kerran kuukaudessa",
"pari kertaa kuukaudessa",
"noin kerran viikossa",
"muutaman kerran viikossa",
"päivittäin"
),
),
10:1,
Leipää = ruoat,
TRUE
Puuroja = ruoat,
),
Makaronia = ruoat,
Koulutus = list(c(
Muutaviljaa = ruoat,
"Kansakoulu tai peruskoulu",
Viiliä = ruoat,
"Keskikoulu",
Juustoja = ruoat,
"Ammattikoulu tai vastaava",
Rasvaisia.juustoja = ruoat,
"Opistotutkinto ja/tai lukio",
Jäätelöä = ruoat,
"Akateeminen tutkinto"
Liharuokaa = ruoat,
Mitä.maitoa = list(c(
"en juo maitoa enkä piimää",
"rasvatonta maitoa",
"rasvatonta piimää tai kirnupiimää",
"muuta piimää",
"ykkösmaitoa",
"kevytmaitoa",
"täysmaitoa"
),
c(7, 4, 5, 6, 3, 2, 1),
TRUE
),
# Mitä.piimää (which sour milk) is missing from here. Or is it? Has it been combined with the milk variable?
Mitä.leivälle = list(c(
"En mitään",
"Kasvimargariinia",
"Voi-kasvirasvaseosta",
"Voita"
),
),
NA,
NA,
TRUE
TRUE
),
),
Työntekijäryhmä = c(
Paljonko.rasvaa = list(c(
"Ylempi toimihenkilö",
"En lainkaan",
"Alempi toimihenkilö",
"Voinapilla voitelen kolme viipaletta tai enemmän",
"Työntekijä",
"Voinapilla voitelen 1-2 viipaletta",
"Maanviljelijä",
"Käytän enemmän kuin yhden voinapin viipaletta kohti"
"Yrittäjä",
),
"Opiskelija",
NA,
"Eläkeläinen",
TRUE
"Kotirouva",
"Työtön"
),
),
Painonmuutos = list(c(
Mitä.rasvaa = list(c(
"Olen laihtunut",
"Ei mitään rasvaa",
"Painoni ei ole juuri muuttunut",
"Kasviöljyä",
"Olen lihonut ja laihtunut",
"Kasvimargariinia",
"Olen lihonut"
"Talousmargariinia",
"Voi-kasviöljyseosta",
"Voita"
),
),
c(3, 2, 13, 1),
c(6, 1, 2, 3, 4, 5),
TRUE
TRUE
),
),
Ruokavalio = c(
Ruokamuutos = c(
"ei erityisruokavaliota",
"Ei",
"kasvisruoka sekä maito- ja munatuotteet",
"Kyllä"
"ainoastaan kasvisruokavalio",
),
"gluteeniton",
Vesilähde = c(
"maidoton (ei edes hyla-tuotteita)",
"kunnan vesijohtovettä",
"muu, tarkempi kuvaus"
"oman kaivon vettä",
"kaupan pullotettua vettä",
"muuta"
),
),
Leipää = ruoat,
Tupakointi = c(
Puuroja = ruoat,
"En",
Makaronia = ruoat,
"Kyllä"
Muutaviljaa = ruoat,
),
Viiliä = ruoat,
TupakSäännöllisyys = c(
Juustoja = ruoat,
"en ole koskaan tupakoinut säännöllisesti",
Rasvaisia.juustoja = ruoat,
"olen tupakoinut säännöllisesti"
Jäätelöä = ruoat,
),
Liharuokaa = ruoat,
Tupakviimeksi = list(c(
Mitä.maitoa = list(c(
"yli 10 vuotta sitten",
"en juo maitoa enkä piimää",
"6 - 10 vuotta sitten",
"rasvatonta maitoa",
"1 - 5 vuotta sitten",
"rasvatonta piimää tai kirnupiimää",
"puoli vuotta - vuosi sitten",
"muuta piimää",
"1 kk - puoli vuotta sitten",
"ykkösmaitoa",
"2 pv - 1 kk sitten",
"kevytmaitoa",
"eilen tai tänään"
"täysmaitoa"
),
),
c(7, 4, 5, 6, 3, 2, 1),
7:1,
TRUE
TRUE
),
),
# Tästä välistä puuttuu Mitä.piimää. Vai puuttuuko? Onko yhdistetty maitoon?
Alkoholi = list(c(
Mitä.leivälle = list(c(
"en ole koskaan käyttänyt alkoholijuomia",
"En mitään",
"en, olen lopettanut alkoholin käytön kokonaan",
"Kasvimargariinia",
"kyllä, harvemmin kuin kerran kuussa",
"Voi-kasvirasvaseosta",
"kyllä, vähintään kerran kuussa"
"Voita"
),
),
NA,
4:1,
TRUE
TRUE
),
),
Paljonko.rasvaa = list(c(
AlkoKuinka.paljon = list(c(
"En lainkaan",
"vähemmän kuin yhden annoksen",
"Voinapilla voitelen kolme viipaletta tai enemmän",
"1 annoksen",
"Voinapilla voitelen 1-2 viipaletta",
"2 annosta",
"Käytän enemmän kuin yhden voinapin viipaletta kohti"
"3 annosta",
"4-5 annosta",
"6-10 annosta",
"yli 10 annosta"
),
),
NA,
NA,
TRUE
TRUE
),
),
Mitä.rasvaa = list(c(
Asuntotyyppi = c(
"Ei mitään rasvaa",
"omakotitalossa",
"Kasviöljyä",
"rivitalossa",
"Kasvimargariinia",
"kerrostalossa"
"Talousmargariinia",
"Voi-kasviöljyseosta",
"Voita"
),
),
c(6, 1, 2, 3, 4, 5),
Lämmitystyyppi = c(
TRUE
"kaukolämpö",
"öljylämmitys",
"sähkölämmitys",
"puulämmitys",
"muu"
),
),
Ruokamuutos = c(
Paikkaus = c(
"Ei",
"Ei ole",
"Kyllä"
"Kyllä"
),
),
Vesilähde = c(
PaikkaPoisto = yn,
"kunnan vesijohtovettä",
Montako.paikkaa = list(c(
"oman kaivon vettä",
"ei yhtään",
"kaupan pullotettua vettä",
"1 - 2",
"muuta"
"3 - 6",
"7 - 15",
"yli 15"
),
),
Tupakointi = c(
NA,  
"En",
"Kyllä"
),
TupakSäännöllisyys = c(
"en ole koskaan tupakoinut säännöllisesti",
"olen tupakoinut säännöllisesti"
),
Tupakviimeksi = list(c(
"yli 10 vuotta sitten",
"6 - 10 vuotta sitten",
"1 - 5 vuotta sitten",
"puoli vuotta - vuosi sitten",
"1 kk - puoli vuotta sitten",
"2 pv - 1 kk sitten",
"eilen tai tänään"
),
7:1,
TRUE
TRUE
),
),
Alkoholi = list(c(
Sairaus = yn,
"en ole koskaan käyttänyt alkoholijuomia",
Elinsiirto = yn,
"en, olen lopettanut alkoholin käytön kokonaan",
Tekonivel = yn,
"kyllä, harvemmin kuin kerran kuussa",
Muu = yn,
"kyllä, vähintään kerran kuussa"
Sädehoito = yn,
),
Vierasesine = yn,
4:1,
AIDS = yn,
TRUE
Neurofibr = yn,
),
vonHippel = yn
AlkoKuinka.paljon = list(c(
)
"vähemmän kuin yhden annoksen",
 
"1 annoksen",
for(i in names(locs)) {
"2 annosta",
if(!is.list(locs[[i]])) locs[[i]] <- list(locs[[i]], NA)
"3 annosta",
if(!is.numeric(locs[[i]][[2]])) locs[[i]][[2]] <- 1:length(locs[[i]][[1]])
"4-5 annosta",
if(length(locs[[i]]) < 3) locs[[i]][[3]] <- FALSE
"6-10 annosta",
if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
"yli 10 annosta"
dat[[i]] <- factor(dat[[i]], levels = locs[[i]][[2]], labels = locs[[i]][[1]], ordered = locs[[i]][[3]])
),
if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
NA,
}
TRUE
 
),
levels(dat$ikäAlin)[levels(dat$ikäAlin) == "20.5"] <- "20"
Asuntotyyppi = c(
dat$ikaA1 <- as.numeric(substr(dat$ikäAlin, 1, 2))
"omakotitalossa",
dat$ikaA2 <- as.character(dat$ikäAlin)
"rivitalossa",
dat$ikaA2 <- as.numeric(substr(dat$ikaA2, nchar(dat$ikaA2)-1, nchar(dat$ikaA2)))
"kerrostalossa"
 
),
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "20.5"] <- "20"
Lämmitystyyppi = c(
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "21.5"] <- "21"
"kaukolämpö",
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "28.5"] <- "28"
"öljylämmitys",
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "121"] <- "21"
"sähkölämmitys",
dat$ikaY1 <- as.numeric(substr(dat$ikäYlin, 1, 2))
"puulämmitys",
dat$ikaY2 <- as.character(dat$ikäYlin)
"muu"
dat$ikaY2 <- as.numeric(gsub("[- ,]", "", substr(dat$ikaY2, nchar(dat$ikaY2)-2, nchar(dat$ikaY2))))
),
 
Paikkaus = c(
dat$Pnetto <- as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][1]}))
"Ei ole",
dat$Ppoikk <- -as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][2]}))
"Kyllä"
dat$Ppoikk[is.na(dat$Ppoikk)] <- 0
),
dat$Pnetto <- ifelse(dat$Painonmuutos %in% c("Olen laihtunut"), -dat$Pnetto, dat$Pnetto)
PaikkaPoisto = yn,
dat$Pnetto[dat$Painonmuutos == "Painoni ei ole juuri muuttunut"] <- 0
Montako.paikkaa = list(c(
dat$Pnetto <- dat$Pnetto + dat$Ppoikk
"ei yhtään",
 
"1 - 2",
# Then we'll input values for NA based on the averages of the respective subgroups
"3 - 6",
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut"] <- 8
"7 - 15",
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen laihtunut"] <- -8
"yli 15"
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut ja laihtunut"] <- 1
),
NA,  
TRUE
),
Sairaus = yn,
Elinsiirto = yn,
Tekonivel = yn,
Muu = yn,
Sädehoito = yn,
Vierasesine = yn,
AIDS = yn,
Neurofibr = yn,
vonHippel = yn
)


for(i in names(locs)) {
dat$Rintamaitoa1[is.na(dat$Rintamaitoa1)] <- 0
if(!is.list(locs[[i]])) locs[[i]] <- list(locs[[i]], NA)
dat$Rintamaitoa2[is.na(dat$Rintamaitoa2)] <- 0
if(!is.numeric(locs[[i]][[2]])) locs[[i]][[2]] <- 1:length(locs[[i]][[1]])
dat$Rintamaitoa3[is.na(dat$Rintamaitoa3)] <- 0
if(length(locs[[i]]) < 3) locs[[i]][[3]] <- FALSE
dat$Rintamaitoa4[is.na(dat$Rintamaitoa4)] <- 0
if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
dat$Rintamaitoa5[is.na(dat$Rintamaitoa5)] <- 0
dat[[i]] <- factor(dat[[i]], levels = locs[[i]][[2]], labels = locs[[i]][[1]], ordered = locs[[i]][[3]])
dat$Rintamaitoa6[is.na(dat$Rintamaitoa6)] <- 0
if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
dat$Rintamaitoa7[is.na(dat$Rintamaitoa7)] <- 0
}
dat$Rintamaitoa8[is.na(dat$Rintamaitoa8)] <- 0
dat$Rintamaitoa <- dat$Rintamaitoa1 + dat$Rintamaitoa2 + dat$Rintamaitoa3 + dat$Rintamaitoa4 +
dat$Rintamaitoa5 + dat$Rintamaitoa6 + dat$Rintamaitoa7 + dat$Rintamaitoa8


levels(dat$ikäAlin)[levels(dat$ikäAlin) == "20.5"] <- "20"
### Graphs about WHOTEQ as a function of age and BMI.
dat$ikaA1 <- as.numeric(substr(dat$ikäAlin, 1, 2))
dat$ikaA2 <- as.character(dat$ikäAlin)
dat$ikaA2 <- as.numeric(substr(dat$ikaA2, nchar(dat$ikaA2)-1, nchar(dat$ikaA2)))


levels(dat$ikäYlin)[levels(dat$ikäYlin) == "20.5"] <- "20"
if(FALSE) {
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "21.5"] <- "21"
ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2)) + scale_colour_gradientn(colours = rainbow(3))
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "28.5"] <- "28"
ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25))
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "121"] <- "21"
ggplot(dat, aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25)) + geom_point() + geom_smooth()
dat$ikaY1 <- as.numeric(substr(dat$ikäYlin, 1, 2))
dat$ikaY2 <- as.character(dat$ikäYlin)
dat$ikaY2 <- as.numeric(gsub("[- ,]", "", substr(dat$ikaY2, nchar(dat$ikaY2)-2, nchar(dat$ikaY2))))


dat$Pnetto <- as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][1]}))
for(i in colnames(dat)[c(4, 6:8, 14:16, 18:35, 37:76, 82:84, 101:138, 144:152, 154:162)]) {#182, 184:199, 201:205, 238, 247, 267:268)]) {
dat$Ppoikk <- -as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][2]}))
print(ggplot(dat, aes_string(x = i)) + geom_bar() + labs(title = i))
dat$Ppoikk[is.na(dat$Ppoikk)] <- 0
#  par(ask = interactive()) # This makes R to wait for enter before continuing
dat$Pnetto <- ifelse(dat$Painonmuutos %in% c("Olen laihtunut"), -dat$Pnetto, dat$Pnetto)
}
dat$Pnetto[dat$Painonmuutos == "Painoni ei ole juuri muuttunut"] <- 0
dat$Pnetto <- dat$Pnetto + dat$Ppoikk


# Then we'll input values for NA based on the averages of the respective subgroups
}
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut"] <- 8
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen laihtunut"] <- -8
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut ja laihtunut"] <- 1


dat$Rintamaitoa1[is.na(dat$Rintamaitoa1)] <- 0
temp <- as.character(dat$leikkaus_dt)
dat$Rintamaitoa2[is.na(dat$Rintamaitoa2)] <- 0
temp <- as.numeric(substr(temp, nchar(temp) - 1, nchar(temp)))
dat$Rintamaitoa3[is.na(dat$Rintamaitoa3)] <- 0
temp <- temp - pmax(
dat$Rintamaitoa4[is.na(dat$Rintamaitoa4)] <- 0
dat$Vuosi1,  
dat$Rintamaitoa5[is.na(dat$Rintamaitoa5)] <- 0
dat$Vuosi2,  
dat$Rintamaitoa6[is.na(dat$Rintamaitoa6)] <- 0
dat$Vuosi3,  
dat$Rintamaitoa7[is.na(dat$Rintamaitoa7)] <- 0
dat$Vuosi4,  
dat$Rintamaitoa8[is.na(dat$Rintamaitoa8)] <- 0
dat$Vuosi5,  
dat$Rintamaitoa <- dat$Rintamaitoa1 + dat$Rintamaitoa2 + dat$Rintamaitoa3 + dat$Rintamaitoa4 +
dat$Rintamaitoa5 + dat$Rintamaitoa6 + dat$Rintamaitoa7 + dat$Rintamaitoa8
 
### Graphs about WHOTEQ as a function of age and BMI.
 
if(FALSE) {
ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2)) + scale_colour_gradientn(colours = rainbow(3))
ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25))
ggplot(dat, aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25)) + geom_point() + geom_smooth()
 
for(i in colnames(dat)[c(4, 6:8, 14:16, 18:35, 37:76, 82:84, 101:138, 144:152, 154:162)]) {#182, 184:199, 201:205, 238, 247, 267:268)]) {
print(ggplot(dat, aes_string(x = i)) + geom_bar() + labs(title = i))
#  par(ask = interactive()) # This makes R to wait for enter before continuing
}
 
}
 
temp <- as.character(dat$leikkaus_dt)
temp <- as.numeric(substr(temp, nchar(temp) - 1, nchar(temp)))
temp <- temp - pmax(
dat$Vuosi1,  
dat$Vuosi2,  
dat$Vuosi3,  
dat$Vuosi4,  
dat$Vuosi5,  
dat$Vuosi6,  
dat$Vuosi6,  
dat$Vuosi7,  
dat$Vuosi7,  
Line 1,040: Line 970:
}}
}}


=== Intake data ===
==== Assumptions ====


How much mass, energy, and dioxin does one portion contain? The data are either guesswork or taken from [http://www.fineli.fi Fineli].
The following assumptions are used to interpret survey answers:
 
<t2b name="Assumptions for calculations" index="Variable,Value,Unit" obs="Result" desc="Description,Kysymys suomeksi" unit="-">
Q23||dl /glass|2|Decilitres of milk or sour milk per glass|
Q24|1|times /a|0|Never|
Q24|2|times /a|0.5 - 0.9|less than once a year|
Q24|3|times /a|2 - 5|A few times a year|
Q24|4|times /a|12 - 36|1 - 3 times per month|
Q24|5|times /a|52|once a week|
Q24|6|times /a|104 - 208|2 - 4 times per week|
Q24|7|times /a|260 - 364|5 or more times per week|
</t2b>
 
How much mass, energy, and dioxin does one portion contain? The data are either guesswork or taken from [http://www.fineli.fi Fineli].


<t2b name="Food energy and dioxin" index="Food,Observation" locations="Mass,Energy,Dioxin" unit="g,kJ,pg/portion">
<t2b name="Food energy and dioxin" index="Food,Observation" locations="Mass,Energy,Dioxin" unit="g,kJ,pg/portion">
Line 1,081: Line 1,024:
</t2b>
</t2b>


=== POPs and obesity ===
=== Analyses ===
 
====Simulated data====
 
; This code was used to create a CSV file that contains simulated data from this study. Compared with the original data, the simulated data
* has the same number of observations,
* has the same range of values in each variable,
* has approximately the same correlation structure between all variables.
 
<rcode>
library(OpasnetUtils)
library(MASS)
library(mc2d)
library(reshape2)
library(ggplot2)
 
objects.get("isqT7nvhd0ViUR7d")
 
data <- objects.decode(etable, password)
colnames(data) <- t(data[1, ])
data <- data[2:nrow(data), 2:ncol(data)]
 
data2 <- data
fun <- c(rep("normal", 5), rep("poisson", 12), rep("lognormal", 19))
 
params <- list()
 
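for(i in 1:ncol(data2)) {
  data2[[i]] <- as.numeric(as.character(data2[[i]]))
  # Lognormal fitting needs strictly positive values, so zeros are nudged to 0.01.
  if(i > 17) data2[[i]] <- ifelse(data2[[i]] == 0, 0.01, data2[[i]])
  # Store only the parameter estimates of the fitted distribution.
  params[[i]] <- fitdistr(data2[[i]][!is.na(data2[[i]])], fun[i])$estimate
}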
 
simu <- data.frame(temp = rep(NA, 968))
 
for(i in 1:5) {
simu[[i]] <- rnorm(968, params[[i]][1], params[[i]][2])
}
for(i in 6:17) {
simu[[i]] <- rpois(968, params[[i]])
}
for(i in 18:36) {
simu[[i]] <- rlnorm(968, params[[i]][1], params[[i]][2])
}
simu[[3]] <- rbern(968, 0.5) + 1
 
colnames(simu) <- colnames(data)
 
korre <- cor(x = data2, use = "pairwise.complete.obs", method = "spearman")
 
simu <- as.data.frame(cornode(as.matrix(simu), target = korre))
 
korre2 <- cor(x = simu, use = "pairwise.complete.obs", method = "spearman")
 
qplot(melt(korre)$value, melt(korre2)$value)
 
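# Values outside the observed range are set to NA.
# The numeric copy (data2) is used because data still holds the raw, non-numeric columns.
for(i in 1:ncol(simu)) {
  simu[[i]] <- ifelse(
    simu[[i]] > max(data2[[i]], na.rm = TRUE) |
    simu[[i]] < min(data2[[i]], na.rm = TRUE),
    NA, simu[[i]]
  )
}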
 
for(i in 1:ncol(data2)) {print(paste(
min(data2[[i]], na.rm = TRUE),
max(data2[[i]], na.rm = TRUE),
min(simu[[i]], na.rm = TRUE),
max(simu[[i]], na.rm = TRUE)
))}
 
</rcode>
 
==== POPs and obesity ====


Dioxins and PCBs have been associated with type 2 diabetes. Do dioxins cause diabetes, does diabetes decrease dioxin elimination, does obesity both increase diabetes risk and decrease dioxin elimination, or is something else going on? We tried to make sense of this by looking at the sarcoma study data.
Dioxins and PCBs have been associated with type 2 diabetes. Do dioxins cause diabetes, does diabetes decrease dioxin elimination, does obesity both increase diabetes risk and decrease dioxin elimination, or is something else going on? We tried to make sense of this by looking at the sarcoma study data.
Line 1,099: Line 1,115:
</rcode>
</rcode>


=== Self-reported chemical exposure ===
==== Self-reported chemical exposure ====


We looked at self-reported chemical exposure, especially pesticides and wood preservatives.
We looked at self-reported chemical exposure, especially pesticides and wood preservatives.
Line 1,265: Line 1,281:
</rcode>
</rcode>


=== Correlation of dioxin and fish ===
==== Correlation of dioxin and fish ====


How do individual dioxin congeners correlate with individual fish parameters in the questionnaire?
How do individual dioxin congeners correlate with individual fish parameters in the questionnaire?
Line 1,668: Line 1,684:
}}
}}


=== EU kalat ===
==== EU kalat ====


* The code that used to be here was moved to [[EU-kalat#Calculations]].
* The code that used to be here was moved to [[EU-kalat#Calculations]].

Revision as of 12:36, 10 May 2017



Question

Because it is obvious that there is a great need for improved exposure assessment in studying cancer risk of dioxins, we decided to undertake the major effort of conducting a large case-control study on soft-tissue sarcoma and measure dioxin concentrations individually in both patients and controls. Because this can be done accurately only from very large blood samples or from fat samples taken during an operation, we studied STS patients coming to surgery because of their tumor and selected appendicitis patients as controls. In the general population, the exposure to dioxins is almost totally from dietary sources (in Finland mostly from fish), and it varies widely among the population. Because of the extremely long half-life of dioxins, measured levels of dioxin at the time of operation can be used to estimate the lifetime cumulative exposure accurately. There is a priori no simultaneous exposure to chlorophenols or phenoxy acid herbicides, which behave completely differently in the environment, have relatively short half-lives in humans and are excreted in a few days. This enables us to estimate the association of STS with clean dioxin exposure without concomitant exposure to the main chemical, in contrast to occupational studies.[1]

Answer

There is simulated data available about the study. For details, see #Simulated data.


Main fish consumption and PCDD/F variables

Some plots about dioxin congeners.


Rationale

Methods

Study population

The majority of sarcoma patients in southern Finland are treated by the multidisciplinary sarcoma group of Helsinki University Central Hospital, with the remaining cases in the University Hospitals of Kuopio, Turku, or Tampere. All patients referred to these hospitals for operative treatment of STS between June 1997 (August 1996 in Helsinki) and December 1999 and more than 15 years of age were eligible as cases. The diagnoses were verified histologically for all except 7 patients. Sarcomas connected with known familial or genetic conditions, as well as sarcomas arising in visceral organs and bone, were excluded. Also other malignancies than STS, as well as nonmalignant tumors, were rejected. Some patients were operated twice during the study period; the second sample was not processed.

All patients who were operated due to an appendicitis diagnosis in a study hospital and who were more than 15 years of age were eligible as controls. They were collected from the same catchment area as the STS patients by dividing it into 15 areas (mainly according to former Finnish health care districts). One hospital performing appendectomy operations was recruited to the study from each area (in Helsinki, 2 hospitals). These were the university, central, or district hospitals of Helsinki, Hyvinkää, Hämeenlinna, Joensuu, Jyväskylä, Kotka, Kuopio, Lahti, Lappeenranta, Pori, Seinäjoki, Tampere, Turku and Vaasa, and the municipality hospitals in Espoo (Jorvi Hospital) and Helsinki (Maria Hospital). Informed consent was obtained from all patients in writing before the operation. The study was approved by the ethics committees of the National Public Health Institute and the hospitals involved.

The total number of patients recruited during the fieldwork was 972. One case was deleted due to missing address information, 1 case and 2 controls due to missing age information, and 3 cases and 11 controls since their fat samples were too small for dioxin analysis. As a result, we had 954 patients (148 cases and 806 controls) available for matching. The age range was 17.0–91.1 years for cases and 15.0–88.7 years for controls. Based on National Cancer Registry data, we caught 70%, 9%, 17% and 26% of STS patients in Helsinki, Turku, Tampere and Kuopio University hospital regions, respectively, during the study period (calendar years 1997–1999). In Helsinki, all patients treated surgically with correct diagnosis were caught and agreed to participate; those not caught were either treated nonsurgically or misdiagnosed. Based on hospital discharge registry data, we estimate that about one-fourth of appendicitis patients were caught on average during the most active collection period, but differences between hospitals were large.

The cases and controls were individually matched for area and age at the end of the fieldwork. This was done to ensure that there are enough controls from small areas and old age groups in the final data set, as it was not possible to analyze all recruited patients for dioxin. Area was defined based on the area of residence using the 15 areas described above. The age was determined at the day of operation. Maximum allowed difference in age between cases and controls was ± 3 years if case was < 38.0 years old, and ± 6 years if case was >= 38.0 years old. The control closest by age was matched to the case. Cases with fewer controls had a priority over cases with more controls. The number of controls per case was limited to 3. For 110 cases, 227 matching controls could be found in the pool. Thirty-nine cases had 1 control, 25 cases had 2 and 46 cases had 3 controls; for 38 cases, no control matching both age and area could be found.
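The matching rule above can be sketched in R. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names, the `controls` data frame and its values are hypothetical, and the priority given to cases with fewer available controls is not implemented here.

```r
# Maximum allowed case-control age difference (years), as described above:
# 3 years if the case is under 38.0 years old, otherwise 6 years.
max_age_diff <- function(case_age) ifelse(case_age < 38.0, 3, 6)

# Pick up to `max_controls` controls for one case: same area, age within the
# tolerance, closest ages first. `controls` is a hypothetical data frame
# with columns id, area and age.
match_controls <- function(case_age, case_area, controls, max_controls = 3) {
  diff <- abs(controls$age - case_age)
  ok <- controls$area == case_area & diff <= max_age_diff(case_age)
  cand <- controls[ok, , drop = FALSE]
  cand <- cand[order(diff[ok]), , drop = FALSE]
  head(cand$id, max_controls)
}

# Invented example pool of controls.
controls <- data.frame(
  id = c("c1", "c2", "c3", "c4", "c5"),
  area = c(1, 1, 1, 2, 1),
  age = c(40.5, 47.0, 52.5, 45.0, 44.0),
  stringsAsFactors = FALSE
)
matched <- match_controls(case_age = 45.0, case_area = 1, controls = controls)
print(matched)
```

In this toy pool, control c3 is outside the 6-year tolerance and c4 lives in another area, so neither is eligible for the 45-year-old case.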

Exposure assessment

From the matched 337 patients, concentrations of the 17 toxic polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs) were measured from a subcutaneous fat sample obtained during an appendectomy or sarcoma operation. Measurements were done by gas chromatography-mass spectrometry at the Laboratory of Chemistry, which is an accredited testing laboratory (T077) for the analysis of dioxins in human samples (current standard: EN ISO/IEC 17025) and has successfully participated in WHO/Euro intercalibrations. The concentrations were summed up after the value of each congener was multiplied by its relative toxic potency (toxic equivalency factor, TEF). The TEF values according to WHO were used, resulting in toxic equivalent concentrations (WHO-TEq). Fat samples were analyzed during and after the collection period. Samples from STS patients were always analyzed in a batch that also contained samples from appendicitis patients. All analytical work was performed blind so that the chemistry laboratory did not know the diagnosis of the patient. Quality assurance of analysis was performed by 2 separate means: 2 preformulated pools of human fat with different concentrations of dioxins [10.6 (n = 35) and 40.2 (n = 33) ng/kg (WHO-TEq in fat)] were always run with each lot of samples, and 36 individual fat samples with WHO-TEqs ranging from 6.9 to 116 ng/kg fat were analyzed as duplicates. The coefficients of variation for WHO-TEq in preformulated pools were 5.1% and 5.7%, respectively, and in duplicate analysis, 6.2%.[2]

A detailed questionnaire about socioeconomic and lifestyle factors and chemical exposures was given to the patients in the hospital. If the patient was found not to have received the questionnaire in the hospital or if the patient did not return it, a new copy was sent to the patient’s home address. Of the matched subjects, 84 cases (76%) and 185 controls (81%) have also questionnaire information.

Detailed exposure assessment

The concentrations of the 17 toxic PCDD/F congeners and of the 36 PCB congeners were measured from fat of a subcutaneous tissue sample (0.3–1.5 g of fat) which was obtained during an appendectomy or sarcoma operation. The toxic equivalents (WHOPCDD/F-TEQ and WHOPCB-TEQ) were calculated with the sets of toxic equivalency factors (TEF), recommended by WHO in 1998 (Van den Berg et al., 1998).
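As an illustration of the toxic equivalent calculation, the sketch below multiplies each congener concentration by its TEF and sums the products. It covers only five of the 17 congeners; the TEF values are the WHO 1998 factors as recalled here and should be checked against Van den Berg et al. (1998), and the concentrations are invented for the example.

```r
# WHO 1998 TEFs for five congeners (assumed from memory; verify against the
# published table before use).
tef <- c("2378-TCDD" = 1, "12378-PeCDD" = 1, "23478-PeCDF" = 0.5,
         "2378-TCDF" = 0.1, "123678-HxCDD" = 0.1)

# Hypothetical concentrations in fat (pg/g) for one person.
conc <- c("2378-TCDD" = 2.0, "12378-PeCDD" = 5.0, "23478-PeCDF" = 12.0,
          "2378-TCDF" = 1.0, "123678-HxCDD" = 30.0)

# Toxic equivalent: sum over congeners of concentration times TEF.
teq <- sum(conc[names(tef)] * tef)
print(teq)
```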

Fat from tissue sample was extracted with toluene for 18–24 h using the Soxhlet apparatus. The fat content was determined gravimetrically after changing the solvent to hexane using nonane as a keeper. Fat sample was spiked with a set of 13C-labeled internal standards: sixteen 2,3,7,8-chlorinated PCDD/F congeners, three non-ortho PCBs (PCB 77, 126, 169), and nine other PCBs (PCB 30 [12C-labeled], 80, 101, 105, 138, 153, 156, 180, 194).

The sample was defatted in a silica gel column containing acidic and neutral layers of silica, and all analytes were eluted with dichloromethane (DCM):cyclohexane (c-hexane) (1:1). PCDD/Fs were separated from PCBs on activated carbon column (Carbopack C, 60/80 mesh) containing Celite (Merck 2693). The first fraction including PCBs was eluted with DCM:c-hexane (1:1) following a back elution of the second fraction (PCDD/Fs) with toluene. Eluents from both of the fractions were evaporated using nonane as a keeper and then fractions in n-hexane were further cleaned by passing them through an activated alumina column (Merck 1097). The PCDD/F fraction was eluted from the alumina column with 20% DCM in n-hexane and recovery standards (13C 1,2,3,4-TCDD and 13C 1,2,3,7,8,9-HxCDD) were added to the fraction before DCM and n-hexane were replaced by 10-15 μl of nonane. The PCB fraction was eluted from the alumina column with 2% DCM in n-hexane, and the fraction, after changing the eluent to n-hexane, was transferred to another activated carbon column (without Celite) in order to separate the non-ortho PCBs from other PCBs. DCM (50%) in n-hexane was used to elute other PCBs while non-ortho PCBs were back eluted with toluene. Recovery standards, PCB 159 for other PCBs and 13C PCB 60 for non-ortho PCBs were added prior to analysis; the solvent for other PCBs (DCM:n-hexane, 1:1) was replaced by 300 μl of n-hexane, for non-ortho PCBs toluene was replaced by 10–15 μl of nonane. The quantitation was performed by selective ion recording mode using a VG 70–250 SE (VG Analytical, UK) mass spectrometer (resolution 10,000) equipped with a HP 6890 gas chromatograph with a fused silica capillary column (DB-DIOXIN, 60 m, 0.25 mm, 0.15 μm). Two μl were injected into a split-splitless injector at 270 °C. The temperature programs for PCDD/Fs, non-ortho-PCBs, and other PCBs were:

  • start, 140 °C (4 min), rate 20 °C min−1 to 180 °C (0 min), rate 2 °C min−1 to 270 °C (36 min);
  • start, 140 °C (4 min), rate 20 °C min−1 to 200 °C (0 min), rate 10 °C min−1 to 270 °C (12 min);
  • start, 60 °C (3 min), rate 20 °C min−1 to 200 °C (0 min), rate 4 °C min−1 to 270 °C (14 min); respectively.

Limits of quantitation (LOQ) for PCDD/Fs and non-ortho PCBs varied between 0.1–5 and 1–5 pg g−1 fat, respectively, and for other PCBs between 0.02 and 0.1 ng g−1 fat, depending on each individual congener. Recoveries for internal standards were more than 50% for all congeners. Concentrations were calculated with lower bound method in which the results of congeners with concentrations below the LOQ were designated as nil.
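The lower bound method described above amounts to replacing every result below its LOQ with zero before summing. A minimal sketch with invented numbers (the vectors `conc` and `loq` are hypothetical):

```r
# Hypothetical measured concentrations and congener-specific LOQs (pg/g fat).
conc <- c(0.8, 0.05, 3.2, 0.3)
loq <- c(0.1, 0.1, 1.0, 0.5)

# Lower bound: results below the LOQ are treated as nil.
conc_lb <- ifelse(conc < loq, 0, conc)
print(conc_lb)
```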

This code was used to upload the data to Opasnet Base:

+ Show code


Quality control and assurance

Fat samples were analyzed during and after the collection period 1997–1999. All analytical work was performed blind such that the chemistry laboratory knew only the code of the sample. The laboratory reagent and equipment blank samples were treated and analyzed with the same method as the actual samples, one blank for every eight to ten samples. Quality assurance of analysis was performed in two separate ways: (a) two preformulated pools of human fat with different concentrations of PCDD/Fs [10.6 (n = 35) and 40.2 (n = 33) pg g−1 (WHOPCDD/F-TEQ in fat)] and PCBs [4.72 and 24.2 pg g−1 (WHOPCB-TEQ), respectively] were always run with each lot of samples and (b) 36 individual fat samples with WHOPCDD/F-TEQs ranging from 6.9 to 116 pg g−1 and WHOPCB-TEQs from 4.6 to 95 pg g−1 were analyzed in duplicate. The coefficients of variation (CV) for WHOPCDD/F-TEQ in preformulated pools were 5.1% and 5.7%, respectively, and for WHOPCB-TEQ 12 and 9.0%, respectively. In duplicate analysis the CV was 6.2% for WHOPCDD/F-TEQ and 18% for WHOPCB-TEQ.
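For the preformulated pools, the coefficient of variation is the standard deviation of the repeated results divided by their mean. The sketch below shows that calculation with invented pool results; how the CV of the duplicate analyses was computed is not stated in the source, so it is not reproduced here.

```r
# Coefficient of variation in percent.
cv_percent <- function(x) 100 * sd(x) / mean(x)

# Hypothetical repeated WHO-TEQ results (pg/g fat) for one control pool.
pool <- c(10.1, 10.9, 10.6, 11.2, 10.2)
cv <- cv_percent(pool)
print(cv)
```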

The laboratory has successfully participated in several international quality control studies for the analysis of PCDD/Fs and PCBs. Matrices in these studies have included cow milk, human milk and human serum (Yrjänheikki, 1991; Rymen, 1994; WHO, 1996; Lindström et al., 2000). The laboratory of chemistry in the National Public Health Institute is an accredited testing laboratory (No T077) in Finland (EN ISO/IEC 17025). The scope of accreditation includes PCDD/Fs, non-ortho PCBs, and other PCBs from human tissue samples.

Statistical analyses

Conditional logistic regression analysis was performed with SAS PHREG procedure. Odds ratios were estimated for each quintile of WHO-TEq, the sum of the toxic congeners and the most relevant individual congeners, i.e., 2378-TCDD, 2378-TCDF, 12378-PeCDD, 23478-PeCDF and 123678-HxCDD (abbreviations: T, tetra; Pe, penta; Hx, hexa; Hp, hepta; O, octa; CDD, chlorinated dibenzo-p-dioxin; CDF, chlorinated dibenzofuran). In the other congener-specific analyses, exposures were treated as continuous variables and odds ratios were calculated for an increase of an interquartile range of the exposure.
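The original analysis was run with SAS PHREG. As a rough R counterpart, the sketch below fits a conditional logistic regression with `clogit()` from the survival package on simulated matched sets and converts the coefficient into an odds ratio per interquartile range. The data, the variable names and the effect size are invented, and no adjustment for sex is included.

```r
library(survival)

set.seed(1)
n_sets <- 60
# Each matched set has one case and two controls (simulated).
d <- data.frame(
  set = rep(seq_len(n_sets), each = 3),
  case = rep(c(1, 0, 0), times = n_sets)
)
# Simulated exposure; cases are drawn with a slightly higher log-mean.
d$exposure <- rlnorm(nrow(d), meanlog = log(20) + 0.3 * d$case, sdlog = 0.6)

# Conditional logistic regression, stratified by matched set.
fit <- clogit(case ~ exposure + strata(set), data = d)

# Odds ratio for an increase of one interquartile range in exposure.
or_iqr <- exp(coef(fit)[["exposure"]] * IQR(d$exposure))
print(or_iqr)
```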

All analyses were adjusted for sex. Several variables collected with the questionnaire were used as confounders in the analysis one by one. Nonbinary variables were analyzed as quartiles. Radiation therapy given to an STS patient was considered as disease-related and ignored in the analyses if the link to the disease was stated in the questionnaire or if the therapy had been given within 1 year before the operation. The analysis with the largest number of missing values was that with education years with 63 cases and 112 controls, but otherwise there were at least 70 cases and 125 controls in the analyses.

Fish consumption was studied in detail. Specific questions about the frequency of fish consumption were asked: 1 about total fish consumption, and 10 about specific types of fish or fish species. Four fish types contributed most to the total fish consumption. They were assumed to have high (Baltic herring, Baltic salmon) or low (predatory fish from lakes, rainbow trout) dioxin concentration based on previous results.[3] The consumption frequencies (times per month) were calculated for high- and low-dioxin fish separately based on these 4 fish types. Exposure to the following chemicals was asked as a binary variable: solvents, solvent-based paints, formaldehyde, insecticides, fungicides/herbicides, wood preservatives, strong detergents, heavy metals, other chemicals.

Data

The code below runs the main fish consumption and PCDD/F variables, but because this is personal-level data, you need a password to run it. However, you can see ready-made results [1].

For variable descriptions, see D↷


Variable information

The variable information was originally documented in Log file about the statistical analyses: Part 1, but unfortunately mostly in Finnish.



Data management

Code to manage the data. It takes the original data files and merges them. Works only if files are available.



Assumptions

The following assumptions are used to interpret survey answers:

Assumptions for calculations (-)
Obs | Variable | Value | Unit | Result | Description | Kysymys suomeksi
1 | Q23 | | dl /glass | 2 | Decilitres of milk or sour milk per glass |
2 | Q24 | 1 | times /a | 0 | Never |
3 | Q24 | 2 | times /a | 0.5 - 0.9 | less than once a year |
4 | Q24 | 3 | times /a | 2 - 5 | A few times a year |
5 | Q24 | 4 | times /a | 12 - 36 | 1 - 3 times per month |
6 | Q24 | 5 | times /a | 52 | once a week |
7 | Q24 | 6 | times /a | 104 - 208 | 2 - 4 times per week |
8 | Q24 | 7 | times /a | 260 - 364 | 5 or more times per week |

How much mass, energy, and dioxin does one portion contain? The data are either guesswork or taken from Fineli.

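Food energy and dioxin (g, kJ, pg/portion)
Obs | Food | Mass | Energy | Dioxin
1 | Kalaa | 100 | 600 | 7
2 | Silakkaa | 100 | 792 | 470
3 | Petokalaa | 100 | 301 | 25
4 | Muikkua | 100 | 750 | 28
5 | Sisävesikalaa | 100 | 668 | 23
6 | Kirjolohta | 100 | 1067 | 74
7 | Itämeren.lohta | 100 | 1067 | 770
8 | Muuta.Itämerestä | 100 | 668 | 15
9 | Pakastekalaa | 100 | 324 | 7
10 | Kalasäilykkeitä | 60 | 600 | 7
11 | Valtamerikalaa | 100 | 600 | 7
12 | Äyriäisiä | 60 | 200 | 7
13 | Leipää | 50 | 406 | 0.01
14 | Puuroja | 200 | 642 | 0.02
15 | Makaronia | 200 | 846 | 0.02
16 | Muutaviljaa | 150 | 600 | 0.02
17 | Viiliä | 200 | 334 | 0.008
18 | Juustoja | 40 | 300 | 0.012
19 | Rasvaisia.juustoja | 40 | 600 | 0.03
20 | Jäätelöä | 150 | 1200 | 0.03
21 | Liharuokaa | 150 | 1400 | 1.5
22 | Maitoa | 200 | 358 | 0.004
23 | Piimää | 200 | 358 | 0.004
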
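Portions per month (portions/mo)
Obs | Answer | Interpretation
1 | En lainkaan (Not at all) | 0.003
2 | Harvemmin kuin kerran kuukaudessa tai en lainkaan (Less than once a month or not at all) | 0.1
3 | Harvemmin kuin kerran kuukaudessa (Less than once a month) | 0.5
4 | Kerran tai pari kuukaudessa (Once or twice a month) | 1.5
5 | Kerran viikossa (Once a week) | 4
6 | Pari kertaa viikossa (A couple of times a week) | 8
7 | Lähes joka päivä (Almost every day) | 20
8 | Kerran päivässä tai useammin (Once a day or more often) | 40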
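
The two tables above can be combined into a rough monthly dioxin intake: portions per month times dioxin per portion, summed over foods. The sketch below uses the values for Kalaa and Liharuokaa and two answer categories from the tables; the respondent's answers and the variable names are hypothetical, and this is not necessarily how the page's own code combines them.

```r
# Interpretation of questionnaire answers (portions per month), from the table above.
portions_per_month <- c("Kerran viikossa" = 4, "Pari kertaa viikossa" = 8)

# Dioxin per portion (pg), from the table above.
dioxin_pg_per_portion <- c(Kalaa = 7, Liharuokaa = 1.5)

# Hypothetical answers of one respondent.
answers <- c(Kalaa = "Kerran viikossa", Liharuokaa = "Pari kertaa viikossa")

# Intake per food (pg/month) and the total.
intake <- unname(portions_per_month[answers]) * dioxin_pg_per_portion[names(answers)]
total <- sum(intake)
print(intake)
print(total)
```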

Analyses

Simulated data

This code was used to create a CSV file that contains simulated data from this study. Compared with the original data, the simulated data
  • has the same number of observations,
  • has the same range of values in each variable,
  • has approximately the same correlation structure between all variables.
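
The three properties listed above could be checked with a small helper like the one below. This is only a sketch: `check_simulated` is a hypothetical function, and the toy "original" and "simulated" data frames are invented (the simulated one is simply a row resample of the original), so it does not reproduce the method used to create the actual file.

```r
# Compare a simulated data frame with the original on the three properties.
check_simulated <- function(orig, simu) {
  same_n <- nrow(orig) == nrow(simu)
  # Every simulated value must lie within the observed range of its variable.
  within_range <- all(mapply(function(o, s) {
    all(s >= min(o, na.rm = TRUE) & s <= max(o, na.rm = TRUE), na.rm = TRUE)
  }, orig, simu))
  # Largest absolute difference between the Spearman correlation matrices.
  c_orig <- cor(orig, use = "pairwise.complete.obs", method = "spearman")
  c_simu <- cor(simu, use = "pairwise.complete.obs", method = "spearman")
  list(same_n = same_n, within_range = within_range,
       max_cor_diff = max(abs(c_orig - c_simu)))
}

set.seed(42)
# Toy data with two correlated positive variables.
orig <- data.frame(a = rlnorm(200, 1, 0.5))
orig$b <- orig$a * rlnorm(200, 0, 0.3)
simu <- orig[sample(nrow(orig), replace = TRUE), ]

res <- check_simulated(orig, simu)
print(res)
```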


POPs and obesity

Dioxins and PCBs have been associated with type 2 diabetes. Do dioxins cause diabetes, does diabetes decrease dioxin elimination, does obesity both increase diabetes risk and decrease dioxin elimination, or is something else going on? We tried to make sense of this by looking at the sarcoma study data.


Self-reported chemical exposure

We looked at self-reported chemical exposure, especially pesticides and wood preservatives.


Correlation of dioxin and fish

How do individual dioxin congeners correlate with individual fish parameters in the questionnaire?

+ Show code

These estimates are based on the code above.

Binomial distribution parameter (probability)
Obs | Fish | Parameter
1 | Kalaa | 0.23179078
2 | Petokalaa | 0.17746642
3 | Muikkua | 0.14939457
4 | Sisävesikalaa | 0.09493785
5 | Kirjolohta | 0.29775961
6 | Silakkaa | 0.2152247
7 | Itämeren.lohta | 0.08175282
8 | Muuta.Itämerestä | 0.0282421
9 | Pakastekalaa | 0.21510166
10 | Kalasäilykkeitä | 0.21598007
11 | Valtamerikalaa | 0.1078198
12 | Äyriäisiä | 0.16021416

Correlation coefficients between fish dishes



EU kalat

  • The code that used to be here was moved to EU-kalat#Calculations.
  • What updates should be done:
    • Plot iterations to see that the model results do not drift.
    • Take modelled parameters and develop a MC model to produce predicted concentrations.
      • Should TCDD concentration be added to the hierarchical Bayes model for this?
    • KTL Sarcoma study, EU-kalat and Goherr: Fish consumption study should all be combined into one model. Comment: Can models be combined as text with paste()? This could work if all submodels had unique parameter names like N.eu and N.goh rather than just N. The data lists could then be merged simply with c(). --Jouni (talk) 16:22, 22 January 2017 (UTC)
    • A causal diagram should be drawn to show the model structure.
  • JAGS user manual [2] (with e.g. distribution names and other guidance)
  • How to generate predictions in JAGS [3]
  • Using rjags, a simple guidance [4]

Related:

  • Easily generate correlated variables from any distribution (without copulas) [5]

See also

Related files

References

  1. Tuomisto JT, Pekkanen J, Kiviranta H, Tukiainen E, Vartiainen T, Tuomisto J. Soft-tissue sarcoma and dioxin: a case-control study. Int J Cancer 2004; 108: 893-900.
  2. Chemosphere (2005) 60: 78: 854-869
  3. Kiviranta H, Korhonen M, Hallikainen A, Vartiainen T. Kalojen dioksiinien ja PCB:eiden kulkeutuminen ihmiseen. Ympäristö ja Terveys 2000; 31: 65-9.