KTL Sarcoma study

Latest revision as of 18:07, 1 August 2019

This page is a study. The page identifier is Op_en2721
Moderator: Jouni


Authors: Jouni T. TUOMISTO, Juha PEKKANEN, Hannu KIVIRANTA, Erkki TUKIAINEN, Terttu VARTIAINEN and Jouko TUOMISTO
Reference: Int. J. Cancer 108: 893–900 (2004); Chemosphere 60(7): 854–869 (2005)
Urn: NBN:fi-fe200901121020
Ethics: National Public Health Institute Ethics Committee
Journal: Int J Cancer

Question

Because it is obvious that there is a great need for improved exposure assessment in studying cancer risk of dioxins, we decided to undertake the major effort of conducting a large case-control study on soft-tissue sarcoma and measure dioxin concentrations individually in both patients and controls. Because this can be done accurately only from very large blood samples or from fat samples taken during an operation, we studied STS patients coming to surgery because of their tumor and selected appendicitis patients as controls. In the general population, the exposure to dioxins is almost totally from dietary sources (in Finland mostly from fish), and it varies widely among the population. Because of the extremely long half-life of dioxins, measured levels of dioxin at the time of operation can be used to estimate the lifetime cumulative exposure accurately. There is a priori no simultaneous exposure to chlorophenols or phenoxy acid herbicides, which behave completely differently in the environment, have relatively short half-lives in humans and are excreted in a few days. This enables us to estimate the association of STS with clean dioxin exposure without concomitant exposure to the main chemical, in contrast to occupational studies.^[1]

Answer

Simulated data about the study are available. For details, see #Simulated data.

Main fish consumption and PCDD/F variables

Some plots about dioxin congeners.


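library(OpasnetUtils)

# Download the congener-specific data from Opasnet Base.
a <- opbase.data("Op_en2721")

oprint(head(a))

# 'first' and 'second' are the two congeners to compare. They are not defined
# anywhere in the original code; the values below are assumed defaults.
first <- "OCDD"
second <- "WHOTEQ"

# Scatter plots of one congener against another.
plot(a[a$Congener == first, "Result"], a[a$Congener == second, "Result"], xlab = first, ylab = second)
plot(a[a$Congener == "OCDD", "Result"], a[a$Congener == "WHOTEQ", "Result"])
plot(a[a$Congener == "WHOTEQ", "Result"], a[a$Congener == "OCDD", "Result"])
plot(a[a$Congener == "OCDF", "Result"], a[a$Congener == "OCDD", "Result"])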

Rationale

Methods

Study population

The majority of sarcoma patients in southern Finland are treated by the multidisciplinary sarcoma group of Helsinki University Central Hospital, with the remaining cases in the University Hospitals of Kuopio, Turku, or Tampere. All patients referred to these hospitals for operative treatment of STS between June 1997 (August 1996 in Helsinki) and December 1999 and more than 15 years of age were eligible as cases. The diagnoses were verified histologically for all except 7 patients. Sarcomas connected with known familial or genetic conditions, as well as sarcomas arising in visceral organs and bone, were excluded. Malignancies other than STS, as well as nonmalignant tumors, were also excluded. Some patients were operated on twice during the study period; the second sample was not processed.

All patients who were operated on due to an appendicitis diagnosis in a study hospital and who were more than 15 years of age were eligible as controls. They were collected from the same catchment area as the STS patients by dividing it into 15 areas (mainly according to former Finnish health care districts). One hospital performing appendectomy operations was recruited to the study from each area (in Helsinki, 2 hospitals). These were the university, central, or district hospitals of Helsinki, Hyvinkää, Hämeenlinna, Joensuu, Jyväskylä, Kotka, Kuopio, Lahti, Lappeenranta, Pori, Seinäjoki, Tampere, Turku and Vaasa, and the municipality hospitals in Espoo (Jorvi Hospital) and Helsinki (Maria Hospital). Informed consent was obtained from all patients in writing before the operation. The study was approved by the ethics committees of the National Public Health Institute and the hospitals involved.

The total number of patients recruited during the fieldwork was 972. One case was excluded due to missing address information, 1 case and 2 controls due to missing age information, and 3 cases and 11 controls because their fat samples were too small for dioxin analysis. As a result, we had 954 patients (148 cases and 806 controls) available for matching. The age range was 17.0–91.1 years for cases and 15.0–88.7 years for controls. Based on National Cancer Registry data, we caught 70%, 9%, 17% and 26% of STS patients in Helsinki, Turku, Tampere and Kuopio University hospital regions, respectively, during the study period (calendar years 1997–1999). In Helsinki, all patients treated surgically with correct diagnosis were caught and agreed to participate; those not caught were either treated nonsurgically or misdiagnosed. Based on hospital discharge registry data, we estimate that about one-fourth of appendicitis patients were caught on average during the most active collection period, but differences between hospitals were large.

The cases and controls were individually matched for area and age at the end of the fieldwork. This was done to ensure that there are enough controls from small areas and old age groups in the final data set, as it was not possible to analyze all recruited patients for dioxin. Area was defined based on the area of residence using the 15 areas described above. Age was determined on the day of operation. The maximum allowed difference in age between cases and controls was ± 3 years if the case was < 38.0 years old, and ± 6 years if the case was >= 38.0 years old. The control closest by age was matched to the case. Cases with fewer controls had priority over cases with more controls. The number of controls per case was limited to 3. For 110 cases, 227 matching controls could be found in the pool. Thirty-nine cases had 1 control, 25 cases had 2 and 46 cases had 3 controls; for 38 cases, no control matching both age and area could be found.
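The matching rules above can be illustrated with a minimal R sketch. It is not the procedure actually used in the study: the text does not specify how ties or rounds were handled, so the round-by-round greedy assignment, the function `match_controls`, and the toy data and column names are all assumptions made for illustration.

```r
# Hypothetical toy data; ids, areas and ages are made up.
cases <- data.frame(id = c("S1", "S2", "S3"),
                    area = c(3, 3, 9),
                    age = c(30.0, 50.0, 41.0),
                    stringsAsFactors = FALSE)
controls <- data.frame(id = paste0("C", 1:6),
                       area = c(3, 3, 3, 3, 9, 5),
                       age = c(28.5, 32.0, 47.0, 55.5, 60.0, 41.0),
                       stringsAsFactors = FALSE)

# Age tolerance from the text: +/- 3 years below 38.0, +/- 6 years otherwise.
tolerance <- function(case_age) ifelse(case_age < 38.0, 3, 6)

# Assumed greedy scheme: in each round every case gets at most one new control,
# and cases with fewer eligible controls choose first.
match_controls <- function(cases, controls, max_controls = 3) {
  used <- rep(FALSE, nrow(controls))
  matches <- data.frame(case = character(0), control = character(0),
                        stringsAsFactors = FALSE)
  for (round in seq_len(max_controls)) {
    # Unused controls from the same area within the age tolerance.
    eligible <- lapply(seq_len(nrow(cases)), function(i) {
      which(!used & controls$area == cases$area[i] &
              abs(controls$age - cases$age[i]) <= tolerance(cases$age[i]))
    })
    for (i in order(lengths(eligible))) {
      cand <- eligible[[i]][!used[eligible[[i]]]]
      if (length(cand) == 0) next
      # The control closest by age is matched to the case.
      best <- cand[which.min(abs(controls$age[cand] - cases$age[i]))]
      used[best] <- TRUE
      matches <- rbind(matches,
                       data.frame(case = cases$id[i], control = controls$id[best],
                                  stringsAsFactors = FALSE))
    }
  }
  matches
}

result <- match_controls(cases, controls)
print(result)
```

In this toy example case S3 stays unmatched because no control from its area falls within the age tolerance, which corresponds to the 38 unmatched cases reported above.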

Exposure assessment

From the matched 337 patients, concentrations of the 17 toxic polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs) were measured from a subcutaneous fat sample obtained during an appendectomy or sarcoma operation. Measurements were done by gas chromatography-mass spectrometry [30] at the Laboratory of Chemistry, which is an accredited testing laboratory (T077) for the analysis of dioxins in human samples (current standard: EN ISO/IEC 17025) and has successfully participated in WHO/Euro intercalibrations. The concentrations were summed up after the value of each congener was multiplied by its relative toxic potency (toxic equivalency factor, TEF). The TEF values according to WHO [31] were used, resulting in toxic equivalent concentrations (WHO-TEq). Fat samples were analyzed during and after the collection period. Samples from STS patients were always analyzed in a batch that also contained samples from appendicitis patients. All analytical work was performed blind so that the chemistry laboratory did not know the diagnosis of the patient. Quality assurance of analysis was performed in 2 separate ways: 2 preformulated pools of human fat with different concentrations of dioxins [10.6 (n = 35) and 40.2 (n = 33) ng/kg (WHO-TEq in fat)] were always run with each lot of samples, and 36 individual fat samples with WHO-TEqs ranging from 6.9 to 116 ng/kg fat were analyzed as duplicates. The coefficients of variation for WHO-TEq in preformulated pools were 5.1% and 5.7%, respectively, and in duplicate analysis, 6.2%.^[2]

A detailed questionnaire about socioeconomic and lifestyle factors and chemical exposures was given to the patients in the hospital. If the patient was found not to have received the questionnaire in the hospital or if the patient did not return it, a new copy was sent to the patient’s home address. Of the matched subjects, 84 cases (76%) and 185 controls (81%) also have questionnaire information.

Detailed exposure assessment

The concentrations of the 17 toxic PCDD/F congeners and of the 36 PCB congeners were measured from fat of a subcutaneous tissue sample (0.3–1.5 g of fat) which was obtained during an appendectomy or sarcoma operation. The toxic equivalents (WHOPCDD/F-TEQ and WHOPCB-TEQ) were calculated with the sets of toxic equivalency factors (TEF), recommended by WHO in 1998 (Van den Berg et al., 1998).
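As an illustration of the toxic equivalent calculation, the following R sketch multiplies each congener concentration by its TEF and sums the products. The TEF values are the WHO 1998 factors for the 17 PCDD/F congeners. The column names follow the variable list on this page, and reading them as congeners (for example PF23478 as 2,3,4,7,8-PeCDF) is an assumption, as are the function `who_teq` and the example concentrations.

```r
# WHO 1998 TEFs for the 17 2,3,7,8-substituted PCDD/Fs.
# Names follow the variable list on this page; the mapping is assumed.
tef <- c(TCDF2378 = 0.1, TCDD2378 = 1, PF12378 = 0.05, PF23478 = 0.5,
         PD12378 = 1, HF123478 = 0.1, HF123678 = 0.1, HF234678 = 0.1,
         HF123789 = 0.1, HD123478 = 0.1, HD123678 = 0.1, HD123789 = 0.1,
         F1234678 = 0.01, F1234789 = 0.01, D1234678 = 0.01,
         OCDF = 0.0001, OCDD = 0.0001)

# TEQ = sum over congeners of concentration * TEF, one value per row.
who_teq <- function(conc, tef) {
  stopifnot(all(names(tef) %in% colnames(conc)))
  as.vector(as.matrix(conc[, names(tef), drop = FALSE]) %*% tef)
}

# Made-up concentrations (pg/g fat) for two hypothetical subjects.
conc <- as.data.frame(matrix(0, nrow = 2, ncol = length(tef),
                             dimnames = list(NULL, names(tef))))
conc$TCDD2378 <- c(2, 1)
conc$PF23478 <- c(10, 20)
conc$OCDD <- c(500, 1000)

teq <- who_teq(conc, tef)
print(teq)
```

The example shows why OCDD contributes little to the TEQ despite its high concentration: its TEF is four orders of magnitude smaller than that of 2,3,7,8-TCDD.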

Fat from tissue sample was extracted with toluene for 18–24 h using the Soxhlet apparatus. The fat content was determined gravimetrically after changing the solvent to hexane using nonane as a keeper. Fat sample was spiked with a set of 13C-labeled internal standards: sixteen 2,3,7,8-chlorinated PCDD/F congeners, three non-ortho PCBs (PCB 77, 126, 169), and nine other PCBs (PCB 30 [12C-labeled], 80, 101, 105, 138, 153, 156, 180, 194).

The sample was defatted in a silica gel column containing acidic and neutral layers of silica, and all analytes were eluted with dichloromethane (DCM):cyclohexane (c-hexane) (1:1). PCDD/Fs were separated from PCBs on activated carbon column (Carbopack C, 60/80 mesh) containing Celite (Merck 2693). The first fraction including PCBs was eluted with DCM:c-hexane (1:1) following a back elution of the second fraction (PCDD/Fs) with toluene. Eluents from both of the fractions were evaporated using nonane as a keeper and then fractions in n-hexane were further cleaned by passing them through an activated alumina column (Merck 1097). The PCDD/F fraction was eluted from the alumina column with 20% DCM in n-hexane and recovery standards (13C 1,2,3,4-TCDD and 13C 1,2,3,7,8,9-HxCDD) were added to the fraction before DCM and n-hexane were replaced by 10-15 μl of nonane. The PCB fraction was eluted from the alumina column with 2% DCM in n-hexane, and the fraction, after changing the eluent to n-hexane, was transferred to another activated carbon column (without Celite) in order to separate the non-ortho PCBs from other PCBs. DCM (50%) in n-hexane was used to elute other PCBs while non-ortho PCBs were back eluted with toluene. Recovery standards, PCB 159 for other PCBs and 13C PCB 60 for non-ortho PCBs were added prior to analysis; the solvent for other PCBs (DCM:n-hexane, 1:1) was replaced by 300 μl of n-hexane, for non-ortho PCBs toluene was replaced by 10–15 μl of nonane. The quantitation was performed by selective ion recording mode using a VG 70–250 SE (VG Analytical, UK) mass spectrometer (resolution 10,000) equipped with a HP 6890 gas chromatograph with a fused silica capillary column (DB-DIOXIN, 60 m, 0.25 mm, 0.15 μm). Two μl were injected into a split-splitless injector at 270 °C. The temperature programs for PCDD/Fs, non-ortho-PCBs, and other PCBs were:

start, 140 °C (4 min), rate 20 °C min−1 to 180 °C (0 min), rate 2 °C min−1 to 270 °C (36 min);
start, 140 °C (4 min), rate 20 °C min−1 to 200 °C (0 min), rate 10 °C min−1 to 270 °C (12 min);
start, 60 °C (3 min), rate 20 °C min−1 to 200 °C (0 min), rate 4 °C min−1 to 270 °C (14 min); respectively.

Limits of quantitation (LOQ) for PCDD/Fs and non-ortho PCBs varied between 0.1–5 and 1–5 pg g−1 fat, respectively, and for other PCBs between 0.02 and 0.1 ng g−1 fat, depending on each individual congener. Recoveries for internal standards were more than 50% for all congeners. Concentrations were calculated with lower bound method in which the results of congeners with concentrations below the LOQ were designated as nil.
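The lower bound method described above can be sketched in a few lines of R. The function name `lower_bound` and the example values are hypothetical; only the rule itself (results below the LOQ are set to zero) comes from the text.

```r
# Lower bound method: concentrations below the LOQ are replaced by zero.
lower_bound <- function(conc, loq) {
  stopifnot(length(conc) == length(loq))
  ifelse(!is.na(conc) & conc < loq, 0, conc)
}

# Made-up concentrations and congener-specific LOQs (pg/g fat).
conc <- c(TCDD2378 = 0.05, PF23478 = 2.3, OCDF = 0.4)
loq <- c(0.1, 0.1, 1)

lb <- lower_bound(conc, loq)
print(lb)
```

Because non-quantified congeners count as zero, sums and TEQs computed this way are conservative (they cannot overestimate the measured amount).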

This code was used to upload the data to Opasnet Base:


library(OpasnetBaseUtils)
library(reshape)

# Fetch the data from Opasnet Base.
a <- tidy(op_baseGetData("opasnet_base", "Op_en2721"))
colnames(a)[colnames(a) == "Result"] <- "WHOTEQ"

# Convert every column to numeric and drop rows without a public identifier.
for(i in seq_len(ncol(a))) {a[, i] <- as.numeric(a[, i])}
a <- a[!is.na(a$public_id), ]

# Reshape to long format: one row per observation and congener.
a <- melt(a, id.vars = "public_id")
colnames(a) <- c("Obs", "Congener", "Result")

# Inspect the result.
a[a$Congener == "WHOTEQ", ]
plot(a[a$Congener == "OCDD", "Result"], a[a$Congener == "WHOTEQ", "Result"])
plot(a[a$Congener == "WHOTEQ", "Result"], a[a$Congener == "OCDD", "Result"])
plot(a[a$Congener == "OCDF", "Result"], a[a$Congener == "OCDD", "Result"])
a

# Upload to Opasnet Base (disabled here).
#op_baseWrite("opasnet_base", a, ident = "Op_en2721", unit = "pg /g_fat", who = "Jouni", acttype = 4)

Quality control and assurance

Fat samples were analyzed during and after the collection period 1997–1999. All analytical work was performed blind such that the chemistry laboratory knew only the code of the sample. The laboratory reagent and equipment blank samples were treated and analyzed with the same method as the actual samples, one blank for every eight to ten samples. Quality assurance of analysis was performed in two separate ways: (a) two preformulated pools of human fat with different concentrations of PCDD/Fs [10.6 (n = 35) and 40.2 (n = 33) pg g−1 (WHOPCDD/F-TEQ in fat)] and PCBs [4.72 and 24.2 pg g−1 (WHOPCB-TEQ), respectively] were always run with each lot of samples and (b) 36 individual fat samples with WHOPCDD/F-TEQs ranging from 6.9 to 116 pg g−1 and WHOPCB-TEQs from 4.6 to 95 pg g−1 were analyzed in duplicate. The coefficients of variation (CV) for WHOPCDD/F-TEQ in preformulated pools were 5.1% and 5.7%, respectively, and for WHOPCB-TEQ 12 and 9.0%, respectively. In duplicate analysis the CV was 6.2% for WHOPCDD/F-TEQ and 18% for WHOPCB-TEQ.
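A coefficient of variation of the kind reported above can be computed as follows. This is a minimal sketch: the text does not state which formula was used for the duplicate analyses, so the root-mean-square formula for paired measurements is an assumption, and the function names and example values are made up.

```r
# CV (%) of repeated measurements of the same pool: SD divided by mean.
cv_percent <- function(x) 100 * sd(x) / mean(x)

# CV (%) from duplicate pairs (assumed formula): the within-pair SD of a
# duplicate is |difference| / sqrt(2), expressed relative to the pair mean
# and pooled over all pairs as a root mean square.
duplicate_cv_percent <- function(first, second) {
  pair_mean <- (first + second) / 2
  rel_diff <- (first - second) / pair_mean
  100 * sqrt(mean(rel_diff^2) / 2)
}

# Made-up WHO-TEQ results (pg/g fat) for a control pool and for three duplicates.
pool <- c(10.1, 10.9, 10.6, 11.2, 10.2)
dup_a <- c(6.9, 40.0, 116)
dup_b <- c(7.4, 37.5, 109)

print(cv_percent(pool))
print(duplicate_cv_percent(dup_a, dup_b))
```

For a single pair the two functions give the same value, which is a quick consistency check of the duplicate formula.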

The laboratory has successfully participated in several international quality control studies for the analysis of PCDD/Fs and PCBs. Matrices in these studies have included cow milk, human milk and human serum (Yrjänheikki, 1991; Rymen, 1994; WHO, 1996; Lindström et al., 2000). The laboratory of chemistry in the National Public Health Institute is an accredited testing laboratory (No T077) in Finland (EN ISO/IEC 17025). The scope of accreditation includes PCDD/Fs, non-ortho PCBs, and other PCBs from human tissue samples.

Statistical analyses

Conditional logistic regression analysis was performed with SAS PHREG procedure. Odds ratios were estimated for each quintile of WHO-TEq, the sum of the toxic congeners and the most relevant individual congeners, i.e., 2378-TCDD, 2378-TCDF, 12378-PeCDD, 23478-PeCDF and 123678-HxCDD (abbreviations: T, tetra; Pe, penta; Hx, hexa; Hp, hepta; O, octa; CDD, chlorinated dibenzo-p-dioxin; CDF, chlorinated dibenzofuran). In the other congener-specific analyses, exposures were treated as continuous variables and odds ratios were calculated for an increase of an interquartile range of the exposure.
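The original analysis was run with SAS PHREG. As a rough R counterpart, the sketch below uses `clogit()` from the survival package on simulated data. The data, the variable names, and the choice to define quintiles over all subjects are assumptions for illustration only and do not reproduce the study results.

```r
library(survival)

# Simulated matched sets: one case and two controls per stratum (made-up data).
set.seed(1)
n_strata <- 100
d <- data.frame(stratum = rep(seq_len(n_strata), each = 3),
                case = rep(c(1, 0, 0), times = n_strata),
                sex = rbinom(3 * n_strata, 1, 0.5))
d$whoteq <- rlnorm(nrow(d), meanlog = 3, sdlog = 0.6)

# Quintiles of exposure (here defined over all subjects; an assumption).
d$quintile <- cut(d$whoteq,
                  breaks = quantile(d$whoteq, probs = seq(0, 1, 0.2)),
                  include.lowest = TRUE, labels = paste0("Q", 1:5))

# Odds ratios for quintiles 2-5 relative to the lowest quintile, adjusted for sex.
fit_q <- clogit(case ~ quintile + sex + strata(stratum), data = d)
or_quintile <- exp(coef(fit_q))

# Continuous exposure: odds ratio for an increase of one interquartile range.
fit_c <- clogit(case ~ whoteq + sex + strata(stratum), data = d)
or_iqr <- exp(coef(fit_c)["whoteq"] * IQR(d$whoteq))

print(or_quintile)
print(or_iqr)
```

Scaling the coefficient by the interquartile range before exponentiating makes odds ratios comparable between congeners whose concentrations differ by orders of magnitude.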

All analyses were adjusted for sex. Several variables collected with the questionnaire were used as confounders in the analysis one by one. Nonbinary variables were analyzed as quartiles. Radiation therapy given to an STS patient was considered as disease-related and ignored in the analyses if the link to the disease was stated in the questionnaire or if the therapy had been given within 1 year before the operation. The analysis with the largest number of missing values was that with education years with 63 cases and 112 controls, but otherwise there were at least 70 cases and 125 controls in the analyses.

Fish consumption was studied in detail. Specific questions about the frequency of fish consumption were asked: 1 about total fish consumption, and 10 about specific types of fish or fish species. Four fish types contributed most to the total fish consumption. They were assumed to have high (Baltic herring, Baltic salmon) or low (predatory fish from lakes, rainbow trout) dioxin concentration based on previous results.^[3] The consumption frequencies (times per month) were calculated for high- and low-dioxin fish separately based on these 4 fish types. Exposure to the following chemicals was asked as a binary variable: solvents, solvent-based paints, formaldehyde, insecticides, fungicides/herbicides, wood preservatives, strong detergents, heavy metals, other chemicals.
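The conversion of questionnaire answers into consumption frequencies can be sketched as follows. The text does not give the times-per-month value assigned to each answer category, so the mapping below is purely an assumption, as is the helper `to_frequency`. The variable names (SILAKKAA, IT_MEREN, PETOKALA, KIRJOLOH) and the 0–5 coding come from the variable list on this page.

```r
# ASSUMED times-per-month values for answer codes 0-5
# (0 = not at all ... 5 = almost every day); not taken from the study.
times_per_month <- c("0" = 0, "1" = 0.5, "2" = 1.5, "3" = 4, "4" = 8, "5" = 25)

to_frequency <- function(code) unname(times_per_month[as.character(code)])

# Made-up answers for three hypothetical subjects.
fish <- data.frame(SILAKKAA = c(3, 0, 2),   # Baltic herring
                   IT_MEREN = c(1, 0, NA),  # Baltic salmon
                   PETOKALA = c(2, 4, 1),   # predatory fish from lakes
                   KIRJOLOH = c(3, 1, 0))   # rainbow trout

# High-dioxin fish: Baltic herring and Baltic salmon.
fish$high_dioxin <- to_frequency(fish$SILAKKAA) + to_frequency(fish$IT_MEREN)
# Low-dioxin fish: predatory lake fish and rainbow trout.
fish$low_dioxin <- to_frequency(fish$PETOKALA) + to_frequency(fish$KIRJOLOH)

print(fish)
```

A missing answer propagates to the sum as NA, so subjects with incomplete fish data drop out of the corresponding analysis rather than being counted as non-consumers.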

Data

The code below analyses the main fish consumption and PCDD/F variables. Because these are individual-level data, a password is needed to run it. However, you can see ready-made results [1].

For variable descriptions, see the section Variable information.



library(OpasnetUtils)
objects.get("isqT7nvhd0ViUR7d")

data <- objects.decode(etable, password)
colnames(data) <- t(data[1, ])
data <- data[2:nrow(data), 2:ncol(data)]

for(i in 1:ncol(data)) {
	data[[i]] <- as.numeric(as.character(data[[i]]))
}

colnames(data)[colnames(data) == "aluenro"] <- "BMI" # Drop the aluenro column and replace it with BMI.
data$BMI <- data$Paino / (data$Pituus / 100)^2

# oprint(head(data))

cat("Data from P:\\huippuyksikko\\Tutkimus\\R16_sarkooma\\Data\\Panulle20031216\\Analyysi020712_typistetty.xls.", nrow(data), "observations.\n")

oprint(cor(x = data, use = "pairwise.complete.obs", method = "pearson"))

# Basic Scatterplot Matrix
pairs(~ika+BMI+Kalaa+Silakkaa+PF23478+WHOTEQ, data = data,
   main="Simple Scatterplot Matrix")

Questionnaire

Variable information

The variable information was originally documented in Log file about the statistical analyses: Part 1, but unfortunately mostly in Finnish.


The variable list should also be recorded here, because it has not been properly documented anywhere else.

      LEIKKAUS    operation date, SAS's own format (days from some reference point?)
      IKA         age in years on the day of operation
      ALUENRO     aluenro	study area (based on the postal code of the patient's home address)
			1	-
			2	Espoo
			3	Helsinki
			4	Hyvinkää
			5	Hämeenlinna
			6	Joensuu
			7	Jyväskylä
			8	Kotka
			9	Kuopio
			10	Lahti
			11	Lappeenranta
			12	Pori
			13	Seinäjoki
			14	Tampere
			15	Turku
			16	Vaasa
      PNRO        postal code K3 (K# = questionnaire question number #)
      STRATUM     stratum, i.e. the case number without the S prefix
      STRATASE    strataset, which is 9 in this analysis
      DGLUOKKA    diagnosis class for cases
			DG_id	DGnimi
			1	MFH
			2	Liposarcoma
			3	Leiomyosarcoma
			4	Angiosarcoma
			5	Chondrosarcoma
			6	Sarcoma synoviale
			7	Sarcoma Ewing
			8	Dermatofibrosarcoma
			9	Sarcoma alia
			10	Sarcoma NUD
			11	No information
			12	Osteosarcoma extrasceletale
			21	Lipoma
			22	Tumor Desmoides
			23	Myxoma
			24	Other benign tumor
			25	Melanoma
			26	Other than tumor
			27	Duplicate sample
      SP          sex 1: male, 2: female K6
      KOULUV      years of education K8
      PITUUS      height, cm K15
      PAINO       current weight, kg K16
      KALAA       fish consumption K21
			1	Less than once a month or not at all
			2	Once or twice a month
			3	Once a week
			4	A couple of times a week
			5	Almost every day
			6	Once a day or more often
      PETOKALA    K22 (applies up to variable _YRI_ISI)
			0	Not at all
			1	Less than once a month
			2	Once or twice a month
			3	Once a week
			4	A couple of times a week
			5	Almost every day
      MUIKKUA     
      SIS_VESI    
      KIRJOLOH    
      SILAKKAA    
      IT_MEREN    
      MUUTA_IT    
      PAKASTEK    
      KALAS_IL    
      VALTAMER    
      _YRI_ISI    
      SADE        radiation therapy K47
      VASTATTU    was the questionnaire returned? YES/NO
      SADEAIK     previous radiation therapy (combined from questionnaire and hospital data)
      TCDF2378    congener-specific concentration ng/kg (absolute units in fat)
      TCDD2378    
      PF12378     
      PF23478     
      PD12378     
      HF123478    
      HF123678    
      HF234678    
      HF123789    
      HD123478    
      HD123678    
      HD123789    
      F1234678    
      F1234789    
      D1234678    
      OCDF        
      OCDD        
      TOXSUM      sum of the 17 congeners listed above
      WHOTEQ      sum of the 17 congeners listed above, weighted by the WHO TEF
      TUPAK       have you ever smoked 1=no, 2=yes K30
      TSTATUS     smoking: 1= < 6 months ago, 2= > 6 months ago, 3=never K30+K32
      PACKYEAR    pack-years: years smoked K31 * (sum of the different products, pieces/day, K33)/20
			NULL if K30=2 but K32=NULL
      KEMIK       to how many chemical groups has the patient been exposed? K42
		   Has been exposed 1=no, 2=yes (K42, up to variable MUU2)
      LIUOTTI     solvents
      MAALIT      solvent-based paints
      FORMALDE    formaldehyde
      HYONTEIS    insecticides
      KASVINSU    plant protection products
      KYLLASTE    wood preservatives
      PESUAINE    strong detergents
      METALLIT    heavy metals
      MUU1        other
      MUU2        other
			Based on the description in K42: 0=no exposure, 4=worst exposure
      HYONLUOK    insecticides
      KASVLUOK    plant protection products
      KYLLLUOK    wood preservatives
      ALKO        how often alcohol is used K35
			1	daily
			2	a few times a week
			3	about once a week
			4	a couple of times a month
			5	about once a month
			6	about once every two months
			7	3-4 times a year
			8	a couple of times a year
			9	once a year or less
			10	never
      ALKOUS      K35 expressed as times/week (7, 3, 1, 0.5, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
      ALKOKULU    alcohol consumption, servings/week. Calculated from K36=pal and K35
			alkokulu: IIf([pal] Is Null And [alko]>8,0,[alkous]*[alkomaar])
	alkomaar   K36 expressed as servings/occasion: 0,1,2,3,5,8,12 servings
      ASUKKAIT    population of the patient's home municipality (in 1996? Should the source of the statistic be checked?)
      bmi         weight / height / height * 10000 (PAINO in kg, PITUUS in cm)
      mkala       These three fish variables were obtained by fitting the categorical
      mkaladio     variables directly into a regression model, so the units are ill-defined. Perhaps
      mkalamuu     the fish variables should first be converted to servings/month to get a meaningful interpretation?
      muu         if muu1 = 1 or muu2 = 1 then muu = 1 (otherwise muu=0) NOTE! This muu has been calculated
                  completely wrong: 1 means no exposure, so a subject without two exposures is marked as
                  exposed, and a subject with no information or with two exposures is marked as
                  unexposed. SO DO NOT USE THIS VARIABLE ANYWHERE!
      smoking     3-tstatus; 0=never smoked, 1= > 6 months ago, 2= < 6 months ago
      never       smoking is split into 0->never=1, 1->former=1, 2->current=1
      former
      current
mkala = 1.180840432 +
     PETOKALA * 0.285437176 +
     MUIKKUA  * 0.120796063 +
     SIS_VESI * 0.081192631 +
     KIRJOLOH * 0.432524410 +
     SILAKKAA * 0.201481292 +
     IT_MEREN * 0.186820358 +
     MUUTA_IT *-0.096115539 +
     PAKASTEK * 0.105839909 +
     KALAS_IL *-0.059836406 +
     VALTAMER * 0.095303307 +
     _YRI_ISI * 0.033074683;
mkaladio = 1.180840432 +
     SILAKKAA * 0.201481292 +
     IT_MEREN * 0.186820358;
mkalamuu = 1.180840432 +
     PETOKALA * 0.285437176 +
     MUIKKUA  * 0.120796063 +
     SIS_VESI * 0.081192631 +
     KIRJOLOH * 0.432524410 +
     MUUTA_IT *-0.096115539 +
     PAKASTEK * 0.105839909 +
     KALAS_IL *-0.059836406 +
     VALTAMER * 0.095303307 +
     _YRI_ISI * 0.033074683;
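The derived variables PACKYEAR and ALKOKULU described in the list above can be sketched in R as follows. The lookup vectors come from the ALKOUS and alkomaar entries. The assumption that answer codes index these vectors in order (alko 1–10, pal 1–7), and the function names, are not from the source.

```r
# Times per week for alko codes 1-10 (from the ALKOUS entry).
alkous_by_code <- c(7, 3, 1, 0.5, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
# Servings per occasion for pal (K36) codes 1-7 (from the alkomaar entry);
# indexing by answer code is an assumption.
alkomaar_by_code <- c(0, 1, 2, 3, 5, 8, 12)

# ALKOKULU, servings/week, mirroring
# IIf([pal] Is Null And [alko]>8, 0, [alkous]*[alkomaar]).
alko_consumption <- function(alko, pal) {
  alkous <- alkous_by_code[alko]
  alkomaar <- alkomaar_by_code[pal]
  ifelse(is.na(pal) & !is.na(alko) & alko > 8, 0, alkous * alkomaar)
}

# PACKYEAR: years smoked times packs per day (20 pieces per pack).
pack_years <- function(years_smoked, pieces_per_day) {
  years_smoked * pieces_per_day / 20
}

# Made-up answers: a regular drinker, a never-drinker with no amount given,
# and a monthly drinker with the amount missing.
print(alko_consumption(alko = c(2, 10, 5), pal = c(3, NA, NA)))
print(pack_years(years_smoked = 30, pieces_per_day = 10))
```

The special case sets consumption to zero only for those who drink once a year or less (or never) and gave no amount; for other subjects a missing amount stays missing.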

Data management

Code to manage the data. It takes the original data files and merges them. Works only if files are available.


# Sarcoma epidemiological data
# Data was obtained from this data file (saved 23.9.2002, version 12.7.2002)
# It contains all data that was used in the publication but not e.g. all questionnaire data.
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\Analyysit\Analyysi020712\Analyysi020712.xls

library(OpasnetUtils)

sarc <- read.table(
	"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Sarkooma/Analyysi020712_copy.csv",
	sep = ",", header = TRUE
)

colnames(sarc)[c(24, 27, 28, 30, 32)] <- c(
	"Sisävesikalaa",
	"Itämeren.lohta",
	"Muuta.Itämerestä",
	"Kalasäilykkeitä",
	"Äyriäisiä"
)

########################################### Questionnaire
# The questions come from the questionnaire form
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\KTL_sarcoma_study_questionnaire.odt
# The questionnaire data comes from
# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R80_Sarkooma2\Data\Kyselyt.xls (Kyselyt.csv does not contain åäö)
# Kyselyt.xls was saved to N:\YMAL\Projects\Silakan riskiarvio\Data\Salaiset\Kyselyt.csv 

ques <- read.table(
	"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Kyselyt.csv",
	sep = ",", header = TRUE
)

poista <- c(
	"Pituus",
	"Paino",
	"Kalaa",
	"Petokalaa",
	"Muikkua",
	"Sisävesikalaa",
	"Kirjolohta",
	"Silakkaa",
	"Itämeren.lohta",
	"Muuta.itämerestä",
	"Pakastekalaa",
	"Kalasäilykkeitä",
	"Valtamerikalaa",
	"Äyriäisiä",
	"Vastattu",
	"AlkoKuinka.usein",
	"Onko1",
	"Onko2",
	"Onko3",
	"Onko4",
	"Onko5",
	"Onko6",
	"Onko7",
	"Onko8",
	"Onko9",
	"Onko10"
)

# The tests below show that the questionnaire columns in sarc and ques are actually identical. Therefore
# we remove them from ques and keep those in sarc, the data file that was used in the publication.

#for(i in testaa) {
#	x <- dat[[paste(i, ".x", sep = "")]]
#	if(is.factor(x)) x <- as.numeric(x) # levels(x)[x]
#	y <- dat[[paste(i, ".y", sep = "")]]
#	if(is.factor(y)) y <- as.numeric(y) # levels(y)[y]
#	print(paste(i, sum( x != y, na.rm = TRUE)))
#}

#sum(as.numeric(dat$alko) != as.numeric(dat$AlkoKuinka.usein), na.rm = TRUE)
#sum(as.numeric(dat$liuotti) != as.numeric(dat$Onko1), na.rm = TRUE)
#sum(as.numeric(dat$maalit) != as.numeric(dat$Onko2), na.rm = TRUE)
#sum(as.numeric(dat$formalde) != as.numeric(dat$Onko3), na.rm = TRUE)
#sum(as.numeric(dat$hyonteis) != as.numeric(dat$Onko4), na.rm = TRUE)
#sum(as.numeric(dat$Kasvinsu) != as.numeric(dat$Onko5), na.rm = TRUE)
#sum(as.numeric(dat$Kyllaste) != as.numeric(dat$Onko6), na.rm = TRUE)
#sum(as.numeric(dat$Pesuaine) != as.numeric(dat$Onko7), na.rm = TRUE)
#sum(as.numeric(dat$Metallit) != as.numeric(dat$Onko8), na.rm = TRUE)
#sum(as.numeric(dat$Muu1) != as.numeric(dat$Onko9), na.rm = TRUE)
#sum(as.numeric(dat$Muu2) != as.numeric(dat$Onko10), na.rm = TRUE)

ques <- ques[!colnames(ques) %in% poista]
dat <- merge(sarc, ques, by = "kysely_id", all.x = TRUE)

ruoat <- list(c(
	"Harvemmin kuin kerran kuukaudessa tai en lainkaan",
	"Kerran tai pari kuukaudessa",
	"Kerran viikossa",
	"Pari kertaa viikossa",
	"Lähes joka päivä",
	"Kerran päivässä tai useammin"
),
NA,
TRUE
)

kalat <- list(
	c(
		"En lainkaan",
		"Harvemmin kuin kerran kuukaudessa",
		"Kerran tai pari kuukaudessa",
		"Kerran viikossa",
		"Pari kertaa viikossa",
		"Lähes joka päivä"
	),
	0:5,
	TRUE
)

yn <- list(c("No", "Yes"), NA, TRUE)

locs <- list(
	aluenro = c(
		"-",
		"Espoo",
		"Helsinki",
		"Hyvinkää",
		"Hämeenlinna",
		"Joensuu",
		"Jyväskylä",
		"Kotka",
		"Kuopio",
		"Lahti",
		"Lappeenranta",
		"Pori",
		"Seinäjoki",
		"Tampere",
		"Turku",
		"Vaasa"
	),
	dgluokka = list(
		c(
			"MFH",
			"Liposarcoma",
			"Leiomyosarcoma",
			"Angiosarcoma",
			"Chondrosarcoma",
			"Sarcoma synoviale",
			"Sarcoma Ewing",
			"Dermatofibrosarcoma",
			"Sarcoma alia",
			"Sarcoma NUD",
			"Ei tietoa",
			"Osteosarcoma extrasceletale",
			"Lipoma",
			"Tumor Desmoides",
			"Myxoma",
			"Muu benigni tuumori",
			"Melanoma",
			"Muu kuin tuumori",
			"Tuplanäyte"
		),
		c(1:12, 21:27)
	),
	sp = c("Male", "Female"),
	Kalaa = ruoat,
	Petokalaa = kalat,
	Muikkua = kalat,
	Sisävesikalaa = kalat,
	Kirjolohta = kalat,
	Silakkaa = kalat,
	Itämeren.lohta = kalat,
	Muuta.Itämerestä = kalat,
	Pakastekalaa = kalat,
	Kalasäilykkeitä = kalat,
	Valtamerikalaa = kalat,
	Äyriäisiä	= kalat,
	sade		= yn,
	tupak		= yn,
	tstatus 	= c("< 6 mo ago", "> 6 mo ago", "Never"),
	liuotti		= yn,
	maalit		= yn,
	formalde	= yn,
	hyonteis	= yn,
	Kasvinsu	= yn,
	Kyllaste	= yn,
	Pesuaine	= yn,
	Metallit	= yn,
	Muu1		= yn,
	Muu2		= yn,
	Hyonluok	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
	Kasvluok	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
	Kyllluok 	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
	alko = list(c(
		"en koskaan",
		"kerran vuodessa tai harvemmin",
		"pari kertaa vuodessa",
		"3-4 kertaa vuodessa",
		"noin kerran parissa kuukaudessa",
		"noin kerran kuukaudessa",
		"pari kertaa kuukaudessa",
		"noin kerran viikossa",
		"muutaman kerran viikossa",
		"päivittäin"
	),
	10:1,
	TRUE
	),
	Koulutus = list(c(
		"Kansakoulu tai peruskoulu",
		"Keskikoulu",
		"Ammattikoulu tai vastaava",
		"Opistotutkinto ja/tai lukio",
		"Akateeminen tutkinto"
	),
	NA,
	TRUE
	),
	Työntekijäryhmä = c(
		"Ylempi toimihenkilö",
		"Alempi toimihenkilö",
		"Työntekijä",
		"Maanviljelijä",
		"Yrittäjä",
		"Opiskelija",
		"Eläkeläinen",
		"Kotirouva",
		"Työtön"
	),
	Painonmuutos = list(c(
		"Olen laihtunut",
		"Painoni ei ole juuri muuttunut",
		"Olen lihonut ja laihtunut",
		"Olen lihonut"
	),
	c(3, 2, 13, 1),
	TRUE
	),
	Ruokavalio = c(
		"ei erityisruokavaliota",
		"kasvisruoka sekä maito- ja munatuotteet",
		"ainoastaan kasvisruokavalio",
		"gluteeniton",
		"maidoton (ei edes hyla-tuotteita)",
		"muu, tarkempi kuvaus"
	),
	Leipää = ruoat,
	Puuroja = ruoat,
	Makaronia = ruoat,
	Muutaviljaa = ruoat,
	Viiliä = ruoat,
	Juustoja = ruoat,
	Rasvaisia.juustoja = ruoat,
	Jäätelöä = ruoat,
	Liharuokaa = ruoat,
	Mitä.maitoa = list(c(
		"en juo maitoa enkä piimää",
		"rasvatonta maitoa",
		"rasvatonta piimää tai kirnupiimää",
		"muuta piimää",
		"ykkösmaitoa",
		"kevytmaitoa",
		"täysmaitoa"
	),
	c(7, 4, 5, 6, 3, 2, 1),
	TRUE
	),
# Mitä.piimää is missing from here. Or is it? Has it been combined with milk?
	Mitä.leivälle = list(c(
		"En mitään",
		"Kasvimargariinia",
		"Voi-kasvirasvaseosta",
		"Voita"
	),
	NA,
	TRUE
	),
	Paljonko.rasvaa = list(c(
		"En lainkaan",
		"Voinapilla voitelen kolme viipaletta tai enemmän",
		"Voinapilla voitelen 1-2 viipaletta",
		"Käytän enemmän kuin yhden voinapin viipaletta kohti"
	),
	NA,
	TRUE
	),
	Mitä.rasvaa = list(c(
		"Ei mitään rasvaa",
		"Kasviöljyä",
		"Kasvimargariinia",
		"Talousmargariinia",
		"Voi-kasviöljyseosta",
		"Voita"
	),
	c(6, 1, 2, 3, 4, 5),
	TRUE
	),
	Ruokamuutos = c(
		"Ei",
		"Kyllä"
	),
	Vesilähde = c(
		"kunnan vesijohtovettä",
		"oman kaivon vettä",
		"kaupan pullotettua vettä",
		"muuta"
	),
	Tupakointi = c(
		"En",
		"Kyllä"
	),
	TupakSäännöllisyys = c(
		"en ole koskaan tupakoinut säännöllisesti",
		"olen tupakoinut säännöllisesti"
	),
	Tupakviimeksi = list(c(
		"yli 10 vuotta sitten",
		"6 - 10 vuotta sitten",
		"1 - 5 vuotta sitten",
		"puoli vuotta - vuosi sitten",
		"1 kk - puoli vuotta sitten",
		"2 pv - 1 kk sitten",
		"eilen tai tänään"
	),
	7:1,
	TRUE
	),
	Alkoholi = list(c(
		"en ole koskaan käyttänyt alkoholijuomia",
		"en, olen lopettanut alkoholin käytön kokonaan",
		"kyllä, harvemmin kuin kerran kuussa",
		"kyllä, vähintään kerran kuussa"
	),
	4:1,
	TRUE
	),
	AlkoKuinka.paljon = list(c(
		"vähemmän kuin yhden annoksen",
		"1 annoksen",
		"2 annosta",
		"3 annosta",
		"4-5 annosta",
		"6-10 annosta",
		"yli 10 annosta"
	),
	NA,
	TRUE
	),
	Asuntotyyppi = c(
		"omakotitalossa",
		"rivitalossa",
		"kerrostalossa"
	),
	Lämmitystyyppi = c(
		"kaukolämpö",
		"öljylämmitys",
		"sähkölämmitys",
		"puulämmitys",
		"muu"
	),
	Paikkaus = c(
		"Ei ole",
		"Kyllä"
	),
	PaikkaPoisto = yn,
	Montako.paikkaa = list(c(
		"ei yhtään",
		"1 - 2",
		"3 - 6",
		"7 - 15",
		"yli 15"
	),
	NA, 
	TRUE
	),
	Sairaus 	= yn,
	Elinsiirto	= yn,
	Tekonivel	= yn,
	Muu			= yn,
	Sädehoito	= yn,
	Vierasesine	= yn,
	AIDS		= yn,
	Neurofibr	= yn,
	vonHippel	= yn
)

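# Each entry of locs is either a plain vector of labels or a list of
# (labels, level codes, ordered flag). Fill in the missing parts and
# convert the corresponding column of dat into a factor.
for(i in names(locs)) {
	if(!is.list(locs[[i]])) locs[[i]] <- list(locs[[i]], NA)
	# Default level codes are 1, 2, ..., number of labels.
	if(!is.numeric(locs[[i]][[2]])) locs[[i]][[2]] <- seq_along(locs[[i]][[1]])
	# Factors are unordered unless a third element says otherwise.
	if(length(locs[[i]]) < 3) locs[[i]][[3]] <- FALSE
	if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])} # debug output before conversion
	dat[[i]] <- factor(dat[[i]], levels = locs[[i]][[2]], labels = locs[[i]][[1]], ordered = locs[[i]][[3]])
	if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])} # debug output after conversion
}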

levels(dat$ikäAlin)[levels(dat$ikäAlin) == "20.5"] <- "20"
dat$ikaA1 <- as.numeric(substr(dat$ikäAlin, 1, 2))
dat$ikaA2 <- as.character(dat$ikäAlin)
dat$ikaA2 <- as.numeric(substr(dat$ikaA2, nchar(dat$ikaA2)-1, nchar(dat$ikaA2)))

levels(dat$ikäYlin)[levels(dat$ikäYlin) == "20.5"] <- "20"
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "21.5"] <- "21"
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "28.5"] <- "28"
levels(dat$ikäYlin)[levels(dat$ikäYlin) == "121"] <- "21"
dat$ikaY1 <- as.numeric(substr(dat$ikäYlin, 1, 2))
dat$ikaY2 <- as.character(dat$ikäYlin)
dat$ikaY2 <- as.numeric(gsub("[- ,]", "", substr(dat$ikaY2, nchar(dat$ikaY2)-2, nchar(dat$ikaY2))))

dat$Pnetto <- as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][1]}))
dat$Ppoikk <- -as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][2]}))
dat$Ppoikk[is.na(dat$Ppoikk)] <- 0
dat$Pnetto <- ifelse(dat$Painonmuutos %in% c("Olen laihtunut"), -dat$Pnetto, dat$Pnetto)
dat$Pnetto[dat$Painonmuutos == "Painoni ei ole juuri muuttunut"] <- 0
dat$Pnetto <- dat$Pnetto + dat$Ppoikk

# Then we impute the remaining NA values using the averages of the respective subgroups
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut"] <- 8
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen laihtunut"] <- -8
dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut ja laihtunut"] <- 1

dat$Rintamaitoa1[is.na(dat$Rintamaitoa1)] <- 0
dat$Rintamaitoa2[is.na(dat$Rintamaitoa2)] <- 0
dat$Rintamaitoa3[is.na(dat$Rintamaitoa3)] <- 0
dat$Rintamaitoa4[is.na(dat$Rintamaitoa4)] <- 0
dat$Rintamaitoa5[is.na(dat$Rintamaitoa5)] <- 0
dat$Rintamaitoa6[is.na(dat$Rintamaitoa6)] <- 0
dat$Rintamaitoa7[is.na(dat$Rintamaitoa7)] <- 0
dat$Rintamaitoa8[is.na(dat$Rintamaitoa8)] <- 0
dat$Rintamaitoa <- dat$Rintamaitoa1 + dat$Rintamaitoa2 + dat$Rintamaitoa3 + dat$Rintamaitoa4 + 
	dat$Rintamaitoa5 + dat$Rintamaitoa6 + dat$Rintamaitoa7 + dat$Rintamaitoa8

### Graphs about WHOTEQ as a function of age and BMI.

if(FALSE) {
	ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2)) + scale_colour_gradientn(colours = rainbow(3))
	ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25))
	ggplot(dat, aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25)) + geom_point() + geom_smooth()

	for(i in colnames(dat)[c(4, 6:8, 14:16, 18:35, 37:76, 82:84, 101:138, 144:152, 154:162)]) {#182, 184:199, 201:205, 238, 247, 267:268)]) {
		print(ggplot(dat, aes_string(x = i)) + geom_bar() + labs(title = i))
	#  par(ask = interactive()) # This makes R to wait for enter before continuing
	}

}

temp <- as.character(dat$leikkaus_dt)
temp <- as.numeric(substr(temp, nchar(temp) - 1, nchar(temp)))
temp <- temp - pmax(
	dat$Vuosi1, 
	dat$Vuosi2, 
	dat$Vuosi3, 
	dat$Vuosi4, 
	dat$Vuosi5, 
	dat$Vuosi6, 
	dat$Vuosi7, 
	dat$Vuosi8,
	na.rm = TRUE
)
temp[is.na(temp)] <- 0
dat$Syntymasta <- temp # How many years from the birth of the last child to the year of the operation?

dat$ikaA1[is.na(dat$ikaA1)] <- 20
dat$ikaA2[is.na(dat$ikaA2)] <- 20
dat$ikaY1 <- ifelse(is.na(dat$ikaY1), dat$ika, dat$ikaY1)
dat$ikaY2 <- ifelse(is.na(dat$ikaY2), dat$ika, dat$ikaY2)

dat <- within(dat, lihomisvauhti <- (kgYlin - kgAlin) / ((ikaY1 + ikaY2) / 2 - (ikaA1 + ikaA2) / 2))

## From here on, dat contains also other information than that produced in the KTL sarcoma study.

ffq <- opbase.data("Op_en2721", subset = "Portions per month")
ffq$Result <- ffq$Result
ffq$Obs <- NULL

foods <- opbase.data("Op_en2721", subset = "Food energy and dioxin")

# Add three columns (energy "e", dioxin "d", and mass "m") for each food item in the ffq questionnaire
dat$Energy <- 0
dat$Dioxin <- 0

for(i in unique(foods$Food)) {
	
	# Merge classified ffq questionnaire data with quantitative interpretation of ffq.
	cole <- paste("e", i, sep = "")
	colnames(ffq) <- c(i, cole) # Convert words to numbers of portions per month
	dat <- merge(dat, ffq, all.x = TRUE)
	dat[[cole]] <- ifelse(is.na(dat[[cole]]), 0, dat[[cole]] / 30) # Replace NA with 0, otherwise /mo -> /d.

	dat[[paste("d", i, sep = "")]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Dioxin", "Result"]))
	dat[[paste("m", i, sep = "")]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Mass", "Result"]))
	dat[[cole]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Energy", "Result"]))
	
	dat$Energy <- dat$Energy + dat[[cole]]
	dat$Dioxin <- dat$Dioxin + dat[[paste("d", i, sep = "")]]
}

objects.get("x89WlgJlDLA02vnL") # PCDD/F-data table1

pcdd <- table1[2:nrow(table1), c(2,3)] # Time is dropped as unnecessary for now (data from 2002 and 2009)
colnames(pcdd) <- c("PCDDF", "PCB")
pcdd$PCDDF <- as.numeric(as.character(pcdd$PCDDF))
pcdd$PCB <- as.numeric(as.character(pcdd$PCB))
pcdd$TEQ <- pcdd$PCDDF + pcdd$PCB

dat$Silteq <- dat$mSilakkaa * mean(pcdd$TEQ)
dat$Kalteq <- dat$Silteq + dat$mKalaa * 1 # Estimated mean 1 pg/g fw

Interpretations

The consumption of hard fat is calculated as follows. Q## means the value from the survey question; I## means the interpretation from the table below; Q24&I24 means that the answer to question Q24 is quantified using the matching value in interpretation I24; Q23*I23 means that the survey value and the interpretation are multiplied.

total_fat = (Q23a*I23 + Q23b*I23) * Q24&I24 + Q25&I25 * Q26&I26 * Q21a&I21 + Q27&I27 * 20

The code assumes that a person uses 20 g/d of fat for cooking. The questions are: Q23, how much a) milk and b) sourmilk; Q24, what kind of milk; Q25, what fat on bread; Q26, how much fat on bread; Q27, what fat for cooking.
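As a worked illustration of the formula above, the following sketch computes the hard fat intake of one hypothetical respondent. The function `total_hard_fat` and its argument names are not part of the study code; the numeric values are taken from the assumption table (light milk, butter on bread, 3 g of fat per slice, soft margarine for cooking), and the bread multiplier is simply the value that the Q21a interpretation returns for answer 2.

```r
# Hypothetical helper (not from the study code) that follows the total_fat formula.
total_hard_fat <- function(glasses_milk, glasses_sourmilk, dl_per_glass,
                           milk_fat, bread_fat_share, fat_per_slice,
                           bread_amount, cooking_fat_share, cooking_fat = 20) {
  # (Q23a*I23 + Q23b*I23) * Q24&I24
  milk <- (glasses_milk * dl_per_glass + glasses_sourmilk * dl_per_glass) * milk_fat
  # Q25&I25 * Q26&I26 * Q21a&I21
  bread <- bread_fat_share * fat_per_slice * bread_amount
  # Q27&I27 * 20 (assumed 20 g/d of cooking fat)
  cooking <- cooking_fat_share * cooking_fat
  milk + bread + cooking
}

example_fat <- total_hard_fat(
  glasses_milk = 2, glasses_sourmilk = 0, dl_per_glass = 2, # Q23a, Q23b, I23
  milk_fat = 0.015,         # Q24 = 2, light milk
  bread_fat_share = 1,      # Q25 = 4, butter
  fat_per_slice = 3,        # Q26 = 2
  bread_amount = 2.5,       # Q21a = 2, value as given in the table
  cooking_fat_share = 0.15  # Q27 = 2, soft margarine
)
print(example_fat)
```

The three terms are kept separate inside the function so that each can be checked against the table on its own.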

The following assumptions are used to interpret survey answers:

Assumptions for calculations(-)
Obs	Variable	Value	Unit	Result	Description	Answer in Finnish
1	Q23		dl per glass	2	Size of a glass of milk or sourmilk
2	Q24	1	fat g/dl	0.035	full milk, fat g/dl	täysmaitoa
3	Q24	2	fat g/dl	0.015	light milk, fat g/dl	kevytmaitoa
4	Q24	3	fat g/dl	0.01	1% milk, fat g/dl	ykkösmaitoa
5	Q24	4	fat g/dl	0	fat-free milk	rasvatonta maitoa
6	Q24	5	fat g/dl	0	fat-free sourmilk	rasvatonta piimää tai kirnupiimää
7	Q24	6	fat g/dl	0.01	other sourmilk fat g/dl	muuta piimää
8	Q24	7	fat g/dl	0	none of these	en juo maitoa enkä piimää
9	Q25	1	hard fat, proportion	0	none	en mitään
10	Q25	2	hard fat, proportion	0.15	soft margarine, share of hard fat	kasvimargariinia
11	Q25	3	hard fat, proportion	0.5	oil-butter-mix, share of hard fat	Voi-kasvirasvaseosta
12	Q25	4	hard fat, proportion	1	butter	voita
13	Q26	1	fat g /slice of bread	0	0 g per slice of bread	en lainkaan
14	Q26	2	fat g /slice of bread	3	3 g per slice of bread	10 g per 3 viipaletta
15	Q26	3	fat g /slice of bread	7	7 g per slice of bread	10 g per 1-2 viipaletta
16	Q26	4	fat g /slice of bread	15	15 g per slice of bread	Yli 10 g per viipale
17	Q27	1	hard fat fraction	0	hard fat fraction in the baking fat used	kasviöljyä
18	Q27	2	hard fat fraction	0.15	hard fat fraction in the baking fat used	kasvimargariinia
19	Q27	3	hard fat fraction	0.5	hard fat fraction in the baking fat used	talousmargariinia
20	Q27	4	hard fat fraction	0.5	hard fat fraction in the baking fat used	Voi-kasvirasvaseosta
21	Q27	5	hard fat fraction	1	hard fat fraction in the baking fat used	voita
22	Q27	6	hard fat fraction	0	hard fat fraction in the baking fat used	ei mitään rasvaa
23	Q35	1	alcohol times /a	300		päivittäin
24	Q35	2	alcohol times /a	100		muutaman kerran viikossa
25	Q35	3	alcohol times /a	50		noin kerran viikossa
26	Q35	4	alcohol times /a	25		pari kertaa kuukaudessa
27	Q35	5	alcohol times /a	12		noin kerran kuukaudessa
28	Q35	6	alcohol times /a	6		noin kerran parissa kuukaudessa
29	Q35	7	alcohol times /a	4		3-4 kertaa vuodessa
30	Q35	8	alcohol times /a	2		pari kertaa vuodessa
31	Q35	9	alcohol times /a	1		kerran vuodessa tai harvemmin
32	Q35	10	alcohol times /a	0		en koskaan
33	Q36	1	alcohol portion	0	g alcohol	vähemmän kuin yhden
34	Q36	2	alcohol portion	12	g alcohol	1 annoksen
35	Q36	3	alcohol portion	24	g alcohol	2 annosta
36	Q36	4	alcohol portion	36	g alcohol	3 annosta
37	Q36	5	alcohol portion	55	g alcohol	4-5 annosta
38	Q36	6	alcohol portion	96	g alcohol	6-10 annosta
39	Q36	7	alcohol portion	150	g alcohol	Yli 10 annosta
40	Q21a	1	g/day carbohydrates	1.5	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina. oletus: 50% hiilihydraattia
41	Q21a	2	g/day carbohydrates	2.5	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina.
42	Q21a	3	g/day carbohydrates	7.5	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina.
43	Q21a	4	g/day carbohydrates	15	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina.
44	Q21a	5	g/day carbohydrates	50	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina.
45	Q21a	6	g/day carbohydrates	100	carbohydrates per day of 100 g bread slices	leipää 100 g viipaleina.
46	Q21b	1	g/day carbohydrates	0.84	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina. oletus: 70% hiilihydraattia viljasta, jota 20%
47	Q21b	2	g/day carbohydrates	1.4	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina.
48	Q21b	3	g/day carbohydrates	4.2	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina.
49	Q21b	4	g/day carbohydrates	8.4	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina.
50	Q21b	5	g/day carbohydrates	28	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina.
51	Q21b	6	g/day carbohydrates	56	carbohydrates per day of 200 g porridge	puuroa 200 g annoksina.
52	Q21c	1	g/day carbohydrates	1.2	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina. oletus: 80% hiilihydraattia viljasta, jota 25%
53	Q21c	2	g/day carbohydrates	2	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina.
54	Q21c	3	g/day carbohydrates	6	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina.
55	Q21c	4	g/day carbohydrates	12	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina.
56	Q21c	5	g/day carbohydrates	40	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina.
57	Q21c	6	g/day carbohydrates	80	carbohydrates per day of 200 g pasta	pastaa 200 g annoksina.
58	Q21d	1	g/day carbohydrates	1.26	carbohydrates per day of 200 g muesli etc	muita (mysli ym). oletus: 70% hiilihydraattia viljasta, jota 30%
59	Q21d	2	g/day carbohydrates	2.1	carbohydrates per day of 200 g muesli etc	muita (mysli ym).
60	Q21d	3	g/day carbohydrates	6.3	carbohydrates per day of 200 g muesli etc	muita (mysli ym).
61	Q21d	4	g/day carbohydrates	12.6	carbohydrates per day of 200 g muesli etc	muita (mysli ym).
62	Q21d	5	g/day carbohydrates	42	carbohydrates per day of 200 g muesli etc	muita (mysli ym).
63	Q21d	6	g/day carbohydrates	84	carbohydrates per day of 200 g muesli etc	muita (mysli ym).
64	Q21e	1	g/day carbohydrates	0.3	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri. oletus: 5% hiilihydraattia (Doc. Geigy s. 479)
65	Q21e	2	g/day carbohydrates	0.5	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri.
66	Q21e	3	g/day carbohydrates	1.5	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri.
67	Q21e	4	g/day carbohydrates	3	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri.
68	Q21e	5	g/day carbohydrates	10	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri.
69	Q21e	6	g/day carbohydrates	20	carbohydrates per day of 200 g yoghurt etc	viiliä tai jugurttia, sokeri.
70	Q21f	1	g/day carbohydrates	0.015	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri.
71	Q21f	2	g/day carbohydrates	0.025	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri. oletus: 1% hiilihydraattia (Doc. Geigy s. 479)
72	Q21f	3	g/day carbohydrates	0.075	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri.
73	Q21f	4	g/day carbohydrates	0.15	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri.
74	Q21f	5	g/day carbohydrates	0.5	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri.
75	Q21f	6	g/day carbohydrates	1	carbohydrates per 50 g cheese	vähärasv. juusto, sokeri.
76	Q21g	1	g/day carbohydrates	0.015	carbohydrates per 50 g cheese	muu juusto, sokeri. oletus: 1% hiilihydraattia (Doc. Geigy s. 479)
77	Q21g	2	g/day carbohydrates	0.025	carbohydrates per 50 g cheese	muu juusto, sokeri.
78	Q21g	3	g/day carbohydrates	0.075	carbohydrates per 50 g cheese	muu juusto, sokeri.
79	Q21g	4	g/day carbohydrates	0.15	carbohydrates per 50 g cheese	muu juusto, sokeri.
80	Q21g	5	g/day carbohydrates	0.5	carbohydrates per 50 g cheese	muu juusto, sokeri.
81	Q21g	6	g/day carbohydrates	1	carbohydrates per 50 g cheese	muu juusto, sokeri.
82	Q21h	1	g/day carbohydrates	0.3	carbohydrates per 100 g ice cream	jäätelöä. oletus: 10% hiilihydraattia
83	Q21h	2	g/day carbohydrates	0.5	carbohydrates per 100 g ice cream	jäätelöä.
84	Q21h	3	g/day carbohydrates	1.5	carbohydrates per 100 g ice cream	jäätelöä.
85	Q21h	4	g/day carbohydrates	3	carbohydrates per 100 g ice cream	jäätelöä.
86	Q21h	5	g/day carbohydrates	10	carbohydrates per 100 g ice cream	jäätelöä.
87	Q21h	6	g/day carbohydrates	20	carbohydrates per 100 g ice cream	jäätelöä.
88	Q21i	1	g/day hard fat	0.12	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva. oletus: 2 % rasvaa
89	Q21i	2	g/day hard fat	0.2	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva.
90	Q21i	3	g/day hard fat	0.6	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva.
91	Q21i	4	g/day hard fat	1.2	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva.
92	Q21i	5	g/day hard fat	4	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva.
93	Q21i	6	g/day hard fat	8	hard fat per 200 g yoghurt etc	viiliä tai jugurttia, rasva.
94	Q21j	1	g/day hard fat	0.15	hard fat per 50 g low-fat cheese	vähärasvainen juusto. oletus: 10% rasvaa
95	Q21j	2	g/day hard fat	0.25	hard fat per 50 g low-fat cheese	vähärasvainen juusto.
96	Q21j	3	g/day hard fat	0.75	hard fat per 50 g low-fat cheese	vähärasvainen juusto.
97	Q21j	4	g/day hard fat	1.5	hard fat per 50 g low-fat cheese	vähärasvainen juusto.
98	Q21j	5	g/day hard fat	5	hard fat per 50 g low-fat cheese	vähärasvainen juusto.
99	Q21j	6	g/day hard fat	10	hard fat per 50 g low-fat cheese	vähärasvainen juusto.
100	Q21k	1	g/day hard fat	0.45	hard fat per 50 g cheese	juusto. oletus: 30% rasvaa (fineli)
101	Q21k	2	g/day hard fat	0.75	hard fat per 50 g cheese	juusto.
102	Q21k	3	g/day hard fat	2.25	hard fat per 50 g cheese	juusto.
103	Q21k	4	g/day hard fat	4.5	hard fat per 50 g cheese	juusto.
104	Q21k	5	g/day hard fat	15	hard fat per 50 g cheese	juusto.
105	Q21k	6	g/day hard fat	30	hard fat per 50 g cheese	juusto.
106	Q21l	1	g/day hard fat	0.3	hard fat per 100 g ice cream	jäätelöä. oletus: 10% rasvaa
107	Q21l	2	g/day hard fat	0.5	hard fat per 100 g ice cream	jäätelöä.
108	Q21l	3	g/day hard fat	1.5	hard fat per 100 g ice cream	jäätelöä.
109	Q21l	4	g/day hard fat	3	hard fat per 100 g ice cream	jäätelöä.
110	Q21l	5	g/day hard fat	10	hard fat per 100 g ice cream	jäätelöä.
111	Q21l	6	g/day hard fat	20	hard fat per 100 g ice cream	jäätelöä.
112	Q21m	1	g/day hard fat	0.45	hard fat per 100 g meat	liharuokaa. oletus: 15% rasvaa (Doc. Geigy s. 481)
113	Q21m	2	g/day hard fat	0.75	hard fat per 100 g meat	liharuokaa.
114	Q21m	3	g/day hard fat	2.25	hard fat per 100 g meat	liharuokaa.
115	Q21m	4	g/day hard fat	4.5	hard fat per 100 g meat	liharuokaa.
116	Q21m	5	g/day hard fat	15	hard fat per 100 g meat	liharuokaa.
117	Q21m	6	g/day hard fat	30	hard fat per 100 g meat	liharuokaa.
118	Q21n	1	g/day hard fat	0.15	hard fat per 100 g fish	kalaruokaa. oletus: 5% kovaa rasvaa, finelin mukaan 2-5%
119	Q21n	2	g/day hard fat	0.25	hard fat per 100 g fish	kalaruokaa.
120	Q21n	3	g/day hard fat	0.75	hard fat per 100 g fish	kalaruokaa.
121	Q21n	4	g/day hard fat	1.5	hard fat per 100 g fish	kalaruokaa.
122	Q21n	5	g/day hard fat	5	hard fat per 100 g fish	kalaruokaa.
123	Q21n	6	g/day hard fat	10	hard fat per 100 g fish	kalaruokaa.

How much mass, energy, and dioxin does one portion contain? Data are guesswork or from Fineli.

Food energy and dioxin(g,kJ,pg/portion)
Obs	Food	Mass	Energy	Dioxin
1	Kalaa	100	600	7
2	Silakkaa	100	792	470
3	Petokalaa	100	301	25
4	Muikkua	100	750	28
5	Sisävesikalaa	100	668	23
6	Kirjolohta	100	1067	74
7	Itämeren.lohta	100	1067	770
8	Muuta.Itämerestä	100	668	15
9	Pakastekalaa	100	324	7
10	Kalasäilykkeitä	60	600	7
11	Valtamerikalaa	100	600	7
12	Äyriäisiä	60	200	7
13	Leipää	50	406	0.01
14	Puuroja	200	642	0.02
15	Makaronia	200	846	0.02
16	Muutaviljaa	150	600	0.02
17	Viiliä	200	334	0.008
18	Juustoja	40	300	0.012
19	Rasvaisia.juustoja	40	600	0.03
20	Jäätelöä	150	1200	0.03
21	Liharuokaa	150	1400	1.5
22	Maitoa	200	358	0.004
23	Piimää	200	358	0.004

Portions per month(portions/mo)
Obs	Answer	Interpretation
1	En lainkaan	0.003
2	Harvemmin kuin kerran kuukaudessa tai en lainkaan	0.1
3	Harvemmin kuin kerran kuukaudessa	0.5
4	Kerran tai pari kuukaudessa	1.5
5	Kerran viikossa	4
6	Pari kertaa viikossa	8
7	Lähes joka päivä	20
8	Kerran päivässä tai useammin	40
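The two tables above are combined in the code by converting each frequency answer to portions per day (portions per month divided by 30) and multiplying by the per-portion content. A minimal sketch of that arithmetic for one respondent follows; the answers are hypothetical and the object names (`portions`, `foods`, `answers`) are chosen for this illustration only, while the numbers are copied from the tables.

```r
# Interpretation of frequency answers (portions per month), copied from the table above.
portions <- c(
  "Kerran tai pari kuukaudessa" = 1.5,
  "Kerran viikossa" = 4,
  "Pari kertaa viikossa" = 8
)

# Per-portion values copied from the "Food energy and dioxin" table.
foods <- data.frame(
  Food   = c("Silakkaa", "Kirjolohta", "Liharuokaa"),
  Energy = c(792, 1067, 1400), # kJ/portion
  Dioxin = c(470, 74, 1.5),    # pg/portion
  stringsAsFactors = FALSE
)

# Hypothetical answers of one respondent (not real study data).
answers <- c(
  Silakkaa   = "Kerran viikossa",
  Kirjolohta = "Kerran tai pari kuukaudessa",
  Liharuokaa = "Pari kertaa viikossa"
)

per_day <- unname(portions[answers[foods$Food]]) / 30 # portions/mo -> portions/d
daily_dioxin <- sum(per_day * foods$Dioxin)           # pg/d
daily_energy <- sum(per_day * foods$Energy)           # kJ/d
print(c(dioxin_pg_per_d = daily_dioxin, energy_kJ_per_d = daily_energy))
```

With these answers, Baltic herring dominates the dioxin sum even though meat is eaten twice as often, which follows directly from the per-portion values in the table.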

Analyses

Simulated data

This code was used to create a csv file that contains simulated data from this study. Compared with the original data, the simulated data

has the same number of observations,
has the same range of values in each variable,
has approximately the same correlation structure between all variables.
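The range property can be illustrated with a small base R sketch. It uses made-up numbers rather than study data, fits a normal distribution with the sample mean and standard deviation (the study code uses `MASS::fitdistr` for this step), and leaves out the correlation step entirely. Values falling outside the observed range are set to NA, so the simulated variable stays within the range of the original one.

```r
set.seed(1)

# Made-up "original" variable (not study data).
original <- c(52, 61, 47, 70, 58, 66, 49, 73, 55, 63)
n <- length(original)

# Simulate from a normal distribution fitted to the original values.
simulated <- rnorm(n, mean = mean(original), sd = sd(original))

# Set values outside the observed range to NA.
out_of_range <- simulated < min(original) | simulated > max(original)
simulated[out_of_range] <- NA

# Compare the ranges of the original and the simulated variable.
print(rbind(original = range(original), simulated = range(simulated, na.rm = TRUE)))
```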

+ Show code - Hide code

library(OpasnetUtils)
library(MASS)
library(mc2d)
library(reshape2)
library(ggplot2)

objects.get("isqT7nvhd0ViUR7d")

data <- objects.decode(etable, password)
colnames(data) <- t(data[1, ])
data <- data[2:nrow(data), 2:ncol(data)]

data2 <- data
fun <- c(rep("normal", 5), rep("poisson", 12), rep("lognormal", 19))

params <- list()

for(i in 1:ncol(data2)) {
	data2[[i]] <- as.numeric(as.character(data2[[i]]))
	if(i > 17) data2[[i]] <- ifelse(data2[[i]] == 0, 0.01, data2[[i]])
	params[[i]] <- fitdistr(data2[[i]][!is.na(data2[[i]])], fun[i])$estimate # Keep only the fitted parameter estimates
}

simu <- data.frame(temp = rep(NA, 968))

for(i in 1:5) {
	simu[[i]] <- rnorm(968, params[[i]][1], params[[i]][2])
}
for(i in 6:17) {
	simu[[i]] <- rpois(968, params[[i]])
}
for(i in 18:36) {
	simu[[i]] <- rlnorm(968, params[[i]][1], params[[i]][2])
}
simu[[3]] <- rbern(968, 0.5) + 1

colnames(simu) <- colnames(data)

korre <- cor(x = data2, use = "pairwise.complete.obs", method = "spearman")

simu <- as.data.frame(cornode(as.matrix(simu), target = korre))

korre2 <- cor(x = simu, use = "pairwise.complete.obs", method = "spearman")

qplot(melt(korre)$value, melt(korre2)$value)

for(i in 1:ncol(simu)) {
	simu[[i]] <- ifelse(
		simu[[i]] > max(data2[[i]], na.rm = TRUE) | 
		simu[[i]] < min(data2[[i]], na.rm = TRUE), 
		NA, simu[[i]]
	)
}

for(i in 1:ncol(data2)) {print(paste(
	min(data2[[i]], na.rm = TRUE), 
	max(data2[[i]], na.rm = TRUE),
	min(simu[[i]], na.rm = TRUE),
	max(simu[[i]], na.rm = TRUE)
))}

POPs and obesity

Dioxins and PCBs have been associated with type 2 diabetes. Do dioxins cause diabetes, does diabetes decrease dioxin elimination, does obesity both increase diabetes risk and decrease dioxin elimination, or is something else going on? We tried to make sense of this by looking at the sarcoma study data.

+ Show code - Hide code

library(ggplot2)
dat <- read.csv("V:/TUSO/Projects/POPit ja lihavuus/Sarkoomakyselydata/Copy of sarkooma_kysely_ja_dioksiinit_korjattu.csv")
dat$Diet <- dat$Rasvaa.maitotuotteista + dat$k21liha + dat$k21kala * 8
hist(dat$Diet)
dat$Diet3 <- cut(dat$Diet, 3)

ggplot(dat, aes(x = ika, y = IntakePCDDFTEQ, colour = Diet3)) + 
  geom_point() + geom_smooth()

ggplot(dat, aes(x = ika, y = PCDDFWHO05TEQ, colour = Diet3)) + 
  geom_point() + geom_smooth()

Self-reported chemical exposure

We looked at self-reported chemical exposure, especially pesticides and wood preservatives.

We also looked at the impact of self-reported occupation, recoded into 9 groups. This is best done in the unmatched dataset, but some analyses were also done with the matched dataset. Age was the only clearly significant variable, with sarcoma risk increasing by 8 % per year. Male gender seemed to increase the risk, but the effect was not statistically significant. None of the differences between occupation groups were statistically significant, and they did not show a pattern where putatively chemically exposed groups would have a higher risk.
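To put the age effect in perspective, the sketch below shows how a per-year increase compounds over longer age differences. It assumes that "8 % per year" means an odds ratio of 1.08 per year of age from a logistic-type model. That reading is an assumption of this illustration, not something stated by the analysis output.

```r
or_per_year <- 1.08          # assumed reading of "8 % per year"
beta_age <- log(or_per_year) # corresponding log-odds coefficient per year

years <- c(1, 5, 10, 20)
or_over_years <- exp(beta_age * years) # same as or_per_year^years

print(data.frame(years = years, odds_ratio = round(or_over_years, 2)))
```

Under this assumption, the odds roughly double over a ten-year age difference.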

+ Show code - Hide code

#################
# Bring in the hand-made occupation classification

d <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Kyselykaavaka_ammatti-tyo_edit.csv")

#colnames(d)
#[1] "N"                        "ID"                       "Työntekijäryhmä"          "Luokitus..koodi.lopussa."
#[5] "Alle.5.v.työhistoria"     "Huomattavaa"              "Ammatti"                  "Työpaikka"               
#[9] "Kesto"                    "Työtehtävä"               "AmmattiA"                 "TyöpaikkaA"              
#[13] "KestoA"                   "AmmattiB"                 "TyöpaikkaB"               "KestoB"                  
#[17] "AmmattiC"                 "TyöpaikkaC"               "KestoC"                   "AmmattiD"                
#[21] "TyöpaikkaD"               "KestoD"                  

lev <- as.character(d[974:982,4])
d <- d[1:969,c(2,4,5,6)]
d <- d[d$ID != "" , ] # Remove empty row 883
colnames(d) <- c("ID", "Tyoluokka", "Alle5v", "Huom.tyo")
d$Tyoluokka <- factor(d$Tyoluokka, levels = 1:9, labels = lev)
d$Tyoalt <- ifelse(as.numeric(d$Tyoluokka) %in% c(1,2,9), "Ei",
                   ifelse(as.numeric(d$Tyoluokka) %in% c(3,8), "Ehkä", "Kyllä"))

#> levels(d$Tyoluokka)
#[1] "Opiskelija"             "Sisätyö"                "Hoitoala"               "Maa- ja metsätalous"   
#[5] "Sotilas, palomies ym"   "Teollisuustyö"          "Rakennusala, ulkotyö"   "Kauppa, elintarvikeala"
#[9] "Työtön tai ei tietoa"  

###################################

library(lme4)
# Data from //helfs01.thl.fi/groups2/TUSO/Projects/POPit ja lihavuus/Dioksiinit vs sarkooma/Data.xlsx
dat <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Data_2.12.2016.csv", encoding = "UTF-8")
names(dat)
dat$PCDDFWHO05TEQ <- dat$PCDDFWHO05TEQ / 20 # Scale to a nominal interquartile range (ca. 19.5 pg/g fat, depending on subgroup)

# Pekka's model using the clogit function
library("survival")

# A conditional regression with new occupation classification. Regression method as below.
#> sum(as.character(d$ID) != as.character(dat$ID))
#[1] 0
# Because rows are identically ordered, just cbind the occupation data without redundant ID.

dat <- cbind(dat, d[-1])
dat$Tyoluokka <- relevel(dat$Tyoluokka, "Sisätyö")
dat$Alle5v <- ifelse(dat$Alle5v == "1", "Yes", "No")

table(dat[c("Tyoluokka","Sarcoma.unmatched","Alle5v")], useNA = "ifany")

clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka + Alle5v, # + PCDDFWHO05TEQ
       #strata(Sarcoma.matched.pair), 
       method="exact", data = dat
)

clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka, # + PCDDFWHO05TEQ
       #strata(Sarcoma.matched.pair), 
       method="exact", data = dat[dat$Alle5v == "No",]
)

clogit(Sarcoma.matched ~ Sex + Tyoalt + # Tyoluokka + # + PCDDFWHO05TEQ
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat
)

clogit(Sarcoma.matched ~ Tyoluokka + # + PCDDFWHO05TEQ # No sex to avoid too many subgroups
         strata(Sarcoma.matched.pair), 
       method="exact", data = dat
)
# The analysis above does not give reliable results because warning: Loglik converged before variable 1,2,3,4,5,6,7,8

temp <- list()
for(i in levels(dat$Tyoluokka)) {
  dat$Temp <- ifelse(dat$Tyoluokka == i, TRUE, FALSE)
  print(i)
  temp2 <- clogit(Sarcoma.matched ~ Sex + Temp + # Tyoluokka + # + PCDDFWHO05TEQ
           strata(Sarcoma.matched.pair), 
         method="exact", data = dat
  )
  print(summary(temp2)$conf.int)
  temp <- rbind(
    temp,
    data.frame(
      Tyoluokka = i,
      summary(temp2)$conf.int,
      Pvalue = summary(temp2)$coefficients[,5]
    )
  )
}
# The analysis above compares one group of Tyoluokka to all others in the matched data set. 
# A better analysis is below with unmatched analysis.
temp

# On the other hand, questionnaire was collected from everyone, so matching can be removed (unlike with dioxins)
# without altering the design. Let's try what happens without matching.

clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka, # + PCDDFWHO05TEQ
               #strata(Sarcoma.matched.pair), 
           method="exact", data = dat
)

table(dat[c("Tyoluokka", "Sarcoma.matched")])
table(dat[c("Sarcoma.matched","Sarcoma.unmatched")], useNA = "ifany")

# Exact estimation. Produces the same result as Riikka's analysis.
# Several different models were run. All included Sex as a confounder.
# Four pairs of models looked at each chemical risk separately (Analysis: Separate),
# and dioxin risk in the respective population.
# Four models looked at each chemical + dioxin in a combined model,
# adjusting for each other (Analysis: Combined).
# Finally, one model contained all three chemicals and dioxin in a single model,
# naturally not containing the combined chemical exposure this time.

models <- list()

models[[1]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.woodpreservatives + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
)

models[[2]] <- clogit(Sarcoma.matched ~ Sex + Exposure.woodpreservatives + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
)

models[[3]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
)

models[[4]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.fungicidesherbicides + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
)

models[[5]] <- clogit(Sarcoma.matched ~ Sex + Exposure.fungicidesherbicides + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
)
models[[6]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
)

models[[7]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.insecticides + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
)

models[[8]] <- clogit(Sarcoma.matched ~ Sex + Exposure.insecticides + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
)
models[[9]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + 
                        strata(Sarcoma.matched.pair), 
                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
)

models[[10]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.any + 
                         strata(Sarcoma.matched.pair), 
                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
)

models[[11]] <- clogit(Sarcoma.matched ~ Sex + Exposure.any + 
                         strata(Sarcoma.matched.pair), 
                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
)
models[[12]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + 
                         strata(Sarcoma.matched.pair), 
                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
)

out <- data.frame()
for(i in 1:length(models)) {
  out <- rbind(
    out,
    cbind(
      as.data.frame(summary(models[[i]])$coefficients),
      as.data.frame(summary(models[[i]])$conf.int)
    )
  )
}

out$Subpop <- rep(c(
  "Wood preservatives", 
  "Fungicides, herbicides",
  "Insecticides",
  "Any of above"),
  each = 7
)
print(out, digits = 3)
out <- out[c(2,3,5,7,9,10,12,14,16,17,19,21,23,24,26,28) , c(2, 8, 9, 5, 10)]
out$Analysis <- rep(c("Combined","Separate"), each = 2, times = 4)
out <- out[order(out$Subpop, rownames(out), out$Analysis) , ]
print(out, digits = 3)

#### Analysis where all chemicals are in a single model.

mo <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.insecticides + Exposure.fungicidesherbicides + Exposure.woodpreservatives + 
               strata(Sarcoma.matched.pair), 
             method="exact", data = dat#[dat$Inclusion.criteria.any == 1 , ]
)
summary(mo)

# Fungicides/herbicides clearly elevate the risk, and the effect is statistically significant.
# Wood preservatives also show a high risk, but it is only marginally significant.
# Insecticides and dioxin are not associated with higher risk.

# Chemicals are moderately correlated with each other and somewhat with dioxin,
# as shown by the correlation table below.

cor(dat[
  dat$Inclusion.criteria.any == 1 ,
  c("Exposure.any",
    "Exposure.fungicidesherbicides",
    "Exposure.insecticides",
    "Exposure.woodpreservatives",
    "PCDDFWHO05TEQ"
  )],
  use = "pairwise.complete.obs"
)


# Is Baltic herring an independent risk factor for sarcoma?
# Well, the risk is increased but clearly non-significant (OR 1.388, 95 % CI 0.8063 - 2.389)

dat$Silakka <- as.numeric(dat$Silakkaa) > 3
table(dat$Silakkaa, dat$Silakka)
fit <- clogit(Sarcoma.matched ~ Sex + Silakka + 
                strata(Sarcoma.matched.pair), 
              method="exact", data = dat
)

summary(fit)

# A question was raised why the PCDDFWHO05TEQ estimates for the wood-preservative group and the any-exposure group were identical.
# The reason can be seen from here:
table(dat[c(
  "Inclusion.criteria.woodpr",
  "Inclusion.criteria.any",
  "Sarcoma.matched")],
  exclude = NULL
)
# The two groups are practically identical with only two additional controls in the any-exposure group. 
# These two controls do not much change the PCDDFWHO05TEQ impact on sarcoma, and therefore the estimates are the same 
# with precision of three decimals.
table(dat[c(
  "Inclusion.criteria.insect",
  "Inclusion.criteria.any",
  "Sarcoma.matched")],
  exclude = NULL
)
# However, with insecticides, there are three controls and TWO CASES more in any-exposure group and that does change estimates.
# With herbicides and fungicides, there are one control and two cases more, also enough to change estimates.

Correlation of dioxin and fish

How do individual dioxin congeners correlate with individual fish parameters in the questionnaire?
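Because dioxin concentrations increase with age, the analysis removes the age trend from the congener data before correlating it with fish intake. The idea can be sketched with synthetic data as follows; the variable names and the data-generating numbers are invented for this illustration and are not estimates from the study.

```r
set.seed(42)
n <- 200

# Synthetic data (not study data).
age <- runif(n, 20, 80)
fish <- rpois(n, lambda = 2)                        # fish meals per week
congener <- 0.5 * age + 3 * fish + rnorm(n, sd = 5) # congener concentration

# Remove the linear age trend, then correlate the residuals with fish intake.
congener_adj <- resid(lm(congener ~ age))

raw_cor <- cor(congener, fish)
adj_cor <- cor(congener_adj, fish)
print(round(c(raw = raw_cor, age_adjusted = adj_cor), 3))
```

In this synthetic setup, age explains a large share of the variance, so the age-adjusted correlation with fish intake is typically clearer than the raw one.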

+ Show code - Hide code

# This is code Op_en2721/ on page [[KTL Sarcoma study]]

library(lme4)
library(Hmisc)
library(MASS)
library(ggplot2)
library(rjags)

# Data from //helfs01.thl.fi/groups2/TUSO/Projects/POPit ja lihavuus/Dioksiinit vs sarkooma/Data.xlsx
dat <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Data_2.12.2016.csv", encoding = "UTF-8")
names(dat)

kalat <- colnames(dat)[c(29, 27, 79:89)] # All: 27, 29, 79:89
dioksiinit <- colnames(dat)[c(310:326, 365)] # All: 310:368)
dat[1:20,kalat]
dat[1:20,dioksiinit]
colnames(dat)
unique(unlist(lapply(dat[kalat], FUN = levels)))

inn <- c(
  "",
  "En lainkaan",
  "Harvemmin kuin kerran kuukaudessa tai en lainkaan",
  "Harvemmin kuin kerran kuukaudessa",
  "Kerran tai pari kuukaudessa",
  "Kerran viikossa",
  "Pari kertaa viikossa",
  "Lähes joka päivä",
  "Kerran päivässä tai useammin"
)

# Data are from V:\TUSO\Projects\POPit ja lihavuus\Excel-mallit\Concentration modeling.xlsx,
# except doses, which are a rough estimate

# Meals per week
doses <- c(NA, 0, 0.1, 0.15, 0.3, 1, 2, 5, 9)

# Congener half-life in years
t1.2 <- c(7.2, 11.2, 9.8, 13.1, 5.1, 4.9, 6.7, 2.1, 3.5,
          7.0, 6.4, 7.2, 7.2, 2.8, 3.1, 4.6, 1.4, 7)

# WHO2005 TEF
TEF <- c(1, 1, 0.1, 0.1, 0.1, 0.01, 0.0003, 0.1, 0.03,
         0.3, 0.1, 0.1, 0.1, 0.1, 0.01, 0.01, 0.0003, 1)

dat2 <- dat[c(kalat, dioksiinit, "Age")]

# Convert fish intake answers to units meals/week
dat2[kalat][-1] <- lapply(dat2[kalat][-1], FUN = function(x) doses[match(x, inn)])

# Convert dioxin concentrations to TEQs
dat2[dioksiinit] <- lapply(as.list(1:length(TEF)), FUN = function(x) TEF[x] * dat2[dioksiinit][[x]])

cor(x = dat2[dioksiinit], y = dat2[kalat], use = "pairwise.complete.obs")

dat3 <- dat2[!is.na(rowSums(dat2[c(kalat, dioksiinit)])) , ]

dat3.diox <- resid(lm(cbind(
  PCDDFWHO05TEQ,
  X2378.TCDD,
  X12378.PD,
  X123478.HD,
  X123678.HD,
  X123789.HD,
  X1234678.D,
  OCDD,
  X2378.TCDF,
  X12378.PF,
  X23478.PF,
  X123478.HF,
  X123678.HF,
  X123789.HF,
  X234678.HF,
  X1234678.F,
  X1234789.F,
  OCDF
) ~ Age, dat3))

correlations <- round(cor(dat3.diox, dat3[kalat]), 3)

pvalues <- round(rcorr(dat3.diox, as.matrix(dat3[kalat]))$P, 3)[1:18, -(1:18)]

fit <- lm(
  paste("cbind(", paste(dioksiinit, collapse = ","), 
        ") ~ ", paste(c(kalat, "Age"), collapse = " + "), 
        collapse = ""),
  data = dat3  
)

summary(fit)

out <- data.frame()

################# Explain each congener with all fish variables + Age

for(i in 1:length(dioksiinit)) {
  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat, collapse = " + "), "+ Age"), dat3)
  fit <- summary(stepAIC(fit, direction="both"))
  
  out <- rbind(
    out,
    data.frame(
      fit[[4]],
      Var = rownames(fit[[4]]),
      adj.r.squared = fit[[9]],
      Congener = dioksiinit[[i]],
      Halflife = t1.2[[i]],
      Test = "With age"
    )
  )
}

############# Age is removed from the models to see the explanatory power of fish variables alone

for(i in 1:length(dioksiinit)) {
  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat, collapse = " + ")), dat3)
  fit <- summary(stepAIC(fit, direction="both"))
  
  out <- rbind(
    out,
    data.frame(
      fit[[4]],
      Var = rownames(fit[[4]]),
      adj.r.squared = fit[[9]],
      Congener = dioksiinit[[i]],
      Halflife = t1.2[[i]],
      Test = "Without age"
    )
  )
}

################# Explain each congener with SINGLE fish variables + Age

for(i in 1:length(dioksiinit)) {
  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat[c(-1, -2)], collapse = " + "), "+ Age"), dat3)
  fit <- summary(stepAIC(fit, direction="both"))
  
  out <- rbind(
    out,
    data.frame(
      fit[[4]],
      Var = rownames(fit[[4]]),
      adj.r.squared = fit[[9]],
      Congener = dioksiinit[[i]],
      Halflife = t1.2[[i]],
      Test = "Without generic fish variables, with age"
    )
  )
}

oprint(out)
write.csv(out, "V:/TUSO/Projects/Sarkooma/lineaariregressiot.csv")
colnames(out)
head(out)
temp <- out[out$Var != "(Intercept)", ]
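# data.frame() renames the coefficient column "Pr(>|t|)" to "Pr...t.."; refer to it by name inside aes()
ggplot(temp, aes(x = Var, y = Estimate, colour = Congener, size = Pr...t.. < 0.05)) + geom_point()+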
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))+
  facet_wrap(~ Test)

################# Bayesian approach
kal <- kalat[-1]
datb <- dat[c(kal)]#, dioksiinit, "Age")]
#x <- datb[[1]]
datb[kal] <- lapply(
  datb[kal], 
  FUN = function(x) {
    factor(
      x, 
      levels = if(inn[3] %in% x) inn[c(3,5:9)] else inn[c(2,4:8)], 
      ordered = TRUE
    )
  }
)
test <- as.data.frame(lapply(datb, FUN = is.na))
datb <- datb[rowSums(test) < 8 , ]
datb <- as.data.frame(lapply(datb, FUN = function(x) as.numeric(x) -1))
#table(datb[1:2])
datblong <- melt(datb, measure.vars = 1:12)
ggplot(datblong, aes(x = value, weight = 1))+geom_bar()+facet_wrap(~ variable)

correlations <- cor(datb, use = "pairwise.complete.obs")
melt(correlations, measure.vars = 1:12)
pvalues <- round(rcorr(as.matrix(datb))$P, 3)

mod <- textConnection("
  model{
  for(j in 1:12) {
    for(i in 1:N) {
      datbb[i , j] ~ dbin(p[j], 6) #Six alternatives in each question
    }
    p[j] ~ dunif(0,1)
  }
}
")

# A binomial distribution is assumed for bins of answer choices.
jags <- jags.model(
  mod,
  data = list(
    N = nrow(datb), # number of respondents; length() of a data frame would give the number of columns
    datbb = datb
  ),
  n.chains = 4,
  n.adapt = 100
)

update(jags, 1000)

samps <- jags.samples(jags, 'p', 1000)
samps.coda <- coda.samples(jags, 'p', 1000)

plot(samps.coda[[1]])
head(samps.coda)
summary(samps.coda)
hist(cor(samps.coda[[1]])[cor(samps.coda[[1]]) != 1])
### In practice, all correlations are between -0.05 and 0.05 -> meaningless
# Important correlations can be (and have been above) calculated directly from data.

probs <- colSums(samps.coda[[1]]) / nrow(samps.coda[[1]])

gramm <- data.frame(
  Fish = kal,
  P = probs,
  Answer = rep(0:6, each = length(kal)),
  Freq = dbinom(rep(0:6, each = length(kal)), 6, rep(probs, 7))
)

ggplot(gramm, aes(x = Answer, weight = Freq))+geom_bar()+facet_wrap(~ Fish)

These estimates are based on the code above.

Binomial distribution parameter(probability)
Obs	Fish	Parameter
1	Kalaa	0.23179078
2	Petokalaa	0.17746642
3	Muikkua	0.14939457
4	Sisävesikalaa	0.09493785
5	Kirjolohta	0.29775961
6	Silakkaa	0.2152247
7	Itämeren.lohta	0.08175282
8	Muuta.Itämerestä	0.0282421
9	Pakastekalaa	0.21510166
10	Kalasäilykkeitä	0.21598007
11	Valtamerikalaa	0.1078198
12	Äyriäisiä	0.16021416
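
To read a parameter in terms of answers, here is a minimal sketch assuming the same binomial form, dbinom(answer, 6, p), that the code above uses for plotting. It takes the estimate for Kalaa from the table; the variable names are illustrative only.

```r
# Estimated binomial probability for the general fish question (Kalaa), from the table
p.kalaa <- 0.23179078

# Answer bins as coded in the model above
answer <- 0:6

# Implied share of respondents in each answer bin
freq <- dbinom(answer, size = 6, prob = p.kalaa)
print(data.frame(Answer = answer, Freq = round(freq, 3)))

# The mean answer bin of a binomial distribution is size * p
mean.answer <- sum(answer * freq)
print(mean.answer)
```

A small parameter therefore means that most of the probability mass sits in the lowest answer bins, that is, the fish in question is eaten rarely.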

Correlation coefficients between fish dishes

Correlation coefficients between fishes(correlation)
Obs	Food1	Food2	Coefficient
1	Kalaa	Kalaa	1
2	Petokalaa	Kalaa	0.37831574
3	Muikkua	Kalaa	0.286657251
4	Sisävesikalaa	Kalaa	0.365112639
5	Kirjolohta	Kalaa	0.530184225
6	Silakkaa	Kalaa	0.425923276
7	Itämeren.lohta	Kalaa	0.305152183
8	Muuta.Itämerestä	Kalaa	0.2111844
9	Pakastekalaa	Kalaa	0.207057983
10	Kalasäilykkeitä	Kalaa	0.21677219
11	Valtamerikalaa	Kalaa	0.273916098
12	Äyriäisiä	Kalaa	0.143764271
13	Kalaa	Petokalaa	0.37831574
14	Petokalaa	Petokalaa	1
15	Muikkua	Petokalaa	0.422402864
16	Sisävesikalaa	Petokalaa	0.499609578
17	Kirjolohta	Petokalaa	0.205863144
18	Silakkaa	Petokalaa	0.202035751
19	Itämeren.lohta	Petokalaa	0.079332036
20	Muuta.Itämerestä	Petokalaa	0.10229554
21	Pakastekalaa	Petokalaa	-0.055693699
22	Kalasäilykkeitä	Petokalaa	0.049204262
23	Valtamerikalaa	Petokalaa	0.080707577
24	Äyriäisiä	Petokalaa	0.082444392
25	Kalaa	Muikkua	0.286657251
26	Petokalaa	Muikkua	0.422402864
27	Muikkua	Muikkua	1
28	Sisävesikalaa	Muikkua	0.394067217
29	Kirjolohta	Muikkua	0.227258746
30	Silakkaa	Muikkua	0.24510588
31	Itämeren.lohta	Muikkua	0.104823821
32	Muuta.Itämerestä	Muikkua	0.127881897
33	Pakastekalaa	Muikkua	-0.006076891
34	Kalasäilykkeitä	Muikkua	0.065331013
35	Valtamerikalaa	Muikkua	0.157485773
36	Äyriäisiä	Muikkua	0.072256144
37	Kalaa	Sisävesikalaa	0.365112639
38	Petokalaa	Sisävesikalaa	0.499609578
39	Muikkua	Sisävesikalaa	0.394067217
40	Sisävesikalaa	Sisävesikalaa	1
41	Kirjolohta	Sisävesikalaa	0.231100033
42	Silakkaa	Sisävesikalaa	0.219228101
43	Itämeren.lohta	Sisävesikalaa	0.136672786
44	Muuta.Itämerestä	Sisävesikalaa	0.105376927
45	Pakastekalaa	Sisävesikalaa	-0.013742777
46	Kalasäilykkeitä	Sisävesikalaa	0.106690375
47	Valtamerikalaa	Sisävesikalaa	0.188327229
48	Äyriäisiä	Sisävesikalaa	0.114591507
49	Kalaa	Kirjolohta	0.530184225
50	Petokalaa	Kirjolohta	0.205863144
51	Muikkua	Kirjolohta	0.227258746
52	Sisävesikalaa	Kirjolohta	0.231100033
53	Kirjolohta	Kirjolohta	1
54	Silakkaa	Kirjolohta	0.38967992
55	Itämeren.lohta	Kirjolohta	0.282808883
56	Muuta.Itämerestä	Kirjolohta	0.111980079
57	Pakastekalaa	Kirjolohta	0.166699904
58	Kalasäilykkeitä	Kirjolohta	0.267376587
59	Valtamerikalaa	Kirjolohta	0.304243078
60	Äyriäisiä	Kirjolohta	0.154695247
61	Kalaa	Silakkaa	0.425923276
62	Petokalaa	Silakkaa	0.202035751
63	Muikkua	Silakkaa	0.24510588
64	Sisävesikalaa	Silakkaa	0.219228101
65	Kirjolohta	Silakkaa	0.38967992
66	Silakkaa	Silakkaa	1
67	Itämeren.lohta	Silakkaa	0.303378534
68	Muuta.Itämerestä	Silakkaa	0.317929228
69	Pakastekalaa	Silakkaa	0.106271923
70	Kalasäilykkeitä	Silakkaa	0.202491349
71	Valtamerikalaa	Silakkaa	0.27027233
72	Äyriäisiä	Silakkaa	0.17446606
73	Kalaa	Itämeren.lohta	0.305152183
74	Petokalaa	Itämeren.lohta	0.079332036
75	Muikkua	Itämeren.lohta	0.104823821
76	Sisävesikalaa	Itämeren.lohta	0.136672786
77	Kirjolohta	Itämeren.lohta	0.282808883
78	Silakkaa	Itämeren.lohta	0.303378534
79	Itämeren.lohta	Itämeren.lohta	1
80	Muuta.Itämerestä	Itämeren.lohta	0.589928968
81	Pakastekalaa	Itämeren.lohta	0.083123081
82	Kalasäilykkeitä	Itämeren.lohta	0.167824298
83	Valtamerikalaa	Itämeren.lohta	0.415932209
84	Äyriäisiä	Itämeren.lohta	0.312466845
85	Kalaa	Muuta.Itämerestä	0.2111844
86	Petokalaa	Muuta.Itämerestä	0.10229554
87	Muikkua	Muuta.Itämerestä	0.127881897
88	Sisävesikalaa	Muuta.Itämerestä	0.105376927
89	Kirjolohta	Muuta.Itämerestä	0.111980079
90	Silakkaa	Muuta.Itämerestä	0.317929228
91	Itämeren.lohta	Muuta.Itämerestä	0.589928968
92	Muuta.Itämerestä	Muuta.Itämerestä	1
93	Pakastekalaa	Muuta.Itämerestä	0.061080508
94	Kalasäilykkeitä	Muuta.Itämerestä	0.164729369
95	Valtamerikalaa	Muuta.Itämerestä	0.381394106
96	Äyriäisiä	Muuta.Itämerestä	0.295389062
97	Kalaa	Pakastekalaa	0.207057983
98	Petokalaa	Pakastekalaa	-0.055693699
99	Muikkua	Pakastekalaa	-0.006076891
100	Sisävesikalaa	Pakastekalaa	-0.013742777
101	Kirjolohta	Pakastekalaa	0.166699904
102	Silakkaa	Pakastekalaa	0.106271923
103	Itämeren.lohta	Pakastekalaa	0.083123081
104	Muuta.Itämerestä	Pakastekalaa	0.061080508
105	Pakastekalaa	Pakastekalaa	1
106	Kalasäilykkeitä	Pakastekalaa	0.345409216
107	Valtamerikalaa	Pakastekalaa	0.129614914
108	Äyriäisiä	Pakastekalaa	0.125383855
109	Kalaa	Kalasäilykkeitä	0.21677219
110	Petokalaa	Kalasäilykkeitä	0.049204262
111	Muikkua	Kalasäilykkeitä	0.065331013
112	Sisävesikalaa	Kalasäilykkeitä	0.106690375
113	Kirjolohta	Kalasäilykkeitä	0.267376587
114	Silakkaa	Kalasäilykkeitä	0.202491349
115	Itämeren.lohta	Kalasäilykkeitä	0.167824298
116	Muuta.Itämerestä	Kalasäilykkeitä	0.164729369
117	Pakastekalaa	Kalasäilykkeitä	0.345409216
118	Kalasäilykkeitä	Kalasäilykkeitä	1
119	Valtamerikalaa	Kalasäilykkeitä	0.212919187
120	Äyriäisiä	Kalasäilykkeitä	0.296174683
121	Kalaa	Valtamerikalaa	0.273916098
122	Petokalaa	Valtamerikalaa	0.080707577
123	Muikkua	Valtamerikalaa	0.157485773
124	Sisävesikalaa	Valtamerikalaa	0.188327229
125	Kirjolohta	Valtamerikalaa	0.304243078
126	Silakkaa	Valtamerikalaa	0.27027233
127	Itämeren.lohta	Valtamerikalaa	0.415932209
128	Muuta.Itämerestä	Valtamerikalaa	0.381394106
129	Pakastekalaa	Valtamerikalaa	0.129614914
130	Kalasäilykkeitä	Valtamerikalaa	0.212919187
131	Valtamerikalaa	Valtamerikalaa	1
132	Äyriäisiä	Valtamerikalaa	0.272081755
133	Kalaa	Äyriäisiä	0.143764271
134	Petokalaa	Äyriäisiä	0.082444392
135	Muikkua	Äyriäisiä	0.072256144
136	Sisävesikalaa	Äyriäisiä	0.114591507
137	Kirjolohta	Äyriäisiä	0.154695247
138	Silakkaa	Äyriäisiä	0.17446606
139	Itämeren.lohta	Äyriäisiä	0.312466845
140	Muuta.Itämerestä	Äyriäisiä	0.295389062
141	Pakastekalaa	Äyriäisiä	0.125383855
142	Kalasäilykkeitä	Äyriäisiä	0.296174683
143	Valtamerikalaa	Äyriäisiä	0.272081755
144	Äyriäisiä	Äyriäisiä	1
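
The long table above is easier to scan as a square matrix. The following is a minimal sketch of that reshaping, using only the nine rows for the first three fish variables; the object names long and wide are illustrative and not part of the original analysis.

```r
# First three fish variables from the table above, in long format
foods <- c("Kalaa", "Petokalaa", "Muikkua")
long <- data.frame(
  Food1 = rep(foods, times = 3),
  Food2 = rep(foods, each = 3),
  Coefficient = c(
    1,           0.37831574,  0.286657251,
    0.37831574,  1,           0.422402864,
    0.286657251, 0.422402864, 1
  ),
  stringsAsFactors = FALSE
)

# Fill a named square matrix by (row name, column name) pairs
wide <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(foods, foods))
wide[cbind(long$Food1, long$Food2)] <- long$Coefficient
print(round(wide, 3))

# A correlation matrix should be symmetric with ones on the diagonal
print(isSymmetric(wide))
```

The same indexing works for all 144 rows once the full table is read into a data frame with character columns.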

EU kalat

The code that used to be here was moved to EU-kalat#Calculations.
What updates should be done:
- Plot iterations to see that the model results do not drift.
- Take modelled parameters and develop a Monte Carlo (MC) model to produce predicted concentrations.
  - TCDD concentration should be added to the hierarchical Bayes model for this?
- KTL Sarcoma study, EU-kalat and Goherr: Fish consumption study should all be combined into one model.
  - Comment: Can models be combined as text with paste()? This could work if all submodels had unique parameter names like N.eu and N.goh rather than just N. And data lists are merged simply with c(). --Jouni (talk) 16:22, 22 January 2017 (UTC)
- A causal diagram should be drawn to show the model structure.
JAGS user manual [2] (with e.g. distribution names and other guidance)
How to generate predictions in JAGS [3]
Using rjags, a simple guidance [4]

Easily generate correlated variables from any distribution (without copulas) [5]
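
The comment above about combining models as text can be sketched as follows. This is only an illustration under stated assumptions: the two submodels, their parameter names (N.eu, y.eu, mu.eu, tau.eu, N.goh, y.goh, lambda.goh) and the data values are hypothetical and are not the actual EU-kalat or Goherr models. The sketch only builds the model text and the data list; it does not run JAGS.

```r
# Hypothetical submodel 1: every name carries the suffix .eu
sub.eu <- "
  for(i in 1:N.eu) {
    y.eu[i] ~ dnorm(mu.eu, tau.eu)
  }
  mu.eu ~ dnorm(0, 0.001)
  tau.eu ~ dgamma(0.001, 0.001)
"

# Hypothetical submodel 2: every name carries the suffix .goh
sub.goh <- "
  for(i in 1:N.goh) {
    y.goh[i] ~ dpois(lambda.goh)
  }
  lambda.goh ~ dgamma(0.001, 0.001)
"

# Wrap the submodels into a single JAGS model block
model.text <- paste("model{", sub.eu, sub.goh, "}", sep = "\n")

# Made-up data lists, merged with c()
data.eu <- list(N.eu = 3, y.eu = c(1.2, 0.8, 1.5))
data.goh <- list(N.goh = 2, y.goh = c(4, 6))
data.all <- c(data.eu, data.goh)

# Merging is only safe if no name occurs in both lists
stopifnot(anyDuplicated(names(data.all)) == 0)

cat(model.text)
```

With rjags installed, the combined text could then be passed on in the same way as elsewhere on this page, for example jags.model(textConnection(model.text), data = data.all). Submodels joined like this stay statistically independent unless they also share a parameter by name.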

Concentration-age graph with THL formatting

+ Show code - Hide code

# This is code Op_en2721/ on page [[KTL Sarcoma study]]

library(ggplot2)
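library(thlGraphs) # needed: thlPointPlot() below calls thlColors(), thlPtsConvert(), thlNaLines(), thlTheme() and thlYaxisControl()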

thlPointPlot <- function (data, xvar, yvar, groupvar = NULL, ylabel = yvar, 
                          xlabel = NULL, colors = thlColors(n = 12, type = "quali", name = "line"),
                          title = NULL, subtitle = NULL, caption = NULL, 
                          legend.position = "none", base.size = 16, linewidth = 3, 
                          show.grid.x = FALSE, show.grid.y = TRUE, lang = "fi", ylimits = NULL, 
                          marked.treshold = 10, plot.missing = FALSE, xaxis.breaks = waiver(), 
                          yaxis.breaks = waiver(), panels = FALSE, nrow.panels = 1, 
                          labels.end = FALSE) 
{
  lwd <- thlPtsConvert(linewidth)
  gg <- ggplot(
    data,
    aes_(x = substitute(xvar),
         y = substitute(yvar),
         group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
         colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), ""))
  ) # + geom_line(size = lwd) #!!!!!!!!!!!!!!!!!!!!
  if (isTRUE(plot.missing)) {
    df <- thlNaLines(
      data = data, xvar = deparse(substitute(xvar)), 
      yvar = deparse(substitute(yvar)),
      groupvar = unlist(ifelse(deparse(substitute(groupvar)) != "NULL", deparse(substitute(groupvar)), list(NULL)))
    )
    if (!is.null(df) & FALSE) { ##!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      gg <- gg + geom_line(
        data = df, aes_(
          x = substitute(xvar),
          y = substitute(yvar),
          group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
          colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), "")
        ),
        linetype = 2,
        size = lwd
      )
    }
  }
  if (!is.null(marked.treshold)) {
    if (length(unique(data[, deparse(substitute(xvar))])) > marked.treshold) {
      if (is.factor(data[, deparse(substitute(xvar))]) || 
          is.character(data[, deparse(substitute(xvar))]) || 
          is.logical(data[, deparse(substitute(xvar))])) {
        levs <- levels(factor(data[, deparse(substitute(xvar))]))
        min <- levs[1]
        max <- levs[length(levs)]
      } else {
        min <- min(data[, deparse(substitute(xvar))])
        max <- max(data[, deparse(substitute(xvar))])
      }
      subdata <- data[c(data[, deparse(substitute(xvar))] %in% c(min, max)), ]
      gg <- gg + geom_point(
        data = subdata,
        aes_(
          x = substitute(xvar), 
          y = substitute(yvar),
          group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
          colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), "")
        ), stroke = 1.35 * lwd, fill = "white", shape = 21, size = 10/3 * lwd
      )
    } else {
      gg <- gg + geom_point(stroke = 1.35 * lwd, fill = "white", size = 10/3 * lwd, shape = 21)
    }
  }
  if (isTRUE(labels.end)) {
    if (is.factor(data[, deparse(substitute(xvar))]) || 
        is.character(data[, deparse(substitute(xvar))]) || 
        is.logical(data[, deparse(substitute(xvar))])) {
      levs <- levels(factor(data[, deparse(substitute(xvar))]))
      maxd <- data[data[, deparse(substitute(xvar))] == levs[length(levs)], ]
    } else {
      maxd <- data[data[, deparse(substitute(xvar))] == max(data[, deparse(substitute(xvar))]), ]
    }
    brks <- maxd[, deparse(substitute(yvar))]
    labsut <- maxd[, deparse(substitute(groupvar))]
  } else (brks <- labsut <- waiver())
  gg <- gg + ylab(ifelse(deparse(substitute(ylabel)) == "yvar", deparse(substitute(yvar)), ylabel)) +
    labs(title = title, subtitle = subtitle, caption = caption) +
    thlTheme(
      show.grid.y = show.grid.y,
      show.grid.x = show.grid.x,
      base.size = base.size,
      legend.position = legend.position, 
      x.axis.title = ifelse(!is.null(xlabel), TRUE, FALSE)
    ) + 
    xlab(ifelse(!is.null(xlabel), xlabel, "")) + 
    scale_color_manual(values = colors) + 
    thlYaxisControl(
      lang = lang,
      limits = ylimits,
      breaks = yaxis.breaks, 
      sec.axis = labels.end,
      sec.axis.breaks = brks,
      sec.axis.labels = labsut
    )
  if (is.factor(data[, deparse(substitute(xvar))]) ||
      is.character(data[, deparse(substitute(xvar))]) ||
      is.logical(data[, deparse(substitute(xvar))])) {
    gg <- gg + scale_x_discrete(breaks = xaxis.breaks, expand = expand_scale(mult = c(0.05)))
  } else (gg <- gg + scale_x_continuous(breaks = xaxis.breaks))
  if (isTRUE(panels)) {
    fmla <- as.formula(paste0("~", substitute(groupvar)))
    gg <- gg + facet_wrap(fmla, scales = "free", nrow = nrow.panels)
  }
  gg
}

# Nro;Alue;SP;Alue;Ikä (a);TEQ;TapVer;Tequart;Ikäluokka;Altistus;Valittu tapaus;Stratum2;Valittuja verrokkeja;Tapauksen ikä;;;Löytynyt tapaus;Löytyneiden määrä
# Search process (Hakuprosessi):
# 1) Make sure that the column Valittu tapaus (selected case) is empty.
# 2) Set the number of selected to 0 and the age criterion to the strictest one used.
# 3) Calculate. Filter the rows where the number found is 1 and enter the ID of the case found into the column Valittu tapaus.
# 4) Calculate. Filter the numbers found 2, 3, etc. and select a case for the right control.
# 5) Loosen the age criterion if needed and repeat 3)-4).
# 6) Set the number of selected to 1, 2, 3, etc. and repeat 2)-5).
# Z = helfs01.thl.fi/documents/
sarc <- read.csv("Z:/YMAL_arc/CEHRA_Archived2018/Tutkimus/_until2004/R16_sarkooma/Analyysit/Analyysi/Lopulliset4.csv",
                 skip=2, sep=";", dec=",", header=FALSE)

sar <- sarc[c(1:3, 5:460),c(2,3,5,6,7)]
colnames(sar) <- c("Region","Gender","Age","TEQ","Case")
sar$Gender <- factor(sar$Gender, labels=c("Male","Female"))
sar$Case <- factor(sar$Case, labels=c("Case","Control"))

ggplot(sar, aes(x=Age, y=TEQ, colour=Gender))+geom_point()

thlPointPlot(sar, xvar=Age, yvar=TEQ, groupvar=Gender, marked.treshold = 1000,
             legend.position = "bottom",
             xlabel="Age", ylabel="", base.size=30,
             title="Dioxin concentration by age",
             subtitle="(pg/g TEQ in fat)")+
  geom_vline(xintercept=0, size=1.5) # geom_vline() has no width argument; size sets the line thickness

ggsave("Dioxin concentration.png", width=11, height=8)

Related files

References

[1] Jouni T. TUOMISTO, Juha PEKKANEN, Hannu KIVIRANTA, Erkki TUKIAINEN, Terttu VARTIAINEN and Jouko TUOMISTO. Soft-tissue sarcoma and dioxin: a case-control study. Int. J. Cancer: 108, 893–900 (2004)

[2] Chemosphere (2005) 60: 78: 854-869

[3] Kiviranta H, Korhonen M, Hallikainen A, Vartiainen T. Kalojen dioksiinien ja PCB:eiden kulkeutuminen ihmiseen. Ympäristö ja Terveys 2000; 31: 65-9.


KTL Sarcoma study: Difference between revisions

Latest revision as of 18:07, 1 August 2019

@@ Line 2: / Line 2: @@
 [[Category:Finland]]
 [[Category:Dioxins]]
+[[Category:TCDD project]]
+[[Category:Contains R code]]
+[[Category:Code under inspection]]
+[[Category:Data]]
 {{study|moderator=Jouni}}
-<rcode include="page:OpasnetBaseUtils|name:generic" graphics="1">
-library(OpasnetBaseUtils)
-library(reshape)
-a <- tidy(op_baseGetData("opasnet_base", "Op_en2721"))
-a <- a[!is.na(a$public_id), ]
-a$Result <- as.numeric(a$Result)
-a <- melt(a, id.vars = "public_id")
-colnames(a) <- c("Obs", "Congener", "Result")
-head(a)
-op_baseWrite
-</rcode>
 *Authors:  [[User:Jouni|Jouni T. TUOMISTO]], Juha PEKKANEN, Hannu KIVIRANTA, Erkki TUKIAINEN, Terttu VARTIAINEN and Jouko TUOMISTO
@@ Line 22: / Line 14: @@
 *Journal: [http://www3.interscience.wiley.com/journal/29331/home Int J Cancer]}}
-==Scope==
+==Question==
 Because it is obvious that there is a great need for improved
@@ Line 42: / Line 34: @@
 contrast to occupational studies.<ref>[[User:Jouni|Jouni T. TUOMISTO]], Juha PEKKANEN, Hannu KIVIRANTA, Erkki TUKIAINEN, Terttu VARTIAINEN and Jouko TUOMISTO. Soft-tissue sarcoma and dioxin: a case-control study. [http://www3.interscience.wiley.com/journal/106566799/abstract Int. J. Cancer: 108, 893–900 (2004)]</ref>
-==Material and methods==
+==Answer==
+There is [[:File:KTL Sarcoma study.csv|simulated data]] available about the study. For details, see [[#Simulated data]].
-===Study population===
+'''[http://en.opasnet.org/en-opwiki/index.php/Special:R-tools?id=0CrH05oF1BEBRYcW Main fish consumption and PCDD/F variables]
+'''Some plots about dioxin congeners.
+<rcode include="page:OpasnetBaseUtils|name:generic"
+variables="
+name:first|description:What congener do you want to plot on X axis?|type:selection|options:'TCDF2378';TCDF2378;'TCDD2378';TCDD2378;'PF12378';PF12378;'PF23478';PF23478;'PD12378';PD12378;'HF123478';HF123478;'HF123678';HF123678;'HF234678';HF234678;'HF123789';HF123789;'HD123478';HD123478;'HD123678';HD123678;'HD123789';HD123789;'F1234678';F1234678;'F1234789';F1234789;'D1234678';D1234678;'OCDF';OCDF;'OCDD';OCDD;'toxsum';toxsum;'WHOTEQ';WHOTEQ|default:'WHOTEQ'|
+name:second|description:What congener do you want to plot on Y axis?|type:selection|options:'TCDF2378';TCDF2378;'TCDD2378';TCDD2378;'PF12378';PF12378;'PF23478';PF23478;'PD12378';PD12378;'HF123478';HF123478;'HF123678';HF123678;'HF234678';HF234678;'HF123789';HF123789;'HD123478';HD123478;'HD123678';HD123678;'HD123789';HD123789;'F1234678';F1234678;'F1234789';F1234789;'D1234678';D1234678;'OCDF';OCDF;'OCDD';OCDD;'toxsum';toxsum;'WHOTEQ';WHOTEQ|default:'TCDD2378'"
+graphics=1 embed=1>
+library(OpasnetUtils)
+a <- opbase.data("Op_en2721")
+oprint(head(a))
+plot(a[a$Congener == first, "Result"], a[a$Congener == second, "Result"], xlab = first, ylab = second)
+plot(a[a$Congener == "OCDD", "Result"], a[a$Congener == "WHOTEQ", "Result"])
+plot(a[a$Congener == "WHOTEQ", "Result"], a[a$Congener == "OCDD", "Result"])
+plot(a[a$Congener == "OCDF", "Result"], a[a$Congener == "OCDD", "Result"])
+</rcode>
+==Rationale==
+=== Methods ===
+====Study population====
 The majority of sarcoma patients in southern Finland are treated
@@ Line 110: / Line 129: @@
 age and area could be found.
-===Exposure assessment===
+====Exposure assessment====
 From the matched 337 patients, concentrations of the 17 toxic
@@ Line 158: / Line 177: @@
 Limits of quantitation (LOQ) for PCDD/Fs and non-ortho PCBs varied between 0.1–5 and 1–5 pg g−1 fat, respectively, and for other PCBs between 0.02 and 0.1 ng g−1 fat, depending on each individual congener. Recoveries for internal standards were more than 50% for all congeners. Concentrations were calculated with lower bound method in which the results of congeners with concentrations below the LOQ were designated as nil.
+'''This code''' was used to upload the data to Opasnet Base:
+<rcode include="page:OpasnetBaseUtils|name:generic" graphics="1">
+library(OpasnetBaseUtils)
+library(reshape)
+a <- tidy(op_baseGetData("opasnet_base", "Op_en2721"))
+colnames(a)[colnames(a) == "Result"] <- "WHOTEQ"
+for(i in 1:ncol(a)) {a[, i] <- as.numeric(a[, i])}
+a <- a[!is.na(a$public_id), ]
+#colnames(a)
+#class(a)
+#class(a$Result)
+#a$Result
+#as.numeric(a$Result)
+#a$Result <- as.numeric(a$Result)
+a <- melt(a, id.vars = "public_id")
+colnames(a) <- c("Obs", "Congener", "Result")
+a[a$Congener == "WHOTEQ", ]
+plot(a[a$Congener == "OCDD", "Result"], a[a$Congener == "WHOTEQ", "Result"])
+plot(a[a$Congener == "WHOTEQ", "Result"], a[a$Congener == "OCDD", "Result"])
+plot(a[a$Congener == "OCDF", "Result"], a[a$Congener == "OCDD", "Result"])
+a
+#op_baseWrite("opasnet_base", a, ident = "Op_en2721", unit = "pg /g_fat", who = "Jouni", acttype = 4)
+</rcode>
 ====Quality control and assurance====
@@ Line 165: / Line 210: @@
 The laboratory has successfully participated in several international quality control studies for the analysis of PCDD/Fs, and PCBs. Matrices in these studies have included cow milk, human milk and human serum. (Yrjänheikki, 1991, Rymen, 1994, WHO, 1996 and Lindström et al., 2000). The laboratory of chemistry in the National Public Health Institute is an accredited testing laboratory (No T077) in Finland (EN ISO/IEC 17025). The scope of accreditation includes PCDD/Fs, non-ortho PCBs, and other PCBs from human tissue samples.
-===Statistical analyses===
+====Statistical analyses====
 Conditional logistic regression analysis was performed with
@@ Line 202: / Line 247: @@
 preservatives, strong detergents, heavy metals, other chemicals.
-<rcode>
+===Data===
- data = read.table("N:/Huippuyksikko/Tutkimus/R80_Sarkooma2/Analyysi020712b.csv", header = TRUE)
- cor(data,data, use="pairwise.complete.obs", method="spearman")
-</rcode>
-==Results==
-* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/Analyysi020712.xls The original data]
+* [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/Analyysi020712.xls The original data], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R80_Sarkooma2/Analyysi020712b.csv in csv file]
 * [[:Image:KTL Sarcoma study statistical analyses.txt|KTL Sarcoma study statistical analyses.txt]]
 * [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712_osa2.txt Part2]
 * [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysi020712Tulokset.xls Compilation of the results of statistical analyses]
-===Variable information===
+The code below runs the main fish consumption and PCDD/F variables, but because this is personal-level data, you need a password to run it. However, you can see ready-made results [http://en.opasnet.org/en-opwiki/index.php/Special:R-tools?id=0CrH05oF1BEBRYcW].
+For variable descriptions, see {{disclink|Variable description for reduced data}}
+<rcode graphics="1" variables="name:password|description:Password|type:password">
+library(OpasnetUtils)
+objects.get("isqT7nvhd0ViUR7d")
+data <- objects.decode(etable, password)
+colnames(data) <- t(data[1, ])
+data <- data[2:nrow(data), 2:ncol(data)]
+for(i in 1:ncol(data)) {
+	data[[i]] <- as.numeric(as.character(data[[i]]))
+}
+colnames(data)[colnames(data) == "aluenro"] <- "BMI" # Remove the aluenro column and replace it with BMI.
+data$BMI <- data$Paino / (data$Pituus / 100)^2
+# oprint(head(data))
+cat("Data from P:\\huippuyksikko\\Tutkimus\\R16_sarkooma\\Data\\Panulle20031216\\Analyysi020712_typistetty.xls.", nrow(data), "observations.\n")
+oprint(cor(x = data, use = "pairwise.complete.obs", method = "pearson"))
+# Basic Scatterplot Matrix
+pairs(~ika+BMI+Kalaa+Silakkaa+PF23478+WHOTEQ, data = data,
+   main="Simple Scatterplot Matrix")
+</rcode>
+==== Questionnaire ====
+* {{#l:KTL_sarcoma_questionnaire_finnish.odt}}
+* {{#l:KTL_sarcoma_questionnaire_swedish.odt}}
+====Variable information====
 The variable information was originally documented in [http://ytoswww/yhteiset/Huippuyksikko/Tutkimus/R16_sarkooma/Analyysit/Analyysi020712/TuomistoAnalyysiloki20020712.txt Log file about the statistical analyses: Part 1], but unfortunately mostly in Finnish.
+{{hidden|
+=
   The variable list should also be recorded here, because it has not been properly documented anywhere else.
         LEIKKAUS    date of surgery, SAS native format (days from some reference point?)
@@ Line 262: / Line 341: @@
 	Muu kuin tuumori
 	Tuplanäyte
-        SP          sex 1=male, 2=female K6
+        SP          sex 1: male, 2: female K6
         KOULUV      years of schooling K8
         PITUUS      height, cm K15
@@ Line 385: / Line 464: @@
        VALTAMER * 0.095303307 +
        _YRI_ISI * 0.033074683;
+}}
+====Data management====
+'''Code to manage the data. It takes the original data files and merges them. Works only if files are available.
+{{hidden|
+<pre>
+# Sarcoma epidemiological data
+# Data was obtained from this data file (saved 23.9.2002, version 12.7.2002)
+# It contains all data that was used in the publication but not e.g. all questionnaire data.
+# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\Analyysit\Analyysi020712\Analyysi020712.xls
+library(OpasnetUtils)
+sarc <- read.table(
+	"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Sarkooma/Analyysi020712_copy.csv",
+	sep = ",", header = TRUE
+)
+colnames(sarc)[c(24, 27, 28, 30, 32)] <- c(
+	"Sisävesikalaa",
+	"Itämeren.lohta",
+	"Muuta.Itämerestä",
+	"Kalasäilykkeitä",
+	"Äyriäisiä"
+)
+########################################### Questionnaire
+# The questions come from the questionnaire form
+# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R16_sarkooma\KTL_sarcoma_study_questionnaire.odt
+# The questionnaire data comes from
+# U:\arkisto_kuopio\huippuyksikko\Tutkimus\R80_Sarkooma2\Data\Kyselyt.xls (Kyselyt.csv does not contain åäö)
+# Kyselyt.xls was saved to N:\YMAL\Projects\Silakan riskiarvio\Data\Salaiset\Kyselyt.csv
+ques <- read.table(
+	"//cesium/yhteiset/YMAL/Projects/Silakan riskiarvio/Data/Salaiset/Kyselyt.csv",
+	sep = ",", header = TRUE
+)
+poista <- c(
+	"Pituus",
+	"Paino",
+	"Kalaa",
+	"Petokalaa",
+	"Muikkua",
+	"Sisävesikalaa",
+	"Kirjolohta",
+	"Silakkaa",
+	"Itämeren.lohta",
+	"Muuta.itämerestä",
+	"Pakastekalaa",
+	"Kalasäilykkeitä",
+	"Valtamerikalaa",
+	"Äyriäisiä",
+	"Vastattu",
+	"AlkoKuinka.usein",
+	"Onko1",
+	"Onko2",
+	"Onko3",
+	"Onko4",
+	"Onko5",
+	"Onko6",
+	"Onko7",
+	"Onko8",
+	"Onko9",
+	"Onko10"
+)
+# The tests below show that the questionnaire columns in sarc and ques are actually identical. Therefore
+# we remove them from ques and keep those in sarc, the data file that was used in the publication.
+#for(i in testaa) {
+#	x <- dat[[paste(i, ".x", sep = "")]]
+#	if(is.factor(x)) x <- as.numeric(x) # levels(x)[x]
+#	y <- dat[[paste(i, ".y", sep = "")]]
+#	if(is.factor(y)) y <- as.numeric(y) # levels(y)[y]
+#	print(paste(i, sum( x != y, na.rm = TRUE)))
+#}
+#sum(as.numeric(dat$alko) != as.numeric(dat$AlkoKuinka.usein), na.rm = TRUE)
+#sum(as.numeric(dat$liuotti) != as.numeric(dat$Onko1), na.rm = TRUE)
+#sum(as.numeric(dat$maalit) != as.numeric(dat$Onko2), na.rm = TRUE)
+#sum(as.numeric(dat$formalde) != as.numeric(dat$Onko3), na.rm = TRUE)
+#sum(as.numeric(dat$hyonteis) != as.numeric(dat$Onko4), na.rm = TRUE)
+#sum(as.numeric(dat$Kasvinsu) != as.numeric(dat$Onko5), na.rm = TRUE)
+#sum(as.numeric(dat$Kyllate) != as.numeric(dat$Onko6), na.rm = TRUE)
+#sum(as.numeric(dat$Pesuaine) != as.numeric(dat$Onko7), na.rm = TRUE)
+#sum(as.numeric(dat$Metallit) != as.numeric(dat$Onko8), na.rm = TRUE)
+#sum(as.numeric(dat$Muu1) != as.numeric(dat$Onko9), na.rm = TRUE)
+#sum(as.numeric(dat$Muu2) != as.numeric(dat$Onko10), na.rm = TRUE)
+ques <- ques[!colnames(ques) %in% poista]
+dat <- merge(sarc, ques, by = "kysely_id", all.x = TRUE)
+ruoat <- list(c(
+	"Harvemmin kuin kerran kuukaudessa tai en lainkaan",
+	"Kerran tai pari kuukaudessa",
+	"Kerran viikossa",
+	"Pari kertaa viikossa",
+	"Lähes joka päivä",
+	"Kerran päivässä tai useammin"
+),
+NA,
+TRUE
+)
+kalat <- list(
+	c(
+		"En lainkaan",
+		"Harvemmin kuin kerran kuukaudessa",
+		"Kerran tai pari kuukaudessa",
+		"Kerran viikossa",
+		"Pari kertaa viikossa",
+		"Lähes joka päivä"
+	),
+:5,
+	TRUE
+)
+yn <- list(c("No", "Yes"), NA, TRUE)
+locs <- list(
+	aluenro = c(
+		"-",
+		"Espoo",
+		"Helsinki",
+		"Hyvinkää",
+		"Hämeenlinna",
+		"Joensuu",
+		"Jyväskylä",
+		"Kotka",
+		"Kuopio",
+		"Lahti",
+		"Lappeenranta",
+		"Pori",
+		"Seinäjoki",
+		"Tampere",
+		"Turku",
+		"Vaasa"
+	),
+	dgluokka = list(
+		c(
+			"MFH",
+			"Liposarcoma",
+			"Leiomyosarcoma",
+			"Angiosarcoma",
+			"Chondrosarcoma",
+			"Sarcoma synoviale",
+			"Sarcoma Ewing",
+			"Dermatofibrosarcoma",
+			"Sarcoma alia",
+			"Sarcoma NUD",
+			"Ei tietoa",
+			"Osteosarcoma extrasceletale",
+			"Lipoma",
+			"Tumor Desmoides",
+			"Myxoma",
+			"Muu benigni tuumori",
+			"Melanoma",
+			"Muu kuin tuumori",
+			"Tuplanäyte"
+		),
+		c(1:12, 21:27)
+	),
+	sp = c("Male", "Female"),
+	Kalaa = ruoat,
+	Petokalaa = kalat,
+	Muikkua = kalat,
+	Sisävesikalaa = kalat,
+	Kirjolohta = kalat,
+	Silakkaa = kalat,
+	Itämeren.lohta = kalat,
+	Muuta.Itämerestä = kalat,
+	Pakastekalaa = kalat,
+	Kalasäilykkeitä = kalat,
+	Valtamerikalaa = kalat,
+	Äyriäisiä	= kalat,
+	sade		= yn,
+	tupak		= yn,
+	tstatus 	= c("< 6 mo ago", "> 6 mo ago", "Never"),
+	liuotti		= yn,
+	maalit		= yn,
+	formalde	= yn,
+	hyonteis	= yn,
+	Kasvinsu	= yn,
+	Kyllaste	= yn,
+	Pesuaine	= yn,
+	Metallit	= yn,
+	Muu1		= yn,
+	Muu2		= yn,
+	Hyonluok	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
+	Kasvluok	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
+	Kyllluok 	= list(c("No", "Very mild", "Mild", "Moderate", "High"), 0:4, TRUE),
+	alko = list(c(
+		"en koskaan",
+		"kerran vuodessa tai harvemmin",
+		"pari kertaa vuodessa",
+		"3-4 kertaa vuodessa",
+		"noin kerran parissa kuukaudessa",
+		"noin kerran kuukaudessa",
+		"pari kertaa kuukaudessa",
+		"noin kerran viikossa",
+		"muutaman kerran viikossa",
+		"päivittäin"
+	),
+	10:1,
+	TRUE
+	),
+	Koulutus = list(c(
+		"Kansakoulu tai peruskoulu",
+		"Keskikoulu",
+		"Ammattikoulu tai vastaava",
+		"Opistotutkinto ja/tai lukio",
+		"Akateeminen tutkinto"
+	),
+	NA,
+	TRUE
+	),
+	Työntekijäryhmä = c(
+		"Ylempi toimihenkilö",
+		"Alempi toimihenkilö",
+		"Työntekijä",
+		"Maanviljelijä",
+		"Yrittäjä",
+		"Opiskelija",
+		"Eläkeläinen",
+		"Kotirouva",
+		"Työtön"
+	),
+	Painonmuutos = list(c(
+		"Olen laihtunut",
+		"Painoni ei ole juuri muuttunut",
+		"Olen lihonut ja laihtunut",
+		"Olen lihonut"
+	),
+	c(3, 2, 13, 1),
+	TRUE
+	),
+	Ruokavalio = c(
+		"ei erityisruokavaliota",
+		"kasvisruoka sekä maito- ja munatuotteet",
+		"ainoastaan kasvisruokavalio",
+		"gluteeniton",
+		"maidoton (ei edes hyla-tuotteita)",
+		"muu, tarkempi kuvaus"
+	),
+	Leipää = ruoat,
+	Puuroja = ruoat,
+	Makaronia = ruoat,
+	Muutaviljaa = ruoat,
+	Viiliä = ruoat,
+	Juustoja = ruoat,
+	Rasvaisia.juustoja = ruoat,
+	Jäätelöä = ruoat,
+	Liharuokaa = ruoat,
+	Mitä.maitoa = list(c(
+		"en juo maitoa enkä piimää",
+		"rasvatonta maitoa",
+		"rasvatonta piimää tai kirnupiimää",
+		"muuta piimää",
+		"ykkösmaitoa",
+		"kevytmaitoa",
+		"täysmaitoa"
+	),
+	c(7, 4, 5, 6, 3, 2, 1),
+	TRUE
+	),
+# Mitä.piimää (what kind of sourmilk) is missing from here. Or is it? Has it been merged with milk?
+	Mitä.leivälle = list(c(
+		"En mitään",
+		"Kasvimargariinia",
+		"Voi-kasvirasvaseosta",
+		"Voita"
+	),
+	NA,
+	TRUE
+	),
+	Paljonko.rasvaa = list(c(
+		"En lainkaan",
+		"Voinapilla voitelen kolme viipaletta tai enemmän",
+		"Voinapilla voitelen 1-2 viipaletta",
+		"Käytän enemmän kuin yhden voinapin viipaletta kohti"
+	),
+	NA,
+	TRUE
+	),
+	Mitä.rasvaa = list(c(
+		"Ei mitään rasvaa",
+		"Kasviöljyä",
+		"Kasvimargariinia",
+		"Talousmargariinia",
+		"Voi-kasviöljyseosta",
+		"Voita"
+	),
+	c(6, 1, 2, 3, 4, 5),
+	TRUE
+	),
+	Ruokamuutos = c(
+		"Ei",
+		"Kyllä"
+	),
+	Vesilähde = c(
+		"kunnan vesijohtovettä",
+		"oman kaivon vettä",
+		"kaupan pullotettua vettä",
+		"muuta"
+	),
+	Tupakointi = c(
+		"En",
+		"Kyllä"
+	),
+	TupakSäännöllisyys = c(
+		"en ole koskaan tupakoinut säännöllisesti",
+		"olen tupakoinut säännöllisesti"
+	),
+	Tupakviimeksi = list(c(
+		"yli 10 vuotta sitten",
+		"6 - 10 vuotta sitten",
+		"1 - 5 vuotta sitten",
+		"puoli vuotta - vuosi sitten",
+		"1 kk - puoli vuotta sitten",
+		"2 pv - 1 kk sitten",
+		"eilen tai tänään"
+	),
+	7:1,
+	TRUE
+	),
+	Alkoholi = list(c(
+		"en ole koskaan käyttänyt alkoholijuomia",
+		"en, olen lopettanut alkoholin käytön kokonaan",
+		"kyllä, harvemmin kuin kerran kuussa",
+		"kyllä, vähintään kerran kuussa"
+	),
+	4:1,
+	TRUE
+	),
+	AlkoKuinka.paljon = list(c(
+		"vähemmän kuin yhden annoksen",
+		"1 annoksen",
+		"2 annosta",
+		"3 annosta",
+		"4-5 annosta",
+		"6-10 annosta",
+		"yli 10 annosta"
+	),
+	NA,
+	TRUE
+	),
+	Asuntotyyppi = c(
+		"omakotitalossa",
+		"rivitalossa",
+		"kerrostalossa"
+	),
+	Lämmitystyyppi = c(
+		"kaukolämpö",
+		"öljylämmitys",
+		"sähkölämmitys",
+		"puulämmitys",
+		"muu"
+	),
+	Paikkaus = c(
+		"Ei ole",
+		"Kyllä"
+	),
+	PaikkaPoisto = yn,
+	Montako.paikkaa = list(c(
+		"ei yhtään",
+		"1 - 2",
+		"3 - 6",
+		"7 - 15",
+		"yli 15"
+	),
+	NA,
+	TRUE
+	),
+	Sairaus 	= yn,
+	Elinsiirto	= yn,
+	Tekonivel	= yn,
+	Muu			= yn,
+	Sädehoito	= yn,
+	Vierasesine	= yn,
+	AIDS		= yn,
+	Neurofibr	= yn,
+	vonHippel	= yn
+)
+for(i in names(locs)) {
+	if(!is.list(locs[[i]])) locs[[i]] <- list(locs[[i]], NA)
+	if(!is.numeric(locs[[i]][[2]])) locs[[i]][[2]] <- 1:length(locs[[i]][[1]])
+	if(length(locs[[i]]) < 3) locs[[i]][[3]] <- FALSE
+	if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
+	dat[[i]] <- factor(dat[[i]], levels = locs[[i]][[2]], labels = locs[[i]][[1]], ordered = locs[[i]][[3]])
+	if(i == "Kalaa") {print(locs[[i]]); print(dat[[i]][1:100])}
+}
+levels(dat$ikäAlin)[levels(dat$ikäAlin) == "20.5"] <- "20"
+dat$ikaA1 <- as.numeric(substr(dat$ikäAlin, 1, 2))
+dat$ikaA2 <- as.character(dat$ikäAlin)
+dat$ikaA2 <- as.numeric(substr(dat$ikaA2, nchar(dat$ikaA2)-1, nchar(dat$ikaA2)))
+levels(dat$ikäYlin)[levels(dat$ikäYlin) == "20.5"] <- "20"
+levels(dat$ikäYlin)[levels(dat$ikäYlin) == "21.5"] <- "21"
+levels(dat$ikäYlin)[levels(dat$ikäYlin) == "28.5"] <- "28"
+levels(dat$ikäYlin)[levels(dat$ikäYlin) == "121"] <- "21"
+dat$ikaY1 <- as.numeric(substr(dat$ikäYlin, 1, 2))
+dat$ikaY2 <- as.character(dat$ikäYlin)
+dat$ikaY2 <- as.numeric(gsub("[- ,]", "", substr(dat$ikaY2, nchar(dat$ikaY2)-2, nchar(dat$ikaY2))))
+dat$Pnetto <- as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][1]}))
+dat$Ppoikk <- -as.numeric(sapply(as.character(dat$Muutoskg), FUN = function(x) {strsplit(x, " ")[[1]][2]}))
+dat$Ppoikk[is.na(dat$Ppoikk)] <- 0
+dat$Pnetto <- ifelse(dat$Painonmuutos %in% c("Olen laihtunut"), -dat$Pnetto, dat$Pnetto)
+dat$Pnetto[dat$Painonmuutos == "Painoni ei ole juuri muuttunut"] <- 0
+dat$Pnetto <- dat$Pnetto + dat$Ppoikk
+# Then we impute values for NA based on the averages of the respective subgroups
+dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut"] <- 8
+dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen laihtunut"] <- -8
+dat$Pnetto[is.na(dat$Pnetto) & dat$Painonmuutos == "Olen lihonut ja laihtunut"] <- 1
+dat$Rintamaitoa1[is.na(dat$Rintamaitoa1)] <- 0
+dat$Rintamaitoa2[is.na(dat$Rintamaitoa2)] <- 0
+dat$Rintamaitoa3[is.na(dat$Rintamaitoa3)] <- 0
+dat$Rintamaitoa4[is.na(dat$Rintamaitoa4)] <- 0
+dat$Rintamaitoa5[is.na(dat$Rintamaitoa5)] <- 0
+dat$Rintamaitoa6[is.na(dat$Rintamaitoa6)] <- 0
+dat$Rintamaitoa7[is.na(dat$Rintamaitoa7)] <- 0
+dat$Rintamaitoa8[is.na(dat$Rintamaitoa8)] <- 0
+dat$Rintamaitoa <- dat$Rintamaitoa1 + dat$Rintamaitoa2 + dat$Rintamaitoa3 + dat$Rintamaitoa4 +
+	dat$Rintamaitoa5 + dat$Rintamaitoa6 + dat$Rintamaitoa7 + dat$Rintamaitoa8
+### Graphs about WHOTEQ as a function of age and BMI.
+if(FALSE) {
+	ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2)) + scale_colour_gradientn(colours = rainbow(3))
+	ggplot(dat) + geom_point(aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25))
+	ggplot(dat, aes(x = ika, y = WHOTEQ, colour = Paino / (Pituus/100)^2>25)) + geom_point() + geom_smooth()
+	for(i in colnames(dat)[c(4, 6:8, 14:16, 18:35, 37:76, 82:84, 101:138, 144:152, 154:162)]) {#182, 184:199, 201:205, 238, 247, 267:268)]) {
+		print(ggplot(dat, aes_string(x = i)) + geom_bar() + labs(title = i))
+	#  par(ask = interactive()) # This makes R wait for Enter before continuing
+	}
+}
+temp <- as.character(dat$leikkaus_dt)
+temp <- as.numeric(substr(temp, nchar(temp) - 1, nchar(temp)))
+temp <- temp - pmax(
+	dat$Vuosi1,
+	dat$Vuosi2,
+	dat$Vuosi3,
+	dat$Vuosi4,
+	dat$Vuosi5,
+	dat$Vuosi6,
+	dat$Vuosi7,
+	dat$Vuosi8,
+	na.rm = TRUE
+)
+temp[is.na(temp)] <- 0
+dat$Syntymasta <- temp # How many years from the birth of the last child to the year of the operation?
+dat$ikaA1[is.na(dat$ikaA1)] <- 20
+dat$ikaA2[is.na(dat$ikaA2)] <- 20
+dat$ikaY1 <- ifelse(is.na(dat$ikaY1), dat$ika, dat$ikaY1)
+dat$ikaY2 <- ifelse(is.na(dat$ikaY2), dat$ika, dat$ikaY2)
+dat <- within(dat, lihomisvauhti <- (kgYlin - kgAlin) / ((ikaY1 + ikaY2) / 2 - (ikaA1 + ikaA2) / 2)) # Weight gain rate: kg per year between the mid-ages of lowest and highest weight
+## From here on, dat contains also other information than that produced in the KTL sarcoma study.
+ffq <- opbase.data("Op_en2721", subset = "Portions per month")
+ffq$Result <- ffq$Result
+ffq$Obs <- NULL
+foods <- opbase.data("Op_en2721", subset = "Food energy and dioxin")
+# Add three columns (energy "e", dioxin "d" and mass "m") for each food item in the ffq questionnaire
+dat$Energy <- 0
+dat$Dioxin <- 0
+for(i in unique(foods$Food)) {
+	# Merge classified ffq questionnaire data with quantitative interpretation of ffq.
+	cole <- paste("e", i, sep = "")
+	colnames(ffq) <- c(i, cole) # Convert words to numbers of portions per month
+	dat <- merge(dat, ffq, all.x = TRUE)
+	dat[[cole]] <- ifelse(is.na(dat[[cole]]), 0, dat[[cole]] / 30) # Replace NA with 0, otherwise /mo -> /d.
+	dat[[paste("d", i, sep = "")]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Dioxin", "Result"]))
+	dat[[paste("m", i, sep = "")]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Mass", "Result"]))
+	dat[[cole]] <- dat[[cole]] * as.numeric(as.character(foods[foods$Food == i & foods$Observation == "Energy", "Result"]))
+	dat$Energy <- dat$Energy + dat[[cole]]
+	dat$Dioxin <- dat$Dioxin + dat[[paste("d", i, sep = "")]]
+}
+objects.get("x89WlgJlDLA02vnL") # PCDD/F-data table1
+pcdd <- table1[2:nrow(table1), c(2,3)] # Time is dropped as unnecessary for now (2002, 2009 data)
+colnames(pcdd) <- c("PCDDF", "PCB")
+pcdd$PCDDF <- as.numeric(as.character(pcdd$PCDDF))
+pcdd$PCB <- as.numeric(as.character(pcdd$PCB))
+pcdd$TEQ <- pcdd$PCDDF + pcdd$PCB
+dat$Silteq <- dat$mSilakkaa * mean(pcdd$TEQ)
+dat$Kalteq <- dat$Silteq + dat$mKalaa * 1 # Estimated mean 1 pg/g fw
+</pre>
+}}
+==== Interpretations ====
+The consumption of hard fat is calculated with the formula below (Q## means the value from the survey question; I## means the interpretation from the table below; Q24&I24 means that question Q24 is quantified by using the interpretation I24 with the matching value; Q23*I23 means that the survey value and the interpretation are multiplied).
+ total_fat = (Q23a*I23 + Q23b*I23) * Q24&I24 + Q25&I25 * Q26&I26 * Q21a&I21 + Q27&I27 * 20
+The code assumes that a person uses 20 g/d of fat for cooking. The questions are Q23: how much a) milk, b) sourmilk; Q24: what kind of milk; Q25: what fat on bread; Q26: how much fat on bread; Q27: what fat for cooking.
+The following assumptions are used to interpret survey answers:
+<t2b name="Assumptions for calculations" index="Variable,Value,Unit" obs="Result" desc="Description,Vastaus suomeksi" unit="-">
+Q23||dl per glass|2|Size of a glass of milk or sourmilk|
+Q24|1|fat g/dl|0.035|full milk, fat g/dl|täysmaitoa
+Q24|2|fat g/dl|0.015|light milk, fat g/dl|kevytmaitoa
+Q24|3|fat g/dl|0.01|1% milk, fat g/dl|ykkösmaitoa
+Q24|4|fat g/dl|0|fat-free milk|rasvatonta maitoa
+Q24|5|fat g/dl|0|fat-free sourmilk|rasvatonta piimää tai kirnupiimää
+Q24|6|fat g/dl|0.01|other sourmilk fat g/dl|muuta piimää
+Q24|7|fat g/dl|0|none of these|en juo maitoa enkä piimää
+Q25|1|hard fat, proportion|0|none|en mitään
+Q25|2|hard fat, proportion|0.15|soft margarine, share of hard fat|kasvimargariinia
+Q25|3|hard fat, proportion|0.5|oil-butter-mix, share of hard fat|Voi-kasvirasvaseosta
+Q25|4|hard fat, proportion|1|butter|voita
+Q26|1|fat g /slice of bread|0|0 g per slice of bread|en lainkaan
+Q26|2|fat g /slice of bread|3|3 g per slice of bread|10 g per 3 viipaletta
+Q26|3|fat g /slice of bread|7|7 g per slice of bread|10 g per 1-2 viipaletta
+Q26|4|fat g /slice of bread|15|15 g per slice of bread|Yli 10 g per viipale
+Q27|1|hard fat fraction|0|hard fat fraction in the baking fat used|kasviöljyä
+Q27|2|hard fat fraction|0.15|hard fat fraction in the baking fat used|kasvimargariinia
+Q27|3|hard fat fraction|0.5|hard fat fraction in the baking fat used|talousmargariinia
+Q27|4|hard fat fraction|0.5|hard fat fraction in the baking fat used|Voi-kasvirasvaseosta
+Q27|5|hard fat fraction|1|hard fat fraction in the baking fat used|voita
+Q27|6|hard fat fraction|0|hard fat fraction in the baking fat used|ei mitään rasvaa
+Q35|1|alcohol times /a|300||päivittäin
+Q35|2|alcohol times /a|100||muutaman kerran viikossa
+Q35|3|alcohol times /a|50||noin kerran viikossa
+Q35|4|alcohol times /a|25||pari kertaa kuukaudessa
+Q35|5|alcohol times /a|12||noin kerran kuukaudessa
+Q35|6|alcohol times /a|6||noin kerran parissa kuukaudessa
+Q35|7|alcohol times /a|4||3-4 kertaa vuodessa
+Q35|8|alcohol times /a|2||pari kertaa vuodessa
+Q35|9|alcohol times /a|1||kerran vuodessa tai harvemmin
+Q35|10|alcohol times /a|0||en koskaan
+Q36|1|alcohol portion |0|g alcohol|vähemmän kuin yhden
+Q36|2|alcohol portion |12|g alcohol|1 annoksen
+Q36|3|alcohol portion |24|g alcohol|2 annosta
+Q36|4|alcohol portion |36|g alcohol|3 annosta
+Q36|5|alcohol portion |55|g alcohol|4-5 annosta
+Q36|6|alcohol portion |96|g alcohol|6-10 annosta
+Q36|7|alcohol portion |150|g alcohol|Yli 10 annosta
+Q21a|1|g/day carbohydrates|1.5|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina. oletus: 50% hiilihydraattia
+Q21a|2|g/day carbohydrates|2.5|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina.
+Q21a|3|g/day carbohydrates|7.5|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina.
+Q21a|4|g/day carbohydrates|15|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina.
+Q21a|5|g/day carbohydrates|50|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina.
+Q21a|6|g/day carbohydrates|100|carbohydrates per day of 100 g bread slices|leipää 100 g viipaleina.
+Q21b|1|g/day carbohydrates|0.84|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina. oletus: 70% hiilihydraattia viljasta, jota 20%
+Q21b|2|g/day carbohydrates|1.4|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina.
+Q21b|3|g/day carbohydrates|4.2|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina.
+Q21b|4|g/day carbohydrates|8.4|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina.
+Q21b|5|g/day carbohydrates|28|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina.
+Q21b|6|g/day carbohydrates|56|carbohydrates per day of 200 g porridge|puuroa 200 g annoksina.
+Q21c|1|g/day carbohydrates|1.2|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina. oletus: 80% hiilihydraattia viljasta, jota 25%
+Q21c|2|g/day carbohydrates|2|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina.
+Q21c|3|g/day carbohydrates|6|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina.
+Q21c|4|g/day carbohydrates|12|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina.
+Q21c|5|g/day carbohydrates|40|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina.
+Q21c|6|g/day carbohydrates|80|carbohydrates per day of 200 g pasta|pastaa 200 g annoksina.
+Q21d|1|g/day carbohydrates|1.26|carbohydrates per day of 200 g muesli etc|muita (mysli ym). oletus: 70% hiilihydraattia viljasta, jota 30%
+Q21d|2|g/day carbohydrates|2.1|carbohydrates per day of 200 g muesli etc|muita (mysli ym).
+Q21d|3|g/day carbohydrates|6.3|carbohydrates per day of 200 g muesli etc|muita (mysli ym).
+Q21d|4|g/day carbohydrates|12.6|carbohydrates per day of 200 g muesli etc|muita (mysli ym).
+Q21d|5|g/day carbohydrates|42|carbohydrates per day of 200 g muesli etc|muita (mysli ym).
+Q21d|6|g/day carbohydrates|84|carbohydrates per day of 200 g muesli etc|muita (mysli ym).
+Q21e|1|g/day carbohydrates|0.3|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri. oletus: 5% hiilihydraattia (Doc. Geigy s. 479)
+Q21e|2|g/day carbohydrates|0.5|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri.
+Q21e|3|g/day carbohydrates|1.5|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri.
+Q21e|4|g/day carbohydrates|3|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri.
+Q21e|5|g/day carbohydrates|10|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri.
+Q21e|6|g/day carbohydrates|20|carbohydrates per day of 200 g yoghurt etc|viiliä tai jugurttia, sokeri.
+Q21f|1|g/day carbohydrates|0.015|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri.
+Q21f|2|g/day carbohydrates|0.025|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri. oletus: 1% hiilihydraattia (Doc. Geigy s. 479)
+Q21f|3|g/day carbohydrates|0.075|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri.
+Q21f|4|g/day carbohydrates|0.15|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri.
+Q21f|5|g/day carbohydrates|0.5|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri.
+Q21f|6|g/day carbohydrates|1|carbohydrates per 50 g cheese|vähärasv. juusto, sokeri.
+Q21g|1|g/day carbohydrates|0.015|carbohydrates per 50 g cheese|muu juusto, sokeri. oletus: 1% hiilihydraattia (Doc. Geigy s. 479)
+Q21g|2|g/day carbohydrates|0.025|carbohydrates per 50 g cheese|muu juusto, sokeri.
+Q21g|3|g/day carbohydrates|0.075|carbohydrates per 50 g cheese|muu juusto, sokeri.
+Q21g|4|g/day carbohydrates|0.15|carbohydrates per 50 g cheese|muu juusto, sokeri.
+Q21g|5|g/day carbohydrates|0.5|carbohydrates per 50 g cheese|muu juusto, sokeri.
+Q21g|6|g/day carbohydrates|1|carbohydrates per 50 g cheese|muu juusto, sokeri.
+Q21h|1|g/day carbohydrates|0.3|carbohydrates per 100 g ice cream|jäätelöä. oletus: 10% hiilihydraattia
+Q21h|2|g/day carbohydrates|0.5|carbohydrates per 100 g ice cream|jäätelöä.
+Q21h|3|g/day carbohydrates|1.5|carbohydrates per 100 g ice cream|jäätelöä.
+Q21h|4|g/day carbohydrates|3|carbohydrates per 100 g ice cream|jäätelöä.
+Q21h|5|g/day carbohydrates|10|carbohydrates per 100 g ice cream|jäätelöä.
+Q21h|6|g/day carbohydrates|20|carbohydrates per 100 g ice cream|jäätelöä.
+Q21i|1|g/day hard fat|0.12|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva. oletus: 2 % rasvaa
+Q21i|2|g/day hard fat|0.2|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva.
+Q21i|3|g/day hard fat|0.6|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva.
+Q21i|4|g/day hard fat|1.2|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva.
+Q21i|5|g/day hard fat|4|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva.
+Q21i|6|g/day hard fat|8|hard fat per 200 g yoghurt etc|viiliä tai jugurttia, rasva.
+Q21j|1|g/day hard fat|0.15|hard fat per 50 g low-fat cheese|vähärasvainen juusto. oletus: 10% rasvaa
+Q21j|2|g/day hard fat|0.25|hard fat per 50 g low-fat cheese|vähärasvainen juusto.
+Q21j|3|g/day hard fat|0.75|hard fat per 50 g low-fat cheese|vähärasvainen juusto.
+Q21j|4|g/day hard fat|1.5|hard fat per 50 g low-fat cheese|vähärasvainen juusto.
+Q21j|5|g/day hard fat|5|hard fat per 50 g low-fat cheese|vähärasvainen juusto.
+Q21j|6|g/day hard fat|10|hard fat per 50 g low-fat cheese|vähärasvainen juusto.
+Q21k|1|g/day hard fat|0.45|hard fat per 50 g cheese|juusto. oletus: 30% rasvaa (fineli)
+Q21k|2|g/day hard fat|0.75|hard fat per 50 g cheese|juusto.
+Q21k|3|g/day hard fat|2.25|hard fat per 50 g cheese|juusto.
+Q21k|4|g/day hard fat|4.5|hard fat per 50 g cheese|juusto.
+Q21k|5|g/day hard fat|15|hard fat per 50 g cheese|juusto.
+Q21k|6|g/day hard fat|30|hard fat per 50 g cheese|juusto.
+Q21l|1|g/day hard fat|0.3|hard fat per 100 g ice cream|jäätelöä. oletus: 10% rasvaa
+Q21l|2|g/day hard fat|0.5|hard fat per 100 g ice cream|jäätelöä.
+Q21l|3|g/day hard fat|1.5|hard fat per 100 g ice cream|jäätelöä.
+Q21l|4|g/day hard fat|3|hard fat per 100 g ice cream|jäätelöä.
+Q21l|5|g/day hard fat|10|hard fat per 100 g ice cream|jäätelöä.
+Q21l|6|g/day hard fat|20|hard fat per 100 g ice cream|jäätelöä.
+Q21m|1|g/day hard fat|0.45|hard fat per 100 g meat |liharuokaa. oletus: 15% rasvaa (Doc. Geigy s. 481)
+Q21m|2|g/day hard fat|0.75|hard fat per 100 g meat |liharuokaa.
+Q21m|3|g/day hard fat|2.25|hard fat per 100 g meat |liharuokaa.
+Q21m|4|g/day hard fat|4.5|hard fat per 100 g meat |liharuokaa.
+Q21m|5|g/day hard fat|15|hard fat per 100 g meat |liharuokaa.
+Q21m|6|g/day hard fat|30|hard fat per 100 g meat |liharuokaa.
+Q21n|1|g/day hard fat|0.15|hard fat per 100 g fish |kalaruokaa. oletus: 5% kovaa rasvaa, finelin mukaan 2-5%
+Q21n|2|g/day hard fat|0.25|hard fat per 100 g fish |kalaruokaa.
+Q21n|3|g/day hard fat|0.75|hard fat per 100 g fish |kalaruokaa.
+Q21n|4|g/day hard fat|1.5|hard fat per 100 g fish |kalaruokaa.
+Q21n|5|g/day hard fat|5|hard fat per 100 g fish |kalaruokaa.
+Q21n|6|g/day hard fat|10|hard fat per 100 g fish |kalaruokaa.
+</t2b>
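As a worked illustration of the hard-fat formula and the assumption table above, the following R sketch applies the formula literally to one hypothetical respondent. The answer codes, the helper name `total_hard_fat` and the reading of Q23 as glasses per day are assumptions made for this example only. The lookup values are copied from the table, the sketch does not try to reconcile the units of the individual terms, and the study's own code may differ.

```r
# Interpretation values copied from the table "Assumptions for calculations".
I23  <- 2                                     # dl per glass of milk or sourmilk
I24  <- c(0.035, 0.015, 0.01, 0, 0, 0.01, 0)  # fat g/dl by milk type (Q24 = 1..7)
I25  <- c(0, 0.15, 0.5, 1)                    # hard-fat proportion of bread spread (Q25 = 1..4)
I26  <- c(0, 3, 7, 15)                        # fat g per slice of bread (Q26 = 1..4)
I21a <- c(1.5, 2.5, 7.5, 15, 50, 100)         # bread term used in the formula (Q21a = 1..6)
I27  <- c(0, 0.15, 0.5, 0.5, 1, 0)            # hard-fat fraction of cooking fat (Q27 = 1..6)
cooking_fat <- 20                             # g/d of cooking fat, as assumed in the text

# Hypothetical helper: applies the formula term by term.
total_hard_fat <- function(q23a, q23b, q24, q25, q26, q21a, q27) {
  milk  <- (q23a * I23 + q23b * I23) * I24[q24]  # (Q23a*I23 + Q23b*I23) * Q24&I24
  bread <- I25[q25] * I26[q26] * I21a[q21a]      # Q25&I25 * Q26&I26 * Q21a&I21
  cook  <- I27[q27] * cooking_fat                # Q27&I27 * 20
  milk + bread + cook
}

# Hypothetical respondent: two glasses of light milk, butter on bread
# (answer 2 to Q26), bread once a week (Q21a = 3), butter for cooking.
example_fat <- total_hard_fat(q23a = 2, q23b = 0, q24 = 2,
                              q25 = 4, q26 = 2, q21a = 3, q27 = 5)
print(example_fat)
```

With these inputs the three terms are 0.06 (milk), 22.5 (bread) and 20 (cooking), so the cooking-fat assumption and the bread term dominate the result.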
+How much mass, energy, and dioxin does one portion contain? Data are guesswork or from [http://www.fineli.fi Fineli].
+<t2b name="Food energy and dioxin" index="Food,Observation" locations="Mass,Energy,Dioxin" unit="g,kJ,pg/portion">
+Kalaa|100|600|7
+Silakkaa|100|792|470
+Petokalaa|100|301|25
+Muikkua|100|750|28
+Sisävesikalaa|100|668|23
+Kirjolohta|100|1067|74
+Itämeren.lohta|100|1067|770
+Muuta.Itämerestä|100|668|15
+Pakastekalaa|100|324|7
+Kalasäilykkeitä|60|600|7
+Valtamerikalaa|100|600|7
+Äyriäisiä|60|200|7
+Leipää|50|406|0.01
+Puuroja|200|642|0.02
+Makaronia|200|846|0.02
+Muutaviljaa|150|600|0.02
+Viiliä|200|334|0.008
+Juustoja|40|300|0.012
+Rasvaisia.juustoja|40|600|0.03
+Jäätelöä|150|1200|0.03
+Liharuokaa|150|1400|1.5
+Maitoa|200|358|0.004
+Piimää|200|358|0.004
+</t2b>
+<t2b name="Portions per month" index="Answer" obs="Interpretation" unit="portions/mo">
+En lainkaan|0.003
+Harvemmin kuin kerran kuukaudessa tai en lainkaan|0.1
+Harvemmin kuin kerran kuukaudessa|0.5
+Kerran tai pari kuukaudessa|1.5
+Kerran viikossa|4
+Pari kertaa viikossa|8
+Lähes joka päivä|20
+Kerran päivässä tai useammin|40
+</t2b>
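The two tables above are combined in the data-processing code: each questionnaire answer is turned into portions per month, divided by 30 to get portions per day, and multiplied by the dioxin content of one portion. The R sketch below repeats that arithmetic for a single hypothetical respondent. The respondent's answers and the variable names are assumptions for illustration; the numbers are copied from the tables.

```r
# Portions per month for each answer (subset of the table "Portions per month").
portions_per_month <- c(
  "En lainkaan" = 0.003,
  "Harvemmin kuin kerran kuukaudessa" = 0.5,
  "Kerran tai pari kuukaudessa" = 1.5,
  "Kerran viikossa" = 4,
  "Pari kertaa viikossa" = 8
)

# Dioxin per portion, pg (subset of the table "Food energy and dioxin").
dioxin_per_portion <- c(Silakkaa = 470, Kirjolohta = 74, Pakastekalaa = 7)

# Hypothetical answers of one respondent.
answers <- c(
  Silakkaa     = "Kerran viikossa",
  Kirjolohta   = "Pari kertaa viikossa",
  Pakastekalaa = "En lainkaan"
)

# Answer -> portions per month -> portions per day (30 days per month).
portions_per_day <- unname(portions_per_month[answers]) / 30

# Daily dioxin intake per food (pg/d) and in total.
daily_dioxin <- portions_per_day * dioxin_per_portion[names(answers)]
total_dioxin <- sum(daily_dioxin)

print(round(daily_dioxin, 2))
print(round(total_dioxin, 2))
```

In this example Baltic herring (Silakkaa) eaten once a week contributes about three times as much dioxin as rainbow trout (Kirjolohta) eaten twice a week, which reflects the much higher per-portion value in the table.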
+=== Analyses ===
+====Simulated data====
+; This code was used to create a csv file that contains simulated data from this study. When compared with the original data, the simulated data
+* has the same number of observations,
+* has the same range of values in each variable,
+* has approximately the same correlation structure between all variables.
+<rcode>
+library(OpasnetUtils)
+library(MASS)
+library(mc2d)
+library(reshape2)
+library(ggplot2)
+objects.get("isqT7nvhd0ViUR7d")
+data <- objects.decode(etable, password)
+colnames(data) <- t(data[1, ])
+data <- data[2:nrow(data), 2:ncol(data)]
+data2 <- data
+fun <- c(rep("normal", 5), rep("poisson", 12), rep("lognormal", 19))
+params <- list()
+for(i in 1:ncol(data2)) {
+	data2[[i]] <- as.numeric(as.character(data2[[i]]))
+	if(i > 17) data2[[i]] <- ifelse(data2[[i]] == 0, 0.01, data2[[i]])
+	params[[i]] <- fitdistr(data2[[i]][!is.na(data2[[i]])], fun[i])$estimate # Keep only the fitted parameter estimates
+}
+simu <- data.frame(temp = rep(NA, 968))
+for(i in 1:5) {
+	simu[[i]] <- rnorm(968, params[[i]][1], params[[i]][2])
+}
+for(i in 6:17) {
+	simu[[i]] <- rpois(968, params[[i]])
+}
+for(i in 18:36) {
+	simu[[i]] <- rlnorm(968, params[[i]][1], params[[i]][2])
+}
+simu[[3]] <- rbern(968, 0.5) + 1
+colnames(simu) <- colnames(data)
+korre <- cor(x = data2, use = "pairwise.complete.obs", method = "spearman")
+simu <- as.data.frame(cornode(as.matrix(simu), target = korre))
+korre2 <- cor(x = simu, use = "pairwise.complete.obs", method = "spearman")
+qplot(melt(korre)$value, melt(korre2)$value)
+for(i in 1:ncol(simu)) {
+	simu[[i]] <- ifelse(
+		simu[[i]] > max(data2[[i]], na.rm = TRUE) |
+		simu[[i]] < min(data2[[i]], na.rm = TRUE),
+		NA, simu[[i]]
+	)
+}
+for(i in 1:ncol(data2)) {print(paste(
+	min(data2[[i]], na.rm = TRUE),
+	max(data2[[i]], na.rm = TRUE),
+	min(simu[[i]], na.rm = TRUE),
+	max(simu[[i]], na.rm = TRUE)
+))}
+</rcode>
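The fit, sample and truncate steps of the code above can be tried without access to the protected data. The R sketch below is a minimal illustration for one lognormal-type variable, using made-up data. It estimates the parameters directly from the log values instead of calling `fitdistr`, and it leaves out the correlation step that the original code does with `cornode`. All variable names are assumptions for this example.

```r
set.seed(1)

# Made-up "observed" data standing in for one lognormal-type study variable.
observed <- rlnorm(200, meanlog = 2, sdlog = 0.8)

# Fit a lognormal distribution from the mean and sd of the log values.
meanlog_hat <- mean(log(observed))
sdlog_hat   <- sd(log(observed))

# Draw as many simulated values as there are observations.
simulated <- rlnorm(length(observed), meanlog_hat, sdlog_hat)

# Set simulated values outside the observed range to NA,
# so that the simulated variable keeps the same range.
out_of_range <- simulated > max(observed) | simulated < min(observed)
simulated[out_of_range] <- NA

print(range(observed))
print(range(simulated, na.rm = TRUE))
print(sum(is.na(simulated)))
```

Truncating to the observed range trades a few missing values for the guarantee that no simulated value is more extreme than anything in the real data.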
+==== POPs and obesity ====
+Dioxins and PCBs have been associated with type 2 diabetes. Do dioxins cause diabetes, or does diabetes decrease dioxin elimination, or does obesity increase diabetes and decrease dioxin elimination, or is it something else? We tried to make sense of this by looking at the sarcoma study data.
+<rcode label="Code does not work without the data file" embed=1>
+library(ggplot2)
+dat <- read.csv("V:/TUSO/Projects/POPit ja lihavuus/Sarkoomakyselydata/Copy of sarkooma_kysely_ja_dioksiinit_korjattu.csv")
+dat$Diet <- dat$Rasvaa.maitotuotteista + dat$k21liha + dat$k21kala * 8
+hist(dat$Diet)
+dat$Diet3 <- cut(dat$Diet, 3)
+ggplot(dat, aes(x = ika, y = IntakePCDDFTEQ, colour = Diet3)) +
+  geom_point() + geom_smooth()
+ggplot(dat, aes(x = ika, y = PCDDFWHO05TEQ, colour = Diet3)) +
+  geom_point() + geom_smooth()
+</rcode>
+==== Self-reported chemical exposure ====
+We looked at self-reported chemical exposure, especially pesticides and wood preservatives.
+We also looked at the impact of self-reported occupation, recoded into 9 groups. This is best done in the unmatched dataset, but some analyses were also done with the matched dataset. Age was the only clearly significant variable, with sarcoma risk increasing by 8 % per year. Male gender seemed to increase the risk, but the effect was not statistically significant. None of the differences between occupation groups were statistically significant, and they did not show a pattern where putatively chemically exposed groups would have a higher risk.
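To show what an 8 % increase per year means over longer age differences, the R sketch below assumes that the figure is an odds ratio of 1.08 per year of age from the logistic model. This is an assumption, as the text does not report the coefficient itself, and the variable names are made up for the example.

```r
# Assumed odds ratio per one year of age (8 % increase per year).
or_per_year <- 1.08

# The corresponding regression coefficient on the log-odds scale.
beta_age <- log(or_per_year)

# Odds ratios for age differences of 1, 5, 10 and 20 years.
age_diff <- c(1, 5, 10, 20)
or_by_age_diff <- exp(beta_age * age_diff)
names(or_by_age_diff) <- paste(age_diff, "years")

print(round(or_by_age_diff, 2))
```

Because the effect is multiplicative, a ten-year age difference corresponds to roughly a doubling of the odds under this assumption, which is why age dominates the other variables in the models.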
+<rcode label="Code does not work without the data file">
+#################
+# Bring in the hand-made occupation classification
+d <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Kyselykaavaka_ammatti-tyo_edit.csv")
+#colnames(d)
+#[1] "N"                        "ID"                       "Työntekijäryhmä"          "Luokitus..koodi.lopussa."
+#[5] "Alle.5.v.työhistoria"     "Huomattavaa"              "Ammatti"                  "Työpaikka"
+#[9] "Kesto"                    "Työtehtävä"               "AmmattiA"                 "TyöpaikkaA"
+#[13] "KestoA"                   "AmmattiB"                 "TyöpaikkaB"               "KestoB"
+#[17] "AmmattiC"                 "TyöpaikkaC"               "KestoC"                   "AmmattiD"
+#[21] "TyöpaikkaD"               "KestoD"
+lev <- as.character(d[974:982,4])
+d <- d[1:969,c(2,4,5,6)]
+d <- d[d$ID != "" , ] # Remove empty row 883
+colnames(d) <- c("ID", "Tyoluokka", "Alle5v", "Huom.tyo")
+d$Tyoluokka <- factor(d$Tyoluokka, levels = 1:9, labels = lev)
+d$Tyoalt <- ifelse(as.numeric(d$Tyoluokka) %in% c(1,2,9), "Ei",
+                   ifelse(as.numeric(d$Tyoluokka) %in% c(3,8), "Ehkä", "Kyllä"))
+#> levels(d$Tyoluokka)
+#[1] "Opiskelija"             "Sisätyö"                "Hoitoala"               "Maa- ja metsätalous"
+#[5] "Sotilas, palomies ym"   "Teollisuustyö"          "Rakennusala, ulkotyö"   "Kauppa, elintarvikeala"
+#[9] "Työtön tai ei tietoa"
+###################################
+library(lme4)
+# Data from //helfs01.thl.fi/groups2/TUSO/Projects/POPit ja lihavuus/Dioksiinit vs sarkooma/Data.xlsx
+dat <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Data_2.12.2016.csv", encoding = "UTF-8")
+names(dat)
+dat$PCDDFWHO05TEQ <- dat$PCDDFWHO05TEQ / 20 # Scale to a nominal interquartile range (ca. 19.5 pg/g fat, depending on subgroup)
+# Pekka's model with the clogit function
+library("survival")
+# A conditional regression with new occupation classification. Regression method as below.
+#> sum(as.character(d$ID) != as.character(dat$ID))
+#[1] 0
+# Because rows are identically ordered, just cbind the occupation data without redundant ID.
+dat <- cbind(dat, d[-1])
+dat$Tyoluokka <- relevel(dat$Tyoluokka, "Sisätyö")
+dat$Alle5v <- ifelse(dat$Alle5v == "1", "Yes", "No")
+table(dat[c("Tyoluokka","Sarcoma.unmatched","Alle5v")], useNA = "ifany")
+clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka + Alle5v, # + PCDDFWHO05TEQ
+       #strata(Sarcoma.matched.pair),
+       method="exact", data = dat
+)
+clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka, # + PCDDFWHO05TEQ
+       #strata(Sarcoma.matched.pair),
+       method="exact", data = dat[dat$Alle5v == "No",]
+)
+clogit(Sarcoma.matched ~ Sex + Tyoalt + # Tyoluokka + # + PCDDFWHO05TEQ
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat
+)
+clogit(Sarcoma.matched ~ Tyoluokka + # + PCDDFWHO05TEQ # No sex to avoid too many subgroups
+         strata(Sarcoma.matched.pair),
+       method="exact", data = dat
+)
+# The analysis above does not give reliable results because of the warning: Loglik converged before variable 1,2,3,4,5,6,7,8
+temp <- list()
+for(i in levels(dat$Tyoluokka)) {
+  dat$Temp <- ifelse(dat$Tyoluokka == i, TRUE, FALSE)
+  print(i)
+  temp2 <- clogit(Sarcoma.matched ~ Sex + Temp + # Tyoluokka + # + PCDDFWHO05TEQ
+           strata(Sarcoma.matched.pair),
+         method="exact", data = dat
+  )
+  print(summary(temp2)$conf.int)
+  temp <- rbind(
+    temp,
+    data.frame(
+      Tyoluokka = i,
+      summary(temp2)$conf.int,
+      Pvalue = summary(temp2)$coefficients[,5]
+    )
+  )
+}
+# The analysis above compares one group of Tyoluokka to all others in the matched data set.
+# A better analysis is below with unmatched analysis.
+temp
+# On the other hand, questionnaire was collected from everyone, so matching can be removed (unlike with dioxins)
+# without altering the design. Let's try what happens without matching.
+clogit(Sarcoma.unmatched ~ Sex + Age + Tyoluokka, # + PCDDFWHO05TEQ
+               #strata(Sarcoma.matched.pair),
+           method="exact", data = dat
+)
+table(dat[c("Tyoluokka", "Sarcoma.matched")])
+table(dat[c("Sarcoma.matched","Sarcoma.unmatched")], useNA = "ifany")
+# Exact estimation. Produces the same result as Riikka's analysis.
+# Several different models were run. All included Sex as a confounder.
+# Four pairs of models looked at each chemical risk separately (Analysis: Separate),
+# and dioxin risk in the respective population.
+# Four models looked at each chemical + dioxin in a combined model,
+# adjusting for each other (Analysis: Combined).
+# Finally, one model contained all three chemicals and dioxin in a single model,
+# naturally not containing the combined chemical exposure this time.
+models <- list()
+models[[1]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.woodpreservatives +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
+)
+models[[2]] <- clogit(Sarcoma.matched ~ Sex + Exposure.woodpreservatives +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
+)
+models[[3]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.woodpr == 1 , ]
+)
+models[[4]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.fungicidesherbicides +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
+)
+models[[5]] <- clogit(Sarcoma.matched ~ Sex + Exposure.fungicidesherbicides +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
+)
+models[[6]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.funher == 1 , ]
+)
+models[[7]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.insecticides +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
+)
+models[[8]] <- clogit(Sarcoma.matched ~ Sex + Exposure.insecticides +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
+)
+models[[9]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ +
+                        strata(Sarcoma.matched.pair),
+                      method="exact", data = dat[dat$Inclusion.criteria.insect == 1 , ]
+)
+models[[10]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.any +
+                         strata(Sarcoma.matched.pair),
+                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
+)
+models[[11]] <- clogit(Sarcoma.matched ~ Sex + Exposure.any +
+                         strata(Sarcoma.matched.pair),
+                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
+)
+models[[12]] <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ +
+                         strata(Sarcoma.matched.pair),
+                       method="exact", data = dat[dat$Inclusion.criteria.any == 1 , ]
+)
+out <- data.frame()
+for(i in 1:length(models)) {
+  out <- rbind(
+    out,
+    cbind(
+      as.data.frame(summary(models[[i]])$coefficients),
+      as.data.frame(summary(models[[i]])$conf.int)
+    )
+  )
+}
+out$Subpop <- rep(c(
+  "Wood preservatives",
+  "Fungicides, herbicides",
+  "Insecticides",
+  "Any of above"),
+  each = 7
+)
+print(out, digits = 3)
+out <- out[c(2,3,5,7,9,10,12,14,16,17,19,21,23,24,26,28) , c(2, 8, 9, 5, 10)]
+out$Analysis <- rep(c("Combined","Separate"), each = 2, times = 4)
+out <- out[order(out$Subpop, rownames(out), out$Analysis) , ]
+print(out, digits = 3)
+#### Analysis where all chemicals are in a single model.
+mo <- clogit(Sarcoma.matched ~ Sex + PCDDFWHO05TEQ + Exposure.insecticides + Exposure.fungicidesherbicides + Exposure.woodpreservatives +
+               strata(Sarcoma.matched.pair),
+             method="exact", data = dat#[dat$Inclusion.criteria.any == 1 , ]
+)
+summary(mo)
+# Fungicides/herbicides clearly elevate the risk, and the effect is statistically significant.
+# Wood preservatives also show a high risk, but it is only marginally significant.
+# Insecticides and dioxin are not associated with higher risk.
+# Chemicals are moderately correlated with each other and somewhat with dioxin
+# as shown by the correlation table below.
+cor(dat[
+  dat$Inclusion.criteria.any == 1 ,
+  c("Exposure.any",
+    "Exposure.fungicidesherbicides",
+    "Exposure.insecticides",
+    "Exposure.woodpreservatives",
+    "PCDDFWHO05TEQ"
+  )],
+  use = "pairwise.complete.obs"
+)
+# Is Baltic herring an independent risk factor for sarcoma?
+# Well, the risk is increased but clearly non-significant (OR 1.388, 95 % CI 0.8063 - 2.389)
+dat$Silakka <- as.numeric(dat$Silakkaa) > 3
+table(dat$Silakkaa, dat$Silakka)
+fit <- clogit(Sarcoma.matched ~ Sex + Silakka +
+                strata(Sarcoma.matched.pair),
+              method="exact", data = dat
+)
+summary(fit)
+# A question was raised why the PCDDFWHO05TEQ estimates for the wood-preservative group and the any-exposure group were identical.
+# The reason can be seen from here:
+table(dat[c(
+  "Inclusion.criteria.woodpr",
+  "Inclusion.criteria.any",
+  "Sarcoma.matched")],
+  exclude = NULL
+)
+# The two groups are practically identical with only two additional controls in the any-exposure group.
+# These two controls do not much change the PCDDFWHO05TEQ impact on sarcoma, and therefore the estimates are the same
+# with precision of three decimals.
+table(dat[c(
+  "Inclusion.criteria.insect",
+  "Inclusion.criteria.any",
+  "Sarcoma.matched")],
+  exclude = NULL
+)
+# However, with insecticides, there are three more controls and TWO MORE CASES in the any-exposure group, and that does change the estimates.
+# With herbicides and fungicides, there is one more control and two more cases, which is also enough to change the estimates.
+</rcode>
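The conditional logistic models above can be tried out without the study data. The following is a minimal sketch on simulated 1:1 matched pairs. The data frame <code>sim</code>, its columns <code>pair</code>, <code>case</code> and <code>exposed</code>, and the exposure prevalences (chosen to give a true odds ratio of 3) are hypothetical and are not taken from the study.

```r
library(survival)  # clogit() and strata()

set.seed(1)
n_pairs <- 200

# Hypothetical matched data: one case and one control per pair
sim <- data.frame(
  pair = rep(seq_len(n_pairs), each = 2),
  case = rep(c(1, 0), times = n_pairs)
)
# Exposure is more common among cases (assumed prevalences 0.5 vs 0.25,
# which correspond to an odds ratio of 3)
sim$exposed <- rbinom(nrow(sim), 1, ifelse(sim$case == 1, 0.5, 0.25))

fit_sim <- clogit(case ~ exposed + strata(pair), method = "exact", data = sim)

# Odds ratio with 95 % confidence interval, extracted the same way as in the loop above
or_sim <- summary(fit_sim)$conf.int
print(or_sim, digits = 3)
```

With 200 pairs the estimate should land reasonably close to the assumed odds ratio, but the interval remains wide, which gives a feel for the precision that can be expected from subgroup analyses of this size.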
+==== Correlation of dioxin and fish ====
+How do individual dioxin congeners correlate with individual fish parameters in the questionnaire?
+<rcode label="Code does not work without the data file">
+# This is code Op_en2721/ on page [[KTL Sarcoma study]]
+library(lme4)
+library(Hmisc)
+library(MASS)
+library(ggplot2)
+library(rjags)
+library(reshape2) # melt()
+library(OpasnetUtils) # oprint()
+# Data from //helfs01.thl.fi/groups2/TUSO/Projects/POPit ja lihavuus/Dioksiinit vs sarkooma/Data.xlsx
+dat <- read.csv("V:/TUSO/Projects/Sarkooma/Analyysit/Data_2.12.2016.csv", encoding = "UTF-8")
+names(dat)
+kalat <- colnames(dat)[c(29, 27, 79:89)] # All: 27, 29, 79:89
+dioksiinit <- colnames(dat)[c(310:326, 365)] # All: 310:368)
+dat[1:20,kalat]
+dat[1:20,dioksiinit]
+colnames(dat)
+unique(unlist(lapply(dat[kalat], FUN = levels)))
+inn <- c(
+  "",
+  "En lainkaan",
+  "Harvemmin kuin kerran kuukaudessa tai en lainkaan",
+  "Harvemmin kuin kerran kuukaudessa",
+  "Kerran tai pari kuukaudessa",
+  "Kerran viikossa",
+  "Pari kertaa viikossa",
+  "Lähes joka päivä",
+  "Kerran päivässä tai useammin"
+)
+# Data are from V:\TUSO\Projects\POPit ja lihavuus\Excel-mallit\Concentration modeling.xlsx
+# except doses, which are rough rule-of-thumb estimates
+# Meals per week
+doses <- c(NA, 0, 0.1, 0.15, 0.3, 1, 2, 5, 9)
+# Congener half-life in years
+t1.2 <- c(7.2, 11.2, 9.8, 13.1, 5.1, 4.9, 6.7, 2.1, 3.5,
+.0, 6.4, 7.2, 7.2, 2.8, 3.1, 4.6, 1.4, 7)
+# WHO2005 TEF
+TEF <- c(1, 1, 0.1, 0.1, 0.1, 0.01, 0.0003, 0.1, 0.03,
+         0.3, 0.1, 0.1, 0.1, 0.1, 0.01, 0.01, 0.0003, 1)
+dat2 <- dat[c(kalat, dioksiinit, "Age")]
+# Convert fish intake answers to units meals/week
+dat2[kalat][-1] <- lapply(dat2[kalat][-1], FUN = function(x) doses[match(x, inn)])
+# Convert dioxin concentrations to TEQs
+dat2[dioksiinit] <- lapply(as.list(1:length(TEF)), FUN = function(x) TEF[x] * dat2[dioksiinit][[x]])
+cor(x = dat2[dioksiinit], y = dat2[kalat], use = "pairwise.complete.obs")
+dat3 <- dat2[!is.na(rowSums(dat2[c(kalat, dioksiinit)])) , ]
+dat3.diox <- resid(lm(cbind(
+  PCDDFWHO05TEQ,
+  X2378.TCDD,
+  X12378.PD,
+  X123478.HD,
+  X123678.HD,
+  X123789.HD,
+  X1234678.D,
+  OCDD,
+  X2378.TCDF,
+  X12378.PF,
+  X23478.PF,
+  X123478.HF,
+  X123678.HF,
+  X123789.HF,
+  X234678.HF,
+  X1234678.F,
+  X1234789.F,
+  OCDF
+) ~ Age, dat3))
+correlations <- round(cor(dat3.diox, dat3[kalat]), 3)
+pvalues <- round(rcorr(dat3.diox, as.matrix(dat3[kalat]))$P, 3)[1:18, -(1:18)]
+fit <- lm(
+  paste("cbind(", paste(dioksiinit, collapse = ","),
+        ") ~ ", paste(c(kalat, "Age"), collapse = " + "),
+        collapse = ""),
+  data = dat3
+)
+summary(fit)
+out <- data.frame()
+################# Explain each congener with all fish variables + Age
+for(i in 1:length(dioksiinit)) {
+  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat, collapse = " + "), "+ Age"), dat3)
+  fit <- summary(stepAIC(fit, direction="both"))
+  out <- rbind(
+    out,
+    data.frame(
+      fit[[4]],
+      Var = rownames(fit[[4]]),
+      adj.r.squared = fit[[9]],
+      Congener = dioksiinit[[i]],
+      Halflife = t1.2[[i]],
+      Test = "With age"
+    )
+  )
+}
+############# Age is removed from the models to see the explanatory power of fish variables alone
+for(i in 1:length(dioksiinit)) {
+  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat, collapse = " + ")), dat3)
+  fit <- summary(stepAIC(fit, direction="both"))
+  out <- rbind(
+    out,
+    data.frame(
+      fit[[4]],
+      Var = rownames(fit[[4]]),
+      adj.r.squared = fit[[9]],
+      Congener = dioksiinit[[i]],
+      Halflife = t1.2[[i]],
+      Test = "Without age"
+    )
+  )
+}
+################# Explain each congener with SINGLE fish variables + Age
+for(i in 1:length(dioksiinit)) {
+  fit <- lm(paste(dioksiinit[[i]], "~", paste(kalat[c(-1, -2)], collapse = " + "), "+ Age"), dat3)
+  fit <- summary(stepAIC(fit, direction="both"))
+  out <- rbind(
+    out,
+    data.frame(
+      fit[[4]],
+      Var = rownames(fit[[4]]),
+      adj.r.squared = fit[[9]],
+      Congener = dioksiinit[[i]],
+      Halflife = t1.2[[i]],
+      Test = "Without generic fish variables, with age"
+    )
+  )
+}
+oprint(out)
+write.csv(out, "V:/TUSO/Projects/Sarkooma/lineaariregressiot.csv")
+colnames(out)
+head(out)
+temp <- out[out$Var != "(Intercept)", ]
+ggplot(temp, aes(x = Var, y = Estimate, colour = Congener, size = Pr...t. < 0.05)) + geom_point()+
+  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))+
+  facet_wrap(~ Test)
+################# Bayesian approach
+kal <- kalat[-1]
+datb <- dat[c(kal)]#, dioksiinit, "Age")]
+#x <- datb[[1]]
+datb[kal] <- lapply(
+  datb[kal],
+  FUN = function(x) {
+    factor(
+      x,
+      levels = if(inn[3] %in% x) inn[c(3,5:9)] else inn[c(2,4:8)],
+      ordered = TRUE
+    )
+  }
+)
+test <- as.data.frame(lapply(datb, FUN = is.na))
+datb <- datb[rowSums(test) < 8 , ]
+datb <- as.data.frame(lapply(datb, FUN = function(x) as.numeric(x) -1))
+#table(datb[1:2])
+datblong <- melt(datb, measure.vars = 1:12)
+ggplot(datblong, aes(x = value, weight = 1))+geom_bar()+facet_wrap(~ variable)
+correlations <- cor(datb, use = "pairwise.complete.obs")
+melt(correlations, measure.vars = 1:12)
+pvalues <- round(rcorr(as.matrix(datb))$P, 3)
+mod <- textConnection("
+  model{
+  for(j in 1:12) {
+    for(i in 1:N) {
+      datbb[i , j] ~ dbin(p[j], 6) #Six alternatives in each question
+    }
+    p[j] ~ dunif(0,1)
+  }
+}
+")
+# A binomial distribution is assumed for bins of answer choices.
+jags <- jags.model(
+  mod,
+  data = list(
+    N = nrow(datb), # number of respondents (rows of datb)
+    datbb = datb
+  ),
+  n.chains = 4,
+  n.adapt = 100
+)
+update(jags, 1000)
+samps <- jags.samples(jags, 'p', 1000)
+samps.coda <- coda.samples(jags, 'p', 1000)
+plot(samps.coda[[1]])
+head(samps.coda)
+summary(samps.coda)
+hist(cor(samps.coda[[1]])[cor(samps.coda[[1]]) != 1])
+### In practice, all correlations between -0.05 and 0.05 -> meaningless
+# Important correlations can be (and have been above) calculated directly from data.
+probs <- colSums(samps.coda[[1]]) / nrow(samps.coda[[1]])
+gramm <- data.frame(
+  Fish = kal,
+  P = probs,
+  Answer = rep(0:6, each = length(kal)),
+  Freq = dbinom(rep(0:6, each = length(kal)), 6, rep(probs, 7))
+)
+ggplot(gramm, aes(x = Answer, weight = Freq))+geom_bar()+facet_wrap(~ Fish)
+</rcode>
+These estimates are based on the code above.
+<t2b name="Binomial distribution parameter" index="Fish" obs="Parameter" unit="probability">
+Kalaa|0.23179078
+Petokalaa|0.17746642
+Muikkua|0.14939457
+Sisävesikalaa|0.09493785
+Kirjolohta|0.29775961
+Silakkaa|0.2152247
+Itämeren.lohta|0.08175282
+Muuta.Itämerestä|0.0282421
+Pakastekalaa|0.21510166
+Kalasäilykkeitä|0.21598007
+Valtamerikalaa|0.1078198
+Äyriäisiä|0.16021416
+</t2b>
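As a rough illustration of what a parameter in the table means: under the binomial model in the code above, the answer to each question is treated as a draw from Binomial(6, p). The sketch below takes the tabulated value for Kalaa and computes the implied distribution over answer bins. It also shows that, because the prior for p is uniform, the posterior has a closed form (a Beta distribution), which may serve as a quick check on MCMC output. The answer vector <code>x</code> is made up for illustration and is not study data.

```r
# Parameter for Kalaa copied from the table above
p_kalaa <- 0.23179078

# Implied probability of each answer bin 0..6 under Binomial(6, p)
pred <- dbinom(0:6, size = 6, prob = p_kalaa)
names(pred) <- 0:6
print(round(pred, 3))

# The expected answer bin equals 6 * p
expected_bin <- sum(0:6 * pred)

# Closed-form check: with a uniform prior, i.e. Beta(1, 1), and binomial data x,
# the posterior of p is Beta(1 + sum(x), 1 + sum(6 - x)).
x <- c(0, 1, 1, 2, 3, 1, 0, 2)   # hypothetical answers, not study data
a <- 1 + sum(x)
b <- 1 + sum(6 - x)
post_mean <- a / (a + b)
post_ci <- qbeta(c(0.025, 0.975), a, b)
```

For a data set with many respondents, the posterior is very narrow, so the posterior mean is close to the simple ratio of the summed answers to six times the number of respondents.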
+'''Correlation coefficients between fish dishes'''
+{{hidden|
+<t2b name="Correlation coefficients between fishes" index="Food1,Food2" obs="Coefficient" unit="correlation">
+Kalaa|Kalaa|1
+Petokalaa|Kalaa|0.37831574
+Muikkua|Kalaa|0.286657251
+Sisävesikalaa|Kalaa|0.365112639
+Kirjolohta|Kalaa|0.530184225
+Silakkaa|Kalaa|0.425923276
+Itämeren.lohta|Kalaa|0.305152183
+Muuta.Itämerestä|Kalaa|0.2111844
+Pakastekalaa|Kalaa|0.207057983
+Kalasäilykkeitä|Kalaa|0.21677219
+Valtamerikalaa|Kalaa|0.273916098
+Äyriäisiä|Kalaa|0.143764271
+Kalaa|Petokalaa|0.37831574
+Petokalaa|Petokalaa|1
+Muikkua|Petokalaa|0.422402864
+Sisävesikalaa|Petokalaa|0.499609578
+Kirjolohta|Petokalaa|0.205863144
+Silakkaa|Petokalaa|0.202035751
+Itämeren.lohta|Petokalaa|0.079332036
+Muuta.Itämerestä|Petokalaa|0.10229554
+Pakastekalaa|Petokalaa|-0.055693699
+Kalasäilykkeitä|Petokalaa|0.049204262
+Valtamerikalaa|Petokalaa|0.080707577
+Äyriäisiä|Petokalaa|0.082444392
+Kalaa|Muikkua|0.286657251
+Petokalaa|Muikkua|0.422402864
+Muikkua|Muikkua|1
+Sisävesikalaa|Muikkua|0.394067217
+Kirjolohta|Muikkua|0.227258746
+Silakkaa|Muikkua|0.24510588
+Itämeren.lohta|Muikkua|0.104823821
+Muuta.Itämerestä|Muikkua|0.127881897
+Pakastekalaa|Muikkua|-0.006076891
+Kalasäilykkeitä|Muikkua|0.065331013
+Valtamerikalaa|Muikkua|0.157485773
+Äyriäisiä|Muikkua|0.072256144
+Kalaa|Sisävesikalaa|0.365112639
+Petokalaa|Sisävesikalaa|0.499609578
+Muikkua|Sisävesikalaa|0.394067217
+Sisävesikalaa|Sisävesikalaa|1
+Kirjolohta|Sisävesikalaa|0.231100033
+Silakkaa|Sisävesikalaa|0.219228101
+Itämeren.lohta|Sisävesikalaa|0.136672786
+Muuta.Itämerestä|Sisävesikalaa|0.105376927
+Pakastekalaa|Sisävesikalaa|-0.013742777
+Kalasäilykkeitä|Sisävesikalaa|0.106690375
+Valtamerikalaa|Sisävesikalaa|0.188327229
+Äyriäisiä|Sisävesikalaa|0.114591507
+Kalaa|Kirjolohta|0.530184225
+Petokalaa|Kirjolohta|0.205863144
+Muikkua|Kirjolohta|0.227258746
+Sisävesikalaa|Kirjolohta|0.231100033
+Kirjolohta|Kirjolohta|1
+Silakkaa|Kirjolohta|0.38967992
+Itämeren.lohta|Kirjolohta|0.282808883
+Muuta.Itämerestä|Kirjolohta|0.111980079
+Pakastekalaa|Kirjolohta|0.166699904
+Kalasäilykkeitä|Kirjolohta|0.267376587
+Valtamerikalaa|Kirjolohta|0.304243078
+Äyriäisiä|Kirjolohta|0.154695247
+Kalaa|Silakkaa|0.425923276
+Petokalaa|Silakkaa|0.202035751
+Muikkua|Silakkaa|0.24510588
+Sisävesikalaa|Silakkaa|0.219228101
+Kirjolohta|Silakkaa|0.38967992
+Silakkaa|Silakkaa|1
+Itämeren.lohta|Silakkaa|0.303378534
+Muuta.Itämerestä|Silakkaa|0.317929228
+Pakastekalaa|Silakkaa|0.106271923
+Kalasäilykkeitä|Silakkaa|0.202491349
+Valtamerikalaa|Silakkaa|0.27027233
+Äyriäisiä|Silakkaa|0.17446606
+Kalaa|Itämeren.lohta|0.305152183
+Petokalaa|Itämeren.lohta|0.079332036
+Muikkua|Itämeren.lohta|0.104823821
+Sisävesikalaa|Itämeren.lohta|0.136672786
+Kirjolohta|Itämeren.lohta|0.282808883
+Silakkaa|Itämeren.lohta|0.303378534
+Itämeren.lohta|Itämeren.lohta|1
+Muuta.Itämerestä|Itämeren.lohta|0.589928968
+Pakastekalaa|Itämeren.lohta|0.083123081
+Kalasäilykkeitä|Itämeren.lohta|0.167824298
+Valtamerikalaa|Itämeren.lohta|0.415932209
+Äyriäisiä|Itämeren.lohta|0.312466845
+Kalaa|Muuta.Itämerestä|0.2111844
+Petokalaa|Muuta.Itämerestä|0.10229554
+Muikkua|Muuta.Itämerestä|0.127881897
+Sisävesikalaa|Muuta.Itämerestä|0.105376927
+Kirjolohta|Muuta.Itämerestä|0.111980079
+Silakkaa|Muuta.Itämerestä|0.317929228
+Itämeren.lohta|Muuta.Itämerestä|0.589928968
+Muuta.Itämerestä|Muuta.Itämerestä|1
+Pakastekalaa|Muuta.Itämerestä|0.061080508
+Kalasäilykkeitä|Muuta.Itämerestä|0.164729369
+Valtamerikalaa|Muuta.Itämerestä|0.381394106
+Äyriäisiä|Muuta.Itämerestä|0.295389062
+Kalaa|Pakastekalaa|0.207057983
+Petokalaa|Pakastekalaa|-0.055693699
+Muikkua|Pakastekalaa|-0.006076891
+Sisävesikalaa|Pakastekalaa|-0.013742777
+Kirjolohta|Pakastekalaa|0.166699904
+Silakkaa|Pakastekalaa|0.106271923
+Itämeren.lohta|Pakastekalaa|0.083123081
+Muuta.Itämerestä|Pakastekalaa|0.061080508
+Pakastekalaa|Pakastekalaa|1
+Kalasäilykkeitä|Pakastekalaa|0.345409216
+Valtamerikalaa|Pakastekalaa|0.129614914
+Äyriäisiä|Pakastekalaa|0.125383855
+Kalaa|Kalasäilykkeitä|0.21677219
+Petokalaa|Kalasäilykkeitä|0.049204262
+Muikkua|Kalasäilykkeitä|0.065331013
+Sisävesikalaa|Kalasäilykkeitä|0.106690375
+Kirjolohta|Kalasäilykkeitä|0.267376587
+Silakkaa|Kalasäilykkeitä|0.202491349
+Itämeren.lohta|Kalasäilykkeitä|0.167824298
+Muuta.Itämerestä|Kalasäilykkeitä|0.164729369
+Pakastekalaa|Kalasäilykkeitä|0.345409216
+Kalasäilykkeitä|Kalasäilykkeitä|1
+Valtamerikalaa|Kalasäilykkeitä|0.212919187
+Äyriäisiä|Kalasäilykkeitä|0.296174683
+Kalaa|Valtamerikalaa|0.273916098
+Petokalaa|Valtamerikalaa|0.080707577
+Muikkua|Valtamerikalaa|0.157485773
+Sisävesikalaa|Valtamerikalaa|0.188327229
+Kirjolohta|Valtamerikalaa|0.304243078
+Silakkaa|Valtamerikalaa|0.27027233
+Itämeren.lohta|Valtamerikalaa|0.415932209
+Muuta.Itämerestä|Valtamerikalaa|0.381394106
+Pakastekalaa|Valtamerikalaa|0.129614914
+Kalasäilykkeitä|Valtamerikalaa|0.212919187
+Valtamerikalaa|Valtamerikalaa|1
+Äyriäisiä|Valtamerikalaa|0.272081755
+Kalaa|Äyriäisiä|0.143764271
+Petokalaa|Äyriäisiä|0.082444392
+Muikkua|Äyriäisiä|0.072256144
+Sisävesikalaa|Äyriäisiä|0.114591507
+Kirjolohta|Äyriäisiä|0.154695247
+Silakkaa|Äyriäisiä|0.17446606
+Itämeren.lohta|Äyriäisiä|0.312466845
+Muuta.Itämerestä|Äyriäisiä|0.295389062
+Pakastekalaa|Äyriäisiä|0.125383855
+Kalasäilykkeitä|Äyriäisiä|0.296174683
+Valtamerikalaa|Äyriäisiä|0.272081755
+Äyriäisiä|Äyriäisiä|1
+</t2b>
+}}
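The long-format table above can be turned back into a square correlation matrix, for example if correlated fish intake variables are needed in a Monte Carlo model. The sketch below does this for three of the fish variables, using coefficients copied from the table. The object names and the use of multivariate normal draws via <code>MASS::mvrnorm</code> are illustrative assumptions, not a method used on this page.

```r
library(MASS)  # mvrnorm()

# A few rows copied from the table above (Food1, Food2, Coefficient)
foods <- c("Kalaa", "Petokalaa", "Muikkua")
long <- data.frame(
  Food1 = rep(foods, times = 3),
  Food2 = rep(foods, each = 3),
  Coefficient = c(1, 0.37831574, 0.286657251,
                  0.37831574, 1, 0.422402864,
                  0.286657251, 0.422402864, 1)
)

# Fill a square matrix by indexing with (row name, column name) pairs
cmat <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(foods, foods))
cmat[cbind(as.character(long$Food1), as.character(long$Food2))] <- long$Coefficient
stopifnot(isSymmetric(cmat))

# Draw correlated standard normal variables; empirical = TRUE forces the
# sample correlation to match cmat exactly
set.seed(1)
draws <- mvrnorm(n = 1000, mu = rep(0, 3), Sigma = cmat, empirical = TRUE)
print(round(cor(draws), 3))
```

Because the questionnaire answers are ordinal, such normal draws would still have to be mapped to answer bins before they resemble the data; that step is left out here.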
+==== EU kalat ====
+* The code that used to be here was moved to [[EU-kalat#Calculations]].
+* What updates should be done:
+** Plot iterations to see that the model results do not drift.
+** Take modelled parameters and develop a MC model to produce predicted concentrations.
+*** TCDD concentration should be added to the hierarchical Bayes model for this?
+** [[KTL Sarcoma study]], [[EU-kalat]] and [[Goherr: Fish consumption study]] should all be combined into one model. {{comment|# |Can models be combined as text with paste()? This could work if all submodels had unique parameter names like N.eu and N.goh rather than just N. And data lists are merged simply with c().|--[[User:Jouni|Jouni]] ([[User talk:Jouni|talk]]) 16:22, 22 January 2017 (UTC)}}
+** A causal diagram should be drawn to show the model structure.
+* JAGS user manual [http://www.stats.ox.ac.uk/~nicholls/MScMCMC15/jags_user_manual.pdf] (with e.g. distribution names and other guidance)
+* How to generate predictions in JAGS [http://stats.stackexchange.com/questions/29932/how-to-generate-predictions-with-rjags]
+* Using rjags, a simple guide [http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-package/]
+Related:
+* Easily generate correlated variables from any distribution (without copulas) [https://www.r-bloggers.com/easily-generate-correlated-variables-from-any-distribution-without-copulas/]
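On the question raised in the list above about combining models as text with paste(): the following is a minimal sketch of the idea, assuming that each submodel is stored as the body of a JAGS model block and already uses unique names. The names <code>N.eu</code>, <code>N.goh</code>, <code>y.eu</code>, <code>y.goh</code>, <code>mu.eu</code> and <code>mu.goh</code> and the toy submodels are hypothetical. Only the string and list handling is shown; the combined model would still need to be passed to jags.model() and checked.

```r
# Bodies of two hypothetical submodels, without the surrounding model{ }
sub_eu <- "
  for(i in 1:N.eu) { y.eu[i] ~ dnorm(mu.eu, 1) }
  mu.eu ~ dnorm(0, 0.001)
"
sub_goh <- "
  for(i in 1:N.goh) { y.goh[i] ~ dnorm(mu.goh, 1) }
  mu.goh ~ dnorm(0, 0.001)
"

# Combine the bodies into one model definition
model_text <- paste("model{", sub_eu, sub_goh, "}", sep = "\n")

# Merge the data lists; names must not clash between submodels
data_eu  <- list(N.eu = 3, y.eu = c(0.1, 0.4, -0.2))
data_goh <- list(N.goh = 2, y.goh = c(1.2, 0.8))
data_all <- c(data_eu, data_goh)
stopifnot(!anyDuplicated(names(data_all)))

cat(model_text)
# With rjags installed, the combined model could then be set up as:
# jags <- rjags::jags.model(textConnection(model_text), data = data_all)
```

The check on duplicated names is the important part: if two submodels both used a plain <code>N</code>, c() would keep both entries under the same name and the merged data would be ambiguous.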
+==== Concentration-age graph with THL formatting ====
+<rcode label="Run on own computer">
+# This is code Op_en2721/ on page [[KTL Sarcoma study]]
+library(ggplot2)
+#library(thlGraphs)
+thlPointPlot <- function (data, xvar, yvar, groupvar = NULL, ylabel = yvar,
+                          xlabel = NULL, colors = thlColors(n = 12, type = "quali", name = "line"),
+                          title = NULL, subtitle = NULL, caption = NULL,
+                          legend.position = "none", base.size = 16, linewidth = 3,
+                          show.grid.x = FALSE, show.grid.y = TRUE, lang = "fi", ylimits = NULL,
+                          marked.treshold = 10, plot.missing = FALSE, xaxis.breaks = waiver(),
+                          yaxis.breaks = waiver(), panels = FALSE, nrow.panels = 1,
+                          labels.end = FALSE)
+{
+  lwd <- thlPtsConvert(linewidth)
+  gg <- ggplot(
+    data,
+    aes_(x = substitute(xvar),
+         y = substitute(yvar),
+         group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
+         colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), ""))
+  ) # + geom_line(size = lwd) #!!!!!!!!!!!!!!!!!!!!
+  if (isTRUE(plot.missing)) {
+    df <- thlNaLines(
+      data = data, xvar = deparse(substitute(xvar)),
+      yvar = deparse(substitute(yvar)),
+      groupvar = unlist(ifelse(deparse(substitute(groupvar)) != "NULL", deparse(substitute(groupvar)), list(NULL)))
+    )
+    if (!is.null(df) & FALSE) { ##!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+      gg <- gg + geom_line(
+        data = df, aes_(
+          x = substitute(xvar),
+          y = substitute(yvar),
+          group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
+          colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), "")
+        ),
+        linetype = 2,
+        size = lwd
+      )
+    }
+  }
+  if (!is.null(marked.treshold)) {
+    if (length(unique(data[, deparse(substitute(xvar))])) > marked.treshold) {
+      if (is.factor(data[, deparse(substitute(xvar))]) ||
+          is.character(data[, deparse(substitute(xvar))]) ||
+          is.logical(data[, deparse(substitute(xvar))])) {
+        levs <- levels(factor(data[, deparse(substitute(xvar))]))
+        min <- levs[1]
+        max <- levs[length(levs)]
+      } else {
+        min <- min(data[, deparse(substitute(xvar))])
+        max <- max(data[, deparse(substitute(xvar))])
+      }
+      subdata <- data[c(data[, deparse(substitute(xvar))] %in% c(min, max)), ]
+      gg <- gg + geom_point(
+        data = subdata,
+        aes_(
+          x = substitute(xvar),
+          y = substitute(yvar),
+          group = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), NA),
+          colour = ifelse(!is.null(substitute(groupvar)), substitute(groupvar), "")
+        ), stroke = 1.35 * lwd, fill = "white", shape = 21, size = 10/3 * lwd
+      )
+    } else {
+      gg <- gg + geom_point(stroke = 1.35 * lwd, fill = "white", size = 10/3 * lwd, shape = 21)
+    }
+  }
+  if (isTRUE(labels.end)) {
+    if (is.factor(data[, deparse(substitute(xvar))]) ||
+        is.character(data[, deparse(substitute(xvar))]) ||
+        is.logical(data[, deparse(substitute(xvar))])) {
+      levs <- levels(factor(data[, deparse(substitute(xvar))]))
+      maxd <- data[data[, deparse(substitute(xvar))] == levs[length(levs)], ]
+    } else {
+      maxd <- data[data[, deparse(substitute(xvar))] == max(data[, deparse(substitute(xvar))]), ]
+    }
+    brks <- maxd[, deparse(substitute(yvar))]
+    labsut <- maxd[, deparse(substitute(groupvar))]
+  } else (brks <- labsut <- waiver())
+  gg <- gg + ylab(ifelse(deparse(substitute(ylabel)) == "yvar", deparse(substitute(yvar)), ylabel)) +
+    labs(title = title, subtitle = subtitle, caption = caption) +
+    thlTheme(
+      show.grid.y = show.grid.y,
+      show.grid.x = show.grid.x,
+      base.size = base.size,
+      legend.position = legend.position,
+      x.axis.title = ifelse(!is.null(xlabel), TRUE, FALSE)
+    ) +
+    xlab(ifelse(!is.null(xlabel), xlabel, "")) +
+    scale_color_manual(values = colors) +
+    thlYaxisControl(
+      lang = lang,
+      limits = ylimits,
+      breaks = yaxis.breaks,
+      sec.axis = labels.end,
+      sec.axis.breaks = brks,
+      sec.axis.labels = labsut
+    )
+  if (is.factor(data[, deparse(substitute(xvar))]) ||
+      is.character(data[, deparse(substitute(xvar))]) ||
+      is.logical(data[, deparse(substitute(xvar))])) {
+    gg <- gg + scale_x_discrete(breaks = xaxis.breaks, expand = expand_scale(mult = c(0.05)))
+  } else (gg <- gg + scale_x_continuous(breaks = xaxis.breaks))
+  if (isTRUE(panels)) {
+    fmla <- as.formula(paste0("~", substitute(groupvar)))
+    gg <- gg + facet_wrap(fmla, scales = "free", nrow = nrow.panels)
+  }
+  gg
+}
+# Column headers of the CSV file (in Finnish):
+# Nro;Alue;SP;Alue;Ikä (a);TEQ;TapVer;Tequart;Ikäluokka;Altistus;Valittu tapaus;Stratum2;Valittuja verrokkeja;Tapauksen ikä;;;Löytynyt tapaus;Löytyneiden määrä;Hakuprosessi
+# Search process (Hakuprosessi): 1) Make sure that the column "Valittu tapaus" (selected case) is empty.
+# 2) Set the number of selected to 0 and the age criterion to the strictest one used.
+# 3) Calculate. Filter the counts equal to 1 and write the identifier of the found case into the column "Valittu tapaus".
+# 4) Calculate. Filter the found counts 2, 3, etc. and select a case for the right control.
+# 5) Loosen the age criterion if necessary and repeat 3)-4).
+# 6) Set the number of selected to 1, 2, 3, etc. and repeat 2)-5).
+# Z = helfs01.thl.fi/documents/
+sarc <- read.csv("Z:/YMAL_arc/CEHRA_Archived2018/Tutkimus/_until2004/R16_sarkooma/Analyysit/Analyysi/Lopulliset4.csv",
+                 skip=2, sep=";", dec=",", header=FALSE)
+sar <- sarc[c(1:3, 5:460),c(2,3,5,6,7)]
+colnames(sar) <- c("Region","Gender","Age","TEQ","Case")
+sar$Gender <- factor(sar$Gender, labels=c("Male","Female"))
+sar$Case <- factor(sar$Case, labels=c("Case","Control"))
+ggplot(sar, aes(x=Age, y=TEQ, colour=Gender))+geom_point()
+thlPointPlot(sar, xvar=Age, yvar=TEQ, groupvar=Gender, marked.treshold = 1000,
+             legend.position = "bottom",
+             xlabel="Age", ylabel="", base.size=30,
+             title="Dioxin concentration by age",
+             subtitle="(pg/g TEQ in fat)")+
+  geom_vline(xintercept=0, size=1.5)
+ggsave("Dioxin concentration.png", width=11, height=8)
+</rcode>
 ==See also==
 * [http://clinicaltrials.gov/ct2/show/study/NCT00611078 Clinicaltrials: Another clinical study on sarcoma and pollutants (incl. dioxin)]
+== Related files ==
+* {{#l:KTL_sarcoma_questionnaire_finnish.odt}}
+* {{#l:KTL_sarcoma_questionnaire_swedish.odt}}
 ==References==
-<references/><!-- __OBI_TS:1333052480 -->
+<references/>