Methods for data integration
- The text on this page is taken from an equivalent page of the IEHIAS-project.
Scope
Purpose
The purpose of linking data from multiple monitoring sources is to see whether any added value can be identified. In INTARESE, the aim of linking environment and health data is to better understand the causes and severity of the impacts that different environmental hazards and exposures have on human health.
No single best method for this exists yet. The choice of method for linking data from multiple sources depends on the research goal.
Even in a world of rapid information access, synthesizing vital scientific knowledge and evidence about environmental health (EH) problems and solutions into understandable concepts remains a formidable challenge. However, a range of scientific tools exists to facilitate such synthesis, including:
- Quantifying the environmental health impacts
- Environmental health mapping
- Environmental health indicators
Boundaries
No perfect method exists for linking data from multiple sources; each method has its own limitations. One example is GIS, which has come to be used widely for linking health and environmental data from different sources. The limitations of using GIS are (Malkawi, [1]):
- A map is primarily a means of display; it cannot predict the patterns of distribution or relationships between resources. The map does not infer a causal relationship, it merely points out that there are some spatial coincidences that are worth exploring, to see if a causal relationship exists. Likewise, to show how changes in one resource may impact distribution of another resource, the relationship must be known and put into the model creating the map.
- Another limitation of spatially-referenced environmental information is that access is often limited. For example, data may be available for only a portion of the required area or for the whole area but taken from two or more different sampling exercises which may have used different sampling methodologies, scales, or accuracy levels.
- Connected to this are costs involved in generating maps, and in printing, disseminating, and updating them. This requires specialized hardware and software, trained personnel, and often expensive and time-consuming means of acquiring, checking, interpreting, and inputting information.
- Furthermore, the technology is rapidly advancing, and thus new applications and training courses are required on an almost annual basis.
- Finally, not all people can readily relate to information in a two-dimensional spatial format, especially if the map is of an unfamiliar area or is presented in an unusual projection. Furthermore, different cultures place different importance or meaning on symbols and colors. For example, western cultures may use the color red to symbolize danger or an area where conditions are bad, but in China this color would symbolize luck or a favorable area.
Method description
Input
The most commonly mapped environmental information of relevance to the health sector includes:
- pollution sources and affected areas (including sewage, solid waste, hazardous waste, industrial pollution, smoke and other emissions, and radiation);
- land cover and use (including vegetation type, vegetation change and condition, agriculture, forestry, and soil type and condition);
- water availability and quality;
- energy sources and use (including fossil fuel use, electrical connectivity, biomass use, and renewable energy sources); and
- biological resources (including protected areas and recreational sites, endangered species, and medicinal resources).
Output
In general, by linking environmental health data, we can identify the health burden of environmental hazards and ecosystem degradation and, furthermore, the cost of damage to health and quality of life due to environmental degradation. Examples include the environmental burden of different diseases, deaths from different diseases, and deaths from different pollutants.
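One way such a health burden might be quantified is the population attributable fraction (Levin's formula), a standard epidemiological measure that this page does not itself prescribe. The sketch below assumes a single exposure with a known relative risk; the function names and all input figures are hypothetical.

```python
def attributable_fraction(exposed_share: float, relative_risk: float) -> float:
    """Population attributable fraction, PAF = p(RR - 1) / (1 + p(RR - 1)).

    exposed_share (p): fraction of the population exposed, between 0 and 1.
    relative_risk (RR): risk in the exposed divided by risk in the unexposed.
    """
    if not 0.0 <= exposed_share <= 1.0:
        raise ValueError("exposed_share must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposed_share * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def attributable_cases(total_cases: float, exposed_share: float,
                       relative_risk: float) -> float:
    """Number of observed cases attributed to the exposure."""
    return total_cases * attributable_fraction(exposed_share, relative_risk)


if __name__ == "__main__":
    # Hypothetical inputs: half the population exposed, relative risk of 2,
    # and 300 observed cases of the disease.
    print(attributable_fraction(0.5, 2.0))
    print(attributable_cases(300, 0.5, 2.0))
```

The relative risk would normally come from the literature and the exposed share from monitoring data, which is where the linkage of environment and health data enters.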
Rationale
Here, we choose environmental health mapping as an example. The reasons are:
- One inherent characteristic of both environmental and health data is that they have a location component. This characteristic makes Geographic Information Systems (GIS) an ideal and sometimes indispensable tool for analyzing environmental health data.
- Environmental health (EH) is multidisciplinary, and GIS can handle multiple layers of data.
According to Malkawi, GIS mapping techniques can be used in two main ways to show the links between environment and health:
- Simple overlays (comparisons) of environmental and socioeconomic (health) data can be used to identify patterns, which can then be investigated later for correlations.
- Once the causal relationship is known, spatial models can be developed to predict changes in health based on environmental changes.
Methods for data integration
Methods for linking exposure and dose data
In general, two types of models can be used to link exposure and dose data. First, Physiologically Based Pharmacokinetic (PBPK) models are powerful computational tools that can be used to link exposure to the internal concentrations of parent compounds and/or active metabolites at the target site(s) of toxicity (http://cfpub.epa.gov).
Second, Biologically Based Pharmacokinetic (BBPK) models are being increasingly used in the risk assessment of environmental chemicals. These models are based on biological, mathematical, statistical and engineering principles. Their potential uses in risk assessment include extrapolation between individuals, species, doses and routes of exposure (http://cfpub.epa.gov).
In addition, other tools for hazard identification and exposure assessment can also be used to link exposure and dose data.
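The basic idea of relating an external exposure to an internal concentration can be illustrated with a one-compartment kinetic model. This is a deliberately simplified sketch and not a PBPK model: a real PBPK model describes several tissue compartments connected by blood flow. The function name, parameters and numbers below are hypothetical, and a constant intake with first-order elimination is assumed.

```python
import math


def internal_concentration(intake_rate: float, volume: float,
                           elimination_rate: float, time: float) -> float:
    """Concentration in a single well-mixed compartment at a given time.

    Solves dC/dt = intake_rate / volume - elimination_rate * C with C(0) = 0,
    which gives C(t) = C_ss * (1 - exp(-elimination_rate * t)), where
    C_ss = intake_rate / (elimination_rate * volume) is the steady state.
    Units are up to the caller but must be consistent
    (e.g. mg/day, litres, 1/day, days).
    """
    if volume <= 0.0 or elimination_rate <= 0.0:
        raise ValueError("volume and elimination_rate must be positive")
    steady_state = intake_rate / (elimination_rate * volume)
    return steady_state * (1.0 - math.exp(-elimination_rate * time))


if __name__ == "__main__":
    # Hypothetical values: 10 mg/day intake, 5 L distribution volume,
    # elimination rate constant of 0.5 per day.
    for day in (0, 1, 2, 5, 10):
        print(day, round(internal_concentration(10.0, 5.0, 0.5, day), 3))
```

With these inputs the concentration rises towards a steady state of intake_rate / (elimination_rate * volume), which shows how a measured exposure translates into an internal dose over time.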
Methods for linking dose and health effect data
There are many tools for linking dose and health effect, e.g. tools for spatial statistics, time-activity patterns, EPHT (Environmental Public Health Tracking, http://www.cdc.gov), dose-response assessment (DistGEN, GEN.T, http://www.foodrisk.org/resource_types/tools/dose_response.cfm) and risk characterization.
DCAL (Dose and Risk Calculation software) is a comprehensive software system for calculating tissue dose and the subsequent health risk from intakes of certain pollutants or from exposure to specific pollutants present in environmental media (http://www.wise-uranium.org/rdr.html).
Methods for linking exposure, dose and health effect data
Geographical information systems (GIS)
Geographical information systems (GIS) are “automated systems for the capture, storage, retrieval, analysis, and display of spatially referenced data” (Clarke et al., 1996; Higgs and Gould, 2001). GIS can relate otherwise disparate issues on the basis of common geography, revealing hidden patterns, relationships, and trends that are not readily apparent in spreadsheets or statistical packages, often creating new information from existing data resources. This means that, in the environment and health (E & H) fields, GIS is a useful instrument for linking indicators from environmental monitoring, biomonitoring and health monitoring through visual presentation. The indicators can be represented as several layers, where each layer holds data about a particular kind of feature. Each feature is linked to a position on the map and to a record in an attribute table. Apart from, for example, simply plotting environmental monitoring data or morbidity/mortality information on a map, GIS also offers important opportunities for interpolation or extrapolation of data, for a geographical representation of monitoring or modeling data, and for the visualization of overlaps between different layers of information (Smolders et al., 2008).
GIS mapping techniques can be used in two main ways to show the links between environment and health: (i) simple overlays (comparisons) of environmental monitoring, biomonitoring and socioeconomic (health) data can be used to identify patterns, which can then be investigated later for correlations; and (ii) once the causal relationship between environment and health is known, spatial models can be developed to predict changes in health based on environmental changes.
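The first use, a simple overlay followed by a check for correlation, can be sketched without any GIS software by joining two layers on a shared area identifier. This is a simplified stand-in for a true spatial overlay, which would work on geometries rather than on identifiers. The layer contents, area codes and function names are hypothetical.

```python
from math import sqrt


def overlay(env_layer, health_layer):
    """Join two attribute layers on the area identifiers they share.

    Each layer is a dict mapping an area identifier to one value.
    Areas missing from either layer are dropped.
    """
    shared = sorted(env_layer.keys() & health_layer.keys())
    return [(area, env_layer[area], health_layer[area]) for area in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical layers: pollutant concentration and disease rate per area.
    pollution = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 5.0}
    disease_rate = {"A": 1.2, "B": 1.9, "C": 3.1, "E": 0.7}
    joined = overlay(pollution, disease_rate)
    print(joined)
    print(pearson([row[1] for row in joined], [row[2] for row in joined]))
```

A high correlation in such an overlay only flags a spatial coincidence worth investigating; it does not establish a causal relationship.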
GIS applications will be the cornerstone of an integrated monitoring system. Their spatial techniques are likely to be the best option for providing effective linkage and integration across exposure, dose and response (Smolders et al., 2008). The use of GIS techniques for integrating data from different monitoring programs will be considered and developed further in the case studies of the next step.
Multiple Lines and Levels of Evidence (MLLE)
Multiple Lines and Levels of Evidence (MLLE) was originally developed for epidemiological studies in which it was difficult to assign causality. It was first proposed by Hill (1965) in the medical field and has since been used in human and ecological risk assessments (Culp et al., 2000; Fairbrother, 2003). It is now being adapted for natural resource management (NRM) (Adams, 2003; Young et al., 2006). At present, the MLLE method is broadly used in research to explore cause-and-effect relationships (Norris et al., 2005).
Bayesian Belief Networks (BBN)
A Bayesian Belief Network (BBN) is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases (Bayesian network in Wikipedia).
BBNs provide a rational method for integrating the best possible data from a variety of sources (Wooldridge and Done, 2003). A BBN can also incorporate prior knowledge in order to model a complex system more accurately, which may be difficult with other techniques (Pollino et al., 2005).
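The disease and symptom example corresponds to the smallest possible network: two nodes joined by one arc. The sketch below computes the posterior probability of disease given an observed symptom with Bayes' theorem. The function name and all probabilities are hypothetical, and a realistic BBN would involve many more nodes and usually a dedicated library.

```python
def posterior_disease(prior: float, p_symptom_given_disease: float,
                      p_symptom_given_healthy: float) -> float:
    """P(disease | symptom) for a two-node network disease -> symptom.

    prior: P(disease), the prior knowledge placed on the parent node.
    p_symptom_given_disease: P(symptom | disease).
    p_symptom_given_healthy: P(symptom | no disease).
    """
    for p in (prior, p_symptom_given_disease, p_symptom_given_healthy):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    joint_disease = p_symptom_given_disease * prior
    joint_healthy = p_symptom_given_healthy * (1.0 - prior)
    evidence = joint_disease + joint_healthy  # P(symptom)
    if evidence == 0.0:
        raise ValueError("the symptom has zero probability under this model")
    return joint_disease / evidence


if __name__ == "__main__":
    # Hypothetical numbers: 1% prevalence, symptom seen in 90% of cases
    # and in 5% of people without the disease.
    print(posterior_disease(0.01, 0.90, 0.05))
```

The prior argument is where existing knowledge enters the calculation, which is the property that makes BBNs attractive for combining data from several sources.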
Advanced statistical models
Many statistical models are used in different monitoring programs. Because indicators must be combined in integrated monitoring programs, the use of multivariate statistical models (e.g. models that connect information from different sources) needs to be considered and developed.
Techniques for assessing uncertainty
There are many sources of uncertainty in integrated monitoring in the E & H fields, e.g. inaccuracies in observations or insufficient numbers of observations, missing components or errors in the data, and random sampling error and biases (non-representativeness) in a sample. All types of uncertainty need to be handled with adequate analytical techniques. A more systematic and structured approach to uncertainty analysis is recommended.
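One widely used analytical technique, which this page does not specifically name, is Monte Carlo simulation: uncertain inputs are sampled repeatedly and the spread of the result is summarized. The sketch below propagates measurement uncertainty through a simple dose equation. The equation, the normal distributions and all figures are assumptions made for illustration.

```python
import random


def simulate_daily_dose(conc_mean, conc_sd, intake_mean, intake_sd,
                        body_weight_kg, n_draws=10_000, seed=42):
    """Monte Carlo estimate of dose = concentration * intake / body weight.

    Concentration and intake are drawn from normal distributions and
    truncated at zero, because negative values are not physically meaningful.
    Returns (mean, 2.5th percentile, 97.5th percentile) of the sampled doses.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    doses = []
    for _ in range(n_draws):
        conc = max(0.0, rng.gauss(conc_mean, conc_sd))
        intake = max(0.0, rng.gauss(intake_mean, intake_sd))
        doses.append(conc * intake / body_weight_kg)
    doses.sort()
    mean = sum(doses) / n_draws
    lower = doses[int(0.025 * (n_draws - 1))]
    upper = doses[int(0.975 * (n_draws - 1))]
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical inputs: 2.0 +/- 0.4 mg/L in drinking water,
    # 1.5 +/- 0.3 L/day consumed, 60 kg body weight.
    print(simulate_daily_dose(2.0, 0.4, 1.5, 0.3, 60.0))
```

Reporting the interval alongside the mean makes the uncertainty in the linked data explicit instead of hiding it behind a single estimate.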
Techniques for quality assurance and quality control
Quality assurance (QA), quality control (QC) and standard operating procedures (SOP) are separate components of an integrated monitoring program that work together to provide data of known quality. Together they minimize and quantify the errors introduced in sampling and allow errors that might occur to be tracked. One of the most important aspects of quality assurance in a monitoring program is the development of a quality assurance plan, which should clearly identify the quality of the data needed and describe in detail the actions planned to provide confidence that the program will meet its stated objectives (Shampine, 1993). The plan should be developed with all stakeholders and for each objective. Quality control data, which allow the quality and suitability of the environmental and health data to be evaluated and verified, should be collected and used as an integral part of the QA effort associated with a monitoring program (Shampine, 1993). QA/QC should address data quality and data type; quality should be consistent and comparable, and the data should be available and accessible.
References
- Adams, S.M. 2003. Establishing causality between environmental stressors and effects on aquatic ecosystems. Human and Ecological Risk Assessment. 9(1): 17–35.
- Clarke, K. C., McLafferty, S. L., Tempalski, B. J. 1996. On epidemiology and geographic information systems: A review and discussion of future directions. Emerg. Infect. Dis. 3:85–92.
- Culp, J.M., Lowell, R.B., Cash, K.J. 2000. Integrating mesocosm experiments with field and laboratory studies to generate weight-of-evidence risk assessments for large rivers. Environmental Toxicology and Chemistry. 19(4): 1167–1173.
- Fairbrother, A. 2003. Lines of evidence in wildlife risk assessments. Human and Ecological Risk Assessment. 9(6): 1475–1491.
- Higgs, G., Gould, M. 2001. Is there a role for GIS in the ‘new NHS’? Health Place. 7:247–259.
- Hill, AB. 1965. The environment and disease: Association or causation. Proceedings of the Royal Society of Medicine, vol. 58, pp. 295–300.
- Norris, R., Liston, P., Mugodo, J., Nichols, S., Quinn, G., Cottingham, P., Metzeling, L., Perriss, S., Robinson, D., Tiller, D., Wilson, G. 2005. Multiple Lines and Levels of Evidence for detecting ecological responses to management intervention. In I.D. Rutherfurd, I. Wiszniewski, M.J. Askey-Doran and R. Glazik (Eds), Proceedings of the 4th Australian Stream Management Conference: linking rivers to landscapes, (pp. 456-463). Department of Primary Industries, Water and Environment, Hobart, Tasmania.
- Pollino, C.A., Woodberry, O., Nicholson, A.E., Korb, K.B. 2005. Parameterising Bayesian networks: a case study in ecological risk assessment. Proceedings of the 2005 International Conference on Simulation and Modeling, Bangkok, Thailand, January 2005.
- Shampine, W. J., 1993. Quality assurance and quality control in monitoring programs: in Improving natural resource management through monitoring, workshop, Stafford, S., ed., Environmental Monitoring and Assessment. 26 (2-3):143-151.
- Smolders, R., Casteleyn, L., Joas, R., Schoeters, G. 2008. Human biomonitoring and the INSPIRE directive: spatial data as links for environment and health research. Journal of Toxicology and Environmental Health, Part B. 11(8): 646–659.
- Wooldridge, S., Done, T. 2003. The use of Bayesian belief networks to aid in the understanding and management of large-scale coral bleaching. MODSIM 2003 International Conference on Modeling and Simulation, Townsville, July 2003.
- Young, B., Nichols, S., Norris, R. 2006. Application of multiple lines and levels of evidence (MLLE) for addressing ecological questions of causality. Australian Society for Limnology 45th Annual Conference, 25–29 September, Albury, NSW.
See also
- HELI (WHO): Maps and spatial information technologies (Geographical Information Systems) in health and environment decision-making
- Wikipedia: Bayesian network
- Foodrisk.org