Development needs for integrated monitoring
- The text on this page is taken from an equivalent page of the IEHIAS-project.
Detailed information on this material is given in 132-Development needs for integrated monitoring, which is available on the INTARESE webpage.
What are the challenges in integrating data from multiple sources?
Challenges related to data issues in general
Data may be missing, noisy, or inconsistent:
- Missing data
  - Data are not always available
  - Missing data may be due to
    - equipment malfunction
    - values deleted because they were inconsistent with other recorded data
    - data not entered due to misunderstanding
    - data not considered important at the time of entry
    - history or changes of the data not being registered
- Noisy data
  - Q: What is noise?
  - A: Random error in a measured variable
  - Incorrect attribute values may be due to
    - faulty data collection instruments
    - data entry problems
    - data transmission problems
    - technology limitations
    - inconsistency in naming conventions
- Inconsistent data
  - When you examine a data plot, you might find that some points differ dramatically from the rest of the data (e.g. inappropriate values such as males recorded as pregnant, or a negative age). In some cases, it is reasonable to consider such points outliers, that is, data values that do not appear to be consistent with the rest of the data.
  - Inconsistent data may be due to
    - data sampling problems
    - equipment malfunction
    - data entry problems
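The missing-value and consistency checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the INTARESE material: the record layout, the field names (`sex`, `age`, `pregnant`, `no2`) and all values are hypothetical, and the two consistency rules simply encode the examples given in the text (pregnant males, negative age).

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "sex": "F", "age": 34, "pregnant": True,  "no2": 21.0},
    {"id": 2, "sex": "M", "age": 41, "pregnant": True,  "no2": 19.5},  # inconsistent
    {"id": 3, "sex": "F", "age": -5, "pregnant": False, "no2": 22.3},  # inconsistent
    {"id": 4, "sex": "M", "age": 58, "pregnant": False, "no2": None},  # missing
    {"id": 5, "sex": "F", "age": 29, "pregnant": False, "no2": 20.1},
]


def check_record(rec):
    """Return a list of problem labels for one record (empty if none)."""
    problems = []
    # Missing data: any field whose value was never recorded.
    for field, value in rec.items():
        if value is None:
            problems.append(f"missing:{field}")
    # Inconsistent data: values that contradict basic domain rules.
    age = rec.get("age")
    if age is not None and age < 0:
        problems.append("inconsistent:negative age")
    if rec.get("sex") == "M" and rec.get("pregnant"):
        problems.append("inconsistent:pregnant male")
    return problems


report = {rec["id"]: check_record(rec) for rec in records}
flagged = {rec_id: probs for rec_id, probs in report.items() if probs}

for rec_id, probs in flagged.items():
    print(rec_id, probs)
```

Flagging records rather than deleting them keeps the decision about how to treat each problem (correction, imputation, exclusion) separate from its detection.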
Challenges related to data issues in the environment and health fields
Before examining statistical methods for linking various types of data, it is necessary to investigate the data sources that are available for tracking and linking hazards, exposure, and health effects (Mather et al., 2004). The fundamental factors that provide confidence in the results of data linkage are data quality, appropriate use of the data, and consideration of data limitations. The quality of hazard, exposure, and health outcome data (HOD) varies, and the uses and limitations of data outside their original purpose are not yet well defined (Table 1).
- Environmental data (hazard-exposure data)
  - Hazard data describe pollutants that may be found in the environment and that can cause health problems. In INTARESE, hazard data from environmental monitoring are intended for exposure assessment, which can determine the amount, duration, and pattern of exposure to the pollutant.
- Biomonitoring data (exposure-dose data)
  - Biomonitoring is the direct measurement of environmental chemicals, their metabolites, or reaction products in people, usually in blood, urine, hair, or milk. Exposure is defined as contact between an agent and a target. Dose is defined as the amount of agent that enters a target after crossing an exposure surface. If the exposure surface is an absorption barrier, the dose is an absorbed dose (uptake dose); otherwise, it is an intake dose. In INTARESE, exposure and dose data are intended to estimate how much of a given pollutant it would take to cause varying degrees of health effects that could lead to illness.
- Health surveillance data (health effect data)
  - In general, health data include mortality and morbidity (incidence). In practice, health surveillance has generally relied on a small number of measures, such as the number of deaths in the monitoring region, age-adjusted death rates for the monitoring region, and survival. In addition, health surveillance data include health behavior and determinants of behavior (for example, knowledge, attitudes, and beliefs). In INTARESE, health effect data are intended to be linked to hazard-exposure-dose data in order to assess the risk that a given pollutant causes health problems in the general population.
- Other relevant data (covariates)
  - Other relevant data may include residence, proximity to sources known to cause health effects, socioeconomic status, age, race, and adherence to treatment regimens, all of which may be related to incidence and to hazard or exposure.
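Age-adjusted death rates, mentioned above as a typical health surveillance measure, can be illustrated with a small calculation. The sketch below assumes direct standardization (age-specific rates weighted by a standard population); the source does not say which adjustment method is used, and the age groups, counts, and standard population are made-up numbers.

```python
def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Directly standardized death rate.

    Each age-specific rate (deaths / population) is weighted by that age
    group's share of the standard population, then scaled to `per` persons.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for group, standard_count in standard_population.items():
        age_specific_rate = deaths[group] / population[group]
        rate += age_specific_rate * (standard_count / total_standard)
    return rate * per


# Hypothetical monitoring region (all numbers are illustrative).
deaths = {"0-39": 10, "40-64": 50, "65+": 200}
population = {"0-39": 50_000, "40-64": 25_000, "65+": 10_000}
standard = {"0-39": 500, "40-64": 300, "65+": 200}

crude = sum(deaths.values()) / sum(population.values()) * 100_000
adjusted = age_adjusted_rate(deaths, population, standard)

print(f"crude rate:    {crude:.1f} per 100,000")
print(f"adjusted rate: {adjusted:.1f} per 100,000")
```

With these numbers the crude rate is about 306 per 100,000 and the adjusted rate about 470 per 100,000, because the standard population gives more weight to the oldest group than the region's own age structure does. This is why regions with different age structures are compared on adjusted rather than crude rates.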
Data sources | Uses | Limitations |
---|---|---|
Environmental monitoring | Assessment of exposure | Difficult to access or not available; Not intended for exposure assessment; Not representative in time and space; Incomparable or unknown quality data |
Biomonitoring | Determine amount of exposure; Identify highly exposed individuals or groups; Identify hazardous exposures; Evaluate trends in exposure over time; Evaluate effectiveness of public health actions; Identify new or emerging exposures; Help set priorities for human health effects research; In conjunction with other information: | Invasive and difficult to obtain samples; Results can be difficult to interpret and communicate to participants; Integrates exposure from all sources; Studies can be very expensive |
Health surveillance | Describes health status of populations; Describes distribution and frequency of disease | Data completeness; Misclassification of disease; Generalizability to population; Privacy and confidentiality issues |
All three types of data | Integrated environmental health impact assessment | Completeness of records; Timeliness of reporting; Availability of access to data; Geographic resolution of the data (scale); Frequency of data collection; Lack of data collection standards |
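Using all three types of data together presupposes that they can be linked on shared keys. The following minimal sketch assumes each source has already been summarized per region and year. The source names, keys, and values are hypothetical, and a real linkage would also have to reconcile differing geographic resolutions and collection frequencies. It shows how incomplete records surface as soon as the sources are joined.

```python
# Hypothetical region-year summaries; keys are (region, year).
hazard = {("R1", 2005): 42.0, ("R2", 2005): 18.5}   # e.g. mean pollutant concentration
biomonitoring = {("R1", 2005): 3.1}                  # e.g. mean biomarker level
health = {("R1", 2005): 12, ("R2", 2005): 7, ("R3", 2005): 9}  # e.g. case counts


def link_sources(**sources):
    """Outer-join several {(region, year): value} sources on their key.

    Every key seen in any source yields one row; values absent from a
    source are recorded as None, and `complete` marks fully linked rows.
    """
    all_keys = sorted(set().union(*(src.keys() for src in sources.values())))
    linked = []
    for region, year in all_keys:
        row = {"region": region, "year": year}
        for name, src in sources.items():
            row[name] = src.get((region, year))
        row["complete"] = all(row[name] is not None for name in sources)
        linked.append(row)
    return linked


linked = link_sources(hazard=hazard, biomonitoring=biomonitoring, health=health)
complete = [row for row in linked if row["complete"]]

print(f"{len(complete)} of {len(linked)} region-years have all three data types")
```

An outer join is used here so that unmatched records stay visible; an inner join would silently drop them and hide the completeness problem listed in the table.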
References
- Abelsohn, A., Frank, J., Eyles, J. 2009. Environmental Public Health Tracking/Surveillance in Canada: A Commentary. Healthc Policy. 4(3): 37-52.
- Mather, F.J., White, L.E., Langlois, E.C., Shorter, C.F., Swalm, C.M., Shaffer, J.G., Hartley, W.R. 2004. Statistical methods for linking health, exposure, and hazards. Environmental Health Perspectives. 112 (14): 1440-1445.
- Smolders, R., Casteleyn, L., Joas, R., Schoeters, G. 2008. Human biomonitoring and the INSPIRE Directive: spatial data as links for environment and health research. Journal of Toxicology and Environmental Health, Part B. 11 (8): 646-659.
- Zeng, J.H. 1999. Research and practical experiences in the use of multiple data sources for enterprise-level planning and decision-making: a literature review. Center for Technology in Government, University at Albany.