Positive matrix factorisation

The text on this page is taken from an equivalent page of the IEHIAS-project.

Positive Matrix Factorisation (PMF) is a statistical factor analysis method, based on the law of mass conservation. By analysing measured concentrations at a series of measurement locations, the method first identifies a set of factors which can be taken to represent major emission sources. Scores on these factors are then regressed against the concentrations to estimate the contributions from each source.

Scope

Purpose

PMF is used to analyse the contributions of different sources to measured concentrations or loads of pollutants in the environment at receptor locations. It is especially useful where detailed data on the composition of the main emission sources do not exist, but where large numbers of samples of ambient concentrations are available.

Boundaries

A major advantage of PMF is that the methodology can be applied without the need for data on source emission compositions. The methodology can also help to identify missing sources, and can handle missing data or measurements below the detection limit. It does, however, require information on uncertainties in the measurements of pollutant loads at the sampled receptors. In addition, the models are constrained to non-negative species concentrations and source contributions.

Assumptions

Composition of the emission sources is constant over the period of sampling at the receptors;
Chemical species used in PMF do not interact with each other and their concentrations are linearly additive;
Source profiles (f_pj) are linearly independent of each other;
The number of species (j) is greater than or equal to the number of sources (p);
Marker elements (tracers) for each source should be included;
There are many more samples than source types for statistically meaningful calculations.
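
Some of the assumptions above can be checked numerically before or after a run. The following is a minimal sketch, not part of any PMF package: the function names, the `min_ratio` threshold and the profile values are hypothetical and chosen only for illustration.

```python
import numpy as np

def check_pmf_dimensions(n_samples, n_species, n_sources, min_ratio=10):
    """Return a list of problems with the dimensional assumptions.

    min_ratio is a hypothetical threshold for "many more samples than sources".
    """
    problems = []
    if n_species < n_sources:
        problems.append("fewer species than sources")
    if n_samples < min_ratio * n_sources:
        problems.append("too few samples relative to sources")
    return problems

def profiles_independent(F):
    """True if the rows of F (sources x species) are linearly independent."""
    F = np.asarray(F, dtype=float)
    return np.linalg.matrix_rank(F) == F.shape[0]

# Made-up profiles: 2 sources x 3 species.
F_ok = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]
F_bad = [[0.7, 0.2, 0.1],
         [0.35, 0.1, 0.05]]  # second row is half the first

print(check_pmf_dimensions(n_samples=200, n_species=12, n_sources=4))
print(profiles_independent(F_ok), profiles_independent(F_bad))
```

Assumptions about constant source composition and non-interacting species cannot be tested this way; they have to be judged from knowledge of the sources and the chemistry.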

Weaknesses and limitations

PMF models require large datasets on measured concentrations (preferably >100 samples);
Analysis is limited by the accuracy, precision, and range of species measured at the receptor (e.g. ambient monitoring) sites;
A determination must be made of how many 'factors' to retain;
Emission sources have to be deduced by interpreting these factors;
Information on measured or previously published source profiles is needed in order to verify the representativeness of the calculated source profiles and the uncertainties in the estimated source contributions;
The method relies on many parameters and initial conditions and model input; results are sensitive to the pre-set parameters.

Requirements

For exposure assessment, the number of samples analysed must be representative both in time and space.

Method explanation

Input

PMF models require data on measured concentrations (of species/elements) for a number of samples, together with information on the associated uncertainties. Where appropriate (e.g. when analysing ambient PM samples), information on meteorological parameters and concentrations of associated gaseous species may also be used.
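
As an illustration of how the two input matrices might be assembled, the sketch below builds a concentration matrix and a matching uncertainty matrix from raw data containing missing values and values below the detection limit. The substitution rules used here (half the detection limit with an uncertainty of 5/6 of the detection limit; the species median with an uncertainty of four times the median; a relative error plus one third of the detection limit for detected values) are one convention among several and are an assumption of this example, not something prescribed by the text above. The function name and the `rel_error` value are hypothetical.

```python
import numpy as np

def prepare_pmf_input(conc, detection_limit, rel_error=0.1):
    """Build concentration and uncertainty matrices (samples x species).

    conc: 2-D array, NaN marks a missing value.
    detection_limit: 1-D array, one detection limit per species.
    rel_error: hypothetical relative analytical error for detected values.
    """
    conc = np.asarray(conc, dtype=float)
    dl = np.broadcast_to(np.asarray(detection_limit, dtype=float), conc.shape)

    missing = np.isnan(conc)
    below = ~missing & (conc < dl)

    # Detected values: keep the concentration, assumed error model.
    x = conc.copy()
    sigma = rel_error * conc + dl / 3.0

    # Below detection limit: substitute DL/2 with a large uncertainty.
    x[below] = dl[below] / 2.0
    sigma[below] = 5.0 / 6.0 * dl[below]

    # Missing: substitute the species median with a very large uncertainty,
    # so that the value carries little weight in the fit.
    median = np.broadcast_to(np.nanmedian(conc, axis=0), conc.shape)
    x[missing] = median[missing]
    sigma[missing] = 4.0 * median[missing]
    return x, sigma

# Made-up data: 3 samples x 2 species.
raw = [[1.0, 0.02],
       [np.nan, 0.5],
       [3.0, 0.3]]
x, sigma = prepare_pmf_input(raw, detection_limit=[0.1, 0.05])
```

Whatever convention is used, it should be documented, since the uncertainties determine how strongly each measurement influences the solution.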

Output

Output from a PMF model comprises:

a set of factors representing the source profiles of major groups of emission sources;
estimates of the contribution from each of these sources (and their associated uncertainties).

Rationale

PMF, like other multivariate receptor models, is based on the analysis of the correlation between measured concentrations of chemical species, assuming that highly correlated compounds come from the same source. The PMF approach has been developed to resolve problems occurring in standard Principal Components Analysis (e.g. negative solutions, and the inability to include uncertainty estimates or deal with missing data), and to enable source contributions to be assessed when detailed information on source profiles is lacking.

Output from the PMF model is a set of factors representing source profiles and estimates of their associated contributions to measured concentrations at the sampled receptor sites. Interpretation of the factors (i.e. allocation to named source types) has to be done by reference to information on source emissions, derived from literature and/or available measured data.
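
One simple way to support this interpretation step is to compare each estimated factor with reference profiles taken from the literature. The sketch below uses cosine similarity for the comparison. This is an illustrative choice rather than a prescribed part of PMF, and the source names and profile values are made up.

```python
import numpy as np

def match_factors(F_est, reference_profiles):
    """For each estimated factor, return (best-matching name, cosine similarity).

    F_est: array of shape (factors, species).
    reference_profiles: dict mapping a source name to a profile over the
    same species, in the same order.
    """
    names = list(reference_profiles)
    R = np.array([reference_profiles[n] for n in names], dtype=float)
    F = np.asarray(F_est, dtype=float)
    # Normalise rows to unit length so the dot product is the cosine.
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=1, keepdims=True)
    sim = Fn @ Rn.T
    best = sim.argmax(axis=1)
    return [(names[k], float(sim[i, k])) for i, k in enumerate(best)]

# Hypothetical reference profiles over three species.
reference = {"traffic": [0.6, 0.3, 0.1], "sea salt": [0.05, 0.15, 0.8]}
factors = [[0.1, 0.1, 0.8], [0.55, 0.35, 0.10]]
labels = match_factors(factors, reference)
print(labels)
```

A high similarity only suggests a candidate source; the allocation still needs to be checked against marker species and local knowledge of emissions.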

Method

The PMF model assumes that measured concentrations at one or more receptor sites can be explained as the linear product of a source profile matrix and a source contribution matrix. The two matrices are obtained by an iterative algorithm: PMF involves constrained minimization of a weighted objective function.

The primary objective function is a measure of the goodness-of-fit of the predicted mass contributions for each species. Typically each species is weighted by a measure of trust in the individual measurements. The measure of trust can be adjusted for closeness to the minimum detection level, data completeness, sampling error or other user-defined attributes of the data. The results are constrained to be non-negative (although small negative values can occur) by adding penalty functions to the objective function.
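
The weighted goodness-of-fit measure can be sketched as a sum of squared residuals, each scaled by the uncertainty of the corresponding measurement. In the example below, `X` is the measured concentration matrix (samples x species), `G` and `F` are the contribution and profile matrices, and `sigma` holds the uncertainties. The quadratic form of the penalty and its `weight` are assumptions made for illustration; the text above does not specify the penalty functions used.

```python
import numpy as np

def weighted_q(X, G, F, sigma):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    residual = X - G @ F
    return float(np.sum((residual / sigma) ** 2))

def nonneg_penalty(G, F, weight=1e3):
    """Hypothetical quadratic penalty that grows when entries become negative."""
    neg_g = np.minimum(G, 0.0)
    neg_f = np.minimum(F, 0.0)
    return float(weight * (np.sum(neg_g ** 2) + np.sum(neg_f ** 2)))

# Made-up example: 3 samples, 2 sources, 3 species.
G = np.array([[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]])
F = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
X_exact = G @ F
sigma = np.full(X_exact.shape, 0.1)

X_off = X_exact.copy()
X_off[0, 0] += 0.2  # one measurement two uncertainties away from the model

q_exact = weighted_q(X_exact, G, F, sigma)
q_off = weighted_q(X_off, G, F, sigma)
```

A residual of two standard uncertainties adds about four to the sum, so measurements with large uncertainties (for example substituted missing values) have little influence on the fit.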

As with other forms of factor analysis, numerous procedural decisions have to be made and parameter values set when running PMF. These include the specification of data uncertainties, selection of the best number of factors, and choice of how to identify and deal with outliers. Results may be sensitive to these decisions, so the procedures used and assumptions made should always be fully documented in order to ensure that analysis is transparent.

PMF models are expressed as follows:

$$C_{ij} = \sum_{p} g_{ip} f_{pj} + e_{ij}$$

with the sum taken over all sources, and where:

p is the number of sources;

j is the number of species, with j ≥ p;

C_ij is the measured ambient concentration of species j in sample i;

f_pj (source profiles) is the fractional concentration of species j in the emissions from source p;

g_ip is the concentration contribution of source p to sample i; and

e_ij is the portion of the measured concentration that cannot be explained by the model.
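
To make the roles of these quantities concrete, the sketch below fits the model to synthetic data. It does not reproduce the algorithm used in PMF software: as a stand-in it uses weighted multiplicative updates, a standard technique from non-negative matrix factorisation that keeps all entries non-negative and reduces the uncertainty-weighted sum of squared residuals. The function name, the synthetic data and all parameter values are hypothetical.

```python
import numpy as np

def pmf_sketch(X, sigma, n_sources, n_iter=500, seed=0, eps=1e-12):
    """Fit X ~ G @ F with non-negative G, F by weighted multiplicative updates.

    X, sigma: arrays of shape (samples, species); X must be non-negative.
    Returns contributions G, profiles F, residuals E and the weighted fit Q.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_species = X.shape
    W = 1.0 / sigma ** 2                       # weights from uncertainties
    G = rng.uniform(0.1, 1.0, (n_samples, n_sources))
    F = rng.uniform(0.1, 1.0, (n_sources, n_species))
    WX = W * X
    for _ in range(n_iter):
        G *= (WX @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (W * (G @ F)) + eps)
    # Rescale so each profile holds fractional concentrations summing to 1;
    # the product G @ F is unchanged by this rescaling.
    scale = F.sum(axis=1)
    F = F / scale[:, None]
    G = G * scale[None, :]
    E = X - G @ F                              # unexplained part, e_ij
    Q = float(np.sum((E / sigma) ** 2))
    return G, F, E, Q

# Synthetic data: 60 samples, 2 sources, 5 species, 2 % multiplicative noise.
rng = np.random.default_rng(1)
G_true = rng.uniform(0.0, 5.0, (60, 2))
F_true = np.array([[0.6, 0.3, 0.05, 0.05, 0.0],
                   [0.0, 0.1, 0.2, 0.3, 0.4]])
X = (G_true @ F_true) * (1 + 0.02 * rng.standard_normal((60, 5)))
sigma = 0.05 * X + 0.01

G, F, E, Q = pmf_sketch(X, sigma, n_sources=2)
print(np.round(F, 2), round(Q, 1))
```

In practice the run would be repeated for several numbers of sources and random starting points, and the resulting profiles compared with known source compositions before any factor is given a name.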

References

Anderson, M.J., Miller, S.L. and Milford, J.B. 2001 Source apportionment of exposure to toxic volatile organic compounds using positive matrix factorization. Journal of Exposure Analysis and Environmental Epidemiology 11, 295-307.
Battelle and Sonoma Technology 2002 Source apportionment analysis of air quality monitoring data: Phase I. Final Report. Baltimore: Mid-Atlantic/Northeast Visibility Union.
Hopke, P.K., Ito, K., Mar, T., Christensen, W.F., Eatough, D.J., Henry, R.C., Kim, E., Laden, F., Lall, R., Larson, T., Liu, H., Neas, L., Pinto, J., Stölzel, M., Suh, H., Paatero, P. and Thurston, G.D. 2006 PM source apportionment and health effects: 1. Intercomparison of source apportionment results. Journal of Exposure Science and Environmental Epidemiology 16, 275-286.
Paatero, P., Hopke, P.K., Begum, B.A. and Biswas, S.K. 2005 A graphical diagnostic method for assessing the rotation in factor analytical models of atmospheric pollution. Atmospheric Environment 39, 193-201.
Reff, A., Eberly, S.H. and Bhave, P.V. 2007 Receptor modeling of ambient particulate matter data using positive matrix factorization: review of existing methods. Journal of the Air and Waste Management Association 57, 146-154.

IEHIAS is a website developed by two large EU-funded projects Intarese and Heimtsa. The content from the original website was moved to Opasnet.