Positive matrix factorisation
- The text on this page is taken from an equivalent page of the IEHIAS-project.
Positive Matrix Factorisation (PMF) is a statistical factor analysis method, based on the law of mass conservation. By analysing measured concentrations at a series of measurement locations, the method first identifies a set of factors which can be taken to represent major emission sources. Scores on these factors are then regressed against the concentrations to estimate the contributions from each source.
Scope
Purpose
PMF is used to analyse the contributions of different sources to measured concentrations or loads of pollutants in the environment at receptor locations. It is especially useful where detailed data on the composition of the main emission sources do not exist, but where large numbers of sampled data are available on ambient concentrations.
Boundaries
A major advantage of PMF is that the methodology can be applied without the need for data on source emission compositions. The methodology can also help to identify missing sources, and can handle missing data or measurements below the detection limit, but it requires information on uncertainties in the measurements of pollutant loads at the sampled receptors. In addition, the models are constrained to non-negative species concentrations and source contributions.
Assumptions
- Composition of the emission sources is constant over the period of sampling at the receptors;
- Chemical species used in PMF do not interact with each other and their concentrations are linearly additive;
- Source profiles (fpj) are linearly independent of each other;
- The number of species (j) is greater than or equal to the number of sources (p);
- Marker elements (tracers) for each source should be included;
- There are many more samples than source types for statistically meaningful calculations.
Weaknesses and limitations
- PMF models require large datasets on measured concentrations (preferably >100 samples);
- Analysis is limited by the accuracy, precision, and range of species measured at the receptor (e.g. ambient monitoring) sites;
- A determination must be made of how many 'factors' to retain;
- Emission sources have to be deduced by interpreting these factors;
- Measured or previously published source profiles are needed in order to verify the representativeness of the calculated source profiles and the uncertainties in the estimated source contributions;
- The method relies on many parameters, initial conditions and model inputs; results are sensitive to the pre-set parameters.
Requirements
For exposure assessment, the number of samples analysed must be representative both in time and space.
Method explanation
Input
PMF models require data on measured concentrations (of species/elements) for a number of samples, together with information on the associated uncertainties. Where appropriate (e.g. when analysing ambient PM samples), information on meteorological parameters and concentrations of associated gaseous species may also be used.
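As an illustration of how concentrations and uncertainties might be prepared for one species, the sketch below applies a set of rules of the kind often used in PMF practice. The function name `prepare_inputs`, the default `error_fraction` and the specific constants (half the detection limit, 5/6 of the detection limit, four times the median) are assumptions made for this example and are not taken from this page; they should be checked against the guidance for the software actually used.

```python
import math


def prepare_inputs(conc, mdl, error_fraction=0.1):
    """Return (values, uncertainties) for one species.

    conc           : measured concentrations, with None marking a missing value
    mdl            : method detection limit (same units as conc)
    error_fraction : assumed relative measurement error (illustrative default)
    """
    measured = sorted(c for c in conc if c is not None)
    if not measured:
        raise ValueError("at least one measured value is required")
    mid = len(measured) // 2
    if len(measured) % 2:
        median = measured[mid]
    else:
        median = 0.5 * (measured[mid - 1] + measured[mid])

    values, uncertainties = [], []
    for c in conc:
        if c is None:
            # Missing value: fill with the median and give it a large
            # uncertainty so that it carries little weight in the fit.
            values.append(median)
            uncertainties.append(4.0 * median)
        elif c <= mdl:
            # At or below the detection limit: substitute half the limit.
            values.append(mdl / 2.0)
            uncertainties.append(5.0 / 6.0 * mdl)
        else:
            # Detected value: combine a relative error with a
            # detection-limit term.
            values.append(c)
            uncertainties.append(
                math.sqrt((error_fraction * c) ** 2 + (0.5 * mdl) ** 2)
            )
    return values, uncertainties
```

Assigning large uncertainties to substituted values is what allows such samples to be kept in the dataset without letting them dominate the solution.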
Output
Output from a PMF model comprises:
- a set of factors representing the source profiles of major groups of emission sources;
- estimates of the contribution from each of these sources (and their associated uncertainties).
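A factorisation of this kind is only defined up to a scale factor per source, so the two outputs are usually rescaled before interpretation. The sketch below shows one possible rescaling, in which each profile is made to sum to 1 (a fractional composition) and the matching contributions are multiplied by the same factor so that their product is unchanged. The function name `normalise_factors` is hypothetical, and treating the measured species as the whole of each source's mass is an assumption made for illustration.

```python
def normalise_factors(G, F):
    """Rescale so that every profile (row of F) sums to 1.

    G : contributions, one row per sample, one column per source
    F : profiles, one row per source, one column per species
    The product of G and F is left unchanged.
    """
    scales = [sum(row) for row in F]
    if any(s <= 0 for s in scales):
        raise ValueError("each profile needs a positive total")
    F_norm = [[f / s for f in row] for row, s in zip(F, scales)]
    G_norm = [[g * s for g, s in zip(row, scales)] for row in G]
    return G_norm, F_norm
```

After this rescaling, each column of the contribution matrix is expressed in the same units as the measured concentrations.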
Rationale
PMF, like other multivariate receptor models, is based on the analysis of the correlation between measured concentrations of chemical species, assuming that highly correlated compounds come from the same source. The PMF approach has been developed to resolve problems occurring in standard Principal Components Analysis (e.g. negative solutions, and the inability to include uncertainty estimates or deal with missing data), and to enable source contributions to be assessed when detailed information on source profiles is lacking.
Output from the PMF model is a set of factors representing source profiles and estimates of their associated contributions to measured concentrations at the sampled receptor sites. Interpretation of the factors (i.e. allocation to named source types) has to be done by reference to information on source emissions, derived from literature and/or available measured data.
Method
The PMF model assumes that measured concentrations at one or more receptor sites can be explained as the product of a source profile matrix and a source contribution matrix. The two matrices are obtained by an iterative minimisation algorithm: PMF involves constrained minimisation of a weighted objective function.
The primary objective function is a measure of the goodness-of-fit of the predicted mass contributions for each species. Typically each species is weighted by a measure of trust in the individual measurements. The measure of trust can be adjusted for closeness to the minimum detection level, data completeness, sampling error or other user-defined attributes of the data. The results are constrained to be non-negative (although small negative values can occur) by adding penalty functions to the objective function.
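To make the idea of a weighted goodness-of-fit measure concrete, the sketch below computes Q, the sum of squared residuals each divided by the uncertainty of the corresponding measurement, and reduces it with a simple iterative scheme. Note that the fitting step uses weighted multiplicative updates, a generic non-negative matrix factorisation technique swapped in for illustration; it is not the algorithm used by PMF software, and it keeps values non-negative by construction rather than through penalty functions. The names `q_value` and `fit_weighted_nmf`, the uncertainty matrix `S` and all defaults are assumptions of this example.

```python
import random


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def hadamard(A, B):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def q_value(X, S, G, F):
    """Q: sum over all samples and species of ((measured - modelled) / uncertainty)^2."""
    model = matmul(G, F)
    return sum(((x - m) / s) ** 2
               for x_row, m_row, s_row in zip(X, model, S)
               for x, m, s in zip(x_row, m_row, s_row))


def _rescale(M, num, den, eps=1e-12):
    """Multiply each entry of M by num/den (eps guards against division by zero)."""
    return [[v * a / (b + eps) for v, a, b in zip(vr, ar, br)]
            for vr, ar, br in zip(M, num, den)]


def fit_weighted_nmf(X, S, n_factors, n_iter=500, seed=0):
    """Illustrative fit of X ~ G F with weights 1/S^2 (not the PMF algorithm itself).

    X : measured concentrations, samples x species (non-negative)
    S : uncertainties, same shape as X (strictly positive)
    """
    rng = random.Random(seed)
    n_samples, n_species = len(X), len(X[0])
    W = [[1.0 / s ** 2 for s in row] for row in S]
    WX = hadamard(W, X)
    # Random positive starting values; results can depend on this choice.
    G = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in range(n_samples)]
    F = [[rng.uniform(0.1, 1.0) for _ in range(n_species)] for _ in range(n_factors)]
    for _ in range(n_iter):
        Ft = transpose(F)
        G = _rescale(G, matmul(WX, Ft), matmul(hadamard(W, matmul(G, F)), Ft))
        Gt = transpose(G)
        F = _rescale(F, matmul(Gt, WX), matmul(Gt, hadamard(W, matmul(G, F))))
    return G, F
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors stay non-negative throughout. Running the fit from several seeds and comparing the resulting Q values is one simple way to see how sensitive a solution is to its starting point.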
As with other forms of factor analysis, numerous procedural decisions have to be made and parameter values set when running PMF. These include the specification of data uncertainties, selection of the best number of factors, and choice of how to identify and deal with outliers. Results may be sensitive to these decisions, so the procedures used and assumptions made should always be fully documented in order to ensure that analysis is transparent.
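One heuristic sometimes used when choosing the number of factors is to compare the goodness-of-fit value obtained for each candidate with the value expected if the uncertainties were specified correctly, taken here as the number of data points minus the number of fitted parameters. The sketch below is a minimal illustration of that comparison; the function names and the input dictionary of fitted values are hypothetical, and the heuristic is only one of several diagnostics that would normally be considered.

```python
def expected_q(n_samples, n_species, n_factors):
    """Data points minus fitted parameters (entries of both factor matrices)."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


def q_ratio_table(q_by_factors, n_samples, n_species):
    """Map each candidate number of factors to fitted Q divided by expected Q.

    q_by_factors : dict {number of factors: fitted goodness-of-fit value}
    """
    table = {}
    for n_factors, q in sorted(q_by_factors.items()):
        q_exp = expected_q(n_samples, n_species, n_factors)
        if q_exp <= 0:
            raise ValueError("too many factors for the size of the dataset")
        table[n_factors] = q / q_exp
    return table
```

Ratios far above 1 suggest that structure in the data remains unexplained or that the uncertainties are too small, whereas a ratio that stops falling as factors are added suggests that further factors contribute little. Whichever number is retained, the resulting factors still have to be physically interpretable.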
PMF models are expressed as follows:
$$C_{ij} = \sum_{p} g_{ip} f_{pj} + e_{ij}$$
where the sum runs over all sources, and:
p denotes a source (and, as a count, the number of sources);
j denotes a species (and, as a count, the number of species), with j ≥ p;
Cij is the measured ambient concentration of species j in sample i;
fpj (source profiles) is the fractional concentration of species j in the emissions from source p;
gip is the concentration contribution of source p to sample i; and
eij is the portion of the measured concentration that cannot be explained by the model.
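The definitions above describe each measured concentration as the sum, over all sources, of a source contribution multiplied by the fraction of the species in that source's emissions, plus an unexplained remainder. The small worked example below evaluates that relationship for two sources, three species and two samples. All numbers and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def modelled_concentrations(G, F):
    """For every sample i and species j, sum g_ip * f_pj over the sources p."""
    return [[sum(g_ip * f_pj for g_ip, f_pj in zip(g_row, f_col))
             for f_col in zip(*F)]
            for g_row in G]


def residuals(C, G, F):
    """e_ij: the part of each measured concentration not explained by the model."""
    model = modelled_concentrations(G, F)
    return [[c - m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(C, model)]


# Hypothetical source profiles f_pj: one row per source, one column per species.
F = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]
# Hypothetical contributions g_ip: one row per sample, one column per source.
G = [[10.0, 5.0],
     [2.0, 8.0]]
# Hypothetical measured concentrations C_ij: one row per sample.
C = [[6.8, 4.0, 4.4],
     [2.0, 2.5, 5.8]]

E = residuals(C, G, F)
```

For the first sample the modelled concentrations are 10 × 0.6 + 5 × 0.1 = 6.5, 10 × 0.3 + 5 × 0.2 = 4.0 and 10 × 0.1 + 5 × 0.7 = 4.5, so the residuals are approximately 0.3, 0.0 and -0.1.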
See also
More info on source attribution:
- Source attribution in general
- Source attribution database contains results from source attribution studies