Talk:Assessment of the health impacts of H1N1 vaccination
Evaluation of the H1N1 assessment
The evaluation was originally made to demonstrate the application of the Properties of Good Assessment framework in an article on evaluating model and assessment effectiveness, but was ultimately omitted from the manuscript. Here the assessment of the health impacts of H1N1 vaccination is evaluated according to the Properties of Good Assessment framework, originally published by Tuomisto and Pohjola (2007). For the purpose of this evaluation, the framework has been slightly updated, as described in more detail below.
Properties of good assessment
The framework consists of three categories, which are broken down into nine properties that jointly constitute performance. The framework is designed to be applicable for evaluating both quantitative and qualitative information. The evaluation itself can likewise be made either quantitatively or qualitatively, depending on the specific evaluation methods applied. The framework does not bind the evaluator to any particular evaluation methods, as long as their application is in line with how the properties in the framework are defined. The framework is scalable: it can be applied for evaluating a whole model or assessment, one of its parts (e.g. a sub-model, node, variable, or parameter), or a set of models, assessments, or their parts. In terms of managing towards effective interaction between modelling or assessment and the use of their outputs, the framework is probably most intuitively comprehensible at the level of a single model or assessment.
The categories and properties of the framework are presented in Table 1 and discussed in more detail below. In the table, the description column provides a general explanation of the meaning of each property. The question column then attempts to explicate what is intended by the description by providing example questions that could be asked in evaluating a model or assessment in terms of that property. For clarity the example questions are formulated on the level of evaluating a whole assessment consisting of only one assessment question and one corresponding answer (along with its reasoning), unless otherwise indicated.
Category | Property | Description | Question
---|---|---|---
Quality of content | Informativeness | Specificity of information, e.g. tightness of spread for a distribution. | How many possible worlds does the answer rule out? How few possible interpretations are there for the answer?
 | Calibration | Exactness or correctness of information. In practice often evaluated in comparison to some other estimate or a gold standard. | How close is the answer to reality or the real value?
 | Coherence | Correspondence between questions and answers, also between sets of questions and answers. | How completely does the answer address the assessment question? Is everything addressed? Is something unnecessary?
Applicability | Relevance | Correspondence between the output and its intended use. | How well does the information provided by the assessment serve the needs of the users? Is the assessment question good?
 | Availability | Accessibility of the output to users in terms of e.g. time, location, extent of information, and extent of users. | Is the information provided by the assessment available when, where, and to whom it is needed?
 | Usability | Potential of the information in the output to trigger understanding in its user(s) about what it describes. | Can the users perceive and internalize the information provided by the assessment? Does the users' understanding of the assessed issue increase?
 | Acceptability | Potential of the output to be accepted by its users. Fundamentally a matter of its making and delivery, not its information content. | Is the assessment result (output), and the way it is obtained and delivered for use, perceived as acceptable by the users?
Efficiency | Intra-assessment efficiency | Resource expenditure of producing the assessment output. | How much effort is spent in the making of an assessment?
 | Inter-assessment efficiency | Resource expenditure of producing assessment outputs in a series of assessments. | If another (somewhat similar) assessment were made, how much (less) effort would be needed?
Quality of content
As the name implies, the properties in the first category, quality of content, address characteristics of the information content in the assessment output. These properties characterize performance in relation to the general purpose of modelling and assessment: describing reality.
Informativeness and calibration are tightly interlinked properties, and it makes most sense to consider them together as describing the truthlikeness (cf. Niiniluoto, 1997) of the answers provided by an assessment. Informativeness and calibration form a pair quite similar to e.g. the accuracy and precision of quantitative information (see e.g. accuracy and precision in Wikipedia, http://en.wikipedia.org/wiki/Accuracy_and_precision), but with a somewhat different and more flexible interpretation, particularly for non-quantitative information. The basic challenge regarding calibration is that in most cases it is not possible to know what the absolute truth is; in practice, calibration often needs to be evaluated against e.g. gold standards or estimates obtained by other means, or indirectly through evaluating the calibration of the source of information. Clarification and examples of informativeness and calibration in expert elicitation can be found e.g. in Cooke (1991) and Tuomisto et al. (2008).
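The analogy to accuracy and precision can be made concrete with a minimal numerical sketch (all experts and numbers below are invented for illustration): the spread of repeated estimates stands in for informativeness, and their distance from a reference value stands in for calibration.

```python
import statistics

# Invented estimates of the same quantity from three hypothetical experts,
# compared against a reference value standing in for a gold standard.
reference = 10.0
estimates = {
    "expert_A": [9.8, 10.1, 10.2],   # tight spread, close to reference
    "expert_B": [6.0, 14.0, 10.0],   # unbiased on average, but wide spread
    "expert_C": [12.9, 13.0, 13.1],  # tight spread, but clearly biased
}

for name, values in estimates.items():
    spread = statistics.stdev(values)                # low spread ~ high informativeness
    bias = abs(statistics.mean(values) - reference)  # low bias ~ good calibration
    print(f"{name}: spread={spread:.2f}, bias={bias:.2f}")
```

In this toy setting, expert A would be both informative and well calibrated, expert B calibrated but uninformative, and expert C informative but poorly calibrated; the framework's point is that the two properties must be considered together.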
Coherence considers the match between the questions asked and answers provided in an assessment; how completely are the questions answered? It necessitates explication of the assessment questions (and sub-questions). It does not, however, consider the goodness of the questions.
Applicability
The properties in the second category, applicability, consider the assessment output, i.e. the information product delivered to its use. As these properties consider not only the information content and structural features of the output but also its delivery to use, they extend to aspects of both making and using the output. The applicability properties characterize performance in relation to the instrumental purpose of serving practical needs. This necessitates identification and explication of those purposes.
Fundamentally, applicability concerns the performance of the output in triggering the intended cognitive processes among the users, leading to increased understanding and to consequential decisions and actions guided by that understanding. The potential of achieving this is considered a function of whether the questions addressed are right in relation to the needs (relevance), how well the information produced by modelling and assessment reaches its targets (availability), to what extent the receivers can make use of the information (usability), and whether it is accepted or rejected by the users (acceptability). We say potential of achieving, because the ultimate overall applicability results from multiple factors, many of which, e.g. the cognitive capacities of the users and many situational factors, can be considered beyond the influence of modellers and assessors. Consequently, whereas the first two applicability properties, relevance and availability, are explicitly, although not necessarily easily, measurable, the last two, usability and acceptability, are trickier, as they vary significantly from one individual user and situation to another. The issues of availability are also addressed in the dimensions of openness, a framework for designing and managing effective processes for participatory assessment and policy making, in terms of scope of participation, access to information, and timing of openness (Pohjola and Tuomisto, 2011).
Efficiency
The third category, efficiency, takes a relatively simple and straightforward approach to characterizing the process of modelling and assessment. It consists of two measures of resource expenditure. Intra-assessment efficiency considers the resource expenditure for a given output in one assessment. Inter-assessment efficiency considers the change in efficiency, or a corresponding change in resource expenditure, for a given output in a series of assessments. Whereas the first measure is probably intuitive and easy to grasp, the latter may require some explanation.
The idea behind inter-assessment efficiency is that, given the output produced in an assessment and the corresponding expenditure of resources, it can be assumed that a related assessment could be made with less resource expenditure for a comparable output, or with a better output for the same resource expenditure. This can take place e.g. through the learning of modellers and assessors, but particularly through the development, dissemination, and sharing of re-usable assessment modules, sub-models, etc. (cf. Haas and Jaeger, 2005; Harmsze, 2000). This saves the effort of unnecessary duplicate work and allows focusing on the most important or complicated aspects of modelling or assessment exercises.
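The reuse effect can be illustrated with a simple arithmetic sketch (the module names, effort figures, and the 20 % adaptation factor are all invented assumptions): if a follow-up assessment can reuse modules from the first one at a fraction of their original cost, the resource expenditure for a comparable output drops.

```python
# Hypothetical effort (person-days, invented) per module of a first assessment.
first_effort = {"exposure_module": 20, "dose_response": 15, "valuation": 10, "reporting": 5}

# Assumption: reused modules only need adaptation, at 20 % of the original effort;
# modules not reused must be built from scratch again.
reused = {"exposure_module", "dose_response"}
adaptation_factor = 0.2

second_effort = sum(
    effort * adaptation_factor if module in reused else effort
    for module, effort in first_effort.items()
)

total_first = sum(first_effort.values())
saving = 1 - second_effort / total_first
print(f"first: {total_first} person-days, second: {second_effort}, saving: {saving:.0%}")
```

With these invented numbers the follow-up needs 22 person-days instead of 50, a 56 % saving, which is exactly the kind of change that inter-assessment efficiency is meant to capture.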
Effectiveness
The first two categories, quality of content and applicability, characterize the potential of an assessment to deliver the intended outcomes by both describing reality and serving practical needs. The third category characterizes the efficiency by which this potential is produced. The overall performance, constituted as an aggregate of all properties, can be called effectiveness. Above, effectiveness was defined according to Hokkanen and Kojo (2003) as the likelihood of an assessment process achieving the desired results and the goals set for it. The Properties of Good Assessment framework can be considered to operationalize this definition by stating that the effectiveness of a model or assessment is a function of the quality of content and applicability of its output and the efficiency of its making and delivery. It should be noted, however, that this measure of effectiveness characterizes the likelihood of delivering the outcomes, not their actual realization. Nevertheless, the Properties of Good Assessment framework provides a major step towards bridging modelling and assessment outputs with their intended outcomes. The information provided by evaluations according to the framework serves well the needs of designing and managing effective modelling and assessment endeavours. In retrospective follow-up evaluations of model and assessment effectiveness, this information needs to be complemented with direct outcome evaluations, e.g. as proposed by Matthews et al. (2011).
Evaluation
During the course, the assessment primarily served the purposes of illustrating essential aspects of decision analysis and risk management, as well as explicating the swine flu (A(H1N1) influenza) pandemic and the related vaccination campaign in Finland in 2009-2010 as examples of practical contexts for decision analysis and risk management. Here we focus on its secondary purpose: to evaluate the decision to launch a nationwide vaccination campaign to alleviate the pandemic in Finland. The discourse erupted during autumn 2010, when suspicions regarding a relationship between the A(H1N1) vaccine and the sudden increase in the prevalence of narcolepsy in Finland were publicized in the media. Soon after that, the National Institute for Health and Welfare (THL) in Finland set up a task force to determine whether such a causal relationship exists (THL, 2011a,b).
The setting for evaluating the assessment:
- Time of assessment: spring 2011.
- Assessors: organizers of the DARM course, participants of the course.
- Intended primary user: Ministry of Social Affairs and Health in Finland.
- Intended use: basis for communication about public concerns regarding swine flu vaccines and narcolepsy.
- Evaluation method: qualitative expressions on a 5-point scale from very low to very high.
- Evaluation focus: whole assessment (main message, supported by all other information).
- Basis for evaluation: information provided on the assessment page in Opasnet, complemented with additional information obtained from assessment participants where necessary.
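The qualitative 5-point scale used above lends itself to a simple ordinal representation. The sketch below shows one way to make the characterizations comparable or summarizable; the property characterizations are those given in the evaluation table of this document, while the numeric mapping itself is an assumption of this sketch, not part of the framework.

```python
# The 5-point scale from the evaluation setting, ordered from worst to best.
SCALE = ["very low", "low", "medium", "high", "very high"]

def score(characterization: str) -> int:
    """Return the ordinal position (0-4) of a characterization on the scale."""
    return SCALE.index(characterization)

# Characterizations taken from the evaluation table (potentials in
# parentheses omitted for simplicity).
evaluation = {
    "informativeness": "medium",
    "calibration": "medium",
    "coherence": "high",
    "relevance": "very low",
    "availability": "low",
    "usability": "medium",
    "acceptability": "medium",
    "intra-assessment efficiency": "high",
    "inter-assessment efficiency": "high",
}

for prop, level in evaluation.items():
    print(f"{prop}: {level} (ordinal score {score(level)})")
```

An ordinal mapping like this supports simple comparisons across properties or assessments, but, as the framework itself stresses, it does not justify arithmetic such as averaging across properties without further assumptions.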
Category | Property | Characterization | Explanation
---|---|---|---
Quality of content | Informativeness | Medium | The conclusion that vaccinating the whole population was a better alternative than no vaccination is well supported by the model results. The conclusion that vaccinating the whole population was better than vaccinating all except 5-19-year-olds is more vaguely supported. The uncertainty of some variables in the model is high.
 | Calibration | Medium | Model results and assessment conclusions are in line with the analyses by the European Medicines Agency (EMA, 2011) and the national narcolepsy task force in Finland (THL, 2011b). The calibration of some variables in the model, e.g. the DALY weight for narcolepsy, may well be questioned, as they are based on assumptions rather than data.
 | Coherence | High | The question is well addressed, and the answer is reasoned with a model that takes account of the most important factors known to affect the outcome. Limitations in the comprehensiveness of the model and its parts exist, e.g. in terms of assumptions, but many of them are identified and explicated. Value of information analysis indicates high coherence within the assessment.
Applicability | Relevance | Very low (potentially high) | The assessment addresses an issue that can be considered to underlie much of the discussion regarding the swine flu pandemic and the vaccination campaign. The assessment could thus be claimed to serve a real, existing need, and its potential relevance could be considered high. However, in reality there was practically no interaction between the assessors and the intended users, and there was no demand from the users for the assessors to address the question. Consequently, the actual relevance is very low.
 | Availability | Low (potentially very high) | The model was developed and presented on a freely accessible assessment page in Opasnet at a time when many of the issues related to the case were still unresolved and under active public discussion. Practically no technical limits to availability exist. However, awareness of the assessment among the intended users remained low despite (or due to only) the minor efforts of informing them by e-mail. Consequently, the actual availability is low, although potentially very high.
 | Usability | Medium | The main message and its basis are presented in a structured way and are relatively easily perceivable even for non-experts. Assumptions and limitations are described, and access to the data and calculations is provided and easy. However, obtaining a deeper and more detailed understanding of the model requires specific knowledge, detailed scrutiny, and possibly also assistance from the developers. Explication of the intended use and guidance for use are omitted from the assessment (cf. use plans in Vermaas and Houkes, 2006). Due to the lack of interaction with users, the actual usability in the intended use is unknown.
 | Acceptability | Medium | The open approach can be considered to have increased acceptability in a situation where authorities were accused of unwarranted withholding of important information. On the other hand, the model was developed by non-experts with regard to infectious diseases and vaccines. Also, the credibility of the organization, THL, that the main developers represented was strongly questioned in public at the time of developing and delivering the model. Due to the lack of interaction with users, the actual acceptability in the intended use is unknown.
Efficiency | Intra-assessment efficiency | High | The assessment was developed as a side product of the DARM course. The development of the model consumed about two person-months of work, consisting mostly of the efforts of the course assistant, a high school graduate with good mathematical and computer skills but no prior specific expertise on vaccines or infectious diseases.
 | Inter-assessment efficiency | High | The assessment is mostly structured as independent variables that are applicable in other assessments. The assessment also applies some variables that were developed in previous assessments. However, the calculation in the model was for the most part not coded as independently applicable modules.
Overall evaluation | Effectiveness | Low | Despite mostly relatively good scores with regard to many properties, the overall effectiveness remains low, because the intended use did not take place in reality. The potential for outcome effectiveness can be seen, but the failure of the delivery, i.e. the lack of interaction between the assessment and its use, prevents it from being realized. The realized impacts are mostly process effects, increasing knowledge among the participants of the assessment. As many of the participants work in roles relevant to the interests of the intended user, the Ministry of Social Affairs and Health, it can be assumed that some of that knowledge will eventually trickle into its intended use, but indirectly and with delay.
The example assessment can be considered somewhat typical in the sense that it fails to convey its otherwise good results into practice. Although the evaluation example above can be considered somewhat superficial, and is made only qualitatively, it highlights some important aspects of assessment and model performance:
- In terms of outcome effectiveness, there is a major difference between the potential of an assessment or model to deliver its intended outcomes and the actual delivery thereof.
- The properties that have been least addressed within the common contemporary approaches to performance, namely relevance and availability, are critical for transforming the potential of an assessment or model to effectiveness.
- The delivery of assessment or model outputs to their intended use must be taken into account when considering assessment and model performance.
- Improving the effectiveness of assessments and models is not an issue to be addressed within the communities of assessment and modelling alone, but requires simultaneous development of the use processes and of the capacity of policy making to make use of what assessments and models can deliver.
The major limitations of the assessment indicated by the evaluation according to the Properties of Good Assessment may seem apparent, but they would probably not show up in evaluations applying more conventional approaches. Altogether, the example shows that, despite still lacking explicit methods for its application, the Properties of Good Assessment framework can already be a useful and powerful means for evaluating and managing assessment and model performance.
References
Cooke, R.M., 1991. Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press, New York.
EMA, 2011. Press release 27 July 2011: European Medicines Agency recommends restricting use of Pandemrix. European Medicines Agency. Available: http://www.ema.europa.eu/docs/en_GB/document_library/Press_release/2011/07/WC500109182.pdf
Haas, A., Jaeger, C., 2005. Agents, Bayes, and Climatic Risks – a modular modelling approach. Advances in Geosciences 4, 3–7.
Harmsze, F.A.P., 2000. A modular structure for scientific articles in an electronic environment. A Doctor's Thesis, University of Amsterdam. Available: http://www.science.uva.nl/projects/commphys/papers/thesisfh/Front.html
Hokkanen, P., Kojo, M., 2003. How environmental impact assessment influences decision-making [in Finnish]. Ympäristöministeriö, Helsinki.
Matthews, K.B., Rivington, M., Blackstock, K.L., McCrum, G., Buchan, K., Miller, D.G., 2011. Raising the bar? - The challenges of evaluating the outcomes of environmental modelling and software. Environmental Modelling & Software 26 (3), 247-257.
Niiniluoto, I., 1997. Reference invariance and truthlikeness. Philosophy of Science 64, 546-554.
Pohjola, M.V., Tuomisto, J.T., 2011. Openness in participation, assessment, and policy making upon issues of environment and environmental health: a review of literature and recent project results. Environmental Health 10, 58. doi:10.1186/1476-069X-10-58
THL, 2011a. National narcolepsy task force interim report 31 January 2011. National Institute for Health and Welfare (THL), Helsinki. Available: http://www.thl.fi/thl-client/pdfs/dce182fb-651e-48a1-b018-3f774d6d1875
THL, 2011b. National narcolepsy task force final report 31 August 2011 (in Finnish). National Institute for Health and Welfare (THL), Helsinki. Available: http://www.thl.fi/thl-client/pdfs/c02a3788-a691-47a4-bca8-5161b6cff077
Tuomisto, J.T., Pohjola, M.V., 2007. Open Risk Assessment - A new way of providing information for decision-making. Publications of the National Public Health Institute B18/2007. KTL - National Public Health Institute, Kuopio.
Tuomisto, J.T., Wilson, A., Evans, J.S., Tainio, M., 2008. Uncertainty in mortality response to airborne fine particulate matter: Combining European air pollution experts. Reliability Engineering & System Safety 93, 732-744.
Vermaas, P.E., Houkes, W., 2006. Technical functions: a drawbridge between the intentional and structural natures of technical artefacts. Studies in History and Philosophy of Science 37, 5-18.
Pandemrix should not be used because of narcolepsy risk
Instructions for exercise 4 of the DARM course 2011 can be found here.
Fact discussion:

Opening statement: Pandemrix should not be used any more anywhere because its narcolepsy risk is too high.

Closing statement: Not accepted. Pandemrix is still an effective and safe vaccine. However, due to precautionary reasons, other alternatives should be used when available, because the occurrence of narcolepsy is not understood. (Resolved, i.e., a closing statement has been found and updated to the main page.)
Argumentation:
⇤--J5: . Pandemrix is a safe vaccine and narcolepsy risk is low. --Jouni 18:17, 6 April 2011 (EEST) (type: truth; paradigms: science: attack)
⇤--D3b: . The vaccine may still have been used where no other option was available and upon consideration in individual cases, for instance for people travelling to areas where an epidemic was in progress. --Carmen Gil 11:25, 1 April 2011 (EEST) (type: truth; paradigms: science: attack) [6]
⇤--J1: . Despite risks, Pandemrix is an effective vaccine and has clearly net positive effects in countries where emergency treatment is poorly available for severe swine flu cases. --Jouni 23:05, 31 March 2011 (EEST) (type: truth; paradigms: science: attack)
⇤--E6: . The vaccination used last year will most likely protect also against the possible swine flu epidemic of this year, although the virus has changed a bit. --Sallamari Tynkkynen 10:57, 1 April 2011 (EEST) (type: truth; paradigms: science: attack) [5]
⇤--E5: . Nursing staff in hospitals should be vaccinated; it is their responsibility as medical professionals. --Sallamari Tynkkynen 10:54, 1 April 2011 (EEST) (type: truth; paradigms: science: attack) [7]
Discussion groups (DARM 2011):
- Talk:Assessment of the health impacts of H1N1 vaccination/Group A
- Talk:Assessment of the health impacts of H1N1 vaccination/Group B
- Talk:Assessment of the health impacts of H1N1 vaccination/Group C
- Talk:Assessment of the health impacts of H1N1 vaccination/Group D (Finnish material)
- Talk:Assessment of the health impacts of H1N1 vaccination/Group E (Finnish material)
References
1. Rokotusinfo: Swine flu
2. YLE: EU agency does not find link between Pandemrix and narcolepsy
3. Rokotusinfo
4. European Centre for Disease Prevention and Control (ECDC): Questions and answers
5. THL press release 9 Dec 2010
6. THL press release 25 Aug 2010
7. Helsingin Sanomat: Arkkiatri moittii sikainfluenssarokotteen vastustajia [The Archiater criticizes opponents of the swine flu vaccine] (in Finnish)
8. WHO Europe: Pandemrix® vaccine and increased risk of narcolepsy
9. WHO Global Advisory Committee on Vaccine Safety: Statement on narcolepsy and vaccination
10. European Medicines Agency (EMA): Information page on Swine flu
11. PreventDisease.com: Total of 2300 Reports of Adverse Reactions From Pandemrix Vaccine in Sweden
12. THL recommends to stop the use of Pandemrix