© Springer Science+Business Media Dordrecht 2016
Eduard Verhagen and Annie Janvier (eds.), Ethical Dilemmas for Critically Ill Babies, International Library of Ethics, Law, and the New Medicine 65, DOI 10.1007/978-94-017-7360-7_7
7. Predicting Outcomes in the Very Preterm Infant
(1) Department of Pediatrics, University of Montreal, 2900 Boulevard Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada
Abstract
Extremely preterm infants have a high mortality and increased long term morbidity compared to babies born at term or later in gestation. They may have delays in motor, cognitive or developmental domains or persistent impairments. Frequently, in the NICU, investigations or evaluations are performed with the goal of trying to predict the long term outcomes for many different purposes. The characteristics of a test required for such predictions differ according to the different purposes. Individual prediction of profoundly abnormal outcome, based on any currently available test, is severely limited, and the use of any test in order to limit or redirect intensive care is difficult to justify, particularly because survivors of neonatal intensive care have almost universally good quality of life.
Extremely preterm infants have a high mortality and increased long term morbidity compared to babies born at term or later in gestation. They may have delays in motor, cognitive or developmental domains or persistent impairments. Frequently, in the NICU, investigations or evaluations are performed with the goal of trying to predict whether an individual infant is likely to be impaired.
According to the widely followed WHO definitions ‘An impairment is any loss or abnormality of psychological, physiological or anatomical structure or function; whereas a disability is any restriction or lack (resulting from an impairment) of ability to perform an activity in the manner or within the range considered normal for a human being; and a handicap is a disadvantage for a given individual, resulting from an impairment or a disability, that prevents the fulfilment of a role that is considered normal (depending on age, sex and social and cultural factors) for that individual’.
Although many publications speak blithely of handicap, we should preferably refer to attempts to predict impairment, because whether or not an impairment leads to significant disability starts to imply societal values, such as what our expectations are. Two children with the same impairment may have different disabilities. Handicap, using this definition, can often be minimized depending on how we define the roles of individuals with impairments, and on whether society seeks to integrate or to marginalize those with disabilities.
7.1 What Do We Want to Predict?
If we then suppose that predicting impairment may be possible, we need to ask which impairments are important to predict. I suggest that minor impairments with little effect on function, which rarely lead to disability, are unlikely to be predictable, and may not be worth trying to predict. We should focus on impairments that have important effects on clinical functioning, or that affect quality of life; these are my focus in this chapter.
The majority of studies examining neonatal outcomes, and associating them with evaluations in the neonatal period, have examined developmental assessments at 2 years of age or earlier. This has been done for a number of reasons: follow up programs become more expensive, and follow-up rates fall, the later the follow up is performed. So in order to achieve reasonable proportions of babies followed up, which is essential for accurate descriptions of the outcomes of the whole group, a compromise has to be reached between how predictive the tests really are for long term function on the one hand, and loss to follow up on the other. As a result, developmental screening tests at between 18 and 24 months of age have been largely used, most commonly the Bayley Scales of Infant Development (BSID), as well as the Griffiths and other tests. Although the BSID is valuable as a screening test for developmental delay, there are good data that demonstrate how poorly 2 year BSID scores predict longer term intellectual outcomes, and how much more poorly they predict functional outcomes. Maureen Hack, for example, showed that only 1/3 of preterm infants who had a low BSID score at 20 months had an IQ that was below normal at 8 years of age [1]. The 5 year CAP study outcomes (with evaluation by IQ testing) have also been compared with the 2 year BSID scores [2]. Overall there was a substantial improvement in mean scores between the 2 time points, and a reduction in the proportion of infants with scores more than 2 SD below the mean. In addition, those infants with the lowest scores on the BSID tended to have the greatest improvements. We have shown that the biggest influence on whether a child would have an improvement in scores was in fact their socio-economic status. Some children with BSID scores more than 3 SD below the mean at 2 years were within the normal range at 5 years of age.
Such data make the reliance on 2 year BSID and other developmental delay screening tools as indicators of neurodevelopmental impairment highly suspect.
The outcomes that we should be trying to predict, and the kind of reliability that we require in our predictions, should really depend on the purpose for which the predictions are being made. For some purposes, a statistically significant correlation between a particular finding and a poor Bayley score at 2 years might be important. For others a very high positive predictive value for profound disability may be required.
To re-emphasize, a low BSID score is not a disability. It is not an impairment; it is just a screening test trigger that should lead to further evaluation and surveillance. As Colombo and Carlson put it [3], ‘The BSID is, to be charitable, only modestly related to school-age cognitive development (i.e., the outcome that is most meaningful to investigators in this field). The BSID is a global measure of developmental status in infancy that assesses and aggregates the timely attainment of relatively crude milestones in infancy and early childhood.’
So the ability of a neonatal finding to predict a low BSID score is of limited interest, and is of extremely questionable value when we are discussing major changes in medical care based on predictions from prior literature. Long term outcomes which affect function and quality of life are the outcomes most important to families, because of their significance for daily life.
I would say that if the purpose of attempting to predict outcomes is to select patients for follow up, then a high sensitivity is required, to ensure that few patients with delays or impairments are missed. Prediction of a low 2-year Bayley score, which itself has a fairly high sensitivity and low specificity for long term functional problems, would then be reasonable. Enrolling identified children in a follow up program is not generally a harm, so enrolling more children than will eventually prove to have significant impairments would be acceptable.
If the reasoning is to initiate a targeted early intervention, then again high sensitivity is required, so that all infants who will potentially benefit will be enrolled; if the intervention carries potential risks, or is costly, then a high specificity is also required.
If we wish, by our predictions, to prepare parents for their future, we need a high positive predictive value (PPV) for outcomes that are going to impact on their lives, and a high negative predictive value (NPV) to ensure that we do not inappropriately reassure them. In this case a high PPV for a low BSID score is of questionable benefit, as there is little evidence that a BSID <70 affects the function of families.
If we are trying to understand the causes of disability among preterm infants, then statistically significant associations, even if PPV is low, may help to direct our attention to findings which require further study.
In our efforts to perform research to reduce disability, or the impacts of disability, we really need predictive methods that have a high PPV for these outcomes (for example to enroll infants in prevention trials without enrolling infants at low risk) and a high sensitivity, so that a high proportion of affected children will be enrolled. In this case, minor or moderately severe impairments might be worthwhile predicting.
In contrast some articles explicitly state that the purpose of performing a particular test is that, with the results, we can redirect intensive care to comfort care, and prevent the survival of disabled children. I propose that for such a purpose, if it is considered morally acceptable, we should demand findings that have extremely high PPVs, and only for profoundly abnormal outcomes. In such a situation a prediction that a 2 year BSID will be 2 SD below the mean is entirely inadequate.
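To illustrate how demanding an extremely high PPV is, the sketch below applies Bayes' theorem to show how the PPV of a test depends on how common the predicted outcome is. The function name `ppv_from_prevalence`, the sensitivity and specificity of 0.90, and the prevalence values are all assumptions chosen for illustration; they are not estimates for any real neonatal test or outcome.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    # Fraction of the whole population that is affected AND tests abnormal
    true_positive_rate = sensitivity * prevalence
    # Fraction of the whole population that is unaffected AND tests abnormal
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Assumed test performance (hypothetical): sensitivity and specificity both 0.90
for prevalence in (0.05, 0.10, 0.30):
    ppv = ppv_from_prevalence(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.2f}")
```

Under these assumptions, when the outcome affects 10% of infants, half of all abnormal results are false positives, and the PPV falls further as the outcome becomes rarer. A test that appears accurate can therefore still be a weak basis for an individual prediction of a rare, profoundly abnormal outcome.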
7.2 Predicting Outcomes Before Birth, Lack of Prediction by Gestational Age
When deciding on immediate neonatal intervention for an infant about to deliver in the extremely preterm range, we have limited data to use. Decisions are usually based on estimated gestational age (EGA) and often follow recommendations from professional societies. It is clear from multiple data sources that gestational age among extremely preterm infants is strongly correlated with survival, even though birth weight is a better predictor of survival. In contrast, a recent systematic review [4] shows that when infants are examined at a sufficiently advanced age (at least 4 years) there is no distinction in intellectual impairment between infants born at 22, 23, 24, 25 or 26 weeks. Although they differ in their survival, among survivors there is little evidence of different frequencies of significant impairments [5].
Therefore gestational age cannot be used to predict long term outcomes; sex, however, can be, as there are consistent and substantial differences between boys and girls [6]. This would suggest that if predictions of long term intellectual impairments are going to be made to decide on active intervention, it makes no sense to alter intervention based on gestational age; it would be more rational to resuscitate girls, and not boys. The moral acceptability of this is worthy of discussion.