Lieuwe D. J. Bos, Pratik Sinha, Robert P. Dickson
European Respiratory Journal 2020; DOI: 10.1183/13993003.01768-2020
By prematurely phenotyping patients with COVID-19, we expose ourselves and our patients to considerable and preventable risk. If we do not insist on data-driven phenotypes, our cognitive biases guarantee that we'll end up with phenotype-driven data.
The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) poses an unprecedented global healthcare challenge. Severe novel Coronavirus disease (COVID-19) pneumonia frequently causes hypoxemic respiratory failure, manifesting in the acute respiratory distress syndrome (ARDS). Recently, authors have proposed distinct clinical phenotypes of COVID-19 pneumonia in several influential, high-profile essays. For example, in a recent Perspective in this journal, authors speculated that COVID-19 has five phenotypic presentations: three phenotypes based on severity of hypoxemia and need for supportive care (no hypoxemia, mild hypoxemia, and moderate hypoxemia), and two phenotypes of severely hypoxemic patients based on additional physiologic and clinical features. Aligned with other recent efforts to phenotype COVID-19 patients, the authors subtyped patients into a supposedly prevalent phenotype with normal compliance, low lung weight, and predominant perfusion abnormalities (“L”-phenotype), and a less-prevalent phenotype with more typical features of ARDS such as profound consolidation and low compliance (“H”-phenotype). The authors advocate for distinct management strategies for these purported phenotypes, including permitting increased tidal volumes and restricted positive end-expiratory pressure in the “L” phenotype patients.
The urge to phenotype patients with COVID-19 pneumonia is understandable and relatable. Outside of critical care medicine, the past decade has been characterised by major advances in precision medicine, promising tailored therapies based on individual patients’ physiological and biological characteristics. The emergence of a novel disease without effective treatment incentivizes heuristic-based identification of subsets of patients who may respond similarly to a particular intervention. Yet this temptation to define phenotypes based on early clinical experience should be resisted. By prematurely phenotyping patients, we risk causing considerable harm and generating more static than signal. In this Perspective, we provide four arguments against premature phenotyping, discuss the features of responsible phenotyping, and recommend a path forward in advancing our understanding of the true heterogeneity underlying patients with COVID-19.
The first, and simplest, argument against premature phenotyping is that our initial intuitions are often wrong. As a vivid example, a prominent essay recently asserted without qualification that “soon after onset of respiratory distress from COVID-19, patients initially retain relatively good compliance despite very poor oxygenation.” This claim, while not supported by the references cited, formed the basis for extended discussions of the pathophysiology and tailored management of patients with this purported “L phenotype” of COVID-19 (discussed above). Yet subsequent cohort studies have demonstrated that lung compliance in COVID-19 patients is in fact quite low, entirely congruent with non-COVID-19 ARDS cohorts, and normally distributed along a continuum rather than existing as discrete phenotypes. Further, purported radiographic and physiologic features of these phenotypes (e.g. dense airspace filling on CT scans paired with decreased compliance in the “H” phenotype) have subsequently been shown to be entirely uncorrelated with each other. Identification of clinical phenotypes, and speculation regarding their underlying biology, should be deferred until after careful, objective inspection of adequately sized cohorts. Human intuitions are simply too fallible – and clinical experience too contingent and heterogeneous – to reliably identify phenotypes without sufficient data.
A related argument against premature phenotyping is that it exacerbates our inherent susceptibility to cognitive biases. Once we are informed of clinical categories (however false they may be), our brains treat them as real and begin selectively filtering our observations. As an example, following dissemination of the since-disproven claim that COVID-19 patients have preserved lung compliance, the myth was reinforced by common cognitive traps. The Baader-Meinhof phenomenon (also called the “frequency illusion”) ensured that once clinicians were prompted to notice COVID-19 patients with near-normal lung compliance, they began noticing them everywhere (when in fact their frequency was no higher than in non-COVID ARDS). Similarly, clinicians could dismiss low-compliance COVID-19 cases by unintentionally committing the “no true Scotsman” fallacy: explaining away purported exceptions on an ad hoc basis by claiming that low-compliance COVID-19 cases must be atypical, as “real COVID-19” has near-normal respiratory mechanics. If we do not insist on data-driven phenotypes, our cognitive biases guarantee that we'll end up with phenotype-driven data.
A third argument against premature phenotyping is that it distracts us from sound, evidence-based practices. Clinical outcomes in ARDS have improved markedly in recent decades, driven not by blockbuster drug discoveries, but rather by incremental improvements in the delivery of supportive care. These slow but cumulative advances have been built on hard-won lessons from rigorous randomised controlled trials. By their design, these trials have “lumped” heterogeneous ARDS patients together under a syndrome-based definition. Despite this, these trials have provided the field with an extensive literature informing evidence-supported therapies. By presumptuously splitting COVID-19 patients into false phenotypes – and by recommending “tailored management” based on untested physiologic intuitions – authors have advocated for abandonment of what remains our most effective tool against COVID-19: meticulous, evidence-driven critical care delivery.
A final argument against premature phenotyping is that it worsens the already-unfavorable ratio of signal to noise in the ICU. At the bedside, critical care physicians must filter, process, and interpret a tremendous stream of data generated by every patient: physiological, biochemical, radiographic, etc. Clinicians must synthesise these findings with the published literature, which is similarly daunting: more than 10 000 PubMed-indexed manuscripts on COVID-19 were published in the first 4 months of 2020. This deluge of information threatens the most overlooked and precious resources in the ICU: clinicians’ attention, time, and bandwidth. By needlessly clouding the clinical picture, false phenotypes consume time on rounds and distract us from more immediate concerns. As a field, our research prioritisation has been similarly clouded: investigators’ time and resources are squandered trying to explain the biology underlying clinical phenomena that, upon inspection of patient data, simply do not exist.
So what does responsible phenotyping look like? As with any scientific experiment, there needs to be an explicit purpose as to why we seek phenotypes. In medical science, this mandate ultimately converges on improving patient outcomes (although gleaning novel biological and clinical insights is an equally important motivating factor, as they may be critical to achieving this goal). To that end, our field has recent examples of empirically derived phenotypes that have been successfully used to identify treatment-responsive and/or biologically distinct subgroups. In asthma, for example, using a data-driven, unbiased clustering approach, two distinct phenotypes were identified based on interleukin-13 inducible gene expression. The phenotype signature specific to high gene expression was later shown, in a randomised controlled trial (RCT), to be responsive to a monoclonal antibody that specifically inhibits interleukin-13 activity. In ARDS, again using unbiased clustering methods, two phenotypes have been identified with distinct biological and clinical characteristics, consistent across five RCTs, and with markedly different clinical outcomes. Importantly, in three of these RCTs, divergent responses to the randomised interventions were observed. Further, simpler models have recently been described that offer the potential for the clinical application of these phenotypes.
These data-driven approaches to clustering are not impervious to errors and misuse. These are powerful tools and, independent of the validity of the research question or study design, clusters will inevitably emerge. It is, therefore, incumbent on the investigators to demonstrate the validity and utility of the identified phenotypes. In the absence of ground truth, the best surrogates for validity are 1) robustness, 2) consistency and 3) reproducibility in data external to the population from which the phenotypes were derived. In almost all algorithms, the phenotypes identified are highly dependent on the choice of predictor variables. Taking critically ill COVID-19 patients as a specific example, it is imperative to acknowledge that we are studying complex biological systems in which inter-connected pathways share non-linear associations. Seeking univariate solutions in such populations, therefore, seems unlikely to yield meaningful subgroups other than for prognostication. Further, univariate solutions, particularly when sought prematurely, can be more susceptible to the central limit theorem. This mathematical theorem states that given a sufficiently large sample size, the distribution of means of a variable will converge to a normal distribution, suggesting that continuous variables that appear bimodal with limited data will become normally distributed over time. Thus, both in terms of biologic plausibility and mathematical principles, we are unlikely to derive useful phenotypes if we anchor on simplistic, one-dimensional features of disease.
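The practical consequence of these two points can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not an analysis of patient data: the standard-normal "trait", the sample sizes (15 and 1000), the use of Sarle's bimodality coefficient with the conventional 5/9 cut-off, and the simple one-dimensional two-means split are all illustrative choices that do not come from the studies discussed here. It shows how often small samples drawn from a single normal continuum look bimodal by chance, and that a clustering step returns two well-separated "phenotypes" even when the underlying data are unimodal.

```python
import random
import statistics


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient, population form: (skew^2 + 1) / kurtosis.

    Roughly 1/3 for a normal distribution; values above 5/9 (the value for a
    uniform distribution) are conventionally read as hinting at bimodality.
    """
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    variance = statistics.fmean([d ** 2 for d in deviations])
    skew = statistics.fmean([d ** 3 for d in deviations]) / variance ** 1.5
    kurt = statistics.fmean([d ** 4 for d in deviations]) / variance ** 2
    return (skew ** 2 + 1) / kurt


def share_looking_bimodal(rng, sample_size, replicates=500, threshold=5 / 9):
    """Fraction of samples from ONE normal distribution that exceed the threshold."""
    flagged = 0
    for _ in range(replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if bimodality_coefficient(sample) > threshold:
            flagged += 1
    return flagged / replicates


def two_means_split(values, iterations=25):
    """Plain one-dimensional k-means with k=2; returns the two cluster means."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        cut = (low + high) / 2  # boundary halfway between the current centres
        left = [v for v in values if v <= cut]
        right = [v for v in values if v > cut]
        low, high = statistics.fmean(left), statistics.fmean(right)
    return low, high


rng = random.Random(2020)  # fixed seed so the sketch is repeatable

# Hypothetical trait: every "patient" comes from the same normal distribution.
small_share = share_looking_bimodal(rng, sample_size=15)
large_share = share_looking_bimodal(rng, sample_size=1000)

# Clustering a large, unimodal sample still yields two separated groups.
continuum = [rng.gauss(0.0, 1.0) for _ in range(1000)]
low_mean, high_mean = two_means_split(continuum)

print(f"samples of 15 flagged as bimodal:   {small_share:.1%}")
print(f"samples of 1000 flagged as bimodal: {large_share:.1%}")
print(f"two-means 'phenotype' centres: {low_mean:.2f} and {high_mean:.2f}")
print(f"bimodality coefficient of the clustered data: "
      f"{bimodality_coefficient(continuum):.2f}")
```

In this sketch the two "phenotype" centres differ simply because the algorithm was asked for two groups; the separation says nothing about whether distinct subpopulations exist, which is why external validation is needed.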
To avoid these pitfalls, predictor variables in multivariate models should be selected with the research question in mind and be highly informative in terms of effectively splitting the population. Moreover, the use of these complex data-science algorithms, intended to overcome cognitive bias, will be limited to theorising unless they are accompanied by a measurement system that can identify the phenotypes rapidly and consistently. Regardless of the motivation or approach used, phenotyping in critical care is typically a data-hungry exercise, and in studies currently purporting COVID-19 phenotypes, the requisite quantity and quality of data are regrettably lacking. Ultimately, however, the true success of phenotypes in diseases will be judged by the identification of actionable interventions. In critical care, although many examples of heterogeneous treatment effects in phenotypes have been described in secondary analyses, their efficacy will need testing via randomised controlled trials (RCTs). The mere identification of disease phenotypes – whether derived prematurely or responsibly – should not itself change clinical practice, but instead inform prospective, “phenotype-aware” trials.
In summary, the COVID-19 pandemic has posed novel challenges for clinicians and researchers. While we share the ultimate goal of tailoring therapies to the specific pathophysiology of each patient's condition, it is imperative that we first objectively collect, collate, and interpret sufficient data to “type” and understand the disease comprehensively. By prematurely phenotyping patients with COVID-19, we expose ourselves and our patients to considerable and preventable risk.
The perils of premature phenotyping. By focusing on extremes of a normally distributed continuum, we risk creating arbitrary “phenotypes” that are not representative of meaningful underlying differences in pathophysiology. Premature phenotyping is often based on erroneous initial impressions, and contributes to cognitive biases including the Baader-Meinhof phenomenon (the “frequency illusion”) and the “no true Scotsman” fallacy (excluding incompatible observations via an ad hoc purity test). Premature phenotyping can compromise the delivery of care by inspiring deviation from evidence-based practices as well as contributing needlessly to the cognitive load of clinicians.
Conflict of interest: Dr. Bos reports grants from the Dutch lung foundation (Young investigator grant), grants from the Dutch lung foundation (Public-Private Partnership grant), personal fees from Bayer (for consultancy), and grants from the Dutch lung foundation (Dirkje Postma Award), outside the submitted work.
Conflict of interest: Dr. Sinha has nothing to disclose.
Conflict of interest: Dr. Dickson has nothing to disclose.