[Source: Proceedings of the National Academy of Sciences of the United States of America. Abstract, edited.]
A near full-length HIV-1 genome from 1966 recovered from formalin-fixed paraffin-embedded tissue
Sophie Gryseels, Thomas D. Watts, Jean-Marie Kabongo Mpolesha, Brendan B. Larsen, Philippe Lemey, Jean-Jacques Muyembe-Tamfum, Dirk E. Teuwen, and Michael Worobey
PNAS first published May 19, 2020 https://doi.org/10.1073/pnas.1913682117
Edited by Beatrice H. Hahn, University of Pennsylvania, Philadelphia, PA, and approved April 6, 2020 (received for review August 16, 2019)
Inferring the precise timing of the origin of the HIV/AIDS pandemic is of great importance because it offers insights into which factors did—or did not—facilitate the emergence of the causal virus. Previous estimates have implicated rapid development during the early 20th century in Central Africa, which wove once-isolated populations into a more continuous fabric. We recovered the first HIV-1 genome from the 1960s, and it provides direct evidence that HIV-1 molecular clock estimates spanning the last half-century are remarkably reliable. And, because this genome itself was sampled only about a half century after the estimated origin of the pandemic, it empirically anchors this crucial inference with high confidence.
With very little direct biological data of HIV-1 from before the 1980s, far-reaching evolutionary and epidemiological inferences regarding the long prediscovery phase of this pandemic are based on extrapolations by phylodynamic models of HIV-1 genomic sequences gathered mostly over recent decades. Here, using a very sensitive multiplex RT-PCR assay, we screened 1,645 formalin-fixed paraffin-embedded tissue specimens collected for pathology diagnostics in Central Africa between 1958 and 1966. We report the near-complete viral genome in one HIV-1 positive specimen from Kinshasa, Democratic Republic of Congo (DRC), from 1966 (“DRC66”)—a nonrecombinant sister lineage to subtype C that constitutes the oldest HIV-1 near full-length genome recovered to date. Root-to-tip plots showed the DRC66 sequence is not an outlier as would be expected if dating estimates from more recent genomes were systematically biased; and inclusion of the DRC66 sequence in tip-dated BEAST analyses did not significantly alter root and internal node age estimates based on post-1978 HIV-1 sequences. There was larger variation in divergence time estimates among datasets that were subsamples of the available HIV-1 genomes from 1978 to 2014, showing the inherent phylogenetic stochasticity across subsets of the real HIV-1 diversity. Our phylogenetic analyses date the origin of the pandemic lineage of HIV-1 to a time period around the turn of the 20th century (1881 to 1918). In conclusion, this unique archival HIV-1 sequence provides direct genomic insight into HIV-1 in 1960s DRC, and, as an ancient-DNA calibrator, it validates our understanding of HIV-1 evolutionary history.
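The root-to-tip check mentioned in the abstract can be illustrated with a minimal sketch. It regresses each sequence's genetic distance from the tree root on its sampling year: the slope approximates the substitution rate, the x-intercept approximates the root date, and an archival tip is judged by how far it falls from the line fitted to the modern tips. This is only a simplified stand-in for the tip-dated BEAST analyses the authors ran. The function names are hypothetical, and all numbers below are synthetic values chosen for illustration, not data or results from the paper.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of distance = rate * date + intercept.

    Returns (rate, intercept, root_date), where root_date is the
    x-intercept, i.e. the date at which the fitted distance is zero.
    """
    mean_date, mean_dist = mean(dates), mean(distances)
    sxx = sum((d - mean_date) ** 2 for d in dates)
    sxy = sum((d - mean_date) * (y - mean_dist)
              for d, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_dist - rate * mean_date
    root_date = -intercept / rate
    return rate, intercept, root_date


def residual(date, distance, rate, intercept):
    """Observed distance minus the distance predicted by the fitted line."""
    return distance - (rate * date + intercept)


# SYNTHETIC example values (assumptions, not taken from the paper):
# sampling years and root-to-tip distances in substitutions per site.
modern_dates = [1980, 1990, 2000, 2010]
modern_distances = [0.161, 0.179, 0.201, 0.219]

rate, intercept, root_date = root_to_tip_regression(modern_dates,
                                                    modern_distances)

# A hypothetical archival tip sampled in 1966, evaluated against the line
# that was fitted WITHOUT it.
old_residual = residual(1966, 0.133, rate, intercept)
largest_modern_residual = max(
    abs(residual(d, y, rate, intercept))
    for d, y in zip(modern_dates, modern_distances)
)

print(f"rate ~ {rate:.5f} subs/site/year, root date ~ {root_date:.0f}")
print(f"archival residual {old_residual:+.5f}, "
      f"largest modern residual {largest_modern_residual:.5f}")
```

In this toy setup, an archival residual no larger than the scatter of the modern tips is the pattern expected when the clock fitted to recent sequences extrapolates well into the past. A large residual would suggest the kind of systematic bias the abstract describes.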
HIV-1 – evolution – virus – phylogeny
To whom correspondence may be addressed. Email: email@example.com.
Author contributions: M.W. designed research; S.G., T.D.W., and M.W. performed research; J.-M.K.M., J.-J.M.-T., and D.E.T. contributed new reagents/analytic tools; S.G., T.D.W., B.B.L., P.L., and M.W. analyzed data; and S.G. and M.W. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
Data deposition: The DRC66 genome sequence is deposited in GenBank with accession number MN082768. Alignments and BEAST xml files are available at https://github.com/sophiegryseels/DRC66.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1913682117/-/DCSupplemental.
Published under the PNAS license.
Keywords: HIV/AIDS; DRC; Genetics; Evolution.