COVID-19 origins: Rumor of Chinese scientists deleting sequence data cleared up

A rumor that Chinese scientists deleted genetic sequence data from COVID-19 patients has been cleared up: The speculation resulted from a simple editorial oversight by a scientific journal.

The story dates back to early 2020, when researchers at Wuhan University submitted their findings on a new coronavirus sequencing method to the international academic journal Small. The related sequences, taken from 34 patients, were also uploaded to an online database called the Sequence Read Archive, maintained by the U.S. National Institutes of Health (NIH).

The paper was published in June 2020.

One year later, American virologist Jesse Bloom at the Fred Hutchinson Cancer Center in Seattle was unable to find the relevant data while researching the virus.

According to The New York Times, Bloom emailed the Chinese scientists on June 6 to ask where the data was but did not get a response. On June 22, he posted a report, which was covered by multiple media outlets including The New York Times, raising questions about why the Chinese scientists had deleted the data.

Zeng Yixin, deputy head of China’s National Health Commission (NHC), cleared up the confusion at a press briefing on July 22.

He explained that the editors at Small deleted a paragraph noting that the sequence data was in the Sequence Read Archive. Without the paragraph, readers would not know that the sequences existed or where to find them.

Screenshot of the online correction from the journal Small. /Wiley Online Library

On June 9, 2020, the Chinese researchers received the edited draft of the paper from Small and found that the paragraph had been deleted, so they concluded that the sequences were no longer useful and did not need to be archived in the U.S. database. They emailed NIH asking it to withdraw the data, and NIH did so, according to Zeng.

Zeng also mentioned that the sequences had been uploaded to a new Genome Sequence Archive at the China National Center for Bioinformation, which is open to the public globally.

He also said the earliest sequence among them dates to January 30, a substantial time after the outbreak began, so the sequences do not come from the earliest phase of the outbreak and are of limited value in tracing the virus's origins.

The journal Small also confirmed this account. According to The New York Times, Plamena Dogandzhiyski, an editor at Small, said, “The data availability statement was mistakenly deleted. We will issue a correction very shortly, which will clarify the error and include a link to the depository where the data is now hosted.”

The journal posted a formal correction on July 29.