Highly accurate genome polishing with DeepPolisher: Enhancing the foundation of genomic research

Understanding heredity, disease, and evolution hinges on deciphering the genome, which is encoded in DNA bases. DNA sequencers read these bases, but achieving accuracy at scale is difficult because base pairs are so small. A near-perfect reference genome is crucial: assembly errors can hinder gene identification and cause disease-causing variants to be missed. Genome assembly involves sequencing the same genome repeatedly so that errors can be corrected iteratively. However, because the human genome contains three billion nucleotides, even a small error rate adds up to a large number of errors, which limits the assembly's utility.

To address these challenges, DeepPolisher, an open-source tool for correcting errors in genome assemblies, was developed. The pipeline, described in a recent paper, reduces assembly errors by 50% and insertion and deletion (indel) errors, which are particularly disruptive to gene identification, by 70%. Several sequencing technologies exist. Illumina's method improves signal but limits read length. Long-read sequencing technologies were initially error-prone, but a collaboration between Pacific Biosciences and Google reduced their error rates.

DeepPolisher is adapted from DeepConsensus and uses a Transformer architecture trained on a highly characterized human genome. It identifies and corrects the errors that remain in a genome assembly. The large reduction in indel errors is especially important for preventing gene annotation problems. Overall, the tool raises assembly quality from Q66.7 to Q70.1 on average.

The Human Pangenome Reference Consortium's second data release benefited from DeepPolisher, which reduced errors in its assemblies and supports more accurate diagnosis of genetic diseases across diverse ancestries. DeepPolisher is released as open source so that these advances can be shared broadly with the scientific community.
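To make the Q-score figures concrete, the following minimal sketch converts them into expected error counts. It assumes the Q-scores are Phred-scaled (Q = -10 · log10 of the per-base error probability), which is the usual convention for assembly quality but is not stated explicitly in this note. It also assumes a genome of exactly three billion bases, as quoted above. The function names are illustrative and are not part of DeepPolisher.

```python
GENOME_SIZE = 3_000_000_000  # "three billion nucleotides", as quoted in the text


def q_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled Q-score to a per-base error probability.

    Assumes the Phred convention Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)


def expected_errors(q: float, genome_size: int = GENOME_SIZE) -> float:
    """Expected number of erroneous bases in an assembly of the given size."""
    return q_to_error_rate(q) * genome_size


before = expected_errors(66.7)  # average quality before polishing
after = expected_errors(70.1)   # average quality after polishing
reduction = 1 - after / before  # fraction of errors removed

print(f"Q66.7: about {before:.0f} expected errors per genome")
print(f"Q70.1: about {after:.0f} expected errors per genome")
print(f"Relative reduction: {reduction:.0%}")
```

Under these assumptions, the move from Q66.7 to Q70.1 corresponds to roughly halving the number of errors, which is in line with the reported 50% reduction in assembly errors.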