Pragmatic soft-decision data readout of encoded large DNA
Abstract
Large DNA molecules encoding data can be cloned and stored in vivo. They are written once and replicate stably for repeated retrieval, which makes them promising for economical data archiving. Nanopore sequencing is well suited to reading large DNA because it is fast and produces long reads. However, data readout is commonly limited by insertion and deletion (indel) errors and by the complexity of sequence assembly. Here, a pragmatic soft-decision data readout is presented that achieves assembly-free sequence reconstruction, indel error correction, and readout at ultra-low coverage. Specifically, a watermark is embedded within the large DNA fragments so that raw reads can be localized directly by watermark alignment, avoiding complex read assembly. A soft-decision forward-backward algorithm is proposed that identifies indel errors and passes probability information to the error correction code, enabling error-free data recovery. In addition, the number of state transitions is kept to a minimum and read segmentation is incorporated to speed up readout. Readout assays on two circular plasmids (~51 kb) with different coding rates achieved error-free recovery directly from noisy reads (error rate ~1%) at a coverage of 1-4×. Simulations on large-scale datasets across a range of error rates further confirm the scalability of the method and its robust performance under extreme conditions. This readout method enables nearly single-molecule recovery of large DNA and is particularly suitable for rapid readout in DNA storage.
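To make the idea of a soft-decision forward-backward pass concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple channel in which each base is independently deleted, preceded by a random inserted base, or transmitted with a possible substitution. It also assumes that watermark bases sit at known positions of a template while data bases (marked `N`) are unknown. The names `make_priors` and `soft_decode`, the template layout, and the error probabilities are all hypothetical choices made for illustration.

```python
BASES = "ACGT"

def make_priors(template):
    """'N' marks an unknown data base; any other letter is a known watermark base."""
    rows = []
    for base in template:
        if base == "N":
            rows.append([0.25] * 4)  # uniform prior over A, C, G, T
        else:
            rows.append([1.0 if b == base else 0.0 for b in BASES])
    return rows

def soft_decode(received, priors, p_ins=0.01, p_del=0.01, p_sub=0.01):
    """Return, for every transmitted position, a posterior over BASES."""
    n, m = len(priors), len(received)
    p_tx = 1.0 - p_ins - p_del   # probability that a base is transmitted
    ins = p_ins / 4.0            # an inserted base is assumed uniformly random

    def emit(k, j):
        # P(received[j] | transmitted base BASES[k]) under substitutions.
        return 1.0 - p_sub if BASES[k] == received[j] else p_sub / 3.0

    def marg(i, j):
        # Emission probability averaged over the prior of position i.
        return sum(priors[i][k] * emit(k, j) for k in range(4))

    # fwd[i][j]: probability of producing received[:j] from the first i bases.
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * ins
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * p_del
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * p_tx * marg(i - 1, j - 1)

    # bwd[i][j]: probability of producing received[j:] from bases i onward.
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if j < m:
                bwd[i][j] += bwd[i][j + 1] * ins
            if i < n:
                bwd[i][j] += bwd[i + 1][j] * p_del
            if i < n and j < m:
                bwd[i][j] += bwd[i + 1][j + 1] * p_tx * marg(i, j)

    # Combine both passes into per-position posteriors (the soft information).
    posteriors = []
    for i in range(n):
        scores = []
        for k in range(4):
            total = 0.0
            for j in range(m + 1):
                path = p_del * bwd[i + 1][j]  # base i was deleted
                if j < m:                     # base i was read as received[j]
                    path += p_tx * emit(k, j) * bwd[i + 1][j + 1]
                total += fwd[i][j] * path
            scores.append(priors[i][k] * total)
        norm = sum(scores)
        posteriors.append([s / norm for s in scores])
    return posteriors

# Hypothetical example: one watermark base is deleted from the read.
template = "GANCCNTANGCN"
sent = "GATCCGTAAGCT"
read = sent[:3] + sent[4:]
posteriors = soft_decode(read, make_priors(template))
```

Each row of `posteriors` is a probability vector over `ACGT` that a downstream error correction decoder could take as soft input in place of a hard base call. The sketch fills the full lattice, so it costs O(nm) time per read and works in plain probabilities. A practical version for long reads would need log-domain or rescaled arithmetic and a bounded drift window, and it does not attempt the reduced state transitions or read segmentation that the abstract mentions.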