Information-Theoretic Approach to Nanopore Sequencing

Project synopsis

Fast and inexpensive DNA sequencing, enabled by second-generation sequencers, is beginning to impact society through applications ranging from personalized medicine to the understanding of ecological systems. Wider applicability of second-generation sequencing is limited, however, by the short length of the DNA fragments that can be read. Nanopore sequencing has the potential to overcome several shortcomings of state-of-the-art short-read sequencing.

The goal of this project is to create new foundational theory and algorithms that enable several applications of nanopore sequencing. This research will help nanopore technology become a leading next-generation sequencing approach. The project also includes a unique inter-university education and research program with joint, collaborative student advising and curricular development.

To realize the advantages of nanopore sequencing, methods for reducing and combating sequencing errors need to be developed. In a nanopore sequencer, DNA is translocated through a nanopore, and the variations in ion current through the pore are measured to infer the DNA sequence. The mapping from the DNA sequence to the observed current trace suffers from several error-causing impairments: multiple nucleotides affect each observation, the nanopore response varies randomly, dwell times vary, synchronization errors occur, and the measurements are noisy. This project develops a holistic approach using tools from information theory and bioinformatics, based on multiple interacting thrusts:

  1.  Mathematical models of the nanopore sequencer,
  2.  Information theory for nanopore sequencing,
  3.  Decoding algorithms exploiting the structure of the nanopore channel, and
  4.  Theory and methodology for applications in DNA forensics, DNA phasing, and DNA information storage.

We couple these thrusts with an existing experimental nanopore sequencing research program, which guides the models, theory, and algorithms with specific data and validates the resulting ideas.
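To make the channel impairments listed above concrete, the following is a minimal simulation sketch and not the project's actual model. It assumes that each observation depends on K = 3 consecutive nucleotides, that dwell times are geometric, and that skipped k-mers are the only synchronization error. The level table, current units, parameter values, and the names `make_level_table` and `simulate_trace` are all hypothetical, chosen only for illustration.

```python
import itertools
import random

K = 3  # assumed number of nucleotides influencing each observation


def make_level_table(k, rng):
    """Assign a hypothetical mean current level (arbitrary units) to every k-mer."""
    return {"".join(kmer): rng.uniform(60.0, 120.0)
            for kmer in itertools.product("ACGT", repeat=k)}


def simulate_trace(seq, levels, rng, k=K, p_stay=0.8, p_skip=0.05,
                   level_jitter=1.0, noise_sd=2.0):
    """Return a list of current samples for the DNA string `seq`.

    All parameters are illustrative assumptions, not measured values.
    """
    trace = []
    for i in range(len(seq) - k + 1):
        # Synchronization error: the strand occasionally slips past a k-mer,
        # so that k-mer produces no samples at all.
        if rng.random() < p_skip:
            continue
        # Multiple nucleotides affect each observation (k-mer lookup), and
        # the pore response varies randomly from one event to the next.
        level = levels[seq[i:i + k]] + rng.gauss(0.0, level_jitter)
        # Dwell-time variation: a geometric number of samples per k-mer,
        # each corrupted by additive Gaussian measurement noise.
        while True:
            trace.append(level + rng.gauss(0.0, noise_sd))
            if rng.random() >= p_stay:
                break
    return trace


rng = random.Random(0)  # fixed seed so the sketch is reproducible
levels = make_level_table(K, rng)
trace = simulate_trace("ACGTTGCAAC", levels, rng)
print(len(trace), [round(x, 1) for x in trace[:5]])
```

Even in this toy setting, the decoder sees neither the k-mer boundaries nor the number of samples per k-mer, which is why decoding must handle segmentation and symbol estimation jointly.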