Western University Computer Science, Western Science

PhD Defense

 

Michael Molnar

Error Correction and De Novo Genome Assembly of DNA Sequencing Data

 

Date: Monday, November 13, 2017
Time: 1:30 p.m.
Place: Middlesex College, Room 320
Supervisor: Dr. Lucian Ilie
Thesis Examiners: Dr. Kaizhong Zhang, Dr. Kostas Kontogiannis
Extra-Departmental Examiner: Dr. Greg Gloor (Biochemistry)
External Examiner: Dr. Bin Ma

 

Abstract:

The ability to obtain the genetic code of any species has caused a revolution in the biological sciences. Current technologies are capable of sequencing short pieces of DNA with very high quality. These short pieces of DNA are used to determine the genetic code, called the genome, of any species. This information is key to understanding many aspects of how life functions.

The accuracy of the sequencing is extremely important, since the differences between individuals of the same species are caused by very few changes. All sequencing technologies make errors, and it is usually best to correct them before the data are used in downstream applications. I present RACER, a state-of-the-art error correction program that targets substitution sequencing errors.

There are many substitution error correction programs available for DNA sequencing technologies, so it is important for biologists to know which program is best suited to their sequencing technology. To address this issue, I present a comprehensive survey of the state-of-the-art substitution error correction programs for DNA sequencing data. I also present two programs to evaluate the performance of error correction programs.

Because current technologies can only obtain small pieces of DNA, software is needed to assemble the full genome. Current assembly programs cannot assemble the entire genome because of the repeats in genomes. I present a genome assembly program called SAGE2, which improves upon the current state-of-the-art genome assembly programs.