BIOINFORMATION AND BIOCOMPUTATION LABORATORY

Address

Middlesex College MC326, University of Western Ontario, London, Ontario, N6A 5B7, Canada, Tel: (519) 661-2111 ext. 84024

Research meetings: Tuesdays, 1:30pm-3:30pm, MC 326.

Current Research Projects

Theoretical studies of bioinformation and biocomputation (Zihao Wang)

DNA nanocomputing by self-assembly (Amir Simjour)

Molecular Distance Maps: A quantitative method for measuring and visualizing species' relatedness in 3D (Stephen Solis-Reyes, Gurjit Randhawa, Rallis Karamichalis)

Research Description

The research topics investigated are complementary facets of the same problem, believed to be among the most fundamental ones facing scientists today: understand biological information in all its aspects, understand and harness the computational capabilities of cellular organisms, understand the underlying computational aspects of nature. This line of investigation has sprouted from interest in the field of DNA computing wherein it was first demonstrated that computation can be accomplished entirely by the use of biological means. DNA computing is based on the idea that molecular biology processes can be used to perform arithmetic and logic operations on information encoded as DNA strands.

A DNA single-strand consists of four different types of units called bases strung together by an oriented backbone like beads on a wire. The bases are Adenine (A), Guanine (G), Cytosine (C) and Thymine (T), and A can chemically bind to an opposing T on another single strand, while C can similarly bind to G. Bases that can thus bind are called Watson/Crick (W/C) complementary , and two DNA single strands with opposite orientation and with W/C complementary bases at each position can bind to each other to form a DNA double strand in a process called base-pairing. To encode information using DNA, one can choose an encoding scheme mapping the original alphabet onto strings over {A, C, G, T}, and proceed to synthesize the obtained information-encoding strings as DNA single strands. A computation will consists of a succession of bio-operations , such as cutting and pasting DNA strands, separating DNA sequences by length, extracting DNA sequences containing a given pattern or making copies of DNA strands. The DNA strands representing the output of the computation can then be read out and decoded.

Herein lies a wealth of problems to be explored, stemming from the fact that encoding information in DNA differs from encoding it electronically, and bio-operations differ from electronic computer operations. A fundamental difference arises from the fact that in electronic computing data interaction is fully controlled while, in a test-tube DNA-computer, free-floating data-encoding DNA single strands can bind because of W/C complementarity. Another difference is that, in DNA computing, a bio-operation usually consumes both operands. This implies that, if one of the operands is either involved in an illegal binding or has been consumed by a previous bio-operation, it is unavailable for the desired computation. Yet another difference is that, while in electronic computing a bit is a single individual element, in DNA experiments each submicroscopic DNA molecule is usually present in millions of identical copies. The bio-operations operate in a massively parallel fashion on all identical strands. However, this process is governed by the laws of chemistry and thermodynamics, and the output obeys statistical laws.

Differences like the ones mentioned above point to the fact that a new approach should be employed when analyzing bioinformation and biocomputation. This issue is addressed by the research projects below:

DNA motivated language/coding theoretic problems

The long term goal of this line of research is to understand and utilize the mathematical and computational properties of DNA-encoded information for data storage or computational purposes. The results of this research could play a significant role in understanding of genomic information, as well as in the design of an error-free DNA computer which could become -- for certain applications-- superior to current electronic computers by providing huge information density, massive parallelism and high energy efficiency.

Mathematical structure of bioinformation

This project has as its goal the understanding of natural information processing by investigating the mathematics behind genomic information, by modeling biological information processing, and by studying natural occurring self-assembly phenomena as mathematical processes.

The proposed research into the mathematical theory of bioinformation has several aims: to contribute to a deeper understanding of genomic information, to model and study computation as a biological process, and to understand self-assembly as computation. Its results could have implications for many fields, such as genomics, theory of computation, material science, DNA computing and nano-technology.

DNA nanocomputing by self-assembly

The objective of this approach to the understanding natural information processing is to probe the power and limitations of nanocomputation by self-assembly.

It is becoming clear that computation plays an important role in understanding self-organization of matter, and several studies have in fact demonstrated that self-assembly can be used to compute. The practical importance of self-assembly comes from the rise of the field of nanotechnology, wherein the individual nano-components are too small to be manipulated by existing tools, and novel methods such as self-assembly have to be employed for modelling, design and engineering of extremely complex structures at this scale

In its simplest form, a self-assembly system consists of a set of so-called tiles that can bind to each other by prescribed binding rules to form complex structures. Self-assembly can thus model processes as different as inorganic crystal formation (each tile represents a molecule and the binding rules represent the chemical bonds that can form), and DNA computing with interacting DNA ``tiles''. A DNA tile is a complex two-dimensional DNA structure, for example rectangle-shaped, that has at its corners single strands capable of ``sticking'' to complementary ones. Depending on the corner sequences, the tiles can attach only to certain other tiles, contributing to forming a DNA lattice that can simulate a computation.

The mathematical study of self-assembly poses many interesting and challenging questions, some related to classical tiling, dynamical systems and cellular automata, and others new to the self-assembly context.

We intend to continue research in the area of self-assembly by investigating the effects of topology on the complexity of the obtained structures. We propose to analyze the effect on nanocomputations of working with arbitrary neighbourhoods, including ``Swiss cheese'' neighbourhoods of a tile where adjacent tiles are not neighbours, while far-away tiles may be neighbours. We also intend to initiate a study of a complexity hierarchy of nanocomputations by self-assembly based on their intrinsic characteristics such as neighbourhood topology, complexity and number of types of ``bonds'' between individual tiles.

It is anticipated that this research will contribute to a deeper insight into the theoretical properties of nanocomputation by self-assembly, as well as contribute to understanding the power and limitations of experimental nanocomputing techniques.

Molecular Distance Maps: A quantitative method of visualizing genomes' relatedness in a Cartesian coordinate system

Comparing species based on portions of their genome has been identified as one of the most important challenges in the post-genomic era. To find how distinct different species are, many methods and algorithms have been proposed in literature. We compared different species based on their genomic signature generated using the Chaos Game Representation (CGR) method. Chaos Game Representation is a method of visualizing a DNA sequence as a 2D image. 2D images corresponding to different DNA sequences can then be compared using various distance measures, resulting in quantitative comparisons of the originating DNA sequences.

We improved the sensitivity of the comparison methods, by introducing some image comparisons for CGRs. We used this method to build a phylogenetic tree for 26 model organisms, based on their mitochondrial DNA, and achieved a smaller number of anomalies than previous methods, including ClustalW.

We also introduced the novel notion of Molecular Distance Map , where each genome is represented by a "genome-point" whose (x, y)-coordinates are computed based on a matrix of relative distances between species. In the resulting plot in the Cartesian coordinate system, the distance between two genome-points corresponds to the relatedness of their corresponding genomes.

This visualization method showed a clustering of genome-points on the plane that is compatible with the known relatedness of species. The method has the potential to be used for the identification of new species based on any long-enough (> 2,000 bp) portions of their genome, by identifying the region in space that the corresponding new genome-point belongs to.