A DNA single-strand consists of four different types of units called bases strung together by an oriented backbone like beads on a wire. The bases are Adenine (A), Guanine (G), Cytosine (C) and Thymine (T), and A can chemically bind to an opposing T on another single strand, while C can similarly bind to G. Bases that can thus bind are called Watson/Crick (W/C) complementary , and two DNA single strands with opposite orientation and with W/C complementary bases at each position can bind to each other to form a DNA double strand in a process called base-pairing. To encode information using DNA, one can choose an encoding scheme mapping the original alphabet onto strings over {A, C, G, T}, and proceed to synthesize the obtained information-encoding strings as DNA single strands. A computation will consists of a succession of bio-operations , such as cutting and pasting DNA strands, separating DNA sequences by length, extracting DNA sequences containing a given pattern or making copies of DNA strands. The DNA strands representing the output of the computation can then be read out and decoded.
Herein lies a wealth of problems to be explored, stemming from the fact that encoding information in DNA differs from encoding it electronically, and bio-operations differ from electronic computer operations. A fundamental difference arises from the fact that in electronic computing data interaction is fully controlled while, in a test-tube DNA-computer, free-floating data-encoding DNA single strands can bind because of W/C complementarity. Another difference is that, in DNA computing, a bio-operation usually consumes both operands. This implies that, if one of the operands is either involved in an illegal binding or has been consumed by a previous bio-operation, it is unavailable for the desired computation. Yet another difference is that, while in electronic computing a bit is a single individual element, in DNA experiments each submicroscopic DNA molecule is usually present in millions of identical copies. The bio-operations operate in a massively parallel fashion on all identical strands. However, this process is governed by the laws of chemistry and thermodynamics, and the output obeys statistical laws.
Differences like the ones mentioned above point to the fact that a new approach should be employed when analyzing bioinformation and biocomputation. This issue is addressed by the research projects below:
The long term goal of this line of research is to understand and utilize the mathematical and computational properties of DNA-encoded information for data storage or computational purposes. The results of this research could play a significant role in understanding of genomic information, as well as in the design of an error-free DNA computer which could become -- for certain applications-- superior to current electronic computers by providing huge information density, massive parallelism and high energy efficiency.
To circumvent the difficulties raised by the error rates of bio-operations in DNA computing, we proposed to explore in vivo DNA computing. The main idea of our approach is that one could study the computational capabilities of unicellular organisms with the aim of understanding and ultimately harnessing their computational power. We propose to complement existing approaches by investigating from a computational perspective the biological process of gene rearrangement in hypotrichous ciliates.
The impact of this research will be an influx of new ideas from computer science to the study of complex biological systems involving programmed genome rearrangement, with the ultimate goal of harnessing the process for the purpose of performing in vivo computation.
The third project has as its goal the understanding of natural information processing by investigating the mathematics behind genomic information, by modeling biological information processing, and by studying natural occurring self-assembly phenomena as mathematical processes.
The proposed research into the mathematical theory of bioinformation has several aims: to contribute to a deeper understanding of genomic information, to model and study computation as a biological process, and to understand self-assembly as computation. Its results could have implications for many fields, such as genomics, theory of computation, material science, DNA computing and nano-technology.