Implementing models of DNA computing: Concerns from the laboratory
ABSTRACT
By Robin Varghese
University of Western Ontrario
e-mail: rvarghes@julian.uwo.ca
Aldeman demonstrated that DNA could be used to solve
a difficult computational problem of the combinatorial type. Subsequently,
several improvements of Adleman's pioneering experiment have been proposed
and novel models of DNA computing introduced. These approaches make use
of various biological enzymes and laboratory techniques that currently
limit the advancement of this technology. I have provided a concise and
clear summary of some of the difficulties associated with the various processes
utilized by these models and offered improvements where possible.
In addition, I have proposed the use of microtechnology to improve the
rate at which DNA solutions are processed and identified. The "Gel
on a chip" method of size fractionating DNA and the production of oligonucleotides
in a fixed pattern on a silica chip are also presented as methods of detecting
and characterizing DNA solutions to computational problems. I suggest
that these new tools maybe useful to incorporated into the current models
of DNA computing. It seems clear that advancements in the field of
molecular biology will be the key to the practical implementation of DNA
computing. In addition, experiments proposed in the future must incorporate
the appropriate controls, a practice that has been overlooked. It does
not seem that DNA computing will be able to solve a broad range of computational
general problems in the near future until advances in biotechnology address
these practical problems.
INTRODUCTION
By Robin Varghese
University of Western Ontario
e-mail: rvarghes@julian.uwo.ca
Evolution has selected for organisms that utilized
nucleic acids as the storage media for their genetic information. The vast
majority of these organisms use deoxyribonucleic acid (DNA) to contain
all the necessary information to produce a functional organism. The ability
to store vast quantities of information (ie the genome) combined with the
ability of the cell to perform a massive number of parallel operations
has stimulated interest in using biological molecules as possible tools
for computation and data storage. In a paper by Smith 1996 [12] he was
able to show that "RNA editing" operations were computationally very similar
to a Turing machine suggesting as other researcher had shown previously
that biological molecules and the operation they undergo have the potential
to perform computations. The pioneering experiment by Adleman [1] was the
first example of a mathematical problem solved using biomolecules and molecular
biology techniques. Subsequently, Lipton [9] showed that DNA could be used
to solve a general problem of the class of NP-complete problems. In doing
so it was shown that DNA could be used to solve a variety of combinatorial
problems. The type of computational applications that DNA can be used to
solve has expanded to include binary addition [8], Turing machine like
operations utilizing restriction endonucleases [14] and the storage of
information in DNA using various encoding methods [2,11]. These various
approaches to utilizing DNA may allow the practical incorporation of DNA
into a variety of general algorithms and applications.
In this paper the practical difficulties associated
with techniques employed in Adleman’s experiment have been identified and
potential sources of error in his model of DNA computation identified.
I have proposed the use of a new ligase to decrease the ligation of mis-annealed
oligos and templates. Several other groups have proposed new models
of DNA computation that avoid some of the problems associated with Adleman’s
experiments. General difficulties associated with applying molecular biology
techniques to DNA computing seems to plague all of the models examined
and as such general solutions where available will be suggested.
I have also emphasized the importance of controls in the experiment carried
out in the field. It seems that the use of appropriate controls for
the various molecular biology techniques have not been implimented in the
published literature. In addition I have introduced a new tool that
should make seperation and detection of DNA solutions a faster and more
efficient process.
Fig.1a) Adleman's Directed Hamiltonian Path Problem.
Fig.1b) Method of encoding of the path between any two nodes.
Adleman’s experiment in brief, computed the shortest non-redundant
path between seven points on a graph, a directed Hamiltonian path problem,
visiting each point once and only once (Fig.1a). Each node was represented
by a series of nucleotides (nct) and the path between any two nodes encoded
as illustrated in Fig. 1b. The experiment involved annealing of
all the oligonucleotides representing both nodes and paths, ligation
of all the annealed species followed by amplification of all products
starting with node 0 and ending with node 6 by PCR, gel purification
of the DNA species of the expected size and selective removal by hybridization
(annealing) of DNA products containing each node sequence. A
successful result was identified by the presence of DNA following the final
hybridization.
SCALING UP A COMBINATORIAL SEARCH
The sources of error in combinatorial search experiment
are many even though for the scale of the problem that Adleman addressed
they were apparently manageable. The Directed Hamiltonian Path Problem
becomes unmanageable when the number of nodes increases to 20-30 nodes
[7] due to the amount of DNA required to produced all the possible paths.
In addition, each new selection step increases the associated risk of losing
the solution(s) due to DNA loss.
Scaling up of the combinatorial search would also
result in increased problem with annealing. As the number of DNA fragments
increases so does the number of fragments that must be searched by a given
oligo until its complement is found. This sequence of events becomes very
inefficient. Scaling up of the reaction results in an increased time requirement
for annealing which decreases the usefulness of this application of DNA
computing. The combinatorial search is not a particularly useful task for
DNA computing given the brute force algorithm that was used by Adleman.
Perhaps with algorithms that do not require all possible paths to be produced
this problem will be conquerable.
ANNEALING
Models that depend heavily on the correct annealing
of the various single stranded DNAs are prone to run into difficulties
associated with miss-annealed DNA sequences. Correct hybridization of oligonucleotides
(oligos) is dependent on the sequence of each oligo, the annealing temperature,
time of reaction, salt concentration in the hybridization solution and
DNA concentration. Several of these components of the annealing reaction
determine the melting temperature (Tm) of a given oligo and as such guide
the choice of annealing temperature. Generally a few degrees
below the Tm is a safe choice for the annealing temperature. The closer
the annealing temperature is to the Tm the greater the stringency of the
annealing process. The goal of selecting an appropriate annealing temperature
is to provides the oligo with enough kinetic energy to break away from
annealing events that do not represent optimum binding (ie. a perfect match)
and balance this with not providing too much kinetic energy such that a
perfect match will not be seperated. At temperatures below the optimum
annealing temperature the oligo is more likely to bind to a sequence that
is not completely complementary. One of the difficulties with annealing
a large pool of oligos at once as done in Adleman’s experiment is that
as the reaction is allowed to cool there is no way to ensure that each
oligo only binds its complimentary DNA sequence. This inherent problem
limits the usefulness of this approach to only a small number of DNA species
that can be annealed at high stringency with certainty. The problem with
incorrectly bound oligos becomes apparent in subsequent rounds of DNA amplifications
that can replicate the sequence between incorrectly bound fragment perhaps
creating paths that were not represented in the problem, hence providing
false positive answers.
LIGATIONS
The ligation of a large number of DNA fragments represents
a serious problems in that there really is no efficient method of determining
if all the possible solutions have been made and secondly if they have
been successfully ligated. Adleman size fractionated his ligation reaction
on agarose gel and observed a smear on the gel indicating that concatenation
of the various oligos had occurred. Unfortunately he had no way to determine
if a correct solution had been ligated. The quantity of DNA ligase used
in such a reaction is a matter of hit and miss since the amount of DNA
that is actually ligated can only be estimated by assuming that all the
strands introduced will bind to more than one other oligo and hence require
ligation.
In order to improve the likelihood that correct annealing
occurs temperature near the Tm are employed, however if the annealing reaction
is allowed to cool the potential exist to allow oligos that would not have
otherwise bound to bind. If ligations are undertaken at elevated
temperatures using appropriate thermal stable enzyme such as Tsc DNA ligase
it may be possible to reduce the ligation of incorrectly annealed DNA fragments.
Tsc DNA ligase is only able to ligate oligos annealed to a template DNA
strand. Unlike conventional T4 DNA ligase, Tsc DNA ligase is unable to
ligate blunt end DNA fragment and has only minimal ligation capabilities
to ligate 2 and 4 bp cohesive ends [3], these properties make it an excellent
candidate for use in experiments like Adleman's that ligate overlapping
oligos. The benefit of this type of enzyme is that it would allow ligations
to be performed at relatively high temperature which would improve the
likelihood that proper annealing of oligos would take place.
PCR AMPLIFICATION
The amplification of the DNA products was required at
several stages in Adleman's experiment to maintain a detectable level of
DNA through the successive rounds of separation. The use of the polymerase
chain reaction (PCR) to amplify the DNA species employs the aid of a thermal
stable DNA polymerase (typically Taq a DNA polymerase isolated from Thermus
aquaticus) and two oligos specific to either end of the products to
be amplified. The use of PCR could be employed in this model because the
sequences at the 5’ and 3’ termini of the solution strand of DNA were known.
In a general model of computation the sequence of the solution would not
be known. The lack of sequence information of the solution need not preclude
the use of PCR from amplifying the possible solutions. The ligation of
a known sequence (often referred to as a linker) to the ends of all the
products would allow amplification of all DNA species to be undertaken
without prior knowledge of the sequence of the solution . PCR is a good
method of increasing the likelihood that a solution is not lost.
However, the exact sequence of the solution may be altered due to the incorporation
of erroneous bases during strand replication. The relative fidelity for
Taq is shown in Table 1 in comparison to several other commercially available
enzymes.
| Enzyme |
Fidelity relative
to Taq |
Activity |
Yield |
Stability |
| Taq |
1 |
5’-3’ polymerase |
1 |
T1/2 @ 100 C 5 min |
| Pwo* |
10* |
5’-3’ polymerase/3’-5’ exonuclease |
Lower |
T1/2 @ 100 C 2 hr |
| Tth* |
1 |
5’-3’ polymerase / reverse transcriptase |
? |
Untested |
| Expand* |
3 |
5’-3’ polymerase / 3’-5’ exonuclease |
2 fold |
Untested |
| Pfu¨ |
6.36 |
5’-3’ polymerase / 3’-5 exonuclease |
0.25 |
Remains active @ temps above 95
C |
| ElongaseÄ |
5.0 |
Mix Two enzymes
5’-3’ polymerase / 3’-5- exonuclease |
>1 |
Untested |
* Statistics made available by Boehringer-Mannheim Canada
¨ Statistics made available
by Stratagene
Ä Statistics provided
by Life Technologies
Table 1. The above table emphasizes that new biological
reagents are constantly being produced and may aid in making DNA computing
a feasible and practice method of computing. If
we examine the properties of Pwo compared to Taq it is apparent that Pwo
posses 10 times the fidelity of Taq that mean that Pwo is 10 times more
likely to remain bound to its template and produce a full length product.
The activity of Pwo also includes a 3'-5'exonuclease activity that allows
it to correct errors that may occur during template replication.
The yield of a reaction carried out in parallel between Pwo and Taq would
yield less product from the Pwo reaction compared to a Taq reaction.
Pwo is also more thermal stable than Taq such that after 2hours at 100
C the enzyme retains 50% activity unlike Taq that become inactivated after
only 5 minutes. The thermal stability becomes important when we consider
that each round of PCR requires a 94 C cycle to denature the previously
annealed templates and primers.
In models of DNA computing that employ volume decreasing
algorithms, amplification of DNA has been suggested as a means around the
potential loss of solutions from manipulation or incorrect hybridization.
A general model of DNA computing cannot rely on the size of a product or
simply its presence to be proof of a valid solution, rather a DNA solution
may be sequenced, at least in part to confirm a DNA solution. The use of
DNA polymerases with proofreading ability (3’5’-exonuclease activity) allows
corrections in the sequence to be made as the DNA is being amplified.
One draw back of such enzymes in the past has been their diminished yields
in a given span of time (Fig.2a). Pfu only begins to amply a detectable
amount of DNA after 1minute of extension while for the same size product
Taq produces a detectable product after 15 seconds of extension.
Fig.
2 Comparison of the efficiency of a thermal stable DNA polymerase with
exonuclease active with Taq. a)Amplification with Taq requires less extension
time as depicted in this comparison of the amplification ability of both
Taq and Pfu to amplify a 1.9 kilobase DNA fragment given various extension
times. b) The accuracy of Taq is 6 fold less that Pfu emphasizing the benifit
of using proof-reading enzymes for PCR amplification
The cost of producing a more accurate copy comes at the price of the
number of copies. This has recently been overcome with the introduction
of mixes of DNA polymerases. Several biotechnology companies have marketed
a variety of PCR enzyme mixes to provide maximum accuracy while maintaining
product yield. The mechanism by which combinations of enzymes are useful
to improve fidelity is as depicted in Fig. 3.
Fig.3 Mechanism of error correction during DNA replication
using a mix of DNA polymerases. Taq may fall off a template when
it incorporates an erroneous base. The proof-reading enzyme can subsequently
bind to the vacent site and correct the mis-match and continue replication.
The problem is not completely overcome as Taq does not always drop off
when a mismatch is made and it may continue leaving a mismatched basepair.
This mismatch will then be replicated in subsequent amplification cycles.
Therefore amplification of DNA to maintain DNA volume in decreasing algorithms
is perhaps not as effective a solution as originally thought. The suppliers
of Taq polymerase, an enzyme commonly used in PCR to amplify DNA, all describe
the activity of their enzymes using a standard unit definition. Test of
comparable units of enzymes under controlled conditions have shown that
the activity can vary substantially (Fig.4).
Fig. 4 Variability of Taq activity from a number of
commercial suppliers. 8 different producers of Taq have been assayed
in parallel using a variety of parameters, the significance of such a study
is that even using the same enzyme variability in the effectiveness of
Taq to amplify DNA exists.
Note that from Table 1 Pfu DNA polymerase seems to be the most efficient
enzyme with the greatest fidelity. This enzyme would be the first choice
of all researchers if the cost were not significantly greater than for
regular Taq. The choice to use Taq is in part a financial one and in part
based on the application. For the type of analysis that Adleman undertook,
the accuracy of the amplification was not essential to the end result.
However for models of DNA computing that actually produce a novel answer,
the sequence of the product will be very important. Depending on the method
of encoding an error in a single base pair can result in a correct solution
appearing to be false.
One method of reducing amplification errors is the use of a"hot start"
to avoid premature elongation of mis-annealed primer template products.
A hot start involves the addition of the thermal stable DNA polymerase
during the first denaturation step of the PCR reaction, this will ensure
that any primers that may have mis-annealed during the preparation of the
reaction will not be amplified prior to the first denaturation step.
An alternative to hot starts is to temporarily inactivate the enzyme until
the first denaturation has occurred and the primers have annealed properly
to its template [Platinum Taq product information sheet Life Technologies].
Several biotech company now market antibody inactivated enzymes that become
active only after the first cycle of denaturation, reducing the possibility
of spurious products due to mis-annealing.
DNA SIZE FRACTIONATION
The use of gel purification as
employed by Adleman is a very inefficient method of DNA purification. The
recover rate of typical gel purification methods is approximately 60-95%
[Quiagen, QuiaexII]. This type of loss is clearly unacceptable if multiple
purification steps are used in a single calculation. It is therefore advisable
to avoid the use of such techniques whenever possible. 1n 1994, a physicist
Robert Austin [5] introduced a technological advance that may make size
fractionation of DNA fragment a practical operation. His invention was
the "Gel on a chip" concept that uses a thin layer of gel on the surface
of a computer chip to separate DNA utilizing very small channels cut into
the surface of this chip. This technology seems to provide a rapid and
consistent method of separating DNA that may make this technology useful
to separate DNA with sufficient size discrimination to allow the sequence
of the DNA to be determined on such a gel. This type of rapid sequencing
procedure on a consistent media would allow automated determination of
the solution to a DNA computed problem to become a rapid operation.
Recently, this technology has been employed to capillary electrophoresis
systems that have been shown to reproducibly seperate PCR products and
other DNA fragments for analysis [15].
DNA LOSS DUE TO ADHESION
DNA loss during the separation process is an important
point of concern. The loss due to solution retention on equipment is a
concern in models of DNA computing that involve diminishing volumes of
DNA. The concern is that possible solutions may be lost during each manipulation.
To reduce the loss of DNA on lab equipment, the use of Teflon coated container
or siliconized glass and plasticware is suggested, as it will provide hydrophobic
surfaces that repel liquids. In all the suggested operations DNA
is always in an aqueous solution hence the use of equipment that resists
the adhesion of water will minimize DNA loss in this manner. An alternative
that has been proposed in the literature has been to scale up the aqueous
volumes that are used; by this method any losses of DNA are essentially
diluted, reducing the impact on the DNA population.
RESTRICTION ENDONUCLEASES
The use of restriction enzyme to
perform a variety of computational operations analogous to a Turing machine
are also riddled with difficulties. Rothemund [10] suggests using asymmetric
restriction endonucleases to manipula te DNA in a manner analogous to operations
performed by a Turing machine. The use of a large number of restriction
enzymes in conjunction with ligases to repair the DNA following digestion
requires careful control of buffers and temperatures. Restriction
endonucleases have been reported to have decreased efficiency and are prone
to spurious cutting when the salt concentration and temperature of the
reaction are sub-optimal [4,6].
Fig. 5 3-D structure of EcoR I complexed with a DNA strand.
In Fig. 5 the structure of EcoR I is presented and
it is clear that the enzyme exhibits a close interaction with the DNA substrate.
The nature of this interaction is sensitive to the conditions around it.
Restriction endonucleases have been reported to recognize sequences other
than the expected recognition site when the reaction conditions are not
optimal. The efficiency (amount of DNA digested/unit of enzyme) of restriction
endonucleases also varies resulting in a requirement for a mechanism to
ensure that digestion is complete. Restriction endonucleases and DNA ligase
like all biological enzymes are subject to loss of activity over time and
must be assayed regularily to ensure an acceptable level of activity.
The use of such component in a DNA computer would require the constant
testing and monitoring of enzymes activity. Inconsistencies from
supplier to supplier for biological reagents as reported for Taq that must
also be considered when employing these reagents.
SEQUENCING AS A REQUIREMENT OF DNA COMPUTING
Sequencing of the entire DNA molecules may be unnecessary
according to Baum [2] who suggests that given the proper design of substrings
only the initial sequence of a substring may need to be characterized before
the identity of the solution is known. This requires that each substring
be well characterized and that all solutions be present at the beginning
of the operations. This type of solution is based on the model in which
the starting tube contains all possible solutions to a given problem. The
function of the DNA operations in these models is to selectively isolate
the answer that satisfies the problem. This model is only useful for small-scale
problems. The volumetric problems associated with DNA computing become
quite obvious when the pool of all possible solution is prepared for a
large problem say the travelling salesman problem (similiar to the Directed
Hamiltonian Path Problem) with 20-30 nodes [7]. The estimated amount of
DNA required would be extremely large. These types of problems are clearly
not useful for creating a general method of computation. Models of DNA
computing for use in general computations will not depend on selecting
the answer to a problem from a pool of possible solutions but rather on
generating perhaps a few possibilities towards the solution to a problem.
In order to be able to definitely identify sequences
and determine the solution to a problem encoded in DNA one must sequence
the answer or at the very least select for answer that contain the correct
sequence using methods that involve fewer error introducing steps. A new
technology called the "DNA chip" may contain an advance in technology that
will reduce the number of step required to ascertain if the solution to
a problem is valid [13].
The DNA chip employs a technology that allows for
the mass production of oligonucleotides on the order of thousands in the
span of a few hours. The ability to produce a large quantity of oligonucleotides
at a fast rate may not at first seem like a remarkable advance for DNA
computing. This becomes useful when one considers that most of the models
of DNA computation involve some form of hybridization that employ an oligo
to "fish" out solutions that match a given criteria. The extraction of
a DNA solution from a pool of possible solutions may utilize an oligonucleotide
fixed to a surface to allow the solution to also co-segregate with the
fixed oligo. The use of DNA affixed to solid surface is not new however
the ability to build thousands of specific oligonucleotides simultaneously
in a fixed array on a surface will allow the use of such technology to
potentially speed up the rate at which problems can be assembled. It may
also be possible to use these DNA chips to identify the sequence content
of DNA fragments. This can be accomplished by hybridization of the DNA
fragment with the DNA chip and an analysis of the hybridization pattern
to determine the sequence of a bound fragment. The use of radioisotope
has typically been used to visualize DNA sequencing reactions, more recently
the introduction of fluorescently tagged nucleotides have enabled automated
reading of DNA sequencing reactions. The length of target DNA that
could be analyzed is the square root of the number of oligonucleotides
on the chip. Currently a DNA chip can contain as many as 65536 oligonucleotides
that occupy 25X25-cm square which in theory is capable of analyzing the
sequence of a 200 bp nucleotide. The large size of such a chip is
envisioned to decrease rapidly to at least 300 nucleotide sequences per
inch as current technology allows point discrimination at this level of
miniaturization.
CONTROLS
The need to check that each step of an experiment
worked as expected is a very labor-intensive, though necessary step when
undertaking typical molecular biology operations. Even among suppliers
of the various enzymes, differences in the activity of enzymes exists (Fig.3)
suggesting that the use of Taq in a DNA computer would require titration
of the correct amount of enzyme for each batch of enzyme used. The importance
of postive and negative controls when using molecular biological techniques
is essential to evaluate the results obtained. It seems that the use of
controls is one aspect that the reported experiments in biocomputing have
failed to incorporate. It is difficult to say without controls if a reaction
did not produce a product due to a failed step in the procedure or if there
was no solution to be found. This seems to be one question that has not
been well addressed and must be for this field to progress.
CONCLUSION
I've identified several complications that are associated
with the current models of DNA computing. In his pioneering experiment
Adleman solved the Direct Hamiltonian Path Problem which represented the
solving of a combinatorial search using DNA molecules. The problems
associated with such an application have been presented and the various
steps involved in this process have been analyzed in a clear and consice
manner. I have present suggestions that minimize the difficulties
associated with the various molecular biology techniques that are utilized
by researchers in this field. The important parameters associated
with annealing, ligation, PCR amplification and DNA size fractionation
are addressed and alternative to traditional tools and methods proposed.
Additionally I have proposed the use of new DNA seperation technology based
on capillary electrophoresis for seperation and sequencing of DNA products
at an improved rate compared to traditional methods of fragment seperation
and DNA sequencing. Finally, the incorporation of molecular
biology techniques in DNA computing without the inclusion of appropriate
controls must be corrected. DNA computing may be destined to await
advances in biotechnology and the material sciences that will provide new
and more efficient methods of DNA oligo synthesis (an issue not discussed
here), separation of DNA solutions from a pool containing many species
of DNA strands and a method of DNA sequencing that will be rapid and accurate.
Until significant advances in these and other areas including the automation
of the entire process have been made, the concept of a DNA computer may
remain strictly a theoretical pursuit.
References
-
Adleman ML 1994 Molecular Computation of Solutions
to Combinatorial Problems. Science V266 1021-1023
-
Baum EB, 1996 A DNA Associative Memory Potentially
Larger than the Brain.DIMACS Series in Discrete Mathematics and Theoretical
Computer Science V27 p23-28
-
Biochemica N4 1997 - on line journal from Boehringer-Manheim
Website
-
Blanck A, Glück B, Wartbickler R, Bender S, Pöll
M, Brandl A 1997 Biochemica. Activity of Restriction Enzymes in a PCR
Mix. 3 p25
-
Edgington SM 1994 Bio/Technology. Biotech’s New Nanotools.
12(5) 468-7111
-
Frequently Asked Questions section of Life Technologies Tech-Online
service. P1-3
-
Gibbons A, Amos M, Hodgson D, 1997 DNA computing.
Current Opinion in Biotechnology. 8(1) 103-6
-
Guarnieri F, Fliss M, Bancroft C, 1996 Making DNA
Add. Science V273 p220-3
-
Lipton R, 1995 DNA Solution of Hard Computational
Problems. Science V268 p542-545
-
Rothemund P 1996 A DNA and restriction enzyme implementation
of Turing machines. DIMACS series v27 75-119
-
Roweis S, Winfree E, Burgoyne R, Chelyapov NV, Goodman
MF, Rothemund PWK, Adleman LM, 1996. A Sticker Based Model for DNA
Computation. DIMACS workshop on DNA based computers p1-27
-
Smith WD 1996 DNA computers in vitro and vivo.
DIMACS Series in Discrete Mathematics and Theoretical Computer Science.
V27 p121-185
-
Southern EM, 1996 Trends in Genetics. DNA chips: analysing
sequence by hybridization to oligonucleotides on a large scale. 12(3) 110-115
-
Wilhelm P, Rothemund K, 1996 A DNA and restriction
enzyme implementation of Turing Machines. DIMACS Series in Discrete Mathematics
and Theoretical Computer Science. V27 p75-119
-
Woolley AT, Lao K, Glazer AN, Mathies RA 1998 Anal
Chem Capillary electrophoresis chips with integrated electrochemical
detection.Feb 15;70(4):684-688
Fig. 1 - Taken from Reference # 2
Fig. 2-3 Taken from the Web page of Boehringer Mannheim
Canada
Fig 4 (Product Review Biochemica N1 1996) pg
Fig.5 Taken from Biochemistry 3rd Ed. By Stryer.