Implementing models of DNA computing: Concerns from the laboratory

ABSTRACT
By Robin Varghese
University of Western Ontrario
e-mail: rvarghes@julian.uwo.ca

    Aldeman demonstrated that DNA could be used to solve a difficult computational problem of the combinatorial type. Subsequently, several improvements of Adleman's pioneering experiment have been proposed and novel models of DNA computing introduced. These approaches make use of various biological enzymes and laboratory techniques that currently limit the advancement of this technology. I have provided a concise and clear summary of some of the difficulties associated with the various processes utilized by these models and offered improvements where possible.  In addition, I have proposed the use of microtechnology to improve the rate at which DNA solutions are processed and identified.  The "Gel on a chip" method of size fractionating DNA and the production of oligonucleotides in a fixed pattern on a silica chip are also presented as methods of detecting and characterizing DNA solutions to computational problems.  I suggest that these new tools maybe useful to incorporated into the current models of DNA computing.  It seems clear that advancements in the field of molecular biology will be the key to the practical implementation of DNA computing. In addition, experiments proposed in the future must incorporate the appropriate controls, a practice that has been overlooked. It does not seem that DNA computing will be able to solve a broad range of computational general problems in the near future until advances in biotechnology address these practical problems.



INTRODUCTION
By Robin Varghese
University of Western Ontario
e-mail: rvarghes@julian.uwo.ca

    Evolution has selected for organisms that utilized nucleic acids as the storage media for their genetic information. The vast majority of these organisms use deoxyribonucleic acid (DNA) to contain all the necessary information to produce a functional organism. The ability to store vast quantities of information (ie the genome) combined with the ability of the cell to perform a massive number of parallel operations has stimulated interest in using biological molecules as possible tools for computation and data storage. In a paper by Smith 1996 [12] he was able to show that "RNA editing" operations were computationally very similar to a Turing machine suggesting as other researcher had shown previously that biological molecules and the operation they undergo have the potential to perform computations. The pioneering experiment by Adleman [1] was the first example of a mathematical problem solved using biomolecules and molecular biology techniques. Subsequently, Lipton [9] showed that DNA could be used to solve a general problem of the class of NP-complete problems. In doing so it was shown that DNA could be used to solve a variety of combinatorial problems. The type of computational applications that DNA can be used to solve has expanded to include binary addition [8], Turing machine like operations utilizing restriction endonucleases [14] and the storage of information in DNA using various encoding methods [2,11]. These various approaches to utilizing DNA may allow the practical incorporation of DNA into a variety of general algorithms and applications.
    In this paper the practical difficulties associated with techniques employed in Adleman’s experiment have been identified and potential sources of error in his model of DNA computation identified.  I have proposed the use of a new ligase to decrease the ligation of mis-annealed oligos and templates.  Several other groups have proposed new models of DNA computation that avoid some of the problems associated with Adleman’s experiments. General difficulties associated with applying molecular biology techniques to DNA computing seems to plague all of the models examined and as such general solutions where available will be suggested.  I have also emphasized the importance of controls in the experiment carried out in the field.  It seems that the use of appropriate controls for the various molecular biology techniques have not been implimented in the published literature.  In addition I have introduced a new tool that should make seperation and detection of DNA solutions a faster and more efficient process.


Fig.1a) Adleman's Directed Hamiltonian Path Problem.





Fig.1b) Method of encoding of the path between any two nodes.

 
 Adleman’s experiment in brief, computed the shortest non-redundant path between seven points on a graph, a directed Hamiltonian path problem, visiting each point once and only once (Fig.1a). Each node was represented by a series of nucleotides (nct) and the path between any two nodes encoded as illustrated in Fig. 1b.  The experiment involved annealing of all the oligonucleotides representing both nodes and paths, ligation of all the annealed species followed by amplification of all products starting with node 0 and ending with node 6 by PCR, gel purification of the DNA species of the expected size and selective removal by hybridization (annealing) of DNA products containing each node sequence. A successful result was identified by the presence of DNA following the final hybridization.
 

SCALING UP A COMBINATORIAL SEARCH

    The sources of error in combinatorial search experiment are many even though for the scale of the problem that Adleman addressed they were apparently manageable. The Directed Hamiltonian Path Problem becomes unmanageable when the number of nodes increases to 20-30 nodes [7] due to the amount of DNA required to produced all the possible paths. In addition, each new selection step increases the associated risk of losing the solution(s) due to DNA loss.

    Scaling up of the combinatorial search would also result in increased problem with annealing. As the number of DNA fragments increases so does the number of fragments that must be searched by a given oligo until its complement is found. This sequence of events becomes very inefficient. Scaling up of the reaction results in an increased time requirement for annealing which decreases the usefulness of this application of DNA computing. The combinatorial search is not a particularly useful task for DNA computing given the brute force algorithm that was used by Adleman. Perhaps with algorithms that do not require all possible paths to be produced this problem will be conquerable.

ANNEALING

    Models that depend heavily on the correct annealing of the various single stranded DNAs are prone to run into difficulties associated with miss-annealed DNA sequences. Correct hybridization of oligonucleotides (oligos) is dependent on the sequence of each oligo, the annealing temperature, time of reaction, salt concentration in the hybridization solution and DNA concentration. Several of these components of the annealing reaction determine the melting temperature (Tm) of a given oligo and as such guide the choice of annealing temperature.   Generally a few degrees below the Tm is a safe choice for the annealing temperature. The closer the annealing temperature is to the Tm the greater the stringency of the annealing process. The goal of selecting an appropriate annealing temperature is to provides the oligo with enough kinetic energy to break away from annealing events that do not represent optimum binding (ie. a perfect match) and balance this with not providing too much kinetic energy such that a perfect match will not be seperated. At temperatures below the optimum annealing temperature the oligo is more likely to bind to a sequence that is not completely complementary. One of the difficulties with annealing a large pool of oligos at once as done in Adleman’s experiment is that as the reaction is allowed to cool there is no way to ensure that each oligo only binds its complimentary DNA sequence. This inherent problem limits the usefulness of this approach to only a small number of DNA species that can be annealed at high stringency with certainty. The problem with incorrectly bound oligos becomes apparent in subsequent rounds of DNA amplifications that can replicate the sequence between incorrectly bound fragment perhaps creating paths that were not represented in the problem, hence providing false positive answers.

LIGATIONS

    The ligation of a large number of DNA fragments represents a serious problems in that there really is no efficient method of determining if all the possible solutions have been made and secondly if they have been successfully ligated. Adleman size fractionated his ligation reaction on agarose gel and observed a smear on the gel indicating that concatenation of the various oligos had occurred. Unfortunately he had no way to determine if a correct solution had been ligated. The quantity of DNA ligase used in such a reaction is a matter of hit and miss since the amount of DNA that is actually ligated can only be estimated by assuming that all the strands introduced will bind to more than one other oligo and hence require ligation.

    In order to improve the likelihood that correct annealing occurs temperature near the Tm are employed, however if the annealing reaction is allowed to cool the potential exist to allow oligos that would not have otherwise bound to bind.  If ligations are undertaken at elevated temperatures using appropriate thermal stable enzyme such as Tsc DNA ligase it may be possible to reduce the ligation of incorrectly annealed DNA fragments.   Tsc DNA ligase is only able to ligate oligos annealed to a template DNA strand. Unlike conventional T4 DNA ligase, Tsc DNA ligase is unable to ligate blunt end DNA fragment and has only minimal ligation capabilities to ligate 2 and 4 bp cohesive ends [3], these properties make it an excellent candidate for use in experiments like Adleman's that ligate overlapping oligos. The benefit of this type of enzyme is that it would allow ligations to be performed at relatively high temperature which would improve the likelihood that proper annealing of oligos would take place.

PCR AMPLIFICATION

    The amplification of the DNA products was required at several stages in Adleman's experiment to maintain a detectable level of DNA through the successive rounds of separation. The use of the polymerase chain reaction (PCR) to amplify the DNA species employs the aid of a thermal stable DNA polymerase (typically Taq a DNA polymerase isolated from Thermus aquaticus) and two oligos specific to either end of the products to be amplified. The use of PCR could be employed in this model because the sequences at the 5’ and 3’ termini of the solution strand of DNA were known. In a general model of computation the sequence of the solution would not be known. The lack of sequence information of the solution need not preclude the use of PCR from amplifying the possible solutions. The ligation of a known sequence (often referred to as a linker) to the ends of all the products would allow amplification of all DNA species to be undertaken without prior knowledge of the sequence of the solution . PCR is a good method of increasing the likelihood that a solution is not lost.  However, the exact sequence of the solution may be altered due to the incorporation of erroneous bases during strand replication. The relative fidelity for Taq is shown in Table 1 in comparison to several other commercially available enzymes.
 
Enzyme Fidelity relative to Taq Activity Yield  Stability
Taq 1 5’-3’ polymerase 1 T1/2 @ 100 C 5 min
Pwo* 10* 5’-3’ polymerase/3’-5’ exonuclease Lower T1/2 @ 100 C 2 hr
Tth* 1 5’-3’ polymerase / reverse transcriptase ? Untested
Expand* 3 5’-3’ polymerase / 3’-5’ exonuclease 2 fold  Untested
Pfu¨  6.36 5’-3’ polymerase / 3’-5 exonuclease 0.25 Remains active @ temps above 95 C
ElongaseÄ  5.0 Mix Two enzymes  

5’-3’ polymerase / 3’-5- exonuclease

>1 Untested
* Statistics made available by Boehringer-Mannheim Canada
¨ Statistics made available by Stratagene
Ä Statistics provided by Life Technologies
Table 1. The above table emphasizes that new biological reagents are constantly being produced and may aid in making DNA computing a feasible and practice method of computing. If we examine the properties of Pwo compared to Taq it is apparent that Pwo posses 10 times the fidelity of Taq that mean that Pwo is 10 times more likely to remain bound to its template and produce a full length product.  The activity of Pwo also includes a 3'-5'exonuclease activity that allows it to correct errors that may occur during template replication.  The yield of a reaction carried out in parallel between Pwo and Taq would yield less product from the Pwo reaction compared to a Taq reaction.  Pwo is also more thermal stable than Taq such that after 2hours at 100 C the enzyme retains 50% activity unlike Taq that become inactivated after only 5 minutes.  The thermal stability becomes important when we consider that each round of PCR requires a 94 C cycle to denature the previously annealed templates and primers.

    In models of DNA computing that employ volume decreasing algorithms, amplification of DNA has been suggested as a means around the potential loss of solutions from manipulation or incorrect hybridization. A general model of DNA computing cannot rely on the size of a product or simply its presence to be proof of a valid solution, rather a DNA solution may be sequenced, at least in part to confirm a DNA solution. The use of DNA polymerases with proofreading ability (3’5’-exonuclease activity) allows corrections in the sequence to be made as the DNA is being amplified.  One draw back of such enzymes in the past has been their diminished yields in a given span of time (Fig.2a).  Pfu only begins to amply a detectable amount of DNA after 1minute of extension while for the same size product Taq produces a detectable product after 15 seconds of extension.


 
Fig. 2 Comparison of the efficiency of a thermal stable DNA polymerase with exonuclease active with Taq. a)Amplification with Taq requires less extension time as depicted in this comparison of the amplification ability of both Taq and Pfu to amplify a 1.9 kilobase DNA fragment given various extension times. b) The accuracy of Taq is 6 fold less that Pfu emphasizing the benifit of using proof-reading enzymes for PCR amplification

The cost of producing a more accurate copy comes at the price of the number of copies. This has recently been overcome with the introduction of mixes of DNA polymerases. Several biotechnology companies have marketed a variety of PCR enzyme mixes to provide maximum accuracy while maintaining product yield. The mechanism by which combinations of enzymes are useful to improve fidelity is as depicted in Fig. 3.

Fig.3 Mechanism of error correction during DNA replication using a mix of DNA polymerases.  Taq may fall off a template when it incorporates an erroneous base.  The proof-reading enzyme can subsequently bind to the vacent site and correct the mis-match and continue replication. 

The problem is not completely overcome as Taq does not always drop off when a mismatch is made and it may continue leaving a mismatched basepair. This mismatch will then be replicated in subsequent amplification cycles. Therefore amplification of DNA to maintain DNA volume in decreasing algorithms is perhaps not as effective a solution as originally thought. The suppliers of Taq polymerase, an enzyme commonly used in PCR to amplify DNA, all describe the activity of their enzymes using a standard unit definition. Test of comparable units of enzymes under controlled conditions have shown that the activity can vary substantially (Fig.4).

 
 



Fig. 4 Variability of Taq activity from a number of commercial suppliers.  8 different producers of Taq have been assayed in parallel using a variety of parameters, the significance of such a study is that even using the same enzyme variability in the effectiveness of Taq to amplify DNA exists.

Note that from Table 1 Pfu DNA polymerase seems to be the most efficient enzyme with the greatest fidelity. This enzyme would be the first choice of all researchers if the cost were not significantly greater than for regular Taq. The choice to use Taq is in part a financial one and in part based on the application. For the type of analysis that Adleman undertook, the accuracy of the amplification was not essential to the end result. However for models of DNA computing that actually produce a novel answer, the sequence of the product will be very important. Depending on the method of encoding an error in a single base pair can result in a correct solution appearing to be false.

One method of reducing amplification errors is the use of a"hot start" to avoid premature elongation of mis-annealed primer template products. A hot start involves the addition of the thermal stable DNA polymerase during the first denaturation step of the PCR reaction, this will ensure that any primers that may have mis-annealed during the preparation of the reaction will not be amplified prior to the first denaturation step.  An alternative to hot starts is to temporarily inactivate the enzyme until the first denaturation has occurred and the primers have annealed properly to its template [Platinum Taq product information sheet Life Technologies]. Several biotech company now market antibody inactivated enzymes that become active only after the first cycle of denaturation, reducing the possibility of spurious products due to mis-annealing.

DNA SIZE FRACTIONATION

    The use of gel purification as employed by Adleman is a very inefficient method of DNA purification. The recover rate of typical gel purification methods is approximately 60-95% [Quiagen, QuiaexII]. This type of loss is clearly unacceptable if multiple purification steps are used in a single calculation. It is therefore advisable to avoid the use of such techniques whenever possible. 1n 1994, a physicist Robert Austin [5] introduced a technological advance that may make size fractionation of DNA fragment a practical operation. His invention was the "Gel on a chip" concept that uses a thin layer of gel on the surface of a computer chip to separate DNA utilizing very small channels cut into the surface of this chip. This technology seems to provide a rapid and consistent method of separating DNA that may make this technology useful to separate DNA with sufficient size discrimination to allow the sequence of the DNA to be determined on such a gel. This type of rapid sequencing procedure on a consistent media would allow automated determination of the solution to a DNA computed problem to become a rapid operation.  Recently, this technology has been employed to capillary electrophoresis systems that have been shown to reproducibly seperate PCR products and other DNA fragments for analysis [15].

DNA LOSS DUE TO ADHESION

    DNA loss during the separation process is an important point of concern. The loss due to solution retention on equipment is a concern in models of DNA computing that involve diminishing volumes of DNA. The concern is that possible solutions may be lost during each manipulation. To reduce the loss of DNA on lab equipment, the use of Teflon coated container or siliconized glass and plasticware is suggested, as it will provide hydrophobic surfaces that repel liquids. In all the suggested operations  DNA is always in an aqueous solution hence the use of equipment that resists the adhesion of water will minimize DNA loss in this manner. An alternative that has been proposed in the literature has been to scale up the aqueous volumes that are used; by this method any losses of DNA are essentially diluted, reducing the impact on the DNA population.

RESTRICTION ENDONUCLEASES

    The use of restriction enzyme to perform a variety of computational operations analogous to a Turing machine are also riddled with difficulties. Rothemund [10] suggests using asymmetric restriction endonucleases to manipula te DNA in a manner analogous to operations performed by a Turing machine. The use of a large number of restriction enzymes in conjunction with ligases to repair the DNA following digestion requires careful control of buffers and temperatures.  Restriction endonucleases have been reported to have decreased efficiency and are prone to spurious cutting when the salt concentration and temperature of the reaction are sub-optimal [4,6].

Fig. 5 3-D structure of EcoR I complexed with a DNA strand.

    In Fig. 5 the structure of EcoR I is presented and it is clear that the enzyme exhibits a close interaction with the DNA substrate. The nature of this interaction is sensitive to the conditions around it. Restriction endonucleases have been reported to recognize sequences other than the expected recognition site when the reaction conditions are not optimal. The efficiency (amount of DNA digested/unit of enzyme) of restriction endonucleases also varies resulting in a requirement for a mechanism to ensure that digestion is complete. Restriction endonucleases and DNA ligase like all biological enzymes are subject to loss of activity over time and must be assayed regularily to ensure an acceptable level of activity.  The use of such component in a DNA computer would require the constant testing and monitoring of enzymes activity.  Inconsistencies from supplier to supplier for biological reagents as reported for Taq that must also be considered when employing these reagents.

SEQUENCING AS A REQUIREMENT OF DNA COMPUTING

    Sequencing of the entire DNA molecules may be unnecessary according to Baum [2] who suggests that given the proper design of substrings only the initial sequence of a substring may need to be characterized before the identity of the solution is known. This requires that each substring be well characterized and that all solutions be present at the beginning of the operations. This type of solution is based on the model in which the starting tube contains all possible solutions to a given problem. The function of the DNA operations in these models is to selectively isolate the answer that satisfies the problem. This model is only useful for small-scale problems. The volumetric problems associated with DNA computing become quite obvious when the pool of all possible solution is prepared for a large problem say the travelling salesman problem (similiar to the Directed Hamiltonian Path Problem) with 20-30 nodes [7]. The estimated amount of DNA required would be extremely large. These types of problems are clearly not useful for creating a general method of computation. Models of DNA computing for use in general computations will not depend on selecting the answer to a problem from a pool of possible solutions but rather on generating perhaps a few possibilities towards the solution to a problem.

    In order to be able to definitely identify sequences and determine the solution to a problem encoded in DNA one must sequence the answer or at the very least select for answer that contain the correct sequence using methods that involve fewer error introducing steps. A new technology called the "DNA chip" may contain an advance in technology that will reduce the number of step required to ascertain if the solution to a problem is valid [13].

    The DNA chip employs a technology that allows for the mass production of oligonucleotides on the order of thousands in the span of a few hours. The ability to produce a large quantity of oligonucleotides at a fast rate may not at first seem like a remarkable advance for DNA computing. This becomes useful when one considers that most of the models of DNA computation involve some form of hybridization that employ an oligo to "fish" out solutions that match a given criteria. The extraction of a DNA solution from a pool of possible solutions may utilize an oligonucleotide fixed to a surface to allow the solution to also co-segregate with the fixed oligo.  The use of DNA affixed to solid surface is not new however the ability to build thousands of specific oligonucleotides simultaneously in a fixed array on a surface will allow the use of such technology to potentially speed up the rate at which problems can be assembled. It may also be possible to use these DNA chips to identify the sequence content of DNA fragments. This can be accomplished by hybridization of the DNA fragment with the DNA chip and an analysis of the hybridization pattern to determine the sequence of a bound fragment. The use of radioisotope has typically been used to visualize DNA sequencing reactions, more recently the introduction of fluorescently tagged nucleotides have enabled automated reading of DNA sequencing reactions.  The length of target DNA that could be analyzed is the square root of the number of oligonucleotides on the chip. Currently a DNA chip can contain as many as 65536 oligonucleotides that occupy 25X25-cm square which in theory is capable of analyzing the sequence of a 200 bp  nucleotide. The large size of such a chip is envisioned to decrease rapidly to at least 300 nucleotide sequences per inch as current technology allows point discrimination at this level of miniaturization. 

CONTROLS

    The need to check that each step of an experiment worked as expected is a very labor-intensive, though necessary step when undertaking typical molecular biology operations. Even among suppliers of the various enzymes, differences in the activity of enzymes exists (Fig.3) suggesting that the use of Taq in a DNA computer would require titration of the correct amount of enzyme for each batch of enzyme used. The importance of postive and negative controls when using molecular biological techniques is essential to evaluate the results obtained. It seems that the use of controls is one aspect that the reported experiments in biocomputing have failed to incorporate. It is difficult to say without controls if a reaction did not produce a product due to a failed step in the procedure or if there was no solution to be found. This seems to be one question that has not been well addressed and must be for this field to progress.

CONCLUSION

    I've identified several complications that are associated with the current models of DNA computing. In his pioneering experiment Adleman solved the Direct Hamiltonian Path Problem which represented the solving of a combinatorial search using DNA molecules.  The problems associated with such an application have been presented and the various steps involved in this process have been analyzed in a clear and consice manner.  I have present suggestions that minimize the difficulties associated with the various molecular biology techniques that are utilized by researchers in this field.   The important parameters associated with annealing, ligation, PCR amplification and DNA size fractionation are addressed and alternative to traditional tools and methods proposed.  Additionally I have proposed the use of new DNA seperation technology based on capillary electrophoresis for seperation and sequencing of DNA products at an improved rate compared to traditional methods of fragment seperation and DNA sequencing.   Finally, the incorporation of molecular biology techniques in DNA computing without the inclusion of appropriate controls must be corrected.  DNA computing may be destined to await advances in biotechnology and the material sciences that will provide new and more efficient methods of DNA oligo synthesis (an issue not discussed here), separation of DNA solutions from a pool containing many species of DNA strands and a method of DNA sequencing that will be rapid and accurate. Until significant advances in these and other areas including the automation of the entire process have been made, the concept of a DNA computer may remain strictly a theoretical pursuit.


References
  1. Adleman ML 1994 Molecular Computation of Solutions to Combinatorial Problems. Science V266 1021-1023
  2. Baum EB, 1996 A DNA Associative Memory Potentially Larger than the Brain.DIMACS Series in Discrete Mathematics and Theoretical Computer Science V27 p23-28
  3. Biochemica N4 1997 - on line journal from Boehringer-Manheim Website
  4. Blanck A, Glück B, Wartbickler R, Bender S, Pöll M, Brandl A 1997 Biochemica. Activity of Restriction Enzymes in a PCR Mix. 3 p25
  5. Edgington SM 1994 Bio/Technology. Biotech’s New Nanotools. 12(5) 468-7111
  6. Frequently Asked Questions section of Life Technologies Tech-Online service. P1-3
  7. Gibbons A, Amos M, Hodgson D, 1997 DNA computing. Current Opinion in Biotechnology. 8(1) 103-6
  8. Guarnieri F, Fliss M, Bancroft C, 1996 Making DNA Add. Science V273 p220-3
  9. Lipton R, 1995 DNA Solution of Hard Computational Problems. Science V268 p542-545
  10. Rothemund P 1996 A DNA and restriction enzyme implementation of Turing machines. DIMACS series v27 75-119
  11. Roweis S, Winfree E, Burgoyne R, Chelyapov NV, Goodman MF, Rothemund PWK, Adleman LM, 1996. A Sticker Based Model for DNA Computation. DIMACS workshop on DNA based computers p1-27
  12. Smith WD 1996 DNA computers in vitro and vivo. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. V27 p121-185
  13. Southern EM, 1996 Trends in Genetics. DNA chips: analysing sequence by hybridization to oligonucleotides on a large scale. 12(3) 110-115
  14. Wilhelm P, Rothemund K, 1996 A DNA and restriction enzyme implementation of Turing Machines. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. V27 p75-119
  15. Woolley AT, Lao K, Glazer AN, Mathies RA 1998 Anal Chem  Capillary electrophoresis chips with integrated electrochemical detection.Feb 15;70(4):684-688

  16.  
     
     
     
Fig. 1 - Taken from Reference # 2
Fig. 2-3 Taken from the Web page of Boehringer Mannheim Canada
Fig 4 (Product Review Biochemica N1 1996) pg
Fig.5 Taken from Biochemistry 3rd Ed. By Stryer.