Finding a gene by yourself.


  1. Get the sequence from http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=3253144 or copy it from the end of this page.
  2. Go to http://genes.mit.edu/GENSCAN.html and use the server there to predict the gene.
  3. Find a sequence alignment program on the internet, and compare the predicted protein with the real one at the end of this page.
  4. Did GENSCAN do a good job?
  5. Can you identify the characteristic features of eukaryotic gene promoters and splice sites in this sequence?
  6. Check the link provided in step 1 for the correct gene information. What mistakes did GENSCAN make? Why did it make them?
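
If you would like to see what an alignment program does in step 3 before you pick one, the following is a minimal sketch of global alignment (the Needleman-Wunsch method) in Python. The function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. Real protein alignment tools normally use a substitution matrix and more refined gap penalties, so treat this as a teaching aid rather than a replacement for such a tool.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences have the same residue."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Toy example; replace the two strings with the predicted and the real protein.
score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(top)                            # ACGT
print(bottom)                         # A-GT
print(score, percent_identity(top, bottom))
```

Long gaps in the alignment of the predicted protein against the real one usually point to a missed or an extra exon, which is a useful starting point for steps 4 and 6.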


DNA sequence to be used for gene prediction.


        1 ttcttcggag attcttttac tgtcacacaa ccagcccaac tgaaaatgtc tctctctgat
       61 gcgtaagtct atttcccctt ttaaatatat ttttacttcc agactgactt ttaagtcatc
      121 agtaatgttt caccttagaa aataggcctt ggatggctgc ctggtcgtgc ggtttgagcg
      181 ctggactgtc gtttggattt atcgatggtg ccgggttcga acctgcccgc tcccatcccc
      241 cgtcgtcctg cgggagattt agactaggaa gtatattatt tcctactctg aagggacatc
      301 cgaaacatgt aaaacaaaca cattttgtta aaacactaca atagaaagat atgctttcan
      361 agggacggca ttccgcattt taaaatcttt aaagacacat cgagtgttaa cgttcttggc
      421 tgaaaatacc ttttataaaa tatatctacc gagttcgctg gggaagtttc gaggtattta
      481 aagcagtgtt tcccacactt tttccgtaac gggacacttc gcaatttccg agtattaagt
      541 ggaacactat ttttagagag attatttcac atggtagcct acttattaat tattccgtag
      601 ttcgttgaac acctattcag gcctcgcaga atactagggt tcatcggaaa acagtttggg
      661 aaacactgat ctaaagtata cgatcttttg ccaaatagag atacgttttt tctaagcgta
      721 atattgattt cgcttcatca tacagacana actaagttat tatctcatag tgacattaga
      781 gaatagaatg ggtggcctga atcagccaag aaagccaacg atttagcaga gttcaggtca
      841 ttgattaaca tgcatggcta gattgacaca tgaaatgcgt aagacgtaat tatgttcttt
      901 tttgaagtaa cgtctgtaat ccataagata agataagata agataagata agataagata
      961 agataaaaac gtaattatgg aataagaaaa cttcaagcgt catttcagta ttcgttttcg
     1021 ttgctaatga cgcatatgga acaaaaataa tatttaaaat tactttcatt gatattgacg
     1081 ataataatca aaaatagtaa tctactatag ttactataac cactctatat taagcaaaaa
     1141 tatcaatcaa ttatcaagtc attcatcttc tgtaacagtg acaagaaagc tcttgatgcc
     1201 tcatggaaga aactgacagc cggagctgac ggcaaaaaga acgctggaat caacttggtt
     1261 ctgtggtgag tacgtccctt ggttcaaaac tcgcgggcgg gggatatcgt tatgtaaatg
     1321 tatgaagaac ttgtcactag tgaagggtgt gagcgctcgg gggccgggtg ggatagaaac
     1381 aggagtcagt tacattgttc ttgttcaaac atccagttag accgatgaac atccaaccgc
     1441 ctaaatagcc aacataatga aagaataaga tgtgaatata ttagaaaata tcttgaaatt
     1501 tgaaacaagt aagaaaaaga agttattttt tttaaagaca aatcctcctt tagaaagctg
     1561 atcactcttt taagagagat tacctagctc ttgataacta attaatggta gatttaagaa
     1621 aatggcaaac ttttttttta tcccgccccg attgcctccc cctcccactc ttctatttcg
     1681 aaaacaaaaa aaaatcctgg ttaaaaccaa tgtataagat aaatctactt agagtagata
     1741 caattctaag gataggtcta ttagttcaga ccaatctaga cctcagggcg gtgagaaata
     1801 tctattcact agtttaactc tttgatgaat gcgggaatgc tcaatggcgt atgtgtgttc
     1861 aagattgaga ttgtctagac ttctagccca aaccaggttc caggtttgct gatgaagaat
     1921 ttgcaagctg ataagaggaa ttgtttgttt aaaaaaaatt gcaaaaagaa aagatacata
     1981 tatttcaaaa aatgtcatta cattaaaaaa gggaaacaag tattatttat tttagtgtac
     2041 tatatacatt gcataatccc cctcccgtgg acttatagtg tatttttgta tgtaaaaagg
     2101 ctttttgtcg aaacaaaggg cttaaatatt ctgaaaatct tttcactatt atcactttgt
     2161 cagatatata cacttagcga agtcatatta gttatgtatt tactttttta tatcaacaaa
     2221 gaaatctaca tttgagtact aactatgaca atgttctata atatcttcaa aaggatgttc
     2281 gccaatgttc ccaacatgcg cgctcagttc tccaaattca acgccaacca gtctgacgac
     2341 gccctgaagg gagacgctga attcatcaag caggtcaacg tcatcgttgc cgcccttgac
     2401 ggtctcttgc aatccgtcaa caacccaggc cagctccagg ccaacttgga caagctcgcc
     2461 aagtctcacg tcaacctgaa aatcggactc gagttcttcg gagtaagtat acatgaagac
     2521 ctgagtctga ccactaagtt tggttttaac agagcagaag tgacagactt tttaaaatta
     2581 gacatttagc tttaggatca tttacgggtg tgtggtggct gactggtaaa agcgatcagc
     2641 ttcctgaact gagtgatcga gggtttgaat cacgttgaag attgggattt ttcattttga
     2701 gattttttaa agcgtacctg agtccaccca actttaatgg gtacttgaca tgagttgggg
     2761 gaaagtaaaa tcggttagtc gttgtgctgg ctacatggca ccttgtgggt catatatgac
     2821 ctctacatca tctgctccat agatcgctaa gtctgaaaag ggaacttaaa ttttatttta
     2881 ctttcttcaa ttaacgggaa aaccaaaaaa ggtcataccg cgttttattg gacccacaaa
     2941 aactttcaaa gtcactccgg ctgctggctc acaactgctt tactttttct aagacttctt
     3001 tctttaaagg ccccttccaa cggagggaac tctctctctc tctctctctc tctccctaaa
     3061 acgtgccgtt attgttgttt cagcctctgc aacagaacat tcacagcttc attgaaagtg
     3121 ctcttggagt cggtgccgga agtgacgaac ccaaagcctg gggaaacttg atcgctgcct
     3181 tcaacgagac cctcaagaag gcatagacat gacctatgta ataattgaac tctttaagca
     3241 ggaaatacca taaaagtcat tattcgccca cggatcctgt cccaactcca aaatgatatt
     3301 ttgcaatacg gtgaaccaag aattagacca aaaaaaatgt tgattttaat cttctgttga
     3361 ttactgcttt caatactttc atatagttct accaatcaaa atttggaaaa caaaacaaaa
     3421 aaaaaattat attaaattta tattttaaat aaatatatac atttactaaa acaaaaaatc
     3481 catgattgtt tttagcctaa gatttttgtg aaaatgtaat gtgccttcat tcaatgttga
     3541 ctactgtatg ctatttgata agtgtattat tgttgtattt attggaattc atgcaatcaa
     3601 tgcaaatatt tgcttccaat tcagaagatc tgcgaaaaca ttaaagagaa atgatataac
     3661 attctatttg gttacatatt tagaagcatt cagatgtttt atttaaaatg tgatatctga
     3721 ctaattaaca ttctaataaa atgccaacaa aaatacattt aaaacacgga caatgtataa
     3781 ctcttttttt ttaagttcta cattatcttc ctctttatct tgagcggcgc aaacaaaaca
     3841 atgaatagca acaatttcaa gtatatgatt ggcttttttt taaaatgtac gccatcttta
     3901 cgatgtctgc ttccattctt tatttagctg cgagagataa ggggagacta atggtagaaa
     3961 tagaaatcga tgtaattgtc tctctttgtt aagcacacta tgtgtttttg taaacatact
     4021 tggtttctta gaccgtagct tttgttactt agcaagaaaa tgtttcctag atttgtcctt
     4081 ttttgtgtca ttactatatg tcattgatcc atgtcatttt ctgtctctta taggctttat
     4141 aactaaaaaa ttctaataaa
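
The listing above is in GenBank layout, with a position number at the start of each line and a space after every ten bases. If the tool you paste it into does not accept that layout, the sketch below shows one way to strip the numbers and spaces and to write the result in FASTA format. The function names and the record name `practice_sequence` are assumptions made for this example, and nothing here is specific to the GENSCAN server.

```python
import re


def clean_sequence(genbank_text):
    """Drop position numbers and whitespace, keeping only the sequence letters."""
    return re.sub(r"[^A-Za-z]", "", genbank_text).lower()


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# The first two lines of the listing above; paste the whole listing in practice.
example = """
        1 ttcttcggag attcttttac tgtcacacaa ccagcccaac tgaaaatgtc tctctctgat
       61 gcgtaagtct atttcccctt ttaaatatat ttttacttcc agactgactt ttaagtcatc
"""

seq = clean_sequence(example)
print(len(seq))  # 120
print(to_fasta("practice_sequence", seq))
```

Ambiguous bases written as n are letters, so they are kept in the cleaned sequence.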


The real protein sequence encoded by this gene.


        MSLSDADKKALDASWKKLTAGADGKKNAGINLVLWMFANVPNMRAQFSKFNANQSDDALKGDAEFIKQVNVIVAALDGLLQSVNNPGQLQANLDKLAKSHVNLKIGLEFFGPLQQNIHSFIESALGVGAGSDEPKAWGNLIAAFNETLKKA
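
To check by hand whether a stretch of DNA could code for part of this protein, you can translate it and compare the result with the sequence above. The sketch below assumes the standard genetic code and reads the forward strand only, starting at the first base it is given. The names `translate` and `CODON_TABLE` are chosen for this example. The 18-base fragment is copied from the DNA listing on this page, starting at its first atg. Whether it is a complete exon is left for you to work out.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in reading frame 1; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# Beginning of the real protein, copied from the sequence above.
real_protein_start = "MSLSDADKKALDASWKKLTAGADGKK"

fragment = "atgtctctctctgatgcg"
peptide = translate(fragment)
print(peptide)                                  # MSLSDA
print(real_protein_start.startswith(peptide))   # True
```

A codon that contains n, or any other letter outside a, c, g and t, is reported as X instead of being guessed.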