Рассмотрим следующую последовательность ДНК Лаванды:
GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCCGTCAACTCATTCACACCGCATCCTTTCCTGCCACTGTAACTAGTCGACTGGGGAACCTCATCATCCATACTCTCCCACATTATGCCTCCCAACCTTGTTAAGCGTGGCATGCTTGGGATTGCATTGATGCTTCTTGGAGAGGACGCTTTCGTTTTGGAGATTACAGGGATCCAATTTTATCATCGGTTCGACTCCCGTAACGACTTAGCAGTAAGGGTGCTAGTTCCTGGTTAGAATCTTAATAAATCACGTCGCTTGGAGCAAGACAAAGATCGTCGTAATGCCAAGTGCACGACCACCTTCAGACTTGCAGGACCCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTCGATAGCTATGCGGTTCAATACAATCTTAACGCAATGCAGCGATGTGGTTTCGTACACTTAGCATAAAACCCCCCACATTAAATCGATGTACCCGCCCTCTTAGACGCCAATTTCAATGCCGAACCTCCGGCGGGTATCTCTGCACTAGGAGAAGTAGCACGTCGCTGTAGCGAACTCCTATCGTGAGATAATTTGTAGAGCTGCTCTTATAATACAATAGCTCAGATGGATTATTCCATGGACATCCCCGTGCGTTGTTTCGAGGATGGTAGGTGGAAATTTTGCCAGACCTCTAGTCTTAAACATGGTTGACGTTATAGGCGCTATCTCTTGCGTCTGGAAGTGTTAATCCGTGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACACGCAACTCTGGAGGAGGGCACTGCACTGCAAACTTGCGTAATATCCTTCACCCACACTTGCCTGGCCTCCTTGCTTAAAGCTCTGGCGATGCGATTTTTCGGCCCAGTAGCTGAATAGGTCATGAAATGGGCACCGAACTGGAAAGACCCATATATTCGATACTCACAACTTAATGATAGCGCGATTAAGAGCGACACCAAAAACCAAATTACGTTCACGAACCTTTGAGAGTCAAGGAGACTTAGACCGAATTGAATGATCACTGATGCGCCCGCTGATACTGAGCCTCACCATTAATCGCCGACCAATACGGCGTGTACCGGGCGCGGCCTTGCCGCATAACGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTACACAGCCCCGTCCTCATTGCTAAGTGCACTGGCAACTGGACCTAAAGATTTTTCGAGTATGGCCCTCGAATCAAGCGCCCACCCAGAAACCTACGAGCCAGTAACCCCAGTAAACAAGCATTAGTGCTATATGCTTGCTGCCCACTAGGACCCTTATGGTTCATACCAGGGTGACGTGTCTTGCGGGCCAAGGATGAACCAGAAGCAAGATCCTTAGATGGACGACTGTCTCATTGCTTAAACTCCACATACCAAAGGGCGCGGTAAACGATAGTTTTAGGTAATGTTAGTCGGATGGTTGTCTGCAGCTACCAATACAGCCTGGCACCCAGGGTCTGAACAATAACGCGTGAGAGCAGCTCTCCCGCGTGTGGTGGATTTGCCGTCTATGAAATTGAGGCTCTTGCAACTATTCGCACTCGGAATGCCCTCATATCTGGTGCCTAGCGGCCTTTGCCCCGTGCCGGTAGGACTAAACTCTACGGATCGTTGACGGATCTCGATGTGGAAGATGGTTATGAAAGATAACAACGCGTGTGCTAATTGATTTAGACAAGTATTGCGGCAGTAAAAGATAATCGGCTGCAGAGTTACGAAAGACTTCCATGCATGGATTCCATTCCTTCTAGTATAGGACCCACTCTGAATACACGTCTTGCGGGCCGATCATCTCCACCGCTGCGGAAGAAAGCAATTAAGAATCTATGCTCATTAAGAGTGCGACTATAATGCGGATCTTACAGTGCTAATGATCAGGACGTCGTCCAAGCAGGCTGCATGCCGAATTTAGCTTACGTCAGGATCAGGCGTTATAGCCTGGGAATCGGACTATGAGGACGCCACGACCTCTGGGAGAAAGCTATATACATTGAGGATCGCGCCATCTTTATGAGACTCAAATGAATCTAGATAGGTAGCATTGCGGACTTGAGTTAGCACATCGGTATTGGAAGGTGAGGGTCCTGCCGCTCGTTCTATGTTCGGTTTATAGTATACAAATAGGTCATCCCGAACGTTGAAGTTAAACTCATGACACGTTGTCGTAATGAAACGGGCCTGTTATTAGGGATACAGACAAAAGGCACAAGCTGGCTTGCACATTAAGGCGCACTAGAGATCCTCACAACCGTTGCCCGCACGGAGGTCGTGTCTAACAGACAGTGAACCAGCCGTATTGGGGTGGATGACCTGAGCTTCTTGGGGCCTGTTGTACACCGCGTGTGGTTCAACTGGTACACATACTACGAATATTCGAAATCATTGTACTGTGCTCTTCGGTGCTACTGACTGTGAGCGAATGCATCCCAATCCCAAACAATGCTTGTGGTAGGAGAATTGAAACTCTCGAAGCCTGGCCCAATGTCATCTACTTTTAACATGTCGGGCCAGGAGTTACGGGCATTGCTTACTTACTTTGCCCCCTTACACCACAGCAGCGCGATTCTTGTTGTAGTAGATTTTATACGACTCGCGAATTAAATGGAACTTGTCTGTCCCATATCGATCGTGTCCATCGTAAGATGAGATTGTAGGAGCATTCGGAAGTCTATGCGGCCCAGGGACTACTACGTTAAATCTGGTCAGACGTGGTTTACAAGGCGTCCCGATCTTCTCAGAACATATGGGAAAGCACTACCGTTCCTTCACGCATACAGTTGTTCGTGCCGAACGAGTAAGCTTGCGACCAGCCCACCCGCTAGGGCTATGCAGCGGGTCATGGCTGGCGCCATACTGTGCGGACAACCCACGCTCTGGCAGAAAGCGTCTTGTGTTTTGTAGTAGCTCCAACGGTTAGACCTTCGATATCTATTCAGAGCGCGAGCGACCACTATTAGACGGCATGTAAACAATGTGTATTTGTTCGGCCCAACCGGTATATGGGTAAGACCGCGAAGGGCCTGCGCGAATACCAGCGTCCAAAAATTCCTCACCCGAGATATGCGGTTAGTACCCCTTGGGTAACGGTCCGCTACGGGTAGCGACGCGAGCCGGCCGCATCGGTTGGAGCCGAGTTGTCGGGCAGGCGAGTAACGTGTGCAATTTGATGGGCCCAAGCCTCCGGCACTATCCACCTCATACATCGACAAAAGCACCAAATATGGGGAAAAGCTGAGCGTCGATATGTACATCTACCCAGGAACCGGCCCGAACATTAGGCGGACGTGAATTTCCGACCTAGGTTCGGCTACATTTCTACGATCCAAGCACACGTGAAGGAGGAGGGGTGTTCCGACCGTAAATGAACGAGGTGCGCAGTGACCCGATGGCGTTTAGCGGATAGCCTTCCTATGCCGGCCTATGCTGTATGGTAGTTGGTTGGTGCCTCCAGAGCCACTGCACCCAATCATAGGGTCTACAGCAGCGTACTTATAAAATTGTACGGGTGACCCATATCCATTACGGGTTGCGACCAGTATAGGAGAGTATAACTGCGTGAACTAATGCGTTATGACGCTTCAGAGTTTGCTCGGGCCCGAGTTCTAGGGCTATAATGTGTTAGGGCGCAAGTATGCCAAGCTAAGATGTGGCGTGCACACTAGGAGTTGTGTTCCTCTGCAAGCAGACACGAGCACTCTGGCAGTAGTTTGACCACACCCGGGTATCACTGCTACTCCATTTCGAACAAGCTATTGGAGCGGACAAAATATGCTACTCAAGAGCATTAGTTATAGGTCTACGAGACAGAAGCAGTTACTGAGTCTGAATATTCGATATAAGTAGGCATGGAGGCGGAGCAAAACAACGTCTGCGATCAATCGTGTTGATGACGTATGGCGACTGGAAGGTAAGGACTATGGCCGGACGGAATGATTCATGTTCTGTTCAAAGCTATATTTCGAAGGGGTATATTAGCGGTCCTACACTTGGTTAGCACCCTCCCCCCTCTGGATCCTGCACTAATTCGAGCTGGCCTCCATCGGTATCAGTCCGGAAGCTCCACTCTCTATCGTAGTCCTAATCAACAGGGTGCCAGTTTGCTCACGTGGAAGTTTGAGGCCCTTTGTGCTCCATAGCCAATCACTAACCATGCACGCGCGACCCACTCTACGTCCAGATCGGCTATAATAGTTGCGCCCGGGACTGGCAGAGTAGACATGTAAGCTAGATAGAGCCCCGACATCGGCCAAGAGATCCTACGCTGCTTCCAGATAATGAGAGACATTCTAGCATTAGACATGCAAGTCGGCAGGGACTCCCCTTATCTAGTAATTTCGATGAATTGGTTTTTCGGCTAGCATCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGACCATGCCGACCTCATCATAGAAGGAATGCTCTAAACTTAGAGTGCTACTAGGAAAACTATTAATCAATGATCGTCCTGCTTACATAGCTGGACGGCGAAAGTTCTTATACTGCGGAGGTTGCTGACGTAGAGTGCGCTGGGTACAGCGGATAAGTTGATCAGGGTGGGGATAGGGTGGCTCACCGTTTATACTCATATAGATTCCTGGCGTCGACGCTGTGACAGGGTCGAGATCGAGGGGGAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGCGGAGCGGAGGGAAAATTATCACCAGAGGGTAGGGGCTCGCGACATTCTATTCAATGCATTTCAAGCTACTTACGTATTTCGGCACAGTGACTACTGCCTGCGCGGCAGCCGTAAGGTTTCCCGTCAATAGGTGGCACGTATCATTGATGAAAGTGTCAGCTAATCATTCAGGCCTTA
и этот словарь
sqc_large_dictionary = ["AGATC","TTTTTTCT","AATG","TCTAG","GATA","TATC","GAAA","TCTG"]
Это слова, которые мы должны искать в последовательности ДНК Лаванды.
Проблема в том, что, например, мне удалось с помощью re и collection создать алгоритм, который обнаруживает все AATG (66). Но спецпредложение c говорит мне, что я должен считать только самую длинную последовательную последовательность, в данном случае 43, так как остальные поливаются, но есть 43 AATG, которые один за другим, и это число, которое я должен взять. Как я мог это реализовать?