Overlap Graphs

A problem from rosalind “Bioinformatics Stronghold” category, String Algorithm

  • A problem from rosalind “Bioinformatics Stronghold” category
    Rosalind Problem Link

  • Description

    • Make graph with a collection of DNA strings in FASTA format having total length at most 10 kbp.
    • Method for making the graph
      • There is a positive number k
      • EX) One DNA string t & Another DNA string s
        • if suffix of t == prefix of s (length of suffix and prefix is k)
        • t is linked to s (print : FASTA_name of t FASTA_name of s)
    • k = 3

My Solution


name = input()
file = open(name, 'r')
FASTA_name = []
seqs = []
flag = True


for line in file.readlines():
    if line[0] == '>':
        if flag == True:
            FASTA_name.append(line[1:-1])
            resultSeq = ""
            flag = False
        else:
            seqs.append(resultSeq)
            FASTA_name.append(line[1:-1])
            resultSeq = ""
    else:
        resultSeq += line[:-1]

seqs.append(resultSeq)

for i in range(len(FASTA_name)):
    suffix = seqs[i][-3] + seqs[i][-2] + seqs[i][-1]

    for j in range(len(FASTA_name)):
        if i == j:
            continue
        else:
            prefix = seqs[j][0:3]
            if suffix == prefix:
                print(FASTA_name[i], FASTA_name[j])


  • DNA sequence can be stored in input file with x (x>1) lines if its length is big
    • Until current line is FASTA_name, collect the DNA sequence line to resultSeq
  • With Brute-force Algorithm, we find all combinations satisfying the condition (suffix == prefix)

© 2017. All rights reserved.

Powered by Hydejack v조현진