
Counting The Number Of CGG In Microsatellites

I have a task: find the number of CGG repeats in a sequence that is stored as a value in a dictionary (named 'dict' below as an example). The number of repeats in a row should b

Solution 1:

It can be done by testing whether n*"CGG" occurs in the string with .index(), for decreasing values of the integer n. For example, in a string of length 20, you first test whether 6*"CGG" is present: if it is, you record it and build the substring with that 6*"CGG" removed, then you try again with 5*"CGG", and so on.

The function below follows this logic and can detect more than one tandem of the same length in the string:

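def tandem_search(pattern, string):
    st = string
    result = []
    # Longest possible run first, down to 6 repeats (the range stops before 5).
    for i in range(len(string) // len(pattern), 5, -1):
        while True:
            try:
                j = st.index(i * pattern)  # raises ValueError if absent
            except ValueError:
                break
            result.append(i)
            # Remove the run so it is not counted again as a shorter one.
            st = st[:j] + st[j + i * len(pattern):]
    return result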

With it, I get the following results:

tandem_search("CGG",dic['ind_1']) = [47]
tandem_search("CGG",dic['ind_10']) = [70]
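As a possible alternative, the same kind of search can be sketched with a regular expression. This is not the method used above, only an illustration: the name tandem_runs, the min_repeats parameter (set to 6 to mirror the loop bound of tandem_search) and the toy sequence are all assumptions made for the example. Unlike tandem_search, it does not splice the string after each hit, and it reports runs in sequence order together with their start positions.

```python
import re

def tandem_runs(pattern, sequence, min_repeats=6):
    """Return (start, repeats) for each run of at least min_repeats copies of pattern.

    Hypothetical helper for illustration; min_repeats=6 is an assumption.
    """
    unit = re.escape(pattern)
    # Builds e.g. (?:CGG){6,} : six or more back-to-back copies, matched greedily.
    run = re.compile(f"(?:{unit}){{{min_repeats},}}")
    return [(m.start(), len(m.group()) // len(pattern))
            for m in run.finditer(sequence)]

# Toy sequence (made up): runs of 7, 2 and 6 repeats separated by other bases.
seq = "AT" + "CGG" * 7 + "TTA" + "CGG" * 2 + "A" + "CGG" * 6
print(tandem_runs("CGG", seq))  # [(2, 7), (33, 6)]
```

The run of 2 repeats is skipped because it is below min_repeats. Because the sequence is never spliced, two runs separated by a few bases are always reported separately here.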
