Counting The Number Of CGG In Microsatellites
I have a task to find the number of CGG repeats in a sequence that is stored as a value in a dictionary (named 'dict' below as an example). The number of repeats in a row should b
Solution 1:
This can be done by testing whether n*"CGG" is present in the string with .index(), starting from the largest possible n (an int) and decreasing it. For example, in a string of length 20, you first test whether 6*"CGG" is present: if it is, you record it and build the substring without this 6*"CGG"; then you try 5*"CGG", and so on.
The function below follows this logic and can detect more than one tandem of the same length in the string:
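def tandem_search(pattern, string):
    st = string
    result = []
    # Try the longest possible tandem first; the range stops at 6 repeats (the end value 5 is exclusive).
    for i in range(len(string) // len(pattern), 5, -1):
        while True:
            try:
                j = st.index(i * pattern)
            except ValueError:
                # No tandem of i repeats is left; move on to a shorter one.
                break
            result.append(i)
            # Cut the tandem out so it is not counted again.
            st = st[:j] + st[j + i * len(pattern):]
    return result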
With it, I get the following results:
tandem_search("CGG",dic['ind_1']) = [47]
tandem_search("CGG",dic['ind_10']) = [70]
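As a cross-check, the same counts can also be obtained with a regular expression instead of repeated .index() calls. This is a different technique from the function above, shown only as a minimal sketch: the name tandem_runs, the min_repeats parameter and the toy sequence are made up for illustration and are not part of the original data. It returns the runs in order of position rather than from longest to shortest.

```python
import re


def tandem_runs(pattern, sequence, min_repeats=1):
    """Return the repeat count of every uninterrupted run of `pattern`.

    Hypothetical helper: runs shorter than `min_repeats` are ignored.
    """
    # (?:CGG){6,} matches six or more back-to-back copies of CGG.
    run = re.compile(r"(?:%s){%d,}" % (re.escape(pattern), min_repeats))
    return [len(m.group(0)) // len(pattern) for m in run.finditer(sequence)]


# Toy sequence (not real data): a run of 7 and a run of 6 CGG repeats.
toy = "AT" + "CGG" * 7 + "TTA" + "CGG" * 6 + "C"

counts = tandem_runs("CGG", toy, min_repeats=6)
print(counts)                  # [7, 6]
print(max(counts, default=0))  # 7, the longest run
```

Because the regex never cuts anything out of the sequence, it cannot create an artificial tandem by joining the two flanks of a removed run, which the slicing approach could in principle do.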