Combining Values For A Large Number Of Overlapping Intervals Of Dictionary Keys
Solution 1:
Ok, now i think i get it: Basically you have a bunch of overlapping intervals, represented by bars at a certain position with a given thickness. You would draw these bars below each other and see how thick they are together at any given point.
I think it's easiest/fastest to abuse the fact that you have integer positions to do this:
all={
1:{ ('a',123,145):20, ('a',155,170):12, ('b',234,345): 34},
2:{ ('a',121,135):10, ('a',155,175):28, ('b',230,345): 16},
3:{ ('a',130,140):20, ('a',150,170):10, ('b',234,345): 30}
}
from collections import defaultdict
summer = defaultdict(int)
mini, maxi = 0,0
for d in all.values():
for (name, start, stop), value in d.iteritems():
# im completely ignoring the `name` here, not sure if that's what you want
# else just separate the data before doing this ...
if mini == 0:
mini = start
mini, maxi = min(mini, start), max(maxi, stop)
for i in range(start, stop+1):
summer[i]+=value
# now we have the values at each point, very redundant but very fast so far
print summer
# now we can find the intervals:
def get_intervals(points, start, stop):
cstart = start
for i in range(start, stop+1):
if points[cstart] != points[i]: # did the value change ?
yield cstart, i-1, points[cstart]
cstart = i
if cstart != i:
yield cstart, i, points[cstart]
print list(get_intervals(summer, mini, maxi))
When using only the 'a' items it give:
[(121, 122, 10), (123, 129, 30), (130, 135, 50), (136, 140, 40), (141, 145, 20), (146, 149, 0), (150, 154, 10), (155, 170, 50), (171, 175, 28)]
Edit: It just hit me how to do this really simple:
from collections import defaultdict
from heapq import heappush, heappop
class Summer(object):
def __init__(self):
# its a priority queue, kind of like a sorted list
self.hq = []
def additem(self, start, stop, value):
# at `start` add it as a positive value
heappush(self.hq, (start, value))
# at `stop` subtract that value again
heappush(self.hq, (stop, -value))
def intervals(self):
hq = self.hq
start, val = heappop(hq)
while hq:
point, value = heappop(hq)
yield start, point, val
# just maintain the current value and where the interval started
val += value
start = point
assert val == 0
summers = defaultdict(Summer)
for d in all.values():
for (name, start, stop), value in d.iteritems():
summers[name].additem(start, stop, value)
for name,s in summers.iteritems():
print name, list(s.intervals())
Solution 2:
Alright, if these are chromosomes, lets start by mapping them out separately:
{"Chr1": {(121,122):10, (123,130):30, ...},
"Chr2": {(230,233):16, ...},
...
}
The numbers you're adding up are, I take it, some sort of scores--expression scores or whatever.
If the range of positions (these 121, 130 numbers definining the intervals) is small enough--anything up to several thousand--then you'd probably save yourself a headache by storing a summed score for each position, and simply adding the score for an interval to every position within that interval.
If they're something like individual base positions, and there are millions of possible positions, you'll need to stick with intervals. So for each one, you'll need to check the relevant chromosome for intervals it overlaps, then remove those, and break them down into as many smaller intervals as you need to store all the different summed scores.
Here's a rough framework, but it's not complete:
for (start, end), score in intervals_to_add.items():
overlapping = {}
for (start1, end1), score1 in current_chromosome.items():
if start1 <= start <= end1 or start1 <= end <= end1:
overlapping[(start1, end1)] = score1
for interval in overlapping:
current_chromosome.pop(interval)
# Process overlapping into smaller intervals, adding in the current interval
current_chromosome.update(new_intervals)
Post a Comment for "Combining Values For A Large Number Of Overlapping Intervals Of Dictionary Keys"