"Grouped/clustered" Regions in a Vector in R/Python

I am struggling a bit with the following problem. I would like to find 'grouped/clustered' regions of 1s based on the following criteria: starting with the position of the first 1, if in the window after ...

Solution 1:

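You can do this easily by run-length encoding the vector: replace every run of zeros that is shorter than the window with ones, then re-encode so that neighbouring runs of ones merge into one region.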

x <- c(1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,1,0,0,1,1,1,
       1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
       0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1)

which(x == 1)
#[1]   1  15  62  63  67  88  89  91  98 102 103 104 111 114 115 116 117 119 121 125 127 145 150 160 164 166

window <- 5

# run-length encode the vector (printing y gives the output below)
y <- rle(x)
#Run Length Encoding
#  lengths: int [1:37] 1 13 1 46 2 3 1 20 2 1 ...
#  values : num [1:37] 1 0 1 0 1 0 1 0 1 0 ...

# replace zero runs shorter than the window with 1
# (note: a short zero run at the very start or end of x would be filled too)
y$values[(y$values == 0) & (y$lengths < window)] <- 1

# re-encode so that adjacent runs of ones are combined
y <- rle(inverse.rle(y))

# start and end position of each region of ones
start <- cumsum(y$lengths)[y$values == 1] - y$lengths[y$values == 1] + 1
#[1]   1  15  62  88  98 111 145 160
end <- cumsum(y$lengths)[y$values == 1]
#[1]   1  15  67  91 104 127 150 166
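
Since the question also mentions Python, here is a minimal sketch of the same run-length idea using `itertools.groupby`. The function name `clustered_regions` is hypothetical, and the "interior only" check is my own assumption, not part of the R code above: it leaves zero runs at the very start or end of the vector alone, whereas the R version would fill them if they are shorter than the window. Positions are 1-based so they can be compared with the R output.

```python
from itertools import groupby


def clustered_regions(x, window):
    """Return 1-based (start, end) pairs for clustered regions of 1s.

    Zero runs shorter than `window` that sit between two runs of 1s
    are bridged, as in the R answer.
    """
    # Run-length encode x as a list of [value, length] pairs.
    runs = [[value, len(list(group))] for value, group in groupby(x)]

    # Fill short zero runs. Assumption: only interior runs are filled,
    # so leading or trailing zeros never become part of a region.
    last = len(runs) - 1
    for i, run in enumerate(runs):
        if run[0] == 0 and run[1] < window and 0 < i < last:
            run[0] = 1

    # Walk the runs, merging runs of 1s that are now adjacent.
    regions = []
    pos = 0  # number of elements consumed so far
    for value, length in runs:
        start, end = pos + 1, pos + length
        if value == 1:
            if regions and regions[-1][1] == start - 1:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
        pos = end
    return regions


# Rebuild the example vector from the positions printed by which(x == 1).
ones = [1, 15, 62, 63, 67, 88, 89, 91, 98, 102, 103, 104, 111, 114, 115,
        116, 117, 119, 121, 125, 127, 145, 150, 160, 164, 166]
x = [0] * 166
for p in ones:
    x[p - 1] = 1

regions = clustered_regions(x, window=5)
print(regions)
```

For the example vector this should give the same start/end pairs as the R code, because `x` begins and ends with a 1 and the interior check therefore makes no difference.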
