
Search For Any Number Of Unknown Substrings In Place Of * In A List Of String

First of all, sorry if the title isn't very explicit, it's hard for me to formulate it properly. That's also why I haven't found if the question has already been asked, if it has.

Solution 1:

Consider using fnmatch, which provides Unix shell-style wildcard matching. More info here: http://docs.python.org/2/library/fnmatch.html

from fnmatch import fnmatch
strList = ['obj_1_mesh',
       'obj_2_mesh',
       'obj_TMP',
       'mesh_1_TMP',
       'mesh_2_TMP',
       'meshTMP']

searchFor = '*_1_*'

resultSubList = [x for x in strList if fnmatch(x, searchFor)]

This should do the trick
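
As a variation on the same idea, the fnmatch module also provides fnmatchcase, which is always case-sensitive (plain fnmatch normalizes case on some platforms), and filter, which applies a pattern to a whole list. The following is only a minimal sketch reusing the list from the answer; the helper name glob_filter is made up for illustration.

```python
import fnmatch

str_list = ['obj_1_mesh',
            'obj_2_mesh',
            'obj_TMP',
            'mesh_1_TMP',
            'mesh_2_TMP',
            'meshTMP']

def glob_filter(strings, pattern):
    """Return the strings matching a shell-style pattern, case-sensitively.

    Hypothetical helper, not part of the fnmatch module.
    """
    return [s for s in strings if fnmatch.fnmatchcase(s, pattern)]

middle = glob_filter(str_list, '*_1_*')
prefix = glob_filter(str_list, 'mesh_*')
# fnmatch.filter does the list comprehension for you (platform case rules apply).
suffix = fnmatch.filter(str_list, '*TMP')

print(middle)
print(prefix)
print(suffix)
```

Note that '*' here matches zero or more characters, so 'mesh*' would also pick up 'meshTMP'.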


Solution 2:

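I would use the regular expression package for this if I were you. You'll have to learn a little bit of regex to write correct search queries, but it's not too bad. '.+' is pretty similar to '*' in this case, with one difference: '.+' requires at least one character, whereas a shell-style '*' also matches nothing at all (the exact regex equivalent of that is '.*').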

import re

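def search_strings(str_list, search_query):
    # Compile once so the pattern can be reused for every string
    regex = re.compile(search_query)
    result = []
    for string in str_list:
        # match() only anchors at the start of the string
        if regex.match(string) is not None:
            # Keep the whole string, not just the part the regex consumed
            result.append(string)
    return result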

strList= ['obj_1_mesh',
          'obj_2_mesh',
          'obj_TMP',
          'mesh_1_TMP',
          'mesh_2_TMP',
          'meshTMP']

print(search_strings(strList, '.+_1_.+'))

This should return ['obj_1_mesh', 'mesh_1_TMP']. I tried to replicate the '*_1_*' case. For 'mesh_*' you could make the search_query 'mesh_.+'. Here is the link to the Python regex documentation: https://docs.python.org/2/library/re.html
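
To make the difference between '.+' and '.*' concrete, and to show that match only anchors at the start of the string, here is a small sketch. The sample names are invented for the demonstration, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# Invented sample data for this demonstration
names = ['obj_1_mesh', 'mesh_', 'mesh_1_TMP', 'xmesh_1']

# '.+' needs at least one character after 'mesh_';
# '.*' accepts zero or more, like a shell-style '*'.
plus = re.compile(r'mesh_.+')
star = re.compile(r'mesh_.*')

with_plus = [n for n in names if plus.match(n)]
with_star = [n for n in names if star.match(n)]
print(with_plus)
print(with_star)

# match() succeeds as soon as a prefix of the string fits the pattern;
# fullmatch() requires the pattern to cover the entire string.
prefix_only = re.match(r'.*_1', 'obj_1_mesh') is not None
whole_string = re.fullmatch(r'.*_1', 'obj_1_mesh') is not None
print(prefix_only, whole_string)
```

So a query such as '.+_1' still accepts 'obj_1_mesh' when used with match, which may or may not be what you want for a pattern like '*_1'.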


Solution 3:

The simplest way to do this is to use fnmatch, as shown in ma3oun's answer. But here's a way to do it using Regular Expressions, aka regex.

First we transform your searchFor pattern so it uses '.+?' as the "wildcard" instead of '*'. Then we compile the result into a regex pattern object so we can efficiently use it for multiple tests.

For an explanation of regex syntax, please see the docs. But briefly, the dot means any character except a newline, the + means look for one or more of them, and the ? means do non-greedy matching, i.e., match the smallest string that conforms to the pattern rather than the longest (which is what greedy matching does).

import re

strList = ['obj_1_mesh',
           'obj_2_mesh',
           'obj_TMP',
           'mesh_1_TMP',
           'mesh_2_TMP',
           'meshTMP']

searchFor = '*_1_*'
pat = re.compile(searchFor.replace('*', '.+?'))

result = [s for s in strList if pat.match(s)]
print(result)

output

['obj_1_mesh', 'mesh_1_TMP']

If we use searchFor = 'mesh_*' the result is

['mesh_1_TMP', 'mesh_2_TMP']

Please note that this solution is not robust. If searchFor contains other characters that have special meaning in a regex they need to be escaped. Actually, rather than doing that searchFor.replace transformation, it would be cleaner to just write the pattern using regex syntax in the first place.
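
One way to address the escaping problem is to split the pattern on '*', pass each literal piece through re.escape, and join the pieces back together. The sketch below does that under a few assumptions of my own: '*' is the only wildcard, it stands for zero or more characters (unlike the '.+?' used above), and the whole string must match. The names glob_to_regex and search are hypothetical. The standard library's fnmatch.translate does a similar conversion for full shell-style patterns.

```python
import re

def glob_to_regex(pattern):
    """Compile a pattern in which '*' is the only wildcard (illustrative helper)."""
    # Escape every literal piece so '.', '(', '+', etc. are matched literally.
    pieces = [re.escape(piece) for piece in pattern.split('*')]
    # Each '*' becomes '.*' (zero or more characters); \Z anchors at the end,
    # and DOTALL lets the wildcard span newlines too.
    return re.compile('.*'.join(pieces) + r'\Z', re.DOTALL)

def search(strings, pattern):
    """Return the strings that match the wildcard pattern in full."""
    regex = glob_to_regex(pattern)
    return [s for s in strings if regex.match(s)]

strList = ['obj_1_mesh', 'obj_2_mesh', 'obj_TMP',
           'mesh_1_TMP', 'mesh_2_TMP', 'meshTMP']

middle = search(strList, '*_1_*')
prefix = search(strList, 'mesh_*')
# The dot is treated literally, so 'axb1' is rejected.
dotted = search(['a.b1', 'axb1'], 'a.b*')
print(middle, prefix, dotted)
```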


Solution 4:

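If the pattern you are looking for always has the form *string* (a single substring with a wildcard on each side), you can just use the find function with searchFor set to that substring, without the asterisks. You'll get something like: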

for s in strList:
    if s.find(searchFor) != -1:
        do_something()

If you have more than one substring to look for (like abc*123*test), you will need to look for each substring in turn: find the first one, then search for the second one in the same string starting at the index where you found the first plus its length, and so on.
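
The multi-substring procedure just described could be sketched as follows. This is only an illustration: the function name contains_in_order is made up, and the check is unanchored, meaning it behaves as if the pattern had a wildcard at both ends, so 'abc' does not have to be at the very start nor 'test' at the very end.

```python
def contains_in_order(s, parts):
    """Return True if every part occurs in s, in order and without overlap."""
    start = 0
    for part in parts:
        index = s.find(part, start)
        if index == -1:
            return False
        # Continue searching just past the end of the part we found.
        start = index + len(part)
    return True

pattern = 'abc*123*test'
# Split on the wildcard and drop empty pieces (from leading/trailing '*').
parts = [p for p in pattern.split('*') if p]

in_order = contains_in_order('xx_abc_00123_test_yy', parts)
out_of_order = contains_in_order('123_abc_test', parts)
print(in_order, out_of_order)
```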

