Extracting Comments From Python Source Code
Solution 1:
Answer for more general cases (extracting from modules, functions):
Modules:
The documentation specifies that one needs to provide a callable which exposes the same interface as the readline() method of built-in file objects. This hints at the solution: create an object that provides that method.
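A minimal sketch of that idea, assuming Python 3: any object with a suitable readline() will do, not only a file. LineSource is a hypothetical class written purely for illustration; tokenize.generate_tokens is the real standard-library function, and in Python 3 it expects readline to return str, with '' signalling end of input.

```python
import tokenize


class LineSource:
    """Hypothetical minimal object that only exposes a readline() method."""

    def __init__(self, text):
        # Keep line endings so each call returns a whole line, as a file would.
        self._lines = text.splitlines(keepends=True)
        self._pos = 0

    def readline(self):
        # Return the next line, or '' once exhausted (the file-object convention).
        if self._pos >= len(self._lines):
            return ''
        line = self._lines[self._pos]
        self._pos += 1
        return line


source = LineSource("x = 1  # set x\n")
# Pass the bound method, not the object itself.
comments = [tok.string for tok in tokenize.generate_tokens(source.readline)
            if tok.type == tokenize.COMMENT]
print(comments)  # ['# set x']
```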
In the case of a module, we can just open the module as a normal file and pass in its readline method. This is the key: the argument you pass is the readline() method.
Given a small scrpt.py file with:
# My amazing foo function.
def foo():
    """ docstring """
    # I will print
    print "Hello"
    return 0   # Return the value

# Maaaaaaain
if __name__ == "__main__":
    # this is main
    print "Main"
We will open it as we do all files:
fileObj = open('scrpt.py', 'r')
This file object now has a method called readline (because it is a file object), which we can safely pass to tokenize.generate_tokens to create a generator.
tokenize.generate_tokens (in Python 3, use tokenize.tokenize instead; note that it requires readline to return bytes, so you'll need to open the file in 'rb' mode) returns a generator of tuples (named tuples in Python 3) containing information about each token. Here's a small demo:
import tokenize

for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
    # we can also use token.tok_name[toktype] instead of 'COMMENT'
    # from the token module
    if toktype == tokenize.COMMENT:
        print 'COMMENT' + " " + tok
Notice how we pass the fileObj.readline method to it. This will now print:
COMMENT # My amazing foo function.
COMMENT # I will print
COMMENT # Return the value
COMMENT # Maaaaaaain
COMMENT # this is main
So all comments are detected regardless of their position. Docstrings, of course, are excluded, since they are tokenized as strings rather than comments.
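For Python 3, the same module-level extraction might look like the sketch below. It assumes Python 3 syntax for the input: SOURCE is a trimmed, hypothetical stand-in for scrpt.py, written to a temporary file only so the example runs on its own. The file is opened in 'rb' because tokenize.tokenize wants bytes, and token.tok_name supplies the token name, as the comment in the loop above suggests.

```python
import os
import tempfile
import token
import tokenize

# Hypothetical stand-in for scrpt.py, in Python 3 syntax.
SOURCE = b'''# My amazing foo function.
def foo():
    """ docstring """
    # I will print
    print("Hello")
    return 0   # Return the value
'''


def module_comments(path):
    """Return (token name, text) for every COMMENT token in the file at path."""
    # tokenize.tokenize() needs a readline that returns bytes, hence 'rb'.
    with open(path, 'rb') as fileObj:
        return [(token.tok_name[tok.type], tok.string)
                for tok in tokenize.tokenize(fileObj.readline)
                if tok.type == tokenize.COMMENT]


# Write the stand-in script to a temporary file, tokenize it, then clean up.
with tempfile.NamedTemporaryFile('wb', suffix='.py', delete=False) as tmp:
    tmp.write(SOURCE)
try:
    for name, text in module_comments(tmp.name):
        print(name, text)
finally:
    os.remove(tmp.name)
```

As in the Python 2 version, the docstring is skipped and only the three comments are reported.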
Functions:
You can also achieve a similar result without open, for use cases I honestly can't think of. Nonetheless, for completeness' sake, here is another way of doing it. In this scenario you'll need two additional modules, inspect and StringIO (io.StringIO in Python 3):
Let's say you have the following function:
def bar():
    # I am bar
    print "I really am bar"
    # bar bar bar baaaar
    # (bar)
    return "Bar"
You need a file-like object with a readline method to use it with tokenize. Well, you can create a file-like object from a str using StringIO.StringIO, and you can get a str containing the source of the function with inspect.getsource(func). In code:
import inspect
import StringIO

funcText = inspect.getsource(bar)
funcFile = StringIO.StringIO(funcText)
Now we have a file-like object representing the function, with the readline method we need. We can simply reuse the previous loop, replacing fileObj.readline with funcFile.readline. The output is similar:
COMMENT # I am bar
COMMENT # bar bar bar baaaar
COMMENT # (bar)
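A Python 3 sketch of the function case, using io.StringIO in place of StringIO.StringIO. comments_in_source and function_comments are hypothetical helper names. Note that inspect.getsource only works when Python can locate the function's source file; otherwise it raises OSError (for instance, for a function typed into a plain interactive session), which is why the call is guarded.

```python
import inspect
import io
import tokenize


def bar():
    # I am bar
    print("I really am bar")
    # bar bar bar baaaar
    # (bar)
    return "Bar"


def comments_in_source(text):
    """Return the text of every COMMENT token in a string of Python source."""
    # io.StringIO provides the str-returning readline() that generate_tokens wants.
    funcFile = io.StringIO(text)
    return [tok.string for tok in tokenize.generate_tokens(funcFile.readline)
            if tok.type == tokenize.COMMENT]


def function_comments(func):
    """Return the comments inside func's source, as retrieved by inspect."""
    return comments_in_source(inspect.getsource(func))


try:
    print(function_comments(bar))
except OSError:
    # inspect.getsource() raises OSError when the source cannot be retrieved.
    print("source for bar is not available")
```

Run as a script, this should print the three comments from bar.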
As an aside, if you really want to build a custom way of doing this with re, take a look at the source of the tokenize.py module. It defines patterns for comments (r'#[^\r\n]*'), names, et cetera, loops through the lines with readline, and searches each line for those patterns. Thankfully, it's not too complex once you look at it for a while :-).
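To see why tokenize is preferable to a hand-rolled regex, here is a deliberately naive sketch that applies only the comment pattern quoted above. naive_comments is a hypothetical helper; unlike tokenize, it has no idea whether a # sits inside a string literal.

```python
import re

# The comment pattern quoted from tokenize.py.
COMMENT_RE = re.compile(r'#[^\r\n]*')


def naive_comments(source):
    """Return every '#...' run; ignores string context, unlike tokenize."""
    return COMMENT_RE.findall(source)


print(naive_comments('x = 1  # real comment\n'))  # ['# real comment']
print(naive_comments('s = "#not a comment"\n'))   # ['#not a comment"']
```

The second call shows the false positive: a # inside a string is reported as a comment, which tokenize avoids by tracking string tokens as well.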
Answer for function extract (Update):
You've created an object with StringIO that provides the interface, but you haven't passed that interface (readline) to tokenize.generate_tokens; instead, you passed the whole object (stringio).
Additionally, your else clause is going to raise a TypeError, because untokenize expects an iterable as input. With the following changes, your function works fine:
import StringIO
import tokenize

def extract(code):
    res = []
    stringio = StringIO.StringIO(code)
    # pass in stringio.readline to generate_tokens
    for toktype, tokval, begin, end, line in tokenize.generate_tokens(stringio.readline):
        if toktype != tokenize.COMMENT:
            res.append((toktype, tokval))
        else:
            # wrap the (toktype, tokval) tuple in a list
            print tokenize.untokenize([(toktype, tokval)])
    return tokenize.untokenize(res)
Supplied with input such as expr = extract('a=1+2#A comment'), the function prints the comment and keeps the remaining expression in expr:
>>> expr = extract('a=1+2#A comment')
#A comment
>>> expr
'a =1 +2 '
Furthermore, as mentioned earlier, in Python 3 StringIO lives in the io module (io.StringIO), so the separate StringIO module import is not needed there.
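For reference, a Python 3 port of the corrected extract might look like the sketch below: io.StringIO replaces StringIO.StringIO, print becomes a function, and token fields are read by name. The exact whitespace in the returned string comes from untokenize's compatibility mode and may vary slightly between Python versions, so no exact value is claimed for it here.

```python
import io
import tokenize


def extract(code):
    """Python 3 port: print each comment, return the remaining tokens re-joined."""
    res = []
    stringio = io.StringIO(code)
    # Pass the readline method, not the StringIO object itself.
    for tok in tokenize.generate_tokens(stringio.readline):
        if tok.type != tokenize.COMMENT:
            res.append((tok.type, tok.string))
        else:
            # untokenize() wants an iterable of tokens, so wrap the pair in a list.
            print(tokenize.untokenize([(tok.type, tok.string)]))
    return tokenize.untokenize(res)


expr = extract('a=1+2#A comment')  # prints: #A comment
print(repr(expr))
```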
Solution 2:
Use this Third-Party Library from PyPI