Extract Figures From Latex File
Hi, I could use a hand with the following problem. I'm trying to write a Python script that extracts the figures from a .tex file and puts them into another file. The input file
Solution 1:
Thanks a lot for the answers. I've finally done it this way. It probably isn't the optimal way, but it works. I tried several of the proposed solutions, but they needed some tweaking to get them to work.
infile = open('data.tex', 'r')
outfile = open('result.tex', 'w')
extract_block = False
for line in infile:
    if 'begin{figure}' in line:
        extract_block = True
    if extract_block:
        outfile.write(line)
    if 'end{figure}' in line:
        extract_block = False
        outfile.write("------------------------------------------\n\n")
infile.close()
outfile.close()
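For what it's worth, the same line-by-line idea can be wrapped in a function so it is easy to try on a small sample before touching real files. This is only a sketch: extract_figures and the sample list are made-up names, and it assumes figures are not nested.

```python
def extract_figures(lines):
    """Collect figure environments from an iterable of lines.

    Hypothetical helper, not part of the original answer.
    Assumes figure environments are not nested.
    """
    blocks = []
    current = None  # lines of the figure being collected, or None outside a figure
    for line in lines:
        if r'\begin{figure}' in line:
            current = []
        if current is not None:
            current.append(line)
            if r'\end{figure}' in line:
                blocks.append(''.join(current))
                current = None
    return blocks


# Made-up sample input; a real script would pass an open file object instead.
sample = [
    "Some text\n",
    "\\begin{figure}\n",
    "figure_info 1\n",
    "\\end{figure}\n",
    "More text\n",
]
print(extract_figures(sample))
```

Since a file object iterates line by line, extract_figures(open('data.tex')) should work the same way as the loop above.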
Solution 2:
You can do it with the regular expression (re) module's findall() function. The things to note are:
- use the re.DOTALL flag to allow "." to match newlines,
- use the "lazy" operator on that dot (the question mark in ".*?"), which means the regex won't greedily run past the first \end{figure} in search of the longest possible match,
- make sure your regex string is an r'raw string'; otherwise you have to escape every regex backslash to "\\" and a literal backslash in the regex to "\\\\". The same goes for hard-coded input strings.
Here we go:
import re
TEXT = r"""\documentclass[]....
\begin{document}
% More text
\begin{figure}
figure_info 1
\end{figure}
\begin{figure}
figure_info 2
\end{figure}
%More text
"""
RE = r'(\\begin\{figure\}.*?\\end\{figure\})'
m = re.findall(RE, TEXT, re.DOTALL)
if m:
    for match in m:
        print(match)
        print('')  # blank line
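To see why the lazy quantifier and re.DOTALL both matter, here is a small comparison on a made-up two-figure input. The variable names (text, lazy, greedy, no_dotall) are just for illustration.

```python
import re

# Made-up sample input with two figures and some text between them.
text = r"""\begin{figure}
one
\end{figure}
between
\begin{figure}
two
\end{figure}
"""

# Lazy ".*?" with DOTALL: stops at the first \end{figure} each time.
lazy = re.findall(r'\\begin\{figure\}.*?\\end\{figure\}', text, re.DOTALL)

# Greedy ".*" with DOTALL: runs on to the last \end{figure} in the text.
greedy = re.findall(r'\\begin\{figure\}.*\\end\{figure\}', text, re.DOTALL)

# Lazy without DOTALL: "." cannot cross the newline after \begin{figure}.
no_dotall = re.findall(r'\\begin\{figure\}.*?\\end\{figure\}', text)

print(len(lazy))       # 2
print(len(greedy))     # 1
print(len(no_dotall))  # 0
```

The single greedy match swallows the "between" line as well, which is usually not what you want when pulling out individual figures.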
Solution 3:
I would probably take the easy way out and read the whole file into a string variable. This
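with open('/tmp/workfile', 'r') as f:
    contents = f.read()
# Raw strings stop "\b" from being read as a backspace character.
text = contents.split(r"\begin{figure}")
text.pop(0)  # discard everything before the first figure
for a in text:
    a = a.split(r"\end{figure}")
    # a[0] is the figure body, including its surrounding newlines
    print(r"\begin{figure}" + a[0] + r"\end{figure}")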
You could execute this from the command line like this:
your_script.py > output_file.tex
Solution 4:
import re
# re.M makes ^ and $ match at line boundaries (harmless here, since the pattern uses neither)
# re.DOTALL means the . wildcard matches \n newlines as well
pattern = re.compile(r'\\begin\{figure\}.*?\\end\{figure\}', re.M | re.DOTALL)

# 'with' is the preferred way of opening files; it
# ensures they are always properly closed
with open("file1.tex") as inf, open("fileout.tex", "w") as outf:
    for match in pattern.findall(inf.read()):
        outf.write(match)
        outf.write("\n\n")
Edit: found the problem - not in the regex, but in the test text I was matching against (I forgot to escape the \b's in it).
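In case anyone hits the same thing, here is a quick illustration of that pitfall. In a normal Python string literal "\b" is a backspace character, so a hard-coded "\begin{figure}" does not actually contain the text \begin. The variable names below are made up for the example.

```python
plain = "\begin{figure}"     # "\b" is read as a single backspace character
raw = r"\begin{figure}"      # raw string keeps the backslash and the "b"
escaped = "\\begin{figure}"  # escaping the backslash by hand also works

print(len(plain), len(raw))  # 13 14
print(raw == escaped)        # True
print(raw in plain)          # False
print(plain[0] == "\x08")    # True: the first character is a backspace
```

So a test string written without the r prefix (or without doubled backslashes) will never match a pattern looking for a literal \begin{figure}.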