Python BeautifulSoup Img Tag Parsing
I am using BeautifulSoup to parse all the img tags present in ''. The code is: import urllib2 from BeautifulSoup import BeautifulSoup page = urllib2.urlopen('http
Solution 1:
This seems to work when I try it here:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen('')
soup = BeautifulSoup(page)
tags = soup.findAll('img')  # collect every img tag before reading its src
print "\n".join(set(tag['src'] for tag in tags))
Produces this which looks OK to me
Solution 2:
I had a similar problem: I couldn't find all the images. Here is a piece of code that will give you any attribute value of an image tag.
from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2
page = urllib2.urlopen('')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # print image source
    print image['src']
    # print alternate text
    print image['alt']
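For readers on Python 3, the same idea can be sketched with the `bs4` package instead of the older `BeautifulSoup` module. This is only a sketch under assumptions: the inline `HTML` string and the helper name `image_attributes` are made up for illustration, and the page fetch is left out so the example runs without a network connection. It uses `.get()` rather than `image['alt']`, so an image with no alt text yields `None` instead of raising a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
HTML = """
<html><body>
  <img src="/static/logo.png" alt="Site logo">
  <img src="/static/banner.jpg">
</body></html>
"""


def image_attributes(markup):
    """Return a (src, alt) pair for every img tag in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    # .get() returns None when the attribute is absent.
    return [(img.get("src"), img.get("alt")) for img in soup.find_all("img")]


rows = image_attributes(HTML)
for src, alt in rows:
    print(src, alt)
```

The second image has no alt attribute, so its alt value comes back as `None` rather than stopping the loop.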
Solution 3:
Explicitly using soup.findAll(name='img') worked for me, and I don't appear to be missing anything from the page.
Solution 4:
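import urllib2
from BeautifulSoup import BeautifulSoup

def grabimagetags():
    # Fetch the page and collect the unique src values of its img tags.
    page = urllib2.urlopen('')
    soup = BeautifulSoup(page)
    tags = soup.findAll('img')
    sources = []  # a local list; calling extend on the built-in list type itself fails
    sources.extend(set(tag['src'] for tag in tags))
    return sources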
I would only make this change, so that you can pass the list of img tags around.
Solution 5:
In my case some images didn't contain a src attribute, so I did this to avoid a KeyError:
art_imgs = set(img['src'] for img in article.find_all('img') if img.has_attr('src'))
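To see why the filter matters, here is a small self-contained sketch using `bs4` on Python 3. The `SAMPLE` markup is invented for illustration, and `article` is simply assumed to be a parsed soup like the one in the line above. It first shows the KeyError that an unfiltered lookup raises, then the `has_attr` filter, and finally what should be an equivalent form that passes `src=True` to `find_all` so that only tags carrying the attribute are matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second img tag has no src attribute.
SAMPLE = (
    '<div><img src="a.png"><img alt="no source">'
    '<img src="b.png"><img src="a.png"></div>'
)
article = BeautifulSoup(SAMPLE, "html.parser")

# Unfiltered subscripting fails on the tag without a src.
try:
    [img["src"] for img in article.find_all("img")]
    raised = False
except KeyError:
    raised = True

# The filter from the answer above: skip tags lacking the attribute.
art_imgs = set(img["src"] for img in article.find_all("img") if img.has_attr("src"))

# Alternative: let find_all select only img tags that have a src.
same = set(img["src"] for img in article.find_all("img", src=True))

print(raised, sorted(art_imgs), same == art_imgs)
```

Using a set also collapses the duplicate a.png entry, which matches the behaviour of the earlier answers that wrap the sources in set().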