Parse Url Beautifulsoup
import requests
import csv
from bs4 import BeautifulSoup
page = requests.get('https://www.google.com/search?q=cars')
soup = BeautifulSoup(page.content, 'lxml')
import re
links = so
Solution 1:
If the redundant part of every URL starts with &, you can apply split() to each URL:
url = 'http://www.imdb.com/title/tt0317219/&sa=U&ved=0ahUKEwjg5fahi7nVAhWdHsAKHSQaCekQFgg9MAk&usg=AFQjCNFu_Vg9v1oVhEtR-vKqCJsR2YGd2A'
url = url.split('&')[0]
print(url)
Output:
http://www.imdb.com/title/tt0317219/
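To apply the same idea to a whole list of links, something like the following sketch could work. The helper name strip_tracking and the second sample URL are made up for illustration; only the first URL comes from the example above.

```python
def strip_tracking(url):
    """Return everything before the first '&' (the whole URL if there is none)."""
    return url.split('&', 1)[0]

urls = [
    # URL from the example above
    'http://www.imdb.com/title/tt0317219/&sa=U&ved=0ahUKEwjg5fahi7nVAhWdHsAKHSQaCekQFgg9MAk&usg=AFQjCNFu_Vg9v1oVhEtR-vKqCJsR2YGd2A',
    # hypothetical second link, shortened for illustration
    'http://cars.disney.com/&sa=U',
]

cleaned = [strip_tracking(u) for u in urls]
print(cleaned)
```

Note that this also cuts off any legitimate query parameters after the first &, so it is only safe when the target URLs themselves contain no unencoded &.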
Solution 2:
Not the best way, but you could split one more time by adding one more line after a:
a=[a[0].split("&")[0]]
print(a)
Result:
['https://de.wikipedia.org/wiki/Cars_(Film)']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:I2SHYtLktRcJ']
['https://de.wikipedia.org/wiki/Cars_(Film)%23Handlung']
['https://de.wikipedia.org/wiki/Cars_(Film)%23Synchronisation']
['https://de.wikipedia.org/wiki/Cars_(Film)%23Soundtrack']
['https://de.wikipedia.org/wiki/Cars_(Film)%23Kritik']
['https://www.mytoys.de/disney-cars/']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:9Ohx4TRS8KAJ']
['https://www.youtube.com/watch%3Fv%3DtNmo09Q3F8s']
['https://www.youtube.com/watch%3Fv%3DtNmo09Q3F8s']
['https://www.youtube.com/watch%3Fv%3DkLAnVd5y7M4']
['https://www.youtube.com/watch%3Fv%3DkLAnVd5y7M4']
['http://cars.disney.com/']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:1BoR6M9fXwcJ']
['http://cars.disney.com/']
['http://cars.disney.com/']
['https://www.whichcar.com.au/car-style/12-cartoon-cars']
['https://www.youtube.com/watch%3Fv%3D6JSMAbeUS-4']
['http://filme.disney.de/cars-3-evolution']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:fO7ypFFDGk0J']
['http://www.4players.de/4players.php/spielinfonews/Allgemein/36859/2169193/Project_CARS_2-Zehn_Ferraris_erweitern_den_virtuellen_Fuhrpark.html']
['http://www.4players.de/4players.php/spielinfonews/Allgemein/36859/2169193/Project_CARS_2-Zehn_Ferraris_erweitern_den_virtuellen_Fuhrpark.html']
['http://www.play3.de/2017/08/02/project-cars-2-6/']
['http://www.imdb.com/title/tt0317219/']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:-xdXy-yX2fMJ']
['http://www.carmagazine.co.uk/']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:PRPbHf_kD9AJ']
['http://google.com/search%3Ftbm%3Disch%26q%3DCars']
['http://www.imdb.com/title/tt0317219/']
['https://de.wikipedia.org/wiki/Cars_(Film)']
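Some entries in the result above still contain percent-encoded characters such as %3F (?), %3D (=) and %23 (#). If readable URLs are wanted, one possible extra step, not part of the original answer, is to decode them with urllib.parse.unquote from the standard library. The variable names encoded and decoded are just placeholders; the two sample URLs are taken from the result above.

```python
from urllib.parse import unquote

# Two entries taken from the result above
encoded = [
    'https://www.youtube.com/watch%3Fv%3DtNmo09Q3F8s',
    'https://de.wikipedia.org/wiki/Cars_(Film)%23Handlung',
]

# unquote() turns %XX escapes back into the characters they stand for
decoded = [unquote(u) for u in encoded]
print(decoded)
```

Decoding should happen after the split on &, because an encoded %26 would otherwise turn into a real & and be cut off as well.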