Data Mining Imdb Reviews - Only Extracting The First 25 Reviews

May 31, 2023

I am trying to extract all the reviews for the movie Spider-Man: Homecoming, but I can only get the first 25. I was able to load more in IMDb to get all the reviews ...

Solution 1:

Well, actually, there's no need to use Selenium. The data is available by sending a GET request to the website's API in the following format:

https://www.imdb.com/title/tt6320628/reviews/_ajax?ref_=undefined&paginationKey=MY-KEY

where you have to provide a key for the paginationKey parameter in the URL (...&paginationKey=MY-KEY).
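As a minimal sketch of how that URL can be assembled, the helper below fills in the title ID and the pagination key. The function name `build_reviews_url` and the constant `BASE` are hypothetical names introduced here for illustration; only the URL shape comes from the answer above.

```python
from urllib.parse import urlencode

# URL shape taken from the answer; the title ID is passed in by the caller.
BASE = "https://www.imdb.com/title/{title_id}/reviews/_ajax"


def build_reviews_url(title_id: str, pagination_key: str = "") -> str:
    """Return the reviews URL for one page (hypothetical helper).

    An empty pagination key requests the first page.
    """
    query = urlencode({"ref_": "undefined", "paginationKey": pagination_key})
    return f"{BASE.format(title_id=title_id)}?{query}"


if __name__ == "__main__":
    print(build_reviews_url("tt6320628", "MY-KEY"))
```

Using `urlencode` rather than plain string formatting means a key containing reserved characters would still be escaped correctly.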

The key is found in the data-key attribute of the div with the class load-more-data:

<div class="load-more-data" data-key="g4wp7crmqizdeyyf72ux5nrurdsmqhjjtzpwzouokkd2gbzgpnt6uc23o4zvtmzlb4d46f2swblzkwbgicjmquogo5tx2"></div>
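To illustrate just this step in isolation, here is a small sketch that pulls the data-key value out of a piece of HTML. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; the names `LoadMoreKeyParser` and `find_pagination_key` are made up for this example, and the sample key is a placeholder.

```python
from html.parser import HTMLParser
from typing import Optional


class LoadMoreKeyParser(HTMLParser):
    """Remember the data-key of the first <div class="load-more-data">."""

    def __init__(self) -> None:
        super().__init__()
        self.key: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "load-more-data" in classes and self.key is None:
            self.key = attr_map.get("data-key")


def find_pagination_key(html: str) -> Optional[str]:
    """Return the pagination key, or None if the page has no such div."""
    parser = LoadMoreKeyParser()
    parser.feed(html)
    return parser.key


if __name__ == "__main__":
    sample = '<div class="load-more-data" data-key="abc123"></div>'
    print(find_pagination_key(sample))
```

A return value of None is the signal that there are no further pages to request.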

So, to scrape all the reviews into a DataFrame, try:

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = (
    "https://www.imdb.com/title/tt6320628/reviews/_ajax?ref_=undefined&paginationKey={}"
)
key = ""
data = {"title": [], "review": []}

while True:
    response = requests.get(url.format(key))
    soup = BeautifulSoup(response.content, "html.parser")

    # Collect the reviews on this page first, so the last page is not skipped
    for title, review in zip(
        soup.find_all(class_="title"), soup.find_all(class_="text show-more__control")
    ):
        data["title"].append(title.get_text(strip=True))
        data["review"].append(review.get_text())

    # Find the pagination key; stop when there are no more pages
    pagination_key = soup.find("div", class_="load-more-data")
    if not pagination_key:
        break
    # Update the `key` variable in order to scrape more reviews
    key = pagination_key["data-key"]

df = pd.DataFrame(data)
print(df)

Output (truncated):

                                                title                                             review
0                              Terrific entertainment  Spiderman: Far from Home is not intended to be...
1         THe illusion of the identity of Spider man.  Great story in continuation of spider man home...
2                       What Happened to the Bad Guys  I believe that Quinten Beck/Mysterio got what ...
3                                         Spectacular  One of the best if not the best Spider-Man mov...

......
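The key-following loop can also be written as a generator that is independent of how a page is fetched, which makes it easy to test without hitting the site. This is only a sketch under assumptions: `iter_reviews`, `fetch_page`, `max_pages`, and `delay_seconds` are hypothetical names, and `fetch_page` stands in for whatever function requests one page and returns its reviews together with the next pagination key (or None on the last page).

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (rows on this page, key of the next page or None)
Page = Tuple[List[dict], Optional[str]]


def iter_reviews(
    fetch_page: Callable[[str], Page],
    max_pages: int = 1000,
    delay_seconds: float = 0.0,
) -> Iterator[dict]:
    """Yield reviews page by page, following pagination keys."""
    key = ""  # an empty key requests the first page
    seen_keys = set()
    for _ in range(max_pages):
        rows, next_key = fetch_page(key)
        # Yield this page before checking the key, so the last page is kept
        yield from rows
        # Stop on the last page, or if the server repeats a key
        if not next_key or next_key in seen_keys:
            break
        seen_keys.add(next_key)
        key = next_key
        time.sleep(delay_seconds)  # optional pause between requests


if __name__ == "__main__":
    # Fake pages standing in for real HTTP responses
    pages = {
        "": ([{"title": "a"}], "k1"),
        "k1": ([{"title": "b"}], "k2"),
        "k2": ([{"title": "c"}], None),
    }
    print([row["title"] for row in iter_reviews(pages.__getitem__)])
```

The `max_pages` cap and the repeated-key check are defensive choices so the loop cannot run forever if the endpoint behaves unexpectedly.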
