Python, Text Detection OCR

August 29, 2023

I am trying to extract data from a scanned form. The form has a standard format similar to the one shown in the image below: I have tried using pytesseract (tesseract OCR) to dete

Solution 1:

I think you already have the answer in your own post. I recently did something similar, and this is how I did it:

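import cv2
import pytesseract
from PIL import Image

# id_image was loaded with cv2.imread
temp_image = id_image[start_y:end_y, start_x:end_x]  # crop the field (rows first, then columns)
img = Image.fromarray(temp_image)
# --psm 7 treats the crop as a single line of text (Tesseract 3.x spelled the flag -psm)
text = pytesseract.image_to_string(img, config="--psm 7")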

So basically, if your format is predefined, you just need to know the locations of the fields whose text you want (which you already know), crop each one, and then run the OCR (Tesseract) extraction on the crop.

In this case you need to import pytesseract, PIL, and cv2 (which itself depends on numpy).
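
As a rough sketch of how the same crop-then-OCR idea might extend to several fields on a fixed-layout form: the field names and pixel coordinates in FIELDS are made up for illustration, the helper names (crop_field, ocr_line, extract_fields) are hypothetical, and "--psm 7" assumes each field holds a single line of text.

```python
# Hypothetical layout: field name -> (start_x, start_y, end_x, end_y) in pixels.
# These coordinates are placeholders; measure the real ones on your own form.
FIELDS = {
    "name": (120, 80, 620, 130),
    "date": (120, 150, 360, 200),
}


def crop_field(image, box):
    """Return the sub-image for one field.

    image is expected to be a NumPy array such as the one cv2.imread returns,
    so rows (y) are indexed first and columns (x) second.
    """
    start_x, start_y, end_x, end_y = box
    return image[start_y:end_y, start_x:end_x]


def ocr_line(crop):
    """Run Tesseract on a crop, treating it as a single line of text."""
    # Imported here so the cropping helpers work without Tesseract installed.
    import pytesseract
    from PIL import Image

    img = Image.fromarray(crop)
    return pytesseract.image_to_string(img, config="--psm 7").strip()


def extract_fields(image, fields, ocr=ocr_line):
    """Crop every field in the layout and OCR it; returns {field name: text}."""
    return {name: ocr(crop_field(image, box)) for name, box in fields.items()}
```

With a scan loaded via cv2.imread, calling extract_fields(id_image, FIELDS) would return a dict mapping each field name to its recognized text. Passing the OCR function in as a parameter is only a design convenience: it lets the cropping logic be checked without Tesseract available.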
