Validate And Format Json Files
Solution 1:
The built-in JSON module can be used as a validator:
import json
defparse(text):
try:
return json.loads(text)
except ValueError as e:
print('invalid json: %s' % e)
returnNone# or: raise
You can make it work with files by using:
with open(filename) as f:
return json.load(f)
instead of json.loads
and you can include the filename as well in the error message.
On Python 3.3.5, for {test: "foo"}
, I get:
invalid json: Expecting property name enclosed indouble quotes: line 1column2 (char1)
and on 2.7.6:
invalid json: Expecting property name: line 1column2 (char1)
This is because the correct json is {"test": "foo"}
.
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom json parser that fixes common errors. With that, you should be putting the original under source control (or archived), so you can see and check the differences that the automated tool fixes (as a sanity check). Ambiguous cases should be fixed by hand.
Solution 2:
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
load_json_file(filename)
except InvalidDataException: # or something# oops guess it's not valid
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually and tokenize it and attempt to detect errors and try to fix them as you go, but I'm sure there are cases where the error is just not possible to fix automatically and would be better off throwing an error and asking the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However I am not sure whether it would be a good idea to fix all errors, since then you'd have assume your fixes are what the user actually wants. If it's a missing comma or they have an extra trailing comma, then that might be OK, but there may be cases where it is ambiguous what the user wants.
Solution 3:
Here is a full python3 example for the next novice python programmer that stumbles upon this answer. I was exporting 16000 records as json files. I had to restart the process several times so I needed to verify that all of the json files were indeed valid before I started importing into a new system.
I am no python programmer so when I tried the answers above as written, nothing happened. Seems like a few lines of code were missing. The example below handles files in the current folder or a specific folder.
verify.py
import json
import os
import sys
from os.path import isfile,join
# check if a folder name was specifiediflen(sys.argv) > 1:
folder = sys.argv[1]
else:
folder = os.getcwd()
# array to hold invalid and valid files
invalid_json_files = []
read_json_files = []
defparse():
# loop through the folderfor files in os.listdir(folder):
# check if the combined path and filename is a fileif isfile(join(folder,files)):
# open the filewithopen(join(folder,files)) as json_file:
# try reading the json file using the json interpretertry:
json.load(json_file)
read_json_files.append(files)
except ValueError as e:
# if the file is not valid, print the error # and add the file to the list of invalid filesprint("JSON object issue: %s" % e)
invalid_json_files.append(files)
print(invalid_json_files)
print(len(read_json_files))
parse()
Example:
python3 verify.py
or
python3 verify.py somefolder
tested with python 3.7.3
Solution 4:
It was not clear to me how to provide path to the file folder, so I'd like to provide answer with this option.
path = r'C:\Users\altz7\Desktop\your_folder_name'# use your path
all_files = glob.glob(path + "/*.json")
data_list = []
invalid_json_files = []
for filename in all_files:
try:
df = pd.read_json(filename)
data_list.append(df)
except ValueError:
invalid_json_files.append(filename)
print("Files in correct format: {}".format(len(data_list)))
print("Not readable files: {}".format(len(invalid_json_files)))
#df = pd.concat(data_list, axis=0, ignore_index=True) #will create pandas dataframe from readable files, if you like
Post a Comment for "Validate And Format Json Files"