How to Deal With Large JSON Files (Flattening It to TSV)

I am working with a large JSON file, specifically the Persona-Chat dataset (download here). Each entry in Persona-Chat is a dict with two keys, personality and utterances, and the dataset

Solution 1:

To fully flatten that file, you'd need something like

import json


def read_personachat_file(name="personachat_self_original.json"):
    # Load the whole file into memory; the top-level keys are the entry types (e.g. "train").
    with open(name, "r") as f:
        data = json.load(f)

    for entry_type, chats in data.items():
        for chat_id, chat in enumerate(chats):
            # Collapse the list of personality sentences into one "|"-separated field.
            personality = "|".join(chat["personality"])
            for utt_id, utt in enumerate(chat["utterances"]):
                for key in ("candidates", "history"):
                    for phrase_id, phrase in enumerate(utt[key]):
                        # One flat row per phrase.
                        yield (entry_type, chat_id, personality, utt_id, key, phrase_id, phrase)


for entry in read_personachat_file():
    print(entry)

The output will be something like

('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 7, 'my sister will be my mom , she wants me to get married')
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 8, 'hi , how are ya ?')
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 9, 'sounds good . i am just sitting here with my dog . i love animals .')
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 10, "sure i'll go with you but i am baking a pizza right now , my favorite . come eat .")
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 11, 'where do you work then soccer person ?')
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 12, 'it is so pretty in the fall and winter , my favorite time to go')
('train', 313, 'i like to wear red .|i wear a red purse .|i like to wear red shoes also .|i use red lipstick .|i drive a red car .', 5, 'candidates', 13, 'i to travel and meet new people')

(whether or not that's useful for you).
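The generator above only prints tuples. To actually get a TSV, the rows could be passed to the standard library's csv module with a tab delimiter. The following is a minimal sketch, not a definitive implementation: the column names in HEADER, the helper names flatten_personachat and write_tsv, and the tiny sample dict are all made up for illustration and assume the same structure the answer's code relies on.

```python
import csv
import io

# Hypothetical column names; the original answer does not name the fields.
HEADER = ("entry_type", "chat_id", "personality", "utt_id", "key", "phrase_id", "phrase")


def flatten_personachat(data):
    """Yield one flat row per phrase from an already-parsed Persona-Chat-style dict."""
    for entry_type, chats in data.items():
        for chat_id, chat in enumerate(chats):
            personality = "|".join(chat["personality"])
            for utt_id, utt in enumerate(chat["utterances"]):
                for key in ("candidates", "history"):
                    for phrase_id, phrase in enumerate(utt[key]):
                        yield (entry_type, chat_id, personality, utt_id, key, phrase_id, phrase)


def write_tsv(rows, out):
    """Write a header line plus the given rows to a text stream as tab-separated values."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


# Made-up sample with the assumed structure, standing in for the real file.
sample = {
    "train": [
        {
            "personality": ["i like red ."],
            "utterances": [{"candidates": ["a", "b"], "history": ["hi"]}],
        }
    ]
}

buffer = io.StringIO()
write_tsv(flatten_personachat(sample), buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the same write_tsv call should work with a file opened via open("personachat.tsv", "w", newline="", encoding="utf-8"); the output filename here is also just an example. Because the rows come from a generator, they are written one at a time rather than collected in a list first, although json.load still reads the whole input file into memory.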
