Best way to obtain data from a nested JSON dataset

Hello,

I am hoping someone can provide some advice on how to extract data from the following dataset: https://ped.uspto.gov/peds/. The entire dataset is available under the JSON tab, and I am trying to work with the 2020 portion of it.

I am beginning a project to compute patent examiner statistics (for example, how many office actions are issued per examiner).

The issue is that the dataset is nested - a lot. I would like to first pull each unique patent examiner name and then traverse the dataset based on the patent examiner name to calculate statistics from the “Transaction History”.

In short, are there any good methods in Python for running through a nested JSON dataset? Or should I be tackling this project from a different vantage point?

Thanks!

The download is just an ordinary .zip file containing many .json files, so you only have to uncompress it to get the file 2020.json. You can do that manually or write code with Python's zipfile module.
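As a minimal sketch of the zipfile approach, the helper below parses one JSON member straight out of an archive without unpacking it to disk. The function name load_json_member and the archive filename in the usage comment are my own placeholders, not names taken from the dataset.

```python
import json
import zipfile


def load_json_member(zip_path, member_name):
    """Parse one JSON file stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() returns a binary file object; json.load accepts it.
        with archive.open(member_name) as fh:
            return json.load(fh)


# Hypothetical usage (the archive name is an assumption):
# data = load_json_member("peds-download.zip", "2020.json")
```

If you are unsure of the member name, zipfile.ZipFile(zip_path).namelist() lists everything in the archive.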

Python doesn't have a special method for querying JSON data: the json module only converts it into Python dictionaries and lists, and you have to write your own code to work with that structure. You could write something similar to XPath for XML data (see "Xpath like query for nested python dictionaries") or use an external module such as jsonpath-ng (see more modules).
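To illustrate the "write your own code" route, here is a rough sketch of a recursive walk that finds every dictionary containing a given key, at any depth, and then tallies history entries per examiner. The key names examinerName and transactionHistory are placeholders I made up for the example; check the real field names in 2020.json before using it.

```python
from collections import Counter


def iter_dicts_with_key(node, key):
    """Yield every dict nested anywhere inside `node` that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield node
        for value in node.values():
            yield from iter_dicts_with_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts_with_key(item, key)


def count_transactions_per_examiner(data,
                                    name_key="examinerName",
                                    history_key="transactionHistory"):
    """Sum the number of history entries for each examiner name.

    Both key names are hypothetical defaults, not the dataset's real fields.
    """
    counts = Counter()
    for record in iter_dicts_with_key(data, name_key):
        counts[record[name_key]] += len(record.get(history_key, []))
    return counts


# Tiny made-up sample, only to show the shape of the traversal.
sample = {
    "records": [
        {"meta": {"examinerName": "A", "transactionHistory": [{}, {}]}},
        {"examinerName": "B", "transactionHistory": [{"code": "X"}]},
        {"nested": {"examinerName": "A", "transactionHistory": [{"code": "Y"}]}},
    ]
}
counts = count_transactions_per_examiner(sample)
```

This counts every history entry. To count only office actions you would filter the entries inside the loop by whatever code or description field the real data uses.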

Alternatively, you can use an external tool like jq, but you may need the subprocess module to run it from Python.
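If you go the jq route, a call through subprocess might look roughly like the sketch below. It assumes the jq executable is installed and on your PATH; run_jq is a helper name I invented, and the filter in the usage comment is only an illustration, not a path that exists in the real data.

```python
import json
import subprocess


def run_jq(jq_filter, json_text):
    """Run a jq filter over a JSON string and return the parsed results."""
    completed = subprocess.run(
        ["jq", "-c", jq_filter],  # -c prints each result on a single line
        input=json_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if jq reports an error
    )
    return [json.loads(line) for line in completed.stdout.splitlines() if line.strip()]


# Hypothetical usage:
# run_jq(".a.b", '{"a": {"b": [1, 2]}}')
```

For a large file it may be simpler to run jq directly on the file from the command line and load only its much smaller output into Python.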
