IDSP-Convert DataFrame to JSON


We have a Pandas DateFrame like this:
index | location | date | people_fully_vaccinated |
---|---|---|---|
0 | Alabama | 2021-01 | 55331 |
1 | Alaska | 2021-01 | 24545 |
2 | Arizona | 2021-01 | 92666 |
3 | Arkansas | 2021-01 | 56264 |
4 | California | 2021-01 | 558977 |
… | … | … | … |
695 | Virginia | 2022-02 | 6140686 |
696 | Washington | 2022-02 | 5437090 |
697 | West Virginia | 2022-02 | 1017207 |
698 | Wisconsin | 2022-02 | 3758625 |
699 | Wyoming | 2022-02 | 0 |
Our goal is to format this table to json file like this:
{
"Alabama": [
{
"date": "2021-01",
"people_fully_vaccinated": 55331
},
{
"date": "2021-02",
"people_fully_vaccinated": 313290
},
{
"date": "2021-03",
"people_fully_vaccinated": 656899
}
],
"Alaska": [
{
"date": "2021-01",
"people_fully_vaccinated": 24545
},
{
"date": "2021-02",
"people_fully_vaccinated": 97554
},
{
"date": "2021-03",
"people_fully_vaccinated": 163100
}
]
}
Pandas do have df.to_json(orient='index')
function to convert dataframe to json file. But the index/key should be unique.
This is the solution:
out = df.set_index('location').groupby('location').apply(lambda x: x.to_dict('r')).to_dict()
json.dumps(out)
Breakdown:
- set index
df.set_index('location')
- create nested list
df.set_index('location').groupby('location').apply(lambda x: x.to_dict('r'))
############
#location
#Alabama [{'date': '2021-01', 'people_fully_vaccinated'...
#Alaska [{'date': '2021-01', 'people_fully_vaccinated'...
#Arizona [{'date': '2021-01', 'people_fully_vaccinated'...
#Arkansas [{'date': '2021-01', 'people_fully_vaccinated'...
#California [{'date': '2021-01', 'people_fully_vaccinated'...
'r' = 'records' : list like [{column -> value}, … , {column -> value}]
reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html
- create dictionary
df.set_index('location').groupby('location').apply(lambda x: x.to_dict('r')).to_dict()
############
#{'Alabama': [{'date': '2021-01', 'people_fully_vaccinated': 55331},
# {'date': '2021-02', 'people_fully_vaccinated': 313290},
# {'date': '2021-03', 'people_fully_vaccinated': 656899},
# {'date': '2021-04', 'people_fully_vaccinated': 1143488},
# {'date': '2021-05', 'people_fully_vaccinated': 1432446},.......
- convert into json
json.dumps(out)
Reference: https://stackoverflow.com/questions/64817724/pandas-df-to-json-with-duplicate-keys/64817777#64817777