How to parse a nested dictionary into a Pandas DataFrame


I have a pretty wild dictionary that I am trying to parse into a pandas DataFrame. Here is a smaller version of it:

import datetime
from decimal import Decimal

test_dict = [{'record_id': '43bbdfbf',
  'date': datetime.date(2023, 3, 25),
  'person': {
      'id': '123abc',
      'name': 'Person1'
  },
  'venue': {
      'id': '5bd6c74c',
      'name': 'Place1',
      'city': {
          'id': '3448439',
          'name': 'São Paulo',
          'state': 'São Paulo',
          'state_code': 'SP',
          'coords': {'lat': Decimal('-23.5475'), 'long': Decimal('-46.63611111')},
          'country': {'code': 'BR', 'name': 'Brazil'}
      },
   },
  'thing_lists': {'thing_list': [
      {'song': [
          {'name': 'Thing1','info': None,'dup': None},
          {'name': 'Thing2', 'info': None, 'dup': None},
          {'name': 'Thing3', 'info': None, 'dup': None},
          {'name': 'Thing4', 'info': None, 'dup': None}],
         'extra': None},
     {'song': [
          {'name': 'ExtraThing1','info': None,'dup': None},
          {'name': 'ExtraThing2', 'info': None, 'dup': None}],
         'extra': 1
     }]}}]

Here is a function I started building to parse the information out of the dictionary:

def extract_values(dictionary):
    record_id = dictionary[0]['record_id'],
    date = dictionary[0]['date'],
    country = dictionary[0]['venue']['city']['country']['name']
    
    return record_id, date, venue, city, lat, long, country

Here is the snippet where I try to pull those pieces into a DataFrame:

import pandas as pd
df = pd.DataFrame(extract_values(test_dict)).transpose()
df.rename(
    columns={
        df.columns[0]: 'record_id',
        df.columns[1]: 'date',
        df.columns[3]: 'city',
        df.columns[6]: 'country'
    }, 
    inplace=True
)

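As you can see, it mostly works, except for the string fields, which get split apart with one character per row. I don't know how to fix this. However, if the last field I pull is not a string, the values get squeezed back into place. Is there a way to manually keep the strings together so that I don't have to depend on the data type of the final field?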
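
A note on the splitting: in the function above, the lines that assign record_id and date end with a trailing comma, which in Python makes each of them a one-element tuple, while country stays a plain string. The function therefore returns a mix of tuples and a string, and that mix is likely what pandas is reacting to. The sketch below shows the trailing-comma effect and one way to avoid positional renaming altogether by returning a dict per record. The names `sample` and `extract_row` are hypothetical, and `sample` is a trimmed stand-in for test_dict.

```python
import datetime
import pandas as pd

# Hypothetical, trimmed-down stand-in for test_dict (only the fields used below).
sample = [{
    'record_id': '43bbdfbf',
    'date': datetime.date(2023, 3, 25),
    'venue': {'city': {'name': 'São Paulo', 'country': {'name': 'Brazil'}}},
}]

# A trailing comma turns the right-hand side into a one-element tuple.
with_comma = sample[0]['record_id'],
without_comma = sample[0]['record_id']

def extract_row(record):
    """Return one flat dict per record; the keys become column names."""
    city = record['venue']['city']
    return {
        'record_id': record['record_id'],
        'date': record['date'],
        'city': city['name'],
        'country': city['country']['name'],
    }

# A list of dicts gives one row per dict, so no transpose or rename is needed.
df = pd.DataFrame([extract_row(r) for r in sample])
```

Because each value is a scalar keyed by its column name, the result no longer depends on the type of the last field.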

Also, the last few fields seem hard to extract. Ideally, I would like my final DataFrame to look like this:

RecordID Date       City      Country ThingName    Dup   Extra
43bbdfbf 2023-03-25 São Paulo Brazil  Thing1       None  None
43bbdfbf 2023-03-25 São Paulo Brazil  Thing2       None  None
43bbdfbf 2023-03-25 São Paulo Brazil  Thing3       None  None 
43bbdfbf 2023-03-25 São Paulo Brazil  Thing4       None  None
43bbdfbf 2023-03-25 São Paulo Brazil  ExtraThing1  None  1
43bbdfbf 2023-03-25 São Paulo Brazil  ExtraThing2  None  1

Can someone point me in the right direction on how to parse this dictionary correctly?

python json pandas dictionary datetime
1 Answer

I don't see a simple way to solve this other than using several nested loops to extract all the values:

def extract_values(data):
    records = []
    for record in data:
        for thing in record['thing_lists']['thing_list']:
            for song in thing['song']:
                records.append({ 
                    'RecordID' : record['record_id'],
                    'Date': record['date'],
                    'City': record['venue']['city']['name'],
                    'Country': record['venue']['city']['country']['name'],
                    'ThingName': song['name'],
                    'Dup': song['dup'],
                    'Extra': thing['extra']
                })
    return records

records = extract_values(test_dict)
df = pd.DataFrame(records)

Output:

   RecordID        Date       City Country    ThingName   Dup  Extra
0  43bbdfbf  2023-03-25  São Paulo  Brazil       Thing1  None    NaN
1  43bbdfbf  2023-03-25  São Paulo  Brazil       Thing2  None    NaN
2  43bbdfbf  2023-03-25  São Paulo  Brazil       Thing3  None    NaN
3  43bbdfbf  2023-03-25  São Paulo  Brazil       Thing4  None    NaN
4  43bbdfbf  2023-03-25  São Paulo  Brazil  ExtraThing1  None    1.0
5  43bbdfbf  2023-03-25  São Paulo  Brazil  ExtraThing2  None    1.0
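
The Extra column comes out as NaN and 1.0 rather than None and 1 because pandas stores a numeric column that contains missing values as float64 by default. If integer values matter, one option is pandas' nullable `Int64` dtype. A minimal sketch, using a hypothetical two-row miniature of the `records` list rather than the full data:

```python
import pandas as pd

# Hypothetical miniature of the `records` list, with the same kinds of values.
records = [
    {'ThingName': 'Thing1', 'Dup': None, 'Extra': None},
    {'ThingName': 'ExtraThing1', 'Dup': None, 'Extra': 1},
]
df = pd.DataFrame(records)

# None mixed with an int is inferred as float64, with NaN for the missing value.
float_dtype = str(df['Extra'].dtype)

# The nullable integer dtype keeps 1 as an integer and marks the gap as <NA>.
df['Extra'] = df['Extra'].astype('Int64')
```

The Dup column is left alone here: it holds only None, so pandas keeps it as an object column.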