Extracting JSON into separate columns

Question

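I have a DataFrame named `df` that contains a `revieweddata` column. Each row holds a JSON structure with nested parts. I want to unpack that information into separate columns, but for some reason I can't get it to work.

JSON in the first row of the `revieweddata` column: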


  {
      "query": "Amazon Fresh Bestellungen sehen ",
      "response": "Klicke auf „Letzte Einkäufe“ auf der Amazon Fresh Startseite, um deine Bestellungen zu sehen.",
      "pills":
      [
          {
              "name": "Amazon Fresh",
              "link": "https://www.amazon.de/alm/storefront?almBrandId=QW1hem9uIEZyZXNo&ref=fs_dsk_sn_logo",
              "position": 37
          },
          {
              "name": "Letzte Einkäufe",
              "link": "https://www.amazon.de/afx/lists/pastpurchases/?almBrandId=QW1hem9uIEZyZXNo&ref=fs_dsk_sn_pastPurchases"
          }
      ]
  }


What I want is separate columns such as `query`, `response`, `name1`, `link1`, `position1`, `name2`, `link2`.

Currently I am using this code:

# Upload csv file with raw data
df = pd.read_csv('raw_data.csv', encoding = 'utf8') 
df

# Define a function to extract information from JSON data
def extract_info(row):
    query = row['query']
    response = row['response']
    pills = row['pills']
    data = []
    for pill in pills:
        name = pill.get('name')
        link = pill.get('link')
        position = pill.get('position')
        data.append({'query': query, 'response': response, 'name': name, 'link': link, 'position': position})
    return data

# Apply the function to each row of the DataFrame
df['parsed_data'] = df['revieweddata'].apply(extract_info)

# Check if 'parsed_data' column exists
if 'parsed_data' in df.columns:
    # Create a new DataFrame from the 'parsed_data' column
    parsed_df = pd.DataFrame(df.pop('parsed_data').tolist())

    # Join the new DataFrame to the original DataFrame with a suffix
    df = df.join(parsed_df.add_suffix('_parsed'))

# Print the resulting DataFrame
print(df)

This code should work, but I keep getting an error. Could someone help me check it?

Answer 1

One way to do this is to use a function that flattens any DataFrame with nested fields (here the JSON dictionaries are first loaded into a DataFrame):

import pandas as pd

data = {
    "revieweddata": [
        {
            "query": "Amazon Fresh Bestellungen sehen",
            "response": "Klicke auf „Letzte Einkäufe“ auf der Amazon Fresh Startseite, um deine Bestellungen zu sehen.",
            "pills": [
                {
                    "name": "Amazon Fresh",
                    "link": "https://www.amazon.de/alm/storefront?almBrandId=QW1hem9uIEZyZXNo&ref=fs_dsk_sn_logo",
                    "position": 37
                },
                {
                    "name": "Letzte Einkäufe",
                    "link": "https://www.amazon.de/afx/lists/pastpurchases/?almBrandId=QW1hem9uIEZyZXNo&ref=fs_dsk_sn_pastPurchases"
                }
            ]
        }
    ]
}

df = pd.DataFrame(data)


def flatten_nested_json_df(df):
    """Expand dict columns and explode list columns until no nested values remain."""
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            exploded.index = df.index
            df = pd.concat([df, exploded], axis=1).drop(columns=[col])
            new_columns.extend(exploded.columns)  # inplace

        for col in list_columns:
            # print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df
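
Note that the function above only expands cells that are real Python dicts and lists. When the data comes from `pd.read_csv`, as in the question, the `revieweddata` cells are most likely plain strings, and indexing a string with `row['query']` raises a `TypeError`. The following is a minimal sketch of decoding the column first. It assumes the cells contain valid JSON text; the sample values and the helper name `parse_json_cell` are made up for illustration. If the file instead stores Python-style reprs with single quotes, `ast.literal_eval` may be needed in place of `json.loads`.

```python
import json

import pandas as pd

# Hypothetical stand-in for what pd.read_csv would return: JSON stored as text.
raw_df = pd.DataFrame({
    "revieweddata": [
        '{"query": "q1", "response": "r1", '
        '"pills": [{"name": "n1", "link": "l1", "position": 37}, '
        '{"name": "n2", "link": "l2"}]}'
    ]
})


def parse_json_cell(cell):
    """Decode a JSON string into Python objects; pass other values through unchanged."""
    return json.loads(cell) if isinstance(cell, str) else cell


# After this step every cell is a dict, which the flattening function can expand.
raw_df["revieweddata"] = raw_df["revieweddata"].map(parse_json_cell)

print(type(raw_df.loc[0, "revieweddata"]))
```

Once the column holds dicts, it can be passed to the flattening function in the same way as the hand-built sample data.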

Calling `flatten_nested_json_df(df)` on the sample DataFrame returns:

   index               revieweddata.query  \
0      0  Amazon Fresh Bestellungen sehen   
0      0  Amazon Fresh Bestellungen sehen   

                               revieweddata.response revieweddata.pills.name  \
0  Klicke auf „Letzte Einkäufe“ auf der Amazon Fr...            Amazon Fresh   
0  Klicke auf „Letzte Einkäufe“ auf der Amazon Fr...         Letzte Einkäufe   

                             revieweddata.pills.link  \
0  https://www.amazon.de/alm/storefront?almBrandI...   
0  https://www.amazon.de/afx/lists/pastpurchases/...   

   revieweddata.pills.position  
0                         37.0  
0                          NaN

If needed, you can rename the columns, for example to drop the `revieweddata.` prefix.
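
The flattened result has one row per pill, whereas the question asks for one row per record with numbered columns (`name1`, `link1`, `position1`, `name2`, `link2`). A possible sketch of that wide layout is shown here. It is an alternative to the flattening function rather than part of it, it assumes each cell is already a dict with the keys shown in the question, and the helper name `to_wide_row` and the shortened sample values are hypothetical.

```python
import pandas as pd

# Shortened, made-up sample with the same shape as the question's JSON.
record = {
    "query": "q1",
    "response": "r1",
    "pills": [
        {"name": "n1", "link": "l1", "position": 37},
        {"name": "n2", "link": "l2"},
    ],
}
df = pd.DataFrame({"revieweddata": [record]})


def to_wide_row(rec):
    """Build one flat dict per record, numbering the pill fields from 1."""
    row = {"query": rec.get("query"), "response": rec.get("response")}
    for i, pill in enumerate(rec.get("pills") or [], start=1):
        for key in ("name", "link", "position"):
            if key in pill:  # skip fields a pill does not have
                row[f"{key}{i}"] = pill[key]
    return row


# One output row per input row, aligned on the original index.
wide = pd.DataFrame([to_wide_row(rec) for rec in df["revieweddata"]], index=df.index)

print(wide.columns.tolist())
# ['query', 'response', 'name1', 'link1', 'position1', 'name2', 'link2']
```

Records with different numbers of pills simply produce `NaN` in the columns they do not fill, and `df.join(wide)` would attach the new columns to the original frame.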
