How do I create a variable that iterates over my files so the results end up in a DataFrame?


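I want to create a DataFrame containing tennis match data for one specific player, 'Lenny Hampel'. To do this I downloaded a lot of .json files with his match data, about 100 files in total. Because they are JSON files, I load each file into a dictionary, keep only his entries, and turn those into a DataFrame. Finally, I need to concatenate all of these DataFrames into one. I could hard-code every file, but that seems silly, and I can't find the right way to do it with a loop.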

Can you help me understand how to write a loop, or use some other approach, to code this in a smarter way?

from bs4 import BeautifulSoup
import requests 
import json
import bs4 as bs
import urllib.request
from urllib.request import Request, urlopen
import pandas as pd
import pprint

with open('lenny/2016/lenny2016_match (1).json') as json_file:
    lennymatch1 = json.load(json_file)

player = [item 
          for item in lennymatch1["stats"] 
          if item["player_fullname"] == "Lenny Hampel"]

with open('lenny/2016/lenny2016_match (2).json') as json_file:
    lennymatch2 = json.load(json_file)

player2 = [item 
          for item in lennymatch2["stats"] 
          if item["player_fullname"] == "Lenny Hampel"]

with open('lenny/2016/lenny2016_match (3).json') as json_file:
    lennymatch3 = json.load(json_file)

player33 = [item 
          for item in lennymatch3["stats"] 
          if item["player_fullname"] == "Lenny Hampel"]


with open('lenny/2016/lenny2016_match (4).json') as json_file:
    lennymatch4 = json.load(json_file)

player4 = [item 
          for item in lennymatch4["stats"] 
          if item["player_fullname"] == "Lenny Hampel"]


tabelle1 = pd.DataFrame.from_dict(player)
tabelle2 = pd.DataFrame.from_dict(player2)
tabelle3 = pd.DataFrame.from_dict(player33)
tabelle4 = pd.DataFrame.from_dict(player4)

tennisstats = [tabelle1, tabelle2, tabelle3, tabelle4]

result = pd.concat(tennisstats)
result
python pandas dataframe
1 Answer

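Hmm, this is so basic that I don't understand why you are doing it this way. Create an empty list before the loop, build one DataFrame per file inside the loop and append it to the list, then call pd.concat() once after the loop: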

import json

import pandas as pd

# --- before loop ---

tennisstats = []

# --- loop ---

for filename in ["lenny/2016/lenny2016_match (1).json", "lenny/2016/lenny2016_match (2).json"]:

    with open(filename) as json_file:
        lennymatch = json.load(json_file)

    # keep only Lenny Hampel's entries from this match
    player = [item
              for item in lennymatch["stats"]
              if item["player_fullname"] == "Lenny Hampel"]

    table = pd.DataFrame.from_dict(player)

    tennisstats.append(table)

# --- after loop ---

# ignore_index=True renumbers the rows instead of repeating each file's own 0..n-1 index
result = pd.concat(tennisstats, ignore_index=True)
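The same loop can be wrapped in a function so that the list of filenames and the player name become parameters. This is a minimal sketch, not part of the original answer: the helper name load_player_stats and the extra source_file column are hypothetical additions, and it assumes every file has the same "stats" / "player_fullname" structure as in the question.

```python
import json

import pandas as pd


def load_player_stats(filenames, player_name):
    """Hypothetical helper: read each JSON file, keep only `player_name`'s
    entries from its "stats" list, and stack everything into one DataFrame."""
    frames = []  # one DataFrame per file, collected inside the loop
    for filename in filenames:
        with open(filename) as json_file:
            match = json.load(json_file)
        rows = [item for item in match["stats"]
                if item["player_fullname"] == player_name]
        # pd.DataFrame(list_of_dicts) builds the same table as from_dict here;
        # the extra column records which file (match) each row came from.
        frames.append(pd.DataFrame(rows).assign(source_file=filename))
    # Concatenate once, after the loop; ignore_index renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)
```

Usage: result = load_player_stats(filenames, "Lenny Hampel"), with any of the filename lists built in this answer. Note that pd.concat() raises ValueError when the list of frames is empty, so the filename list must contain at least one file.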

If the filenames are identical apart from a number, you can generate them with range():

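for number in range(1, 101):  # 1..100; range() excludes the end value
    filename = f"lenny/2016/lenny2016_match ({number}).json"

    with open(filename) as json_file:
        lennymatch = json.load(json_file)

    player = [item for item in lennymatch["stats"]
              if item["player_fullname"] == "Lenny Hampel"]
    tennisstats.append(pd.DataFrame.from_dict(player))

The rest (the imports, tennisstats = [] before the loop, and pd.concat() after it) is the same as in the first version.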
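The range() version assumes that every number from 1 to 100 exists. The question only says "about 100" files, so a missing number would stop the loop with FileNotFoundError. A minimal sketch of one way around that, assuming the lenny2016_match (N).json naming pattern from the question; numbered_match_files is a hypothetical helper name:

```python
import os


def numbered_match_files(directory, first, last):
    """Hypothetical helper: build 'lenny2016_match (N).json' paths for N in
    first..last and keep only the ones that actually exist, so a gap in the
    numbering does not crash the loop with FileNotFoundError."""
    candidates = [
        os.path.join(directory, f"lenny2016_match ({number}).json")
        for number in range(first, last + 1)  # range() excludes its end, hence + 1
    ]
    return [path for path in candidates if os.path.exists(path)]
```

The loop then becomes for filename in numbered_match_files("lenny/2016", 1, 100): with the same body as before.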


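If all the files are in the same folder, you could use os.listdir() instead (this needs import os). It returns every entry in the folder, so skip names that do not end in .json: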

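import os

directory = "lenny/2016/"

for name in os.listdir(directory):
    if not name.endswith(".json"):  # skip anything that is not a JSON file
        continue

    filename = os.path.join(directory, name)

    with open(filename) as json_file:
        lennymatch = json.load(json_file)

The rest of the loop body (filtering for "Lenny Hampel" and appending to tennisstats) and the code before and after the loop are the same as in the first version.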
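os.listdir() makes no promise about the order of the names it returns, and a plain sorted() on these names would put "(10)" before "(2)". If the rows of the final DataFrame should follow the match numbers, one option is to sort on the number in parentheses. A sketch, assuming the numbered files follow the lenny2016_match (N).json pattern; sorted_json_files is a hypothetical helper name:

```python
import os
import re


def sorted_json_files(directory):
    """Hypothetical helper: list the .json files in `directory`, ordered by the
    number in parentheses (e.g. 'lenny2016_match (10).json' -> 10).

    A plain string sort would put '(10)' before '(2)', so sort on the
    extracted integer instead."""

    def match_number(name):
        found = re.search(r"\((\d+)\)", name)
        # Numbered files first (by number); files without "(N)" go last, alphabetically.
        return (0, int(found.group(1)), name) if found else (1, 0, name)

    names = [name for name in os.listdir(directory) if name.endswith(".json")]
    return [os.path.join(directory, name) for name in sorted(names, key=match_number)]
```

Iterating over sorted_json_files("lenny/2016") instead of os.listdir(directory) also makes the os.path.join and .json check in the loop unnecessary.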
