A better way than a for loop to access values in a JSON file?

Question

I have a JSON file that looks like this:

[{'data': [{'text': 'add '},
  {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'},
  {'text': ' songs in '},
  {'text': 'my', 'entity': 'playlist_owner'},
  {'text': ' playlist '},
  {'text': 'música libre', 'entity': 'playlist'}]},
{'data': [{'text': 'add this '},
  {'text': 'album', 'entity': 'music_item'},
  {'text': ' to '},
  {'text': 'my', 'entity': 'playlist_owner'},
  {'text': ' '},
  {'text': 'Blues', 'entity': 'playlist'},
  {'text': ' playlist'}]},
{'data': [{'text': 'Add the '},
  {'text': 'tune', 'entity': 'music_item'},
  {'text': ' to the '},
  {'text': 'Rage Radio', 'entity': 'playlist'},
  {'text': ' playlist.'}]}]

For each 'data' entry in this list, I want to append the values in 'text' into a single string.

I tried the following:

lst = []

for item in data:
    p = item['data']
    p_st = ''
    for item_1 in p:
        p_st += item_1['text'] + ' '
    lst.append(p_st)
    
print(lst)

Out: ['add  Stani, stani Ibar vodo  songs in  my  playlist  música libre ', 'add this  album  to  my   Blues  playlist ', 'Add the  tune  to the  Rage Radio  playlist. ']

It works, but I'm new to JSON and wonder whether there is a better way to do this. Perhaps some built-in json method or library? Thank you.

Answer 1

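Your code works fine for extracting the text values from the JSON data. However, if you want a more concise way to achieve the same result, you can use a list comprehension in Python, which can make your code shorter and more readable. Here's how: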

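Using a list comprehension (the data is already parsed, so the json module is only needed to load it from a file):

import json  # not used below; only needed for json.load / json.loads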

data = [{'data': [{'text': 'add '}, {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'}, {'text': ' songs in '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' playlist '}, {'text': 'música libre', 'entity': 'playlist'}]},
        {'data': [{'text': 'add this '}, {'text': 'album', 'entity': 'music_item'}, {'text': ' to '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' '}, {'text': 'Blues', 'entity': 'playlist'}, {'text': ' playlist'}]},
        {'data': [{'text': 'Add the '}, {'text': 'tune', 'entity': 'music_item'}, {'text': ' to the '}, {'text': 'Rage Radio', 'entity': 'playlist'}, {'text': ' playlist.'}]}]

text_values = [' '.join(item['text'] for item in entry['data']) for entry in data]

print(text_values)
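
For reference, here is a minimal sketch of what the comprehension returns on a shortened copy of the sample data. Each fragment already carries its own leading and trailing spaces, so joining with ' ' doubles the spaces. Joining with the empty string is an adjustment that is not part of the original answer; it reproduces the sentence as written.

```python
# Shortened copy of the sample data from the question.
data = [
    {'data': [{'text': 'add this '},
              {'text': 'album', 'entity': 'music_item'},
              {'text': ' to '},
              {'text': 'my', 'entity': 'playlist_owner'},
              {'text': ' '},
              {'text': 'Blues', 'entity': 'playlist'},
              {'text': ' playlist'}]},
    {'data': [{'text': 'Add the '},
              {'text': 'tune', 'entity': 'music_item'},
              {'text': ' to the '},
              {'text': 'Rage Radio', 'entity': 'playlist'},
              {'text': ' playlist.'}]},
]

# Joining with ' ' inserts an extra space between fragments.
spaced = [' '.join(item['text'] for item in entry['data']) for entry in data]

# Joining with '' keeps only the spacing already present in the fragments.
exact = [''.join(item['text'] for item in entry['data']) for entry in data]

print(spaced)
print(exact)
```

Which separator is appropriate depends on whether the fragments in your real file always include their own whitespace, as they do in the sample.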

Using pandas:

import pandas as pd

data = [{'data': [{'text': 'add '}, {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'}, {'text': ' songs in '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' playlist '}, {'text': 'música libre', 'entity': 'playlist'}]},
        {'data': [{'text': 'add this '}, {'text': 'album', 'entity': 'music_item'}, {'text': ' to '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' '}, {'text': 'Blues', 'entity': 'playlist'}, {'text': ' playlist'}]},
        {'data': [{'text': 'Add the '}, {'text': 'tune', 'entity': 'music_item'}, {'text': ' to the '}, {'text': 'Rage Radio', 'entity': 'playlist'}, {'text': ' playlist.'}]}]

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Extract and join the 'text' values for each 'data' entry
text_values = df['data'].apply(lambda x: ' '.join(item['text'] for item in x))

print(text_values.tolist())

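If you plan to perform additional analysis or manipulation on the JSON data, the pandas approach is a better fit, because it provides a powerful and flexible way to work with structured data.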

Answer 2

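There are no special JSON tools that can help here, since you have already parsed the JSON and are left with plain old Python dicts, lists and strs (and no, the parsing process can't be modified in any simple way to do what you want; this should be done after parsing).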
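
A small sketch of that point: once the text has been parsed, the json module hands back only built-in Python types, so ordinary Python tools apply. The raw string below is a made-up, shortened example. Note that real JSON uses double quotes, whereas the listing in the question is the Python repr of already-parsed data.

```python
import json

# Hypothetical JSON text, shortened from the question's data.
raw = '[{"data": [{"text": "add this "}, {"text": "album", "entity": "music_item"}]}]'

parsed = json.loads(raw)

# json.loads returns plain built-in containers and strings.
print(type(parsed).__name__)                        # list
print(type(parsed[0]).__name__)                     # dict
print(type(parsed[0]['data'][0]['text']).__name__)  # str
```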

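That said, your code is unidiomatic and has some inefficiencies (CPython tries to help with these, but its optimization for repeated str concatenation is brittle, non-portable, and still worse than doing it the right way with str.join). The improved code would look like this: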

lst = [' '.join([item_1['text'] for item_1 in item['data']])
       for item in data]
print(lst)

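This uses a list comprehension to produce the outer list, where each element is the space-separated concatenation of all the 'text' values in that item's 'data'. Using a listcomp for the outer part makes it a little faster (it is a micro-optimization that takes advantage of interpreter optimizations for listcomps, not a huge improvement). Using ' '.join is a big algorithmic improvement, though: repeated string concatenation is O(n²) (CPython sometimes optimizes it to nearly O(n), but not as well and not reliably), while bulk concatenation via ' '.join is guaranteed O(n). If your data has only a small number of strings (as shown), the difference is likely negligible, but the code is simpler and easier to read and maintain. If the data has lots of strings to concatenate, this could speed things up significantly.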
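
To see the difference on your own machine, a rough timing sketch along these lines may help. The synthetic fragment list and the function names concat_loop and concat_join are assumptions made for illustration, and the measured times will vary by interpreter and input size.

```python
import timeit

# Synthetic data (an assumption for illustration): one entry with many fragments.
fragments = [{'text': 'word'} for _ in range(10_000)]

def concat_loop(parts):
    """Build the string with repeated +=, as in the question."""
    result = ''
    for part in parts:
        result += part['text'] + ' '
    return result

def concat_join(parts):
    """Build the string in one pass with str.join."""
    return ' '.join(part['text'] for part in parts)

# The loop leaves a trailing space; otherwise the two results match.
assert concat_loop(fragments).rstrip() == concat_join(fragments)

print('loop:', timeit.timeit(lambda: concat_loop(fragments), number=20))
print('join:', timeit.timeit(lambda: concat_join(fragments), number=20))
```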

Answer 3

import json

with open(filename, 'r+') as file:
    # open the JSON file and load it into a Python object
    file_data = json.load(file)
    # append new data (the key is left blank in the original answer)
    file_data[].append(new_data)
    # move the file position back to the start of the file
    file.seek(0)
    # convert back to JSON and write it out
    json.dump(file_data, file, indent=4)
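
The snippet above addresses writing data back to a file rather than extracting text, and it leaves the key in file_data[] blank and filename and new_data undefined. Below is a self-contained sketch of the same read-modify-write pattern. It assumes the top-level JSON value is a list (as in the question) and uses a made-up temporary file and a made-up record; the truncate call is an addition that guards against leftover bytes when the rewritten text is shorter than the old one.

```python
import json
import os
import tempfile

# Hypothetical file and record, created here only so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'playlists.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump([{'data': [{'text': 'add '}]}], f)

new_data = {'data': [{'text': 'Add the '}, {'text': 'tune', 'entity': 'music_item'}]}

with open(path, 'r+', encoding='utf-8') as f:
    file_data = json.load(f)    # parse the existing JSON (a list at the top level)
    file_data.append(new_data)  # append to the top-level list
    f.seek(0)                   # rewind before overwriting
    json.dump(file_data, f, indent=4)
    f.truncate()                # drop leftover bytes if the new text is shorter

# Read the file back to confirm both records are present.
with open(path, encoding='utf-8') as f:
    print(len(json.load(f)))
```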
