I have a JSON file, test1.json, and I want to extract the value of its "transcript" key:
"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang...
Here is the code I am using to extract it:
import pprint
import pandas as pd

# Initialize a DataFrame for the universe transcript
dfJson = pd.read_json('test1.json')
dfJsonTranscript = dfJson.get('transcript').to_string()
pprint.pprint(dfJsonTranscript)

# Write the extracted text to sample.txt
with open("sample.txt", "wt") as text_file:
    text_file.write(dfJsonTranscript)

The resulting text looks like this:
0 The universe is bustling with matter and energ...
1 The universe is bustling with matter and energ...
2 The universe is bustling with matter and energ...
3 The universe is bustling with matter and energ...
4 The universe is bustling with matter and energ...
5 The universe is bustling with matter and energ...
6 The universe is bustling with matter and energ...
7 The universe is bustling with matter and energ...
8 The universe is bustling with matter and energ...
The JSON itself is structured like this (excerpt):
"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang... universe. ",
"words": [
{
"alignedWord": "the",
"case": "success",
"end": 6.31,
"endOffset": 3,
"phones": [
{
"duration": 0.09,
"phone": "dh_B"
},
{
"duration": 0.05,
"phone": "iy_E"
}
],
"start": 6.17,
"startOffset": 0,
"word": "The"
},
{
"alignedWord": "universe",
"case": "success",
"end": 6.83,
"endOffset": 12,
"phones": [
{
"duration": 0.08,
"phone": "y_B"
},
Why do I lose the original value of the key when I run to_string() on it? Am I losing it because I turned it into a DataFrame with pandas?
Try this:
dfJsonTranscript = dfJson.get('transcript').to_string(index=False)
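Setting index=False tells to_string not to print the index (row) labels. Note that dfJson.get('transcript') returns a Series, so this is Series.to_string rather than the DataFrame method.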
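A minimal sketch of what index=False changes, using a small hypothetical Series in place of dfJson.get('transcript') (the string values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for dfJson.get('transcript'): a Series of strings
s = pd.Series(["short text", "short text"])

# Default: each output line starts with the index label (0, 1, ...)
with_index = s.to_string()

# index=False: only the values are printed, one per line
without_index = s.to_string(index=False)

print(with_index)
print(without_index)
```

Depending on the pandas version, the index=False lines may carry a leading space, so strip them if you compare the text exactly.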
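Edit:
The value itself is not lost when it goes into the DataFrame; only its printed representation is cut off. To keep the string from being truncated, set pandas' display.max_colwidth option before calling to_string:
pd.set_option("display.max_colwidth", 10000)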
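Pulling the pieces together, here is a sketch run against a trimmed, hypothetical stand-in for test1.json (a shortened transcript and only two "words" entries; the real file is not reproduced here). It suggests why the output in the question has one line per row: pd.read_json repeats the single transcript string once per entry in "words". It also shows that the stored value is intact and only its display is cut at display.max_colwidth, and two ways to get the full text: calling to_string inside pd.option_context with max_colwidth set to None (no limit, pandas 1.0 and later), or simply taking one element with .iloc[0]. Reading the key with the standard json module is included for comparison.

```python
import io
import json

import pandas as pd

# Hypothetical, trimmed stand-in for test1.json: one "transcript" string
# next to a "words" list, mirroring the excerpt in the question.
raw = """
{
  "transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter.",
  "words": [
    {"word": "The"},
    {"word": "universe"}
  ]
}
"""

# Reading the key with the standard json module, without pandas
transcript = json.loads(raw)["transcript"]

# pd.read_json broadcasts the scalar transcript across the rows of "words"
df = pd.read_json(io.StringIO(raw))
print(len(df))  # one row per entry in "words"

# Default display: each value is cut to display.max_colwidth and ends in "..."
default_text = df["transcript"].to_string(index=False)

# Lift the width limit only for this call instead of globally
with pd.option_context("display.max_colwidth", None):
    full_text = df["transcript"].to_string(index=False)

# The simplest way to get the text exactly once: take the first element
first = df["transcript"].iloc[0]

# Write the single, untruncated transcript to the file
with open("sample.txt", "wt") as text_file:
    text_file.write(first)
```

Even with the width limit lifted, full_text still contains the transcript once per row, so .iloc[0] (or the json route) is the more direct way to get a single copy.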