Original text lost when calling to_string()

Question

I have a JSON object, and one of its keys is:

"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang...

Loading the dataframe:

import pprint
import pandas as pd

# initialize dataframe for the universe transcript
dfJson = pd.read_json('test1.json')

Here is the code I used to try to extract it:

dfJsonTranscript = dfJson.get('transcript').to_string()
pprint.pprint(dfJsonTranscript)

text_file = open("sample.txt", "wt")
n = text_file.write(dfJsonTranscript)
text_file.close()

My output:

0      The universe is bustling with matter and energ...
1      The universe is bustling with matter and energ...
2      The universe is bustling with matter and energ...
3      The universe is bustling with matter and energ...
4      The universe is bustling with matter and energ...
5      The universe is bustling with matter and energ...
6      The universe is bustling with matter and energ...
7      The universe is bustling with matter and energ...
8      The universe is bustling with matter and energ...

Original JSON:

"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang... universe. ",
  "words": [
    {
      "alignedWord": "the",
      "case": "success",
      "end": 6.31,
      "endOffset": 3,
      "phones": [
        {
          "duration": 0.09,
          "phone": "dh_B"
        },
        {
          "duration": 0.05,
          "phone": "iy_E"
        }
      ],
      "start": 6.17,
      "startOffset": 0,
      "word": "The"
    },
    {
      "alignedWord": "universe",
      "case": "success",
      "end": 6.83,
      "endOffset": 12,
      "phones": [
        {
          "duration": 0.08,
          "phone": "y_B"
        },

Why is the original value of the key lost when I run the to_string() method on it? Am I losing it because I turned it into a dataframe with pandas?

Answer 1

Try this:

dfJsonTranscript = dfJson.get('transcript').to_string(index=False)
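Setting index=False tells to_string not to print the index (row) labels. Note that dfJson.get('transcript') returns a Series, so this is the Series version of the method.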
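As a rough illustration of what index=False changes, here is a minimal sketch. The Series below is a hypothetical stand-in for the 'transcript' column, not the asker's actual data; it assumes that read_json repeated the scalar transcript once per row, as the output in the question suggests.

```python
import pandas as pd

# Hypothetical stand-in for dfJson.get('transcript'): the same long string
# repeated on every row.
long_text = " ".join(["The universe is bustling with matter and energy."] * 3)
s = pd.Series([long_text] * 3, name="transcript")

# Default rendering: each line starts with the row label (0, 1, 2).
with_index = s.to_string()

# index=False drops the row labels from the rendered text.
without_index = s.to_string(index=False)

print(with_index)
print(without_index)
```

Dropping the labels does not by itself address the "..." at the end of each row; that comes from a separate display setting.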
Edit:
To keep the string from being truncated, set the pandas display.max_colwidth option. It has to be set before calling to_string:

pd.set_option("display.max_colwidth", 10000)
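A variation on the same idea, sketched under a few assumptions: pd.option_context limits the setting to one block instead of changing it globally, and passing None as "no limit" assumes pandas 1.0 or later. The Series is again a hypothetical stand-in for the 'transcript' column. Since every row appears to hold the same transcript, taking a single element with .iloc[0] may be the simpler way to get the raw string.

```python
import pandas as pd

# Hypothetical stand-in for dfJson.get('transcript').
long_text = " ".join(["The universe is bustling with matter and energy."] * 3)
s = pd.Series([long_text] * 3, name="transcript")

# Lift the column-width limit only inside this block.
# None means "no limit" (assumes pandas >= 1.0).
with pd.option_context("display.max_colwidth", None):
    full = s.to_string(index=False)

# Alternative: skip string rendering entirely and read one element,
# which returns the original Python string unchanged.
transcript = s.iloc[0]

# Write the untruncated transcript to a file.
with open("sample.txt", "wt") as text_file:
    text_file.write(transcript)
```

The .iloc[0] route avoids the display options altogether, because to_string is meant for printing a table rather than for extracting values.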
