Pandas: read a CSV with the id in column 1, the header in column 2, and the value in column 3? [duplicate]

Question (votes: 0, answers: 1)

I have CSV data in the following format:

"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"

I am reading it into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv(csvFile, index_col=False, dtype=object, header=None)
print(df)

The output is:

                  0            1                 2
0   /some/page-1.md        title        My title 1
1   /some/page-1.md  description  My description 1
2   /some/page-1.md         type          Tutorial
3   /some/page-1.md        index              True
4   /some/page-2.md        title        My title 2
5   /some/page-2.md  description  My description 2
6   /some/page-2.md         type          Tutorial
7   /some/page-2.md        index             False
8   /some/page-2.md     custom_1               abc
9   /some/page-3.md        title        My title 3
10  /some/page-3.md  description  My description 3
11  /some/page-3.md         type          Tutorial
12  /some/page-3.md        index              True
13  /some/page-3.md     custom_2               def

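I want to convert this into a DataFrame in the following format, where the first header is "file" and its values come from column 0. The other headers are taken from column 1, with their values from column 2: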

              file       title       description      type  index  custom_1  custom_2
0  /some/page-1.md  My title 1  My description 1  Tutorial   True       NaN       NaN
1  /some/page-2.md  My title 2  My description 2  Tutorial  False       abc       NaN
2  /some/page-3.md  My title 3  My description 3  Tutorial   True       NaN       def

Is there a way to do this with Pandas?

python pandas dataframe csv read-csv
1 answer

0 votes

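I renamed your three columns to file, header, and value so that they are easier to work with. You can then use the pivot_table method to get the layout you want. The final code looks like this: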

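# Continuing from the question's df (read with header=None), name the three columns
df.columns = ["file", "header", "value"]

# One row per file, one column per header; aggfunc='first' keeps the first value if a (file, header) pair repeats
result = df.pivot_table(index='file', columns='header', values='value', aggfunc='first').reset_index()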

This produces the output below. The column axis still carries the name "header", which we need to remove.

header             file custom_1 custom_2       description  index       title      type
0       /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
1       /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
2       /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial

To remove the "header" label, use:

result.columns.name = None

The final output is as follows:

              file custom_1 custom_2       description  index       title      type
0  /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
1  /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
2  /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial
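
Note that pivot_table sorts the new columns alphabetically, so the result above does not have the column order asked for in the question (title, description, type, index, ...). One possible way to keep the headers in order of first appearance is sketched below. It is only a sketch: the inline sample data (a shortened copy of the question's rows) and the names csv_text and header_order are illustrative assumptions, not part of the original answer.

```python
import io
import pandas as pd

# Sample rows copied from the question (first two pages only).
csv_text = '''"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
'''

# Read the headerless CSV and name the three columns in one step.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["file", "header", "value"], dtype=object)

# Headers in order of first appearance in the file.
header_order = df["header"].drop_duplicates().tolist()

result = (
    df.pivot_table(index="file", columns="header", values="value", aggfunc="first")
      .reindex(columns=header_order)  # undo the alphabetical column sort
      .rename_axis(columns=None)      # drop the "header" label on the column axis
      .reset_index()                  # turn the "file" index back into a column
)
print(result)
```

Here rename_axis(columns=None) does the same job as setting result.columns.name = None, just inside the method chain.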