I have CSV data in the following format:
"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"
I am reading it into a Pandas DataFrame:
import pandas as pd
# csvFile: path to the CSV file shown above
df = pd.read_csv(csvFile, index_col=False, dtype=object, header=None)
print(df)
The output is as follows:
                  0            1                 2
0   /some/page-1.md        title        My title 1
1   /some/page-1.md  description  My description 1
2   /some/page-1.md         type          Tutorial
3   /some/page-1.md        index              True
4   /some/page-2.md        title        My title 2
5   /some/page-2.md  description  My description 2
6   /some/page-2.md         type          Tutorial
7   /some/page-2.md        index             False
8   /some/page-2.md     custom_1               abc
9   /some/page-3.md        title        My title 3
10  /some/page-3.md  description  My description 3
11  /some/page-3.md         type          Tutorial
12  /some/page-3.md        index              True
13  /some/page-3.md     custom_2               def
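I want to convert it into a DataFrame in the following format, where the first header is "file" and its values come from column 0. The other headers are taken from column 1, and their values come from column 2: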
              file       title       description      type  index  custom_1  custom_2
0  /some/page-1.md  My title 1  My description 1  Tutorial   True       NaN       NaN
1  /some/page-2.md  My title 2  My description 2  Tutorial  False       abc       NaN
2  /some/page-3.md  My title 3  My description 3  Tutorial   True       NaN       def
Is there a way to do this with Pandas?
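I renamed your columns to file, header, and value, which makes the task straightforward. You need the pivot_table method to get the result you want. The final code looks like this: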
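# data: the (file, header, value) rows from the CSV
df = pd.DataFrame(data, columns=["file", "header", "value"])
# One row per file, one column per header; 'first' keeps the first value per (file, header) pair
result = df.pivot_table(index='file', columns='header', values='value', aggfunc='first').reset_index()
# reset_index() already yields a default RangeIndex, so this filter keeps every row
result = result[result.index.notna()]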
Your output will look like this, so we still need to remove the "header" label:
header             file  custom_1  custom_2       description  index       title      type
0       /some/page-1.md       NaN       NaN  My description 1   True  My title 1  Tutorial
1       /some/page-2.md       abc       NaN  My description 2  False  My title 2  Tutorial
2       /some/page-3.md       NaN       def  My description 3   True  My title 3  Tutorial
To remove the "header" label, use:
result.columns.name = None
The final output is as follows:
              file  custom_1  custom_2       description  index       title      type
0  /some/page-1.md       NaN       NaN  My description 1   True  My title 1  Tutorial
1  /some/page-2.md       abc       NaN  My description 2  False  My title 2  Tutorial
2  /some/page-3.md       NaN       def  My description 3   True  My title 3  Tutorial
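Note that pivot_table sorts the new columns alphabetically, while the question asks for title, description, type, index, custom_1, custom_2. Here is a minimal end-to-end sketch that may help with that. It assumes the sample rows from the question, uses io.StringIO as a stand-in for the real CSV file, and the names csv_text and ordered are mine, not from the question. It puts the columns back in the order the headers first appear in the CSV:

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
csv_text = '''"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"
'''

# Name the three columns while reading; dtype=object keeps "True"/"False" as strings.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["file", "header", "value"],
    dtype=object,
)

# Long -> wide: one row per file, one column per header.
result = df.pivot_table(
    index="file", columns="header", values="value", aggfunc="first"
).reset_index()
result.columns.name = None  # drop the leftover "header" label

# pivot_table sorted the columns alphabetically; restore first-appearance order.
ordered = ["file"] + list(df["header"].drop_duplicates())
result = result[ordered]

print(result)
```

If you would rather have a fixed column order, you can replace ordered with an explicit list of column names.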