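I want to assign to one column a variable-length slice of another column, but for some reason it does not work as I expect, and I don't understand why: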
import numpy as np
import pandas as pd

m = np.array([[1, 'AAAAA'],
              [2, 'BBBB'],
              [3, 'CCC']])

df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x['s1'].str.slice(start=0, stop=x['s1'].str.len() - 1))
     )
print(df)
This results in:
   id     s1   s2
0   1  AAAAA  NaN
1   2   BBBB  NaN
2   3    CCC  NaN
However, I expected the following:
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC
Any idea what is going on here?
You need str[:-1] to take every value in the column without its last character:
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x['s1'].str[:-1])
     )
print(df)
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC
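Your original approach (computing the stop from each string's length) only works if you use apply to process each row separately, for example: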
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x.apply(lambda y: y['s1'][0:len(y['s1']) - 1], axis=1))
     )
print(df)
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC
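If the stop really has to differ per row (for example, it comes from another column), a plain list comprehension is one alternative to a row-wise apply. This is only a sketch: the column "n" and its values are made up for illustration and are not part of the question.

```python
import pandas as pd

# "n" is a hypothetical column holding a different stop position for each row
df = pd.DataFrame({"s1": ["AAAAA", "BBBB", "CCC"], "n": [2, 3, 1]})

# Pair each string with its own stop value and slice in plain Python
df["s2"] = [s[:n] for s, n in zip(df["s1"], df["n"])]
print(df)
```

For the case in the question, where every row just drops its last character, this is unnecessary because a single scalar stop of -1 already covers it.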
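The problem is the stop argument of your slice() call: it only needs to be -1. str.slice expects a single integer for stop, not a Series of per-row lengths, and a negative stop counts from the end of each string.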
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x['s1'].str.slice(start=0, stop=-1))
     )
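As a quick check, the sketch below compares the scalar-stop form of str.slice with the str[:-1] shorthand on the same three strings. The variable names are arbitrary.

```python
import pandas as pd

s = pd.Series(["AAAAA", "BBBB", "CCC"])

# A scalar stop of -1 is applied to every element, counting from the end
via_slice = s.str.slice(start=0, stop=-1)

# The indexing shorthand on the .str accessor does the same thing
via_index = s.str[:-1]

print(via_slice.tolist())
print(via_slice.equals(via_index))
```

Both forms should give the same result, so which one to use is mostly a matter of taste.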
You can use pandas apply like this:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"id": [1, 2, 3], "s1": ["AAAAA", "BBBB", "CCC"]})
In [3]: df
Out[3]:
   id     s1
0   1  AAAAA
1   2   BBBB
2   3    CCC
In [4]: df["s2"] = df["s1"].apply(lambda x: x[:-1])
In [5]: df
Out[5]:
   id     s1    s2
0   1  AAAAA  AAAA
1   2   BBBB   BBB
2   3    CCC    CC
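One difference between apply with a lambda and the .str accessor shows up when the column contains missing values. The sketch below assumes a missing entry in the middle of the column, which is not in the original data: the .str version passes the missing value through, while the lambda tries to slice it and fails.

```python
import pandas as pd

# The None entry is an assumption added for illustration
s = pd.Series(["AAAAA", None, "CCC"])

# The .str accessor skips missing values and leaves them missing
vectorized = s.str[:-1]
print(vectorized)

# The lambda is called on the missing value itself, which cannot be sliced
try:
    s.apply(lambda x: x[:-1])
    apply_failed = False
except TypeError:
    apply_failed = True
print(apply_failed)
```

If the column may contain missing values, the .str form is probably the safer default; with apply you would need to guard against them yourself.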