I have a df:
domain orgid
csyunshu.com 108299
dshu.com 108299
bbbdshu.com 108299
cwakwakmrg.com 121303
ckonkatsunet.com 121303
I want to add a new column that replaces the domain column with a numeric id that restarts for each orgid:
domain orgid domainid
csyunshu.com 108299 1
dshu.com 108299 2
bbbdshu.com 108299 3
cwakwakmrg.com 121303 1
ckonkatsunet.com 121303 2
I have tried this line, but it does not give the result I want:
df.groupby('orgid').count['domain'].reset_index()
Can anyone help?
You can call rank on the groupby object and pass the argument method='first':
In [61]:
df['domainId'] = df.groupby('orgid')['orgid'].rank(method='first')
df
Out[61]:
domain orgid domainId
0 csyunshu.com 108299 1
1 dshu.com 108299 2
2 bbbdshu.com 108299 3
3 cwakwakmrg.com 121303 1
4 ckonkatsunet.com 121303 2
If you want to overwrite the column instead, you can do the following:
df['domain'] = df.groupby('orgid')['orgid'].rank(method='first')
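One detail worth knowing about this approach: rank() returns floats, so the new column comes back as float64. A minimal sketch, assuming the sample data from the question and that integer ids are wanted, casts the result afterwards:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# All orgid values inside a group are tied; method='first' breaks the
# ties by order of appearance, so each group is numbered 1, 2, 3, ...
ranks = df.groupby("orgid")["orgid"].rank(method="first")

# rank() yields floats; cast to int for use as an id.
df["domainId"] = ranks.astype(int)
print(df)
```

The cast is safe here because method='first' never produces fractional ranks.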
You can use LabelEncoder from sklearn.preprocessing, like this:
from sklearn.preprocessing import LabelEncoder
df["domain"] = LabelEncoder().fit_transform(df.domain)
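Note that LabelEncoder, used as above, numbers the whole column at once, so the ids do not restart for each orgid. If per-group ids are wanted, one possible sketch swaps in pandas' own pd.factorize and applies it inside each group. This is a different technique from LabelEncoder, and the column name domainid is just the one from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# pd.factorize returns (codes, uniques); the codes start at 0 and follow
# order of first appearance, so add 1 to get ids starting at 1.
df["domainid"] = (
    df.groupby("orgid")["domain"]
      .transform(lambda s: pd.factorize(s)[0] + 1)
)
print(df)
```

Unlike a running count, this gives a repeated domain within the same orgid the same id each time it appears.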
This is very intuitive with dplyr in R:
df %>% group_by(orgid) %>% mutate(domainid=row_number())
The same with datar in Python:
from datar.all import *
df = tibble(
domain=['csyunshu.com', 'dshu.com', 'bbbdshu.com', 'cwakwakmrg.com', 'ckonkatsunet.com'],
orgid=[108299,108299,108299,121303,121303]
)
df >> group_by(f.orgid) >> mutate(domainid=row_number())
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7ff728cba490>
df >> group_by(f.orgid) >> mutate(domainid=row_number()) >> showme()
[2021-03-13 00:55:12][datar][ INFO] # [DataFrameGroupBy] Groups: ['orgid'] (2)
# domain orgid domainid
# 0 csyunshu.com 108299 0
# 1 dshu.com 108299 1
# 2 bbbdshu.com 108299 2
# 3 cwakwakmrg.com 121303 0
# 4 ckonkatsunet.com 121303 1
df >> group_by(f.orgid) >> mutate(domainid=row_number()+1) >> showme()
[2021-03-13 00:55:26][datar][ INFO] # [DataFrameGroupBy] Groups: ['orgid'] (2)
# domain orgid domainid
# 0 csyunshu.com 108299 1
# 1 dshu.com 108299 2
# 2 bbbdshu.com 108299 3
# 3 cwakwakmrg.com 121303 1
# 4 ckonkatsunet.com 121303 2
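Similar to EdChum's approach, except that cumcount() is a better function than rank(), for several reasons:
cumcount() returns integers rather than floats, which is probably what you want for an id.
It counts from 0, so add + 1 if you want the ids to start at 1.
Here is the code: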
In [1]:
df['domainId'] = df.groupby('orgid').cumcount()
df
Out[1]:
domain orgid domainId
0 csyunshu.com 108299 0
1 dshu.com 108299 1
2 bbbdshu.com 108299 2
3 cwakwakmrg.com 121303 0
4 ckonkatsunet.com 121303 1
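If the ids should start at 1, as in the question's desired output, a minimal sketch (assuming the same sample data) just adds 1 to the cumcount() result:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... as integers;
# adding 1 shifts the ids so they start at 1.
df["domainId"] = df.groupby("orgid").cumcount() + 1
print(df)
```

The ids follow the current row order within each orgid, so sort the frame first if a particular ordering matters.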