Get the highest codes, repeated and non-repeated, from a dataset

Question

I have a dataset that looks like this:

id	code	date
1	39	20180527
1	17	20180223
1	17	20180223
1	17	20180223
1	30	20120612
1	14	20120214
2	40	20210605
2	32	20210412
2	25	20210315
3	39	20170504
3	17	20170205
3	40	20150506

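As you can see, within each id the highest code has the highest date. I want to extract those codes and their dates. Some of the rows are repeated. For example, for id 1 the highest codes are 39 and 30, and I want to get those codes along with the dates associated with them. One thing to note is that the dates of the highest codes are a year or more apart. The output for the dataset above should be: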

id	code	date
1	39	20180527
1	30	20120612
2	40	20210605
3	39	20170504
3	40	20150506

I tried the code below, but it gives me only the single highest value per id and does not account for the repeated values.

latest_dates = column_selection.groupby("id").max() # group the data by id and get the max date for each group

latest_dates = latest_dates.reset_index() # reset the index

latest_dates # print the latest date for each ID with the new index

This is the output I get with the code above:

id	code	date
1	39	20180527
2	40	20210605
3	39	20170504

I would appreciate any help.
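One possible reading of the expected output is "for each id and each calendar year, keep the row with the latest date". That rule is an assumption inferred from the sample, not something stated in the question, but it does reproduce the five rows listed above. A minimal sketch under that assumption, with the sample rebuilt as a DataFrame `df` whose columns are named `id`, `code`, and `date` (the name `df` and the helper column name `year` are illustrative), and with dates assumed to be integers in YYYYMMDD form:

```python
import pandas as pd

# Sample data from the question (column names translated to id / code / date)
df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "code": [39, 17, 17, 17, 30, 14, 40, 32, 25, 39, 17, 40],
    "date": [20180527, 20180223, 20180223, 20180223, 20120612, 20120214,
             20210605, 20210412, 20210315, 20170504, 20170205, 20150506],
})

# Assumption: dates are YYYYMMDD integers, so integer division by 10000 gives the year
year = (df["date"] // 10000).rename("year")

# For every (id, year) group, find the index label of the row with the latest date.
# idxmax returns the first matching label, so exact duplicate rows collapse to one.
latest_idx = df.groupby(["id", year])["date"].idxmax()

# Pull the complete rows back out, newest first within each id
result = (
    df.loc[latest_idx]
      .sort_values(["id", "date"], ascending=[True, False])
      .reset_index(drop=True)
)
print(result)
```

Selecting rows through `idxmax` keeps each code paired with its own date. By contrast, `groupby("id").max()` takes the maximum of every column independently and returns a single row per id, which is why the attempt above cannot produce more than one row per id. If the real rule is based on a gap between consecutive dates rather than on the calendar year, the grouping key would need to change accordingly.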
