Pandas: ignoring missing dates when finding a percentile

Question (votes: 1, answers: 1)

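I have a data frame, and I am trying to find the percentile of a datetime within its time column. I am using scipy's percentileofscore function; the code follows the data.

Data frame: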

student, attempts, time
student 1,14, 9/3/2019  12:32:32 AM
student 2,2, 9/3/2019  9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019  8:58:14 PM

student2Info = [14, 4, Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')

where student2Info[2] holds the datetime for a particular student. When I try to run this, I get the error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Any ideas on how to get the percentile to compute correctly when some rows in the column have no time?

python pandas percentile
1 Answer
1 vote

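You need to convert the timestamps into a unit that percentileofscore can understand, such as integers. Also, data['time'].notnull() returns a boolean Series (a mask) that you can use to filter the DataFrame; it does not return the filtered values themselves, so I have updated that part for you. Note that toordinal() keeps only the date, so the comparison here is at day resolution. Here is a working example: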

import pandas as pd
import scipy.stats as stats

data = pd.DataFrame.from_dict({
    "student": [1, 2, 3, 4],
    "attempts": [14, 2, 5, 16],
    "time_0001": [
        "9/3/2019  12:32:32 AM",
        "9/3/2019  9:37:14 PM",
        "",
        "9/5/2019  8:58:14 PM"
    ]
})

student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
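# Boolean mask: True where the time parsed, False where it is NaT
has_time = data['time'].notnull()
# Keep only rows with a time and convert each timestamp to an integer day ordinal
days = data.loc[has_time, 'time'].transform(pd.Timestamp.toordinal)
perc1_first = stats.percentileofscore(days, student2Info[2].toordinal(), 'rank')
print(perc1_first)  #-> 66.66666666666667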
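
If the time of day matters, the same idea can be sketched at finer resolution. This is a variant of the answer above, not part of it: it assumes that converting each timestamp to seconds since 1970-01-01 is acceptable, and the names times, score, epoch, and one_second are illustrative. The times are written as ISO strings to keep parsing unambiguous, with None standing in for the student who has no recorded time.

```python
import pandas as pd
from scipy import stats

# Same three recorded times as in the question; None becomes NaT.
times = pd.Series(pd.to_datetime([
    "2019-09-03 00:32:32",
    "2019-09-03 21:37:14",
    None,
    "2019-09-05 20:58:14",
]))
score = pd.Timestamp("2019-09-04 00:26:36")

# notnull() is only a mask; indexing with it yields the actual timestamps.
mask = times.notnull()
valid = times[mask]

# Assumption: express every timestamp as float seconds since the Unix epoch,
# so scipy compares plain numbers and the time of day is preserved.
epoch = pd.Timestamp("1970-01-01")
one_second = pd.Timedelta(seconds=1)
seconds = (valid - epoch) / one_second
score_seconds = (score - epoch) / one_second

pct = stats.percentileofscore(seconds, score_seconds, kind="rank")
print(round(pct, 2))  # 66.67, since two of the three recorded times are earlier
```

With this data the result matches the day-ordinal version. The two approaches would differ only when the score falls on the same calendar day as one of the recorded times, because day ordinals treat those as ties.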