Identify DataFrame values as whole numbers or mixed fractions and convert both to float

Question (votes: 1, answers: 1)

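This program compiles baseball statistics for a KBO fantasy baseball league. It uses a roster stored in an Excel file to scrape data from pages such as http://eng.koreabaseball.com/Teams/PlayerInfoPitcher/GameLogs.aspx?pcode=65320, and appends each day's new stats to a dataset that is used for reports and analysis.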

    MAY    OPP    ERA  RES  PA     IP  H  HR  BB  HBP  K  R  ER   OAVG
0  5.06     SK   6.23  NaN  18  4 1/3  3   2   1    0  5  3   3  0.188
1  5.12    KIA   0.00  NaN  25      7  1   0   3    0  8  0   0  0.045
2  5.17  LOTTE   1.29  NaN  26      7  2   1   3    0  6  1   1  0.087
3  5.23     NC   3.18    L  27  5 2/3  7   0   3    1  6  2   2  0.304
4  5.29     SK  14.73    L  20  3 2/3  7   2   2    0  2  6   6  0.389,     
    JUN     OPP  ERA RES  PA  IP  H  HR  BB  HBP  K  R  ER  OAVG
0  6.04  KIWOOM  6.0   L  26   6  8   2   1    0  8  4   4  0.32

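The "IP" column (innings pitched) is stored either as a whole number or as a mixed fraction. Converting both to float is probably the simplest approach.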

from fractions import Fraction

def mixed_to_float(x):
    # Sum the whitespace-separated terms, e.g. "4 1/3" -> 4 + 1/3
    return float(sum(Fraction(term) for term in x.split()))

for i, df in enumerate(dfpitcher):
    # I need to manipulate IP to convert it to float; I've tried several approaches.
    # Closest attempt thus far. The innings_pitched variable returns the index, the value, the name, and the dtype.
    innings_pitched = todaystats['IP']

    print(player_name, ' had innings pitched: ', innings_pitched)
    # This comparison is where the error is raised: IP still holds strings such as '4 1/3'
    todaystats.loc[((todaystats['IP'] >= 6) | (todaystats['ER'] <= 3)), 'QS'] = 1

The current result is: '>=' not supported between instances of 'str' and 'int'.
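For anyone who wants to reproduce the error without scraping, here is a minimal sketch. The Series is a hypothetical stand-in for todaystats['IP'] and assumes every scraped value arrives as a string, which is what the error message suggests.

```python
import pandas as pd

# Hypothetical stand-in for todaystats['IP']: all values are strings.
ip = pd.Series(["4 1/3", "7", "5 2/3"], dtype=object, name="IP")

try:
    ip >= 6  # compares each string with the int 6
    raised = False
except TypeError:
    raised = True

print(raised)
```

Even "7" is a string here, so the comparison fails on whole-number innings as well as on the mixed fractions.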

python pandas dataframe floating-point fractions
1 answer
0 votes

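I think the extra rows in there (the header row for each month) are what's tripping it up. You can't turn the "IP" strings into float/int directly. I would also consider renaming the month column ('MAY', 'JUN', etc.) to simply 'DATE' and then converting it to an actual date (I assume this is the 2020 season? Are they even playing ball in the Korean league, though?)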
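One thing to watch with the date column: if pandas parses it as floats, a game on the 30th (5.30) is held as 5.3, and turning that straight into a string would read as the 3rd. Below is a minimal sketch of one way around that. The helper name month_day_to_date is made up for illustration, and the month.day float format and the 2020 season are both assumptions.

```python
import pandas as pd

def month_day_to_date(value, year=2020):
    """Turn a month.day value such as 5.3 (meaning 5.30) into a datetime.date."""
    # Formatting with two decimals restores a trailing zero lost by the float.
    month, day = f"{float(value):.2f}".split(".")
    return pd.Timestamp(year=year, month=int(month), day=int(day)).date()

# Hypothetical values in the same month.day style as the scraped column.
dates = pd.Series([5.06, 5.3, 6.04]).apply(month_day_to_date)
print(dates.tolist())
```

If the column turns out to be read as strings instead, the float() call still accepts them, so the same helper should apply.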

Then just apply a function to convert the IP values:

import pandas as pd
from fractions import Fraction

def mixed_to_float(ip):
    # Convert '4 1/3' -> 4.333..., '7' -> 7.0; return the input unchanged if it can't be parsed
    try:
        ip = str(ip)
        if len(ip.split()) > 1:
            ipFloat = ip.split()
            ipFloat = (int(ipFloat[0]) + float(Fraction(ipFloat[-1])))
        else:
            ipFloat = float(ip)
        return ipFloat
    except (ValueError, ZeroDivisionError):
        return ip


url = 'http://eng.koreabaseball.com/Teams/PlayerInfoPitcher/GameLogs.aspx?pcode=65320'
dfs = pd.read_html(url)  # one DataFrame per monthly table on the page

frames = []
for df in dfs:
    df['IP'] = df['IP'].apply(mixed_to_float)
    # The first column is named after the month ('MAY', 'JUN', ...); make it uniform
    df = df.rename(columns={df.columns[0]: 'DATE'})
    df['DATE'] = df['DATE'].astype(str) + '.2020'
    df['DATE'] = pd.to_datetime(df['DATE'])
    df['DATE'] = df['DATE'].dt.date
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate the collected frames instead
results = pd.concat(frames, sort=False).reset_index(drop=True)

Output:

print (results.to_string())
         DATE      OPP    ERA  RES  PA        IP  H  HR  BB  HBP  K  R  ER   OAVG
0  2020-05-07      KIA   3.60  NaN  24  5.000000  9   0   2    0  4  2   2  0.409
1  2020-05-13  SAMSUNG   2.45    L  27  7.333333  4   0   0    0  6  4   2  0.154
2  2020-05-19       SK  13.50  NaN  17  2.666667  4   0   4    0  2  5   4  0.308
3  2020-05-24    LOTTE   1.50    L  23  6.000000  5   0   0    0  3  1   1  0.217
4  2020-05-03       KT   3.00    W  22  6.000000  5   1   0    0  5  2   2  0.238
5  2020-06-05       LG   2.57    W  26  7.000000  5   1   0    0  4  2   2  0.192
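Once IP is numeric, the quality-start flag from the question becomes a plain boolean comparison. This is a minimal sketch on made-up rows rather than scraped data, and ip_to_float is a hypothetical helper. Note that the question joins the two conditions with `|`; this sketch assumes the usual definition of a quality start (at least 6 innings and at most 3 earned runs) and uses `&`, so swap it back if your league scores it differently.

```python
from fractions import Fraction
import pandas as pd

def ip_to_float(value):
    """Convert '4 1/3', '7' or the int 7 to a float number of innings."""
    return float(sum(Fraction(term) for term in str(value).split()))

# Hypothetical rows modelled on the question's sample; not real scraped output.
todaystats = pd.DataFrame({"IP": ["4 1/3", "7", "5 2/3", 6], "ER": [3, 0, 2, 4]})
todaystats["IP"] = todaystats["IP"].apply(ip_to_float)

# Assumed rule: quality start = at least 6 innings AND at most 3 earned runs.
todaystats["QS"] = ((todaystats["IP"] >= 6) & (todaystats["ER"] <= 3)).astype(int)
print(todaystats)
```

The helper does not handle missing values; a NaN in IP would raise a ValueError, so drop or fill those rows first if they can occur.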