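This program compiles baseball statistics for a KBO fantasy baseball league. Using a roster from an Excel file, it scrapes data from pages such as http://eng.koreabaseball.com/Teams/PlayerInfoPitcher/GameLogs.aspx?pcode=65320 and appends each day's new statistics to a running dataset used for reports and analysis.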
    MAY     OPP    ERA  RES  PA     IP  H  HR  BB  HBP  K  R  ER   OAVG
0  5.06      SK   6.23  NaN  18  4 1/3  3   2   1    0  5  3   3  0.188
1  5.12     KIA   0.00  NaN  25      7  1   0   3    0  8  0   0  0.045
2  5.17   LOTTE   1.29  NaN  26      7  2   1   3    0  6  1   1  0.087
3  5.23      NC   3.18    L  27  5 2/3  7   0   3    1  6  2   2  0.304
4  5.29      SK  14.73    L  20  3 2/3  7   2   2    0  2  6   6  0.389

    JUN     OPP    ERA  RES  PA     IP  H  HR  BB  HBP  K  R  ER   OAVG
0  6.04  KIWOOM    6.0    L  26      6  8   2   1    0  8  4   4   0.32
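The "IP" (innings pitched) column is stored as either an integer or a mixed fraction. It would probably be simplest to convert both to floats.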
from fractions import Fraction

def mixed_to_float(x):
    # Sum the whitespace-separated terms, e.g. "4 1/3" -> 4 + 1/3
    return float(sum(Fraction(term) for term in x.split()))

for i, df in enumerate(dfpitcher):
    # I need to manipulate the IP to convert it to float; I've tried several approaches.
    # Closest attempt thus far. The "innings_pitched" variable returns the index, the value, the name, and the dtype.
    innings_pitched = todaystats['IP']
    print(player_name, ' had innings pitched: ', innings_pitched)
    todaystats.loc[((todaystats['IP'] >= 6) | (todaystats['ER'] <= 3)), 'QS'] = 1
Current result: TypeError: '>=' not supported between instances of 'str' and 'int'.
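The error can be reproduced in isolation. The following is a minimal sketch, assuming the scraped "IP" column arrives as strings (the `ip` Series here is a hypothetical stand-in, not the real scraped data):

```python
import pandas as pd

# Hypothetical stand-in for the scraped column: once any cell holds a
# mixed fraction such as "4 1/3", the values are kept as strings.
ip = pd.Series(["7", "4 1/3", "6"], dtype=object)

try:
    ip >= 6  # compares each string with an int
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The comparison only works once every value in the column is numeric, which is why the conversion has to happen before the `>=` test.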
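I suspect the extra rows in there (the header repeated for each month) are what is tripping it up: the 'IP' strings cannot be converted to float/int directly. I would consider renaming the month column ('MAY', 'JUNE', etc.) to simply 'DATE' and then converting it to an actual date (I'm assuming this is the 2020 season? Are they even playing baseball in the Korean league, though?).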
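The date conversion could be sketched roughly as follows. This assumes the month column is read as a float in month.day form and that the season is 2020; `month_day_to_date` and the sample frame are hypothetical names used only for illustration:

```python
from datetime import datetime

import pandas as pd

def month_day_to_date(value, year=2020):
    """Turn a month.day value such as 5.06 (or the float 5.3) into a date."""
    # Formatting with two decimals restores a trailing zero that a float
    # loses, so 5.3 is read as May 30 rather than May 3.
    text = f"{float(value):.2f}"
    return datetime.strptime(f"{text}.{year}", "%m.%d.%Y").date()

# Hypothetical frame standing in for one monthly table from the page.
df = pd.DataFrame({"MAY": [5.06, 5.3], "OPP": ["SK", "KT"]})
df = df.rename(columns={df.columns[0]: "DATE"})
df["DATE"] = df["DATE"].apply(month_day_to_date)
print(df)
```

Formatting the number before parsing is one way to avoid days such as the 10th, 20th, and 30th being misread; reading the column as text in the first place would be another.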
Then just apply a function to do the conversion:
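import pandas as pd
from fractions import Fraction

def mixed_to_float(ip):
    """Convert an innings value such as '4 1/3' or '7' to a float."""
    try:
        parts = str(ip).split()
        if len(parts) > 1:
            # Mixed fraction: whole innings plus the fractional part
            return int(parts[0]) + float(Fraction(parts[-1]))
        return float(ip)
    except (ValueError, TypeError, ZeroDivisionError):
        # Leave anything unparseable unchanged
        return ip

url = 'http://eng.koreabaseball.com/Teams/PlayerInfoPitcher/GameLogs.aspx?pcode=65320'
dfs = pd.read_html(url)  # one DataFrame per monthly table

frames = []
for df in dfs:
    df['IP'] = df['IP'].apply(mixed_to_float)
    # The first column is named after the month (MAY, JUN, ...); rename it to DATE
    df = df.rename(columns={df.columns[0]: 'DATE'})
    # Caveat: if the column is read as a float, a date such as 5.30 becomes 5.3
    # and is likely parsed as May 3 (compare row 4 of the output)
    df['DATE'] = df['DATE'].astype(str) + '.2020'
    df['DATE'] = pd.to_datetime(df['DATE']).dt.date
    frames.append(df)

# Combine the monthly tables into one frame with a fresh index
results = pd.concat(frames, ignore_index=True, sort=False)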
Output:
print(results.to_string())
         DATE      OPP    ERA  RES  PA        IP  H  HR  BB  HBP  K  R  ER   OAVG
0  2020-05-07      KIA   3.60  NaN  24  5.000000  9   0   2    0  4  2   2  0.409
1  2020-05-13  SAMSUNG   2.45    L  27  7.333333  4   0   0    0  6  4   2  0.154
2  2020-05-19       SK  13.50  NaN  17  2.666667  4   0   4    0  2  5   4  0.308
3  2020-05-24    LOTTE   1.50    L  23  6.000000  5   0   0    0  3  1   1  0.217
4  2020-05-03       KT   3.00    W  22  6.000000  5   1   0    0  5  2   2  0.238
5  2020-06-05       LG   2.57    W  26  7.000000  5   1   0    0  4  2   2  0.192
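With "IP" stored as a float, the quality-start flag from the question can be computed without the TypeError. The sketch below uses a few hypothetical rows shaped like the table above. Note that the question's line joins the two conditions with `|`, whereas a quality start is conventionally at least 6 innings and no more than 3 earned runs, so this sketch uses `&`; swap the operator if the league scores it differently.

```python
import pandas as pd

# Hypothetical rows shaped like the converted results table above.
results = pd.DataFrame({
    "OPP": ["KIA", "SAMSUNG", "SK"],
    "IP": [5.0, 7 + 1 / 3, 2 + 2 / 3],
    "ER": [2, 2, 4],
})

# Both conditions must hold for a conventional quality start.
quality_start = (results["IP"] >= 6) & (results["ER"] <= 3)
results["QS"] = quality_start.astype(int)
print(results)
```

Building the flag from a boolean mask also gives every row an explicit 0 or 1, rather than leaving NaN where the condition is not met.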