I currently have code that scrapes all the information I need for 2 sets of data and puts them into 2 separate DataFrames. The code is:
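import pandas as pd
from urllib.parse import urljoin

# Finds the container holding all of the day's racecards
daily_racecard_info = beautifulSoupText.find(class_='w-racecard-grid-container widget-content widget-content-no-padding')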
# Finds main body for INDIVIDUAL racecards
individual_course_racecard = daily_racecard_info.find_all(class_='w-racecard-grid-meeting')
# Get each race info
race_results = daily_racecard_info.find_all("li", class_="w-racecard-grid-race-result")
# Create an empty list to store the data
data = []
# Loop over each race course
for item in individual_course_racecard:
    course_names = item.find(class_='w-racecard-grid-course clickable').text.strip()
    course_going = item.find(class_='w-racecard-grid-info w-racecard-grid-info-going').text.strip()
    # Strip the leading word "Going" from the going description
    goings_simple = course_going.split('Going')[-1].strip()
    data.append({
        "Date": date,
        "Course": course_names,
        "Going": goings_simple,
    })
df = pd.DataFrame(data)
# Create an empty list to store the data
data2 = []
# Loop over each race result
for race in race_results:
    # Find the a tag within the li
    a_tag = race.find("a")
    # Extract the time and race type from the first span
    time, race_type = a_tag.contents[1].text.split()
    # Extract the distance from the second span
    distance = a_tag.contents[3].text
    # Drop the "(Inner)" suffix from the distance
    distance_simple = distance.split('(Inner)')[0].strip()
    # Extract the title from the third span
    title = a_tag.contents[5].text
    # Construct the full URL using urljoin
    full_url = urljoin(url, a_tag["href"])
    # Add the data to the list
    data2.append({
        "Time": time,
        "Race Type": race_type,
        "Distance": distance_simple,
        "Title": title,
        "URL": full_url
    })
# Convert the list of dicts to a pandas DataFrame
df2 = pd.DataFrame(data2)
# Join the two DataFrames side by side (rows are matched on the index, not on the course)
data_frames2 = pd.concat([df, df2], axis='columns')
print(data_frames2)
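The (shortened) output is:
What I need is output more like this (with the correct course attached to the correct time, etc.):
I think it needs a statement inside the second for loop ("for race in race_results") that pulls in the "parent" course name for each race time, but I can't get my head around it.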
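One possible way to attach each race to its course is sketched below. It is not tested against the real page: it ASSUMES that every `li.w-racecard-grid-race-result` element sits inside its own `w-racecard-grid-meeting` element, and the `SAMPLE_HTML` markup, the helper names (`meeting_info`, `rows_by_nesting`, `rows_by_parent`) and the simplified single-span race link are all made up for illustration. The first function searches for races inside each meeting instead of across the whole container; the second keeps the flat race loop and looks up the "parent" meeting with `find_parent`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. It ASSUMES races are nested inside their meeting.
SAMPLE_HTML = """
<div class="w-racecard-grid-container">
  <div class="w-racecard-grid-meeting">
    <span class="w-racecard-grid-course clickable">Ascot</span>
    <span class="w-racecard-grid-info w-racecard-grid-info-going">Going Good</span>
    <ul>
      <li class="w-racecard-grid-race-result"><a href="/r/1"><span>13:30 Flat</span></a></li>
      <li class="w-racecard-grid-race-result"><a href="/r/2"><span>14:05 Flat</span></a></li>
    </ul>
  </div>
  <div class="w-racecard-grid-meeting">
    <span class="w-racecard-grid-course clickable">Wincanton</span>
    <span class="w-racecard-grid-info w-racecard-grid-info-going">Going Soft</span>
    <ul>
      <li class="w-racecard-grid-race-result"><a href="/r/3"><span>13:45 Hurdle</span></a></li>
    </ul>
  </div>
</div>
"""


def meeting_info(meeting):
    """Return (course, going) for one meeting element."""
    course = meeting.find(class_="w-racecard-grid-course").get_text(strip=True)
    going_text = meeting.find(class_="w-racecard-grid-info-going").get_text(strip=True)
    # Strip the leading word "Going", as in the original loop
    going = going_text.split("Going")[-1].strip()
    return course, going


def race_info(race):
    """Return (time, race_type) from the simplified race link."""
    time, race_type = race.find("a").get_text(strip=True).split()
    return time, race_type


def rows_by_nesting(soup):
    """Loop over meetings, then over the races found inside each meeting."""
    rows = []
    for meeting in soup.find_all(class_="w-racecard-grid-meeting"):
        course, going = meeting_info(meeting)
        # Searching from `meeting` (not the whole page) keeps races with their course
        for race in meeting.find_all("li", class_="w-racecard-grid-race-result"):
            time, race_type = race_info(race)
            rows.append({"Course": course, "Going": going,
                         "Time": time, "Race Type": race_type})
    return rows


def rows_by_parent(soup):
    """Keep one flat loop over races and look up each race's enclosing meeting."""
    rows = []
    for race in soup.find_all("li", class_="w-racecard-grid-race-result"):
        meeting = race.find_parent(class_="w-racecard-grid-meeting")
        course, going = meeting_info(meeting)
        time, race_type = race_info(race)
        rows.append({"Course": course, "Going": going,
                     "Time": time, "Race Type": race_type})
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = rows_by_nesting(soup)
for row in rows:
    print(row)
```

If the nesting assumption holds for the real page, the remaining fields (distance, title, URL) could be added to each row the same way as in the original loop, and `pd.DataFrame(rows)` would then give a single frame with the course and going repeated on every race row, so the `pd.concat` step would no longer be needed.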