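I want to extract data from the table on this website: https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html
The data is stored in tr tags. After selecting all the tr tags with XPath, I checked the number of elements in each of the first 3 rows, but the result is empty. If my code were correct, I should get [6, 6, 6].
Here is my code: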
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import lxml.html as lh
import pandas as pd
url = 'https://www.coteur.com/match/cotes-start-stromsgodset-rid1106841.html'
# Create a handle, page, for the contents of the first soccer game
page = requests.get(url)
# Store the contents of the website under doc
doc = lh.fromstring(page.content)
# Select the elements stored between <tr>..</tr> in the HTML
tr_elements = doc.xpath('//tr')
# Check the length of the first 3 rows
a = [len(T) for T in tr_elements[:3]]
print(a)
Here is the output:
hao@hao-ThinkPad-T420:~$ ./extractodds.py
[]
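
For comparison, here is a minimal sketch of the same row-length check run on a small inline table instead of the live page. It uses the standard library's xml.etree.ElementTree in place of lxml so that it has no dependencies; in both libraries len(element) counts the direct children of an element. The SAMPLE markup and the row_lengths helper are hypothetical and are not taken from the site. The sketch shows that the check gives [6, 6, 6] when the tr elements are present in the parsed markup, and an empty list when no tr element is found at all, which is what my output looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the odds table; not the real page content.
SAMPLE = """
<table>
  <tr><th>h1</th><th>h2</th><th>h3</th><th>h4</th><th>h5</th><th>h6</th></tr>
  <tr><td>a1</td><td>a2</td><td>a3</td><td>a4</td><td>a5</td><td>a6</td></tr>
  <tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td><td>b5</td><td>b6</td></tr>
</table>
"""


def row_lengths(markup, limit=3):
    """Return the number of child cells in each of the first `limit` rows."""
    root = ET.fromstring(markup.strip())
    # './/tr' matches every tr element below the root, like '//tr' in the question
    rows = root.findall('.//tr')
    # len(row) counts the direct children (th/td cells) of each row
    return [len(row) for row in rows[:limit]]


print(row_lengths(SAMPLE))                             # [6, 6, 6]
print(row_lengths("<div><p>no table here</p></div>"))  # []
```

So an empty list means the list comprehension had nothing to iterate over: no tr element was matched in the HTML that was parsed. One way to check this on the real page would be to print len(tr_elements) and to search page.text for '<tr'.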