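I am trying to scrape all the dates from the table on this web page. How: using find, specifying the table's element and its attribute (shown in blue). Problem: a syntax error, "invalid character in identifier", when I try to extract the whole table. Other relevant information: the site requires a username and password, so I am using a session to keep my credentials.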
import requests
from getpass import getpass
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth
URL = "https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0"
s = requests.Session()
s.auth = ("myusername", "mypass")
s.headers.update({"x-test": "true"})
# both "x-test" and "x-test2" are sent
s.get("https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0", headers={"x-test2": "true"})
# go through the session so its stored auth and headers are sent with the request
page = s.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("div", attrs= {"id":"id_content_r_c1"})
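The error points to the last line of the code: invalid character in identifier. However, I have triple-checked it and compared it with other working code, and I cannot find any difference. Please help, thank you.
Traceback: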
runfile('/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py', wdir='/Users/rahelmizrahi/Python/scripts')
File "/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py", line 26
results = soup.find("div", attrs= {"id":"id_content_r_c1"})
^
SyntaxError: invalid character in identifier
This is probably the result of copying and pasting code. Let's look at the failing line:
>>> import unicodedata as ud
>>> s = 'results = soup.find("div", attrs= {"id":"id_content_r_c1"}\u200b)'  # \u200b written out so the pasted invisible character is visible
>>> for c in s:print(c, ud.name(c))
...
r LATIN SMALL LETTER R
e LATIN SMALL LETTER E
s LATIN SMALL LETTER S
u LATIN SMALL LETTER U
l LATIN SMALL LETTER L
t LATIN SMALL LETTER T
s LATIN SMALL LETTER S
SPACE
= EQUALS SIGN
SPACE
s LATIN SMALL LETTER S
...
1 DIGIT ONE
" QUOTATION MARK
} RIGHT CURLY BRACKET
ZERO WIDTH SPACE
) RIGHT PARENTHESIS
The second-to-last character, ZERO WIDTH SPACE (U+200B), is the problem. Delete it or retype the line.
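If retyping is impractical, the same check can be run over the whole script at once. The following is only a minimal sketch: the helper names `find_suspicious_chars` and `strip_invisible` are made up for illustration, and it assumes that any non-ASCII character in the source is worth a look and that characters in Unicode category Cf (format characters, which includes U+200B) are safe to drop.

```python
import unicodedata


def find_suspicious_chars(source):
    """Return (line, column, name) for every non-ASCII character in source.

    Hypothetical helper: line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, unicodedata.name(ch, "UNKNOWN")))
    return hits


def strip_invisible(source):
    """Drop characters in Unicode category Cf (format), e.g. ZERO WIDTH SPACE.

    Assumption: the script has no legitimate use for format characters.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    bad_line = 'results = soup.find("div", attrs= {"id":"id_content_r_c1"}\u200b)'
    for line_no, col_no, name in find_suspicious_chars(bad_line):
        print(f"line {line_no}, column {col_no}: {name}")
    print(strip_invisible(bad_line))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `find_suspicious_chars`. Note that stripping every Cf character would also remove format characters that appear intentionally inside string literals, so review the reported positions before overwriting the file.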