Beautiful Soup error: "invalid character in identifier"

Question (votes: 1, answers: 1)

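I am trying to scrape all the dates from a table on this web page. How: using find, specifying the table's element and its attribute (shown in blue). Problem: when I try to extract the whole table, I get a syntax error, "invalid character in identifier". Other relevant information: the site requires a username and password, so I am using a session to keep my credentials.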

import requests
from getpass import getpass
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth

URL = "https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0"
s = requests.Session()
s.auth = ("myusername", "mypass")
s.headers.update({"x-test": "true"}) 

# both "x-test" and "x-test2" are sent
s.get("https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0", headers={"x-test2": "true"})
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("div", attrs= {"id":"id_content_r_c1"}​)

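The error points to the last line of the code: invalid character in identifier. However, I have triple-checked it and compared it with other code that works, and I cannot find any difference. Please help, thank you.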

Also, here is the DOM of my web page.

Traceback:

runfile('/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py', wdir='/Users/rahelmizrahi/Python/scripts')
  File "/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py", line 26
    results = soup.find("div", attrs= {"id":"id_content_r_c1"}​)
                                                              ^
SyntaxError: invalid character in identifier
python unicode syntax-error special-characters
1 answer
0 votes

This is probably the result of copying and pasting the code. Let's take a look at the failing line:

>>> import unicodedata as ud
>>> s = 'results = soup.find("div", attrs= {"id":"id_content_r_c1"}\u200b)'
>>> for c in s:print(c, ud.name(c))
... 
r LATIN SMALL LETTER R
e LATIN SMALL LETTER E
s LATIN SMALL LETTER S
u LATIN SMALL LETTER U
l LATIN SMALL LETTER L
t LATIN SMALL LETTER T
s LATIN SMALL LETTER S
  SPACE
= EQUALS SIGN
  SPACE
s LATIN SMALL LETTER S
...
1 DIGIT ONE
" QUOTATION MARK
} RIGHT CURLY BRACKET
 ZERO WIDTH SPACE
) RIGHT PARENTHESIS

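The second-to-last character, ZERO WIDTH SPACE (U+200B), is the problem. It is invisible in most editors, which is why the line looks identical to working code. Delete it or retype the line by hand.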
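
If the stray character is hard to find by eye, a small script can locate it. The following is only a sketch, not part of the original answer: the helper names `find_invisible_chars` and `strip_invisible_chars` are made up for illustration, and it assumes the offending characters belong to the Unicode "format" category (`Cf`), which includes ZERO WIDTH SPACE and the byte order mark.

```python
import unicodedata


def find_invisible_chars(source):
    """Return (line, column, name) for each Unicode format character (category Cf)."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((line_no, col_no, unicodedata.name(ch, "UNKNOWN")))
    return hits


def strip_invisible_chars(source):
    """Return the text with every format character (category Cf) removed."""
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


# The failing line from the question, with the hidden character written as an escape.
bad_line = 'results = soup.find("div", attrs= {"id":"id_content_r_c1"}\u200b)'

hits = find_invisible_chars(bad_line)
for line_no, col_no, name in hits:
    print(f"line {line_no}, column {col_no}: {name}")

clean_line = strip_invisible_chars(bad_line)
# compile() only parses the code; it raises SyntaxError if the line is still invalid.
compile(clean_line, "<cleaned>", "exec")
```

To check a whole script, read it with `open(path, encoding="utf-8").read()` and pass the text to `find_invisible_chars`. Be careful with blanket stripping: some `Cf` characters, such as ZERO WIDTH JOINER inside a string literal, can be intentional, so it is safer to review the reported positions first.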
