This is my Python code, which runs BeautifulSoup on the content of an API response:
soup = BeautifulSoup(resp.content, 'lxml')
If I print the soup, it looks like this:
<html>
<body>
....
<script src="/site_media/js/jquery/jquery.js" type="text/javascript"></script>
<script nonce="" type="text/javascript">
var username_field = document.getElementById("id_username");
if(username_field.value){
document.getElementById("id_password").focus();
} else {
username_field.focus();
}
$(".toggle-password").click(function() {
$(this).toggleClass("fa-eye fa-eye-slash");
var input = $($(this).attr("toggle"));
if (input.attr("type") == "password") {
input.attr("type", "text");
} else {
input.attr("type", "password");
}
});
var iam_login_link = document.getElementById("iam_login_link");
var iam_login_enabled = "False";
if (iam_login_enabled === 'True') {
iam_login_link.style.display = ''
} else {
iam_login_link.style.display = 'none'
}
$('#iamLogin').on('click', function() {
window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
});
</script>
</body>
</html>
My question is: how do I extract the window.location.href link from the soup?
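I don't think BeautifulSoup is needed in this particular case: the value you want lives inside JavaScript source, which BeautifulSoup does not parse, so you can use a regular expression instead: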
import re

pattern = r'window\.location\.href\s*=\s*["\']([^"\']+)["\']'
# Use resp.text (str): resp.content is bytes, and a str pattern on bytes raises TypeError
match = re.search(pattern, resp.text)
if match:
    print(match.group(1))
else:
    print('not found')
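If it helps, here is a minimal standard-library sketch that wraps the same regular expression in a function, accepts either bytes or str, and optionally resolves the relative link against a base URL. The function name `extract_location_href`, the `sample` snippet, and the base URL `https://example.com/accounts/login/` are made up for illustration; substitute the URL you actually requested.

```python
import re
from urllib.parse import urljoin

# Same pattern as above, compiled once: captures the quoted value assigned to window.location.href
HREF_PATTERN = re.compile(r'window\.location\.href\s*=\s*["\']([^"\']+)["\']')


def extract_location_href(html, base_url=None):
    """Return the first window.location.href value in html, or None if there is none."""
    if isinstance(html, bytes):
        # A str pattern cannot search bytes, so decode first (UTF-8 is an assumption here)
        html = html.decode("utf-8", errors="replace")
    match = HREF_PATTERN.search(html)
    if match is None:
        return None
    href = match.group(1)
    # Resolve a relative link such as "/saml-idp/..." only when a base URL is supplied
    return urljoin(base_url, href) if base_url else href


# Trimmed-down copy of the script from the question, as bytes like resp.content
sample = b'''<script>
$('#iamLogin').on('click', function() {
window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
});
</script>'''

print(extract_location_href(sample))
print(extract_location_href(sample, "https://example.com/accounts/login/"))
```

With a real response you would call something like `extract_location_href(resp.text, resp.url)`. Note that the pattern returns only the first assignment it finds, so a page with several redirects would need `HREF_PATTERN.findall` instead.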