How to extract this value from the soup body

Question

Here is my Python code, which runs BeautifulSoup on the content of an API response:

soup = BeautifulSoup(resp.content, 'lxml')

If I print the soup, it looks like this:

<html> 
<body>
.... 
<script src="/site_media/js/jquery/jquery.js" type="text/javascript"></script>
<script nonce="" type="text/javascript">
  var username_field = document.getElementById("id_username");
  if(username_field.value){
    document.getElementById("id_password").focus();
  } else {
    username_field.focus();
  }
  $(".toggle-password").click(function() {
    $(this).toggleClass("fa-eye fa-eye-slash");
    var input = $($(this).attr("toggle"));
    if (input.attr("type") == "password") {
      input.attr("type", "text");
    } else {
      input.attr("type", "password");
    }
  });
  var iam_login_link = document.getElementById("iam_login_link");
  var iam_login_enabled = "False";
  if (iam_login_enabled === 'True') {
    iam_login_link.style.display = ''
  } else {
    iam_login_link.style.display = 'none'
  }
  $('#iamLogin').on('click', function() {
    window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
  });
</script>
</body>
</html>

My question is: how can I extract the window.location.href link from the soup?

Answer 1

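I don't think BeautifulSoup is needed in this particular case. The value lives inside JavaScript rather than in the HTML markup, so you can pull it out with a regular expression: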

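import re

# Capture the quoted value assigned to window.location.href.
pattern = r'window\.location\.href\s*=\s*["\']([^"\']+)["\']'
# Search the decoded text: resp.content is bytes, and a str pattern on bytes raises TypeError.
# (Assumes resp is a requests-style response that exposes .text.)
match = re.search(pattern, resp.text)

if match:
    print(match.group(1))
else:
    print('not found')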
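
As a self-contained variant of the same idea, here is a minimal sketch that wraps the pattern above in a helper. The function name `extract_location_href`, the `base_url` parameter, and the `https://example.com/login/` address are hypothetical and only serve the illustration. The helper accepts either bytes (such as `resp.content`) or text, and can optionally resolve the relative path against the page URL:

```python
import re
from urllib.parse import urljoin

# Same pattern as above: captures the quoted value assigned to window.location.href.
HREF_PATTERN = re.compile(r'window\.location\.href\s*=\s*["\']([^"\']+)["\']')


def extract_location_href(body, base_url=None, encoding="utf-8"):
    """Return the first window.location.href target found in body, or None."""
    # Accept raw bytes (e.g. resp.content) as well as already-decoded text.
    if isinstance(body, bytes):
        body = body.decode(encoding, errors="replace")
    match = HREF_PATTERN.search(body)
    if match is None:
        return None
    href = match.group(1)
    # Resolve a relative path against the page URL when one is supplied.
    return urljoin(base_url, href) if base_url else href


# Hypothetical sample mirroring the script in the question.
sample = b"""
<script>
  $('#iamLogin').on('click', function() {
    window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
  });
</script>
"""

print(extract_location_href(sample))
print(extract_location_href(sample, base_url="https://example.com/login/"))
```

The helper returns only the first match. If the page might contain several such assignments, `HREF_PATTERN.findall(body)` would list all of them.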
