How do I extract this value from the soup body?


This is my Python code, which runs BeautifulSoup on the content of an API call response:

soup = BeautifulSoup(resp.content, 'lxml')

If I print the soup, it looks like this:

<html> 
<body>
.... 
<script src="/site_media/js/jquery/jquery.js" type="text/javascript"></script>
<script nonce="" type="text/javascript">
  var username_field = document.getElementById("id_username");
  if(username_field.value){
    document.getElementById("id_password").focus();
  } else {
    username_field.focus();
  }
  $(".toggle-password").click(function() {
    $(this).toggleClass("fa-eye fa-eye-slash");
    var input = $($(this).attr("toggle"));
    if (input.attr("type") == "password") {
      input.attr("type", "text");
    } else {
      input.attr("type", "password");
    }
  });
  var iam_login_link = document.getElementById("iam_login_link");
  var iam_login_enabled = "False";
  if (iam_login_enabled === 'True') {
    iam_login_link.style.display = ''
  } else {
    iam_login_link.style.display = 'none'
  }
  $('#iamLogin').on('click', function() {
    window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
  });
</script>
</body>
</html>

My question is: how do I extract the window.location.href link from the soup?

python-3.x web-scraping beautifulsoup
1 Answer

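I don't think BeautifulSoup is needed in this particular case. The value has to be pulled out of the JavaScript rather than out of an HTML element, so a regular expression can be used: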

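import re

# Capture the quoted URL assigned to window.location.href
pattern = r'window\.location\.href\s*=\s*["\']([^"\']+)["\']'
# Search resp.text (str), assuming resp is a requests response;
# resp.content is bytes and raises TypeError with a str pattern
match = re.search(pattern, resp.text)

if match:
    print(match.group(1))
else:
    print('not found')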
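
The regular expression above can also be wrapped in a small helper that accepts either bytes or text, which avoids having to remember which response attribute to pass. This is only a minimal sketch: the function name `extract_location_href`, the `encoding` parameter, and the `sample` string (a trimmed-down stand-in for the page in the question) are assumptions made for illustration, not part of the original answer.

```python
import re
from typing import Optional, Union

# Same pattern as in the answer: capture the quoted value assigned
# to window.location.href.
HREF_PATTERN = re.compile(r'window\.location\.href\s*=\s*["\']([^"\']+)["\']')


def extract_location_href(html: Union[str, bytes],
                          encoding: str = "utf-8") -> Optional[str]:
    """Return the first URL assigned to window.location.href, or None.

    Hypothetical helper; `encoding` is an assumed default for bytes input.
    """
    if isinstance(html, bytes):
        # Decode first: a str pattern cannot be used to search bytes.
        html = html.decode(encoding, errors="replace")
    match = HREF_PATTERN.search(html)
    return match.group(1) if match else None


# Trimmed-down stand-in for the page shown in the question.
sample = b"""
<script nonce="" type="text/javascript">
  $('#iamLogin').on('click', function() {
    window.location.href = "/saml-idp/applebananapeach/iam_login/?SAMLRequest=BlaBlaBla";
  });
</script>
"""

print(extract_location_href(sample))
```

With a real response, either `resp.text` or `resp.content` could be passed in place of `sample`. Note that the pattern returns only the first match, so a page with several `window.location.href` assignments would need `HREF_PATTERN.findall` instead.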