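I want to extract all the links from the page at the URL given in the code below, especially links of the form
https://api.smartrecruiters.com/v1/companies/cermaticom/postings/
Every regular expression I have found online only captures the simple part of the link, such as https://api.smartrecruiters.com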
import requests
import re
from bs4 import BeautifulSoup

url = 'https://www.cermati.com/karir/'
# Download the page and keep the raw HTML text
x = requests.get(url)
html_doc = x.text
# Parse the HTML and print the parsed document
soup = BeautifulSoup(html_doc, "html.parser")
print(soup)
Use re.findall to get the URLs directly from the response text:
# Lazily match everything after the host, stopping just before the next double quote
p = r'https://api\.smartrecruiters\.com/.*?(?=")'
urls = re.findall(p, html_doc)
print(urls)
Output:
['https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999896690090',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999896676342',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999896672229',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999894177703',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999874413809',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999898147783',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999898110826',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999897538273',
'https://api.smartrecruiters.com/v1/companies/cermaticom/postings/743999897207847',
...
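
To try the same idea without hitting the network, here is a minimal sketch that runs the pattern against a small inline HTML string. The SAMPLE_HTML content, the posting IDs 111 and 222, and the helper name extract_posting_urls are all made up for illustration; the real page may embed the links differently. It also swaps the lazy match with lookahead for a negated character class, which stops at the first quote, whitespace, or angle bracket, and it removes duplicates while keeping the original order.

```python
import re

# Hypothetical stand-in for the downloaded page source (not the real page).
SAMPLE_HTML = '''
<script>
window.jobs = [
  {"ref": "https://api.smartrecruiters.com/v1/companies/cermaticom/postings/111"},
  {"ref": "https://api.smartrecruiters.com/v1/companies/cermaticom/postings/222"},
  {"ref": "https://api.smartrecruiters.com/v1/companies/cermaticom/postings/111"}
];
</script>
<a href="https://www.cermati.com/karir/">Careers</a>
'''

# Match the host, then every character up to the first double quote,
# whitespace character, or angle bracket.
POSTING_URL = re.compile(r'https://api\.smartrecruiters\.com/[^"\s<>]+')


def extract_posting_urls(html):
    """Return matching URLs in order of first appearance, without duplicates."""
    # dict.fromkeys keeps insertion order, so it deduplicates without sorting.
    return list(dict.fromkeys(POSTING_URL.findall(html)))


urls = extract_posting_urls(SAMPLE_HTML)
print(urls)
```

With the real page, passing html_doc from the question to extract_posting_urls should work the same way, assuming the links appear in the source as double-quoted strings.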