Why does print() output the text from all of these tags, while return does not?
This is the function I'm using:
from bs4 import BeautifulSoup

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]):
        ls.append(para.text.strip())
        return ls
Text = '<!DOCTYPE html><html><head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>FlexPortalen - Log ind</title> <link rel="stylesheet" href="/Content/bootstrap.css" /> <link rel="stylesheet" href="/Content/bootstrap-theme.min.css" /> <link rel="stylesheet" href="/login.css" /> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]--></head><body> <div class="container"> <div class="login-box"> <form method="post"> <input name="__RequestVerificationToken" type="hidden" value="w4YgqRKtcaPFQn6ncaavNgPVb5rLp0CtbylMJ3zYYa2fTGoAfkJ97araAO5i4Nbwo0wERIboCQssguo0UviOaM3HvECpjfuokKcq4rt_ADM1" /> <h2 class="text-center login-heading">FlexPortalen</h2> <div class="form-group"> <input type="text" class="form-control input-lg" name="username" id="username" placeholder="Brugernavn... " /> </div> <div class="form-group"> <input type="password" class="form-control input-lg" name="password" id="password" placeholder="Adgangskode..." /> </div> <div class="checkbox text-center"> <label> <input type="checkbox" name="rememberMe" id="rememberMe" /> Husk mig? </label> </div> <p class="text-center"> <button type="submit" class="btn btn-primary btn-lg">Log ind</button> </p> </form> </div> </div> <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script> <script src="/Scripts/bootstrap.min.js"></script></body></html>'
If I print instead, I get:
FlexPortalen - Log ind
FlexPortalen
Husk mig?
Log ind
But when I return, I only get:
['FlexPortalen - Log ind']
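Check the indentation of return. A print() call does not end the function, so the loop keeps running, but return exits the function immediately. To return the list with all of the items, place return outside the for loop; otherwise the function returns ls during the first iteration: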
from bs4 import BeautifulSoup

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]):
        ls.append(para.text.strip())
    # Dedented to the function level, so it runs once after the loop finishes
    return ls
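To see the effect in isolation, here is a minimal sketch that leaves BeautifulSoup out entirely. The function names collect_first_only and collect_all, and the samples list, are hypothetical and made up for illustration; they only stand in for the tag texts from the question.

```python
def collect_first_only(items):
    """Buggy shape: the return sits inside the loop body."""
    result = []
    for item in items:
        result.append(item.strip())
        return result  # executes during the first iteration and ends the function


def collect_all(items):
    """Fixed shape: the return is dedented to the function level."""
    result = []
    for item in items:
        result.append(item.strip())
    return result  # executes once, after every item has been visited


# Hypothetical sample data standing in for the extracted tag texts
samples = [" FlexPortalen - Log ind ", "Husk mig? ", " Log ind"]

print(collect_first_only(samples))  # ['FlexPortalen - Log ind']
print(collect_all(samples))         # ['FlexPortalen - Log ind', 'Husk mig?', 'Log ind']
```

Note that the buggy shape also returns None for an empty input, because the loop body never runs and the function falls off the end without reaching a return.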