Parsing HTML with BeautifulSoup - print works, but return does not


Why does print() output all the text under these tags, while return does not?

This is the function I am using:

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]): 
        ls.append(para.text.strip())
        return ls
Text = '<!DOCTYPE html><html><head>    <meta charset="utf-8">    <meta http-equiv="X-UA-Compatible" content="IE=edge">    <meta name="viewport" content="width=device-width, initial-scale=1">    <title>FlexPortalen - Log ind</title>    <link rel="stylesheet" href="/Content/bootstrap.css" />    <link rel="stylesheet" href="/Content/bootstrap-theme.min.css" />    <link rel="stylesheet" href="/login.css" />    <!--[if lt IE 9]>      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>    <![endif]--></head><body>    <div class="container">        <div class="login-box">            <form method="post">                <input name="__RequestVerificationToken" type="hidden" value="w4YgqRKtcaPFQn6ncaavNgPVb5rLp0CtbylMJ3zYYa2fTGoAfkJ97araAO5i4Nbwo0wERIboCQssguo0UviOaM3HvECpjfuokKcq4rt_ADM1" />                <h2 class="text-center login-heading">FlexPortalen</h2>                <div class="form-group">                    <input type="text" class="form-control input-lg" name="username" id="username" placeholder="Brugernavn...    " />                </div>                <div class="form-group">                        <input type="password" class="form-control input-lg" name="password" id="password" placeholder="Adgangskode..." />                </div>                <div class="checkbox text-center">                    <label>                        <input type="checkbox" name="rememberMe" id="rememberMe"  /> Husk mig?                    </label>                </div>                                <p class="text-center">                    <button type="submit" class="btn btn-primary btn-lg">Log ind</button>                </p>            </form>        </div>    </div>    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>    <script src="/Scripts/bootstrap.min.js"></script></body></html>'

If I print, it gives:

FlexPortalen - Log ind
FlexPortalen  
Husk mig?                 
Log ind

But when I return, it only gives:

['FlexPortalen - Log ind']
python html web-scraping beautifulsoup html-parsing
1 Answer

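Check the indentation of return. To return the list with all the information, place it outside the for loop; otherwise the function returns ls during the first iteration, when it holds only one item. A print() call placed inside the loop runs on every iteration, which is presumably why printing showed all the text.
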
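from bs4 import BeautifulSoup

def parse_html(data):
    ls = []
    htmlParse = BeautifulSoup(data, 'html.parser')
    # Collect the stripped text of every matching tag
    for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]):
        ls.append(para.text.strip())
    # Return only after the loop has finished, so ls holds every match
    return ls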
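
The effect of the indentation can be reproduced without BeautifulSoup. The following is a minimal sketch, not the asker's code: the function names first_only and all_items and the sample list are made up for illustration, with plain strings standing in for the tag text.

```python
def first_only(items):
    collected = []
    for item in items:
        collected.append(item.strip())
        # Indented inside the loop: the function exits during the first iteration
        return collected


def all_items(items):
    collected = []
    for item in items:
        collected.append(item.strip())
    # Dedented to the loop's level: runs once, after every item was appended
    return collected


# Hypothetical sample data standing in for the text of the parsed tags
sample = [" FlexPortalen - Log ind ", " Husk mig? ", " Log ind "]

print(first_only(sample))  # ['FlexPortalen - Log ind']
print(all_items(sample))   # ['FlexPortalen - Log ind', 'Husk mig?', 'Log ind']
```

One more difference worth noting: with an empty input, the loop body in first_only never runs, so the function falls off the end and returns None, while all_items returns an empty list.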