Display all paragraphs of a scraped HTML div in Django


So I have some HTML scraped from a news website. Here is the HTML:

<div class="cn-content">

<figure><img src="https://cimg.co/w/articles-attachments/1/5ca/71a090479e.jpg" sizes="(min-width: 640px) 720px, 100vw" srcset="https://cimg.co/w/articles-attachments/1/5ca/71a090479e.jpg 300w, https://cimg.co/w/articles-attachments/2/5ca/71a090479e.jpg 600w, https://cimg.co/w/articles-attachments/3/5ca/71a090479e.jpg 720w, https://cimg.co/w/articles-attachments/4/5ca/71a090479e.jpg 900w, https://cimg.co/w/articles-attachments/0/5ca/71a090479e.jpg 1337w" alt="OKEx Announced its First Token Sale via IEO 101" class="content-img"><figcaption>Source: iStock/baona</figcaption></figure>
<p>Major cryptocurrency exchange <b>OKEx</b> has announced an initial exchange offering (IEO) for the <b>BLOC</b> token, on their newly-presented OK Jumpstart token sale platform. The sale marks the first such endeavor of the exchange, joining the likes of <a href="https://cryptonews.com/ext/binance/" target="_blank" rel="nofollow noopener">Binance </a>and <a href="https://cryptonews.com/ext/bittrex/" target="_blank" rel="nofollow noopener">Bittrex </a>in the so-called killer app club.</p>
<p>The token in question is BLOC, native to the <b>Blockcloud</b> blockchain, and the sale is set to start at AM 12:00 UTC on April 10th. “Combining the advantages of blockchain and Future Internet technology, it reconstructs the technology layers below where current blockchain networks and Internet applications operate,” explains the project’s website. In short, it is a blockchain-based TCP/IP architecture, where TCP/IP is a suite of communication protocols used to interconnect network devices on the internet. </p>
<p>The token sale uses a subscription + allotment approach. Users will have a timeframe of 30 minutes to subscribe, and allotment will be based on the amount of the exchange’s native <a href="https://cryptonews.com/coins/okb/">OKB tokens</a> they hold over a seven-day period. The minimum threshold for a subscription is 500 OKB tokens (USD 1,145) held for those seven consecutive days, or buying in 3,500 OKB tokens on the last day - but to have their subscription guaranteed, users need to hold at least 2,500 OKB tokens daily or buy 17,500 OKB tokens on the final day before snapshot time.</p>
<p>The snapshots, which will be used to prove the users’ eligibility for participation, will be taken every day at AM 10:00 UTC, starting seven days before the token sale day. Then, users get their individual allotment coefficients based on the sum of OKB holdings in the moment of those snapshots. Users will have their individual subscription amounts in OKB locked up, and receive tokens based on a formula available on the OKEx blog. This formula bases the token allotment on both how many tokens users held during this period, as well as the amount of OKB they locked in as their subscription. </p>
<p>This move lets OKEx join the club of exchanges offering fundraising services. The latest example was Bittrex, where the token sale of <b>VeriBlock</b> tokens took a <a href="https://cryptonews.com/news/bittrex-beats-binance-in-selling-out-tokens-at-lightning-spe-3633.htm">mere 10 seconds</a>, beating even Binance’s speed of 22 seconds for the <b><a href="https://cryptonews.com/coins/fetch-ai/">Fetch.AI</a></b> token. Binance’s co-founder and CEO Changpeng Zhao coined the term “killer app” back in February, when he said in an interview that he views exchange-based fundraising as the next killer app.</p>
        </div>

In my model I defined a property that cleans this HTML so that only the paragraph text is displayed, like this:

@property
def description_clean(self):
    soup = BeautifulSoup(self.description)
    description = soup.find_all('div',attrs={"class":"cn-content"})
    for item in description:
        return item.find('p').text

However, when I use it in the template as {{ post.description_clean }}, only the first paragraph is rendered.

The output is:

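Major cryptocurrency exchange OKEx has announced an initial exchange offering (IEO) for the BLOC token, on their newly-presented OK Jumpstart token sale platform. The sale marks the first such endeavor of the exchange, joining the likes of Binance and Bittrex in the so-called killer app club.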

Why are the other paragraphs not rendered, even though I loop over them?
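Here is a minimal reproduction of the behaviour. It assumes BeautifulSoup 4 is installed (package beautifulsoup4) and uses a shortened, hypothetical HTML string in place of the real article. find() returns only the first matching tag, while find_all() returns all of them. In the property above, the return inside the loop also ends the function during the first iteration. So even several matching divs would only ever produce one paragraph.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the scraped article HTML.
html = (
    '<div class="cn-content">'
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "<p>Third paragraph.</p>"
    "</div>"
)

# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "cn-content"})

# find() stops at the first matching tag, so only one paragraph comes back.
first_only = div.find("p").text

# find_all() returns every matching tag, so all paragraphs come back.
all_paragraphs = [p.text for p in div.find_all("p")]

print(first_only)      # First paragraph.
print(all_paragraphs)  # ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```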

python django beautifulsoup django-templates
3 answers
1 vote

You need to find the div once and then loop over all of the p tags inside it:

main_div = soup.find('div', attrs={"class": "cn-content"})
paragraphs = main_div.find_all('p')
texts = []
for p in paragraphs:
    texts.append(p.text)  # save the text of each p tag
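The same idea can also work outside the model. The sketch below is self-contained and takes the raw HTML string. extract_paragraphs is a hypothetical helper name, and beautifulsoup4 is assumed to be installed. The helper guards against a page without a cn-content div: in that case soup.find() returns None, and calling find_all() on it would raise AttributeError. The helper strips each paragraph as a whole with .strip(). It does not use get_text(strip=True), which strips every text fragment separately and would glue "exchange" and "OKEx" together around the <b> tag.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(raw_html):
    """Return the text of every <p> inside the div with class "cn-content".

    Hypothetical helper for illustration; returns an empty list when the
    div is missing instead of raising AttributeError on None.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    main_div = soup.find("div", attrs={"class": "cn-content"})
    if main_div is None:
        return []
    # .strip() on the whole paragraph trims only the outer whitespace, so the
    # spaces around inline tags such as <b> and <a> are kept intact.
    return [p.get_text().strip() for p in main_div.find_all("p")]


sample = (
    '<div class="cn-content">'
    "<p>Major exchange <b>OKEx</b> has announced an IEO. </p>"
    "<p>The token in question is BLOC.</p>"
    "</div>"
)
print(extract_paragraphs(sample))
print(extract_paragraphs("<p>No wrapper div here</p>"))  # []
```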

1 vote

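Once you have the div tag, you are not looping over all of its p tags: item.find('p') returns only the first p, and the return inside the loop exits the function on the first iteration. Update your code to: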

@property
def description_clean(self):
    soup = BeautifulSoup(self.description, 'html.parser')
    description = soup.find_all('div', attrs={"class": "cn-content"})
    p_tags = []  # result list: one string per matching "div"
    for item in description:
        individual_p_tags = []  # paragraph texts of the current "div"
        for p in item.find_all('p'):  # loop over all the "p" tags in this "div"
            individual_p_tags.append(p.text)  # append to a temp list
        p_tags.append("\n".join(individual_p_tags))  # join into one string and append to the result list
    return p_tags  # this is a list of strings
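Returning a list has one caveat. {{ post.description_clean }} then renders the list's Python representation, brackets and quotes included. The "\n" separators are also invisible in HTML unless the template applies a filter such as linebreaks. One option is to loop over the list in the template, e.g. {% for text in post.description_clean %}<p>{{ text }}</p>{% endfor %}. The other is to build the markup in Python.

The sketch below takes the second route using only the standard library. paragraphs_to_html is a hypothetical helper name. Each text is escaped before it is wrapped in <p>. That way the result can be marked safe (Django's mark_safe or the |safe filter) without letting scraped markup through.

```python
import html


def paragraphs_to_html(paragraphs):
    """Escape each paragraph's text and wrap it in its own <p> element.

    Hypothetical helper: the result is a plain str; in Django you would still
    need mark_safe() (or the |safe filter) to stop the template re-escaping it.
    """
    return "\n".join(f"<p>{html.escape(text)}</p>" for text in paragraphs)


print(paragraphs_to_html(["OKEx & Binance", "Use <b> tags"]))
# <p>OKEx &amp; Binance</p>
# <p>Use &lt;b&gt; tags</p>
```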

0 votes

You can return a list of paragraphs by selecting every p inside the div:

description = [item.text for item in soup.select('div.cn-content p')]

and then

return description
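The CSS selector decides what ends up in the list. 'div.cn-content' matches the div itself, so the comprehension yields a single string holding the whole div's text, figcaption included. The descendant selector 'div.cn-content p' yields one string per paragraph. Here is a quick check, assuming beautifulsoup4 is installed and using a shortened, hypothetical version of the article HTML:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the article HTML.
html = (
    '<div class="cn-content">'
    "<figure><figcaption>Source: iStock/baona</figcaption></figure>"
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Matches the div itself: one item containing all of the div's text.
whole_div = [item.text for item in soup.select("div.cn-content")]

# Descendant selector: one item per <p> inside the div.
paragraphs = [item.text for item in soup.select("div.cn-content p")]

print(whole_div)   # ['Source: iStock/baonaFirst paragraph.Second paragraph.']
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```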