如何根据某些特性合并Python列表中的某些元素

问题描述 投票:0回答:1

这是一个列表,每个元素由两个字符串和中间的“/t”组成。我们可以将左侧的字符串称为“标签”,右侧的部分称为“文本”。

continued   In the film, "Girl Interrupted," Winona Ryder plays an 18-year-old
continued   who enters a mental institution for
continued   what
anecdote    is diagnosed as borderline personality disorder
anecdote    The year is 1967
anecdote    the country is in turmoil over Vietnam and civil rights
continued   While
continued   lying on her bed one night
continued   and
continued   watching TV
continued   ,
anecdote    she sees a news report about a demonstration
continued   The narrator says something
continued   that might apply to today's turmoil
continued   :
continued   "We live in a time of doubt
continued   .
continued   The institutions
continued   we once trusted no longer
anecdote    seem reliable."
continued   As 2014 ends    Modd-NU
statistics  the stock market is at record highs
assumption  our traditional institutions and self-confidence are in decline
continued   A Pew Research Center study confirms one trend
testimony   that has been obvious over several years
assumption  The "typical" American family is no longer typical
statistics  Just 46 percent of American children now live in homes with their married, heterosexual parents
statistics  Five percent have no parents at home
continued   They most likely are living with grandparents
continued   ,
testimony   says the study
assumption  These startling figures about the decline of the American family contrast with the year 1960
continued   when    Modd-NU
statistics  73 percent of American children lived in traditional families
assumption  A major contributor to this trend has been the assault on marriage and other institutions by the Baby Boom generation

我的问题是我想将标记为“继续”的元素与其后面的第一个未标记为“继续”的元素合并。例如,前三个元素标记为“继续”,所以我想将它们合并与第四个元素,并为第四个元素使用标签“轶事”。我是初学者,对迭代运算不太熟悉,非常感谢!

Chatgpt 给出了一段代码:

result = [] 
iterator = iter(lines)
current_item = next(iterator, None)
while current_item:
    label, text = current_item.split('\t', 1) 
    if label == 'continued':
        text_list = [text]
        while label == 'continued':
            next_item = next(iterator, None)
            if next_item:
                label, next_text = next_item.split('\t', 1)
                text_list.append(next_text)
            else:
                break
        label = label
        text = ''.join(text_list)
    else:
        next_item = next(iterator, None)
    result.append(f'{label}\t{text}')
    current_item = next_item
for item in result:
    print(item)

但是这段代码最终会生成一些重复的元素。如果您可以微调这段代码,我也将不胜感激!

python nlp text-processing
1个回答
0
投票

尝试在迭代器中“预读”并不是一个好计划。相反,一次只执行一行,然后收集“连续”行,直到得到一个不“连续”的行。

result = [] 
build = []
for line in open('x.txt'):
    line = line.strip()
    if not line:
        continue
    label, text = line.split('\t', 1) 
    if label == 'continued':
        build.append( text )
    else:
        if build:
            build = " ".join(build)
            result.append(f"continued\t{build}")
            build = []
        result.append(f'{label}\t{text}')
if build:
    build = " ".join(build)
    result.append(f"continued\t{build}")
for item in result:
    print(item)

输出:

continued   In the film, "Girl Interrupted," Winona Ryder plays an 18-year-old who enters a mental institution for what
anecdote    is diagnosed as borderline personality disorder
anecdote    The year is 1967
anecdote    the country is in turmoil over Vietnam and civil rights
continued   While lying on her bed one night and watching TV ,
anecdote    she sees a news report about a demonstration
continued   The narrator says something that might apply to today's turmoil : "We live in a time of doubt . The institutions we once trusted no longer
anecdote    seem reliable."
continued   As 2014 ends    Modd-NU
statistics  the stock market is at record highs
assumption  our traditional institutions and self-confidence are in decline
continued   A Pew Research Center study confirms one trend
testimony   that has been obvious over several years
assumption  The "typical" American family is no longer typical
statistics  Just 46 percent of American children now live in homes with their married, heterosexual parents
statistics  Five percent have no parents at home
continued   They most likely are living with grandparents ,
testimony   says the study
assumption  These startling figures about the decline of the American family contrast with the year 1960
continued   when    Modd-NU
statistics  73 percent of American children lived in traditional families
assumption  A major contributor to this trend has been the assault on marriage and other institutions by the Baby Boom generation
© www.soinside.com 2019 - 2024. All rights reserved.