Shortening product titles to a specific length using a Python NLP library

Question   Votes: 0   Answers: 1

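I have a list of products and need a product name shorter than 40 characters for each one. My input is a string column of product names, each longer than 40 characters, so I need to shorten them. I could use plain string methods, but then some product names might become meaningless. For example, an input name could be "Cut Resistant Gloves,Size 8,Grey/Black - 12 per DZ" (52). How could I get something like "Resistant Size 8 Grey/Black Gloves" (34)? Thanks in advance.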

I want to add a new column to my DataFrame containing the new product names, each shorter than 40 characters.

python string nlp nltk e-commerce
1 Answer
0
votes

You can adapt the logic implemented below to suit your needs:

import pandas as pd
import spacy

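# Example input taken from the question; replace with your own product title
product_name = "Cut Resistant Gloves,Size 8,Grey/Black - 12 per DZ"

# Example limits on how many adjectives and nouns to keep; tune these for your data
Max_Adj_count = 2
Max_noun_count = 2

nlp = spacy.load("en_core_web_sm")
doc = nlp(product_name)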

shortened_tokens = []
noun_tokens = []
adjective_tokens = []
size_tokens = []

# Iterate over tokens and identify nouns, adjectives, and size/volume information
for token in doc:
    if token.pos_ == "NOUN":
        noun_tokens.append(token.text)
    elif token.pos_ == "ADJ":
        adjective_tokens.append(token.text)
    elif token.pos_ == "NUM" and token.head.text.lower() in ["size", "vol", "volume"]:
        size_tokens.append(token.text)
    elif token.lower_ in ["size", "vol", "volume"]:
        size_tokens.append(token.text)

# Determine the number of adjectives and nouns to include
num_adjectives = min(len(adjective_tokens), Max_Adj_count)  # Max_Adj_count (set beforehand): max number of adjectives allowed
num_nouns = min(len(noun_tokens), Max_noun_count)           # Max_noun_count (set beforehand): max number of nouns allowed

# Construct the shortened name using specific rules
size_info = " ".join(size_tokens[:1])  
shortened_tokens.extend(adjective_tokens[:num_adjectives])
shortened_tokens.extend(size_info.split())  
shortened_tokens.extend(noun_tokens[:num_nouns])


shortened_name = " ".join(shortened_tokens)

# If the shortened name is 40 characters or longer, drop trailing words until it fits
while len(shortened_name) >= 40 and " " in shortened_name:
    shortened_name = shortened_name.rsplit(" ", 1)[0]
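
The last step above, cutting at a word boundary, can also be written as a small standalone helper. This is only a sketch using the standard library: the function name `truncate_at_word_boundary` and the default `max_len=39` are assumptions (39 comes from the question's "shorter than 40 characters" requirement), and it does no part-of-speech filtering by itself.

```python
def truncate_at_word_boundary(text: str, max_len: int = 39) -> str:
    """Return text cut to at most max_len characters without splitting a word.

    Hypothetical helper; max_len=39 is an assumption based on the
    "shorter than 40 characters" requirement in the question.
    """
    text = " ".join(text.split())  # collapse repeated whitespace
    if len(text) <= max_len:
        return text
    kept = []
    length = 0
    for word in text.split(" "):
        # +1 accounts for the joining space before every word except the first
        extra = len(word) + (1 if kept else 0)
        if length + extra > max_len:
            break
        kept.append(word)
        length += extra
    # Fall back to a hard cut when even the first word is too long
    return " ".join(kept) if kept else text[:max_len]


print(truncate_at_word_boundary("Cut Resistant Gloves,Size 8,Grey/Black - 12 per DZ"))
```

If the spaCy logic is wrapped in a function of your own, the helper could be applied to its result, and the whole thing mapped over the product-name column with the pandas `Series.apply` method to build the new column.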