Shortening product titles to a specific length using Python NLP libraries

Question

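I have a series of products and need a product name of fewer than 40 characters for each one. My input is a string column of product names, and each entry is longer than 40 characters, so I need to shorten it. I could use plain string methods, but then some product names would become meaningless. For example, an input name might be "Cut Resistant Gloves, Size 8, Grey/Black - 12 per DZ" (52 characters). How can I get something like "Resistant Size 8 Grey/Black Gloves" (34 characters)? Thanks in advance.

I want to add a new column to my DataFrame containing the new product names of fewer than 40 characters.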

Answer 1

You can adapt the logic implemented below to your needs:

import pandas as pd
import spacy

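# Requires the model to be installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Example input taken from the question; replace with your own product name
product_name = "Cut Resistant Gloves, Size 8, Grey/Black - 12 per DZ"
doc = nlp(product_name)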

shortened_tokens = []
noun_tokens = []
adjective_tokens = []
size_tokens = []

# Iterate over tokens and identify nouns, adjectives, and size/volume information
for token in doc:
    # Check size keywords first; otherwise a keyword tagged as NOUN never reaches this branch
    if token.lower_ in ["size", "vol", "volume"]:
        size_tokens.append(token.text)
    elif token.pos_ == "NUM" and token.head.text.lower() in ["size", "vol", "volume"]:
        size_tokens.append(token.text)
    elif token.pos_ == "NOUN":
        noun_tokens.append(token.text)
    elif token.pos_ == "ADJ":
        adjective_tokens.append(token.text)

# Determine the number of adjectives and nouns to include
# The two limits below are example values; tune them for your data
Max_Adj_count = 2   # max number of adjectives permissible
Max_noun_count = 2  # max number of nouns permissible
num_adjectives = min(len(adjective_tokens), Max_Adj_count)
num_nouns = min(len(noun_tokens), Max_noun_count)

# Construct the shortened name: adjectives, then size information, then nouns
shortened_tokens.extend(adjective_tokens[:num_adjectives])
shortened_tokens.extend(size_tokens[:2])  # at most two size tokens (keyword and value, e.g. "Size 8")
shortened_tokens.extend(noun_tokens[:num_nouns])


shortened_name = " ".join(shortened_tokens)

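# If the shortened name is longer than 40 characters, drop trailing words until it fits
# (keeping a fixed number of words does not guarantee the character limit)
words = shortened_name.split()
while len(words) > 1 and len(" ".join(words)) > 40:
    words.pop()
shortened_name = " ".join(words)[:40]  # hard cut only if a single word is still too long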
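The code above handles a single string, while the question asks for a whole column. One possible way to bridge that gap is sketched below. It is only a sketch under assumptions: the function names `truncate_at_word_boundary` and `shorten_names` are hypothetical helpers, not part of spaCy or pandas, and the default shortener only trims trailing words, so in practice you would pass in a function that wraps the spaCy logic instead.

```python
def truncate_at_word_boundary(name: str, max_len: int = 40) -> str:
    """Drop trailing words until the name is at most max_len characters long."""
    words = name.split()
    while len(words) > 1 and len(" ".join(words)) > max_len:
        words.pop()
    # Hard cut as a last resort (one very long word), then tidy trailing separators
    return " ".join(words)[:max_len].rstrip(" ,-/")


def shorten_names(names, shorten=truncate_at_word_boundary):
    """Apply a shortening function to every product name in an iterable."""
    return [shorten(name) for name in names]


# Example data: the product name from the question plus one that is already short
names = [
    "Cut Resistant Gloves, Size 8, Grey/Black - 12 per DZ",
    "Safety Glasses",
]
short_names = shorten_names(names)
print(short_names)
```

With a DataFrame, the same function could be applied to the column directly, for example `df["short_name"] = df["product_name"].map(truncate_at_word_boundary)`, where both column names are placeholders. Since the question asks for strictly fewer than 40 characters, pass `max_len=39` if a result of exactly 40 characters is not acceptable.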
