Convert records with comma-separated values into sets

Question. Votes: -1. Answers: 3

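I have a string that contains multiple records. Each record holds a different list of words, separated by commas. I want to convert each record into a set whose values are the words in that record. How can I do this?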

For example, here are two records from the file.

citrus fruit,semi-finished bread,margarine,ready soups
tropical fruit,yogurt,coffee

I want to convert them to:

{'citrus fruit','semi-finished bread','margarine','ready soups'}
{'tropical fruit','yogurt','coffee'}
python python-3.x
3 Answers
0
votes

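I'm not convinced you want a set, because in your example the desired output lists the items in the same order as the input. Sets are unordered and cannot contain duplicates. It is also unclear how your records are delimited.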

Here is an example that uses a list to preserve order:

>>> first_record = "citrus fruit,semi-finished bread,margarine,ready soups"
>>> second_record = "tropical fruit,yogurt,coffee"
>>> def tokenize(s, delim=","):
...   return s.split(delim)
... 
>>> first_result = tokenize(first_record)
>>> first_result
['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups']
>>> second_result = tokenize(second_record)
>>> second_result
['tropical fruit', 'yogurt', 'coffee']

If you really do want a set, just wrap the result in a call to the set constructor:

>>> first_result_set = set(first_result)
>>> second_result_set = set(second_result)
>>> first_result_set
{'margarine', 'ready soups', 'semi-finished bread', 'citrus fruit'}
>>> second_result_set
{'coffee', 'yogurt', 'tropical fruit'}
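
If the order matters but duplicates should still be dropped, one possible middle ground is `dict.fromkeys`, which keeps first-seen order on Python 3.7+. This is only a sketch and not part of the answer above; the helper name `unique_in_order` and the sample record are made up for illustration.

```python
def unique_in_order(record, delim=","):
    """Split a record and drop duplicates, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(record.split(delim)))

# hypothetical record with a repeated item
print(unique_in_order("yogurt,coffee,yogurt,tropical fruit"))
# ['yogurt', 'coffee', 'tropical fruit']
```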

Edit: you can do it all in one go, now that we know the records are separated by newlines:

>>> records = "citrus fruit,semi-finished bread,margarine,ready soups\ntropical fruit,yogurt,coffee"
>>> def setitize_records(records, record_delim="\n", item_delim=","):
...   record_list = records.split(record_delim)
...   record_sets = [set(record.split(item_delim)) for record in record_list]
...   return record_sets
... 
>>> result = setitize_records(records)
>>> result
[{'margarine', 'ready soups', 'semi-finished bread', 'citrus fruit'}, {'coffee', 'yogurt', 'tropical fruit'}]
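
Real input often has a trailing newline, blank lines, or spaces after the commas, and a plain `split("\n")` would turn those into empty or padded items. A slightly more defensive sketch is shown here, assuming whitespace around items is not significant; the name `records_to_sets` is hypothetical.

```python
def records_to_sets(text, item_delim=","):
    """Return one set per non-blank line of text."""
    result = []
    for line in text.splitlines():  # handles both \n and \r\n
        line = line.strip()
        if not line:  # skip blank lines, e.g. after a trailing newline
            continue
        # strip each item so "a, b" and "a,b" give the same set
        result.append({item.strip() for item in line.split(item_delim)})
    return result

sets = records_to_sets("citrus fruit, margarine\n\ntropical fruit,yogurt,coffee\n")
print(len(sets))  # 2
```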

0
votes

Use the csv module.

import csv

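def readsets(filename):
    # newline="" is what the csv docs recommend when opening files for csv.reader
    with open(filename, newline="") as f:
        for row in csv.reader(f):
            # each row is a list of fields; yield it as a set
            yield set(row)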
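
One reason to reach for `csv` rather than `str.split` is that it understands quoted fields, so an item that itself contains a comma stays in one piece. The sketch below uses `io.StringIO` in place of a real file; the sample data, including the quoted item, is invented for illustration.

```python
import csv
import io

# io.StringIO stands in for an open file; the sample data is made up
sample = io.StringIO(
    'citrus fruit,"bread, semi-finished",margarine\n'
    'tropical fruit,yogurt,coffee\n'
)

# csv.reader yields one list of fields per line
rows = [set(row) for row in csv.reader(sample)]

print(len(rows))                          # 2
print("bread, semi-finished" in rows[0])  # True
```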

0
votes

This should work for you. Take a look.

records = "citrus fruit,semi-finished bread,margarine,ready soups\ntropical fruit,yogurt,coffee"

for rec in records.splitlines():
    myset = set()  # result for this record
    while rec != "":
        # peel off everything before the next comma
        head, _, rec = rec.partition(',')
        myset.add(head)
    print(myset)