I have a file formatted like this:
{'apple': 4, 'orange': 3, 'peach': 1}
{}
{'apple': 1, 'banana': 1}
{'peach': 1}
{}
{}
{'pear': 3}
...
[10k more lines like this]
I want to create a new text file that stores the total count of each of these fruits, like this:
apple:110
banana:200
pineapple:50
...
How do I do this?
My attempt: I tried using Python (skip this if it is confusing):
f = open("fruits.txt","r")
lines = f.readlines()
f.close()
g = open("number_of_fruits.txt","a")
for line in lines: #Iterating through every line,
    for character in "{}'": #Removing extra characters,
        line = line.replace(character, "")
    for i in range(0,line.count(":")): #Using the number of colons as a counter,
        line = line[ [m.start() for m in re.finditer("[a-z]",line)][i] : [m.start() for m in re.finditer("[0-9]",line)][i] + 1 ]
        #Slice the line like this - line[ith time I detect any letter : ith time I detect any number + 1]
        #And then somehow store that number in temp, slicing however needed for every new fruit
        #Open a new file
        #First look if any of the fruits in my line already exist
        #If they do:
            #Convert that sliced number part of string to integer, add temp to it, and write it back to the file
        #else:
            #Make a newline entry with the object name and the sliced number from line.
To begin with, the sheer number of functions in Python is overwhelming. At this point I am even considering using C, which is already a bad idea.
Avoid using eval.
If you can be sure the formatting will always look like the above, I would treat it as JSON.
import json
from collections import Counter

with open('fruits.txt') as f:
    counts = Counter()
    for line in f.readlines():
        counts.update(json.loads(line.replace("'", '"')))
If you want the output in the format defined above:
for fruit, count in counts.items():
    print(f"{fruit}:{count}")
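The question asks for the totals in a new text file rather than on the screen. Here is a minimal sketch of that last step, assuming `counts` is a Counter like the one built above. The helper name `write_counts` and the alphabetical ordering are my own assumptions; the file name number_of_fruits.txt is taken from the question's attempt.

```python
from collections import Counter


def write_counts(counts, path):
    """Write one 'fruit:count' line per entry, sorted by fruit name."""
    # "w" overwrites any earlier result. The question's attempt opened the
    # file with "a", which would append a second set of totals on every run.
    with open(path, "w") as out:
        for fruit, count in sorted(counts.items()):
            out.write(f"{fruit}:{count}\n")
```

With the Counter from above, this would be called as `write_counts(counts, "number_of_fruits.txt")`.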
Following @DarryIG's literal_eval suggestion in the comments, which removes the need for JSON:
from ast import literal_eval
from collections import Counter

with open('fruits.txt') as f:
    counts = Counter()
    for line in f.readlines():
        counts.update(literal_eval(line))
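To illustrate why literal_eval is a better fit here than either eval or the quote-replacement trick, here is a small sketch. The sample strings are made up for the demonstration and are not taken from the real file.

```python
import json
from ast import literal_eval

# A line in the file's format: literal_eval parses it into a plain dict.
print(literal_eval("{'apple': 4, 'orange': 3}"))

# A key containing an apostrophe (made-up example) survives literal_eval,
# but swapping the quote characters turns it into invalid JSON.
tricky = '{"baker\'s apple": 2}'
print(literal_eval(tricky))
try:
    json.loads(tricky.replace("'", '"'))
except json.JSONDecodeError as exc:
    print("json failed:", exc.msg)

# Anything that is not a literal is rejected instead of being executed,
# which is the reason to avoid eval on file contents.
try:
    literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

The trade-off is that literal_eval accepts any Python literal, not just dictionaries, so a stricter script might also check that each parsed line is a dict.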
You can use Python's standard library, for example literal_eval from the ast module, to evaluate each line as a dictionary.
from ast import literal_eval
from collections import Counter

with open("input.txt", 'r') as inputFile:
    counts = Counter()
    for line in inputFile:
        a = literal_eval(line)
        counts.update(Counter(a))
print(dict(counts))
Output:
{'apple': 5, 'orange': 3, 'banana': 1, 'peach': 2, 'pear': 3}
Using defaultdict and json:
import json
from collections import defaultdict

result = defaultdict(int)
with open('fruits.txt') as f:
    for line in f:
        data = json.loads(line.replace("'", '"'))
        for fruit, num in data.items():
            result[fruit] += num
print(result)
Output:
defaultdict(<class 'int'>, {'apple': 5, 'orange': 3, 'peach': 2, 'banana': 1, 'pear': 3})
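EDIT: I recommend using @BenjaminRowell's answer (I upvoted it). I am keeping this one anyway for the sake of brevity.
EDIT2 (May 22, 2020): If the file used double quotes instead of single quotes, it would be in the ndjson / jsonlines format (there is an interesting discussion of the relationship between the two). You can use the ndjson or jsonlines package to process it, for example: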
import ndjson
from collections import Counter

with open('sample.txt') as f:
    # if using double quotes, you can do:
    #data = ndjson.load(f)
    # because it uses single quotes - read the whole file and replace the quotes
    data = f.read()
    data = ndjson.loads(data.replace("'", '"'))

counts = Counter()
for item in data:
    counts.update(item)
print(counts)
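If the file did use double quotes, the same result should also be reachable with only the standard library, since each line would then be a valid JSON document on its own. This is a rough sketch under that assumption; the helper name `count_json_lines` and the sample data are hypothetical.

```python
import json
from collections import Counter


def count_json_lines(lines):
    """Sum the per-line counts from an iterable of JSON object strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines rather than failing on them
            continue
        counts.update(json.loads(line))
    return counts


# Made-up sample in double-quoted form. A file object opened for reading
# can be passed in the same way, since it iterates line by line.
sample = ['{"apple": 4, "orange": 3}', '{}', '{"apple": 1, "banana": 1}', '']
print(count_json_lines(sample))
```

Reading line by line also avoids loading the whole file into memory, which may matter for inputs much larger than the 10k lines mentioned in the question.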