Python program to extract the most frequently occurring names from a .csv file

Question (votes: 0, answers: 1)

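I wrote a program that generates 5000 random names, SSNs, cities, addresses, and emails and stores them in a fakeprofile.csv file. I am trying to extract the most common names from that file. The program runs without errors, but it does not extract the common names. Here is the code: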

import re
from collections import Counter

with open('fakeprofile.csv', 'r') as f:
    text = f.read()

# Matches every run of word characters (names, numbers, cities, email parts), not only names.
frequent_names = re.compile(r"[\w']+", re.U).findall(text)   # re.U == re.UNICODE
counts = Counter(frequent_names)
print(counts)

Sample of the file:

Alicia Walters 419-52-4141 Yorkstad 66616 Schultz Extensions Suite 225
Reynoldsmouth, VA 72465 [email protected]
Nicole Duffy 212-38-9009 West Timothy 51077 Phillips Ports Apt. 314
Hubbardville, IN 06723 [email protected]
Stephanie Lewis 442-20-1279 Jacquelineshire 650 Gutierrez Forge Apt. 839
West Christianbury, TN 13654 [email protected]
Michael Harris 108-81-3733 East Toddberg 14387 Douglas Mission Suite 038
Garciaview, WI 58624 [email protected]
Aaron Moreno 171-30-7715 Port Taraburgh 56672 Wagner Path
Lake Christopher, VA 37884 [email protected]
Alicia Zimmerman 286-88-9507 Barberstad 5365 Heath Extensions Apt. 731
South Randyburgh, NJ 79367 [email protected]
Brittney Mcmillan 334-44-0321 Lisahaven PSC 3856, Box 2428
APO AE 03215 [email protected]
Amanda Perkins 327-31-6610 Perryville 8750 Hurst Harbor Apt. 929

Sample output:

'25135': 1, 'Hahn': 1, 'Ayersland': 1, '97974': 1, 'erinbird': 1, 'orozco': 1, '2297': 1, 'Villarrealmouth': 1, 'Andrewmouth': 1, '21167': 1, 'ghenderson': 1, '1764': 1, 'Kentmouth': 1, '41928': 1, 'brentnolan': 1, '0705': 1, 'Katieburgh': 1, '903': 1, 'Ortizmouth': 1, '61612': 1, 'paul75': 1, '0612': 1, '9392': 1, 'Kristineville': 1, '17953': 1, 'nicole71': 1, '6300': 1, '73959': 1, 'cochranlaura': 1, '70507': 1, 'esampson': 1, '6749': 1, 'Heatherfort': 1, '90358': 1, 'Joshuaport': 1, '57824': 1, 'sergiochung': 1, '5634': 1, 'Clarkside': 1, 'Gilmoreside': 1, '25385': 1, 'melissa74': 1, '7231': 1, 'Hobbsstad': 1, '45351': 1, 'zamorarandy': 1, 'Haney': 1, '8504': 1, '46954': 1, 'apham': 1, '2943': 1, '10622': 1, 'awheeler': 1, '5374': 1, 'Staffordton': 1, '2164': 1, 'Jamestown': 1, '01525': 1, '5919': 1, 'Clayville': 1, '24981': 1, 'jordanhernandez': 1, 'Mullen': 1, '4018': 1, '9030': 1, '66590': 1, 'lvaughn': 1, 'Pugh': 1, 'Hernandezhaven': 1, '56157': 1, 'ehoward': 1, 'Hurley': 1, '3243': 1, '48238': 1, 'martinezholly': 1, 'murray': 1, '2820': 1, '1679': 1, 'Heidibury': 1, '98893': 1, 'karen57': 1, '7224': 1, '8931': 1, 'Veronicachester': 1, '87637': 1, 'yrichard': 1, '0063': 1, 'Christiehaven': 1, '12461': 1, 'vanessa86': 1, 'Gibbston': 1, '03973': 1, 'martinricardo': 1, '2726': 1, '985': 1, '32306': 1, 'chungkathy': 1, 'ferguson': 1, 'Webbshire': 1, '7008': 1, 'Suzanneland': 1, '63896': 1, '6158': 1, '59351': 1, 'navarrowilliam': 1, '9284': 1, 'Heberttown': 1, '39012': 1, 'tashajones': 1, '2593': 1, '01270': 1, 'sergio56': 1, '6376': 1, '340': 1, 'Jennymouth': 1, '74285': 1, 'natalie82': 1, '5394': 1, '15818': 1, '99194': 1, 'thomaslauren': 1, '52213': 1, '47963': 1, 'qpope': 1, 'Garciastad': 1, '50622': 1, '23687': 1, 'mcdonaldhunter': 1, '5819': 1, 'Westtown': 1,
python-3.x csv extraction names
1 Answer
0 votes

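I think it would be better to use the pandas library for the CSV handling (to pull out the information you need) and then apply a Python collection such as Counter(df['name']) to it. Otherwise, could you give us more information about the CSV file?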
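
The idea of counting a single column, rather than the whole file, can be sketched roughly as follows. This sketch swaps pandas for the standard-library csv module so it has no dependencies. It assumes the file has a header row with a column called name; that column name, the helper most_common_names, and the in-memory sample rows are illustrative assumptions, since the question does not show the real header.

```python
import csv
import io
from collections import Counter


def most_common_names(csv_file, column="name", top=5):
    """Tally the values of one column and return the `top` most frequent."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the column is missing or empty.
    counts = Counter(row[column].strip() for row in reader if row.get(column))
    return counts.most_common(top)


# In-memory stand-in for fakeprofile.csv; the header names and the
# repeated row are made up for illustration.
sample = io.StringIO(
    "name,ssn,city\n"
    "Alicia Walters,419-52-4141,Yorkstad\n"
    "Nicole Duffy,212-38-9009,West Timothy\n"
    "Alicia Walters,286-88-9507,Barberstad\n"
)
print(most_common_names(sample))
```

With the real file, the same function could be called as most_common_names(open('fakeprofile.csv', newline='')). If pandas is available and the column really is called name, pd.read_csv('fakeprofile.csv')['name'].value_counts() should give an equivalent tally.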

Note that the regular expression also picks up the city names. Thanks.
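
To illustrate why the pattern [\w']+ counts cities, numbers, and email fragments along with names, here is a minimal sketch that narrows the match instead. It assumes each record starts with the name and that the name is followed directly by an SSN of the form 419-52-4141, as in the sample shown in the question; NAME_BEFORE_SSN and count_names are hypothetical names, and the email parts of the sample lines are left out.

```python
import re
from collections import Counter

# Assumption: a record begins with the name, followed by an SSN (ddd-dd-dddd).
# Continuation lines of an address contain no SSN, so they never match.
NAME_BEFORE_SSN = re.compile(r'^(.+?)[ \t,"]+\d{3}-\d{2}-\d{4}\b', re.MULTILINE)


def count_names(text):
    """Return (full-name counts, first-name counts) for names preceding an SSN."""
    full_names = [m.strip(' ,"') for m in NAME_BEFORE_SSN.findall(text)]
    first_names = [name.split()[0] for name in full_names if name]
    return Counter(full_names), Counter(first_names)


# Lines taken from the sample in the question, without the email addresses.
text = (
    "Alicia Walters 419-52-4141 Yorkstad 66616 Schultz Extensions Suite 225\n"
    "Reynoldsmouth, VA 72465\n"
    "Nicole Duffy 212-38-9009 West Timothy 51077 Phillips Ports Apt. 314\n"
    "Hubbardville, IN 06723\n"
    "Alicia Zimmerman 286-88-9507 Barberstad 5365 Heath Extensions Apt. 731\n"
)
full, first = count_names(text)
print(first.most_common(1))  # [('Alicia', 2)]
```

Counting first names separately may be more informative here, because with 5000 randomly generated profiles an exact full-name repeat is likely to be rare.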
