Can someone tell me what is wrong with reading a CSV file in pandas?

Question (votes: 0, answers: 1)

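I am using a script called parser.py to convert JSON data into CSV files, and another script called analyzer.py to count the records in them. The CSV file that gets written looks correct, but when I read it in analyzer.py, the DataFrame contains many extra, broken rows: they do not follow the column order and most of their fields hold no value. Sorry for my bad English.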

CSV columns:

 status_id, created_at, user_id, user_screen_name, status_text, hashtags,
 user_metions, url, status_fav_count, status_rewtweet_count, is_retweet, 
 ori_status_id, ori_creted_at, ori_user_id, ori_user_screen_name, 
 ori_text, ori_hashtags, ori_user_metions, ori_urls, ori_fav_count,
 ori_rewtweet_count, is_quoted, quoted_status_id, quoted_status_creted_at,
 quoted_status_user_id, quoted_status_screen_name, quoted_text,
 quoted_hashtags, quoted_user_metions, quoted_urls, quoted_fav_count, 
 quoted_rewtweet_count

Example:

1106517910707679235,2019-03-15 11:30:02,19888170,Cout_ma,@Marish_ RT:kkkkkkkkkkkkkkkkkkkkkk,0.0,True,1106516443468845061,Fri Mar 15 11:24:12 +0000 2019,61990620,Marish_,kkkkkkkkkkkkkkkkkkkkkk ,,,, 5,6,True,1106513884314324992,Fri Mar 15 11:14:02 +0000 2019,14594813,叶子,Ruralists complain about anti-China bias in the Bolsonaro government,,, 160,34

Output when running a read test:

Pandas(Index=39498, status_id='URL HERE', created_at=nan, user_id=nan, user_screen_name='URL HERE', status_text='0', hashtags='0', user_metions='False', url=nan, status_fav_count=nan, is_retweet=nan, ori_status_id=nan, ori_creted_at=nan, ori_user_id=nan, ori_user_screen_name=nan, ori_text=nan, ori_hashtags=nan, ori_user_metions='False', ori_urls=nan, ori_fav_count=nan, ori_rewtweet_count=nan, is_quoted=nan, quoted_status_id=nan, quoted_status_creted_at=nan, quoted_status_user_id=nan, quoted_status_screen_name=nan, quoted_text=nan, quoted_hashtags=nan, quoted_user_metions=nan, quoted_urls=nan, quoted_fav_count=nan, quoted_rewtweet_count=nan)

Code that writes the CSV:

    df = pandas.DataFrame(to_csv,columns=['status_id',
                               'created_at',
                               'user_id',
                               'user_screen_name',
                                   .
                                   .
                                   .
                                    ])
    df = df.sort_values(by='status_id')
    df.to_csv(to + index + '_' + start.strftime('%Y-%m-%d %H:%M:%S') + '.csv',index=False,encoding='utf8')

Code that reads the CSV:

    import pandas as pd

    data = pd.read_csv(path + '/' + name)  # name contains the CSV file name
    for i in data.itertuples():
        print(i)

python pandas csv dataframe
1 answer
1 vote

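Make sure that the tweet text and the other fields do not contain the CSV delimiter you are using (a comma in this case). Otherwise, when a row is read, the parser cannot tell whether a comma is meant to separate two fields or is simply part of the original text. If the strings are quoted, the problem should not persist.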
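As a rough illustration of the quoting idea, here is a minimal sketch. The sample rows and the `count_bad_rows` helper are made up for this example and are not part of your scripts. It writes every field quoted with `csv.QUOTE_ALL`, reads the result back, and counts records whose field count differs from the header. Note that `DataFrame.to_csv` already quotes fields that contain the delimiter by default (`csv.QUOTE_MINIMAL`), so if the rows are still broken after this, the file may be getting modified or written by something else along the way.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data: the text field contains commas, quotes and a newline.
rows = [
    {"status_id": 1, "status_text": "RT @user: hello, world", "status_fav_count": 0},
    {"status_id": 2, "status_text": 'line one\nline two, with "quotes"', "status_fav_count": 5},
]
df = pd.DataFrame(rows, columns=["status_id", "status_text", "status_fav_count"])

# Write with every field quoted, so a comma inside the text stays inside one field.
buffer = io.StringIO()
df.to_csv(buffer, index=False, quoting=csv.QUOTE_ALL)

# Read the data back; the quoted text should come back unchanged.
buffer.seek(0)
restored = pd.read_csv(buffer)


def count_bad_rows(csv_text, expected_fields):
    """Count records whose number of fields differs from expected_fields.

    Hypothetical diagnostic helper: it parses the text with the standard
    csv module, which honours quotes the same way a CSV reader would.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return sum(1 for record in reader if len(record) != expected_fields)


print(restored)
print("bad rows in quoted file:", count_bad_rows(buffer.getvalue(), 3))
```

Running a check like `count_bad_rows` on the real file, with the expected number of columns, should show whether some lines really have too many or too few fields before pandas ever sees them.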
