How do I replace entries that contain parentheses or digits in a column of a pandas DataFrame?

Question  Votes: 1  Answers: 2
I have a dataframe like this:         
    Country                  Energy Supply      Energy Supply per Capita
16  Afghanistan              3.210000e+08       10.0    
17  Albania                  1.020000e+08       35.0    
18  Algeria                  1.959000e+09       51.0    
19  American Samoa           NaN                                        
40  Bolivia 
   (Plurinational State of)  3.360000e+08       32.0
... ... ... ...
213 Switzerland17            1.113000e+09       136.0   
214 Syrian Arab Republic     5.420000e+08       28.0    
215 Tajikistan               1.060000e+08       13.0    
216 Thailand                 5.336000e+09       79.0    
228 Ukraine18                4.844000e+09       107.0   
232 United States of 
    America20                9.083800e+10       286.0

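I need to replace every country name that contains parentheses or digits. For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', 'Switzerland17' should become 'Switzerland', and 'United States of America20' should become 'United States of America'.

Can anyone help me solve this?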

python pandas data-science
2 Answers
1
votes

You can use this regex pattern with str.extract:

df['Country'] = df.Country.str.extract(r'^([^\d\(]*)')[0]

Output:

                      Country  Energy Supply  Energy Supply per Capita
16                Afghanistan   3.210000e+08                      10.0
17                    Albania   1.020000e+08                      35.0
18                    Algeria   1.959000e+09                      51.0
19             American Samoa            NaN                       NaN
40                   Bolivia    3.360000e+08                      32.0
213               Switzerland   1.113000e+09                     136.0
214      Syrian Arab Republic   5.420000e+08                      28.0
215                Tajikistan   1.060000e+08                      13.0
216                  Thailand   5.336000e+09                      79.0
228                   Ukraine   4.844000e+09                     107.0
232  United States of America   9.083800e+10                     286.0
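
A self-contained sketch of the same idea, run on a small hypothetical frame built from a few of the names in the question. The pattern captures everything before the first digit or opening parenthesis. The trailing `.str.strip()` is an addition, not part of the answer above; it removes the space that would otherwise remain in names such as 'Bolivia (Plurinational State of)'.

```python
import pandas as pd

# Hypothetical sample built from names that appear in the question
df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Bolivia (Plurinational State of)",
        "Switzerland17",
        "United States of America20",
    ]
})

# Capture everything up to the first digit or "(".
# expand=False returns a Series rather than a one-column DataFrame.
df["Country"] = (
    df["Country"]
    .str.extract(r"^([^\d(]*)", expand=False)
    .str.strip()  # assumption: trailing whitespace is unwanted
)

print(df["Country"].tolist())
# ['Afghanistan', 'Bolivia', 'Switzerland', 'United States of America']
```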

2
votes

You can use str.replace with a regex that combines several patterns, like this.

Consider the following DataFrame:

In [1431]: df 
Out[1431]: 
                            Country
0                       Afghanistan
1  Bolivia (Plurinational State of)
2                     Switzerland17

In [1433]: df['Country'] = df['Country'].str.replace(r"\(.*\)|\d+", '', regex=True)
In [1434]: df  
Out[1434]: 
         Country
0    Afghanistan
1       Bolivia 
2    Switzerland
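
As the output shows, 'Bolivia ' keeps a trailing space after the parenthesised part is removed. A possible variant, offered as a sketch rather than as the answer's own code, also absorbs the whitespace before the parenthesis and strips the result. The pattern `\s*\([^)]*\)` is an assumption chosen here so that each parenthesised group is matched separately instead of greedily. Note that `regex=True` has to be passed explicitly on pandas 2.0 and later, where `str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Same three sample rows as in the answer above
df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Bolivia (Plurinational State of)",
        "Switzerland17",
    ]
})

# Remove "(...)" groups together with the whitespace in front of them,
# and remove any run of digits; then trim what is left.
df["Country"] = (
    df["Country"]
    .str.replace(r"\s*\([^)]*\)|\d+", "", regex=True)
    .str.strip()
)

print(df["Country"].tolist())
# ['Afghanistan', 'Bolivia', 'Switzerland']
```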

1
votes
df['Country'] = df['Country'].str.extract(r"([^(\d]+)", expand=False)
      Country              Energy Supply     Energy Supply per Capita
16   Afghanistan           3.210000e+08     10.0
17   Albania               1.020000e+08     35.0
18   Algeria               1.959000e+09     51.0
19   American Samoa                 NaN     NaN
40   Bolivia               3.360000e+08     32.0
213  Switzerland           1.113000e+09     136.0
214  Syrian Arab Republic  5.420000e+08     28.0
215  Tajikistan            1.060000e+08     13.0
216  Thailand              5.336000e+09     79.0
228  Ukraine               4.844000e+09     107.0
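
The extract-based and replace-based approaches agree on the names in the question, but they are not equivalent in general. The sketch below uses a made-up name ('Alpha (beta) Gamma7', purely hypothetical) to show the difference: extract keeps only the text before the first parenthesis or digit, while replace deletes the offending parts and keeps the rest.

```python
import pandas as pd

# Hypothetical names, chosen only to show where the two approaches differ
names = pd.Series(["Ukraine18", "Alpha (beta) Gamma7"])

# extract: keep only the text before the first "(" or digit
by_extract = names.str.extract(r"^([^\d(]*)", expand=False).str.strip()

# replace: delete parenthesised parts and digits, keep everything else
by_replace = names.str.replace(r"\s*\([^)]*\)|\d+", "", regex=True).str.strip()

print(by_extract.tolist())  # ['Ukraine', 'Alpha']
print(by_replace.tolist())  # ['Ukraine', 'Alpha Gamma']
```

Which behaviour is preferable depends on the data; for the country names shown in the question both give the same result.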