I have a dataframe like this:
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia (Plurinational State of) 3.360000e+08 32.0
... ... ... ...
213 Switzerland17 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine18 4.844000e+09 107.0
232 United States of America20 9.083800e+10 286.0
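I need to replace every country name that contains parentheses or digits. For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', 'Switzerland17' should become 'Switzerland', and 'United States of America20' should become 'United States of America'.
Can anyone help me with this?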
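You can use this regex pattern with str.extract:
# Keep everything before the first digit or "(", then trim the space left before "(".
df['Country'] = df['Country'].str.extract(r'^([^\d(]*)', expand=False).str.strip()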
Output:
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia 3.360000e+08 32.0
213 Switzerland 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine 4.844000e+09 107.0
232 United States of America 9.083800e+10 286.0
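A minimal, self-contained sketch of the str.extract approach, assuming a small sample of the country names from the question (the full dataframe is not reproduced). It also shows why a trailing .str.strip() is worth adding: the character class stops at "(", so the space before it is kept in the match.

```python
import pandas as pd

# A few country names from the question: plain, parenthesised, and with footnote digits.
df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Bolivia (Plurinational State of)",
        "Switzerland17",
        "United States of America20",
    ]
})

# Capture everything from the start of the string up to (not including)
# the first digit or "(". expand=False returns a Series, so no [0] is needed.
cleaned = df["Country"].str.extract(r"^([^\d(]*)", expand=False)

# "Bolivia (Plurinational State of)" matches as "Bolivia " (trailing space),
# so strip the surrounding whitespace before writing the column back.
df["Country"] = cleaned.str.strip()

print(df["Country"].tolist())
# ['Afghanistan', 'Bolivia', 'Switzerland', 'United States of America']
```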
You can combine several regex patterns with str.replace, like this.
Consider the following dataframe:
In [1431]: df
Out[1431]:
Country
0 Afghanistan
1 Bolivia (Plurinational State of)
2 Switzerland17
In [1433]: df['Country'] = df['Country'].str.replace(r"\(.*\)|\d+", '', regex=True).str.strip()
In [1434]: df
Out[1434]:
Country
0 Afghanistan
1 Bolivia
2 Switzerland
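A self-contained sketch of the str.replace approach, using the three sample names from the session above. It passes regex=True explicitly, because pandas 2.0 changed the default of Series.str.replace to regex=False; without it the pattern is treated as a literal string and nothing is replaced. The .str.strip() call is an addition to remove the space left in front of the deleted parenthesis.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": [
        "Afghanistan",
        "Bolivia (Plurinational State of)",
        "Switzerland17",
    ]
})

# Alternation of two patterns: a parenthesised part "(...)" or a run of digits.
pattern = r"\(.*\)|\d+"

# With regex=False the pattern is searched for literally, so nothing changes.
literal = pd.Series(["Switzerland17"]).str.replace(pattern, "", regex=False)

# With regex=True both alternatives are removed; strip() trims leftover spaces.
df["Country"] = df["Country"].str.replace(pattern, "", regex=True).str.strip()

print(literal.tolist())          # ['Switzerland17']
print(df["Country"].tolist())    # ['Afghanistan', 'Bolivia', 'Switzerland']
```

If a name could contain more than one parenthesised part, the greedy \(.*\) also deletes the text between them; \([^)]*\) removes each parenthesised group separately.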
df.Country = df.Country.str.extract(r"([^(\d]+)", expand=False).str.strip()
Country Energy Supply Energy Supply per Capita
16 Afghanistan 3.210000e+08 10.0
17 Albania 1.020000e+08 35.0
18 Algeria 1.959000e+09 51.0
19 American Samoa NaN NaN
40 Bolivia 3.360000e+08 32.0
213 Switzerland 1.113000e+09 136.0
214 Syrian Arab Republic 5.420000e+08 28.0
215 Tajikistan 1.060000e+08 13.0
216 Thailand 5.336000e+09 79.0
228 Ukraine 4.844000e+09 107.0
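A short sketch of what the expand argument changes in this last variant, assuming a few sample names from the question. With the default expand=True, str.extract returns a one-column DataFrame even for a single capture group; expand=False returns a Series, which is the more natural thing to assign back to a single column.

```python
import pandas as pd

countries = pd.Series([
    "Afghanistan",
    "Switzerland17",
    "Ukraine18",
    "Bolivia (Plurinational State of)",
])
pattern = r"([^(\d]+)"

# Default expand=True: a DataFrame with one column per capture group.
as_frame = countries.str.extract(pattern)

# expand=False with a single group: a Series; strip() drops the space before "(".
as_series = countries.str.extract(pattern, expand=False).str.strip()

print(type(as_frame).__name__, type(as_series).__name__)  # DataFrame Series
print(as_series.tolist())  # ['Afghanistan', 'Switzerland', 'Ukraine', 'Bolivia']
```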