Identifying missing countries in an R data frame


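I have a data frame with a "country" column that contains various country names.

I want to find out which countries (for example, UN member states) are missing from it.

Is there a quick way to do this automatically, perhaps using the countrycode package?

Here is the dput() output of my data: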

structure(list(country = c("Albania", "Algeria", "Angola", "Antigua and Barbuda", 
"Argentina", "Armenia", "Australia", "Austria", "Azerbaijan", 
"Bahamas", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium", 
"Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil", 
"Brunei", "Bulgaria", "Burkina Faso", "Cambodia", "Canada", "Chile", 
"Colombia", "Costa Rica", "Cote d'Ivoire", "Croatia", "Cuba", 
"Czechia", "Democratic Republic of the Congo", "Denmark", "Djibouti", 
"Dominica", "Dominican Republic", "Ecuador", "Egypt", "El Salvador", 
"Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland", "France", 
"Gabon", "Georgia", "Germany", "Ghana", "Greece", "Guatemala", 
"Guinea", "Guyana", "Honduras", "Hungary", "Iceland", "India", 
"Indonesia", "Iran", "Iraq", "Ireland", "Israel", "Italy", "Jamaica", 
"Japan", "Jordan", "Kazakhstan", "Kenya", "Kuwait", "Kyrgyzstan", 
"Laos", "Latvia", "Lebanon", "Lesotho", "Liechtenstein", "Lithuania", 
"Luxembourg", "Macedonia", "Madagascar", "Malawi", "Malaysia", 
"Malta", "Mauritania", "Mauritius", "Mexico", "Micronesia", "Moldova", 
"Monaco", "Mongolia", "Morocco", "Myanmar", "Namibia", "Nepal", 
"Netherlands", "New Zealand", "Nicaragua", "Niger", "Nigeria", 
"Norway", "Oman", "Pakistan", "Palau", "Panama", "Papua New Guinea", 
"Paraguay", "People's Republic of China", "Peru", "Philippines", 
"Poland", "Portugal", "Qatar", "Romania", "Russia", "Rwanda", 
"Samoa", "San Marino", "Saudi Arabia", "Senegal", "Serbia", "Singapore", 
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain", 
"Sri Lanka", "Sudan", "Suriname", "Sweden", "Switzerland", "Syria", 
"Taiwan", "Tajikistan", "Tanzania", "Thailand", "Tonga", "Trinidad and Tobago", 
"Tunisia", "Turkey", "U.K.", "U.S.A.", "Uganda", "Ukraine", "United Arab Emirates", 
"Uruguay", "Uzbekistan", "Venezuela", "Vietnam", "Yemen", "Zambia", 
"Zimbabwe")), row.names = c(NA, -152L), class = c("tbl_df", "tbl", 
"data.frame"))
r missing-data country
1 Answer

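You can certainly get a vector of the "countries" that are stored in the codelist data frame of the countrycode package but are missing from your own data: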

library(countrycode)

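# df is the data frame from the question; keep every codelist entry
# whose regex matches none of the names in df$country
codelist$country.name.en[sapply(codelist$country.name.en.regex, function(x) {
  !any(grepl(x, df$country, perl = TRUE, ignore.case = TRUE))
  })]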
#>   [1] "Afghanistan"                               
#>   [2] "Åland Islands"                             
#>   [3] "American Samoa"                            
#>   [4] "Andorra"                                   
#>   [5] "Anguilla"                                  
#>   [6] "Antarctica"                                
#>   [7] "Aruba"                                     
#>   [8] "Austria-Hungary"                           
#>   [9] "Baden"                                     
#>  [10] "Bavaria"                                   
#>  [11] "Belize"                                    
#>  [12] "Benin"                                     
#>  [13] "Bermuda"                                   
#>  [14] "Bouvet Island"                             
#>  [15] "British Indian Ocean Territory"            
#>  [16] "British Virgin Islands"                    
#>  [17] "Brunswick"                                 
#>  [18] "Burundi"                                   
#>  [19] "Cameroon"                                  
#>  [20] "Cape Verde"                                
#>  [21] "Caribbean Netherlands"                     
#>  [22] "Cayman Islands"                            
#>  [23] "Central African Republic"                  
#>  [24] "Chad"                                      
#>  [25] "Channel Islands"                           
#>  [26] "Christmas Island"                          
#>  [27] "Cocos (Keeling) Islands"                   
#>  [28] "Comoros"                                   
#>  [29] "Congo - Brazzaville"                       
#>  [30] "Cook Islands"                              
#>  [31] "Curaçao"                                   
#>  [32] "Cyprus"                                    
#>  [33] "Czechoslovakia"                            
#>  [34] "Equatorial Guinea"                         
#>  [35] "Eswatini"                                  
#>  [36] "Falkland Islands"                          
#>  [37] "Faroe Islands"                             
#>  [38] "French Guiana"                             
#>  [39] "French Polynesia"                          
#>  [40] "French Southern Territories"               
#>  [41] "Gambia"                                    
#>  [42] "German Democratic Republic"                
#>  [43] "Gibraltar"                                 
#>  [44] "Greenland"                                 
#>  [45] "Grenada"                                   
#>  [46] "Guadeloupe"                                
#>  [47] "Guam"                                      
#>  [48] "Guernsey"                                  
#>  [49] "Guinea-Bissau"                             
#>  [50] "Haiti"                                     
#>  [51] "Hamburg"                                   
#>  [52] "Hanover"                                   
#>  [53] "Heard & McDonald Islands"                  
#>  [54] "Hesse Electoral"                           
#>  [55] "Hesse Grand Ducal"                         
#>  [56] "Hesse-Darmstadt"                           
#>  [57] "Hesse-Kassel"                              
#>  [58] "Hong Kong SAR China"                       
#>  [59] "Isle of Man"                               
#>  [60] "Jersey"                                    
#>  [61] "Kiribati"                                  
#>  [62] "Kosovo"                                    
#>  [63] "Liberia"                                   
#>  [64] "Libya"                                     
#>  [65] "Macao SAR China"                           
#>  [66] "Maldives"                                  
#>  [67] "Mali"                                      
#>  [68] "Marshall Islands"                          
#>  [69] "Martinique"                                
#>  [70] "Mayotte"                                   
#>  [71] "Mecklenburg Schwerin"                      
#>  [72] "Micronesia (Federated States of)"          
#>  [73] "Modena"                                    
#>  [74] "Montenegro"                                
#>  [75] "Montserrat"                                
#>  [76] "Mozambique"                                
#>  [77] "Nassau"                                    
#>  [78] "Nauru"                                     
#>  [79] "Netherlands Antilles"                      
#>  [80] "New Caledonia"                             
#>  [81] "Niue"                                      
#>  [82] "Norfolk Island"                            
#>  [83] "North Korea"                               
#>  [84] "Northern Mariana Islands"                  
#>  [85] "Oldenburg"                                 
#>  [86] "Orange Free State"                         
#>  [87] "Palestinian Territories"                   
#>  [88] "Parma"                                     
#>  [89] "Piedmont-Sardinia"                         
#>  [90] "Pitcairn Islands"                          
#>  [91] "Prussia"                                   
#>  [92] "Puerto Rico"                               
#>  [93] "Republic of Vietnam"                       
#>  [94] "Réunion"                                   
#>  [95] "Saint Martin (French part)"                
#>  [96] "São Tomé & Príncipe"                       
#>  [97] "Sardinia"                                  
#>  [98] "Saxe-Weimar-Eisenach"                      
#>  [99] "Saxony"                                    
#> [100] "Serbia and Montenegro"                     
#> [101] "Seychelles"                                
#> [102] "Sierra Leone"                              
#> [103] "Sint Maarten"                              
#> [104] "Solomon Islands"                           
#> [105] "Somalia"                                   
#> [106] "Somaliland"                                
#> [107] "South Georgia & South Sandwich Islands"    
#> [108] "South Sudan"                               
#> [109] "St. Barthélemy"                            
#> [110] "St. Helena"                                
#> [111] "St. Kitts & Nevis"                         
#> [112] "St. Lucia"                                 
#> [113] "St. Pierre & Miquelon"                     
#> [114] "St. Vincent & Grenadines"                  
#> [115] "Svalbard & Jan Mayen"                      
#> [116] "Timor-Leste"                               
#> [117] "Togo"                                      
#> [118] "Tokelau"                                   
#> [119] "Turkmenistan"                              
#> [120] "Turks & Caicos Islands"                    
#> [121] "Tuscany"                                   
#> [122] "Tuvalu"                                    
#> [123] "Two Sicilies"                              
#> [124] "U.S. Virgin Islands"                       
#> [125] "United Arab Republic"                      
#> [126] "United Province CA"                        
#> [127] "United States Minor Outlying Islands (the)"
#> [128] "Vanuatu"                                   
#> [129] "Vatican City"                              
#> [130] "Wallis & Futuna"                           
#> [131] "Western Sahara"                            
#> [132] "Wuerttemburg"                              
#> [133] "Würtemberg"                                
#> [134] "Yemen Arab Republic"                       
#> [135] "Yemen People's Republic"                   
#> [136] "Yugoslavia"                                
#> [137] "Zanzibar"

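However, while this includes many existing countries that are missing from your data (for example Afghanistan, Belize and Benin), some of the entries are semi-autonomous territories that are not countries in their own right (Jersey, Zanzibar, Gibraltar), and others are historical entities that no longer exist as countries (for example Yugoslavia).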
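
One way to drop the territories and historical entities is to filter the reference table before matching. The following is a minimal sketch in base R, not a definitive implementation: `find_missing()`, the toy `ref` table and its `is_un_member` column are all hypothetical names made up for illustration, and the regexes are simplified stand-ins for the ones shipped with countrycode.

```r
# Return the reference names whose regex matches none of the observed names.
find_missing <- function(ref_names, ref_regex, observed) {
  matched <- vapply(
    ref_regex,
    function(rx) any(grepl(rx, observed, perl = TRUE, ignore.case = TRUE)),
    logical(1)
  )
  ref_names[!matched]
}

# Toy reference table (hypothetical, for illustration only).
ref <- data.frame(
  name  = c("Afghanistan", "United States", "United Kingdom", "Jersey"),
  regex = c("afghan",
            "^u\\.?s\\.?a\\.?$|united.?states",
            "^u\\.?k\\.?$|united.?kingdom",
            "^jersey$"),
  is_un_member = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
observed <- c("U.K.", "U.S.A.", "Albania")

# Restrict the reference set first, then look for unmatched entries.
un_ref <- ref[ref$is_un_member, ]
missing_un <- find_missing(un_ref$name, un_ref$regex, observed)
print(missing_un)

# With countrycode loaded, the same filter might look like this, ASSUMING a
# non-NA `un.name.en` marks the entries of interest (check the package docs):
# un_ref <- codelist[!is.na(codelist$un.name.en), ]
# find_missing(un_ref$country.name.en, un_ref$country.name.en.regex, df$country)
```

Filtering before matching keeps the matching logic unchanged and moves the question of what counts as a country into a single line, so the reference set can be swapped (UN members, ISO codes, a hand-curated list) without touching the rest.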

Created on 2023-09-28 with reprex v2.0.2
