Identifying countries missing from a data frame in R


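I have a data frame with a column `country` that contains the names of various countries.
I would like to know which countries (say, UN member states) are not in it.
Is there a quick way to do this automatically, for example with the countrycode package?
Here is my dput: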

df <- structure(list(country = c("Albania", "Algeria", "Angola", "Antigua and Barbuda", 
"Argentina", "Armenia", "Australia", "Austria", "Azerbaijan", 
"Bahamas", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium", 
"Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil", 
"Brunei", "Bulgaria", "Burkina Faso", "Cambodia", "Canada", "Chile", 
"Colombia", "Costa Rica", "Cote d'Ivoire", "Croatia", "Cuba", 
"Czechia", "Democratic Republic of the Congo", "Denmark", "Djibouti", 
"Dominica", "Dominican Republic", "Ecuador", "Egypt", "El Salvador", 
"Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland", "France", 
"Gabon", "Georgia", "Germany", "Ghana", "Greece", "Guatemala", 
"Guinea", "Guyana", "Honduras", "Hungary", "Iceland", "India", 
"Indonesia", "Iran", "Iraq", "Ireland", "Israel", "Italy", "Jamaica", 
"Japan", "Jordan", "Kazakhstan", "Kenya", "Kuwait", "Kyrgyzstan", 
"Laos", "Latvia", "Lebanon", "Lesotho", "Liechtenstein", "Lithuania", 
"Luxembourg", "Macedonia", "Madagascar", "Malawi", "Malaysia", 
"Malta", "Mauritania", "Mauritius", "Mexico", "Micronesia", "Moldova", 
"Monaco", "Mongolia", "Morocco", "Myanmar", "Namibia", "Nepal", 
"Netherlands", "New Zealand", "Nicaragua", "Niger", "Nigeria", 
"Norway", "Oman", "Pakistan", "Palau", "Panama", "Papua New Guinea", 
"Paraguay", "People's Republic of China", "Peru", "Philippines", 
"Poland", "Portugal", "Qatar", "Romania", "Russia", "Rwanda", 
"Samoa", "San Marino", "Saudi Arabia", "Senegal", "Serbia", "Singapore", 
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain", 
"Sri Lanka", "Sudan", "Suriname", "Sweden", "Switzerland", "Syria", 
"Taiwan", "Tajikistan", "Tanzania", "Thailand", "Tonga", "Trinidad and Tobago", 
"Tunisia", "Turkey", "U.K.", "U.S.A.", "Uganda", "Ukraine", "United Arab Emirates", 
"Uruguay", "Uzbekistan", "Venezuela", "Vietnam", "Yemen", "Zambia", 
"Zimbabwe")), row.names = c(NA, -152L), class = c("tbl_df", "tbl", 
"data.frame"))

Answer #1

You can certainly get the vector of country names stored in countrycode's `codelist` that are missing from your own data:

library(countrycode)

# Keep every codelist name whose regex matches none of the names in df$country
codelist$country.name.en[sapply(codelist$country.name.en.regex, function(x) {
  !any(grepl(x, df$country, perl = TRUE, ignore.case = TRUE))
})]
#>   [1] "Afghanistan"                               
#>   [2] "Åland Islands"                             
#>   [3] "American Samoa"                            
#>   [4] "Andorra"                                   
#>   [5] "Anguilla"                                  
#>   [6] "Antarctica"                                
#>   [7] "Aruba"                                     
#>   [8] "Austria-Hungary"                           
#>   [9] "Baden"                                     
#>  [10] "Bavaria"                                   
#>  [11] "Belize"                                    
#>  [12] "Benin"                                     
#>  [13] "Bermuda"                                   
#>  [14] "Bouvet Island"                             
#>  [15] "British Indian Ocean Territory"            
#>  [16] "British Virgin Islands"                    
#>  [17] "Brunswick"                                 
#>  [18] "Burundi"                                   
#>  [19] "Cameroon"                                  
#>  [20] "Cape Verde"                                
#>  [21] "Caribbean Netherlands"                     
#>  [22] "Cayman Islands"                            
#>  [23] "Central African Republic"                  
#>  [24] "Chad"                                      
#>  [25] "Channel Islands"                           
#>  [26] "Christmas Island"                          
#>  [27] "Cocos (Keeling) Islands"                   
#>  [28] "Comoros"                                   
#>  [29] "Congo - Brazzaville"                       
#>  [30] "Cook Islands"                              
#>  [31] "Curaçao"                                   
#>  [32] "Cyprus"                                    
#>  [33] "Czechoslovakia"                            
#>  [34] "Equatorial Guinea"                         
#>  [35] "Eswatini"                                  
#>  [36] "Falkland Islands"                          
#>  [37] "Faroe Islands"                             
#>  [38] "French Guiana"                             
#>  [39] "French Polynesia"                          
#>  [40] "French Southern Territories"               
#>  [41] "Gambia"                                    
#>  [42] "German Democratic Republic"                
#>  [43] "Gibraltar"                                 
#>  [44] "Greenland"                                 
#>  [45] "Grenada"                                   
#>  [46] "Guadeloupe"                                
#>  [47] "Guam"                                      
#>  [48] "Guernsey"                                  
#>  [49] "Guinea-Bissau"                             
#>  [50] "Haiti"                                     
#>  [51] "Hamburg"                                   
#>  [52] "Hanover"                                   
#>  [53] "Heard & McDonald Islands"                  
#>  [54] "Hesse Electoral"                           
#>  [55] "Hesse Grand Ducal"                         
#>  [56] "Hesse-Darmstadt"                           
#>  [57] "Hesse-Kassel"                              
#>  [58] "Hong Kong SAR China"                       
#>  [59] "Isle of Man"                               
#>  [60] "Jersey"                                    
#>  [61] "Kiribati"                                  
#>  [62] "Kosovo"                                    
#>  [63] "Liberia"                                   
#>  [64] "Libya"                                     
#>  [65] "Macao SAR China"                           
#>  [66] "Maldives"                                  
#>  [67] "Mali"                                      
#>  [68] "Marshall Islands"                          
#>  [69] "Martinique"                                
#>  [70] "Mayotte"                                   
#>  [71] "Mecklenburg Schwerin"                      
#>  [72] "Micronesia (Federated States of)"          
#>  [73] "Modena"                                    
#>  [74] "Montenegro"                                
#>  [75] "Montserrat"                                
#>  [76] "Mozambique"                                
#>  [77] "Nassau"                                    
#>  [78] "Nauru"                                     
#>  [79] "Netherlands Antilles"                      
#>  [80] "New Caledonia"                             
#>  [81] "Niue"                                      
#>  [82] "Norfolk Island"                            
#>  [83] "North Korea"                               
#>  [84] "Northern Mariana Islands"                  
#>  [85] "Oldenburg"                                 
#>  [86] "Orange Free State"                         
#>  [87] "Palestinian Territories"                   
#>  [88] "Parma"                                     
#>  [89] "Piedmont-Sardinia"                         
#>  [90] "Pitcairn Islands"                          
#>  [91] "Prussia"                                   
#>  [92] "Puerto Rico"                               
#>  [93] "Republic of Vietnam"                       
#>  [94] "Réunion"                                   
#>  [95] "Saint Martin (French part)"                
#>  [96] "São Tomé & Príncipe"                       
#>  [97] "Sardinia"                                  
#>  [98] "Saxe-Weimar-Eisenach"                      
#>  [99] "Saxony"                                    
#> [100] "Serbia and Montenegro"                     
#> [101] "Seychelles"                                
#> [102] "Sierra Leone"                              
#> [103] "Sint Maarten"                              
#> [104] "Solomon Islands"                           
#> [105] "Somalia"                                   
#> [106] "Somaliland"                                
#> [107] "South Georgia & South Sandwich Islands"    
#> [108] "South Sudan"                               
#> [109] "St. Barthélemy"                            
#> [110] "St. Helena"                                
#> [111] "St. Kitts & Nevis"                         
#> [112] "St. Lucia"                                 
#> [113] "St. Pierre & Miquelon"                     
#> [114] "St. Vincent & Grenadines"                  
#> [115] "Svalbard & Jan Mayen"                      
#> [116] "Timor-Leste"                               
#> [117] "Togo"                                      
#> [118] "Tokelau"                                   
#> [119] "Turkmenistan"                              
#> [120] "Turks & Caicos Islands"                    
#> [121] "Tuscany"                                   
#> [122] "Tuvalu"                                    
#> [123] "Two Sicilies"                              
#> [124] "U.S. Virgin Islands"                       
#> [125] "United Arab Republic"                      
#> [126] "United Province CA"                        
#> [127] "United States Minor Outlying Islands (the)"
#> [128] "Vanuatu"                                   
#> [129] "Vatican City"                              
#> [130] "Wallis & Futuna"                           
#> [131] "Western Sahara"                            
#> [132] "Wuerttemburg"                              
#> [133] "Würtemberg"                                
#> [134] "Yemen Arab Republic"                       
#> [135] "Yemen People's Republic"                   
#> [136] "Yugoslavia"                                
#> [137] "Zanzibar"

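However, although this includes many existing countries that are missing from your data (Afghanistan, Belize, Benin and so on), some of the entries are semi-autonomous territories that are not countries in their own right (Jersey, Zanzibar, Gibraltar) or historical states that no longer exist (e.g. Yugoslavia).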
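
A related caveat is visible in the output above: the data contains "Micronesia", yet "Micronesia (Federated States of)" is still reported as missing, presumably because none of the regexes matches the short name. A minimal sketch for spotting such cases is to run the check in the other direction and list the names in your own data that no `codelist` regex matches. The small `df` below and the names `n_matches` and `unmatched` are illustrative assumptions, not part of the original answer.

```r
library(countrycode)

# Hypothetical stand-in for the question's data; "Narnia" is a deliberate non-country
df <- data.frame(country = c("France", "U.S.A.", "Micronesia", "Narnia"))

# For each name in the data, count how many codelist regexes match it
n_matches <- vapply(df$country, function(nm) {
  sum(vapply(codelist$country.name.en.regex, function(rx) {
    grepl(rx, nm, perl = TRUE, ignore.case = TRUE)
  }, logical(1)), na.rm = TRUE)
}, integer(1))

# Names that matched nothing are worth checking (and possibly renaming) by hand
unmatched <- df$country[n_matches == 0]
unmatched
```

Any name that turns up here will produce a spurious "missing" entry in the listing, so it may be worth renaming those values before trusting the result.
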
To filter out the entries that are not current countries, I would probably use something like rnaturalearth:

# Same check as before, stored so it can be filtered further
missing <- codelist$country.name.en[sapply(codelist$country.name.en.regex, 
                                function(x) {
  !any(grepl(x, df$country, perl = TRUE, ignore.case = TRUE))
  })]

missing[missing %in% 
          rnaturalearth::ne_countries(scale = 110, returnclass = "sf")$name]
#>  [1] "Afghanistan"   "Antarctica"    "Belize"        "Benin"        
#>  [5] "Burundi"       "Cameroon"      "Chad"          "Cyprus"       
#>  [9] "Gambia"        "Greenland"     "Guinea-Bissau" "Haiti"        
#> [13] "Kosovo"        "Liberia"       "Libya"         "Mali"         
#> [17] "Montenegro"    "Mozambique"    "New Caledonia" "North Korea"  
#> [21] "Puerto Rico"   "Sierra Leone"  "Somalia"       "Somaliland"   
#> [25] "Timor-Leste"   "Togo"          "Turkmenistan"  "Vanuatu"

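This gives you a reasonable list of 28 current countries that are not in your original list. Most of them are UN member states, but not all (as far as I know, Greenland, Antarctica, Kosovo, New Caledonia, Somaliland and Puerto Rico do not have independent representation at the UN at the time of writing).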
Created on 2023-09-28 with reprex v2.0.2
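
As a possible alternative that is not part of the answer above, the comparison can be done on ISO codes instead of names. The sketch below converts the data to `iso3c` with `countrycode()` and then lists the `codelist` entries whose code does not appear. The small `df` and the names `have_iso3`, `ref` and `missing_iso` are assumptions for illustration. Restricting to entries that have an ISO code should drop most of the historical states, but ISO 3166 still covers territories such as Jersey and Gibraltar, so this is not a list of UN member states either.

```r
library(countrycode)

# Hypothetical stand-in for the question's data
df <- data.frame(country = c("France", "U.K.", "U.S.A.", "South Korea"))

# Convert the names in the data to ISO 3166-1 alpha-3 codes;
# names that cannot be matched become NA (warnings silenced here)
have_iso3 <- countrycode(df$country, origin = "country.name",
                         destination = "iso3c", warn = FALSE)

# Reference set: codelist entries that carry an ISO code
ref <- codelist[!is.na(codelist$iso3c), c("country.name.en", "iso3c")]

# Entries whose code never appears in the data
missing_iso <- ref$country.name.en[!ref$iso3c %in% have_iso3]
head(missing_iso)
```

Running it with `warn = TRUE` instead would report which of your own names could not be converted, which is a cheap sanity check before reading the result.
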
