ggplot() area chart does not represent the data correctly


I downloaded highly aggregated data from the UN Comtrade database: https://comtradeplus.un.org/TradeFlow

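The ns_eu_category variable is my own division of the world into the following regions: "East Asia and Pacific", "Global North", "Latin America and Caribbean", "Middle East and North Africa", "Non-EU Former Soviet Bloc Countries", "South Asia", and "Sub-Saharan Africa". I don't think this is the source of the problem, so we can ignore the exact division for now.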

> longterm_trade_data
# A tibble: 7,364 × 6
   ns_eu_category         year sitc_code import_or_export      value sector                                  
   <chr>                 <dbl> <chr>     <chr>                 <dbl> <chr>                                   
 1 East Asia and Pacific  1962 0         Export            946694358 Food And Live Animals                   
 2 East Asia and Pacific  1962 0         Import            745286120 Food And Live Animals                   
 3 East Asia and Pacific  1962 1         Export             60846922 Beverages And Tobacco                   
 4 East Asia and Pacific  1962 1         Import             67321814 Beverages And Tobacco                   
 5 East Asia and Pacific  1962 2         Export           1479804622 Crude Materials, Inedible, Except Fuels 
 6 East Asia and Pacific  1962 2         Import            640428682 Crude Materials, Inedible, Except Fuels 
 7 East Asia and Pacific  1962 3         Export            482623764 Mineral Fuels, Lubric. And Related Mtrls
 8 East Asia and Pacific  1962 3         Import            416870707 Mineral Fuels, Lubric. And Related Mtrls
 9 East Asia and Pacific  1962 4         Export             66775599 Animal And Vegetable Oils,Fats And Waxes
10 East Asia and Pacific  1962 4         Import             42687574 Animal And Vegetable Oils,Fats And Waxes
# ℹ 7,354 more rows
# ℹ Use `print(n = ...)` to see more rows

I converted these aggregate statistics into percentages so that I could put them into an area chart:

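library(dplyr)
library(tidyr)  # for drop_na()

# Share of each SITC sector in a region's total exports/imports for a given year
trade_data_sector <- longterm_trade_data %>%
  drop_na() %>%  # drop incomplete rows first, so one NA cannot turn a whole group's total into NA
  group_by(ns_eu_category, year, import_or_export) %>%
  mutate(total_of_sectors = sum(value)) %>%
  ungroup() %>%
  mutate(percent = value / total_of_sectors)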
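Because the area chart stacks these shares, it can help to check whether they ever add up to slightly more than 1 within a region, year and flow. The following is a minimal sketch using dplyr; check_shares() is a hypothetical helper name, and it assumes the percent column computed above.

```r
library(dplyr)

# Hypothetical helper: sum the sector shares within each region/year/flow.
# A positive `excess` means the top of the stacked area would rise above 1.
check_shares <- function(df) {
  df %>%
    group_by(ns_eu_category, year, import_or_export) %>%
    summarise(share_total = sum(percent), .groups = "drop") %>%
    mutate(excess = share_total - 1)
}

# Usage with the data from the question:
# check_shares(trade_data_sector) %>% filter(excess > 0)
```

Any rows returned by the filter are years in which the stacked areas exceed 100%, if only by a rounding error.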

I tried to make an area chart:

library(ggplot2)

my_font <- "sans"  # placeholder: the original script defines its own font family

# Available regions:
# "East Asia and Pacific"               "Global North"                        
# "Latin America and Caribbean"         "Middle East and North Africa"        "Non-EU Former Soviet Bloc Countries"
# "South Asia"                          "Sub-Saharan Africa"   
region <- "Sub-Saharan Africa"
ix <- "Export"

trade_data_sector %>%
  # Legend label: SITC code plus the first 10 characters of the sector name
  mutate(truncated_name = substr(sector, 1L, 10L),
         descriptor = paste0(sitc_code, ": ", truncated_name)) %>%
  filter(ns_eu_category == region, import_or_export == ix) %>%
  ggplot(aes(x = year, y = percent, fill = descriptor)) + 
  geom_area() +
  theme_minimal() + 
  # import_or_export is a column, not a variable in scope here, so use ix
  labs(title = paste0(ix, "s in ", region, " Over Time"), 
       caption = "Source: UN COMTRADE Database 1962-2023") +
  scale_y_continuous(breaks = seq(from = 0, to = 1, by = 0.1), labels = scales::percent, limits = c(0, 1)) +
  # year is numeric, so a continuous x scale is the idiomatic choice
  scale_x_continuous(breaks = 1962:2023, expand = c(0, 0)) +
  theme(
    # panel.grid.major.y = element_line(color = "dark gray", linewidth = 0.1, linetype = "dashed"),
    # panel.grid.major.x = element_blank(),
    axis.ticks.x=element_line(linewidth=0.2),
    axis.text.x = element_text(size = 6, family=my_font, angle=-90, vjust=0.5),
    axis.title.x = element_text(size = 8, family=my_font),
    axis.text.y=element_text(size = 6, family=my_font),
    # axis.ticks.y=element_line(), 
    axis.title.y = element_text(size = 8, family=my_font),
    panel.grid = element_blank(),
    legend.position="bottom",
    plot.title = element_text(size = 12, family=my_font),
    plot.subtitle = element_text(size = 10, family=my_font),
    legend.title = element_text( size=8, family=my_font),
    legend.text = element_text( size=8, family=my_font),
    strip.text = element_text(size=8, family=my_font),
    legend.key.size = unit(0.3, "cm"),
    plot.caption = element_text(size = 7, color="dark gray", family=my_font)
  )

The result looks like this:

[Figure: Graph of Exports from Sub-Saharan Africa]

Note: the data for 2010-2022 is currently missing, so that part of the chart can be ignored.

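Not only does it look more erratic than it should, but the entire band for SITC code 0: Food and Live Animals has simply disappeared. Yet, as the plot below shows, this amount is never zero: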

sector_code <- "0"

trade_data_sector %>%
  filter(ns_eu_category == region, import_or_export == ix, sitc_code == sector_code) %>%
  ggplot(aes(x = year, y = value)) +
  geom_line() +
  theme_minimal() +
  # use the variables in scope (ix, sector_code), not the column names
  labs(title = paste0(ix, "s in ", region, " Over Time (Sector ", sector_code, ")"),
       caption = "Source: UN COMTRADE Database 1962-2023") +
  scale_x_continuous(breaks = 1962:2022) +
  # scale_y_continuous(breaks = seq(from = 0, to = 600, by = 100), limits=c(0,700)) +
  theme(
    panel.grid.major.y = element_line(color = "dark gray", linewidth = 0.1, linetype = "dashed"),
    # panel.grid.major.x = element_blank(),
    # axis.ticks.x=element_blank(),
    axis.text.x = element_text(size = 6, family=my_font, angle=-90, vjust=0.5),
    axis.title.x = element_text(size = 8, family=my_font),
    axis.text.y=element_text(size = 6, family=my_font),
    # axis.ticks.y=element_line(),
    axis.title.y = element_text(size = 8, family=my_font),
    panel.grid = element_blank(),
    legend.position="bottom",
    plot.title = element_text(size = 10, family=my_font),
    plot.subtitle = element_text(size = 8, family=my_font),
    legend.title = element_text( size=8, family=my_font),
    legend.text = element_text( size=8, family=my_font),
    strip.text = element_text(size=8, family=my_font),
    legend.key.size = unit(0.3, "cm"),
    plot.caption = element_text(size = 7, color="dark gray", family=my_font)
  )

[Figure: Percentage of exports in Sector 0]

What could be causing this? It doesn't only happen for Sub-Saharan Africa; the same gap appears for other regions of the world as well.

Tags: r, ggplot2, geom-area

1 answer (0 votes)

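The problem is caused by setting limits = c(0, 1) inside scale_y_continuous(). After removing the limits, the chart looks normal.

The most likely reason: by default a continuous scale replaces values outside its limits with NA (oob = scales::censor), and for geom_area() that check is applied after the bands have been stacked. Because each share is computed as value / total_of_sectors, floating-point rounding can push the top of the stack a tiny amount above 1, and those points are then censored, which can also make the remaining outline look erratic. position_stack() puts the first group on top, which is why the SITC 0 band is the one that disappears.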
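To see this mechanism in isolation, here is a minimal reproducible sketch with hypothetical toy data (two sectors, three years). The 1e-12 overshoot is an assumption that stands in for floating-point rounding; layer_data() from ggplot2 returns the built data of a layer, so the values turned into NA can be counted. It also shows one way to keep a 0-100% axis without dropping data: zoom with coord_cartesian(ylim = c(0, 1)) instead of setting scale limits.

```r
library(ggplot2)

# Hypothetical toy data: two sectors whose shares should add up to 1 each year.
# The 1e-12 overshoot in 2000 stands in for floating-point rounding in value / total.
toy <- data.frame(
  year    = rep(2000:2002, times = 2),
  sector  = rep(c("0: Food And", "1: Beverages"), each = 3),
  percent = c(0.5, 0.5, 0.5, 0.5 + 1e-12, 0.5, 0.5)
)

base <- ggplot(toy, aes(x = year, y = percent, fill = sector)) +
  geom_area()

# As in the question: after stacking, values outside limits = c(0, 1) are
# censored to NA, so the top band ("0: Food And", the first level) loses a point.
p_censored <- base +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))

# Alternative: let the scale keep all data and crop the view in the coordinate system.
p_zoomed <- base +
  scale_y_continuous(labels = scales::percent) +
  coord_cartesian(ylim = c(0, 1))

# Count stacked band tops (ymax) that ended up as NA in the built plot data
na_censored <- sum(is.na(layer_data(p_censored)$ymax))
na_zoomed   <- sum(is.na(layer_data(p_zoomed)$ymax))
```

Another option is to keep the scale limits but pass oob = scales::oob_keep to scale_y_continuous() (available in recent versions of the scales package), which stops out-of-range values from being replaced with NA.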
