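I have two different sets of variables: annual percentage share and year. The annualpercentage columns start in 1999 and end in 2012, but the year columns start in 1999 and end in 2013.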
countrylabel annualpercentageshare.1999 year1990 year1991 year1992
1 Austria NA NA NA NA
2 Belgium NA NA NA NA
3 Bulgaria 48.20000 NA NA NA
4 Estonia NA NA NA NA
5 France 47.52853 NA NA NA
6 Germany NA NA NA NA
Something like this.
I have already tried this code:
merge_data2 <- reshape(merge_data2, varying = list(2:ncol(merge_data2)),
v.names = c("percentageshare", "Year"),
idvar = "countrylabel", direction = "long", times = 1990:2013)
But I get this error message:
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 'lengths(varying)' must all match 'length(times)'
Edit: I want a data frame like this:
countrylabel time annualpercentageshare year
Austria 1990 NA NA
Austria 1991 NA NA
library(tidyr); library(dplyr)
df %>%
gather(variable, value, -countrylabel) %>%
separate("variable", into = c("stat", "time"), sep = -4) %>%
spread(stat, value)
Output
countrylabel time annualpercentageshare. year
1 Austria 1990 NA NA
2 Austria 1991 NA NA
3 Austria 1992 NA NA
4 Austria 1999 NA NA
5 Belgium 1990 NA NA
6 Belgium 1991 NA NA
7 Belgium 1992 NA NA
8 Belgium 1999 NA NA
9 Bulgaria 1990 NA NA
10 Bulgaria 1991 NA NA
11 Bulgaria 1992 NA NA
12 Bulgaria 1999 48.20000 NA
13 Estonia 1990 NA NA
14 Estonia 1991 NA NA
15 Estonia 1992 NA NA
16 Estonia 1999 NA NA
17 France 1990 NA NA
18 France 1991 NA NA
19 France 1992 NA NA
20 France 1999 47.52853 NA
21 Germany 1990 NA NA
22 Germany 1991 NA NA
23 Germany 1992 NA NA
24 Germany 1999 NA NA
reshape likes "." as a separator, so first we insert one into the year* variables.
names(d) <- gsub("year", "year.", names(d))
Now we give reshape the missing columns and put the columns in order:
d$annualpercentage.2002 <- NA
d$year.1999 <- NA
d <- d[c(1, order(names(d)[-1]) + 1)]
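With many years, adding the missing columns one by one becomes tedious. As a possible generalization, here is a minimal sketch that builds every stub.year name and fills in whichever ones are absent; the toy data frame d_toy and the helper vectors stubs, years, and wanted are hypothetical names used only for illustration:

```r
# Hypothetical toy data: some stub/year combinations are absent
d_toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  annualpercentage.1999 = c(NA, 1.5),
  year.2000 = c(0.6, NA),
  year.2001 = c(0.5, NA)
)

stubs <- c("annualpercentage", "year")
years <- 1999:2001

# Every stub.year combination that reshape should see
wanted <- as.vector(outer(stubs, years, paste, sep = "."))

# Add the absent columns, filled with NA
missing_cols <- setdiff(wanted, names(d_toy))
d_toy[missing_cols] <- NA_real_

# Id column first, then the value columns sorted by name (stub, then year)
d_toy <- d_toy[c("countrylabel", sort(wanted))]
names(d_toy)
```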
Your approach then works once the different column ranges are given as separate elements of the varying list:
res <- reshape(d, varying=list(2:5, 6:9), direction="long", idvar="countrylabel",
times=1999:2002, v.names=c("annualpercentage", "year"))
res
# countrylabel time annualpercentage year
# Austria.1999 Austria 1999 NA NA
# Belgium.1999 Belgium 1999 NA NA
# Bulgaria.1999 Bulgaria 1999 -0.6806495 NA
# Estonia.1999 Estonia 1999 NA NA
# France.1999 France 1999 NA NA
# Germany.1999 Germany 1999 NA NA
# Switzerland.1999 Switzerland 1999 -1.8497570 NA
# Austria.2000 Austria 2000 -0.6033900 0.14970015
# Belgium.2000 Belgium 2000 NA -0.49201756
# Bulgaria.2000 Bulgaria 2000 0.8263925 -0.36320990
# Estonia.2000 Estonia 2000 NA -2.51032544
# France.2000 France 2000 NA 0.57800624
# Germany.2000 Germany 2000 NA -0.52295712
# Switzerland.2000 Switzerland 2000 0.2783076 0.25616728
# Austria.2001 Austria 2001 -2.6962484 -0.15375642
# Belgium.2001 Belgium 2001 1.3088577 0.72528621
# Bulgaria.2001 Bulgaria 2001 NA NA
# Estonia.2001 Estonia 2001 NA -0.05563662
# France.2001 France 2001 0.2224629 0.74205086
# Germany.2001 Germany 2001 NA -0.01185349
# Switzerland.2001 Switzerland 2001 0.8354322 -1.40826638
# Austria.2002 Austria 2002 NA NA
# Belgium.2002 Belgium 2002 NA 1.60874778
# Bulgaria.2002 Bulgaria 2002 NA NA
# Estonia.2002 Estonia 2002 NA 0.55866704
# France.2002 France 2002 NA -1.59866472
# Germany.2002 Germany 2002 NA -0.11217415
# Switzerland.2002 Switzerland 2002 NA NA
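Once every column follows the stub.year pattern and each stub covers the same years, reshape can also work out v.names and times from the names themselves. A minimal sketch, assuming a hypothetical toy data frame wide_toy rather than the data of this answer:

```r
# Hypothetical toy data in which every stub has the same set of years
wide_toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  annualpercentage.1999 = c(NA, 1.5),
  annualpercentage.2000 = c(0.2, NA),
  year.1999 = c(NA_real_, NA_real_),
  year.2000 = c(0.6, -0.4)
)

# With an atomic 'varying' and sep = ".", reshape splits each name at the dot
# and derives the variable names and the time values from the two parts
long_toy <- reshape(
  wide_toy,
  direction = "long",
  idvar = "countrylabel",
  varying = names(wide_toy)[-1],
  sep = "."
)
long_toy
```

This avoids hard-coding column positions such as 2:5 and 6:9, at the cost of requiring consistent names.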
Data
d <- structure(list(countrylabel = c("Austria", "Belgium", "Bulgaria",
"Estonia", "France", "Germany", "Switzerland"), annualpercentage.1999 = c(NA,
-2.58060150400384, -0.0623757258909573, 0.267776001395166, NA,
NA, 0.048219924249952), annualpercentage.2000 = c(NA, -0.249416955035044,
1.3525450891501, 1.04446768824697, NA, -0.0582347596434839, -0.891400228849837
), annualpercentage.2001 = c(1.82469277697851, NA, NA, 1.04231605324821,
NA, -0.900145118946308, -1.19320727433597), year2000 = c(0.633712375393134,
NA, 1.24760861316098, -0.092964787061478, -0.59403260962332,
NA, -0.650348234181285), year2001 = c(0.587318286831079, NA,
NA, 0.348890470222513, NA, NA, NA), year2002 = c(0.0645316087966406,
-0.279456557428068, NA, NA, -0.0627400036074545, 1.30419117694731,
-0.484654596062051)), row.names = c(NA, -7L), class = "data.frame")
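Going back to the original reshape code, the error lies in using list(2:ncol(d)), which only creates a list of column numbers:
list(2:ncol(d))
What is needed is a vector of column names:
list(colnames(d)[2:ncol(d)])
Therefore,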
merge_data2 <- reshape(merge_data2, varying = list(colnames(merge_data2)[2:ncol(merge_data2)]),
v.names = c("percentageshare", "Year"),
idvar = "countrylabel", direction = "long", times = 1990:2013)
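is intended to give you what you want. Note, however, that reshape also accepts column indices in varying. The error message is about the number of columns in each element of varying, which must equal length(times), with one element per entry in v.names, so this call may still fail unless varying is split accordingly.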
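The error message compares the length of each element of varying with length(times). A minimal sketch of a call that satisfies that check, assuming a hypothetical toy data frame wide_toy2 in which both variables cover the same years, and selecting the columns by name with grep:

```r
# Hypothetical toy data: both variables cover the same two years
wide_toy2 <- data.frame(
  countrylabel = c("Bulgaria", "France"),
  percentageshare.1999 = c(48.2, 47.52853),
  percentageshare.2000 = c(NA, 46.1),
  year.1999 = c(NA_real_, NA_real_),
  year.2000 = c(0.3, NA)
)

times <- 1999:2000

# One element of 'varying' per entry in v.names, selected by column name
varying <- list(
  grep("^percentageshare", names(wide_toy2), value = TRUE),
  grep("^year", names(wide_toy2), value = TRUE)
)

# This is the condition that the error message checks
stopifnot(all(lengths(varying) == length(times)))

long_toy2 <- reshape(
  wide_toy2,
  varying = varying,
  v.names = c("percentageshare", "Year"),
  idvar = "countrylabel",
  direction = "long",
  times = times
)
long_toy2
```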