Ranking variables by their contribution to the total


Consider the following example data:

 psu   |  sumsc   sumst   sumobc   sumother   sumcaste
-------|-----------------------------------------------
10018  |    3       2        0         4          9
10061  |    0       0        2         5          7
10116  |    1       1        2         4          8
10121  |    3       0        1         2          6
20002  |    4       1        0         1          6
-------------------------------------------------------

Within each psu, I want to rank the variables sumsc, sumst, sumobc and sumother by their percentage contribution to sumcaste, which is the total of all of them.

Can anyone help me do this in Stata?
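The snippet below is a quick sanity check of the setup described in the question, offered as a sketch. It re-enters the example data and confirms that sumcaste really is the row total of the four components. It then computes each component's percentage share, 100 * var / sumcaste, which is the quantity both answers rank. The names total_check and pct_* are illustrative only and do not come from the question.

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

* sumcaste should equal the row total of the four components
egen total_check = rowtotal(sumsc sumst sumobc sumother)
assert total_check == sumcaste

* percentage contribution of each component to sumcaste
foreach var of varlist sumsc sumst sumobc sumother {
    generate double pct_`var' = 100 * `var' / sumcaste
}
list psu pct_*, noobs abbreviate(12)
```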

2 answers

First, we enter the data:

clear all
set more off

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

Second, we prepare the data for reshape:

local j=1
foreach var of varlist sumsc sumst sumobc sumother {
    gen temprl`j' = `var' / sumcaste
    ren `var' addi`j'
    local ++j
}

reshape long temprl addi, i(psu) j(ord)
lab def ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
lab val ord ord
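The sketch below shows what the reshape step produces. It repeats the preparation steps above and then checks the long layout. Each psu should end up with four rows, one per value of ord. The shares in temprl should add up to 1 within each psu, because sumcaste is the total. The helper variables n_rows and share_total are hypothetical names added only for the check. The sum is compared with a small tolerance because temprl is stored as float.

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local j = 1
foreach var of varlist sumsc sumst sumobc sumother {
    generate temprl`j' = `var' / sumcaste
    rename `var' addi`j'
    local ++j
}
reshape long temprl addi, i(psu) j(ord)

* each psu should have exactly four rows (ord = 1..4)
bysort psu: generate n_rows = _N
assert n_rows == 4

* shares within each psu should sum to 1, up to float rounding
bysort psu: egen share_total = total(temprl)
assert reldif(share_total, 1) < 1e-6
```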

Third, we sort the observations before displaying them:

gsort psu -temprl
by psu: gen nro=_n
drop temprl
order psu nro ord

Fourth, we display the data:

br psu nro ord addi
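In this approach, nro is simply the row position after sorting. Tied shares therefore get different numbers in arbitrary order, for example sumsc and sumst, which are both 0% in psu 10061. If ties should share a rank, one option is the official egen function rank() with the field option. In that scheme the largest value gets rank 1 and tied values get the same rank. This technique is swapped in here and is not part of the answer above; the variable name frank is hypothetical. The sketch rebuilds the long layout from scratch so that it runs on its own.

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local j = 1
foreach var of varlist sumsc sumst sumobc sumother {
    generate temprl`j' = `var' / sumcaste
    rename `var' addi`j'
    local ++j
}
reshape long temprl addi, i(psu) j(ord)
label define ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values ord ord

* field rank within psu: largest share gets 1, tied shares get the same rank
bysort psu: egen frank = rank(temprl), field
sort psu frank ord
list psu ord temprl frank, sepby(psu) noobs
```

Field ranks skip numbers after a tie. For psu 20002 they come out as 1, 2, 2, 4 rather than 1, 2, 2, 3.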

Edit:

Here is a combination of Aron's solution and mine (@PearlySpencer):

clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local i = 0
foreach var of varlist sumsc sumst sumobc sumother {
    local ++i
    generate pct`i' = 100 * `var' / sumcaste
    rename `var' temp`i'
    local rvars "`rvars' r`i'"                  
}

rowranks pct*, generate("`rvars'") field lowrank

reshape long pct temp r, i(psu) j(name)

label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name

keep psu name pct r
bysort psu (r): replace r = sum(r != r[_n-1])

This gives you the desired output:

list, sepby(psu) noobs

  +---------------------------------+
  |   psu       name        pct   r |
  |---------------------------------|
  | 10018   sumother   44.44444   1 |
  | 10018      sumsc   33.33333   2 |
  | 10018      sumst   22.22222   3 |
  | 10018     sumobc          0   4 |
  |---------------------------------|
  | 10061   sumother   71.42857   1 |
  | 10061     sumobc   28.57143   2 |
  | 10061      sumsc          0   3 |
  | 10061      sumst          0   3 |
  |---------------------------------|
  | 10116   sumother         50   1 |
  | 10116     sumobc         25   2 |
  | 10116      sumst       12.5   3 |
  | 10116      sumsc       12.5   3 |
  |---------------------------------|
  | 10121      sumsc         50   1 |
  | 10121   sumother   33.33333   2 |
  | 10121     sumobc   16.66667   3 |
  | 10121      sumst          0   4 |
  |---------------------------------|
  | 20002      sumsc   66.66666   1 |
  | 20002      sumst   16.66667   2 |
  | 20002   sumother   16.66667   2 |
  | 20002     sumobc          0   3 |
  +---------------------------------+

This approach is useful if you need the variables for further analysis rather than only displaying the results.
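The last line of the combined solution, bysort psu (r): replace r = sum(r != r[_n-1]), turns ranks that may skip numbers after a tie (for example 1, 2, 2, 4) into dense ranks with no gaps (1, 2, 2, 3). Within each group sorted by r, the expression r != r[_n-1] equals 1 whenever the rank changes. It also equals 1 on the first row, where r[_n-1] is missing. Its running sum() therefore counts the distinct ranks seen so far. The toy illustration below uses made-up values and a hypothetical grouping variable id.

```stata
clear
input id r
1 1
1 2
1 2
1 4
2 1
2 1
2 3
2 4
end

* running count of distinct rank values within each id
bysort id (r): generate dense = sum(r != r[_n-1])
list, sepby(id) noobs
```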



First, you need to calculate the percentages:

clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

foreach var of varlist sumsc sumst sumobc sumother {
    generate pct_`var' = 100 * `var' / sumcaste
}

egen pcttotal = rowtotal(pct_*)

list pct_* pcttotal, abbreviate(15) noobs

  +--------------------------------------------------------------+
  | pct_sumsc   pct_sumst   pct_sumobc   pct_sumother   pcttotal |
  |--------------------------------------------------------------|
  |  33.33333    22.22222            0       44.44444        100 |
  |         0           0     28.57143       71.42857        100 |
  |      12.5        12.5           25             50        100 |
  |        50           0     16.66667       33.33333        100 |
  |  66.66666    16.66667            0       16.66667   99.99999 |
  +--------------------------------------------------------------+
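The 99.99999 in the last row is not a logic error. generate and egen store new variables as float by default, so rounding in single precision keeps the three nonzero shares from summing to exactly 100. The sketch below assumes the same data. It stores the percentages as double and compares the total against 100 with a tolerance, using reldif(), rather than testing for exact equality.

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

* store the percentages in double precision
foreach var of varlist sumsc sumst sumobc sumother {
    generate double pct_`var' = 100 * `var' / sumcaste
}
egen double pcttotal = rowtotal(pct_*)

* compare with a tolerance instead of exact equality
assert reldif(pcttotal, 100) < 1e-9
list pct_* pcttotal, abbreviate(15) noobs
```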

Then you need to compute the ranks and do some gymnastics:

rowranks pct_*, generate(r_sumsc r_sumst r_sumobc r_sumother) field lowrank

mkmat r_*, matrix(A)
matrix A = A'
svmat A, names(row)

local matnames : rownames A
quietly generate name = " "

forvalues i = 1 / `: word count `matnames'' {
    quietly replace name = substr(`"`: word `i' of `matnames''"', 3, .) in `i'
}

ds row*

foreach var in `r(varlist)' {
    sort `var' name
    generate `var'b = sum(`var' != `var'[_n-1])
    drop `var'
    rename `var'b `var'
    list name `var' if name != " ", noobs
    display ""
}

The above gives you what you want:

  +-----------------+
  |     name   row1 |
  |-----------------|
  | sumother      1 |
  |    sumsc      2 |
  |    sumst      3 |
  |   sumobc      4 |
  +-----------------+

  +-----------------+
  |     name   row2 |
  |-----------------|
  | sumother      1 |
  |   sumobc      2 |
  |    sumsc      3 |
  |    sumst      3 |
  +-----------------+

  +-----------------+
  |     name   row3 |
  |-----------------|
  | sumother      1 |
  |   sumobc      2 |
  |    sumsc      3 |
  |    sumst      3 |
  +-----------------+

  +-----------------+
  |     name   row4 |
  |-----------------|
  |    sumsc      1 |
  | sumother      2 |
  |   sumobc      3 |
  |    sumst      4 |
  +-----------------+

  +-----------------+
  |     name   row5 |
  |-----------------|
  |    sumsc      1 |
  | sumother      2 |
  |    sumst      2 |
  |   sumobc      3 |
  +-----------------+
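Here is how the "gymnastics" in this answer work. mkmat puts one observation per matrix row, so after the transpose A' each original observation (each psu) becomes a column. svmat with names(row) then turns those columns into variables row1, row2 and so on, one per psu in the original order. The row names of the transposed matrix are the original variable names (r_sumsc and so on), which is why substr(..., 3, .) recovers sumsc and the others. The toy sketch below uses made-up values and hypothetical variable names r_a and r_b.

```stata
clear
input r_a r_b
1 2
2 1
1 1
end

mkmat r_a r_b, matrix(A)
matrix A = A'            // now 2 x 3: one row per variable, one column per observation
matrix list A

local names : rownames A
display "`names'"        // the row names are the original variable names

svmat A, names(row)      // creates row1, row2, row3 from the columns of A
list row1 row2 row3 in 1/2, noobs
```

Because A' has only two rows while the dataset has three observations, the third observation of row1 to row3 is missing. In the answer above, the same thing happens to the fifth observation, which is why the listing filters on name != " ".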

Note that before running the code above, you first need to install the community-contributed command rowranks:

net install pr0046.pkg