我在 Stata 中有一个数据集,如下所示
ID Product Region Sales
001 JSF045 East~West~North 45~78~14
002 JSF056 East~West 56~35
003 GWW1 West~North~South~South East 11~16~45~36
我想将 Region 和 Sales 变量拆分为单独的变量(在 ~ 处分隔)。每行都有不同数量的区域和销售数字,但对于每一行,一个区域将对应一个销售数字。我在下面介绍了我希望最终数据的外观。
我希望最终的数据如下所示
ID Product Region Sales
001 JSF045 East 45
001 JSF045 West 78
001 JSF045 North 14
002 JSF056 East 56
002 JSF056 West 35
003 GWW1 West 11
003 GWW1 North 16
003 GWW1 South 45
003 GWW1 South East 36
* Example generated by -dataex-. For more info, type help dataex
clear
input str3 ID str6 Product str27 Region str11 Sales
"001" "JSF045" "East~West~North" "45~78~14"
"002" "JSF056" "East~West" "56~35"
"003" "GWW1" "West~North~South~South East" "11~16~45~36"
end
split Region, parse("~")
split Sales, parse("~") destring
drop Region Sales
reshape long Region Sales, i(ID) j(which)
drop if missing(Region)
list, sepby(ID)
+--------------------------------------------+
| ID which Product Region Sales |
|--------------------------------------------|
1. | 001 1 JSF045 East 45 |
2. | 001 2 JSF045 West 78 |
3. | 001 3 JSF045 North 14 |
|--------------------------------------------|
4. | 002 1 JSF056 East 56 |
5. | 002 2 JSF056 West 35 |
|--------------------------------------------|
6. | 003 1 GWW1 West 11 |
7. | 003 2 GWW1 North 16 |
8. | 003 3 GWW1 South 45 |
9. | 003 4 GWW1 South East 36 |
+--------------------------------------------+