Stata: How to separate the addresses in a string variable


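I have a string variable that contains several addresses per observation. How can I split them apart so that I get address 1, address 2, and so on?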

For example:

7400 e roosevelt blvd # c201,philadelphia, pa 19152,7400 e roosevelt blvd # c201,philadelphia, pa 19152,7400 e roosevelt blvd # c201,philadelphia, pa 19152

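I tried using split with a comma as the parse character, but the result is not what I want, because commas also appear inside each address.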

1 answer

Something like this should work:

* moss is a user-written command; if it is not installed, run: ssc install moss
clear
input strL many_addresses
"4905 Lakeway Dr, College Station, TX 77845"
"7400 e roosevelt blvd # c201,philadelphia, pa 19152,7400 e roosevelt blvd # c201,philadelphia, pa 19152,7400 e roosevelt blvd # c201,philadelphia, pa 19152"
"9514 John Dr. Sacramento, CA 95820,344 Cedar Street Alabaster, AL 35007,786 Maiden Avenue Windermere, FL 34786,154 3rd Ave. Lititz, PA 17543,7032 Sherman Road Buffalo, NY 14215,87 East Constitution St. Uniondale, NY 11553"
end
compress


/* (1) find where each two-letter state abbreviation and ZIP5 begins;
   [A-Za-z] is used because the range [A-z] also matches [ \ ] ^ _ and ` */
moss many_addresses, match("( [A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9][0-9])") regex
assert _count >= 1 & !missing(_count)

/* (2) store the addresses in a1 - ak; each match is 9 characters long,
   so an address ends at _pos + 8 and the separating comma sits at _pos + 9 */
quietly summarize _count
local max_addresses = r(max)
generate start = 0
forvalues v = 1/`max_addresses' {
    quietly generate a`v' = substr(many_addresses, start + 1, _pos`v' + 8 - start) if !missing(_match`v')
    quietly replace start = _pos`v' + 9
}
drop _count _match* _pos* start

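The basic idea is to use a regular expression to find where each two-letter state abbreviation and ZIP5 begins, and then, once the cut points are known, slice the long string into pieces. Each match is 9 characters long (a space, two letters, a space, five digits), so an address ends 8 characters after the start of its match, and the next address starts after the comma that follows.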
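The same cut-point idea can also be sketched with built-in functions only. This is an untested alternative, not part of the answer above. It assumes Stata 14 or later (for ustrregexra()), that the regex engine accepts a look-behind assertion, and that the character "|" never occurs in the addresses. The variable name marked and the stub a are illustrative choices.

```stata
clear
input strL many_addresses
"4905 Lakeway Dr, College Station, TX 77845"
"7400 e roosevelt blvd # c201,philadelphia, pa 19152,7400 e roosevelt blvd # c201,philadelphia, pa 19152"
end
compress

* Assumption: "|" does not appear in any address, so it is safe as a marker.
* Replace only the commas that directly follow "<space><2 letters><space><5 digits>",
* i.e. the commas that separate one address from the next.
generate marked = ustrregexra(many_addresses, "(?<= [A-Za-z]{2} [0-9]{5}),", "|")

* Split on the marker; the pieces are stored in a1, a2, ...
split marked, parse("|") generate(a)
drop marked

list a*, noobs
```

Compared with the moss approach, this leaves the commas inside each address untouched and lets split create as many variables as the longest observation needs, at the cost of relying on a marker character that must not appear in the data.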
