I have a pandas DataFrame like this:
> row extract_column
> 0 412952266-desiredtext1»randtext-irrelevant
> 1 512952766-desiredtext1»randtext-irrelevant
> 2 212952766-desiredtext1»randtext-irrelevant
> 3 112953066-desiredtext1»randtext-irrelevant
> 4 712953066-desiredtext1»randtext-irrelevant
> 5 612953366-desiredtext1»randtext-irrelevant
> 6 912953366-desiredtext1»randtext-irrelevant
> 7 412954866-desiredtext1»randtext-irrelevant
> 8 312954966-desiredtext1»randtext-irrelevant
> 9 212954966-desiredtext1»randtext-irrelevant
> 10 612955866-desiredtext1»randtext-irrelevant
> 11 912256266-desiredtext1»randtext-irrelevant
> 12 812256366-desiredtext1»randtext-irrelevant
> 13 512256566-desiredtext1»randtext-irrelevant
> 14 412256566-desiredtext1»randtext-irrelevant
> 15 312256566-desiredtext1»randtext-irrelevant
> 16 212256566-desiredtext1»randtext-irrelevant
> 17 612256566-desiredtext1»randtext-irrelevant
> 18 812956666-desiredtext2»randtext-irrelevant
> 19 912957166-desiredtext2»randtext-irrelevant
> 20 012957866-desiredtext2»randtext-irrelevant
> 21 12952966-desiredtext2»randtext-irrelevant
> 22 2012953066-desiredtext2»randtext-irrelevant
> 23 012953066-desiredtext2»randtext-irrelevant
> 24 312953066-desiredtext2»randtext-irrelevant
> 25 112254166-desiredtext2»randtext-irrelevant
> 26 712254166-desiredtext2»randtext-irrelevant
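A small subset of the rows above can be rebuilt to make the problem reproducible. This is a sketch: only five of the 27 rows are included, and the variable name `df` is an assumption. The column name `extract_column` comes from the question.

```python
import pandas as pd

# Five rows copied verbatim from the table above (not the full frame).
df = pd.DataFrame({
    "extract_column": [
        "412952266-desiredtext1»randtext-irrelevant",
        "512952766-desiredtext1»randtext-irrelevant",
        "812956666-desiredtext2»randtext-irrelevant",
        "12952966-desiredtext2»randtext-irrelevant",
        "2012953066-desiredtext2»randtext-irrelevant",
    ]
})
print(df)
```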
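I want to extract the fields desiredtext1 and desiredtext2 from extract_column. The desired text is always followed by the » symbol and preceded by a 9-digit number followed by a dash.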
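The rule just described can be written as one anchored pattern for `Series.str.extract`. A minimal sketch, assuming the wanted text never itself contains »: the prefix is matched as digits, then a dash, then everything up to the first ». Rows 21 and 22 of the sample have 8 and 10 digits respectively, so a strict `\d{9}` returns NaN for them; the sketch uses `\d+` and shows the strict version for comparison.

```python
import pandas as pd

# Sample rows from the question; the last two have 8- and 10-digit prefixes.
s = pd.Series([
    "412952266-desiredtext1»randtext-irrelevant",
    "812956666-desiredtext2»randtext-irrelevant",
    "12952966-desiredtext2»randtext-irrelevant",
    "2012953066-desiredtext2»randtext-irrelevant",
])

# Anchored at the start: one or more digits, a dash, then capture up to the first "»".
result = s.str.extract(r"^\d+-([^»]+)»", expand=False)
print(result.tolist())
# ['desiredtext1', 'desiredtext2', 'desiredtext2', 'desiredtext2']

# Exactly 9 digits: the 8- and 10-digit rows no longer match and become NaN.
strict = s.str.extract(r"^\d{9}-([^»]+)»", expand=False)
print(strict.isna().tolist())
# [False, False, True, True]
```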
I tried using str.extract:
df.extract_column.str.extract(r'-([^\.]*)\»', expand=False)
df.extract_column.str.extract('-(\\w+)')
Out[100]:
0
0 desiredtext1
1 desiredtext1
2 desiredtext1
3 desiredtext1
4 desiredtext1
5 desiredtext1
6 desiredtext1
7 desiredtext1
8 desiredtext1
9 desiredtext1
10 desiredtext1
11 desiredtext1
12 desiredtext1
13 desiredtext1
14 desiredtext1
15 desiredtext1
16 desiredtext1
17 desiredtext1
18 desiredtext2
19 desiredtext2
20 desiredtext2
21 desiredtext2
22 desiredtext2
23 desiredtext2
24 desiredtext2
25 desiredtext2
26 desiredtext2
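The two attempts differ mainly in the return type. With a single capture group, `expand=False` gives a Series, while the default `expand=True` gives a DataFrame whose only column is named `0`, which explains the `0` header in the output above. The sketch below runs both patterns on two sample rows; the unnecessary backslashes before `.` (inside the character class) and before `»` are dropped, which does not change what the patterns match.

```python
import pandas as pd

df = pd.DataFrame({"extract_column": [
    "412952266-desiredtext1»randtext-irrelevant",
    "812956666-desiredtext2»randtext-irrelevant",
]})

# First attempt: expand=False with one group returns a Series.
first = df.extract_column.str.extract(r"-([^.]*)»", expand=False)

# Second attempt: expand defaults to True, so a one-column DataFrame is
# returned; the unnamed group becomes column 0.
second = df.extract_column.str.extract(r"-(\w+)")

print(type(first).__name__, first.tolist())
print(type(second).__name__, list(second.columns), second[0].tolist())
```

Both patterns rely on the first `-` in each string being the one after the digits; `-(\w+)` stops at `»` because `»` is not a word character.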