Hi everyone. Does anyone know how to split the data so that __label__1 and __label__2 go into index 1 and the text goes into index 2?
dataset = pd.read_csv('train.ft.txt',header=None,sep='\t',error_bad_lines=False)
for line in dataset:
    idx[0]=[]
    idx[1]=[]
    if line.str.contains('__label__1' | '__label__2'):
        a=idx[0].append()
        # infile.append(a)
    else:
        b=idx[1].append()
Something like this should work (runnable example):
import pandas as pd
df = pd.DataFrame()
df["0"] = ["__label__2 Amazing", "__label__1 test"]
df = df.merge(df["0"].apply(lambda s: pd.Series({'index_1':s.split(" ",1)[0], 'index_2':s.split(" ",1)[1]})),
left_index=True, right_index=True)
print(df)
                    0     index_1  index_2
0  __label__2 Amazing  __label__2  Amazing
1     __label__1 test  __label__1     test
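The same split can also be written without `apply`, as a minimal sketch: `Series.str.split` with `n=1` and `expand=True` splits each row once, at the first space, and returns the two parts as separate columns. The column names `index_1` and `index_2` follow the example above, and the two sample rows are the same illustrative ones.

```python
import pandas as pd

df = pd.DataFrame({"0": ["__label__2 Amazing", "__label__1 test"]})

# Split only at the first space so the text part keeps its own spaces.
df[["index_1", "index_2"]] = df["0"].str.split(" ", n=1, expand=True)

print(df)
```

Because the split is limited to one occurrence, review text that itself contains spaces stays intact in `index_2`.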
I assume you are trying to parse a fixed-width file. You can use the pd.read_fwf function as follows:
import pandas as pd
df1=pd.read_fwf("SO_Answer.csv",colspecs=[(0,10),(11,-1)],header=None)
df1.head()
The output should look similar to:
            0                                                  1
0  __label__2                        Stuning even for non gramer
1  __label__2                       The best of sound track ever
2  __label__2                   Amazing!The soundtrack is my fav
3  __label__1                       Don't do it!! The high chair
4  __label__1  is compact but hard to clean is compact but ha...
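If the label is not guaranteed to occupy a fixed number of characters, splitting each line at its first space is an alternative to fixed column positions. The following is only a sketch: `parse_labeled_lines` is a hypothetical helper, the column names `label` and `text` are made up, and an in-memory `io.StringIO` with invented sample lines stands in for the real file.

```python
import io

import pandas as pd


def parse_labeled_lines(handle):
    """Split each non-empty line into (label, text) at the first space."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # partition splits once, so spaces inside the text are preserved
        label, _, text = line.partition(" ")
        rows.append((label, text))
    return pd.DataFrame(rows, columns=["label", "text"])


# Invented sample lines standing in for the real file contents.
sample = io.StringIO(
    "__label__2 Amazing! The soundtrack is great\n"
    "__label__1 Don't do it!!\n"
)
df2 = parse_labeled_lines(sample)
print(df2)
```

For the actual data, the same helper could presumably be given an open file handle instead of the `StringIO` object, for example one obtained from `open('train.ft.txt', encoding='utf-8')`; the encoding is an assumption.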