How to split a DataFrame using an if condition

Question Votes: 0 Answers: 2


Hi everyone. Does anyone know how to split __label__1 and __label__2 into index 1, and the text data into index 2?

dataset = pd.read_csv('train.ft.txt',header=None,sep='\t',error_bad_lines=False)
for line in dataset:
    idx[0]=[]
    idx[1]=[]
    if line.str.contains('__label__1' | '__label__2'):
        a=idx[0].append()
        # infile.append(a)
    else:
        b=idx[1].append()
python pandas dataframe
2 Answers
1
vote

Something like this should work (runnable example):

import pandas as pd

df = pd.DataFrame()

df["0"] = ["__label__2 Amazing", "__label__1 test"]

df = df.merge(df["0"].apply(lambda s: pd.Series({'index_1':s.split(" ",1)[0], 'index_2':s.split(" ",1)[1]})),
    left_index=True, right_index=True)

print(df)

Output:

                    0     index_1  index_2
0  __label__2 Amazing  __label__2  Amazing
1     __label__1 test  __label__1     test
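
The same split can also be sketched with pandas' vectorized string methods instead of `apply`. This is only an illustration: it assumes every row starts with a label followed by a single space, and it reuses the column names `index_1` and `index_2` from the example above.

```python
import pandas as pd

df = pd.DataFrame({"0": ["__label__2 Amazing", "__label__1 test"]})

# Split each string once, at the first space; expand=True returns a DataFrame
# whose columns 0 and 1 hold the label and the remaining text.
parts = df["0"].str.split(" ", n=1, expand=True)

df["index_1"] = parts[0]  # the __label__N prefix
df["index_2"] = parts[1]  # everything after the first space

print(df)
```

Passing `n=1` keeps any further spaces inside the text column, which matters for multi-word reviews.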

0
votes

I assume you are trying to parse a fixed-width file. You can use the pd.read_fwf function as follows:

Code:

import pandas as pd

df1=pd.read_fwf("SO_Answer.csv",colspecs=[(0,10),(11,-1)],header=None)
df1.head()

The output should look similar to:

   0            1
0   __label__2  Stuning even for non gramer
1   __label__2  The best of sound track ever
2   __label__2  Amazing!The soundtrack is my fav
3   __label__1  Don't do it!! The high chair
4   __label__1  is compact but hard to clean is compact but ha...
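
For trying the fixed-width approach without the file on disk, here is a minimal self-contained sketch. The in-memory `sample` buffer and the column names `label` and `text` are assumptions made for illustration, and the open-ended spec `(11, None)` is used here to mean "from column 11 to the end of the line".

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the input file, using two rows
# taken from the output shown above.
sample = io.StringIO(
    "__label__2 The best of sound track ever\n"
    "__label__1 Don't do it!! The high chair\n"
)

# Characters 0-9 hold the 10-character label; character 10 is the separating
# space; (11, None) takes the rest of the line as the text column.
df1 = pd.read_fwf(
    sample,
    colspecs=[(0, 10), (11, None)],
    header=None,
    names=["label", "text"],
)

print(df1)
```

This only works because both labels, `__label__1` and `__label__2`, have the same length; labels of varying width would need a delimiter-based split instead.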
