How do I split strings in a DataFrame column based on delimiters?

Question (votes: -1, answers: 1)

So, I have a DataFrame with "Filename" and "Path" columns (it was shown as an image in the original post).

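I want to split the values in the "Filename" column into strings based on "-" and ".", and also remove the extension. Then I want to split the values in the "Path" column into strings based on "\" and ":". How can I do this?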

python pandas dataframe text-processing
1 Answer
0 votes

It's not entirely clear what you're looking for here, but this is my best interpretation.

Setup:

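import os
import pandas as pd

# Raw strings keep the backslashes literal; in a plain string "\a" is the bell character.
df = pd.DataFrame({
    "Filename": ["doc-hi.txt", "oh-my-god.txt"],
    "Path": [r"C:\asdf\asdf\asdf\kd.txt", r"C:\asdcsc.docx"]
})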
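
A note on the path literals in the setup: inside a regular Python string literal, "\a" is an escape sequence for the bell character, so a Windows path written that way silently loses its backslashes. Writing the path as a raw string (or doubling the backslashes) avoids this. A minimal sketch of the difference, where the names plain and raw are only illustrative:

```python
# A regular literal interprets "\a" as the bell character (U+0007),
# so both the backslash and the letter "a" disappear from the path.
plain = "C:\asdf"

# A raw literal keeps every backslash exactly as typed.
raw = r"C:\asdf"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

The two strings differ in length by one character, which is why a split on "\" finds nothing to split on in the non-raw version.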

Splitting the strings

# "separate the values in 'Filename' column into strings based on '-' and '.' and also remove the extension name"
df["Filename_split"] = df["Filename"].apply(lambda name: os.path.splitext(name)[0]).str.split(r'\.|-')

# "separate the values in 'Path' column into strings based on '\' and ':'"
df["Path_split"] = df["Path"].str.split(r'\\|:')

Output

    Filename        Path                        Filename_split  Path_split
0   doc-hi.txt      C:\asdf\asdf\asdf\kd.txt    [doc, hi]       [C, , asdf, asdf, asdf, kd.txt]
1   oh-my-god.txt   C:\asdcsc.docx              [oh, my, god]   [C, , asdcsc.docx]
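
The empty string in a Path_split list appears because ":" and "\" sit next to each other after the drive letter, and each one counts as its own delimiter. If those empty entries are not wanted, one possible variation (an assumption about what the asker needs, not something the question states) is to split on a run of one or more delimiters instead:

```python
import pandas as pd

df = pd.DataFrame({
    "Path": [r"C:\asdf\asdf\asdf\kd.txt", r"C:\asdcsc.docx"],
})

# "[\\:]+" matches one or more consecutive backslashes or colons, so the
# adjacent ":" and "\" after the drive letter act as a single delimiter.
df["Path_split"] = df["Path"].str.split(r"[\\:]+")

path_parts = df["Path_split"].tolist()
print(path_parts)
```

This keeps the drive letter as the first element and leaves the file extension attached to the last one, just as in the answer above.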