Regex to match only subheadings

Question

I have a DataFrame with a column of titles (see the example below).

import numpy as np
import pandas as pd


Fairytales_in = {'Titles': ['Fairy Tales',
                    'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                    'Tales.2.4.6.Red Riding Hood',
                    'Fairies.1Your own Fairy godmother',
                    'Ogres-1.1.The wondrous world of Shrek',
                    'Witches-1-4Maleficient and the malicious curse',
                    'Tales.2.1.The big bad wolf',
                    'Tales.2.Little Red riding Hood',
                    'Tales.2.4.6.1.Why the huntsman is underrated',
                    'Tales.5.f.Cinderella and the pumpkin carriage']}

Fairytales_in = pd.DataFrame.from_dict(Fairytales_in)

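I want to create a new column that contains exactly the same string as the Titles column, but only when the title is a subheading (for example Tales.3.2. or Ogres-1.1. or Witches-1-4 or Tales.5.f).

This would be my expected output: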

Fairytales_expected_output = {'Titles': ['Fairy Tales',
                    'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                    'Tales.2.4.6.Red Riding Hood',
                    'Fairies.1Your own Fairy godmother',
                    'Ogres-1.1.The wondrous world of Shrek',
                    'Witches-1-4Maleficient and the malicious curse',
                    'Tales.2.1.The big bad wolf',
                    'Tales.2.Little Red riding Hood',
                    'Tales.2.4.6.1.Why the huntsman is underrated',
                    'Tales.5.f.Cinderella and the pumpkin carriage'],
                    'Subheading': ['NaN', 
                                   'Tales.3.2.Dancing Shoes, ballgowns and frogs',
                                   'NaN',
                                   'NaN',
                                   'Ogres-1.1.The wondrous world of Shrek',
                                   'Witches-1-4Maleficient and the malicious curse',
                                   'Tales.2.1.The big bad wolf',
                                   'NaN',
                                   'NaN',
                                   'Tales.5.f.Cinderella and the pumpkin carriage']}

Fairytales_expected_output = pd.DataFrame.from_dict(Fairytales_expected_output)

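I have been struggling to find a way to make my pattern match only subheadings. No matter what I try, it still includes first-level or third-level headings. This question is roughly the same, but it is in C#, so I could not apply it to my use case.

Here is what I have tried so far: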

Fairytales_in['Subheading'] = Fairytales_in.Titles.str.extract(r'(^(?:\w+\.|\-\d{1}\.\d{1}\.)\W*(?:\w+\b\W*){1,100})$')

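But as you can see, it does not produce the expected result. I have been experimenting on regex101.com, but I have now been stuck for two days. Any help fixing my pattern would be much appreciated!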

Answer 1

You can use
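One possible pattern, offered as a sketch rather than the answerer's exact solution, treats a subheading as: a name, a separator (`.` or `-`), a first level, another separator, a second level, an optional dot, and then the title text. A negative lookahead rejects titles where a third level follows. The definition is an assumption inferred from the sample data: levels are taken to be numbers or single letters, a letter level must be followed by a dot, and title text is assumed not to start with a digit. The names `SUBHEADING_PATTERN`, `SUBHEADING_RE` and `subheading_or_none` are made up for illustration.

```python
import re

# Assumed shape of a subheading (inferred from the sample titles):
#   name, separator, level 1, separator, level 2, optional dot, title text
SUBHEADING_PATTERN = (
    r"^[A-Za-z]+[.-]"                # name and first separator, e.g. "Tales."
    r"(?:\d+|[A-Za-z])[.-]"          # level 1 and second separator, e.g. "3."
    r"(?:\d+|[A-Za-z](?=\.))"        # level 2; a letter level must be followed by "."
    r"(?![.-]?(?:\d|[A-Za-z][.-]))"  # reject the title if a third level follows
    r"\.?.+$"                        # optional dot, then the title text
)
SUBHEADING_RE = re.compile(SUBHEADING_PATTERN)


def subheading_or_none(title):
    """Return the title unchanged if it looks like a second-level heading, else None."""
    return title if SUBHEADING_RE.match(title) else None


titles = [
    'Fairy Tales',
    'Tales.3.2.Dancing Shoes, ballgowns and frogs',
    'Tales.2.4.6.Red Riding Hood',
    'Fairies.1Your own Fairy godmother',
    'Ogres-1.1.The wondrous world of Shrek',
    'Witches-1-4Maleficient and the malicious curse',
    'Tales.2.1.The big bad wolf',
    'Tales.2.Little Red riding Hood',
    'Tales.2.4.6.1.Why the huntsman is underrated',
    'Tales.5.f.Cinderella and the pumpkin carriage',
]

# Keep the full title for second-level headings, None for everything else.
subheadings = [subheading_or_none(t) for t in titles]
```

With pandas, the same pattern string could be applied to the column as `Fairytales_in['Subheading'] = Fairytales_in['Titles'].where(Fairytales_in['Titles'].str.match(SUBHEADING_PATTERN))`, which keeps the title where the pattern matches and leaves a real NaN elsewhere (the expected output in the question uses the string 'NaN' instead).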
