How to update a pandas DataFrame column of strings

Question (votes: 0, answers: 2)

I have a DataFrame ("sp500news") that looks like this:

        date_publish         title
79944   2007-01-29 19:08:35  Microsoft Vista corporate sales go very well
181781  2007-12-14 19:39:06  Williams No Anglican consensus on Episcopal Church
213175  2008-01-22 11:17:19  CSX quarterly profit rises
93554   2008-01-22 18:52:56  Citigroup says 30 bln capital helps exceed target
...

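I am trying to replace each company name in the "title" column with its corresponding ticker from the "Symbol" column of another DataFrame ("constituents"), which looks like this: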

     Symbol  Name        Sector
0    MMM     3M          Industrials
1    AOS     A.O. Smith  Industrials
2    ABT     Abbott      Health Care
3    ABBV    AbbVie      Health Care
...
116  C       Citigroup   Financials
...

I have tried:

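for item in sp500news['title']:
    for word in item:  # iterates over characters, not words
        if word in constituents['Name']:  # 'in' on a Series checks the index labels, not the values
            indx = constituents['Name'].index(word)  # Series.index is an attribute, not a method
            str.replace(word, constituents['Symbol'][indx])  # misses an argument, and any result is never stored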
Tags: python, pandas
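For comparison, here is a minimal sketch of what the loop above appears to be aiming for. It assumes each company name is a single word, so a multi-word name such as A.O. Smith is never matched. It builds a plain dict from the name and symbol columns, splits each title on whitespace, swaps any word found in the dict, and assigns the result back to the column, because Python strings are immutable and the replaced text has to be stored explicitly. The rows are copied from the question; replace_words is a hypothetical helper name.

```python
import pandas as pd

# A few rows copied from the question's sp500news and constituents frames
sp500news = pd.DataFrame(
    {'title': ['Microsoft Vista corporate sales go very well',
               'CSX quarterly profit rises',
               'Citigroup says 30 bln capital helps exceed target']},
    index=[79944, 213175, 93554],
)
constituents = pd.DataFrame({'Symbol': ['MMM', 'AOS', 'ABT', 'ABBV', 'C'],
                             'Name': ['3M', 'A.O. Smith', 'Abbott', 'AbbVie', 'Citigroup']})

# Look names up in a dict built from the column values, not in the Series itself
name_to_symbol = dict(zip(constituents['Name'], constituents['Symbol']))


def replace_words(title):
    """Hypothetical helper: swap each whitespace-separated word that is a known name."""
    return ' '.join(name_to_symbol.get(word, word) for word in title.split())


# apply() builds a new Series; assigning it back is what updates the column
sp500news['title'] = sp500news['title'].apply(replace_words)
print(sp500news)
```

Because split() and ' '.join() are used, runs of whitespace inside a title collapse to a single space.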
2 answers

Answer 1 (0 votes)

Try the following code:

import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})

constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

# Replace every occurrence of each company name with its symbol;
# regex=False makes names such as 'A.O. Smith' match literally
for name, symbol in zip(constituents['name'], constituents['symbol']):
    df['title'] = df['title'].str.replace(name, symbol, regex=False)

Output:

                                           title
0      C says 30 bln capital helps exceed target
1  WLM No Anglican consensus on Episcopal Church
2         MCR Vista corporate sales go very well

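I basically just copied a few rows of your sp500news['title'] and made up some constituents['Name'] entries to demonstrate the transformation. Essentially, I use the .str string accessor of the "title" pd.Series in sp500news, which lets me apply replace to every title and substitute the symbol wherever a matching company name is found.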
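A possible refinement, sketched under my own assumptions rather than taken from the answer above: calling str.replace once per constituent rescans the whole column for every name, replaces names inside longer tokens (for example '3M' inside '23M'), and lets a later name match text produced by an earlier replacement. On pandas versions before 2.0, str.replace also defaulted to regex=True, so the '.' in a name like 'A.O. Smith' acted as a wildcard. The sketch below builds one escaped alternation pattern and passes a callable to Series.str.replace with regex=True, so each title is processed in a single pass. The fourth title and the 'AOS' row are made-up sample data.

```python
import re

import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well',
                             'A.O. Smith and 3M report earnings']})  # last row is hypothetical

constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM', 'AOS'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams', 'A.O. Smith']})

name_to_symbol = dict(zip(constituents['name'], constituents['symbol']))

# Longest names first so a longer name wins over a shorter one at the same position;
# re.escape makes '.' and other metacharacters literal
alternation = '|'.join(re.escape(n) for n in sorted(name_to_symbol, key=len, reverse=True))
# Lookarounds instead of \b so names that start or end with punctuation still match
pattern = rf'(?<!\w)(?:{alternation})(?!\w)'

# The callable receives each re.Match and returns the symbol for the matched name
df['title'] = df['title'].str.replace(
    pattern, lambda m: name_to_symbol[m.group(0)], regex=True)
print(df)
```

Sorting longest first matters because regex alternation takes the first alternative that matches: with a hypothetical pair 'Microsoft' and 'Microsoft Vista', the shorter name would otherwise be replaced and leave " Vista" behind.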


Answer 2 (1 vote)

Try this:

Here are dummy DataFrames that represent the data:

import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.0.', 'Abbot']})
df1
  Symbol  Name
0 MV      Microsoft Vista
1 AOS     A.0.
2 ABT     Abbot

df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})
df2
   title   comment
0  79944   Microsoft Vista corporate sales go very well
1  181781  Abbot consensus on Episcopal Church
2  213175  A.O. says 30 bln captial helps exceed target

Make a dictionary that maps each name to its respective symbol:

rep = dict(zip(df1.Name,df1.Symbol))
rep

{'Microsoft Vista': 'MV', 'A.0.': 'AOS', 'Abbot': 'ABT'}

Use the Series.replace method to do the replacement:

df2['comment'] = df2['comment'].replace(rep, regex = True)
df2
   title    comment
0   79944   MV corporate sales go very well
1   181781  ABT consensus on Episcopal Church
2   213175  A.O. says 30 bln captial helps exceed target
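Note that the third row is left unchanged in this output. With regex=True, each key of rep is interpreted as a regular expression, so '.' matches any character, and the key 'A.0.' in df1 is spelled with a zero rather than the letter O, so it cannot match "A.O." in the text. A minimal sketch of one fix, assuming the key is corrected to 'A.O.', passes every name through re.escape so it is matched literally before calling Series.replace:

```python
import re

import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.O.', 'Abbot']})  # 'A.O.' with the letter O
df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})

# Escape each name so '.' and other metacharacters are matched literally
rep = {re.escape(name): symbol for name, symbol in zip(df1.Name, df1.Symbol)}

# With regex=True, each dict key is a pattern and each value its replacement
df2['comment'] = df2['comment'].replace(rep, regex=True)
print(df2)
```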
© www.soinside.com 2019 - 2024. All rights reserved.