Finding the reporting hierarchy with Python

Question    Votes: 0    Answers: 2

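I have a DataFrame with three columns: username, designation code, and vertical report. Each username has a designation, and the person that user reports to is given in the vertical report column.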

DATASET


USERNAME              DESIGNATION CODE    VERTICAL REPORT
RAJKUMAR.MALVIYA           BM              GOURAV.MOD
GOURAV.MOD                 ASM             PRASANNA.NAIK
PRASANNA.NAIK              RSM             MILIND.DESHMUKH
SANJAY.BHATNAGAR           NSM             ARUN
UMANG.GOHIL1               BM              HIREN.JASANI
MILIND.DESHMUKH            ZSM             SANJAY.BHATNAGAR 
HIREN.JASANI               ASM             BHAVIN.GANDHI
SACHIN.PAWAR               NSM             ARUN
BHAVIN.GANDHI              DSM             ANURAG.JOSHI
ANURAG.JOSHI               ZSM             SACHIN.PAWAR
SACHIN.PAWAR               NSM             ARUN
SANGRAM.KEDARI             BM              NIKHIL.BELKHEDE
NIKHIL.BELKHEDE            DSM             SACHIN.PAWAR 
SACHIN.PAWAR               NSM             ARUN

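This is my dataset. The first user, RAJKUMAR.MALVIYA, has the designation code BM and reports vertically to GOURAV.MOD; similarly, GOURAV.MOD has the designation ASM and reports to PRASANNA.NAIK, and so on. The usernames and vertical reports together form a hierarchy, which I want to resolve in Python. If a designation is missing from a chain, it should be filled with L plus the column number.

Source code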


import pandas as pd

# Initialize designation dictionary
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']
designation_dict = {designation: [] for designation in designations}

# Iterate through each row in the DataFrame
for _, row in df.iterrows():
    username = row['User Name']
    designation = row['Designation Code']
    vertical = row['Vertical Report']

    # Check if the designation is not 'DST'
    if designation.strip() and designation != 'DST':
        # Append username to the appropriate designation list
        designation_dict[designation].append(username)

        # Find the row corresponding to the vertical report
        search_row = df[df['User Name'] == vertical]

        # If a match is found, update username, designation, and vertical
        if not search_row.empty:
            username = search_row.iloc[0]['User Name']
            print(username)
            designation = search_row.iloc[0]['Designation Code']
            print(designation)
            vertical = search_row.iloc[0]['Vertical Report']
            print(vertical)

# Fill any remaining blank cells with 'BLANK'
max_length = max(len(lst) for lst in designation_dict.values())
for key in designation_dict:
    while len(designation_dict[key]) < max_length:
        designation_dict[key].append('BLANK')

# Create DataFrame from the designation dictionary
output_df = pd.DataFrame(designation_dict)


EXPECTED OUTPUT DATAFRAME

BM                ASM           CSM  DSM              RSM            ZSM             NSM
RAJKUMAR.MALVIYA  GOURAV.MOD    L3   L4               PRASANNA.NAIK  MILIND.DESHMUK  SANJAY.BHATNAG
UMANG.GOHIL1      HIREN.JASANI  L3   BHAVIN.GANDHI    L5             ANURAG.JOSHI    SACHIN.PAWAR
SANGRAM.KEDARI    L2            L3   NIKHIL.BELKHEDE  L5             L6              SACHIN.PAWAR
python pandas
2 Answers
0
votes

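Logic

  • Using pandas indexing, we can filter out the subset of the data that belongs to each designation.

    .loc[]
    is primarily label based, but it can also be used with a boolean array.

  • The number of rows is not the same for every designation. To make them equal, we first find the maximum number of rows across all columns:

max([len(i) for i in dict_.values()])

  • From the current column's length, we can work out how many extra padding entries that column needs:

max([len(i) for i in dict_.values()]) - len(dict_[designation])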
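
To make the two steps above concrete, here is a minimal sketch on a hypothetical three-row frame (the names `toy`, `columns`, and `padded` are illustrative and not part of the original answer). It filters with a boolean mask inside `.loc[]`, then pads the shorter list with `L` plus the column number.

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset
toy = pd.DataFrame({
    'USERNAME': ['A', 'B', 'C'],
    'DESIGNATION CODE': ['BM', 'BM', 'ASM'],
})

# A boolean mask inside .loc[] selects the rows for one designation
bm_users = toy.loc[toy['DESIGNATION CODE'] == 'BM', 'USERNAME'].tolist()
asm_users = toy.loc[toy['DESIGNATION CODE'] == 'ASM', 'USERNAME'].tolist()

columns = {'BM': bm_users, 'ASM': asm_users}

# Length of the longest column
longest = max(len(users) for users in columns.values())

# Pad each shorter list with L<column number> until all lengths match
for position, users in enumerate(columns.values(), start=1):
    users.extend([f'L{position}'] * (longest - len(users)))

padded = pd.DataFrame(columns)
print(padded)
```

Here the ASM column has one user fewer than BM, so it receives a single `L2` entry.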

Solution

import pandas as pd

data = pd.DataFrame(
    {
        'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK', 'SANJAY.BHATNAGAR', 'UMANG.GOHIL1', 'MILIND.DESHMUKH', 'HIREN.JASANI', 'SACHIN.PAWAR', 'BHAVIN.GANDHI', 'ANURAG.JOSHI', 'SACHIN.PAWAR', 'SANGRAM.KEDARI', 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR'],
        'DESIGNATION CODE': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM', 'ASM', 'NSM', 'DSM', 'ZSM', 'NSM', 'BM', 'DSM', 'NSM'],
        'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH', 'ARUN', 'HIREN.JASANI', 'SANJAY.BHATNAGAR', 'BHAVIN.GANDHI', 'ARUN', 'ANURAG.JOSHI', 'SACHIN.PAWAR', 'ARUN', 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR', 'ARUN']
    }
)

designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']
dict_ = {}

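# Collect the unique usernames for each designation.
# Note: set() removes duplicates but does not preserve the original row order.
for designation in designations:
    dict_[designation] = list(set(data.loc[data['DESIGNATION CODE'] == designation]['USERNAME'].tolist()))

# Pad every list up to the longest one with L<column number>
for index, designation in enumerate(designations):
    dict_[designation].extend([f'L{index+1}']*(max([len(i) for i in dict_.values()]) - len(dict_[designation])))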

df = pd.DataFrame(dict_)

Output

                 BM           ASM CSM              DSM            RSM              ZSM               NSM
0  RAJKUMAR.MALVIYA    GOURAV.MOD  L3  NIKHIL.BELKHEDE  PRASANNA.NAIK     ANURAG.JOSHI  SANJAY.BHATNAGAR
1      UMANG.GOHIL1  HIREN.JASANI  L3    BHAVIN.GANDHI             L5  MILIND.DESHMUKH      SACHIN.PAWAR
2    SANGRAM.KEDARI            L2  L3               L4             L5               L6                L7
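
Note that grouping by designation alone puts users in the right columns but does not keep each BM on the same row as their own managers. In the output above, row 0 pairs RAJKUMAR.MALVIYA with NIKHIL.BELKHEDE, who is not in his reporting chain. If the goal is the row-per-chain layout from the question, one possible approach is to walk upward from each BM through VERTICAL REPORT. The following is only a sketch under some assumptions: `build_hierarchy` is a hypothetical helper name, duplicate rows are dropped so each username appears once, and a chain stops when the next manager (for example ARUN) has no row of their own.

```python
import pandas as pd

data = pd.DataFrame({
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK',
                 'SANJAY.BHATNAGAR', 'UMANG.GOHIL1', 'MILIND.DESHMUKH',
                 'HIREN.JASANI', 'SACHIN.PAWAR', 'BHAVIN.GANDHI',
                 'ANURAG.JOSHI', 'SANGRAM.KEDARI', 'NIKHIL.BELKHEDE'],
    'DESIGNATION CODE': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM',
                         'ASM', 'NSM', 'DSM', 'ZSM', 'BM', 'DSM'],
    'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH',
                        'ARUN', 'HIREN.JASANI', 'SANJAY.BHATNAGAR',
                        'BHAVIN.GANDHI', 'ARUN', 'ANURAG.JOSHI',
                        'SACHIN.PAWAR', 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR'],
})
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']


def build_hierarchy(df, designations, start='BM'):
    """Hypothetical helper: one output row per `start`-level user's chain."""
    # One record per user, indexed by username for fast lookup
    lookup = df.drop_duplicates('USERNAME').set_index('USERNAME')
    rows = []
    for user in lookup[lookup['DESIGNATION CODE'] == start].index:
        # Begin with placeholders L1..Ln, then overwrite the levels we find
        row = {d: f'L{i}' for i, d in enumerate(designations, start=1)}
        seen = set()
        while user in lookup.index and user not in seen:
            seen.add(user)  # guard against circular reporting lines
            code = lookup.at[user, 'DESIGNATION CODE']
            if code in row:
                row[code] = user
            user = lookup.at[user, 'VERTICAL REPORT']
        rows.append(row)
    return pd.DataFrame(rows, columns=designations)


result = build_hierarchy(data, designations)
print(result)
```

On this data each BM ends up on one row together with the managers in their own chain, and levels that the chain skips keep their `L<column number>` placeholder. If the real data contains stray whitespace in the name columns, stripping it first (for example with `.str.strip()`) would be needed for the lookups to match.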

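0
votes

Hope you are doing well and that this helps.

How does it work?

  • Group the usernames by designation and convert each group to a list, creating a new DataFrame called df_designation.
  • Initialize a new, empty DataFrame called df_vertical.
  • Iterate over each designation in the designations list and assign the corresponding usernames to that column of df_vertical.
  • Use the explode method to expand each list of usernames into separate rows, then reset the index.
  • Replace the NaN values in df_vertical with strings of the form f'L{n}', where n is the column number.
  • Print the resulting DataFrame.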


# Import the pandas library
import pandas as pd

# Data for the original dataframe
data = {
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK', 
                 'SANJAY.BHATNAGAR', 'UMANG.GOHIL1', 'MILIND.DESHMUKH', 
                 'HIREN.JASANI', 'SACHIN.PAWAR', 'BHAVIN.GANDHI', 
                 'ANURAG.JOSHI', 'SANGRAM.KEDARI', 'NIKHIL.BELKHEDE', 
                 'SACHIN.PAWAR'],
    'DESIGNATION': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM', 'ASM', 'NSM', 
                    'DSM', 'ZSM', 'BM', 'DSM', 'NSM'],
    'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH', 
                        'ARUN', 'HIREN.JASANI', 'SANJAY.BHATNAGAR', 
                        'BHAVIN.GANDHI', 'ARUN', 'ANURAG.JOSHI', 
                        'SACHIN.PAWAR', 'NIKHIL.BELKHEDE', 
                        'SACHIN.PAWAR', 'ARUN']
}

# List of designations
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']

# Create the original dataframe
df = pd.DataFrame(data)
df_vertical = pd.DataFrame()

# Group usernames by designation
df_designation = df.groupby(
    'DESIGNATION')['USERNAME'].apply(list).reset_index()

# Assign usernames to each column
for des in designations:
    # Select this designation's username list, explode it into rows, and reset the index
    mask = df_designation['DESIGNATION'] == des
    df_vertical[des] = (
        df_designation.loc[mask, 'USERNAME'].explode().reset_index(drop=True)
    )

# Replace NaN with f'L{n}' where n is the column number
for col in df_vertical.columns:
    # Get the column number
    n = df_vertical.columns.get_loc(col) + 1  
    
    # Fill NaN values with f'L{n}'
    df_vertical[col] = df_vertical[col].fillna(f'L{n}')

# Print the result
print(df_vertical)
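
One caveat with the code above: df_vertical takes its index from the first column assigned (BM), so a later designation with more users than BM would lose its extra rows when it is aligned to that index. A possible way around this, shown here only as a sketch on hypothetical toy data (the names `toy` and `wide` are illustrative), is to build one Series per designation and combine them with `pd.concat(..., axis=1)`, which aligns on the union of the indexes.

```python
import pandas as pd

designations = ['BM', 'ASM', 'CSM']

# Hypothetical toy data in which ASM has more users than BM
toy = pd.DataFrame({
    'USERNAME': ['A', 'B', 'C', 'D'],
    'DESIGNATION': ['BM', 'ASM', 'ASM', 'ASM'],
})

# One Series per designation, each re-indexed from 0
columns = {
    des: toy.loc[toy['DESIGNATION'] == des, 'USERNAME'].reset_index(drop=True)
    for des in designations
}

# concat keeps every row index that appears in any column
wide = pd.concat(columns, axis=1)

# Fill the gaps with L<column number>
for n, col in enumerate(wide.columns, start=1):
    wide[col] = wide[col].fillna(f'L{n}')

print(wide)
```

With this layout the ASM column keeps all three of its users, while BM and CSM are padded with `L1` and `L3`.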

