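I have a DataFrame with three columns: username, designation code, and vertical report. Each username has a designation, and the person that user reports to is listed in the vertical report column.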
DATASET
USERNAME          DESIGNATION CODE  VERTICAL REPORT
RAJKUMAR.MALVIYA  BM                GOURAV.MOD
GOURAV.MOD        ASM               PRASANNA.NAIK
PRASANNA.NAIK     RSM               MILIND.DESHMUKH
SANJAY.BHATNAGAR  NSM               ARUN
UMANG.GOHIL1      BM                HIREN.JASANI
MILIND.DESHMUKH   ZSM               SANJAY.BHATNAGAR
HIREN.JASANI      ASM               BHAVIN.GANDHI
SACHIN.PAWAR      NSM               ARUN
BHAVIN.GANDHI     DSM               ANURAG.JOSHI
ANURAG.JOSHI      ZSM               SACHIN.PAWAR
SACHIN.PAWAR      NSM               ARUN
SANGRAM.KEDARI    BM                NIKHIL.BELKHEDE
NIKHIL.BELKHEDE   DSM               SACHIN.PAWAR
SACHIN.PAWAR      NSM               ARUN
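This is my dataset. The first user, RAJKUMAR.MALVIYA, has the designation code BM and reports to GOURAV.MOD. Similarly, GOURAV.MOD has the designation ASM and reports to PRASANNA.NAIK, and so on. The usernames and vertical reports therefore form a hierarchy, and I want to resolve this hierarchy in Python. If a designation is missing from a chain, it should be filled with L followed by the column number.
Source code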
`import pandas as pd

# Assumes df is already loaded with the columns
# 'User Name', 'Designation Code' and 'Vertical Report'.

# Initialize designation dictionary
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']
designation_dict = {designation: [] for designation in designations}

# Iterate through each row in the DataFrame
for _, row in df.iterrows():
    username = row['User Name']
    designation = row['Designation Code']
    vertical = row['Vertical Report']

    # Skip blank designations and 'DST'
    if designation.strip() and designation != 'DST':
        # Append username to the appropriate designation list
        designation_dict[designation].append(username)

        # Find the row corresponding to the vertical report
        search_row = df[df['User Name'] == vertical]

        # If a match is found, update username, designation, and vertical
        if not search_row.empty:
            username = search_row.iloc[0]['User Name']
            print(username)
            designation = search_row.iloc[0]['Designation Code']
            print(designation)
            vertical = search_row.iloc[0]['Vertical Report']
            print(vertical)

# Pad the shorter lists with 'BLANK' so every column has the same length
max_length = max(len(lst) for lst in designation_dict.values())
for key in designation_dict:
    while len(designation_dict[key]) < max_length:
        designation_dict[key].append('BLANK')

# Create DataFrame from the designation dictionary
output_df = pd.DataFrame(designation_dict)
`
EXPECTED OUTPUT DATAFRAME
BM                ASM           CSM  DSM              RSM            ZSM             NSM
RAJKUMAR.MALVIYA  GOURAV.MOD    L3   L4               PRASANNA.NAIK  MILIND.DESHMUK  SANJAY.BHATNAG
UMANG.GOHIL1      HIREN.JASANI  L3   BHAVIN.GANDHI    L5             ANURAG.JOSHI    SACHIN.PAWAR
SANGRAM.KEDARI    L2            L3   NIKHIL.BELKHEDE  L5             L6              SACHIN.PAWAR
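Logic
Using pandas indexing, we can filter out the subset of rows that belongs to each designation. .loc[] is primarily label-based, but it also accepts a boolean array.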
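As a small illustration of this kind of boolean filtering, here is a minimal sketch on a three-row sample. The sample rows and the variable names `is_bm` and `bm_users` are only for illustration; the column names follow the dataset in the question.

```python
import pandas as pd

# Small sample using the same column names as the question's dataset.
data = pd.DataFrame({
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'UMANG.GOHIL1'],
    'DESIGNATION CODE': ['BM', 'ASM', 'BM'],
})

# Boolean mask: True for the rows whose designation is 'BM'.
is_bm = data['DESIGNATION CODE'] == 'BM'

# .loc[mask, column] keeps the matching rows and selects a single column.
bm_users = data.loc[is_bm, 'USERNAME'].tolist()
print(bm_users)  # ['RAJKUMAR.MALVIYA', 'UMANG.GOHIL1']
```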
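The number of usernames differs from one designation to another. To make the columns equal in length, we first find the maximum number of rows across all columns:
max([len(i) for i in dict_.values()])
Each column is then padded with as many filler values as the difference between that maximum and its own length:
max([len(i) for i in dict_.values()]) - len(dict_[designation])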
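The padding arithmetic can be checked on a toy dictionary. This is only a sketch: the usernames 'A' to 'D' are placeholders, not values from the dataset, and `max_len` and `missing` are names introduced here.

```python
designations = ['BM', 'ASM', 'CSM']
# Placeholder usernames; the lists deliberately have different lengths.
dict_ = {'BM': ['A', 'B', 'C'], 'ASM': ['D'], 'CSM': []}

# Longest column decides the target length.
max_len = max(len(v) for v in dict_.values())

for index, designation in enumerate(designations):
    # Number of filler values this column still needs.
    missing = max_len - len(dict_[designation])
    # Filler is 'L' plus the 1-based column number.
    dict_[designation].extend([f'L{index + 1}'] * missing)

print(dict_)
# {'BM': ['A', 'B', 'C'], 'ASM': ['D', 'L2', 'L2'], 'CSM': ['L3', 'L3', 'L3']}
```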
Solution
import pandas as pd

data = pd.DataFrame(
    {
        'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK', 'SANJAY.BHATNAGAR', 'UMANG.GOHIL1', 'MILIND.DESHMUKH', 'HIREN.JASANI', 'SACHIN.PAWAR', 'BHAVIN.GANDHI', 'ANURAG.JOSHI', 'SACHIN.PAWAR', 'SANGRAM.KEDARI', 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR'],
        'DESIGNATION CODE': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM', 'ASM', 'NSM', 'DSM', 'ZSM', 'NSM', 'BM', 'DSM', 'NSM'],
        'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH', 'ARUN', 'HIREN.JASANI', 'SANJAY.BHATNAGAR', 'BHAVIN.GANDHI', 'ARUN', 'ANURAG.JOSHI', 'SACHIN.PAWAR', 'ARUN', 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR', 'ARUN']
    }
)
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']

# Collect the unique usernames for each designation.
# Note: set() does not preserve order, so the order within a column may vary.
dict_ = {}
for designation in designations:
    dict_[designation] = list(set(data.loc[data['DESIGNATION CODE'] == designation, 'USERNAME'].tolist()))

# Pad every list to the longest length with 'L' plus the column number.
max_len = max(len(i) for i in dict_.values())
for index, designation in enumerate(designations):
    dict_[designation].extend([f'L{index+1}'] * (max_len - len(dict_[designation])))

df = pd.DataFrame(dict_)
Output
   BM                ASM           CSM  DSM              RSM            ZSM              NSM
0  RAJKUMAR.MALVIYA  GOURAV.MOD    L3   NIKHIL.BELKHEDE  PRASANNA.NAIK  ANURAG.JOSHI     SANJAY.BHATNAGAR
1  UMANG.GOHIL1      HIREN.JASANI  L3   BHAVIN.GANDHI    L5             MILIND.DESHMUKH  SACHIN.PAWAR
2  SANGRAM.KEDARI    L2            L3   L4               L5             L6               L7
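Note that this output groups usernames by designation only, so a row does not necessarily follow one reporting line (NIKHIL.BELKHEDE shares a row with RAJKUMAR.MALVIYA although neither reports to the other). If the rows should follow the reporting chain as in the expected output, one possible approach is to start from each BM and climb through VERTICAL REPORT. The sketch below is not part of the solution above. It assumes that every chain starts at a BM, that a chain ends when the manager (here ARUN) has no row of their own, and that each username has a single designation. The names `users`, `chain`, and `hierarchy` are introduced here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK', 'SANJAY.BHATNAGAR',
                 'UMANG.GOHIL1', 'MILIND.DESHMUKH', 'HIREN.JASANI', 'SACHIN.PAWAR',
                 'BHAVIN.GANDHI', 'ANURAG.JOSHI', 'SACHIN.PAWAR', 'SANGRAM.KEDARI',
                 'NIKHIL.BELKHEDE', 'SACHIN.PAWAR'],
    'DESIGNATION CODE': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM', 'ASM', 'NSM',
                         'DSM', 'ZSM', 'NSM', 'BM', 'DSM', 'NSM'],
    'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH', 'ARUN',
                        'HIREN.JASANI', 'SANJAY.BHATNAGAR', 'BHAVIN.GANDHI', 'ARUN',
                        'ANURAG.JOSHI', 'SACHIN.PAWAR', 'ARUN', 'NIKHIL.BELKHEDE',
                        'SACHIN.PAWAR', 'ARUN'],
})
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']

# One record per user; the repeated SACHIN.PAWAR rows collapse into one.
users = data.drop_duplicates('USERNAME').set_index('USERNAME')

rows = []
# Assumption: every chain starts at the lowest designation (BM).
for start in users[users['DESIGNATION CODE'] == designations[0]].index:
    # Begin with the L<n> placeholders, then overwrite them while climbing.
    chain = {des: f'L{n}' for n, des in enumerate(designations, start=1)}
    current, seen = start, set()
    # Stop at a manager without a row of their own, or if a cycle appears.
    while current in users.index and current not in seen:
        seen.add(current)
        chain[users.at[current, 'DESIGNATION CODE']] = current
        current = users.at[current, 'VERTICAL REPORT']
    rows.append(chain)

hierarchy = pd.DataFrame(rows, columns=designations)
print(hierarchy)
```

With the sample data this gives one row per BM, with the missing levels left as L2 to L6, which matches the layout of the expected output in the question.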
I hope this helps.
How does it work?
`
# Import the pandas library
import pandas as pd

# Raw data for the dataframe
data = {
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'PRASANNA.NAIK',
                 'SANJAY.BHATNAGAR', 'UMANG.GOHIL1', 'MILIND.DESHMUKH',
                 'HIREN.JASANI', 'SACHIN.PAWAR', 'BHAVIN.GANDHI',
                 'ANURAG.JOSHI', 'SANGRAM.KEDARI', 'NIKHIL.BELKHEDE',
                 'SACHIN.PAWAR'],
    'DESIGNATION': ['BM', 'ASM', 'RSM', 'NSM', 'BM', 'ZSM', 'ASM', 'NSM',
                    'DSM', 'ZSM', 'BM', 'DSM', 'NSM'],
    'VERTICAL REPORT': ['GOURAV.MOD', 'PRASANNA.NAIK', 'MILIND.DESHMUKH',
                        'ARUN', 'HIREN.JASANI', 'SANJAY.BHATNAGAR',
                        'BHAVIN.GANDHI', 'ARUN', 'ANURAG.JOSHI',
                        'SACHIN.PAWAR', 'NIKHIL.BELKHEDE',
                        'SACHIN.PAWAR', 'ARUN']
}

# List of designations, in column order
designations = ['BM', 'ASM', 'CSM', 'DSM', 'RSM', 'ZSM', 'NSM']

# Create the original dataframe
df = pd.DataFrame(data)

# Group usernames by designation (one list of usernames per designation)
df_designation = df.groupby(
    'DESIGNATION')['USERNAME'].apply(list).reset_index()

# Build one column per designation
columns = {}
for des in designations:
    # Use explode to expand the list into separate rows and reset the index
    columns[des] = (
        df_designation.loc[df_designation['DESIGNATION'] == des, 'USERNAME']
        .explode()
        .reset_index(drop=True)
    )

# A dict of Series is aligned on the union of their indexes,
# so shorter columns are padded with NaN instead of truncating longer ones
df_vertical = pd.DataFrame(columns)

# Replace NaN with f'L{n}' where n is the column number
for col in df_vertical.columns:
    # Get the 1-based column number
    n = df_vertical.columns.get_loc(col) + 1
    # Fill NaN values with f'L{n}'
    df_vertical[col] = df_vertical[col].fillna(f'L{n}')

# Print the result
print(df_vertical)
`
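To see what the grouping and explode steps in the code above do, here is a minimal sketch on a three-row sample. The sample rows and the names `grouped` and `bm` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'USERNAME': ['RAJKUMAR.MALVIYA', 'GOURAV.MOD', 'UMANG.GOHIL1'],
    'DESIGNATION': ['BM', 'ASM', 'BM'],
})

# Step 1: one row per designation, each holding a list of its usernames.
grouped = df.groupby('DESIGNATION')['USERNAME'].apply(list).reset_index()

# Step 2: select the BM row and explode its list back into one row per user.
bm = (
    grouped.loc[grouped['DESIGNATION'] == 'BM', 'USERNAME']
    .explode()
    .reset_index(drop=True)
)
print(bm.tolist())  # ['RAJKUMAR.MALVIYA', 'UMANG.GOHIL1']
```

Resetting the index after `explode` matters because all exploded rows otherwise keep the index of the row they came from, which would prevent the columns from lining up row by row.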