How do I add the following custom columns to a DataFrame?

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','','Steve','Tom','Jack',
   'Lee','David','','Betina','Andres']),
   # assumption: the blank Age slot (a syntax error as posted) is a missing value
   'Age':pd.Series([25,np.nan,25,23,30,29,23,'NULL',40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)

summary = df.describe(include='all').T
print(summary)

How can I create two columns, total_duplicate_value_count and total_null_value_count, and then add them to the existing summary DataFrame?

Expected output:
column_name     total_null_value_count  total_duplicate_value_count    count  ...
Name            2                       1                              12     ...
Age             2                       3                              12     ...
Rating          0                       0                              12     ...
1 Answer

First append the null counts from isna().sum() as a new row, transpose, and then add a new column holding the difference between count and unique as the duplicate count.

# DataFrame.append was removed in pandas 2.0, so pd.concat adds the null counts as a new row
summary = pd.concat([df.describe(include='all'), df.isna().sum().rename('total_null_value_count').to_frame().T]).T.assign(total_duplicate_count=lambda s: s['count'] - s['unique'])
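The approach above counts only real NaN values, so the empty Name strings and the literal 'NULL' in Age are not treated as missing, and Rating's duplicate count comes out as NaN because describe() reports no unique value for numeric columns. Below is a minimal sketch that reproduces the null and duplicate counts in the expected output, under two assumptions inferred from that output: '' and 'NULL' count as missing, and a duplicate is each extra occurrence of a non-missing value. The names MISSING_MARKERS, missing, and extra are hypothetical helpers introduced for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan stands in for the blank Age entry (assumption).
d = {
    "Name": pd.Series(["Tom", "James", "Ricky", "", "Steve", "Tom", "Jack",
                       "Lee", "David", "", "Betina", "Andres"]),
    "Age": pd.Series([25, np.nan, 25, 23, 30, 29, 23, "NULL", 40, 30, 51, 46]),
    "Rating": pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8,
                         3.78, 2.98, 4.80, 4.10, 3.65]),
}
df = pd.DataFrame(d)

# Assumption: empty strings and the literal "NULL" also count as missing.
MISSING_MARKERS = ["", "NULL"]
missing = df.isna() | df.isin(MISSING_MARKERS)

# Nulls per column: real NaN values plus the placeholder markers.
null_counts = missing.sum()

# Duplicates per column: among non-missing entries, every occurrence
# after the first one of a value is counted.
dup_counts = pd.Series(
    {col: int(df.loc[~missing[col], col].duplicated().sum()) for col in df.columns}
)

# Existing summary from the question, one row per original column.
summary = df.describe(include="all").T

# Put the two new columns first; concat aligns rows on the column names.
extra = pd.DataFrame({
    "total_null_value_count": null_counts,
    "total_duplicate_value_count": dup_counts,
})
summary = pd.concat([extra, summary], axis=1)
summary.index.name = "column_name"
print(summary)
```

The count column still comes from describe() on the raw frame, so it treats '' and 'NULL' as present values.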