Add a column with a single categorical value to a pandas DataFrame

Question

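I have a `pandas.DataFrame` `df` and want to add a new column `col` that contains only the value `"hello"`. I want this column to have dtype `category`, with `"hello"` as its single category. I can do the following: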

df["col"] = "hello"
df["col"] = df["col"].astype("category")

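Do I really need to write `df["col"]` three times to achieve this?
After the first line, I worry that the intermediate DataFrame `df` may take up a lot of memory before the new column is converted to categorical. (The DataFrame is quite large, with millions of rows, and the value `"hello"` is actually a much longer string.)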

Is there another direct, "short" way to do this that avoids the issues above?

Another solution is

df["col"] = pd.Categorical(itertools.repeat("hello", len(df)))

but it requires `itertools` and `len(df)`, and I'm not sure about its memory usage.
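To put rough numbers on the memory concern, here is a small sketch (not part of the original question) that compares the shallow memory usage of the intermediate column created by `df["col"] = ...` with the final categorical column. It uses `Series.memory_usage(index=False, deep=False)`, which counts only the underlying array buffers; the row count and the long placeholder string are arbitrary assumptions.

```python
import pandas as pd

# Assumed sizes: a moderately large frame and a long string value
n_rows = 100_000
long_value = "hello " * 20

df = pd.DataFrame({"a": range(n_rows)})

# Step 1 of the two-step approach: a plain (non-categorical) column
df["col"] = long_value
intermediate_bytes = df["col"].memory_usage(index=False, deep=False)

# Step 2: convert to categorical, which stores one small integer code per row
df["col"] = df["col"].astype("category")
categorical_bytes = df["col"].memory_usage(index=False, deep=False)

print(f"intermediate column: {intermediate_bytes:,} bytes")
print(f"categorical column:  {categorical_bytes:,} bytes")
```

When the intermediate column has object dtype on a 64-bit build, the first figure is roughly 8 bytes per row, since each entry is a pointer to the same string object rather than a copy of it; the categorical column needs roughly 1 byte per row for its codes plus the single category. Exact figures depend on the pandas version and its default string dtype.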

Answer 1

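Rather than building the column implicitly via `__setitem__` and then converting it, we can explicitly construct a Series of the correct size and dtype: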

df['col'] = pd.Series('hello', index=df.index, dtype='category')

Example program:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

df['col'] = pd.Series('hello', index=df.index, dtype='category')

print(df)
print(df.dtypes)
print(df['col'].cat.categories)

   a    col
0  1  hello
1  2  hello
2  3  hello

a         int64
col    category
dtype: object

Index(['hello'], dtype='object')
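As a quick sanity check (a sketch, not from the answer), the one-liner can be compared against the two-step approach from the question with `pandas.testing.assert_frame_equal`; the three-row example frame is an arbitrary assumption.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Two-step approach from the question: assign a plain column, then convert
df_two_step = df.copy()
df_two_step["col"] = "hello"
df_two_step["col"] = df_two_step["col"].astype("category")

# One-liner from this answer: build the categorical Series directly
df_one_step = df.copy()
df_one_step["col"] = pd.Series("hello", index=df.index, dtype="category")

# Raises AssertionError if values, dtypes (including categories) or index differ
pd.testing.assert_frame_equal(df_two_step, df_one_step)
print(df_one_step["col"].cat.categories.tolist())
```

In recent pandas versions, the Series constructor appears to handle a scalar with a categorical dtype by building a one-element Categorical and repeating its codes, so no full-length column of strings is created along the way; treat this as an implementation detail to verify against your own pandas version.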

Answer 2

A simple approach is to create the new variable with `df.assign`, then change its dtype to `category` with `df.astype`, passing a dictionary that maps specific columns to dtypes:

df = df.assign(col="hello").astype({'col':'category'})

df.dtypes

A         int64
col    category
dtype: object

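This way you don't have to create a Series whose length equals that of the DataFrame; you can broadcast the input string directly, which is somewhat more time- and memory-efficient.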

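As you can see, this approach scales well. You can assign as many variables as you need, some of them based on complex functions, and then set their dtypes as required.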

df = pd.DataFrame({'A':[1,2,3,4]})

df = (df.assign(col1 = 'hello',                    #Define column based on series or broadcasting
                col2 = lambda x:x['A']**2,         #Define column based on existing columns
                col3 = lambda x:x['col2']/x['A'])  #Define column based on previously defined columns
        .astype({'col1':'category',
                 'col2':'float'}))

print(df)
print(df.dtypes)

   A   col1  col2  col3
0  1  hello   1.0   1.0
1  2  hello   4.0   2.0
2  3  hello   9.0   3.0
3  4  hello  16.0   4.0


A          int64
col1    category  #<-changed dtype
col2     float64  #<-changed dtype
col3     float64
dtype: object

Answer 3

This approach definitely solves the first point; I'm not sure about the second:

df['col'] = pd.Categorical('hello' for _ in range(len(df)))

In essence, we first create a generator that yields 'hello' once for each row in `df`, and then pass it to `pd.Categorical` to turn it into a categorical column.
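A further option, not mentioned in any of the answers, is `pd.Categorical.from_codes`, which builds the column from an integer code array plus a list of categories, so the string value is stored only once, as the single category. The sketch below assumes a small example frame; `np.zeros(len(df), dtype=np.int8)` supplies code 0 (the first and only category) for every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Every row gets code 0, which refers to the first (and only) category.
# int8 codes are enough because there is a single category.
codes = np.zeros(len(df), dtype=np.int8)
df["col"] = pd.Categorical.from_codes(codes, categories=["hello"])

print(df)
print(df["col"].cat.categories.tolist())
```

Unlike the generator version, this never builds a sequence of `len(df)` strings; the only per-row allocation is the code array.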
