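I'm trying to generate dummy data with 3 columns: square feet, price and borough. The first two are purely numeric and work fine. I have 50,000 rows of data in the spreadsheet. However, when I add the borough column and assign random values from a list, I get the following output: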
   Sq. feet    Price  Borough
0       112   345382        5
1       310   901500        5
2       215   661033        5
3       147  1038431        5
4       212   296497        5
For this column I'm not using a number-generating function such as np.random.randint.
I'm using
"Borough" : random.randrange(len(word))
Where am I going wrong?
Here is my code:
import random
import pandas as pd
import numpy as np
WORDS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn", "Camden", "Islington", "Angel", "Battersea", "Knightsbridge", "Bermondsey", "Newham"]
word = random.choice(WORDS)
np.random.seed(1)
data3 = pd.DataFrame({"Sq. feet" : np.random.randint(low=75, high=325, size=50000),
"Price" : np.random.randint(low=200000, high=1250000, size=50000),
"Borough" : random.randrange(len(word))
})
df = pd.DataFrame(data3)
df.to_csv("/Users/thomasmcnally/PycharmProjects/real_estate_dummy_date/realestate.csv", index=False)
print(df)
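I expected a column of random word values drawn from WORDS, but every row just contains the number 5. Obviously it makes no sense to create another module for the text-based data and write it to a separate file.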
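To see why every row shows the same number, it may help to look at what the "Borough" expression evaluates to. random.choice(WORDS) picks one name, len(word) is the length of that name, and random.randrange(len(word)) returns a single integer below that length. pandas then repeats that one scalar down the whole column. A minimal sketch, assuming a shortened list, a three-row frame, and a seed that is only there to make the sketch repeatable:

```python
import random

import pandas as pd

# Shortened list, for illustration only.
WORDS = ["Chelsea", "Kensington", "Westminster"]

random.seed(0)  # assumption: seeded only so this sketch is repeatable

word = random.choice(WORDS)       # one borough name (a str)
n = random.randrange(len(word))   # one int in the range [0, len(word))

# A scalar value in the dict is broadcast to every row of the frame,
# which is why the whole "Borough" column shows the same number.
df = pd.DataFrame({"Price": [100, 200, 300], "Borough": n})

print(type(n))                    # a plain int, not a list of names
print(df["Borough"].nunique())    # a single distinct value in the column
```

So the column needs one random choice per row rather than one number computed once.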
I'm guessing you want to generate a list of 50,000 random choices from WORDS, which could itself usefully be renamed BOROUGHS:
import random
import pandas as pd
import numpy as np
SIZE = 50_000
BOROUGHS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn", "Camden", "Islington", "Angel", "Battersea", "Knightsbridge", "Bermondsey", "Newham"]
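np.random.seed(1)  # seeds NumPy only; the random module used for "Borough" is not seeded
data3 = pd.DataFrame({"Sq. feet" : np.random.randint(low=75, high=325, size=SIZE),
    "Price" : np.random.randint(low=200000, high=1250000, size=SIZE),
    # one independent random.choice per row instead of a single scalar
    "Borough" : [random.choice(BOROUGHS) for _ in range(SIZE)]
})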
df = pd.DataFrame(data3)
df.to_csv("realestate.csv", index=False)
print(df)
Output:
       Sq. feet    Price      Borough
0           112   345382      Pimlico
1           310   901500    Battersea
2           215   661033      Holborn
3           147  1038431  Westminster
4           212   296497      Holborn
...         ...      ...          ...
49995       252  1065034      Holborn
49996       117   752615      Holborn
49997       238   803058       Camden
49998       147  1163555         Bank
49999       269   888623  Westminster
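Since np.random.seed does not affect the standard random module, the Borough column in that output will probably differ from run to run. If all three columns should be reproducible, one option is to draw everything from a single NumPy generator. This is only a sketch of that variation, not the code that produced the output above: the rng name and the use of np.random.default_rng are my assumptions, and the generated numbers will not match the values shown earlier because it is a different generator.

```python
import numpy as np
import pandas as pd

SIZE = 50_000
BOROUGHS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank",
            "Holborn", "Camden", "Islington", "Angel", "Battersea",
            "Knightsbridge", "Bermondsey", "Newham"]

# Assumption: one seeded Generator drives all three columns.
rng = np.random.default_rng(1)

df = pd.DataFrame({
    # integers() excludes the upper bound, like np.random.randint
    "Sq. feet": rng.integers(low=75, high=325, size=SIZE),
    "Price": rng.integers(low=200_000, high=1_250_000, size=SIZE),
    # choice() draws SIZE names with replacement from the list
    "Borough": rng.choice(BOROUGHS, size=SIZE),
})

print(df.head())
```

Drawing the names with rng.choice also avoids the Python-level list comprehension, though for 50,000 rows either approach should be fast enough.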