Python kernel crashes when generating random data


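I have not been able to find a definitive fix for a Python kernel crash that happens when I run one particular piece of code. I am trying to generate a random sales table for a project. My code and its output are below. I have tried this in both Jupyter and VSCode, and I have also run it as a plain Python file. When I run the last block of code, it either crashes or never executes the final part.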

Generating random customers:

import numpy as np
import pandas as pd
import names

data = np.random.randint(1, 200000, size=50000)
df_customers = pd.DataFrame(data, columns=['CustomerID'])

def customer_generator_first(cell_val):
    cell_val = names.get_first_name()
    return cell_val 

# initialize the FirstName column with NaN
df_customers['FirstName'] = np.nan

# apply the generator to every row of the FirstName column
df_customers['FirstName'] = df_customers['FirstName'].apply(customer_generator_first)

def customer_generator_last(cell_val):
    cell_val = names.get_last_name()
    return cell_val 

# initialize the LastName column with NaN
df_customers['LastName'] = np.nan

# apply the generator to every row of the LastName column
df_customers['LastName'] = df_customers['LastName'].apply(customer_generator_last)

Output:

+------------+-----------+----------+
| CustomerID | FirstName | LastName |
+------------+-----------+----------+
|     157863 | Kimberly  | Archey   |
|     148101 | Tony      | Roberson |
|     113579 | Mandy     | Kridel   |
|      23000 | Russell   | Cornett  |
|     160104 | Craig     | Sterling |
+------------+-----------+----------+
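
A side note on the customer table: `np.random.randint` draws with replacement, so with 50,000 draws from fewer than 200,000 possible values some `CustomerID` values will almost certainly repeat. If unique IDs are wanted, a minimal sketch using only the standard library could look like the following. The seed and the variable names are illustrative assumptions, not part of the original code.

```python
import random

# Assumption: a fixed seed, used here only to make the sketch reproducible.
rng = random.Random(42)

# random.sample draws WITHOUT replacement, so every ID is distinct.
# range(1, 200000) mirrors np.random.randint(1, 200000), whose upper
# bound is exclusive.
customer_ids = rng.sample(range(1, 200000), k=50000)

print(len(customer_ids), len(set(customer_ids)))
```

The resulting list could then be passed to `pd.DataFrame(customer_ids, columns=['CustomerID'])` in place of the `randint` array.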

Generating the product table from a CSV file I downloaded:

import os
import numpy as np
import pandas as pd
import string
import random

# assign directory
directory = '[MYPATH]'

# myFilePath = os.listdir(directory)

f = 'Amazon-Products.csv'

myFileName = os.path.join(directory, f)

# print(myFilePath)

df = pd.read_csv(myFileName)

df['discount_price'] = df['discount_price'].str.replace(',','')
df['discount_price'] = df['discount_price'].str.replace('₹','')

df['actual_price'] = df['actual_price'].str.replace(',','')
df['actual_price'] = df['actual_price'].str.replace('₹','')

df2 = df.drop(df.columns[[0, 4, 5]],axis = 1)

df2['no_of_ratings'] = df2['no_of_ratings'].str.replace(',','')

df2['discount_price'] = df2['discount_price'].fillna(0)
df2['actual_price'] = df2['actual_price'].fillna(0)

df2['discount_price_USD'] = df2['discount_price'].astype(str).astype(float) * 0.0122
df2['actual_price_USD'] = df2['actual_price'].astype(str).astype(float) * 0.0122

df3 = df2.drop(df2.columns[[5, 6]],axis = 1)

df3['main_category'] = df3['main_category'].str.title()

# Just added cell_val as part of the arguments
def id_generator(cell_val , size=12, chars=string.ascii_uppercase + string.digits):
    cell_val = ''.join(random.choice(chars) for _ in range(size))
    return cell_val 

# instantiate product id col with nan
df3['ProductID'] = np.nan

# apply your function to product id col
df3['ProductID'] = df3['ProductID'].apply(id_generator)

Output:

+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
|                       name                        | main_category |   sub_category   | ratings | no_of_ratings | discount_price_USD | actual_price_USD |  ProductID   |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
| Lloyd 1.5 Ton 3 Star Inverter Split Ac (5 In 1... | Appliances    | Air Conditioners |     4.2 |          2255 |           402.5878 |          719.678 | D5QPATUY7NQ4 |
| LG 1.5 Ton 5 Star AI DUAL Inverter Split AC (C... | Appliances    | Air Conditioners |     4.2 |          2948 |           567.1780 |          927.078 | WDF3BP4HJXTV |
| LG 1 Ton 4 Star Ai Dual Inverter Split Ac (Cop... | Appliances    | Air Conditioners |     4.2 |          1206 |           420.7780 |          756.278 | Z5SESQAXVVWW |
| LG 1.5 Ton 3 Star AI DUAL Inverter Split AC (C... | Appliances    | Air Conditioners |     4.0 |            69 |           463.4780 |          841.678 | B7NPXS4E4IUQ |
| Carrier 1.5 Ton 3 Star Inverter Split AC (Copp... | Appliances    | Air Conditioners |     4.1 |           630 |           420.7780 |          827.038 | BAAGUH73J8VF |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+

Creating the store codes and the time series:

store_codes = np.arange(1,3)

date_range_2022 = pd.date_range(start = '2021-01-01', end = '2022-12-31', freq="D")

Output:

[1 2]

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10',
               ...
               '2022-12-22', '2022-12-23', '2022-12-24', '2022-12-25',
               '2022-12-26', '2022-12-27', '2022-12-28', '2022-12-29',
               '2022-12-30', '2022-12-31'],
              dtype='datetime64[ns]', length=730, freq='D')

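Creating the sales table from everything above. This is the part that keeps crashing the kernel, no matter how much I limit the data above: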

index = pd.MultiIndex.from_product(
   [date_range_2022, store_codes, df3['ProductID'], df_customers['CustomerID']],
   names = ['Date', 'StoreCode', 'ProductID', 'CustomerID'])

sales = pd.DataFrame(index = index)

Output from VSCode:

Canceled future for execute_request message before replies were done

The Kernel crashed while executing code in the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of failure.
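
For scale, `pd.MultiIndex.from_product` builds one row for every combination of its inputs, so the row count is the product of the four lengths. The rough estimate below uses the 730 dates, 2 store codes, and 50,000 customers shown in the post. The product count is not shown anywhere in the post, so the value used here is a hypothetical placeholder, and the one-byte-per-level figure is a deliberately optimistic assumption rather than what pandas actually allocates.

```python
import math

# Sizes taken from the post: 730 dates, 2 store codes, 50,000 customers.
n_dates = 730
n_stores = 2
n_customers = 50_000

# ASSUMPTION: the real len(df3) is not given; 1,000 is a placeholder.
n_products_assumed = 1_000

# One row per combination of the four levels.
n_rows = math.prod([n_dates, n_stores, n_products_assumed, n_customers])

# Optimistic lower bound: pretend each of the 4 levels costs only
# 1 byte per row for its integer code.
min_bytes = n_rows * 4

print(f"rows: {n_rows:,}")
print(f"lower bound: {min_bytes / 1e9:,.0f} GB")
```

Even with only 1,000 products this comes to tens of billions of rows, which suggests the crash is the process running out of memory and not a problem with Jupyter or VSCode.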

When I click the "More info" link, the GitHub repository it leads to has no useful information. I am on a MacBook Pro running macOS Monterey 12.6.5. The Python version is 3.9.
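
If the goal is a random sales table and not every possible combination, one alternative (not part of the original code) is to sample a fixed number of rows, picking each column independently. This is a minimal sketch: `sample_sales`, the row count, the seed, and the small stand-in inputs are all hypothetical.

```python
import random
from datetime import date, timedelta

def sample_sales(dates, store_codes, product_ids, customer_ids, n_rows, seed=None):
    """Draw n_rows random (date, store, product, customer) tuples."""
    rng = random.Random(seed)
    return [
        (
            rng.choice(dates),
            rng.choice(store_codes),
            rng.choice(product_ids),
            rng.choice(customer_ids),
        )
        for _ in range(n_rows)
    ]

# Stand-in inputs (hypothetical; the real tables would supply these).
dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(730)]
store_codes = [1, 2]
product_ids = ["D5QPATUY7NQ4", "WDF3BP4HJXTV", "Z5SESQAXVVWW"]
customer_ids = [157863, 148101, 113579, 23000, 160104]

sales_rows = sample_sales(
    dates, store_codes, product_ids, customer_ids, n_rows=1000, seed=0
)
print(len(sales_rows))
```

The list of tuples can be turned into a table with `pd.DataFrame(sales_rows, columns=['Date', 'StoreCode', 'ProductID', 'CustomerID'])`. Memory then grows with the number of sampled rows, not with the product of the input sizes.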

python pandas numpy visual-studio-code jupyter-lab