How do I refactor code to fix the deprecated list ".append" in 159 lines of Python code?


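We are quoting for a client who wants to migrate their data to HubSpot, and we are now working through the data modeling and database questions that need planning.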

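While planning the migration we are doing an RTM on the HubSpot data, and a member of my team found this code, which helps in several ways. However, it uses the deprecated list .append, which means we have to change how it is written, depending on how the data is structured.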

Since the error says "'DataFrame' object has no attribute 'append'", I assume it comes from pandas. Sorry for any confusion; I thought all DataFrames were pandas DataFrames.

I have a few questions about a code block this large. The full code, from the HubSpot Community, is listed here.

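  1. All questions are welcome. I hope this is not a silly question; I simply don't know how to solve this.
  2. How would you approach the whole 152 lines of code and break it into smaller pieces? Ideally each function would do only one or two things and no more; that is what I am aiming for.
  3. Now that .append is no longer available, how do I refactor or adjust the code block below so that the data dictionary works? Since _append is unlikely to be the most efficient option, I am not sure how or where to start.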

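Edit 1. OK, I have updated the shared code. The error "return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?" comes from line 90, in the first for loop of the generate_data function, right after the data dictionary.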

        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )

HubSpot data faker

# This code has been created by Michael Kupermann ([email protected] or [email protected])
# The purpose of this code is to generate dummy data that simulates a realistic dataset for HubSpot CRM.
# This data can then be used for demonstrations, testing, or other purposes that require a representative dataset.
# You need to amend the HubSpot Sales and Service Pipelines before you import the data.
#
# Required Packages:
# 1. Faker: This package is used to generate the fake data for our dataset.
# 2. Pandas: This package is used to handle the data in a tabular format and to write the data to an Excel file.
#    Writing .xlsx files also requires an Excel writer engine such as openpyxl.
# The datetime module, used to generate realistic dates for the 'close_date' field,
# is part of the Python standard library and does not need to be installed.
#
# To install the necessary packages, you can use pip, the Python package installer.
# Open your terminal (or command prompt on Windows), and enter the following commands:
# pip install faker
# pip install pandas
# pip install openpyxl
#
# If you're using a Jupyter notebook, you can prefix these commands with an exclamation mark:
# !pip install faker
# !pip install pandas
# !pip install openpyxl

from faker import Faker
import pandas as pd
from datetime import datetime, timedelta


#  Function to generate data for a given country. Here 100 companies with 10 contacts, deals, tickets for each company
def generate_data(
    country,
    company_rows=100,
    contacts_per_company=10,
    deals_per_company=10,
    products_per_deal=10,
):
    # Set the locale for Faker based on the country
    if country == "Germany":
        faker = Faker("de_DE")
    elif country == "United States":
        faker = Faker("en_US")
    elif country == "France":
        faker = Faker("fr_FR")
    elif country == "Italy":
        faker = Faker("it_IT")
    elif country == "Japan":
        faker = Faker("ja_JP")
    elif country == "United Kingdom":
        faker = Faker("en_GB")
    elif country == "Canada":
        faker = Faker("en_CA")
    elif country == "Austria":
        faker = Faker("de_AT")
    elif country == "Switzerland":
        faker = Faker("de_CH")

    # Create a dictionary to hold the data
    data = {
        "company_name": [],
        "company_domain": [],
        "company_industry": [],
        "company_address": [],
        "company_country": [],
        "contact_firstname": [],
        "contact_lastname": [],
        "contact_email": [],
        "contact_phone": [],
        "contact_address": [],
        "contact_country": [],
        "contact_function": [],
        "contact_department": [],
        "deal_name": [],
        "deal_stage": [],
        "deal_amount": [],
        "deal_type": [],
        "deal_source": [],
        "close_date": [],
        "ticket_title": [],
        "ticket_status": [],
        "ticket_priority": [],
        "product_name": [],
        "product_price": [],
        "product_description": [],
        "product_sku": [],
        "product_quantity": [],
    }

    # Loop to generate data for each company
    for _ in range(company_rows):
        company_name = faker.company()
        company_domain = faker.domain_name()
        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )
        company_address = faker.address().replace("\n", ", ")
        company_country = country

        # Loop to generate data for each contact
        for _ in range(contacts_per_company):
            contact_firstname = faker.first_name()
            contact_lastname = faker.last_name()
            contact_email = faker.email()
            contact_phone = faker.phone_number()
            contact_address = faker.address().replace("\n", ", ")
            contact_country = country
            contact_function = faker.job()
            contact_department = faker.random_element(
                ["Sales", "Marketing", "Human Resources", "Engineering"]
            )

            # Append generated company and contact data to the lists in the dictionary
            data["company_name"].append(company_name)
            data["company_domain"].append(company_domain)
            data["company_industry"].append(company_industry)
            data["company_address"].append(company_address)
            data["company_country"].append(company_country)

            data["contact_firstname"].append(contact_firstname)
            data["contact_lastname"].append(contact_lastname)
            data["contact_email"].append(contact_email)
            data["contact_phone"].append(contact_phone)
            data["contact_address"].append(contact_address)
            data["contact_country"].append(contact_country)
            data["contact_function"].append(contact_function)
            data["contact_department"].append(contact_department)

            # Generate deal and product data
            data["deal_name"].append(f"Deal-{faker.uuid4()}")
            data["deal_stage"].append(
                faker.random_element(
                    [
                        "Appointment Scheduled",
                        "Qualified To Buy",
                        "Presentation Scheduled",
                        "Decision Maker Brought-In",
                    ]
                )
            )
            data["deal_amount"].append(faker.random_int(min=1000, max=50000))
            data["deal_type"].append(
                faker.random_element(["New Business", "Existing Business"])
            )
            data["deal_source"].append(
                faker.random_element(
                    ["Direct Traffic", "Organic Search", "Paid Search", "Social Media"]
                )
            )
            data["close_date"].append(
                (
                    datetime.today() + timedelta(days=faker.random_int(min=1, max=90))
                ).date()
            )

            # Generate product data
            data["product_name"].append(f"Product-{faker.uuid4()}")
            data["product_price"].append(faker.random_int(min=10, max=1000))
            data["product_description"].append(faker.catch_phrase())
            data["product_sku"].append(faker.random_int(min=10000, max=99999))
            data["product_quantity"].append(faker.random_int(min=1, max=100))

            # Generate ticket data
            data["ticket_title"].append(f"Ticket-{faker.uuid4()}")
            data["ticket_status"].append(
                faker.random_element(
                    ["New", "Waiting on contact", "Waiting on us", "Closed"]
                )
            )
            data["ticket_priority"].append(
                faker.random_element(["Low", "Medium", "High"])
            )

    # Convert the data dictionary to a pandas DataFrame
    df = pd.DataFrame(data)
    return df


# Define the list of countries for which we want to generate data
g7_countries = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
    "Austria",
    "Switzerland",
]

# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)

# Write the generated data to an Excel file
result.to_excel(r"C:\~\~\~\hubspot_dummy_data.xlsx", index=False)

python dataframe data-migration hubspot
1 Answer

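Addressing the question in the title about the deprecated method `pandas.DataFrame.append` (deprecated in pandas 1.4 and removed in pandas 2.0):

As already mentioned in the comments, the `append` in the first part of the code you shared in the question is `list.append`, which runs in (amortized) constant time, works fine, and is not deprecated.

The problem is in the following part, where `pandas.DataFrame.append` is used: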

# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)
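
To make the distinction concrete, here is a small sketch (the company name is made up for illustration). It shows that `list.append` changes the list in place and returns `None`, and it checks whether the installed pandas still has `DataFrame.append`; the result of that check depends on the pandas version.

```python
import pandas as pd

# list.append mutates the list in place and returns None.
names = []
returned = names.append("Acme GmbH")  # illustrative value, not from the original data
print(names, returned)

# DataFrame.append was removed in pandas 2.0, so whether the attribute
# exists depends on the installed version.
print(pd.__version__, hasattr(pd.DataFrame, "append"))
```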

To get rid of the `AttributeError`, you can use `pd.concat`:

result_list = []

for country in g7_countries:
    df = generate_data(country)
    result_list.append(df)

result = pd.concat(result_list)
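
A minimal, self-contained sketch of the same pattern follows. The `make_frame` helper is a hypothetical stand-in for `generate_data`, with made-up columns and values. It also shows the optional `ignore_index=True` argument: without it, each per-country frame keeps its own row labels, so the labels repeat in the combined frame.

```python
import pandas as pd


def make_frame(country):
    # Hypothetical stand-in for generate_data(country): two rows per country.
    return pd.DataFrame(
        {"company_country": [country, country], "deal_amount": [1000, 2000]}
    )


# Collect the per-country frames in a plain list, then concatenate once.
frames = [make_frame(c) for c in ["Canada", "France", "Germany"]]

kept = pd.concat(frames)                           # keeps each frame's own index
renumbered = pd.concat(frames, ignore_index=True)  # fresh 0..n-1 index

print(kept.index.tolist())        # [0, 1, 0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Because the question's script writes the result with `index=False`, the repeated labels do not show up in the Excel file either way; `ignore_index=True` only matters if you later select rows by label.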

This code generates fake data, which you will probably only use for testing, so I wouldn't bother optimizing it.
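
If smaller functions are still wanted, as the second point of the question asks, one possible decomposition is sketched below. The function names (`locale_for`, `make_company`, `make_contact`, `generate_rows`, `build_dataframe`) and the reduced set of columns are assumptions for illustration, not part of the original script. The `fake` argument is expected to be a `Faker` instance; only Faker methods already used in the question are called. A dictionary lookup replaces the `if`/`elif` chain, so an unknown country raises a clear error instead of leaving `faker` undefined.

```python
import pandas as pd

# Country -> Faker locale, replacing the if/elif chain in the question.
LOCALES = {
    "Germany": "de_DE",
    "United States": "en_US",
    "France": "fr_FR",
    "Italy": "it_IT",
    "Japan": "ja_JP",
    "United Kingdom": "en_GB",
    "Canada": "en_CA",
    "Austria": "de_AT",
    "Switzerland": "de_CH",
}


def locale_for(country):
    """Return the Faker locale for a country, failing loudly if it is unknown."""
    try:
        return LOCALES[country]
    except KeyError:
        raise ValueError(f"No locale configured for {country!r}") from None


def make_company(fake, country):
    """Build the company fields (a subset of the original columns)."""
    return {
        "company_name": fake.company(),
        "company_domain": fake.domain_name(),
        "company_country": country,
    }


def make_contact(fake, country):
    """Build the contact fields (a subset of the original columns)."""
    return {
        "contact_firstname": fake.first_name(),
        "contact_lastname": fake.last_name(),
        "contact_email": fake.email(),
        "contact_country": country,
    }


def generate_rows(fake, country, company_rows=100, contacts_per_company=10):
    """Return one dict per contact, each merged with its company's fields."""
    rows = []
    for _ in range(company_rows):
        company = make_company(fake, country)
        for _ in range(contacts_per_company):
            rows.append({**company, **make_contact(fake, country)})
    return rows


def build_dataframe(fake, country, **kwargs):
    """Turn the list of row dicts into a DataFrame in a single step."""
    return pd.DataFrame(generate_rows(fake, country, **kwargs))
```

With Faker installed, a call would look like `build_dataframe(Faker(locale_for("Germany")), "Germany")`. Deal, product, and ticket fields could follow the same pattern, one small function each.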
