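We are quoting for a client who wants to migrate their data to HubSpot, and we are now working through the data modeling and database questions that need to be planned.
While planning the migration we were doing RTM on the HubSpot data, and a member of my team found this code, which helps in several ways. However, it uses (edit: list) .append, which means we would have to change how it is written... depending on how the data is structured.
I assumed that because the message says 'DataFrame' object has no attribute 'append', it had to be pandas. Sorry for the confusion; I thought all DataFrames were pandas DataFrames.
I have a few questions about such a large block of code. The whole code is listed here: Hubspot Community Data
Edit 1. OK, I am updating the shared code. The error "return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?" comes from line 90, in the first for loop of the generate_data function, right after the data dictionary.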
company_industry = faker.random_element(
    ["Technology", "Healthcare", "Finance", "Real Estate"]
)
# This code has been created by Michael Kupermann ([email protected] or [email protected])
# The purpose of this code is to generate dummy data that simulates a realistic dataset for HubSpot CRM.
# This data can then be used for demonstrations, testing, or other purposes that require a representative dataset.
# You need to amend the HubSpot Sales and Service Pipelines before you import the data.
#
# Required Packages:
# 1. Faker: This package is used to generate the fake data for our dataset.
# 2. Pandas: This package is used to handle the data in a tabular format and to write the data to an Excel file.
# 3. DateTime: This package is used to generate realistic date data for the 'close_date' field.
#
# To install the necessary packages, you can use pip, the Python package installer.
# Open your terminal (or command prompt on Windows), and enter the following commands:
# pip install faker
# pip install pandas
# pip install datetime
#
# If you're using a Jupyter notebook, you can prefix these commands with an exclamation mark:
# !pip install faker
# !pip install pandas
# !pip install datetime
from faker import Faker
import pandas as pd
from datetime import datetime, timedelta
# Function to generate data for a given country. Here 100 companies with 10 contacts, deals, tickets for each company
def generate_data(
    country,
    company_rows=100,
    contacts_per_company=10,
    deals_per_company=10,
    products_per_deal=10,
):
    # Set the locale for Faker based on the country
    if country == "Germany":
        faker = Faker("de_DE")
    elif country == "United States":
        faker = Faker("en_US")
    elif country == "France":
        faker = Faker("fr_FR")
    elif country == "Italy":
        faker = Faker("it_IT")
    elif country == "Japan":
        faker = Faker("ja_JP")
    elif country == "United Kingdom":
        faker = Faker("en_GB")
    elif country == "Canada":
        faker = Faker("en_CA")
    elif country == "Austria":
        faker = Faker("de_AT")
    elif country == "Switzerland":
        faker = Faker("de_CH")
    # Create a dictionary to hold the data
    data = {
        "company_name": [],
        "company_domain": [],
        "company_industry": [],
        "company_address": [],
        "company_country": [],
        "contact_firstname": [],
        "contact_lastname": [],
        "contact_email": [],
        "contact_phone": [],
        "contact_address": [],
        "contact_country": [],
        "contact_function": [],
        "contact_department": [],
        "deal_name": [],
        "deal_stage": [],
        "deal_amount": [],
        "deal_type": [],
        "deal_source": [],
        "close_date": [],
        "ticket_title": [],
        "ticket_status": [],
        "ticket_priority": [],
        "product_name": [],
        "product_price": [],
        "product_description": [],
        "product_sku": [],
        "product_quantity": [],
    }
    # Loop to generate data for each company
    for _ in range(company_rows):
        company_name = faker.company()
        company_domain = faker.domain_name()
        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )
        company_address = faker.address().replace("\n", ", ")
        company_country = country
        # Loop to generate data for each contact
        for _ in range(contacts_per_company):
            contact_firstname = faker.first_name()
            contact_lastname = faker.last_name()
            contact_email = faker.email()
            contact_phone = faker.phone_number()
            contact_address = faker.address().replace("\n", ", ")
            contact_country = country
            contact_function = faker.job()
            contact_department = faker.random_element(
                ["Sales", "Marketing", "Human Resources", "Engineering"]
            )
            # Append generated company and contact data to the lists in the dictionary
            data["company_name"].append(company_name)
            data["company_domain"].append(company_domain)
            data["company_industry"].append(company_industry)
            data["company_address"].append(company_address)
            data["company_country"].append(company_country)
            data["contact_firstname"].append(contact_firstname)
            data["contact_lastname"].append(contact_lastname)
            data["contact_email"].append(contact_email)
            data["contact_phone"].append(contact_phone)
            data["contact_address"].append(contact_address)
            data["contact_country"].append(contact_country)
            data["contact_function"].append(contact_function)
            data["contact_department"].append(contact_department)
            # Generate deal and product data
            data["deal_name"].append(f"Deal-{faker.uuid4()}")
            data["deal_stage"].append(
                faker.random_element(
                    [
                        "Appointment Scheduled",
                        "Qualified To Buy",
                        "Presentation Scheduled",
                        "Decision Maker Brought-In",
                    ]
                )
            )
            data["deal_amount"].append(faker.random_int(min=1000, max=50000))
            data["deal_type"].append(
                faker.random_element(["New Business", "Existing Business"])
            )
            data["deal_source"].append(
                faker.random_element(
                    ["Direct Traffic", "Organic Search", "Paid Search", "Social Media"]
                )
            )
            data["close_date"].append(
                (
                    datetime.today() + timedelta(days=faker.random_int(min=1, max=90))
                ).date()
            )
            # Generate product data
            data["product_name"].append(f"Product-{faker.uuid4()}")
            data["product_price"].append(faker.random_int(min=10, max=1000))
            data["product_description"].append(faker.catch_phrase())
            data["product_sku"].append(faker.random_int(min=10000, max=99999))
            data["product_quantity"].append(faker.random_int(min=1, max=100))
            # Generate ticket data
            data["ticket_title"].append(f"Ticket-{faker.uuid4()}")
            data["ticket_status"].append(
                faker.random_element(
                    ["New", "Waiting on contact", "Waiting on us", "Closed"]
                )
            )
            data["ticket_priority"].append(
                faker.random_element(["Low", "Medium", "High"])
            )
    # Convert the data dictionary to a pandas DataFrame
    df = pd.DataFrame(data)
    return df
# Define the list of countries for which we want to generate data
g7_countries = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
    "Austria",
    "Switzerland",
]
# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)
# Write the generated data to an Excel file
result.to_excel(r"C:\~\~\~\hubspot_dummy_data.xlsx", index=False)
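As already mentioned in the comments, the append in the first part of the code you shared in the question is list.append, which runs in constant time and is not deprecated.
The problem is in the following part, where pandas.DataFrame.append is used: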
# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)
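To make the distinction concrete, here is a minimal sketch. The one-column data dictionary is a made-up stand-in for the one built inside generate_data. It shows that the appends inside the function are calls on plain Python lists, while result is a DataFrame. The last line assumes pandas 2.0 or later, where DataFrame.append was removed; on older versions it would print True.

```python
import pandas as pd

# Hypothetical one-column stand-in for the dictionary built in generate_data
data = {"company_name": []}

# This is list.append: data["company_name"] is an ordinary Python list
data["company_name"].append("Acme GmbH")
print(type(data["company_name"]).__name__)

# Only after this conversion do we hold a pandas DataFrame
df = pd.DataFrame(data)
print(type(df).__name__)

# On pandas >= 2.0 the DataFrame has no public append method any more
has_df_append = hasattr(df, "append")
print(has_df_append)
```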
To get rid of the AttributeError, you can use concat:
result_list = []
for country in g7_countries:
    df = generate_data(country)
    result_list.append(df)
result = pd.concat(result_list)
This code generates fake data, which you will probably only use for testing, so I would not bother optimizing it.
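If you want to try the concat pattern without installing Faker, the following is a small self-contained sketch. make_frame is a hypothetical stand-in for generate_data, and passing ignore_index=True is an optional extra (not part of the fix above) that gives the combined frame one continuous index instead of repeating each country's 0..n-1.

```python
import pandas as pd


def make_frame(country, rows=3):
    # Hypothetical stand-in for generate_data: a tiny frame per country
    return pd.DataFrame(
        {"company_country": [country] * rows, "row_in_country": list(range(rows))}
    )


countries = ["Canada", "France"]

# Collect the per-country frames in a plain list (list.append is fine here)
frames = []
for country in countries:
    frames.append(make_frame(country))

# Concatenate once at the end; ignore_index=True renumbers the rows 0..N-1
result = pd.concat(frames, ignore_index=True)
print(result)
```

Collecting the frames in a list and concatenating once also avoids copying the growing result on every iteration, which is what the removed DataFrame.append did.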