I have built a model using GridSearchCV, and now I want to deploy it to SageMaker and create an endpoint. How can I do that?
Code sample:
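import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# reading directly from S3 requires the s3fs package
_data = pd.read_csv('s3://ml-titanic-dataset/train_data/train.csv')
_data.head()
# handling age NA: fill with the mean age of passengers of the same sex
# (assign the result back; fillna(..., inplace=True) on a column selection is unreliable)
_data['Age'] = _data['Age'].fillna(_data.groupby('Sex')['Age'].transform('mean'))
_data['Cabin'] = _data['Cabin'].fillna('missing')
# selecting columns
x = _data.loc[:, ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
y = _data['Survived']
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, train_size=0.80, random_state=42)
# categorical (object) and numerical column names
cat_col = x.select_dtypes('O').columns.values
num_col = x.select_dtypes(['int64', 'float64']).columns.values
cat_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='most_frequent')),
    ('encoding', OneHotEncoder(drop='first', handle_unknown='ignore'))
])
num_pipe = Pipeline([
    ('scaling', StandardScaler())
])
ct = ColumnTransformer([
    ('categorical', cat_pipe, cat_col),
    ('numerical', num_pipe, num_col)
], remainder='passthrough')
# 'mdl' is a placeholder step that the grid search replaces with each candidate model
final_pipe = Pipeline([
    ('data_processing', ct),
    ('mdl', 'passthrough')
])
# final_pipe.fit_transform(x_train,y_train)
parameter = [
    {
        'mdl': [RandomForestClassifier()]
    },
    {
        'mdl': [LogisticRegression()],
        'mdl__C': [0.4, 0.6, 0.8, 0.9]
    },
    {
        'mdl': [AdaBoostClassifier()],
        #'mdl__estimator':[LogisticRegression(),DecisionTreeClassifier()]
    }
]
grid_search = GridSearchCV(final_pipe, parameter, cv=5, scoring='f1',
                           verbose=5, error_score='raise')
grid_search.fit(x_train, y_train)
accuracy_score(y_test, grid_search.predict(x_test))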
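Can I deploy a model I built myself to SageMaker, or do I have to use a built-in SageMaker algorithm to deploy it and create an endpoint? Please help me understand.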
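You don't need to use a SageMaker built-in algorithm; you can bring your own algorithm, or even your own container image. If this is an sklearn-based model, you can use the prebuilt SKLearn image and bring your own inference code. For details, see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html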
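As a rough sketch of the prebuilt SKLearn image route (not a definitive recipe): the fitted pipeline is serialized, wrapped in a model.tar.gz, uploaded to S3, and handed to `SKLearnModel` from the SageMaker Python SDK together with an entry-point script. Everything named here is an assumption for illustration: the helper names `package_model` and `deploy_sklearn_model`, the file names `model.joblib` and `inference.py`, the `framework_version`, and the instance type. The hypothetical `inference.py` would need to define `model_fn(model_dir)` that loads the model, e.g. with `joblib.load`; since the pipeline in the question selects columns by name, it would probably also need an `input_fn` that rebuilds a DataFrame with the training column names.

```python
import tarfile
from pathlib import Path


def package_model(model_file: str, archive_path: str = "model.tar.gz") -> str:
    """Wrap a serialized model file in a model.tar.gz archive.

    SageMaker unpacks this archive into the directory that is passed to
    model_fn(model_dir) in the entry-point script.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store the file at the archive root, without its local directory path.
        tar.add(model_file, arcname=Path(model_file).name)
    return archive_path


def deploy_sklearn_model(model_s3_uri: str, role_arn: str,
                         entry_point: str = "inference.py"):
    """Create a real-time endpoint backed by the prebuilt scikit-learn image.

    model_s3_uri: S3 location of the uploaded model.tar.gz.
    role_arn: IAM role that SageMaker is allowed to assume.
    entry_point: script defining model_fn (hypothetical file name).
    """
    # Imported here so that packaging works without the SageMaker SDK installed.
    from sagemaker.sklearn.model import SKLearnModel

    model = SKLearnModel(
        model_data=model_s3_uri,
        role=role_arn,
        entry_point=entry_point,
        framework_version="1.2-1",  # assumption: pick the version matching your sklearn
    )
    # Assumption: one small general-purpose instance; adjust to your workload.
    return model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

With the code from the question, usage would look roughly like this: `joblib.dump(grid_search.best_estimator_, "model.joblib")`, then `package_model("model.joblib")`, then upload the archive to S3 (for example with `sagemaker.Session().upload_data`), and finally call `deploy_sklearn_model` with the resulting S3 URI and your role ARN. Saving `best_estimator_` rather than the `GridSearchCV` object keeps only the refitted winning pipeline, preprocessing included.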