I'm getting started with machine learning and am trying to implement logistic regression from scratch on the Kaggle Titanic dataset. I wrote the code following an online course, but I can't get gradient descent to work. After computing the gradients for w and b and applying the updates in the function named logisitic_regression, where w = w - alpha*wgrad and b = b - alpha*bgrad, for some reason the loss does not decrease and the parameters w and b do not update. I can't find the error in my code; can anyone help? See the functions below. Let me know if you need more information.
import numpy as np

#Implement sigmoid activation function
def sigmoid(z):
    '''
    Input:
        z: scalar or array of dimension n
    Output:
        sgmd: scalar or array of dimension n
    '''
    sgmd = 1/(1+np.exp(-z))
    return sgmd
#Define prediction function
def yPredLogistic(X, w, b=0):
    '''
    Input:
        X: nxd matrix
        w: d-dimensional vector
        b: scalar (optional, treated as 0 if not passed)
    Output:
        prob: n-dimensional vector of probabilities
    '''
    prob = sigmoid(np.inner(X, w.T) + b)
    return prob
#Define negative log-likelihood as log loss
def log_loss(X, y, w, b=0):
    '''
    Input:
        X: nxd matrix
        y: n-dimensional vector with labels (+1 or -1)
        w: d-dimensional vector
        b: scalar bias term (optional, treated as 0 if not passed)
    Output:
        nll: a scalar
    '''
    nll = -np.sum(np.log(sigmoid(y*(np.inner(w.T, X) + b))))
    return nll
#Define gradient
def gradient(X, y, w, b):
    '''
    Input:
        X: nxd matrix
        y: n-dimensional vector with labels +1 or -1
        w: d-dimensional vector
        b: scalar bias term
    Output:
        wgrad: d-dimensional gradient vector
        bgrad: scalar gradient
    '''
    n, d = X.shape
    wgrad = -y*(sigmoid(-y*(np.inner(w.T, X) + b))) @ X
    bgrad = np.sum(-y*(sigmoid(-y*(np.inner(w.T, X) + b))))
    return wgrad, bgrad
#Implement weight update of gradient descent
def logisitic_regression(X, y, max_iter, alpha):
    '''
    Input:
        X: nxd matrix
        y: n-dimensional vector with labels +1 or -1
        max_iter: maximum number of iterations
        alpha: learning rate (step size)
    Output:
        w: d-dimensional vector
        b: scalar bias term
        losses: list of loss values, one per iteration
    '''
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses = []
    for step in range(max_iter):
        #Get w and b gradients
        wgrad, bgrad = gradient(X, y, w, b)
        #Update w and b
        w = w - alpha*wgrad
        b = b - alpha*bgrad
        #Record the loss
        losses.append(log_loss(X, y, w, b))
    return w, b, losses
I think the problem is that your code is written for +1/-1 output labels, while the Titanic dataset's labels are 0/1, not +/-1. You need to either convert the labels or change the algorithm and derive the log loss and gradients correctly for 0/1 labels, because the formula you are using is not the log-loss formula for 0/1 labels.
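For illustration, here is a minimal sketch of both possible fixes. This is not the poster's code; `sigmoid` is redefined so the snippet is self-contained, and the function names `to_plus_minus`, `log_loss_01`, and `gradient_01` are made up for this example. Fix 1 remaps 0/1 labels to the -1/+1 convention the original code expects; fix 2 rewrites the loss and gradient directly for 0/1 labels as binary cross-entropy:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Fix 1: remap 0/1 labels to the -1/+1 convention the original code expects.
def to_plus_minus(y01):
    return 2 * y01 - 1  # 0 -> -1, 1 -> +1

# Fix 2: binary cross-entropy loss for labels y in {0, 1}.
def log_loss_01(X, y, w, b=0.0):
    p = sigmoid(X @ w + b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gradient of the 0/1 cross-entropy with respect to w and b.
def gradient_01(X, y, w, b=0.0):
    p = sigmoid(X @ w + b)
    wgrad = (p - y) @ X    # d-dimensional vector
    bgrad = np.sum(p - y)  # scalar
    return wgrad, bgrad
```

The two formulations are equivalent: for each sample, -log(sigmoid(y_pm * z)) with y_pm = 2y - 1 equals the corresponding cross-entropy term, so either fix should produce the same loss curve.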