Suppose our input is $x$, which may have multiple dimensions, and we want to use $x$ to predict $y$, where $y \in \{0,1\}$. The logistic regression model is:

$$p(Y=1\mid x)=\frac{\exp(w\cdot x)}{1+\exp(w\cdot x)}\tag{1}$$
The parameter $w$ is what we need to learn. Note that it contains both the weight coefficients and the bias $b$: folding the bias into $w$ makes the notation, and the code, more concise.
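Equation (1) can be sketched directly in NumPy. This is a minimal illustration, not code from the text; it assumes, as the paragraph above says, that a constant 1 has been appended to $x$ so that the last entry of $w$ plays the role of the bias $b$:

```python
import numpy as np

def predict_proba(w, x):
    """P(Y=1|x) from eq. (1): exp(w·x) / (1 + exp(w·x)).

    Assumes x already has a trailing 1 appended, so the last
    component of w acts as the bias b.
    """
    z = np.dot(w, x)
    # Equivalent to the sigmoid 1 / (1 + exp(-z)).
    return np.exp(z) / (1.0 + np.exp(z))
```

For example, with $w=0$ the model is maximally uncertain and returns $0.5$ for any input.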
2. Parameter Estimation by Maximum Likelihood
The parameter $w$ must be learned from data; we estimate it by maximum likelihood. Let:

$$P(Y=1\mid x)=\pi(x),\quad P(Y=0\mid x)=1-\pi(x)\tag{2}$$
The likelihood function is:

$$\prod_{i=1}^N[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}\tag{3}$$
Because this product of exponentials is inconvenient to differentiate, we take its logarithm:

$$\begin{aligned} L(w)&=\sum_{i=1}^N\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right] \\ &=\sum_{i=1}^N \left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^{N}\left[y_i(w\cdot x_i)-\log(1+\exp(w\cdot x_i))\right] \end{aligned} \tag{4}$$
Maximizing $L(w)$ yields the estimate of $w$.
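The final line of eq. (4) translates almost directly into code. A small sketch of the (negated) log-likelihood, under the same assumption that the design matrix `X` already carries a bias column of ones (`np.log1p` is used here for slightly better numerical behavior than writing `log(1 + exp(z))` literally):

```python
import numpy as np

def neg_log_likelihood(w, X, y):
    """-L(w) from eq. (4): sum_i [log(1+exp(w·x_i)) - y_i (w·x_i)].

    X is an (N, d) matrix whose rows are the x_i (bias column of
    ones appended); y is a length-N vector of 0/1 labels.
    """
    z = X @ w
    return np.sum(np.log1p(np.exp(z)) - y * z)
```

At $w=0$ every $w\cdot x_i$ is zero, so the value is $N\log 2$, a handy sanity check.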
3. Solving the Likelihood Function by Gradient Descent
Gradient descent finds minima, whereas we want the maximum of $L(w)$. We therefore minimize the negative of $L(w)$:

$$\mathop{\arg\min}_{w} -L(w) \tag{5}$$
Differentiating $-L(w)$ with respect to $w$:

$$\begin{aligned} \nabla_w(-L(w))&=-\sum_{i=1}^N\left[y_i x_i-\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}\, x_i\right]\\ &=-\sum_{i=1}^N\left[\left(y_i-\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}\right) x_i\right]\\ &=\sum_{i=1}^N\left[\left(\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}-y_i\right) x_i\right] \end{aligned} \tag{6}$$
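Putting eqs. (5) and (6) together gives a batch gradient-descent fit. The sketch below is an illustrative implementation, not the text's own code; the learning rate, iteration count, and the division by $N$ (averaging the gradient for step-size stability) are my assumptions:

```python
import numpy as np

def sigmoid(z):
    # exp(z) / (1 + exp(z)), written in its numerically safer form.
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimize -L(w) by batch gradient descent using eq. (6).

    X: (N, d) design matrix with a bias column of ones appended.
    y: length-N vector of 0/1 labels.
    lr, n_iter: assumed hyperparameters, not values from the text.
    """
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)        # pi(x_i) = exp(w·x_i)/(1+exp(w·x_i))
        grad = X.T @ (p - y)      # gradient of -L(w), eq. (6)
        w -= lr * grad / N        # averaged descent step (my choice)
    return w
```

On a toy 1-D dataset where labels flip partway along the axis, the fitted model should place its decision boundary between the two groups.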