$$
% Define your custom commands here
\newcommand{\bmat}[1]{\begin{bmatrix}#1\end{bmatrix}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\P}{\mathbb{P}}
\newcommand{\S}{\mathbb{S}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[2]{\|{#1}\|_{{}_{#2}}}
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pdd}[2]{\frac{\partial^2 #1}{\partial #2^2}}
\newcommand{\vectornorm}[1]{\left|\left|#1\right|\right|}
\newcommand{\abs}[1]{\left|{#1}\right|}
\newcommand{\mbf}[1]{\mathbf{#1}}
\newcommand{\mc}[1]{\mathcal{#1}}
\newcommand{\bm}[1]{\boldsymbol{#1}}
\newcommand{\nicefrac}[2]{{}^{#1}\!/_{\!#2}}
\newcommand{\argmin}{\operatorname*{arg\,min}}
\newcommand{\argmax}{\operatorname*{arg\,max}}
$$

Week 4: Recognizing Handwritten Digits

Image credits: Neural Networks and Deep Learning by Michael Nielsen

MNIST: Modified National Institute of Standards and Technology
Consider handwritten digits from the MNIST database. Each digit is stored as a \(28 \times 28\) grayscale image, that is, \(28 \times 28 = 784\) pixel values.

  • Treat the \(784\) pixel values as the inputs to a function.
  • Is there a function \(y = f(x)\) whose output identifies the digit?
  • Specifically, with \(x \in \R^{784}\) and \(y \in \R^{10}\), we want \(f(x)\) to return the vector with a \(1\) in the position of the digit and \(0\) everywhere else:

\[ f(x) = \left\{ \begin{array}{cl} \bmat{1 & 0 & \cdots & 0} & \text{if } x \text{ is an image of a } 0 \\ \bmat{0 & 1 & \cdots & 0} & \text{if } x \text{ is an image of a } 1 \\ \vdots & \vdots \\ \bmat{0 & 0 & \cdots & 1} & \text{if } x \text{ is an image of a } 9 \end{array} \right. \]

  • A neural network is a way to construct such a function \(f(x)\).
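The target function above can be illustrated with a minimal sketch. It assumes NumPy, and the helper names `flatten_image`, `one_hot`, and `decode` are hypothetical, chosen only for this example. A random array stands in for a real MNIST image, since no dataset loader is specified here.

```python
import numpy as np


def flatten_image(image):
    """Turn a 28 x 28 grayscale image into a vector x with 784 entries."""
    image = np.asarray(image, dtype=float)
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28 x 28 image, got shape {image.shape}")
    return image.reshape(784)


def one_hot(digit, num_classes=10):
    """Target vector y: 1 in the position of the digit, 0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError(f"digit must be in 0..{num_classes - 1}, got {digit}")
    y = np.zeros(num_classes)
    y[digit] = 1.0
    return y


def decode(y):
    """Recover the digit as the index of the largest entry of y."""
    return int(np.argmax(y))


# Hypothetical stand-in for one MNIST image: random pixel values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((28, 28))

x = flatten_image(image)
y = one_hot(3)

print(x.shape)    # (784,)
print(y)          # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(decode(y))  # 3
```

Reading the digit off with `decode` also works when the output is only approximately one-hot, which is the situation with a trained network.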

Network to identify handwritten digits

  • Pixel values: \(x \in \R^{784}\).

  • Outputs: \(a = f\left(x; {W}, {b}\right) \in \R^{10}\), where \({W}\) and \({b}\) collect the weights and biases of all layers.

  • Hidden layer (15 neurons): \({W}^{(2)} \in \R^{15 \times 784}\) and \({b}^{(2)} \in \R^{15}\)

  • Output layer (10 neurons): \({W}^{(3)} \in \R^{10 \times 15}\) and \({b}^{(3)} \in \R^{10}\)

  • \(z^{(2)} = {W}^{(2)} x + {b}^{(2)}\) with output \(\sigma\left(z^{(2)}\right)\), where \(\sigma\) is the activation function applied to each entry

    • \(x \in \R^{784}\)
    • \({W}^{(2)} \in \R^{15 \times 784}\)
    • \({b}^{(2)} \in \R^{15}\)
  • \(z^{(3)} = {W}^{(3)} \sigma\left(z^{(2)}\right) + {b}^{(3)}\) with output \(a = \sigma\left(z^{(3)}\right)\)

    • \({W}^{(3)} \in \R^{10 \times 15}\)
    • \({b}^{(3)} \in \R^{10}\)
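The two layer equations above can be sketched as a forward pass. This is only an illustration under stated assumptions: it uses NumPy, it takes \(\sigma\) to be the logistic sigmoid (the notes do not say which activation is used), and the parameters are random rather than trained, so the predicted digit is meaningless. The function names `sigmoid` and `forward` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid applied elementwise (an assumed choice of sigma)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W2, b2, W3, b3):
    """Compute a = sigma(W3 sigma(W2 x + b2) + b3)."""
    z2 = W2 @ x + b2    # hidden pre-activation, shape (15,)
    a2 = sigmoid(z2)    # hidden output, shape (15,)
    z3 = W3 @ a2 + b3   # output pre-activation, shape (10,)
    return sigmoid(z3)  # network output a, shape (10,)


# Random, untrained parameters with the shapes listed above.
rng = np.random.default_rng(0)
W2 = rng.standard_normal((15, 784))
b2 = rng.standard_normal(15)
W3 = rng.standard_normal((10, 15))
b3 = rng.standard_normal(10)

# Hypothetical input: random pixel values standing in for one image.
x = rng.random(784)
a = forward(x, W2, b2, W3, b3)

# Total number of adjustable parameters: 15*784 + 15 + 10*15 + 10.
num_params = W2.size + b2.size + W3.size + b3.size

print(a.shape)      # (10,)
print(num_params)   # 11935
print(int(np.argmax(a)))  # index of the largest output entry
```

The shapes make each product well defined: a \(15 \times 784\) matrix times a vector in \(\R^{784}\) gives a vector in \(\R^{15}\), and a \(10 \times 15\) matrix then maps it to \(\R^{10}\).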