
[DIVE INTO DEEP LEARNING] 4.1 Softmax Regression

by ram_ 2023. 12. 20.

23.12.19 Keep Editing ...

 

 

4.1.1. Classification

 

We have two obvious choices. Perhaps the most natural impulse would be to choose 𝑦∈{1,2,3}, where the integers represent {dog,cat,chicken} respectively. This is a great way of storing such information on a computer. If the categories had some natural ordering among them, say if we were trying to predict {baby,toddler,adolescent,young adult,adult,geriatric}, then it might even make sense to cast this as an ordinal regression problem and keep the labels in this format. See Moon et al. (2010) for an overview of different types of ranking loss functions and Beutel et al. (2014) for a Bayesian approach that addresses responses with more than one mode.

In general, classification problems do not come with natural orderings among the classes. Fortunately, statisticians long ago invented a simple way to represent categorical data: the one-hot encoding. A one-hot encoding is a vector with as many components as we have categories. The component corresponding to a particular instance’s category is set to 1 and all other components are set to 0. In our case, a label 𝑦 would be a three-dimensional vector, with (1,0,0) corresponding to “cat”, (0,1,0) to “chicken”, and (0,0,1) to “dog”:

𝑦 ∈ {(1,0,0), (0,1,0), (0,0,1)}.
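A small sketch of what this looks like in code (my own example, not the book's; the index order cat = 0, chicken = 1, dog = 2 is just an assumption chosen to match the vectors above):

```python
import torch
import torch.nn.functional as F

# Assumed index order to match the vectors above: cat=0, chicken=1, dog=2
labels = torch.tensor([0, 2, 1])            # three example labels: cat, dog, chicken
one_hot = F.one_hot(labels, num_classes=3)  # one row per label, one column per class
print(one_hot)
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])
```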

 

4.1.1.1. Linear Model

 

In order to estimate the conditional probabilities associated with all the possible classes, we need a model with multiple outputs, one per class. To address classification with linear models, we will need as many affine functions* as we have outputs. Strictly speaking, we only need one fewer, since the final category has to be the difference between 1 and the sum of the other categories, but for reasons of symmetry we use a slightly redundant parametrization. Each output corresponds to its own affine function. In our case, since we have 4 features and 3 possible output categories, we need 12 scalars to represent the weights (𝔀 with subscripts), and 3 scalars to represent the biases (𝒃 with subscripts). This yields:

𝑜₁ = 𝑥₁𝑤₁₁ + 𝑥₂𝑤₁₂ + 𝑥₃𝑤₁₃ + 𝑥₄𝑤₁₄ + 𝑏₁
𝑜₂ = 𝑥₁𝑤₂₁ + 𝑥₂𝑤₂₂ + 𝑥₃𝑤₂₃ + 𝑥₄𝑤₂₄ + 𝑏₂
𝑜₃ = 𝑥₁𝑤₃₁ + 𝑥₂𝑤₃₂ + 𝑥₃𝑤₃₃ + 𝑥₄𝑤₃₄ + 𝑏₃

 

*Is 𝔃 = 𝒙𝔀 + 𝒃 right? In linear regression (a single output) the affine function was 𝑦̂ = 𝔀ᵀ𝒙 + 𝑏; here, with three outputs, the compact form is 𝒐 = 𝑾𝒙 + 𝒃, where 𝑾 is the 3×4 weight matrix and 𝒃 the 3-vector of biases (for the matrix-vector product, go back to 2.3 Linear Algebra).
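A minimal numeric sketch of this affine map (my own example, not the book's code), using the 3×4 weight matrix and length-3 bias counted above; nn.Linear(4, 3) stores exactly those 12 + 3 parameters:

```python
import torch
from torch import nn

x = torch.randn(4)       # one example with 4 features
W = torch.randn(3, 4)    # 12 weight scalars: one row per output class
b = torch.randn(3)       # 3 bias scalars: one per output class

o = W @ x + b            # o = Wx + b, one output (logit) per class
print(o.shape)           # torch.Size([3])

# The same affine map as a layer: nn.Linear(4, 3) holds a 3x4 weight
# matrix and a length-3 bias, i.e. the 12 + 3 parameters counted above.
layer = nn.Linear(4, 3)
print(layer(x).shape)    # torch.Size([3])
```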

 
 

 
