
[DL] options in DeepLearning ( Transfer learning)

by ram_ 2023. 2. 16.
  1.  num_workers
  2.  x.view()
  3.  F.log_softmax(x, dim=1)
  4.  lr = 0.001
  5.  eval()
  6.  to(DEVICE)
  7.  fc.in_features
  8.  lr_scheduler.StepLR()
  9.  zero_grad
  10.  set_grad_enabled
  11.  max(outputs, 1)
  12.  trainable=False
  13.  [np.newaxis, ...]
  14.  argmax()
  15.  axis=-1
  16.  untar=True

#1   num_workers

val_loader = torch.utils.data.DataLoader(val_dataset,
                                         batch_size=BATCH_SIZE,
                                         shuffle=True,
                                         num_workers=4)

In PyTorch, num_workers is an argument in the DataLoader class that specifies the number of subprocesses to use for data loading.

When num_workers is set to a value greater than 0, the data loading will be performed in parallel using multiple subprocesses. This can significantly speed up data loading when you have a large dataset or when the data preprocessing or augmentation is computationally expensive.

For example, when num_workers=4, the data loading will be performed using 4 subprocesses in parallel. The main process will create 4 subprocesses and distribute the data loading work among them. Each subprocess will load a batch of data and pass it back to the main process for processing.

However, it's worth noting that setting num_workers too high can slow data loading down due to the overhead of creating and managing subprocesses. Additionally, some data preprocessing or augmentation code may not be safe to run across multiple worker processes (for example, objects that cannot be pickled and sent to a subprocess). In such cases, set num_workers to 0 so that data loading happens in the main process.

 

#2   x.view()

#3   F.log_softmax(x, dim=1)

 

  def forward(self, x):

    x = self.conv1(x)
    x = F.relu(x)
    x = self.pool(x)
    x = F.dropout(x, p=0.25, training=self.training)

    x = self.conv2(x)
    x = F.relu(x)
    x = self.pool(x)
    x = F.dropout(x, p=0.25, training=self.training)

    x = self.conv3(x)
    x = F.relu(x)
    x = self.pool(x)
    x = F.dropout(x, p=0.25, training=self.training)

    x = x.view(-1, 4096)  # flatten the feature maps -- what is x.view?
    x = self.fc1(x)
    x = F.relu(x)
    x = F.dropout(x, p=0.5, training=self.training)
    x = self.fc2(x)

    return F.log_softmax(x, dim=1)  # what are dim and log_softmax?

In PyTorch, x.view(-1, 4096) is a method for reshaping a tensor. Here, it reshapes the tensor x so that it has a shape of (batch_size, 4096), where batch_size is the number of samples in the batch and 4096 is the number of features per sample (the flattened size of the final convolutional feature maps, i.e., channels × height × width). The -1 argument in x.view(-1, 4096) means that PyTorch should infer the size of the first dimension from the total number of elements in the input tensor and the specified size of the second dimension.

 

F.log_softmax(x, dim=1) is a function for computing the logarithm of the softmax function along the specified dimension. The softmax function maps a vector of raw scores (logits) to a probability distribution over its elements. The dim argument specifies the dimension along which the softmax is computed; here dim=1 is the class dimension of the (batch_size, num_classes) output. The softmax output can be interpreted as the probability of each class, and working with log-probabilities is numerically stable and pairs naturally with the negative log-likelihood loss (nn.NLLLoss) to form the cross-entropy loss used during training.

 

In summary, x.view(-1, 4096) reshapes the tensor x, and F.log_softmax(x, dim=1) computes the logarithm of the softmax function along the specified dimension.
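
As a quick shape check, here is a minimal sketch (the tensor sizes are illustrative, chosen so that 64 × 8 × 8 = 4096):

import torch
import torch.nn.functional as F

x = torch.randn(8, 64, 8, 8)       # a batch of 8 feature maps (C=64, H=W=8)
flat = x.view(-1, 4096)            # 64 * 8 * 8 = 4096 features per sample
print(flat.shape)                  # torch.Size([8, 4096])

logits = torch.randn(8, 33)        # fc output for 33 classes
log_probs = F.log_softmax(logits, dim=1)
print(log_probs.exp().sum(dim=1))  # each row sums to 1 -- a valid distribution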

 

 

#4   lr = 0.001

optimizer = optim.Adam(model_base.parameters(), lr = 0.001)

In PyTorch, lr is a hyperparameter that specifies the learning rate for the optimizer during training. The learning rate determines the step size of the optimizer in the direction of the negative gradient, and it controls how quickly or slowly the model weights are updated during training.

The learning rate is a critical hyperparameter to tune during model training, as it can significantly affect the performance and convergence of the model. If the learning rate is too high, the optimizer may overshoot the optimal weights and the model may diverge. On the other hand, if the learning rate is too low, the optimizer may take too long to converge and the model may not reach its optimal performance.

The appropriate learning rate depends on the model architecture, the dataset, and the optimizer. It is often necessary to experiment with different learning rates to find the optimal value for a given task. The Adam optimizer is a popular choice for deep learning models and has been shown to work well with a wide range of learning rates. Its default learning rate is 0.001, which is a good starting point for many tasks, but it may need to be adjusted for the specific use case.

 

#5   eval()

resnet50 = torch.load('resnet50.pt')
resnet50.eval() 
test_loss, test_accuracy = evaluate(resnet50, test_loader_resNet)

In PyTorch, the eval() method sets the module in evaluation mode. This is used during the inference or testing phase, when you want the model to predict output for new data.

During evaluation, certain modules such as dropout and batch normalization behave differently than they do during training. For example, dropout layers are turned off, and batch normalization layers use the population statistics instead of the batch statistics.

When you load a pre-trained ResNet50 model and call resnet50.eval(), you are setting the model to evaluation mode to ensure that all the necessary changes are made to the model for correct predictions during testing.

After setting the model in evaluation mode, you are using the evaluate() function to calculate the test loss and accuracy for the ResNet50 model on the test dataset test_loader_resNet. The evaluate() function most likely uses the forward() method of the ResNet50 model to make predictions on the test data and compute the loss and accuracy.
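
For reference, a minimal sketch of what such an evaluate() helper could look like (the actual implementation is not shown in this post; DEVICE is assumed to be defined as in section #6):

import torch
import torch.nn.functional as F

def evaluate(model, loader):
    model.eval()
    total_loss, correct = 0.0, 0
    with torch.no_grad():                     # no gradients needed at test time
        for inputs, labels in loader:
            inputs, labels = inputs.to(DEVICE), labels.to(DEVICE)
            outputs = model(inputs)           # logits from the final fc layer
            total_loss += F.cross_entropy(outputs, labels, reduction='sum').item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
    n = len(loader.dataset)
    return total_loss / n, 100.0 * correct / n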

 

#6   to(DEVICE)

DEVICE = torch.device("cuda" if USE_CUDA else "cpu")
resnet = resnet.to(DEVICE)

In PyTorch, the to() method is used to move the model and its parameters to a specified device. In this line of code, resnet.to(DEVICE) is moving the ResNet model and its parameters to the device specified by the DEVICE variable.

The DEVICE variable is defined earlier in the code and specifies whether to use the GPU or CPU for the tensor and neural network operations. If USE_CUDA is True, DEVICE will be set to cuda, which means the model will be moved to the GPU. If USE_CUDA is False, DEVICE will be set to cpu, which means the model will be moved to the CPU.

By moving the model and its parameters to the specified device, PyTorch ensures that all computations associated with this model, including forward and backward passes, are performed on that device. This allows for efficient computation and faster training or evaluation times.
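
Note that input tensors must live on the same device as the model, or PyTorch will raise an error. A minimal sketch, continuing the snippet above:

x = torch.randn(1, 3, 224, 224).to(DEVICE)  # move the input to the same device
output = resnet(x)                          # forward pass runs on DEVICE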

 

 

#7  fc.in_features

#8  lr_scheduler.StepLR()

import torch.nn as nn
import torch.optim as optim
from torchvision import models

resnet = models.resnet50(pretrained=True)
# Loads the pretrained model; with pretrained=False only the architecture is loaded.
# Our number of classes differs from resnet50's output classes, so we must adjust it.
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 33)  # in_features = input size of the last layer; output resized to our 33 classes
resnet = resnet.to(DEVICE)

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.Adam(filter(lambda p: p.requires_grad, resnet.parameters()), lr=0.001)

from torch.optim import lr_scheduler
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.01)

In PyTorch, fc and in_features are attributes of a neural network module that represent the fully connected layer and the number of input features to that layer, respectively.

The fully connected layer, also known as the linear layer, is a type of layer in a neural network that performs a linear transformation on the input. It takes as input a tensor of size (batch_size, in_features) and outputs a tensor of size (batch_size, out_features), where out_features is the number of neurons in the layer. The fc attribute is used to refer to the fully connected layer in a PyTorch module.

The in_features attribute is the number of input features to the fully connected layer. It is used to define the size of the input tensor to the layer. When creating a fully connected layer in PyTorch, the in_features argument is typically specified as the number of output features of the preceding layer.

 

The lr_scheduler.StepLR() function implements a step learning rate scheduler for the optimizer used in training the neural network. The StepLR scheduler adjusts the optimizer's learning rate at fixed intervals of training epochs.

StepLR() takes three main arguments:

  • optimizer: the optimizer whose learning rate should be adjusted
  • step_size: the number of epochs after which the learning rate is reduced
  • gamma: the multiplicative factor applied to the learning rate at each step (new_lr = old_lr × gamma)

Decaying the learning rate as the optimization approaches a minimum helps prevent the optimizer from overshooting and can improve the model's generalization ability.
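
The scheduler only takes effect if it is stepped, typically once per epoch. A minimal sketch using the objects defined above (the training call is a hypothetical placeholder):

for epoch in range(25):
    # train_one_epoch(resnet, dataloaders, criterion, optimizer_ft)  # hypothetical helper
    exp_lr_scheduler.step()
    print(epoch, exp_lr_scheduler.get_last_lr())
    # with step_size=7, gamma=0.01: lr is multiplied by 0.01 every 7 epochs,
    # i.e. 0.001 -> 1e-5 -> 1e-7 -> ...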

 

#9  zero_grad

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(DEVICE)
    labels = labels.to(DEVICE)

    optimizer.zero_grad()

In PyTorch, when you call optimizer.zero_grad(), you are essentially telling the optimizer to reset the gradients of all parameters to zero. This is typically done at the start of each training iteration, before computing the gradients of the loss with respect to the model parameters.

The reason for this is that the grad attribute of a PyTorch tensor accumulates gradients over multiple backward passes. If you don't reset the gradients to zero at the start of each iteration, the gradients from the previous iteration will be accumulated with the new gradients, which can lead to incorrect updates to the model parameters.

Therefore, it is important to call optimizer.zero_grad() at the beginning of each training iteration to ensure that you are starting with a clean slate and only updating the gradients based on the current batch of data.
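
Put together, a single training step follows this pattern (a self-contained sketch with a stand-in model; the real model, criterion, and optimizer are defined in the earlier sections):

import torch
import torch.nn as nn

model = nn.Linear(10, 3)                       # stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, labels = torch.randn(4, 10), torch.tensor([0, 1, 2, 0])

optimizer.zero_grad()                          # clear accumulated gradients
loss = criterion(model(inputs), labels)        # forward pass + loss
loss.backward()                                # compute fresh gradients
optimizer.step()                               # update the parameters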

 

 

#10  set_grad_enabled

#11  max(outputs, 1)

 

with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)

In PyTorch, torch.set_grad_enabled(True) is a context manager that enables or disables gradient computation. When set_grad_enabled(True) is called, PyTorch tracks and calculates gradients for all operations that involve tensors with the requires_grad attribute set to True. If set_grad_enabled(False) is called, PyTorch disables gradient computation and does not calculate gradients, even for tensors with requires_grad set to True.

In my code, the set_grad_enabled function is used to toggle gradient computation based on the value of the phase variable. When phase is 'train', gradient computation is enabled so that the gradients of the model parameters can be computed and used to update the model during training. When phase is 'val' or 'test', gradient computation is disabled to save memory and computation time since gradients are not needed for evaluation.

Therefore, the line with torch.set_grad_enabled(phase == 'train'): sets the gradient computation to True during training (phase == 'train' evaluates to True) and to False during validation and testing (phase == 'val' or phase == 'test' evaluates to False). This allows for efficient and memory-saving computation of the gradients during the training phase while avoiding unnecessary computation during the validation and testing phases.
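
A tiny demonstration of the effect (a sketch):

import torch

x = torch.ones(1, requires_grad=True)
with torch.set_grad_enabled(False):
    y = x * 2
print(y.requires_grad)  # False -- no computation graph was recorded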

 

The torch.max() function returns the maximum value of a tensor along a specified dimension. When you call torch.max(outputs, 1), you are asking PyTorch to find the maximum value of the outputs tensor along the second dimension (i.e., dimension 1).

The second argument (1) in the torch.max(outputs, 1) call is the dim argument, which specifies the dimension along which to compute the maximum. By setting dim=1, torch.max() returns a tuple containing two tensors: the maximum values along dimension 1, and the indices of the maximum values along dimension 1.

In the context of classification problems, the index of the maximum value is often used as the predicted class label. 
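
For example (a sketch with made-up logits):

import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],
                        [1.2, 0.4, 3.3]])
values, preds = torch.max(outputs, 1)
print(values)  # tensor([2.5000, 3.3000]) -- max value per row
print(preds)   # tensor([1, 2])           -- predicted class indices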

 

 

#12  trainable=False

base_model.trainable = False

In the line of code base_model.trainable = False, the trainable attribute of a pre-trained model base_model is being set to False. This means that the weights and biases of the pre-trained model's layers will not be updated during training.

Setting the trainable attribute of a pre-trained model to False can be useful in transfer learning scenarios where you want to use the pre-trained model to extract features from input data, but don't want to update its weights during training. This can save computation time and reduce the risk of overfitting, especially if the pre-trained model was trained on a large and diverse dataset.

On the other hand, if you want to fine-tune the pre-trained model on a new dataset or task, you can set the trainable attribute to True, and the weights and biases of the pre-trained model's layers will be updated during training. In this case, the model is typically trained with a smaller learning rate than the one used for training from scratch, to avoid catastrophic forgetting and preserve the pre-trained knowledge.
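
For context, a typical Keras transfer-learning skeleton around this flag might look like the following (a sketch; the base architecture, input size, and class count are illustrative):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(33, activation='softmax'),  # e.g. 33 classes, as in #7
])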

 
 

#13  [np.newaxis, ...]

result = classifier.predict(grace_hopper[np.newaxis, ...])

The ... in this code is called Ellipsis, a shorthand notation in Python that stands for all the remaining dimensions of a multi-dimensional array.

In the code, grace_hopper is a multi-dimensional array that represents an image. Since machine learning models typically take input in the form of batches, the input to the predict method should also be in the form of a batch. In this case, we have only one image to predict, so we need to convert the image into a batch of size 1.

To do this, we use np.newaxis to add an extra leading dimension of size 1 representing the batch. The ... then selects all of the array's existing dimensions unchanged, so we don't have to write out one : per dimension or hardcode the image shape in case the image size changes.

Therefore, classifier.predict(grace_hopper[np.newaxis, ...]) adds an extra dimension at the beginning of the grace_hopper array to represent a batch size of 1 and keeps all remaining dimensions as the image shape. This creates a batch whose only element is the grace_hopper image, which is then fed to the predict method of the classifier for making predictions.
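
A minimal sketch of the shape change (the image size is illustrative):

import numpy as np

grace_hopper = np.zeros((224, 224, 3))   # one image: (height, width, channels)
batched = grace_hopper[np.newaxis, ...]  # same as grace_hopper[np.newaxis, :, :, :]
print(batched.shape)                     # (1, 224, 224, 3) -- a batch of one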

 

 

#14  argmax

#15  axis=-1

predicted_class = np.argmax(result[0], axis=-1)

argmax is a function that is commonly used in machine learning and other areas of data analysis to determine the index of the maximum value in an array or along a given axis of an array. It is particularly useful when dealing with categorical data or when trying to determine the most likely class label for a given set of input features.

In machine learning, the output of a classifier is often a probability distribution over a set of possible classes, where each class is assigned a probability value between 0 and 1. In order to make a prediction, we typically choose the class with the highest probability value. This is where argmax comes in: it allows us to determine the index of the highest probability value in the array of probabilities output by the classifier, which corresponds to the most likely class.

 

In the case of axis=-1, it specifies the last axis of the array. This is a common shorthand for specifying the axis when the number of dimensions of the array is not known in advance, or when we want to apply an operation along the last axis regardless of the number of dimensions.

For example, if we have a 3-dimensional array with shape (2, 3, 4), specifying axis=-1 in a sum operation like np.sum(arr, axis=-1) would sum the elements along the last axis (i.e., the axis with size 4), resulting in a new array of shape (2, 3). If we instead specify axis=0, the sum operation would be performed along the first axis (i.e., the axis with size 2), resulting in a new array of shape (3, 4).

Therefore, the value of axis=-1 is used to specify the last axis of an array in a given operation, regardless of the number of dimensions of the array.
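
Tying the two together (a sketch with made-up probabilities):

import numpy as np

result = np.array([[0.1, 0.7, 0.2]])             # batch of one prediction over 3 classes
predicted_class = np.argmax(result[0], axis=-1)  # index of the highest probability
print(predicted_class)                           # 1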

 

 

#16 untar=True

import tensorflow as tf

# 'url' is assumed to be defined earlier and point at a .tgz archive
# (here, the flower_photos dataset)
data_root = tf.keras.utils.get_file(
    'flower_photos', url,
    untar=True
)

untar=True is an argument that instructs the get_file function to extract the downloaded archive.

The get_file function downloads the file from the specified URL and stores it in the cache directory under the given name. With the default untar=False, the file is saved exactly as downloaded, without being extracted.

With untar=True, the downloaded tar archive is automatically extracted. Therefore, in the code above, the flower_photos archive is unpacked after download.

untar=True only applies to tar archives. If the file uses a different compression format, you need to decompress it yourself instead of relying on untar=True.
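
For a non-tar archive, a sketch of that manual route (zip_url is a hypothetical URL of a zip archive):

import zipfile
import tensorflow as tf

path = tf.keras.utils.get_file('photos.zip', zip_url)  # download without extracting
with zipfile.ZipFile(path) as zf:
    zf.extractall()                                    # decompress manually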

 

 
