TensorFlow - regularization with L2 loss, how to apply to all weights, not just the last one?


programtip 2020. 11. 29. 12:12

I am playing with an ANN that is part of the Udacity DeepLearning course.

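I have an assignment that involves introducing regularization, using L2 loss, into a network with one hidden ReLU layer. I wonder how to introduce it properly so that ALL weights are penalized, not only the weights of the output layer.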

The code for the network without regularization is at the bottom of the post (the code that actually runs the training is out of the scope of the question).

The obvious way of introducing L2 is to replace the loss calculation with something like this (with beta set to 0.01):

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_train_labels) + 0.01*tf.nn.l2_loss(out_weights))

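But in that case it only takes into account the values of the output layer's weights. I am not sure how to properly penalize the weights that come INTO the hidden ReLU layer. Is that needed at all, or will penalizing the output layer somehow keep the hidden weights in check as well?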

#some importing
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

#loading data
pickle_file = '/home/maxkhk/Documents/Udacity/DeepLearningCourse/SourceCode/tensorflow/examples/udacity/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)


#prepare data to have right format for tensorflow
#i.e. data is flat matrix, labels are onehot

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


#now is the interesting part - we are building a network with
#one hidden ReLU layer and our usual output linear layer

#we are going to use SGD so here is our size of batch
batch_size = 128

#building tensorflow graph
graph = tf.Graph()
with graph.as_default():
  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #now let's build our new hidden layer
  #that's how many hidden neurons we want
  num_hidden_neurons = 1024
  #its weights
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  #now the layer itself. It multiplies data by weights, adds biases
  #and takes ReLU over result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

  #time to go for output linear layer
  #out weights connect hidden neurons to output labels
  #biases are added to output labels  
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))  

  out_biases = tf.Variable(tf.zeros([num_labels]))  

  #compute output  
  out_layer = tf.matmul(hidden_layer,out_weights) + out_biases
  #our real output is a softmax of prior result
  #and we also compute its cross-entropy to get our loss
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_train_labels))

  #now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  #nice, now let's calculate the predictions on each dataset for evaluating the
  #performance so far
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(  tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax( tf.matmul(valid_relu, out_weights) + out_biases) 

  test_relu = tf.nn.relu( tf.matmul( tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

hidden_weights, hidden_biases, out_weights, and out_biases are all the model parameters that you are creating. You can add L2 regularization to ALL of these parameters as follows:

loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=out_layer, labels=tf_train_labels)) +
    0.01*tf.nn.l2_loss(hidden_weights) +
    0.01*tf.nn.l2_loss(hidden_biases) +
    0.01*tf.nn.l2_loss(out_weights) +
    0.01*tf.nn.l2_loss(out_biases))
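
Written out, the loss above is roughly the following. The symbols are assumptions introduced here for illustration and do not appear in the original post: W_1, b_1 stand for hidden_weights, hidden_biases; W_2, b_2 stand for out_weights, out_biases; beta is 0.01; CE_i is the cross-entropy of example i in a batch of N; and the squared norm means the sum of the squared entries. The factor 1/2 comes from tf.nn.l2_loss(t), which computes sum(t ** 2) / 2.

```latex
\begin{aligned}
L &= \frac{1}{N}\sum_{i=1}^{N} \mathrm{CE}_i
   + \frac{\beta}{2}\left(\lVert W_1\rVert^2 + \lVert b_1\rVert^2
   + \lVert W_2\rVert^2 + \lVert b_2\rVert^2\right) \\
\frac{\partial}{\partial W_1}\left(\frac{\beta}{2}\lVert W_1\rVert^2\right) &= \beta\, W_1
\end{aligned}
```

The second line is why each tensor has to appear in the sum explicitly: a penalized tensor receives an extra shrinkage term (beta times the tensor itself) in its gradient, while a tensor left out of the sum receives no such term from the penalty.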

A shorter and more scalable way of doing this would be:

vars   = tf.trainable_variables() 
lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars ]) * 0.001

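This basically sums the l2_loss of all your trainable variables. You could also make a dictionary in which you specify only the variables you want to add to your cost, and apply the second line above to it. You can then add lossL2 to your softmax cross-entropy value to calculate your total loss.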
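
The dictionary idea can be sketched without TensorFlow, as a plain-Python illustration of the arithmetic only. Everything here is hypothetical: the helper names l2_loss and l2_penalty, the toy parameter values (flattened to lists), and the placeholder cross-entropy value are assumptions for the example, not part of the original code.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes for a tensor."""
    return sum(v * v for v in values) / 2.0


def l2_penalty(selected, beta):
    """Sum l2_loss over the chosen parameters and scale the total by beta."""
    return beta * sum(l2_loss(values) for values in selected.values())


# Hypothetical toy parameters, flattened to plain lists for illustration.
# Only the entries listed here are penalized; the biases are simply left out.
params_to_penalize = {
    "hidden_weights": [1.0, -2.0],
    "out_weights": [3.0],
}

cross_entropy = 0.5  # placeholder standing in for the mean softmax cross-entropy
loss_l2 = l2_penalty(params_to_penalize, beta=0.001)
total_loss = cross_entropy + loss_l2
print(loss_l2, total_loss)
```

In the TensorFlow graph the same pattern would iterate over a dictionary of tf.Variable objects and call tf.nn.l2_loss on each one.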

Edit: As mentioned by Piotr Dabkowski, the code above will also regularise biases. This can be avoided by adding an if condition to the second line:

lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars
                    if 'bias' not in v.name ]) * 0.001

The same name-based condition can be used to exclude other variables as well.
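
A minimal plain-Python sketch of how the name filter behaves, with hypothetical (name, values) pairs standing in for the variables returned by tf.trainable_variables(). One caveat it illustrates: the filter only sees variable names, and TensorFlow gives variables created without a name argument (as in the question's code) default names such as Variable:0 and Variable_1:0, so in that case nothing would be excluded unless the biases are created with something like name='hidden_biases'.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes for a tensor."""
    return sum(v * v for v in values) / 2.0


def filtered_l2(variables, beta, skip="bias"):
    """Penalize every variable whose name does not contain the skip substring."""
    return beta * sum(l2_loss(vals) for name, vals in variables if skip not in name)


# Hypothetical variables that were explicitly named when created.
named_vars = [
    ("hidden_weights:0", [1.0, 2.0]),
    ("hidden_biases:0", [10.0]),
    ("out_weights:0", [3.0]),
    ("out_biases:0", [20.0]),
]
loss_l2 = filtered_l2(named_vars, beta=0.001)  # only the two weight entries count

# Default names carry no hint about the variable's role, so nothing is skipped.
unnamed_vars = [("Variable:0", [1.0, 2.0]), ("Variable_1:0", [10.0])]
kept = [name for name, _ in unnamed_vars if "bias" not in name]
print(loss_l2, kept)
```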

In fact, we usually do not regularize bias terms (intercepts). So, I go for:

loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=out_layer, labels=tf_train_labels)) +
    0.01*tf.nn.l2_loss(hidden_weights) +
    0.01*tf.nn.l2_loss(out_weights))

The intercept is simply added to the y values, so penalizing it only changes the y values by a constant c. Including it in the penalty or not will not noticeably change the results, but it does cost some extra computation.

Reference URL: https://stackoverflow.com/questions/38286717/tensorflow-regularization-with-l2-loss-how-to-apply-to-all-weights-not-just
