Lec 10-1  More than Sigmoid: ReLU

1.  Vanishing Gradient 


 Problems that a network with a depth of 2~3 layers solved well become harder and harder to solve as the network gets deeper.

 

 One of the causes is the sigmoid function: no matter how large or small its input is, the output is always squeezed into the 0~1 range, so the influence of that value shrinks. Because the sigmoid's derivative is at most 0.25, backpropagation multiplies many such small factors together, the gradient reaching the early layers nearly vanishes, and accurate training becomes difficult.
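A minimal numpy sketch (an illustration, not from the lecture) of how quickly that product of small derivatives shrinks with depth:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, reached at x = 0

# Best-case gradient factor contributed by a chain of sigmoid layers
# (weights ignored for simplicity): it decays like 0.25 ** depth.
for depth in (2, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)   # 0.0625, ~9.8e-4, ~9.5e-7, ~9.1e-13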



2.  Geoffrey Hinton's Summary  


  • Our labeled datasets were thousands of times too small. 
  • Our computers were millions of times too slow. 
  • We initialized the weights in a stupid way. 
  • We used the wrong type of non-linearity.



3.  ReLU 



  In TensorFlow, switching from sigmoid to ReLU is a one-line change: 


  L1 = tf.sigmoid(tf.matmul(X, W1) + b1) 

  L1 = tf.nn.relu(tf.matmul(X, W1) + b1)


Applied example  

  Use ReLU for all intermediate (hidden) layers, and use sigmoid only at the final output layer.
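A minimal sketch of that layout in the TF 1.x style used above; the layer sizes and variable names here are placeholders, not from the lecture:

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)            # hidden layer: ReLU

W2 = tf.Variable(tf.random_normal([256, 256]))
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)           # hidden layer: ReLU

W3 = tf.Variable(tf.random_normal([256, 1]))
b3 = tf.Variable(tf.random_normal([1]))
hypothesis = tf.sigmoid(tf.matmul(L2, W3) + b3)   # output layer: sigmoid

cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))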




4. Activation Functions 

 

   1) Sigmoid: output between 0 and 1

   2) tanh: output between -1 and 1

   3) ReLU: negative inputs are dropped (output 0), positive inputs pass through linearly

   4) Leaky ReLU: linear with slope 0.1 for negative inputs, slope 1 for positive inputs

   5) Maxout: the maximum over several linear transformations of the input

   6) ELU: exponential curve in the negative region (approaching -1 at the lowest), linear for positive inputs
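A minimal numpy sketch of the formulas listed above; the 0.1 slope for Leaky ReLU follows the list, and alpha = 1.0 for ELU is an assumption:

import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))                  # output in (0, 1)
def tanh(x):        return np.tanh(x)                                 # output in (-1, 1)
def relu(x):        return np.maximum(0.0, x)                         # 0 for x < 0, x otherwise
def leaky_relu(x):  return np.maximum(0.1 * x, x)                     # slope 0.1 for x < 0
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1))    # bottoms out at -a

def maxout(x, Ws, bs):
    # maximum over several affine transformations of the same input
    return np.max([x @ W + b for W, b in zip(Ws, bs)], axis=0)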

  

[Image: comparison of activation functions (ReLU, ELU, maxout, ...) on CIFAR-10]



Lec 10-2  Let's Initialize the Weights Well

1.  Geoffrey Hinton's Summary (the third point, weight initialization, is the topic of this lecture)  


  • Our labeled datasets were thousands of times too small. 
  • Our computers were millions of times too slow. 
  • We initialized the weights in a stupid way. 
  • We used the wrong type of non-linearity.



2.  RBM 

  1) Restricted Boltzmann Machine 

  2) Hinton et al. (2006), "A Fast Learning Algorithm for Deep Belief Nets" 

  3) Concept: pretrain one layer at a time, and use as the initial weights the values for which a forward pass followed by a backward (reconstruction) pass differs least from the original input

  4) Deep Belief Network: a deep network whose layers have been pretrained this way
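A minimal sketch of the reconstruction idea only (not the full contrastive-divergence training; the function and argument names are placeholders):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(x, W, b_hidden, b_visible):
    h = sigmoid(x @ W + b_hidden)            # forward pass: visible -> hidden
    x_recon = sigmoid(h @ W.T + b_visible)   # backward pass: hidden -> visible, same weights transposed
    return np.mean((x - x_recon) ** 2)       # pretraining looks for W that makes this small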


3.  Xavier/He initialization 

  1) Xavier initialization: 

     X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in International conference on artificial intelligence and statistics, 2010

  2) He’s initialization: 

    K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” 2015

  3) Makes sure the weights are ‘just right’, not too small, not too big 

  4) Using number of input (fan_in) and output (fan_out)

  5) Application (see the code below)


import math
import tensorflow as tf

def xavier_init(n_inputs, n_outputs, uniform=True):

  """Set the parameter initialization using the method described.
  This method is designed to keep the scale of the gradients roughly the same
  in all layers.
  Xavier Glorot and Yoshua Bengio (2010):
           Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
           statistics.
  Args:
    n_inputs: The number of input nodes into each output.
    n_outputs: The number of output nodes for each input.
    uniform: If true use a uniform distribution, otherwise use a normal.
  Returns:
    An initializer.
  """
  if uniform:
    # 6 was used in the paper.
    init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
    return tf.random_uniform_initializer(-init_range, init_range)
  else:
    # 3 gives us approximately the same limits as above since this repicks
    # values greater than 2 standard deviations from the mean.
    stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
    return tf.truncated_normal_initializer(stddev=stddev)
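A hedged usage sketch, assuming the xavier_init helper and imports above; the variable name and layer sizes are placeholders. For ReLU layers, He's initialization instead scales the standard deviation by sqrt(2 / fan_in).

X = tf.placeholder(tf.float32, [None, 784])

W1 = tf.get_variable("W1", shape=[784, 256],
                     initializer=xavier_init(784, 256))   # 'just right' initial scale
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)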



Lec 10-3  Dropout and Ensembles


1.  Solutions for overfitting

  1) More training data

  2) Reduce the number of features 

  3) Regularization


2.  Regularization

  1) Let's not have too big numbers in the weights (add a lambda * sum(W^2) penalty to the cost; see the sketch after this list)



  2) Dropout (see the sketch after this list)

    - "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" [Srivastava et al. 2014]

    - randomly set some neurons to zero in the forward pass
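A minimal TF 1.x sketch of both ideas, reusing the X, W1, b1, and cost names from the earlier sketches; the regularization strength and keep probability are placeholder values:

# L2 regularization: penalize large weights by adding their squared sum to the cost
l2_lambda = 0.001
cost_with_reg = cost + l2_lambda * tf.reduce_sum(tf.square(W1))

# Dropout: randomly zero out neurons in the forward pass
keep_prob = tf.placeholder(tf.float32)        # feed e.g. 0.5~0.7 while training, 1.0 at test time
_L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
L1 = tf.nn.dropout(_L1, keep_prob=keep_prob)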



3. Ensemble
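The idea behind an ensemble is to train several independently initialized networks on the same task and combine their predictions. A minimal sketch, assuming per-model predict functions that return class probabilities (these are not given in the source):

import numpy as np

def ensemble_predict(predict_fns, x):
    # average the class-probability outputs of the independently trained models,
    # then pick the most probable class
    probs = np.mean([predict(x) for predict in predict_fns], axis=0)
    return np.argmax(probs, axis=1)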



Lec 10-4  Stack Network Modules Freely, Like Lego Blocks



Posted by 꿈을펼쳐라