Lec 12  Recurrent Neural Network


1. Concept of RNN

   1) Basic Concept

       


        - The previous state value is used when computing the current state value.

        - However, the same function and the same parameters are applied at every time step.

   2) Vanilla RNN 
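       A minimal NumPy sketch of the vanilla RNN step; the weight names (W_xh, W_hh, W_hy)
       and the toy one-hot sequence are illustrative assumptions, not code from the lecture:

# h_t = tanh(W_hh . h_{t-1} + W_xh . x_t),   y_t = W_hy . h_t
import numpy as np

hidden_size, input_size, output_size = 2, 4, 4
W_xh = np.random.randn(hidden_size, input_size) * 0.01
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
W_hy = np.random.randn(output_size, hidden_size) * 0.01

h = np.zeros(hidden_size)                    # initial state
for x_t in np.eye(input_size):               # toy one-hot input sequence
    h = np.tanh(W_hh @ h + W_xh @ x_t)       # same function and parameters at every step
    y_t = W_hy @ h                           # output at this time step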

       


   3) Character-level Language Model 

      - Apply softmax to the output-layer values at each time step, measure how well they
        match the target (next) character, and train by minimizing the average of these
        losses as the cost function.
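      A minimal TF 1.x sketch of that cost, assuming hypothetical names: rnn_outputs are the
      per-step hidden states and targets are the indices of the true next characters:

import tensorflow as tf

num_classes = 5                                        # e.g. number of distinct characters
rnn_outputs = tf.placeholder(tf.float32, [None, 8])    # per-step hidden states (toy dim 8)
targets = tf.placeholder(tf.int32, [None])             # true next characters

W_out = tf.Variable(tf.random_normal([8, num_classes]))
b_out = tf.Variable(tf.random_normal([num_classes]))
char_logits = tf.matmul(rnn_outputs, W_out) + b_out    # output-layer values

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=char_logits, labels=targets)                # softmax + cross-entropy per step
cost = tf.reduce_mean(loss)                            # average loss is the training cost
train = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)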


     

 2. RNN applications

  1) application field  

      • Language Modeling 
      • Speech Recognition
      • Machine Translation
      • Conversation Modeling / Question Answering (Chat Bot) 
      • Image / Video Captioning 
      • Image / Music / Dance Generation

        Reference: https://github.com/TensorFlowKR/awesome_tensorflow_implementations


  2) application type 

    - one to one : Vanilla Neural Networks

    - one to many : Image Captioning  

    - many to one : Sentiment Classification 

    - many to many : Machine Translation 

    - many to many : Video Classification on frame level

 

  3) multi-Layer RNN


  4) RNN Training Model 

     - Long Short Term Memory (LSTM) 

     - GRU (Cho et al., 2014)
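     A minimal TF 1.x sketch of creating these cells (hidden_size = 128 is an arbitrary
     example); any of them can be dropped into the same RNN graph:

import tensorflow as tf

hidden_size = 128
rnn_cell  = tf.contrib.rnn.BasicRNNCell(num_units=hidden_size)     # vanilla RNN cell
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
gru_cell  = tf.contrib.rnn.GRUCell(num_units=hidden_size)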




Lab 12  RNN in TF

1. Concept of RNN in TF 

   1) Cell: chosen to fit the task (hidden_size can be set freely)

   2) Result: two values are returned, outputs and states, but states is rarely used
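   A minimal TF 1.x sketch of the cell / outputs / states pattern (the input dimension 4 and
   hidden_size 2 are example values only):

import tensorflow as tf

hidden_size = 2
X = tf.placeholder(tf.float32, [None, None, 4])      # (batch, sequence_length, input_dim)

cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
# outputs: (batch, sequence_length, hidden_size)  -- this is what is normally used
# states : only the final state                   -- rarely needed in these labs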



2. Input / Output RNN in TF  

   1) One node: input dimension 4, hidden_size 2

  2) Unfolding to n sequences



  3) Batching Input / Output
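  A minimal TF 1.x sketch of batched input/output, assuming a toy batch of 3 one-hot sequences
  of length 5 with input dimension 4:

import numpy as np
import tensorflow as tf

x_data = np.eye(4)[np.random.randint(0, 4, size=(3, 5))].astype(np.float32)   # (3, 5, 4)

X = tf.placeholder(tf.float32, [None, 5, 4])
cell = tf.contrib.rnn.BasicRNNCell(num_units=2)               # hidden_size = 2
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(outputs, feed_dict={X: x_data}).shape)     # (3, 5, 2)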



3. ....



Lec 11  Building the Conv Layer of a ConvNet

1.  Concept of Convolution


 The cat-brain experiment showed that when recognizing changes on a screen, only a part of the brain is activated, not the whole brain.

 Convolution applies the same idea to a neural network: instead of taking in the whole input at once, each unit judges only a local region of it.




2.  Size of filter



3.  Convolution Layers

4. Pooling Layer

 

The difference between a convolution layer and a pooling layer: the weights of a convolution layer are learned during training, whereas a pooling layer applies a fixed rule (no learning; it simply picks a representative value, e.g. the maximum of each window).
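A minimal TF 1.x sketch contrasting the two (the 3x3 filter, 32 channels, and 2x2 max pooling
are example choices, not values from the lecture):

import tensorflow as tf

img = tf.placeholder(tf.float32, [None, 28, 28, 1])

# Convolution layer: the filter W is a Variable, i.e. it is learned during training
W = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
conv = tf.nn.conv2d(img, W, strides=[1, 1, 1, 1], padding='SAME')

# Pooling layer: no weights at all -- a fixed rule (here, the max of each 2x2 window)
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')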



5. Case Study


LeNet-5 (LeCun et al.)

AlexNet: winner of the 2012 ImageNet competition.

The normalization layers it used are no longer used today.


GoogLeNet: winner of the 2014 ImageNet competition.


ResNet: winner of multiple competitions in 2015.


Revolution of Depth

  - Applies a "fast forward" (skip connection) technique: a computed value jumps ahead 2-3 layers and is added back to the result there.
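A minimal TF 1.x sketch of such a skip connection, assuming two 3x3 convolutions with matching
channel counts (a simplified residual block, not ResNet's exact configuration):

import tensorflow as tf

def residual_block(x, channels):
    # Two conv layers; the input x "jumps" over them and is added back at the end.
    W1 = tf.Variable(tf.random_normal([3, 3, channels, channels], stddev=0.01))
    W2 = tf.Variable(tf.random_normal([3, 3, channels, channels], stddev=0.01))
    h = tf.nn.relu(tf.nn.conv2d(x, W1, strides=[1, 1, 1, 1], padding='SAME'))
    h = tf.nn.conv2d(h, W2, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(h + x)               # skip connection: F(x) + x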



Who won ImageNet in 2016 and 2017?


2016

Team Trimps-Soushen with a 2.99% error rate (a research team of the Third Research Institute of the Ministry of Public Security, China). They combined ResNet, Inception, and Inception-ResNet.


2017

Object detection performance by category: Nanjing University of Information Science and Technology (NUIST, China)

Mean detection accuracy: NUIST (China), 0.61




The ImageNet competition was discontinued after the 2017 edition.




Lab 11  CNN 

1. CNN Process (3 steps)


2. CNN with Tensorflow

  1) conv2d 

  2) conv2


  3) MNIST Data Load

  4) apply Conv Layer


  5) apply Max Pooling
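  A minimal TF 1.x sketch of steps 4) and 5) on an MNIST-shaped input (the 3x3 filter with
  32 channels and the 2x2 pooling are example values):

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
X_img = tf.reshape(X, [-1, 28, 28, 1])                  # MNIST vector back to a 28x28x1 image

W1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')    # (?, 28, 28, 32)
L1 = tf.nn.relu(L1)
L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1], padding='SAME')             # (?, 14, 14, 32)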



The application of further convolution layers to the MNIST data can be found at

https://docs.google.com/presentation/d/1h90rpyWiVlwkuCtMgTLfAVKIiqJrFunnKR7dqPNtI6I/edit#slide=id.g1dc2d142ea_0_399




Lab 10  NN, ReLu, Xavier, Dropout, and Adam

1.  softmax for MNIST

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

 

Epoch: 0014 cost = 0.419000337

Epoch: 0015 cost = 0.406490815

Learning Finished!

Accuracy: 0.9035

 

 

 

2.  NN for MNIST with ReLU

 

The computation now passes through several layers.

 

# input place holders

X = tf.placeholder(tf.float32, [None, 784])

Y = tf.placeholder(tf.float32, [None, 10])


# weights & bias for nn layers

W1 = tf.Variable(tf.random_normal([784, 256]))

b1 = tf.Variable(tf.random_normal([256]))

L1 = tf.nn.relu(tf.matmul(X, W1) + b1)


W2 = tf.Variable(tf.random_normal([256, 256]))

b2 = tf.Variable(tf.random_normal([256]))

L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)


W3 = tf.Variable(tf.random_normal([256, 10]))

b3 = tf.Variable(tf.random_normal([10]))

hypothesis = tf.matmul(L2, W3) + b3


# define cost/loss & optimizer

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(

  logits=hypothesis, labels=Y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

 

Epoch: 0014 cost = 0.624131458

Epoch: 0015 cost = 0.454633765

Learning Finished!

Accuracy: 0.9455

 

 

 

3.  Xavier, Adam for MNIST

 

With Xavier initialization the cost is already low from the very first epoch.

 

# input place holders

X = tf.placeholder(tf.float32, [None, 784])

Y = tf.placeholder(tf.float32, [None, 10])


# weights & bias for nn layers

# http://stackoverflow.com/questions/33640581

W1 = tf.get_variable("W1", shape=[784, 256],

                   initializer=tf.contrib.layers.xavier_initializer())

b1 = tf.Variable(tf.random_normal([256]))

L1 = tf.nn.relu(tf.matmul(X, W1) + b1)


W2 = tf.get_variable("W2", shape=[256, 256],

                   initializer=tf.contrib.layers.xavier_initializer())

b2 = tf.Variable(tf.random_normal([256]))

L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)


W3 = tf.get_variable("W3", shape=[256, 10],

                   initializer=tf.contrib.layers.xavier_initializer())

b3 = tf.Variable(tf.random_normal([10]))

hypothesis = tf.matmul(L2, W3) + b3


# define cost/loss & optimizer

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(

  logits=hypothesis, labels=Y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

 

 

 

Epoch: 0001 cost = 0.301498963

Epoch: 0002 cost = 0.107252513

Epoch: 0003 cost = 0.064888892

....

Epoch: 0014 cost = 0.002714260

Epoch: 0015 cost = 0.004707661

Learning Finished!

Accuracy: 0.9783

(For comparison, with plain random_normal initialization the first epochs start much higher:)

Epoch: 0001 cost = 141.207671860

Epoch: 0002 cost = 38.788445864

Epoch: 0003 cost = 23.977515479

 

 


 

4.  Deep NN for MNIST

Simply making the network deeper does not improve the result by itself; the improvement comes from applying dropout.

# dropout (keep_prob) rate  0.7 on training, but should be 1 for testing

keep_prob = tf.placeholder(tf.float32)


W1 = tf.get_variable("W1", shape=[784, 512])

b1 = tf.Variable(tf.random_normal([512]))

L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

L1 = tf.nn.dropout(L1, keep_prob=keep_prob)


W2 = tf.get_variable("W2", shape=[512, 512])

b2 = tf.Variable(tf.random_normal([512]))

L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

L2 = tf.nn.dropout(L2, keep_prob=keep_prob)

# train my model

for epoch in range(training_epochs):

   ...

  for i in range(total_batch):

      batch_xs, batch_ys = mnist.train.next_batch(batch_size)

      feed_dict = {X: batch_xs, Y: batch_ys, keep_prob: 0.7}

      c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)

      avg_cost += c / total_batch


# Test model and check accuracy

correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print('Accuracy:', sess.run(accuracy, feed_dict={

    X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1}))

 

Epoch: 0014 cost = 0.041290121

Epoch: 0015 cost = 0.043621063

Learning Finished!

Accuracy: 0.9804!!

 

 

5.  Optimizer : Adam is recommended

   

6. Summary

  • Softmax VS Neural Nets for MNIST, 90% and 94.5%

  • Xavier initialization: 97.8%

  • Deep Neural Nets with Dropout: 98%

  • Adam and other optimizers

  • Exercise: Batch Normalization


Lec 07-1 Learning Rate, Overfitting and Regularization

1. Determining Learning Rate

    1) Try several learning rates (start with 0.01)

    2) Observe the cost function

    3) Check that it goes down at a reasonable rate

      - Too large: divergence; too small: very slow convergence


2. Data(X) Preprocessing for gradient descent 

   1) If the raw data is unevenly scaled, some variables can become overly sensitive or insensitive.

   2) Normalization 


    3) Standardization 

          

           X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
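           A minimal self-contained NumPy sketch of the same standardization applied to every
           column (the toy rows loosely follow the lab's stock data, purely for illustration):

import numpy as np

X = np.array([[828.66,  833.45,  908100.],
              [823.02,  828.07, 1828100.],
              [819.93,  824.40, 1438100.]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)    # zero mean, unit variance per column
print(X_std)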


3. Regularization

   1) Overfitting

      - Our model is very good with training data set (with memorization)

      - Not good at test dataset or in real use

    2) Solutions for overfitting 

       - More Training data 

       - Reduce the number of features 

       - Regularization 


    3) Regularization 

       - Let's not have too big numbers in the weight      

         

        (By including the magnitude of the weights in the loss function, training also tries to minimize the weight values.)

       - with Tensorflow 

          l2reg = 0.001 * tf.reduce_sum(tf.square(W)) 
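          A minimal TF 1.x sketch of folding that term into the cost (the small 3-feature
          softmax model here is an arbitrary example, not the lecture's code):

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 3])
Y = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.random_normal([3, 3]))
b = tf.Variable(tf.random_normal([3]))

logits = tf.matmul(X, W) + b
base_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
l2reg = 0.001 * tf.reduce_sum(tf.square(W))     # penalty on large weights
cost = base_cost + l2reg                        # both terms are minimized together
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)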


     


Lec 07-2: Training/Testing Data Set


1. Data set 


2. Online Learning 

    1) The data set is split into several chunks and the model is trained/used for prediction incrementally.

    2) What the model learned from earlier chunks should carry over and still influence predictions on the new data.



3. MNIST Data Set





Lab 07-1: training/test dataset, learning rate, normalization


1. Test Dataset  & Learning Rate  

  0) Test Data

   # Evaluation our model using this test dataset

   x_test = [[2, 1, 1],          

            [3, 1, 2],       

            [3, 3, 4]]

   y_test = [[0, 0, 1],

             [0, 0, 1],

             [0, 0, 1]]


   1) optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-10).minimize(cost)

Too small: stuck near a local minimum, no progress


200 5.73203 [[ 0.80269569  0.67861289 -1.21728313]

 [-0.3051686  -0.3032113   1.50825703]

 [ 0.75722361 -0.7008909  -2.10820389]]

Prediction: [0 0 0]

Accuracy:  0.0

   2) optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.5).minimize(cost)

Too large: divergence (cost becomes NaN)


200 nan [[ nan  nan  nan]

 [ nan  nan  nan]

 [ nan  nan  nan]]

Prediction: [0 0 0]

Accuracy:  0.0

  3) optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

Proper learning rate:


200 0.670909 [[-1.15377057  0.2814692   1.13632655]

 [ 0.37484601  0.18958248  0.33544892]

 [-0.35609847 -0.43973017 -1.256042  ]]

Prediction: [2 2 2]

Accuracy:  1.0


2. Normalized input

  1) Raw data with large values

     xy = np.array([[828.659973, 833.450012, 908100, 828.349976, 831.659973],

                   [823.02002, 828.070007, 1828100, 821.655029, 828.070007],

                   [819.929993, 824.400024, 1438100, 818.97998, 824.159973],

                   [816, 820.958984, 1008100, 815.48999, 819.23999],

                   [819.359985, 823, 1188100, 818.469971, 818.97998],

               [819, 823, 1198100, 816, 820.450012],

                   [811.700012, 815.25, 1098100, 809.780029, 813.669983],

                   [809.51001, 816.659973, 1398100, 804.539978, 809.559998]])

100 Cost:  nan 

Prediction:

 [[ nan]

 [ nan]

 [ nan]

 [ nan]

 [ nan]

 [ nan]

 [ nan]

 [ nan]]


Process finished with exit code 0


  2) Normalized input 

def MinMaxScaler(data):

    numerator = data - np.min(data, 0)

    denominator = np.max(data, 0) - np.min(data, 0)

    # noise term prevents the zero division

    return numerator / (denominator + 1e-7)


...


xy = MinMaxScaler(xy)


100 Cost:  0.0136869 

Prediction:

 [[ 1.12295258]

 [ 0.63500863]

 [ 0.53340685]

 [ 0.4315863 ]

 [ 0.53191048]

 [ 0.55868214]

 [ 0.15761785]

 [ 0.14425412]]


Process finished with exit code 0



Lab 07-2: Meet MNIST Dataset


1. MNIST  Image

  1) 28 * 28 * 1 image (grayscale bitmap)



# MNIST data image of shape 28 * 28 = 784

X = tf.placeholder(tf.float32, [None, 784])

# 0 - 9 digits recognition = 10 classes

Y = tf.placeholder(tf.float32, [None, nb_classes])



2. Reading Data and set variables


from tensorflow.examples.tutorials.mnist import input_data

# Check out https://www.tensorflow.org/get_started/mnist/beginners for

# more information about the mnist dataset

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


nb_classes = 10


# MNIST data image of shape 28 * 28 = 784

X = tf.placeholder(tf.float32, [None, 784])

# 0 - 9 digits recognition = 10 classes

Y = tf.placeholder(tf.float32, [None, nb_classes])


W = tf.Variable(tf.random_normal([784, nb_classes]))

b = tf.Variable(tf.random_normal([nb_classes]))


...
batch_xs, batch_ys = mnist.train.next_batch(100)
...
print("Accuracy: ", accuracy.eval(session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))


3. SoftMax

# Hypothesis (using softmax)

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)


cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)


# Test model

is_correct = tf.equal(tf.arg_max(hypothesis, 1), tf.arg_max(Y, 1))

# Calculate accuracy

accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))


4. epoch / Batch


1) When the data set is large, loading it all at once takes too much memory, so it is split into batches.

2) Epoch: one full training pass over the entire data set.

3) Iterations per epoch = (number of training examples) / (batch_size)


# parameters

training_epochs = 15

batch_size = 100


with tf.Session() as sess:

   # Initialize TensorFlow variables

   sess.run(tf.global_variables_initializer())

   # Training cycle

   for epoch in range(training_epochs):

       avg_cost = 0

       total_batch = int(mnist.train.num_examples / batch_size)


       for i in range(total_batch):

           batch_xs, batch_ys = mnist.train.next_batch(batch_size)

           c, _ = sess.run([cost, optimizer], feed_dict={X: batch_xs, Y: batch_ys})

           avg_cost += c / total_batch


       print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))



5. Result


1) Accuracy: 0.8951


Epoch: 0007 cost = 0.591160339
Epoch: 0008 cost = 0.563868978
Epoch: 0009 cost = 0.541745167
Epoch: 0010 cost = 0.522673571
Epoch: 0011 cost = 0.506782322
Epoch: 0012 cost = 0.492447640
Epoch: 0013 cost = 0.479955830
Epoch: 0014 cost = 0.468893666
Epoch: 0015 cost = 0.458703479
Learning finished
Accuracy: 0.8951
Label: [3]
Prediction: [5]


An example where a 3 was recognized as a 5.

A 7 was also recognized as a 2 (not surprising at roughly 89% accuracy).


  2) Accuracy can be improved further by adjusting the learning rate and the number of epochs.

     With learning rate = 0.4 and 100 epochs: 92.25%

Epoch: 0098 cost = 0.243552753

Epoch: 0099 cost = 0.243438786

Epoch: 0100 cost = 0.243145558

Learning finished

Accuracy:  0.9225

Label:  [5]

Prediction:  [5]



Lab 00 - Installing TensorFlow and Basic Operations

 


1. TensorFlow

  • Open source software library for numerical computation using data flow graphs.
  • Python

2.  Data Flow Graph

  • Nodes : mathematical operations
  • Edges : multidimensional data arrays (tensors) communicated between them.

3. Check installation and version    (1.4)

 

 

4. Example

 

  1) Hello world

 

  2) Node Add 

 

  3) Placeholder

 

 

 

 

 

 

TensorFlow Mechanics 

 

1. Build graph using TensorFlow operations

2. feed data and run graph (operation) : sess.run(op)

3. update variables in the graph (and return values)

 

 

 

4. Tensor Ranks, Shapes, and Types

  1) Rank

  2) Shape

  3) Type

 

 

 

Lab 02 - Linear Regression Implemented with TensorFlow

 

1. Example

 

2. Linear Regression with Placeholder
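A minimal TF 1.x sketch of linear regression with placeholders (the toy data [1, 2, 3] and the
hyperparameters are illustrative, not the exact lab listing):

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None])
Y = tf.placeholder(tf.float32, shape=[None])
W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = X * W + b                                    # H(x) = Wx + b
cost = tf.reduce_mean(tf.square(hypothesis - Y))          # mean squared error
train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2001):
        cost_val, _ = sess.run([cost, train],
                               feed_dict={X: [1, 2, 3], Y: [1, 2, 3]})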

 

 

 

 

Lab 03 - Linear regression : minimize Cost

 

 

1. plot cost function

 

 

2. Optimized by Hand
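A minimal TF 1.x sketch of the hand-written update rule W := W - lr * mean((W*x - y) * x)
(the toy data and learning rate are illustrative assumptions):

import tensorflow as tf

x_data = [1, 2, 3]
y_data = [1, 2, 3]

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(tf.random_normal([1]), name='weight')

hypothesis = X * W                                  # H(x) = Wx (bias omitted for simplicity)
cost = tf.reduce_mean(tf.square(hypothesis - Y))

learning_rate = 0.1
gradient = tf.reduce_mean((W * X - Y) * X)          # d cost / dW, written out by hand
update = W.assign(W - learning_rate * gradient)     # manual gradient-descent step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(21):
        _, cost_val = sess.run([update, cost], feed_dict={X: x_data, Y: y_data})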

 

 

3. Optimized by the GradientDescentOptimizer function

 

4. Calculate gradient value

 

 

 

 

Lab 04 - multi-variable Linear regression

 

1. Multi-Variable  

 

 

2. Multi-Variable with matrix

 

 

 

3. Slice Matrix

 

 

4. file data

 

 

5. Queue Runners

 

 

 

[An error occurred]  Proceeding for now

 

 

 

 

 


Course website: http://hunkim.github.io/ml/

Facebook: https://www.facebook.com/groups/Tenso...

Source code: https://github.com/hunkim/DeepLearnin...

 

 

Lec 00 - Overview and Schedule of the Machine/Deep Learning Course



 

Lec 01 - Machine Learning Concepts and Terminology

 

1. Machine Learning

  - "Field of study that gives computers the ability to learn without being explicitly programmed” Arthur Samuel (1959)

 

2. Supervised/Unsupervised learning

  - Supervised learning:

     . learning with labeled examples

  - Unsupervised learning: un-labeled data

     . Google news grouping

     . Word clustering

 

3. Types of supervised learning

  • Predicting final exam score based on time spent

    - regression

  • Pass/non-pass based on time spent

    - binary classification

  • Letter grade (A, B, C, E and F) based on time spent

    - multi-label classification



Lec 02 - The Hypothesis and Cost of Linear Regression

1. Linear hypothesis

    H(x) = W x + b

 

2. Cost

    The error for a single example is H(x) - y, where H(x) = W x + b.

    cost(W, b) = (1/m) * sum_{i=1..m} ( H(x_i) - y_i )^2

 

   

 

  3. Goal : Minimize Cost

 

      minimize cost(W,b)

 

 

 

Lec 03 - The Cost Minimization Algorithm for Linear Regression

 

1. Hypothesis and Cost

  • H(x) = W x

  • cost(W) = (1/m) * sum_{i=1..m} ( W x_i - y_i )^2

 

 2. Gradient descent algorithm Minimize cost function

  • formal definition

        W := W - alpha * (d/dW) cost(W)
        (with the cost scaled by 1/(2m), this becomes W := W - alpha * (1/m) * sum_{i=1..m} ( W x_i - y_i ) x_i)

  • How about the Newton-Raphson algorithm?

 

 3. Derivative Calculator : [link]

 

 4. Check cost function 

  • must be a convex curve

 

Lec 04 - Multi-variable linear regression

 

1. Hypothesis for multi-variable

    H(x1, x2, x3) = w1 x1 + w2 x2 + w3 x3 + b

 

2. Hypothesis using Matrix

    H(X) = X W      (X: a row matrix of features, W: a column matrix of weights)

 

 
