Data Preprocessing (MNIST)¶

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf

%matplotlib inline

Loading the Data¶

Load the example dataset (MNIST) that TensorFlow provides

from tensorflow.keras import datasets

Check the shape of the data

mnist = datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x.shape

(60000, 28, 28)

Inspecting the Image Dataset¶

Pull a single image out of the loaded dataset and visualize it

Extract a single sample

image = train_x[0]
image.shape  # no trailing 3 for the RGB channels, so the image is grayscale

(28, 28)

Visualize it to check

plt.imshow(image, 'gray')

<matplotlib.image.AxesImage at 0x190d3fc6a08>

About the Channel Dimension¶

[Batch Size, Height, Width, Channel]
The channel dimension must be 1 for grayscale images and 3 for RGB images

Check the data shape again
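
As a rough illustration of that layout, the sketch below builds two small all-zero batches with NumPy. The arrays and the batch size of 2 are made up for the example and are not part of the MNIST data.

```python
import numpy as np

# Hypothetical stand-in batches: 2 images of 28x28 pixels each.
gray_batch = np.zeros((2, 28, 28, 1), dtype=np.uint8)  # grayscale: 1 channel
rgb_batch = np.zeros((2, 28, 28, 3), dtype=np.uint8)   # RGB: 3 channels

# Unpack the four axes in [Batch Size, Height, Width, Channel] order.
batch_size, height, width, channels = gray_batch.shape
print(batch_size, height, width, channels)
print(rgb_batch.shape[-1])
```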

train_x.shape

(60000, 28, 28)

Adding a dimension to the data (numpy)

expanded_data = np.expand_dims(train_x, -1) # add one new axis at the end
expanded_data.shape

(60000, 28, 28, 1)

Adding a dimension to the data with the TensorFlow package (tensorflow)

expanded_data = tf.expand_dims(train_x, -1)
expanded_data.shape

TensorShape([60000, 28, 28, 1])

The approach shown on the official TensorFlow site: tf.newaxis

train_x.shape

(60000, 28, 28)

train_x[..., tf.newaxis].shape

(60000, 28, 28, 1)

# Alternatively, use reshape
expanded_data = train_x.reshape([60000,28,28,1])
expanded_data.shape

(60000, 28, 28, 1)
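
The approaches above should all yield the same array. Here is a minimal NumPy-only sketch of that check, run on a small made-up array (`fake_x`, a hypothetical stand-in for `train_x`) so that it works without downloading MNIST. `np.newaxis` is used in place of `tf.newaxis`; both are aliases for `None`. Passing `-1` to `reshape` is one way to avoid hard-coding the number of samples.

```python
import numpy as np

# Hypothetical stand-in for train_x: 5 "images" of 28x28 values.
fake_x = np.arange(5 * 28 * 28).reshape(5, 28, 28)

by_expand = np.expand_dims(fake_x, -1)       # append an axis at the end
by_newaxis = fake_x[..., np.newaxis]         # same thing via indexing
by_reshape = fake_x.reshape(-1, 28, 28, 1)   # -1 infers the sample count

print(by_expand.shape, by_newaxis.shape, by_reshape.shape)

# All three results hold the same values in the same layout.
all_equal = (np.array_equal(by_expand, by_newaxis)
             and np.array_equal(by_expand, by_reshape))
print(all_equal)
```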

* Note
When visualizing an image with matplotlib, a grayscale image is expected to have no third dimension,
so an array with a trailing channel axis should be reduced to two dimensions before it is passed in

new_train_x[0] -> new_train_x[0, :, :, 0]

new_train_x = train_x[...,tf.newaxis]
new_train_x.shape

(60000, 28, 28, 1)

disp = new_train_x[0]
"""
plt.imshow(disp,'gray') # error caused by mismatched dimensions; there must be no trailing 1 when visualizing.
"""

"\nplt.imshow(disp,'gray') # error caused by mismatched dimensions; there must be no trailing 1 when visualizing.\n"

# Reducing the dimensions, method 1
disp = new_train_x[0, :, :, 0]
disp.shape

(28, 28)

# Reducing the dimensions, method 2
disp = np.squeeze(new_train_x[0])
disp.shape

(28, 28)
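
One thing to keep in mind with method 2: without an `axis` argument, `np.squeeze` removes every axis of size 1, which may drop more than intended when a batch holds a single image. The sketch below uses a made-up array (`single_batch`, not part of the original notebook) to compare the options.

```python
import numpy as np

# Hypothetical batch containing exactly one 28x28 grayscale image.
single_batch = np.zeros((1, 28, 28, 1))

squeezed_all = np.squeeze(single_batch)            # drops batch AND channel axes
squeezed_last = np.squeeze(single_batch, axis=-1)  # drops only the channel axis
indexed = single_batch[0, :, :, 0]                 # explicit indexing (method 1)

print(squeezed_all.shape)   # (28, 28)
print(squeezed_last.shape)  # (1, 28, 28)
print(indexed.shape)        # (28, 28)
```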

Visualize again

plt.imshow(disp,'gray')
plt.show()

Inspecting the Label Dataset¶

Open a single label and compare it with its image to check that it matches and to see how the labels are stored

Extract a single label

train_y.shape

(60000,)

print("label = ",train_y[0])
plt.imshow(train_x[0])
plt.show()

label =  5

Visualizing the label

# use train_y as the title
plt.title(train_y[0])
plt.imshow(train_x[0], 'gray')
plt.show()

One-Hot Encoding¶

Convert the label into a form the computer can understand

CLASS "1" == [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]¶

## Applying the same one-hot encoding to other digits gives the following.
# 5
[0,0,0,0,0,1,0,0,0,0]

# 9
[0,0,0,0,0,0,0,0,0,1]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
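
The hand-written vectors above can also be generated. A minimal sketch, assuming ten classes (digits 0 to 9): `one_hot` is a hypothetical helper written only for this illustration and is not a TensorFlow function.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Hypothetical helper: a vector of zeros with a single 1 at index `label`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec

print(one_hot(1))
print(one_hot(5))
print(one_hot(9))
```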

tensorflow.keras.utils.to_categorical

from tensorflow.keras.utils import to_categorical

One-hot encoding a few example digits, starting with 1

print(to_categorical(1, 10))
print(to_categorical(5, 10))
print(to_categorical(0, 10))
print(to_categorical(9, 10))

[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

Check the label, then apply to_categorical

label = train_y[0]
label

5

label_onehot = to_categorical(label, num_classes=10)
label_onehot

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
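
The same idea extends from one label to a whole label vector, and `argmax` recovers the original integers. The sketch below does this with plain NumPy by indexing into an identity matrix rather than calling `to_categorical`, so it runs without TensorFlow. The `labels` array is a made-up sample, not read from `train_y`.

```python
import numpy as np

# Hypothetical sample of integer labels.
labels = np.array([5, 0, 4, 1, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for class i.
onehot = np.eye(10, dtype=np.float32)[labels]
print(onehot.shape)  # (5, 10)

# argmax along the class axis turns each one-hot row back into its label.
recovered = onehot.argmax(axis=1)
print(recovered)
```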

Check the one-hot encoded label alongside the image

plt.title(str(label_onehot))  # convert the array to a string before using it as the title
plt.imshow(train_x[0], 'gray')
plt.show()


[tensorflow2.x Basics - 2] MNIST visualization, one-hot encoding, and handling tensor dimensions - tf.expand_dims / .squeeze / to_categorical