(source : GCP qwiklabs)

 

Create a bucket

 

1. Create a bucket: Navigation menu > Storage > [Create a standard bucket]

 

 

2. Copy the babyweight dataset into the storage bucket

- In Cloud Shell, run the command below to copy the preprocessed dataset into your bucket

- Replace <BUCKET> with the name of the bucket you created above

gsutil cp gs://cloud-training-demos/babyweight/preproc/* gs://<BUCKET>/babyweight/preproc/

 

(Result screen)

 

 

Set up TensorBoard + create an AI Platform Notebooks instance

 

3. In Cloud Shell, create a Cloud AI Platform Notebooks instance that supports TensorBoard

export IMAGE_FAMILY="tf-1-14-cpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="tf-tensorboard-1"
export INSTANCE_TYPE="n1-standard-4"
gcloud compute instances create "${INSTANCE_NAME}" \
     --zone="${ZONE}" \
     --image-family="${IMAGE_FAMILY}" \
     --image-project=deeplearning-platform-release \
     --machine-type="${INSTANCE_TYPE}" \
     --boot-disk-size=200GB \
     --scopes=https://www.googleapis.com/auth/cloud-platform \
     --metadata="proxy-mode=project_editors"

 

 

4. Click Navigation Menu > AI Platform > Notebooks

 

 

5. Confirm the instance was created: click the REFRESH button

(Instance creation can take 2-3 minutes)

 

 

6. Open the Jupyter notebook: click OPEN JUPYTERLAB on the created instance

 

(Result screen) A JupyterLab window opens in a new tab

 

 

Clone the training repo into the AI Platform Notebooks instance

 

 

 

7. Clone the training repo: in JupyterLab, click the Terminal icon, then run the command below to clone training-data-analyst from git

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

 

(Result screen)

 

 

Run the training and prediction model

 

8. Navigate to the path below and run the notebook

- Run training-data-analyst > blogs > babyweight > train_deploy.ipynb

- Change the Python kernel to Python 3 (default: Python 2)

train_deploy.html
0.44MB

 

- Following the notebook code, you can work through the steps below one at a time (Shift + Enter)

   1) Create the babyweight dataset for machine learning using Dataflow

   2) Build a model with the Estimator API

   3) Train the machine learning model on Cloud ML Engine

   4) Deploy the trained model

   5) Predict with the deployed model (see the prediction sketch below)
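
For reference, once the model is deployed (steps 4-5), online predictions can be requested from Python. Below is a minimal sketch, not the notebook's exact code: the project ID and model name are placeholders, and the instance fields assume the babyweight feature names used in the lab.

from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

PROJECT = 'my-project-id'   # placeholder - use your own project
MODEL = 'babyweight'        # placeholder - the name you deployed the model under

credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials)

# One prediction instance; field names must match the model's serving signature.
request_data = {'instances': [
    {'is_male': 'True', 'mother_age': 26.0,
     'plurality': 'Single(1)', 'gestation_weeks': 39}
]}
parent = 'projects/{}/models/{}'.format(PROJECT, MODEL)
response = api.projects().predict(body=request_data, name=parent).execute()
print(response)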

 

 

9. Check the results in TensorBoard

- Click File > New Launcher > Tensorboard

- SCALARS

 

- GRAPHS

 

- DISTRIBUTIONS

 

- HISTOGRAMS

 

- PROJECTOR

 

(source : GCP qwiklabs)

 

Create an AI Platform Notebooks instance

 

1. Click Navigation Menu > AI Platform > Notebooks

 

2. Create an instance: click NEW INSTANCE

- Select TensorFlow Enterprise 1.15 > Without GPUs

 

3. When the pop-up appears, enter/confirm the deep learning VM name and click Create

 

(VM creation can take 2-3 minutes)

 

4. Click Open JupyterLab > a JupyterLab window opens in a new tab

 

Clone the training repo into the notebook instance

- Clone training-data-analyst into your JupyterLab instance

 

1. In JupyterLab, click the Terminal icon to open a new terminal window

 

2. At the command line, enter the command below and press Enter

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

 

Result:

 

3. Double-click the training-data-analyst directory to confirm its contents were cloned correctly

 

 

 

Basic TensorFlow examples

 

1. Explore the dataset

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 1_explore.ipynb

1_explore.html
0.36MB
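
The 1_explore notebook works against the public natality data in BigQuery. A minimal sketch of that kind of exploration, assuming the google-cloud-bigquery client and pandas are available in the notebook environment (the queries in the lab itself go further):

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT weight_pounds, is_male, mother_age, plurality, gestation_weeks
FROM `bigquery-public-data.samples.natality`
WHERE year > 2000
LIMIT 1000
"""
# Pull a small sample into a DataFrame for quick inspection.
df = client.query(query).to_dataframe()
print(df.describe())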

 

 

2. Create a sample dataset

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 2_sample.ipynb

2_sample.html
0.29MB
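
The sampling notebook splits the data repeatably rather than randomly, so the same rows land in the same split on every run. A hedged sketch of the idea using FARM_FINGERPRINT (the exact hash column and thresholds in the notebook may differ):

from google.cloud import bigquery

client = bigquery.Client()
# Hash year+month and keep ~80% of hash buckets for training; the remaining buckets become eval data.
query = """
SELECT COUNT(*) AS num_train_rows
FROM `bigquery-public-data.samples.natality`
WHERE year > 2000
  AND ABS(MOD(FARM_FINGERPRINT(CAST(year AS STRING) || CAST(month AS STRING)), 10)) < 8
"""
print(client.query(query).to_dataframe())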

 

 

3. Create TensorFlow models

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 3_tensorflow_dnn.ipynb & 3_tensorflow_wd.ipynb

3_tensorflow_dnn.html
0.29MB
3_tensorflow_wd.html
0.29MB
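
The two notebooks build a deep model and a wide-and-deep model with the Estimator API. A minimal, hedged sketch of the deep variant; the feature engineering in the actual lab code is richer, and the column list below is only an illustrative subset:

import tensorflow as tf

feature_columns = [
    tf.feature_column.numeric_column('mother_age'),
    tf.feature_column.numeric_column('gestation_weeks'),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'is_male', ['True', 'False', 'Unknown'])),
]

estimator = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[64, 32],
    model_dir='babyweight_trained')
# estimator.train(input_fn=...)  # an input_fn reading the preprocessed CSVs feeds batches here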

 

4. Preprocess the data (Preprocessing)

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 4_preproc.ipynb

4_preproc.html
0.29MB

 

 

5. Train the machine learning model on Cloud AI Platform (Training)

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 5_train.ipynb

5_train.html
0.32MB

 

 

6. Deploy the model on Cloud AI Platform (Deploying)

- In the notebook interface, open training-data-analyst > courses > machine_learning > deepdive > 06_structured > 6_deploy.ipynb

6_deploy.html
0.28MB

 

 

 

Image analysis

 

7. MNIST image classification - linear model

- Open training-data-analyst > courses > machine_learning > deepdive > 08_image > mnist_linear.ipynb

mnist_linear.html
0.30MB
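
For orientation before opening the notebook, here is a hedged sketch of what a linear MNIST classifier looks like with the Estimator API (the lab's own code is organized differently):

import numpy as np
import tensorflow as tf

# Flatten each 28x28 image into a 784-dim float vector in [0, 1].
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.reshape(-1, 784) / 255.0).astype(np.float32)
y_train = y_train.astype(np.int32)

feature_columns = [tf.feature_column.numeric_column('pixels', shape=[784])]
estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns, n_classes=10)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'pixels': x_train}, y=y_train,
    batch_size=100, num_epochs=1, shuffle=True)
estimator.train(train_input_fn)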

 

 

8. MNIST image classification - CNN & DNN (deep neural network) models

- Open training-data-analyst > courses > machine_learning > deepdive > 08_image > mnistmodel > trainer > model.py

- Open training-data-analyst > courses > machine_learning > deepdive > 08_image > mnist_models.ipynb

mnist_models.html
0.30MB
model.py
0.01MB

 

※ To use dropout, change the model type MODEL_TYPE to dnn_dropout

※ To use a CNN, change the model type MODEL_TYPE to cnn (a hedged sketch of this switch follows)
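
A hedged sketch of how such a MODEL_TYPE switch typically looks with TF 1.x layers; layer sizes and function names here are illustrative, not copied from the lab's model.py:

import tensorflow as tf

def dnn_dropout_model(img, mode):
    # img: [batch, 28, 28, 1]; dropout is only active in TRAIN mode.
    h = tf.layers.flatten(img)
    h = tf.layers.dense(h, 300, activation=tf.nn.relu)
    h = tf.layers.dropout(h, rate=0.25,
                          training=(mode == tf.estimator.ModeKeys.TRAIN))
    return tf.layers.dense(h, 10)  # logits

def cnn_model(img, mode):
    c = tf.layers.conv2d(img, filters=32, kernel_size=3, activation=tf.nn.relu)
    c = tf.layers.max_pooling2d(c, pool_size=2, strides=2)
    c = tf.layers.flatten(c)
    c = tf.layers.dense(c, 128, activation=tf.nn.relu)
    return tf.layers.dense(c, 10)  # logits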

 

 

9. Flowers image classification - image augmentation

- training-data-analyst > courses > machine_learning > deepdive > 08_image > flowersmodel > model.py

- training-data-analyst > courses > machine_learning > deepdive > 08_image > flowers_fromscratch.ipynb

flowers_fromscratch.html
0.29MB
model.py
0.01MB
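
Training from scratch on a small flower dataset relies on image augmentation. A minimal sketch of the kind of per-image augmentation involved, using standard tf.image ops (the lab's own augment function may differ in the exact transformations):

import tensorflow as tf

def augment(image):
    # Random flips plus mild brightness/contrast jitter, applied only in the training input pipeline.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image

# dataset = dataset.map(lambda img, label: (augment(img), label))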

 

 

 

Time series analysis

 

10. Time series prediction

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > sinemodel > model.py

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > sinewaves.ipynb

sinewaves.html
0.28MB
model.py
0.01MB

 

※ Model options (see the dispatch sketch after this list)

 - Linear model: --model=linear (the default is the linear model)

 - DNN model (Deep Neural Network): --model=dnn

 - CNN model (Convolutional Neural Network): --model=cnn

 - RNN model (Recurrent Neural Network): --model=rnn

 - 2-layer RNN model (Two-Layer Recurrent Neural Network): --model=rnn2
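
A hedged sketch of how a trainer typically turns the --model flag above into a model choice; the argument handling here is illustrative, see sinemodel/model.py for the real mapping:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--model', default='linear',
                    choices=['linear', 'dnn', 'cnn', 'rnn', 'rnn2'])
args = parser.parse_args(['--model', 'rnn'])  # in the lab the flag arrives via task.py / gcloud

# A dict would then map each name to the corresponding model-building function.
print('Training with the {} model'.format(args.model))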

 

 

11. Weather/temperature time series prediction

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > temperatures.ipynb

temperatures.html
1.32MB

 

 

 

Text analysis

 

12. Text classification

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > txtclsmodel > trainer > model.py

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > text_classification.ipynb

text_classification.html
0.30MB
model.py
0.01MB
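
A hedged sketch of the general shape of such a text classifier: embed the tokenized titles, pool, then classify with a softmax. The vocabulary size, sequence length, and class count below are placeholders, not the lab's values:

import tensorflow as tf

VOCAB_SIZE, MAX_LEN, EMBED_DIM, NUM_CLASSES = 20000, 50, 10, 3  # placeholders

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()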

 

 

13. Evaluating pre-trained embeddings

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > reusable-embeddings.ipynb

reusable-embeddings.html
0.34MB
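
The idea this notebook explores is reusing a pre-trained text embedding instead of training one from scratch. A hedged sketch using a TensorFlow Hub embedding column; the module URL, feature key, and classifier settings are examples, not necessarily the ones used in the lab:

import tensorflow as tf
import tensorflow_hub as hub

# A pre-trained sentence embedding module from TF Hub (example module).
embedding_col = hub.text_embedding_column(
    key='title', module_spec='https://tfhub.dev/google/nnlm-en-dim50/1')

estimator = tf.estimator.DNNClassifier(
    feature_columns=[embedding_col],
    hidden_units=[32],
    n_classes=3)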

 

 

14. Text generation - using tensor2tensor

- training-data-analyst > courses > machine_learning > deepdive > 09_sequence > poetry.ipynb

poetry.html
0.35MB

 

 

 

Recommendation systems

 

15. Content-based recommendation system

- training-data-analyst > courses > machine_learning > deepdive > 10_recommend > content_based_by_hand.ipynb

content_based_by_hand.html
0.29MB
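
The by-hand notebook builds recommendations from dot products between item feature vectors and a user preference vector. A tiny, self-contained sketch of that idea with made-up data (not the lab's items or genres):

import numpy as np

# Rows are items, columns are genre features (made-up example values).
item_features = np.array([
    [1.0, 0.0, 0.5],   # item A
    [0.0, 1.0, 0.2],   # item B
    [0.8, 0.1, 0.9],   # item C
])
user_profile = np.array([0.9, 0.1, 0.4])  # how much this user likes each genre

# Content-based score = similarity between the user profile and each item's features.
scores = item_features.dot(user_profile)
ranking = np.argsort(scores)[::-1]
print('Recommended item order:', ranking, 'scores:', scores[ranking])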

 

- training-data-analyst > courses > machine_learning > deepdive > 10_recommend > content_based_preproc.ipynb

content_based_preproc.html
0.29MB

 

- training-data-analyst > courses > machine_learning > deepdive > 10_recommend > content_based_using_neural_networks.ipynb

content_based_using_neural_networks.html
0.37MB

 

 

16. Google Analytics data - collaborative filtering recommendation system

- training-data-analyst > courses > machine_learning > deepdive > 10_recommend > wals.ipynb

wals.html
0.49MB

 

 

17. MovieLens movie data - hybrid recommendations

- training-data-analyst > courses > machine_learning > deepdive2 > recommendation_systems > solutions > als_bqml_hybrid.ipynb

als_bqml_hybrid.html
0.28MB

 

 

18. End-to-end recommendation system

- Open training-data-analyst > courses > machine_learning > deepdive > 10_recommend > endtoend > endtoend.ipynb

endtoend.html
0.35MB

 

※ Before running the notebook for example 18, check the following

 1) Create a bucket

 2) Launch the Jupyter notebook (JupyterLab)

 3) Create a Cloud Composer instance

   - Navigation menu > Composer > Create (instance name = mlcomposer)  * the first run takes about 15-20 minutes

 4) Create a Google App Engine instance

   - Run the commands below in order in Cloud Shell (pick a region close to you)

gcloud app regions list
gcloud app create --region <REGION>
gcloud app update --no-split-health-checks

 

 

 

 

 

(source : GCP qwiklabs)

- Jupyter Notebook lab code

e_ai_platform.html
0.30MB

 

- Jupyter Notebook lab code: with solutions

e_ai_platform_solution.html
0.31MB

 

 

Set up TensorBoard + create an AI Platform Notebooks instance

 

1. In Cloud Shell, create a Cloud AI Platform Notebooks instance that supports TensorBoard

export IMAGE_FAMILY="tf-1-14-cpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="tf-tensorboard-1"
export INSTANCE_TYPE="n1-standard-4"
gcloud compute instances create "${INSTANCE_NAME}" \
     --zone="${ZONE}" \
     --image-family="${IMAGE_FAMILY}" \
     --image-project=deeplearning-platform-release \
     --machine-type="${INSTANCE_TYPE}" \
     --boot-disk-size=200GB \
     --scopes=https://www.googleapis.com/auth/cloud-platform \
     --metadata="proxy-mode=project_editors"

 

2. Click Navigation Menu > AI Platform > Notebooks

 

3. Create an instance: click NEW INSTANCE

- Select TensorFlow Enterprise 1.15 > Without GPUs

 

4. When the pop-up appears, enter/confirm the deep learning VM name and click Create

 

(VM creation can take 2-3 minutes)

 

5. Click Open JupyterLab > a JupyterLab window opens in a new tab

 

 

Scaling TensorFlow with AI Platform Training Service

 

 

Run training-data-analyst > courses > machine_learning > deepdive > 03_tensorflow > labs > e_ai_platform

(Solution) training-data-analyst > courses > machine_learning > deepdive > 03_tensorflow > e_ai_platform

 

 

(source : GCP qwiklabs)

 

- Jupyter Notebook lab code

d_traineval.html
0.27MB

 

- Jupyter Notebook lab code: with solutions

d_traineval_solution.html
0.28MB

 

1. Import packages

from google.cloud import bigquery
import tensorflow as tf
import numpy as np
import shutil
print(tf.__version__)

 

2. Input

CSV_COLUMNS = ['fare_amount', 'pickuplon','pickuplat','dropofflon','dropofflat','passengers', 'key']
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [[0.0], [-74.0], [40.0], [-74.0], [40.7], [1.0], ['nokey']]

def read_dataset(filename, mode, batch_size = 512):
      def decode_csv(value_column):
          columns = tf.decode_csv(value_column, record_defaults = DEFAULTS)
          features = dict(zip(CSV_COLUMNS, columns))
          label = features.pop(LABEL_COLUMN)
          # No need to features.pop('key') since it is not specified in the INPUT_COLUMNS.
          # The key passes through the graph unused.
          return features, label

      # Create list of file names that match "glob" pattern (i.e. data_file_*.csv)
      filenames_dataset = tf.data.Dataset.list_files(filename)
      # Read lines from text files
      textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
      # Parse text lines as comma-separated values (CSV)
      dataset = textlines_dataset.map(decode_csv)

      # Note:
      # use tf.data.Dataset.flat_map to apply one to many transformations (here: filename -> text lines)
      # use tf.data.Dataset.map      to apply one to one  transformations (here: text line -> feature list)

      if mode == tf.estimator.ModeKeys.TRAIN:
          num_epochs = None # indefinitely
          dataset = dataset.shuffle(buffer_size = 10 * batch_size)
      else:
          num_epochs = 1 # end-of-input after this

      dataset = dataset.repeat(num_epochs).batch(batch_size)
      
      return dataset

 

 

3. Create features from the input data

INPUT_COLUMNS = [
    tf.feature_column.numeric_column('pickuplon'),
    tf.feature_column.numeric_column('pickuplat'),
    tf.feature_column.numeric_column('dropofflat'),
    tf.feature_column.numeric_column('dropofflon'),
    tf.feature_column.numeric_column('passengers'),
]

def add_more_features(feats):
    # Nothing to add (yet!)
    return feats

feature_cols = add_more_features(INPUT_COLUMNS)

 

 

4. Serving input function 

# Defines the expected shape of the JSON feed that the model
# will receive once deployed behind a REST API in production.
def serving_input_fn():
    json_feature_placeholders = {
        'pickuplon' : tf.placeholder(tf.float32, [None]),
        'pickuplat' : tf.placeholder(tf.float32, [None]),
        'dropofflat' : tf.placeholder(tf.float32, [None]),
        'dropofflon' : tf.placeholder(tf.float32, [None]),
        'passengers' : tf.placeholder(tf.float32, [None]),
    }
    # You can transform data here from the input format to the format expected by your model.
    features = json_feature_placeholders # no transformation needed
    return tf.estimator.export.ServingInputReceiver(features, json_feature_placeholders)

 

 

5. tf.estimator.train_and_evaluate

def train_and_evaluate(output_dir, num_train_steps):
    estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir,
                       feature_columns = feature_cols)
    
    train_spec=tf.estimator.TrainSpec(
                       input_fn = lambda: read_dataset('./taxi-train.csv', mode = tf.estimator.ModeKeys.TRAIN),
                       max_steps = num_train_steps)

    exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)

    eval_spec=tf.estimator.EvalSpec(
                       input_fn = lambda: read_dataset('./taxi-valid.csv', mode = tf.estimator.ModeKeys.EVAL),
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       exporters = exporter)
    
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

 

 

6. Monitor training in TensorBoard

OUTDIR = './taxi_trained'

- In the JupyterLab UI, go to "File" > "New Launcher" and double-click 'Tensorboard'

 

7. Training

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
tf.summary.FileWriterCache.clear() # ensure filewriter cache is clear for TensorBoard events file
train_and_evaluate(OUTDIR, num_train_steps = 500)

 

 

 

(source : GCP qwiklabs)

 

- Jupyter Notebook lab code

c_dataset.html
0.27MB

 

- Jupyter Notebook lab code: with solutions

c_dataset_solution.html
0.28MB

 

1. Import packages

from google.cloud import bigquery
import tensorflow as tf
import numpy as np
import shutil
print(tf.__version__)

 

2. Refactor the input

- With the Dataset API, data is loaded from disk only as needed, as it is fed to the model in mini-batches

CSV_COLUMNS = ['fare_amount', 'pickuplon','pickuplat','dropofflon','dropofflat','passengers', 'key']
DEFAULTS = [[0.0], [-74.0], [40.0], [-74.0], [40.7], [1.0], ['nokey']]

def read_dataset(filename, mode, batch_size = 512):
  def decode_csv(row):
    columns = tf.decode_csv(row, record_defaults = DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    features.pop('key') # discard, not a real feature
    label = features.pop('fare_amount') # remove label from features and store
    return features, label

  # Create list of file names that match "glob" pattern (i.e. data_file_*.csv)
  filenames_dataset = tf.data.Dataset.list_files(filename, shuffle=False)
  # Read lines from text files
  textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
  # Parse text lines as comma-separated values (CSV)
  dataset = textlines_dataset.map(decode_csv)

  # Note:
  # use tf.data.Dataset.flat_map to apply one to many transformations (here: filename -> text lines)
  # use tf.data.Dataset.map      to apply one to one  transformations (here: text line -> feature list)

  if mode == tf.estimator.ModeKeys.TRAIN:
      num_epochs = None # loop indefinitely
      dataset = dataset.shuffle(buffer_size = 10 * batch_size, seed=2)
  else:
      num_epochs = 1 # end-of-input after this

  dataset = dataset.repeat(num_epochs).batch(batch_size)

  return dataset

def get_train_input_fn():
  return read_dataset('./taxi-train.csv', mode = tf.estimator.ModeKeys.TRAIN)

def get_valid_input_fn():
  return read_dataset('./taxi-valid.csv', mode = tf.estimator.ModeKeys.EVAL)

 

 

3. Refactor how features are created

INPUT_COLUMNS = [
    tf.feature_column.numeric_column('pickuplon'),
    tf.feature_column.numeric_column('pickuplat'),
    tf.feature_column.numeric_column('dropofflat'),
    tf.feature_column.numeric_column('dropofflon'),
    tf.feature_column.numeric_column('passengers'),
]

def add_more_features(feats):
  # Nothing to add (yet!)
  return feats

feature_cols = add_more_features(INPUT_COLUMNS)

 

 

4. Create and train the model

- Trains on num_steps * batch_size examples

tf.logging.set_verbosity(tf.logging.INFO)
OUTDIR = 'taxi_trained'
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
model = tf.estimator.LinearRegressor(
      feature_columns = feature_cols, model_dir = OUTDIR)
model.train(input_fn = get_train_input_fn, steps = 200)

 

 

5. Evaluate the model

metrics = model.evaluate(input_fn = get_valid_input_fn, steps = None)
print('RMSE on dataset = {}'.format(np.sqrt(metrics['average_loss'])))
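
For reference, the same trained estimator can also produce predictions; a minimal sketch reusing get_valid_input_fn from above (each element yielded by predict() is a dict whose 'predictions' array holds the predicted fare):

predictions = model.predict(input_fn = get_valid_input_fn)
for _, pred in zip(range(5), predictions):
    print(pred['predictions'][0])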

 

 
