Deploying a pre-trained PyTorch VGG19 model with Amazon SageMaker Neo¶

Amazon SageMaker Neo is an API for compiling machine learning models and optimizing them for a chosen hardware target. Currently, Neo supports pre-trained PyTorch models from TorchVision. General support for other PyTorch models is forthcoming.

In this example notebook, we will compare the performance of the pretrained PyTorch vgg19_bn model before and after compilation with Neo.

The PyTorch vgg19_bn model is one of the models that benefits most from compilation with Neo. Here we verify, through end-to-end compilation and inference on SageMaker endpoints, that the Neo-compiled model can achieve a sevenfold speedup with no loss in accuracy.

[ ]:

!~/anaconda3/envs/pytorch_p36/bin/pip install torch==1.4.0 torchvision==0.5.0

SageMaker SDK >= 2.0 is required for this notebook¶

[ ]:

!~/anaconda3/envs/pytorch_p36/bin/pip install --upgrade sagemaker

[ ]:

import sagemaker
current_version = sagemaker.__version__
if current_version.split('.')[0] == '1':
    raise Exception("Please upgrade SageMaker SDK by running the above code cell and restart the kernel")

Import VGG19 from TorchVision¶

We’ll import the vgg19_bn model from TorchVision, trace it with TorchScript, and package the traced model as a model artifact, model.tar.gz:

[ ]:

import torch
import torchvision.models as models
import tarfile

[ ]:

vgg19_bn = models.vgg19_bn(pretrained=True)
input_shape = [1,3,224,224]
trace = torch.jit.trace(vgg19_bn.float().eval(), torch.zeros(input_shape).float())
trace.save('model.pth')

with tarfile.open('model.tar.gz', 'w:gz') as f:
    f.add('model.pth')
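
Before uploading, it can be worth confirming that the archive holds `model.pth` at its root. The following is a minimal sketch using only the standard library. The helper name `list_archive_members` is hypothetical, and the demo packs a placeholder file instead of the real traced model so that it runs without PyTorch.

```python
import os
import tarfile
import tempfile


def list_archive_members(archive_path):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo with a placeholder file standing in for the traced model (assumption).
with tempfile.TemporaryDirectory() as tmp:
    placeholder = os.path.join(tmp, "model.pth")
    with open(placeholder, "wb") as fh:
        fh.write(b"placeholder bytes, not a real model")

    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores the file at the archive root, without the temp directory path
        tar.add(placeholder, arcname="model.pth")

    members = list_archive_members(archive)

print(members)
```

In the notebook itself, calling `list_archive_members('model.tar.gz')` after the cell above would list what was actually packed.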

Set up the environment¶

[ ]:

import boto3
import sagemaker
import time
from sagemaker.utils import name_from_base
from sagemaker import image_uris

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

compilation_job_name = name_from_base('TorchVision-vgg19-Neo')
prefix = compilation_job_name+'/model'

model_path = sess.upload_data(path='model.tar.gz', key_prefix=prefix)

data_shape = '{"input0":[1,3,224,224]}'
target_device = 'ml_c5'
framework = 'pytorch'
framework_version = '1.4.0'
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

inference_image_uri = image_uris.retrieve(f'neo-{framework}', region, framework_version, instance_type=target_device)
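
The `data_shape` string repeats the shape already stored in `input_shape` when the model was traced. One way to keep the two in sync, offered here as a suggestion rather than as part of the original notebook, is to derive the JSON string with `json.dumps`. The `input_name` variable is introduced only for illustration.

```python
import json

input_shape = [1, 3, 224, 224]  # same shape used when tracing the model
input_name = "input0"           # input name used in this notebook's data_shape

# Compact separators reproduce the hand-written string exactly.
data_shape = json.dumps({input_name: input_shape}, separators=(",", ":"))
print(data_shape)
```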

Use SageMaker PyTorchModel to load the pretrained PyTorch model¶

[ ]:

from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor

pt_vgg = PyTorchModel(model_data=model_path,
                      framework_version=framework_version,
                      predictor_cls=Predictor,
                      role=role,
                      sagemaker_session=sess,
                      entry_point='vgg19_bn_uncompiled.py',
                      source_dir='code',
                      py_version='py3',
                      image_uri=inference_image_uri
                     )

Deploy the pretrained model to prepare for predictions (the old way)¶

[ ]:

vgg_predictor = pt_vgg.deploy(initial_instance_count = 1,
                              instance_type = 'ml.c5.9xlarge'
                             )

Invoke the endpoint¶

Let’s test with a cat image.

[ ]:

from IPython.display import Image
Image('cat.jpg')

[ ]:

import json

with open('cat.jpg', 'rb') as f:
    payload = f.read()
    payload = bytearray(payload)

[ ]:

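import time

num_requests = 1000
start = time.time()
for _ in range(num_requests):
    output = vgg_predictor.predict(payload)
inference_time = time.time() - start  # total elapsed wall-clock time, in seconds
print('Total inference time for {} requests: {:.2f} seconds'.format(num_requests, inference_time))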
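
The loop above reports a single total, which includes network round trips to the endpoint. If per-request figures are of interest, a small helper along the following lines could collect them. This is a sketch, not the notebook's method: `measure_latency_ms` and `fake_predict` are hypothetical names, and `fake_predict` stands in for `vgg_predictor.predict` so that the example runs without a deployed endpoint.

```python
import statistics
import time


def measure_latency_ms(predict_fn, payload, n_requests=100):
    """Call predict_fn(payload) n_requests times; return per-request latencies in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict_fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_predict(payload):
    # Stand-in for a real predictor call (assumption); sleeps briefly instead.
    time.sleep(0.001)
    return b"[]"


latencies = measure_latency_ms(fake_predict, b"image bytes", n_requests=20)
print("mean:   {:.2f} ms".format(statistics.mean(latencies)))
print("median: {:.2f} ms".format(statistics.median(latencies)))
```

With a live endpoint, `measure_latency_ms(vgg_predictor.predict, payload)` would be the corresponding call. The median is less sensitive than the mean to occasional slow requests.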

[ ]:

import numpy as np
result = json.loads(output.decode())
predicted = np.argmax(result)
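
`np.argmax` gives only the single best class. To see the runners-up as well, a top-k helper such as the one below may help. It is a plain-Python sketch that assumes a flat list of scores; if the endpoint returns a nested list (for example one row per image), pass the inner row. The name `top_k` and the sample scores are illustrative only.

```python
def top_k(scores, k=5):
    """Return (index, score) pairs for the k largest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for illustration, not real model output.
scores = [0.1, 2.5, -1.0, 0.7, 2.4]
best = top_k(scores, k=3)
print(best)
```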

[ ]:

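# Load names for ImageNet classes (one "index: label" entry per line)
object_categories = {}
with open("imagenet1000_clsidx_to_labels.txt", "r") as f:
    for line in f:
        # Split on the first colon only, so a label containing ':' does not break unpacking
        key, val = line.strip().split(':', 1)
        object_categories[key] = val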

[ ]:

print("Result: label - " + object_categories[str(predicted)])

Clean-up¶

Delete the endpoint when you’re finished: a hosted SageMaker endpoint keeps running, and accruing charges, until it is deleted.

[ ]:

sess.delete_endpoint(vgg_predictor.endpoint_name)

Neo optimization¶

Create a PyTorch SageMaker model¶

[ ]:

from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor

sagemaker_model = PyTorchModel(model_data=model_path,
                               predictor_cls=Predictor,
                               framework_version = framework_version,
                               role=role,
                               sagemaker_session=sess,
                               entry_point='vgg19_bn_compiled.py',
                               source_dir='code',
                               py_version='py3',
                               env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'}
                              )

Use Neo compiler to compile the model¶

[ ]:

compiled_model = sagemaker_model.compile(target_instance_family=target_device,
                                         input_shape=data_shape,
                                         job_name=compilation_job_name,
                                         role=role,
                                         framework=framework.lower(),
                                         framework_version=framework_version,
                                         output_path=compiled_model_path
                                        )

[ ]:

predictor = compiled_model.deploy(initial_instance_count = 1,
                                  instance_type = 'ml.c5.9xlarge'
                                 )

[ ]:

import time

num_requests = 1000
start = time.time()
for _ in range(num_requests):
    response = predictor.predict(payload)
neo_inference_time = time.time() - start  # total elapsed wall-clock time, in seconds
print('Neo optimized total inference time for {} requests: {:.2f} seconds'.format(num_requests, neo_inference_time))
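
With both totals measured, the speedup is simply the ratio of the uncompiled time to the compiled time. The helper below is a minimal sketch; `speedup` is a hypothetical name and the numbers in the demo are placeholders, not measurements from this notebook.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds


# Placeholder timings for illustration only.
example = speedup(12.0, 3.0)
print("Speedup: {:.1f}x".format(example))
```

In the notebook, `speedup(inference_time, neo_inference_time)` would compare the two runs, provided both loops used the same number of requests.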

[ ]:

result = json.loads(response.decode())
print('Most likely class: {}'.format(np.argmax(result)))
print("Result: label - " + object_categories[str(np.argmax(result))]+ " probability - " + str(np.amax(result)))
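
The cell above labels `np.amax(result)` as a probability. That holds only if the entry-point script applies a softmax before returning; the script is not shown here, so this is worth checking. If the endpoint returns raw scores instead, they can be converted as in the following sketch (plain Python, with a hypothetical `softmax` helper and made-up input values).

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for illustration, not real model output.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Softmax preserves the ordering of the scores, so the predicted class is the same either way; only the reported confidence changes.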

[ ]:

sess.delete_endpoint(predictor.endpoint_name)
