Game servers autopilot

Multiplayer game publishers launching a large-scale, worldwide game often have to either over-provision compute resources or manage compute allocation manually to avoid long player waits in the game lobby. To do so, they need to develop, configure, and deploy tools that help them monitor and control compute allocation.

This notebook demonstrates Game server autopilot, a new, machine learning-based example tool that helps game publishers reduce the time players wait for compute to spawn while still avoiding compute over-provisioning. It also removes manual configuration decisions and changes that publishers would otherwise have to make, reducing the opportunity for human error.

We heard from customers that optimizing compute resource allocation is not trivial, because it often takes substantial time to allocate and prepare EC2 instances. The allocation algorithm must learn and account for the time needed to spin up an EC2 instance and install the game binaries and other assets. Ever-changing usage patterns require a model that adapts to emerging player habits. Finally, the system also scales down in concert with new server allocation as needed.

We describe a reinforcement learning-based system that learns to allocate resources in response to player usage patterns. The hosted model directly predicts the required number of game servers ahead of demand, giving EKS time to allocate instances and so reducing player wait time. The training process integrates with the game ecosystem and requires minimal manual configuration.
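The environment used in this notebook (gameserver_env.py, listed later) models that lead time with a rolling allocation history: in step(), the allocation that is compared with demand is the oldest entry of alloc_observation, while the model's newest decision is appended at the end. The minimal NumPy sketch below isolates just that mechanism; the helper name step_alloc_history is hypothetical and not part of gameserver_env.py. It shows a decision taking effect only history_len = 5 steps after it is made.

```python
import numpy as np

HISTORY_LEN = 5  # mirrors history_len in gameserver_env.py


def step_alloc_history(alloc_history, new_action):
    """Return (effective_alloc, updated_history).

    The allocation that counts now is the oldest entry; the new decision is
    appended at the end, so it only becomes effective HISTORY_LEN steps later.
    """
    effective_alloc = float(alloc_history[0])
    updated = np.append(alloc_history[1:], new_action)
    return effective_alloc, updated


# reset() fills the history with min_servers (10 in this notebook)
history = np.array([10.0] * HISTORY_LEN)
effective = []
for action in [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]:
    alloc, history = step_alloc_history(history, action)
    effective.append(alloc)

print(effective)  # [10.0, 10.0, 10.0, 10.0, 10.0, 20.0, 30.0]
```

The first decision (20 servers) only shows up as the effective allocation on the sixth step, which is why the model has to predict demand ahead of time rather than react to it.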

Prerequisites

Imports

To get started, we’ll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configuration.

[30]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

Setup S3 bucket

Set up the S3 bucket that you want to use for checkpoints and metadata. By default, this is the SageMaker session's default bucket.

[31]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))
S3 bucket path: s3://sagemaker-us-west-2-356566070122/

Parameters

Adding a new parameter for the job also requires adding it to the hyperparameters passed to the RLEstimator in the training section below.

[32]:
job_name_prefix = 'rl-game-server-autopilot'
job_duration_in_seconds = 60 * 60 * 24 * 5
train_instance_count = 1
cloudwatch_namespace = 'rl-game-server-autopilot'
min_servers=10
max_servers=100
# over-provisioning factor; use 5 for best results
over_prov_factor=5
# gamma is the discount factor (train_gameserver_ppo.py currently sets "gamma": 0 in the PPO config)
gamma=0.9
# to use the local sine-wave demand simulator instead of the remote endpoint, set gs_inventory_url='local' and set learning_freq
gs_inventory_url = 'https://4bfiebw6ui.execute-api.us-west-2.amazonaws.com/api/currsine1h/'
#gs_inventory_url = 'local'
# sleep time in seconds between step() executions
learning_freq = 65
# actions are normalized to [0, 1]; action_factor scales an action to a number of game servers,
# e.g. with 100 the allocation is 100*action, clipped to [min_servers, max_servers] above
action_factor = 100
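The mapping from the model's action to a server count can be sketched as a small helper. action_to_servers is a hypothetical name; it mirrors the scaling and clipping at the start of GameServerEnv.step(), using the default parameter values above.

```python
import numpy as np


def action_to_servers(action, action_factor=100, min_servers=10, max_servers=100):
    """Scale a normalized action in [0, 1] to a game-server count, as step() does."""
    return float(np.clip(float(action) * action_factor, min_servers, max_servers))


print(action_to_servers(0.5))   # 50.0
print(action_to_servers(0.05))  # 10.0 (5 servers, clipped up to min_servers)
print(action_to_servers(1.0))   # 100.0
```

Because of the clipping, any action below min_servers / action_factor (0.1 here) produces the same allocation, so raising action_factor above max_servers only narrows the range of useful actions.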
[33]:

# Pick the instance type
instance_type = "ml.c5.xlarge"  # 4 vCPUs
# instance_type = "ml.c5.2xlarge"  # 8 vCPUs
# instance_type = "ml.c5.4xlarge"  # 16 vCPUs
# instance_type = "ml.c5.18xlarge"  # 72 vCPUs
# instance_type = "ml.c4.4xlarge"  # 16 vCPUs
# instance_type = "ml.p2.8xlarge"  # 32 vCPUs
# instance_type = "ml.p3.2xlarge"  # 8 vCPUs
# instance_type = "ml.p3.8xlarge"  # 32 vCPUs
# instance_type = "ml.p3.16xlarge"  # 64 vCPUs

# keep this in sync with the vCPU count of instance_type
num_cpus_per_instance = 4
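num_cpus_per_instance is set by hand, so it is easy to forget to update it when switching instance_type. One option, sketched below, is to derive it from a lookup table built from the vCPU counts listed above; the table and the vcpus_for helper are hypothetical additions, not part of the original notebook.

```python
# vCPU counts for the instance types listed in the cell above
VCPUS_BY_INSTANCE_TYPE = {
    "ml.c5.xlarge": 4,
    "ml.c5.2xlarge": 8,
    "ml.c5.4xlarge": 16,
    "ml.c5.18xlarge": 72,
    "ml.c4.4xlarge": 16,
    "ml.p2.8xlarge": 32,
    "ml.p3.2xlarge": 8,
    "ml.p3.8xlarge": 32,
    "ml.p3.16xlarge": 64,
}


def vcpus_for(instance_type):
    """Return the vCPU count for one of the instance types listed above."""
    try:
        return VCPUS_BY_INSTANCE_TYPE[instance_type]
    except KeyError:
        raise ValueError(
            "unknown instance type {}; add it to VCPUS_BY_INSTANCE_TYPE".format(instance_type)
        ) from None


print(vcpus_for("ml.c5.xlarge"))  # 4
```

With this in place, the manual assignment becomes num_cpus_per_instance = vcpus_for(instance_type), and an unlisted type fails loudly instead of silently keeping a stale value.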

Create an IAM role

Either get the execution role with role = sagemaker.get_execution_role() when running from a SageMaker notebook instance or, when running from a local notebook, use the utility method role = get_execution_role() from common/misc.py to create an execution role. In this example, the environment publishes custom CloudWatch metrics during the training job and puts values in a DynamoDB table, so the role ARN below must have the appropriate permissions.
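The exact permissions depend on your account setup. As a minimal sketch, assuming the role only lacks CloudWatch access, an inline policy like the following would let the training job publish metrics to this notebook's namespace. PutMetricData does not support resource-level restrictions, so the sketch scopes it with the cloudwatch:namespace condition key instead. The policy name is hypothetical, and the DynamoDB table mentioned above is not named in this notebook, so a matching DynamoDB statement for that table would still have to be added.

```python
import json

cloudwatch_namespace = "rl-game-server-autopilot"  # same value as in the Parameters cell

# Sketch of an inline policy: allow PutMetricData only into our namespace
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {"StringEquals": {"cloudwatch:namespace": cloudwatch_namespace}},
        }
    ],
}
print(json.dumps(policy_document, indent=2))

# To attach it (the notebook's own credentials need IAM permissions for this):
# boto3.client("iam").put_role_policy(
#     RoleName=role.split("/")[-1],                    # role name from the role ARN
#     PolicyName="rl-game-server-autopilot-metrics",   # hypothetical policy name
#     PolicyDocument=json.dumps(policy_document),
# )
```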

[34]:
try:
    role = sagemaker.get_execution_role()
except Exception:
    # not on a SageMaker notebook instance: fall back to the helper from common/misc.py
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))
Using IAM role arn: arn:aws:iam::356566070122:role/service-role/AmazonSageMaker-ExecutionRole-20181024T210472

Set up the environment

The environment is defined in a Python file called gameserver_env.py in the src directory. Consistent with the OpenAI Gym interface for defining an environment, it implements the __init__(), step() and reset() functions that describe how the environment behaves. It also implements helper functions for publishing custom CloudWatch metrics (populate_cloudwatch_metric()) and a simple sine-wave demand simulator (get_curr_sine1h()).

  1. __init__() - initialize the environment in a predefined state

  2. step() - take an action on the environment

  3. reset()- restart the environment on a new episode

  4. get_curr_sine1h() - return a simulated demand value from a sine curve, advancing one point per call over a 60-step cycle.

  5. populate_cloudwatch_metric(namespace, metric_value, metric_name) - publish metric_value as metric_name in the given CloudWatch namespace.
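To see the demand curve that gs_inventory_url='local' produces, the following sketch reproduces get_curr_sine1h() outside the environment and applies the same clipping that step() uses. The function name sine_demand is hypothetical. Over one 60-step cycle the demand rises to a peak of roughly 0.9 × max_servers and falls back, with the lowest point clipped up to min_servers.

```python
import numpy as np


def sine_demand(step, max_servers=100, min_servers=10):
    """Reproduce get_curr_sine1h() for a given call number, plus step()'s clipping."""
    cycle = np.linspace(0.2, 3.1, 61)
    minute = step % 60  # get_curr_sine1h() advances one point per call and wraps at 60
    raw = 0.9 * max_servers * np.sin(cycle[minute])
    return float(np.clip(raw, min_servers, max_servers))


# reset() sets current_min to 0, so the first call uses point 1
profile = [sine_demand(step) for step in range(1, 61)]
print(round(max(profile), 1), min(profile))
```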

[35]:
!pygmentize src/gameserver_env.py
import time
import boto3
import requests
import gym
import numpy as np
from time import gmtime,strftime
from gym.spaces import Discrete, Box

cloudwatch_cli = boto3.client('cloudwatch',region_name='us-west-2')

class GameServerEnv(gym.Env):

    def __init__(self, env_config={}):
        print ("in __init__")
        print ("env_config {}".format(env_config))
        self.namespace = env_config['cloudwatch_namespace']
        self.gs_inventory_url = env_config['gs_inventory_url']
        self.learning_freq = env_config['learning_freq']
        self.min_servers = int(env_config['min_servers'])
        self.max_servers = int(env_config['max_servers'])
        self.action_factor = int(env_config['action_factor'])
        self.over_prov_factor = int(env_config['over_prov_factor'])
        self.num_steps = 0
        self.max_num_steps = 301
        self.history_len = 5
        self.total_num_of_obs = 1
        # we have two observation array, allocation and demand. allocation is alloc_observation, demand is observation hence *2
        self.observation_space = Box(low=np.array([self.min_servers]*self.history_len*2),
                                           high=np.array([self.max_servers]*self.history_len*2),
                                           dtype=np.uint32)

        # How many servers should the agent spin up at each time step 
        self.action_space = Box(low=np.array([0]),
                                     high=np.array([1]),
                                     dtype=np.float32)

    def reset(self):
        print ("in reset")
        #self.populate_cloudwatch_metric(self.namespace,1,'reset')
        self.num_steps = 0
        self.current_min = 0
        self.demand_observation = np.array([self.min_servers]*self.history_len)
        self.alloc_observation = np.array([self.min_servers]*self.history_len)
        #self.action_observation = np.array([self.min_servers]*self.history_len)

        print ('self.demand_observation '+str(self.demand_observation))
        print ('self.alloc_observation '+str(self.alloc_observation))
        #return np.concatenate((self.demand_observation, self.alloc_observation,self.action_observation))
        return np.concatenate((self.demand_observation, self.alloc_observation))



    def step(self, action):
        print ('in step - action received from model '+str(action))
        self.num_steps+=1
        self.total_num_of_obs+=1
        print('total_num_of_obs={}'.format(self.total_num_of_obs))

        raw_action=float(action)
        self.curr_action = raw_action*self.action_factor
        self.curr_action = np.clip(self.curr_action, self.min_servers, self.max_servers)
        print('self.curr_action={}'.format(self.curr_action))


        if self.gs_inventory_url != 'local':
            # get the demand from the matchmaking service
            print('querying matchmaking service for current demand, curr_demand')
            try:
                gs_url = self.gs_inventory_url
                req = requests.get(url=gs_url)
                data = req.json()
                self.curr_demand = float(data['Prediction']['num_of_gameservers'])
            except requests.exceptions.RequestException as e:
                print(e)
                print('matchmaking did not respond; randomizing curr_demand between the limits, the reward will correct')
                self.curr_demand = float(np.random.randint(self.min_servers, self.max_servers))
        else:
            # use the local sine-wave demand simulator
            print('local matchmaking service for current demand, curr_demand')
            data = self.get_curr_sine1h()
            self.curr_demand = float(data['Prediction']['num_of_gameservers'])
        # clip the demand to the allowed range
        self.curr_demand = np.clip(self.curr_demand, self.min_servers, self.max_servers)
        print('self.curr_demand={}'.format(self.curr_demand))

        self.curr_alloc = self.alloc_observation[0]
        print('self.curr_alloc={}'.format(self.curr_alloc))

        # Assumes it takes history_len time steps to create or delete 
        # the game server from allocation
        # self.action_observation = self.action_observation[1:]
        # self.action_observation = np.append(self.action_observation, self.curr_action)
        # print('self.action_observation={}'.format(self.action_observation))

        # store the current demand in the history array demand_observation
        self.demand_observation = self.demand_observation[1:] # shift the observation by one to remove one history point
        self.demand_observation=np.append(self.demand_observation,self.curr_demand)
        print('self.demand_observation={}'.format(self.demand_observation))

        # store the current action in the allocation history array alloc_observation
        self.alloc_observation = self.alloc_observation[1:]
        self.alloc_observation=np.append(self.alloc_observation,self.curr_action)
        print('self.alloc_observation={}'.format(self.alloc_observation))


        # reward calculation: over-provisioning is penalized by the surplus;
        # under-provisioning is more severe, so the shortfall is penalized 5x.
        # An extra term scaled by over_prov_factor is then applied in every case.
        print('calculate the reward, calculate the ratio between allocation and demand, we use the first allocation in the series of history of five, first_alloc/curr_demand')
        print('history of previous predictions made by the model ={}'.format(self.alloc_observation))

        ratio = self.curr_alloc / self.curr_demand
        print('ratio={}'.format(ratio))
        if ratio > 1:
            reward = -1 * (self.curr_alloc - self.curr_demand)
            print('over provision - ratio>1 - {}'.format(reward))
        elif ratio < 1:
            reward = -5 * (self.curr_demand - self.curr_alloc)
            print('under provision - ratio<1 - {}'.format(reward))
        else:
            reward = 1
            print('ratio=1')
        reward -= (self.curr_demand - self.curr_alloc) * self.over_prov_factor
        print('ratio={}'.format(ratio))
        print('reward={}'.format(reward))


        # instrument the supply and demand in CloudWatch
        print('populating cloudwatch - self.curr_demand={}'.format(self.curr_demand))
        self.populate_cloudwatch_metric(self.namespace,self.curr_demand,'curr_demand')
        print('populating cloudwatch - self.curr_alloc={}'.format(self.curr_action))
        self.populate_cloudwatch_metric(self.namespace,self.curr_action,'curr_alloc')
        print('populating cloudwatch - reward={}'.format(reward))
        self.populate_cloudwatch_metric(self.namespace,reward,'reward')

        if self.num_steps >= self.max_num_steps:
            done = True
            print("self.num_steps " + str(self.num_steps))
            print("self.max_num_steps " + str(self.max_num_steps))
        else:
            done = False

        print ('time.sleep() for {} before next iteration'.format(self.learning_freq))
        time.sleep(int(self.learning_freq))

        extra_info = {}
        #the next state includes the demand and allocation history. 
        #next_state=np.concatenate((self.demand_observation,self.alloc_observation,self.action_observation))
        next_state=np.concatenate((self.demand_observation,self.alloc_observation))
        print ('next_state={}'.format(next_state))
        return next_state, reward, done, extra_info

    def render(self, mode='human'):
        print("in render")
        pass

    def populate_cloudwatch_metric(self,namespace,metric_value,metric_name):
        print("in populate_cloudwatch_metric metric_value="+str(metric_value)+" metric_name="+metric_name)
        response = cloudwatch_cli.put_metric_data(
            Namespace=namespace,
            MetricData=[
                {
                    'MetricName': metric_name,
                    'Unit': 'None',
                    'Value': metric_value,
                },
            ]
        )
        print('response from cloud watch'+str(response))

    def get_curr_sine1h(self):
        max_servers=self.max_servers*0.9
        print ('in get_curr_sine1h')
        cycle_arr=np.linspace(0.2,3.1,61)
        self.current_min = (self.current_min + 1) % 60
        current_min = self.current_min
        print('current_min={}'.format(current_min))
        current_point=cycle_arr[int(current_min)]
        sine=max_servers*np.sin(current_point)
        print('sine({})={}'.format(current_point,sine))
        return {"Prediction":{"num_of_gameservers": sine}}
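The reward logic in step() is easier to reason about as a pure function. The sketch below is a standalone mirror of that logic (the function name reward is hypothetical; comparing allocation with demand directly is equivalent to the ratio test because demand is clipped to at least min_servers). As written, the final over_prov_factor term is positive whenever allocation exceeds demand, so with over_prov_factor=5 a surplus of 10 servers nets +40 while a shortfall of 10 nets -100.

```python
def reward(curr_alloc, curr_demand, over_prov_factor=5):
    """Mirror of the reward computed in GameServerEnv.step()."""
    if curr_alloc > curr_demand:      # over-provisioned: penalize the surplus
        r = -1 * (curr_alloc - curr_demand)
    elif curr_alloc < curr_demand:    # under-provisioned: 5x penalty on the shortfall
        r = -5 * (curr_demand - curr_alloc)
    else:                             # allocation matches demand exactly
        r = 1
    # extra term scaled by over_prov_factor, applied in every case
    r -= (curr_demand - curr_alloc) * over_prov_factor
    return r


print(reward(50, 50))  # 1
print(reward(60, 50))  # 40
print(reward(40, 50))  # -100
```

Setting over_prov_factor=0 leaves only the asymmetric penalty (-10 for the same 10-server surplus), which makes it easy to check how much the extra term shifts the agent toward over-provisioning.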

Configure the presets for RL algorithm

The training configuration is defined in train_gameserver_ppo.py, which is also in the src directory. The script registers the GameServers environment with Ray, and its get_experiment_config() returns a Ray Tune experiment specification that selects the PPO algorithm, sets the stopping condition (time_total_s), and sets PPO hyperparameters such as gamma, kl_coeff, lr, num_sgd_iter, sgd_minibatch_size and train_batch_size. You can edit this specification to define custom hyperparameters. The hyperparameters passed from this notebook arrive as command-line arguments and are collected into env_config, which is handed to the environment. Note that the script sets "gamma": 0 in the PPO config rather than using the gamma value from the notebook.
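get_experiment_config() in the script below ends PPO training 4600 seconds before the time_total_s hyperparameter, which the notebook sets to job_duration_in_seconds (also passed as train_max_run). This leaves time to stop and save the model before SageMaker ends the job. With the five-day default, the stop condition works out as follows:

```python
job_duration_in_seconds = 60 * 60 * 24 * 5  # time_total_s and train_max_run in the notebook
stop_margin_s = 4600                         # subtracted in get_experiment_config()

# value used for the Ray Tune "stop": {"time_total_s": ...} condition
time_total_s = float(job_duration_in_seconds) - stop_margin_s
print(job_duration_in_seconds, time_total_s)  # 432000 427400.0
```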

[36]:
!pygmentize src/train_gameserver_ppo.py
import json
import os
import sys
import gym
import ray
from ray.tune import run_experiments
from ray.tune.registry import register_env

from sagemaker_rl.ray_launcher import SageMakerRayLauncher

env_config={}

class MyLauncher(SageMakerRayLauncher):

    def register_env_creator(self):
        from gameserver_env import GameServerEnv
        register_env("GameServers", lambda env_config: GameServerEnv(env_config))

    def _save_tf_model(self):
        import shutil
        import tensorflow as tf
        print("in _save_tf_model")
        ckpt_dir = '/opt/ml/output/data/checkpoint'
        model_dir = '/opt/ml/model'

        # Re-Initialize from the checkpoint so that you will have the latest models up.
        tf.train.init_from_checkpoint(ckpt_dir,
                                      {'main_level/agent/online/network_0/': 'main_level/agent/online/network_0'})
        tf.train.init_from_checkpoint(ckpt_dir,
                                      {'main_level/agent/online/network_1/': 'main_level/agent/online/network_1'})

        # Create a new session with a new tf graph.
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
        sess.run(tf.global_variables_initializer())  # initialize the checkpoint.

        # This is the node that will accept the input.
        input_nodes = tf.get_default_graph().get_tensor_by_name('main_level/agent/main/online/' + \
                                                                'network_0/observation/observation:0')
        # This is the node that will produce the output.
        output_nodes = tf.get_default_graph().get_operation_by_name('main_level/agent/main/online/' + \
                                                                    'network_1/ppo_head_0/policy_mean/BiasAdd')
        # Save the model as a servable model.
        tf.saved_model.simple_save(session=sess,
                                   export_dir='model',
                                   inputs={"observation": input_nodes},
                                   outputs={"policy": output_nodes.outputs[0]})
        # Move to the appropriate folder. 
        shutil.move('model/', model_dir + '/model/tf-model/00000001/')
        # SageMaker will pick it up and upload to the right path.
        print("in _save_tf_model Success")

    def get_experiment_config(self):
        print('get_experiment_config')
        print(env_config)
        # allow 4600 seconds for the job to stop and save the model
        time_total_s=float(env_config["time_total_s"])-4600
        print("time_total_s="+str(time_total_s))
        return {
            "training": {
                "env": "GameServers",
                "run": "PPO",
                "stop": {
                    "time_total_s": time_total_s
                },
                "config": {
                    "ignore_worker_failures": True,
                    "gamma": 0,
                    "kl_coeff": 1.0,
                    "num_sgd_iter": 10,
                    "lr": 0.0001,
                    "sgd_minibatch_size": 32,
                    "train_batch_size": 128,
                    "model": {
                        # "free_log_std": True,
                        # "fcnet_hiddens": [512, 512],
                    },
                    "use_gae": True,
                    # "num_workers": (self.num_cpus-1),
                    "num_gpus": self.num_gpus,
                    # "batch_mode": "complete_episodes",
                    "num_workers": 1,
                    "env_config": env_config,
                    # 'observation_filter': 'MeanStdFilter',
                }
            }
        }

if __name__ == "__main__":
    for i in range(len(sys.argv)):
      if i==0:
         continue
      if i % 2 > 0:
         env_config[sys.argv[i].split('--',1)[1]]=sys.argv[i+1]
    MyLauncher().train_main()
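The __main__ block above turns the command-line arguments into env_config by pairing each --key with the value that follows it. The sketch below reproduces that pairing as a standalone function (parse_env_config is a hypothetical name), assuming the hyperparameters arrive as alternating --key value arguments, as the env_config printed in the evaluation log further down suggests. Every value arrives as a string, which is why GameServerEnv.__init__() casts values such as min_servers and max_servers with int().

```python
def parse_env_config(argv):
    """Collect '--key value' pairs from argv[1:] into a dict, like the __main__ block."""
    env_config = {}
    for key, value in zip(argv[1::2], argv[2::2]):
        env_config[key.split("--", 1)[1]] = value
    return env_config


argv = ["train_gameserver_ppo.py", "--min_servers", "10", "--max_servers", "100",
        "--gs_inventory_url", "local"]
print(parse_env_config(argv))
# {'min_servers': '10', 'max_servers': '100', 'gs_inventory_url': 'local'}
```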

Train the RL model using the Python SDK Script mode

The RLEstimator is used for training RL jobs.

  1. entry_point is the script that registers the GameServers RL environment and runs training.

  2. source_dir is the location of the environment code, which currently includes train_gameserver_ppo.py and gameserver_env.py.

  3. Specify the RL toolkit and framework. This automatically resolves to the ECR path for the RL container.

  4. Define the training parameters such as the instance type and count, the base job name, and the S3 output path.

  5. Specify the hyperparameters. In other examples, the RLCOACH_PRESET or RLRAY_PRESET hyperparameter selects the RL agent algorithm; here PPO is set in train_gameserver_ppo.py, and the hyperparameters are passed to that script as command-line arguments and collected into the env_config that configures the environment.

  6. Define the metric definitions that you want to capture from the training logs. These metrics can also be visualized in CloudWatch and in SageMaker notebooks.

[37]:
metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_len_mean',
  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'vf_loss',
  'Regex': 'vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'policy_loss',
  'Regex': 'policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
]
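SageMaker applies each Regex to the training job's log output and records the captured number as the metric value. A quick local check with Python's re module, on a hypothetical log excerpt in the name: value form the patterns expect, confirms that the pattern also handles signs and scientific notation:

```python
import re

# Same pattern as the 'episode_reward_mean' entry in metric_definitions
pattern = r'episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)'

# Hypothetical log excerpt, not taken from a real training job
log = "episode_reward_max: 12.5\nepisode_reward_mean: -3.25e+02\nepisode_len_mean: 301.0"

match = re.search(pattern, log)
print(float(match.group(1)))  # -325.0
```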

[38]:
%%time
#metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)

estimator = RLEstimator(
                        entry_point="train_gameserver_ppo.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.6.5',
                        framework=RLFramework.TENSORFLOW,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=train_instance_count,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        train_max_run=job_duration_in_seconds,
                        hyperparameters={
                           "cloudwatch_namespace":cloudwatch_namespace,
                          "gs_inventory_url":gs_inventory_url,
                          "learning_freq":learning_freq,
                          "time_total_s":job_duration_in_seconds,
                          "min_servers":min_servers,
                          "max_servers":max_servers,
                          "gamma":gamma,
                          "action_factor":action_factor,
                          "over_prov_factor":over_prov_factor,
                          "save_model": 1
                        }
                    )

estimator.fit(wait=False)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)
Training job: rl-game-server-autopilot-2019-12-25-06-03-34-742
CPU times: user 118 ms, sys: 0 ns, total: 118 ms
Wall time: 315 ms
[11]:
import sagemaker
sagemaker.__version__
[11]:
'1.45.0.dev0'

Store intermediate training output and model checkpoints

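The output from the training job above is stored in Amazon S3. The following cell builds the S3 paths of the job's output archive and intermediate folder, and creates a local folder to download them into.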

[ ]:
%%time

job_name = estimator.latest_training_job.job_name
print("Job name: {}".format(job_name))

s3_url = "s3://{}/{}".format(s3_bucket,job_name)

output_tar_key = "{}/output/output.tar.gz".format(job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
output_url = "s3://{}/{}".format(s3_bucket, output_tar_key)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Output.tar.gz location: {}".format(output_url))
print("Intermediate folder path: {}".format(intermediate_url))

tmp_dir = "/tmp/{}".format(job_name)
os.makedirs(tmp_dir, exist_ok=True)
print("Create local folder {}".format(tmp_dir))

Evaluation of RL models

We use the latest checkpointed model to run evaluation for the RL agent.

Load checkpointed model

Checkpointed data from the previously trained model is passed to the evaluation / inference job through the model channel. Because TensorFlow checkpoint files contain the absolute paths from when they were generated (see issue), those absolute paths must be replaced with relative paths. This is implemented within evaluate_gameserver_ppo.py.
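The evaluation script is not listed in this notebook, so the following is only a sketch of the idea, not its actual code. It assumes a standard TensorFlow checkpoint index file, whose entries look like model_checkpoint_path: "..." and all_model_checkpoint_paths: "...", and rewrites each absolute path to its bare file name so it can be resolved relative to the checkpoint directory. The function name make_checkpoint_paths_relative and the sample paths are hypothetical.

```python
import os
import re


def make_checkpoint_paths_relative(checkpoint_index_text):
    """Replace absolute paths in a TensorFlow 'checkpoint' index file with bare file names."""
    return re.sub(
        r'(model_checkpoint_path|all_model_checkpoint_paths): "([^"]+)"',
        lambda m: '{}: "{}"'.format(m.group(1), os.path.basename(m.group(2))),
        checkpoint_index_text,
    )


sample = (
    'model_checkpoint_path: "/opt/ml/output/data/checkpoint/model.ckpt-100"\n'
    'all_model_checkpoint_paths: "/opt/ml/output/data/checkpoint/model.ckpt-100"\n'
)
print(make_checkpoint_paths_relative(sample))
```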

[ ]:
%%time

wait_for_s3_object(s3_bucket, output_tar_key, tmp_dir)

if not os.path.isfile("{}/output.tar.gz".format(tmp_dir)):
    raise FileNotFoundError("File output.tar.gz not found")
os.system("tar -xvzf {}/output.tar.gz -C {}".format(tmp_dir, tmp_dir))

checkpoint_dir = "{}/checkpoint".format(tmp_dir)

print("Checkpoint directory {}".format(checkpoint_dir))
[ ]:
%%time
checkpoint_path = "s3://{}/{}/checkpoint/".format(s3_bucket, job_name)
if not os.listdir(checkpoint_dir):
    raise FileNotFoundError("Checkpoint files not found under {}".format(checkpoint_dir))
os.system("aws s3 cp --recursive {} {}".format(checkpoint_dir, checkpoint_path))
print("S3 checkpoint file path: {}".format(checkpoint_path))

Run the evaluation step

Use the checkpointed model to run the evaluation step.

[99]:
%%time
job_name = "5obs-local-sine-2019-08-18-21-13-45-314"
print("job_name: %s" % job_name)
estimator_eval = RLEstimator(entry_point="evaluate_gameserver_ppo.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        role=role,
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.6.5',
                        framework=RLFramework.TENSORFLOW,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        base_job_name=job_name_prefix + "-evaluation",
                        hyperparameters={
                          "cloudwatch_namespace":cloudwatch_namespace,
                          "gs_inventory_url":gs_inventory_url,
                          "learning_freq":learning_freq,
                          "time_total_s":job_duration_in_seconds,
                          "min_servers":min_servers,
                          "max_servers":max_servers,
                          "gamma":gamma,
                          "action_factor":action_factor,
                          "over_prov_factor":over_prov_factor,
                          "save_model": 1
                        }
                    )
estimator_eval.fit({'model': checkpoint_path})
job_name = estimator_eval.latest_training_job.job_name
print("Evaluation job: %s" % job_name)
job_name: 5obs-local-sine-2019-08-18-21-13-45-314
in __init__
env_config
{'cloudwatch_namespace': '5obs-local-sine', 'gs_inventory_url': 'https://4bfiebw6ui.execute-api.us-west-2.amazonaws.com/api/currsine1h/', 'learning_freq': '5', 'max_servers': '100', 'min_servers': '10', 'save_model': '1', 'time_total_s': '32400'}
self.curr_demand=63.138143498979936
calculate the reward, calculate the ratio between allocation and demand, curr_alloc/curr_demand
interm ratio=1.0151651067081289
over provision - ratio>1 - -0.9574966835151812
https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#adapting-your-local-tensorflow-script
2019-08-19 06:24:38,987 sagemaker-containers INFO     Reporting training SUCCESS

2019-08-19 06:24:43 Uploading - Uploading generated training model
2019-08-19 06:24:43 Completed - Training job completed
Billable seconds: 60
Evaluation job: 5obs-local-sine-evaluation-2019-08-19-06-21-59-623
CPU times: user 1.65 s, sys: 129 ms, total: 1.78 s
Wall time: 3min 14s

Hosting

Once training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint, which lets us make predictions (inference) with the model. The endpoint does not have to use the same instance, or instance type, that we used for training. The endpoint is deployed as follows:

Model deployment

Now let us deploy the RL policy so that we can get the optimal action given an environment observation. If the notebook was restarted and lost its previous estimator object, set model_data to the full S3 URI of model.tar.gz, e.g., s3://sagemaker-us-west-2-356566070122/rl-gameserver-autopilot-2019-07-19-19-36-32-926/output/model.tar.gz

[18]:
from sagemaker.tensorflow.serving import Model

# model artifact produced by the training job above; if the estimator object was lost,
# replace this with the full S3 URI of model.tar.gz
model_data = estimator.model_data
print ("model name: %s" % model_data)
model = Model(model_data=model_data,
              role=role)

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
model name: s3://sagemaker-us-west-2-356566070122/rl-gs-training-2019-09-23-15-41-40-260/output/model.tar.gz
-------------------------------------------------------------------------!

Inference

Now that the trained model is deployed at an endpoint that is up and running, we can use this endpoint for inference. The input format must match the observation_space defined in the environment. With src/gameserver_env.py as shown above, the observation is a 10-dimensional vector: the last five demand values followed by the last five allocations. For brevity, this demo doesn’t include the non-trivial construction of that vector; instead, we provide a dummy input below. For more details, see src/gameserver_env.py.

[ ]:
import json

sagemaker_region = 'us-west-2'
sagemaker_client = boto3.client('sagemaker-runtime', region_name=sagemaker_region)
# populate the correct endpoint_name
endpoint_name = "sagemaker-tensorflow-serving-2019-09-23-20-53-20-237"
content_type = "application/json"
accept = "application/json"
# dummy observation: 5 demand values followed by 5 allocation values
last_observations = np.arange(1, 11, dtype=float)

response = sagemaker_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Accept=accept,
    # the request body must be bytes or a string, not a NumPy array
    Body=json.dumps({"instances": [last_observations.tolist()]})
)
response['Body'].read()
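The response body is JSON from TensorFlow Serving. Assuming it uses the usual predictions layout, with one policy_mean value per observation (the model was exported with a single output named policy), the following sketch turns it into a game-server count with the same action_factor scaling and clipping that GameServerEnv.step() applies. The helper name response_to_servers is hypothetical. Pass it the bytes returned by response['Body'].read(), keeping in mind that the body stream can only be read once.

```python
import json

import numpy as np


def response_to_servers(body, action_factor=100, min_servers=10, max_servers=100):
    """Turn a TensorFlow Serving JSON response into a game-server count.

    Assumes a body like {"predictions": [[policy_mean]]} for a single observation.
    """
    policy_mean = json.loads(body)["predictions"][0][0]
    return float(np.clip(policy_mean * action_factor, min_servers, max_servers))


print(response_to_servers(b'{"predictions": [[0.5]]}'))  # 50.0
```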

Delete the Endpoint

A running endpoint incurs charges, so as a clean-up step, delete the endpoint when you are done.

[ ]:
predictor.delete_endpoint()