Title¶

The title should resemble the filename, but keep the filename concise so that people can read it in the JupyterLab list view.

Example title - Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type

Bad example filename: amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb (too long & mixes case, dashes, and underscores)
Good example filename: processing_images_pytorch_gpu.ipynb (succinct, all lowercase, all underscores)
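
The filename guidance above can be checked mechanically. The following is a minimal sketch, not an official rule: the helper name `is_good_notebook_filename`, the regular expression, and the 40-character limit are all assumptions made for illustration (the guidance only says "succinct").

```python
import re

# Assumed convention: lowercase letters and digits separated by single
# underscores, ending in .ipynb.
FILENAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.ipynb")
MAX_LENGTH = 40  # assumed limit; the guidance does not give a number


def is_good_notebook_filename(filename: str) -> bool:
    """Return True if the filename is short, lowercase, and underscore-only."""
    if len(filename) > MAX_LENGTH:
        return False
    return FILENAME_PATTERN.fullmatch(filename) is not None


print(is_good_notebook_filename("processing_images_pytorch_gpu.ipynb"))
print(is_good_notebook_filename("amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb"))
```

The first call accepts the good example, and the second rejects the bad example because it is too long and mixes case, dashes, and underscores.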

IMPORTANT: Use only one main heading with #, so your next subheading is ## or ### and so on.

Overview¶

What does this notebook do?
- What will the user learn how to do?
Is this an end-to-end tutorial or is it a how-to (procedural) example?
- Tutorial: add conceptual information, flowcharts, images
- How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.
Who is the audience?
- What should the user be familiar with before running this?
- Link to other examples they should have run first.
How much will this cost?
- Some estimate of both time and money is recommended.
- List the instance types and other resources that are created.
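
One way to produce the cost estimate mentioned above is to multiply each resource's hourly rate by its expected run time and instance count. This is a sketch only: every rate, duration, and resource name below is a placeholder, not a real price, and should be replaced with current pricing for the instance types the notebook actually creates.

```python
# Placeholder values for illustration; these are NOT real prices.
resources = [
    {"name": "processing job", "hourly_rate_usd": 2.0, "hours": 0.5, "count": 1},
    {"name": "endpoint", "hourly_rate_usd": 0.5, "hours": 1.0, "count": 2},
]


def estimate_cost(items):
    """Sum hourly rate x hours x instance count over all resources."""
    return sum(r["hourly_rate_usd"] * r["hours"] * r["count"] for r in items)


print("Estimated cost: ${:.2f}".format(estimate_cost(resources)))
```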

Prerequisites¶

Which environments does this notebook work in? Select all that apply.

Notebook Instances: Jupyter?
Notebook Instances: JupyterLab?
Studio?

Which conda kernel is required?
Is there a previous notebook that is required?

Setup¶

Setup Dependencies¶

Describe any pip or conda or apt installs or setup scripts that are needed.
Pin sagemaker if version <2 is required.

%pip install "sagemaker>=1.14.2,<2"
Upgrade sagemaker if version 2 is required, but roll back upgrades to packages that might taint the user’s kernel and break other notebooks. Do the rollback at the end of the notebook, in the cleanup cell.
```
# setup
import sagemaker
version = sagemaker.__version__
%pip install 'sagemaker>=2.0.0'
...
# cleanup
%pip install "sagemaker=={version}"
```
Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration.

[ ]:

# SageMaker Python SDK version 1.x is required
import sys
%pip install "sagemaker>=1.14.2,<2"

[ ]:

# SageMaker Python SDK version 2.x is required
import sagemaker
import sys
original_version = sagemaker.__version__
%pip install 'sagemaker>=2.0.0'
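
For the advice above about flags that avoid user prompts, here is a minimal sketch that builds a non-interactive pip command. The helper name `pip_install_command` is hypothetical. The command is only constructed and printed here, not executed. For other installers, `conda install --yes` and `apt-get install -y` serve the same purpose.

```python
import sys


def pip_install_command(requirement: str) -> list:
    """Build a pip command that never waits for user input."""
    return [
        sys.executable, "-m", "pip", "install",
        "--quiet",                      # reduce log noise in CI output
        "--no-input",                   # fail instead of prompting
        "--disable-pip-version-check",  # skip the pip upgrade notice
        requirement,
    ]


command = pip_install_command("sagemaker>=2.0.0")
print(" ".join(command[1:]))
# To run it: import subprocess; subprocess.run(command, check=True)
```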

Setup Python Modules¶

Import modules, set options, and activate extensions.

[ ]:

# imports
import sagemaker
import numpy as np
import pandas as pd

# options
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# visualizations
import plotly
import plotly.graph_objs as go
import plotly.offline as ply
plotly.offline.init_notebook_mode(connected=True)

# extensions
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

Parameters¶

Set up user-supplied parameters, such as custom bucket names and roles, in a separate cell and call out what the options are.
Use defaults, so the notebook will still run end-to-end without any user modification.

For example, the following description & code block prompts the user to select the preferred dataset.

To select a particular dataset, assign choosen_data_set below to one of 'diabetes', 'california', or 'boston', where each name corresponds to its respective dataset.

'boston' : boston house data
'california' : california house data
'diabetes' : diabetes data

[ ]:

data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}

# Change choosen_data_set variable to one of the data sets above.
choosen_data_set = 'california'
assert choosen_data_set in data_sets.keys()
print("I selected the '{}' dataset!".format(choosen_data_set))
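
To keep defaults while still letting CI override them, parameters can be read from environment variables with a fallback. This is one possible approach, not a requirement of the template. The variable names `NOTEBOOK_DATA_SET` and `NOTEBOOK_S3_PREFIX`, the prefix value, and the helper `read_parameter` are hypothetical.

```python
import os

# Hypothetical environment variable names; they are not defined by SageMaker.
DEFAULTS = {
    "NOTEBOOK_DATA_SET": "california",
    "NOTEBOOK_S3_PREFIX": "notebook-template",
}


def read_parameter(name: str, environ=os.environ) -> str:
    """Return the override from the environment, or the default for `name`."""
    return environ.get(name) or DEFAULTS[name]


choosen_data_set = read_parameter("NOTEBOOK_DATA_SET")
s3_prefix = read_parameter("NOTEBOOK_S3_PREFIX")
print(choosen_data_set, s3_prefix)
```

With no environment variables set, the notebook runs end-to-end on the defaults.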

Data import¶

Look for the data that was stored by a previous notebook run: %store -r variableName
If that doesn’t exist, look in S3 in the user’s default bucket.
If that doesn’t exist, download it from the SageMaker dataset bucket.
If that doesn’t exist, download it from the origin.

For example, the following code block will pull training and validation data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over.

[ ]:

# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook
%store -r X_train
%store -r X_test
%store -r X_val
%store -r Y_train
%store -r Y_test
%store -r Y_val
%store -r choosen_data_set
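
The lookup order described in this section (stored variable, default bucket, SageMaker dataset bucket, origin) can be sketched as a chain of loaders that are tried in turn. The helper `first_available` and the four stand-in loaders are hypothetical. In a real notebook, each loader would wrap the actual `%store`, S3, or download call.

```python
def first_available(loaders):
    """Try each (name, loader) pair in order and return the first result.

    A loader signals "not available" by raising or by returning None.
    """
    errors = []
    for name, loader in loaders:
        try:
            data = loader()
        except Exception as error:  # any failure means "try the next source"
            errors.append("{}: {}".format(name, error))
            continue
        if data is not None:
            return name, data
        errors.append("{}: returned nothing".format(name))
    raise RuntimeError("No data source succeeded:\n" + "\n".join(errors))


# Stand-in loaders for illustration only.
def from_store():
    raise NameError("X_train was not stored by a previous run")


def from_default_bucket():
    return None  # pretend the object is missing from the user's bucket


def from_dataset_bucket():
    return [[1.0, 2.0], [3.0, 4.0]]  # placeholder rows


def from_origin():
    return [[0.0, 0.0]]


source, data = first_available([
    ("stored variable", from_store),
    ("default bucket", from_default_bucket),
    ("SageMaker dataset bucket", from_dataset_bucket),
    ("origin", from_origin),
])
print("Loaded data from:", source)
```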

Procedure or tutorial¶

Break up processes with Markdown blocks to explain what’s going on.
Make use of visualizations to better demonstrate each step.

Cleanup¶

If you upgraded the user’s SageMaker Python SDK, roll it back.
Delete any endpoints or other resources that linger and might cost the user money.

[ ]:

# rollback the SageMaker Python SDK to the kernel's original version
print("Original version: {}".format(original_version))
print("Current version: {}".format(sagemaker.__version__))
s = 'sagemaker=={}'.format(original_version)  # original_version was saved in the setup cell
print("Rolling back to... {}".format(s))
%pip install {s}
import sagemaker
print("{} installed!".format(sagemaker.__version__))
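
For the second cleanup item, deleting lingering endpoints, here is a minimal sketch. It assumes `sm_client` behaves like `boto3.client("sagemaker")`. The helper name `delete_endpoint_resources` and the endpoint name in the usage comment are hypothetical. The sketch removes the endpoint and its endpoint configuration only, so models and other resources would need their own cleanup.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint and the endpoint configuration it uses.

    `sm_client` is assumed to behave like boto3.client("sagemaker").
    """
    # Look up the configuration name before the endpoint is deleted.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Usage (requires AWS credentials; the endpoint name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "my-endpoint")
```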

Next steps¶

Wrap up with some conclusion or overview of what was accomplished.
Offer another notebook or more resources or some other call to action.

References¶

author1, article1, journal1, year1, url1
author2, article2, journal2, year2, url2
