Data Science on AWS

O'Reilly Book · Early 2021

Antje Barth

Antje Barth is a Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in Düsseldorf, Germany. She is co-author of the O'Reilly Book, "Data Science on AWS."

Antje is also co-founder of the Düsseldorf chapter of Women in Big Data.  She frequently speaks at AI and Machine Learning conferences and meetups around the world, including the O’Reilly AI and Strata conferences.  Besides ML/AI, Antje is passionate about helping developers leverage Big Data, container and Kubernetes platforms in the context of AI and Machine Learning.  

Previously, Antje worked in technical evangelism and solutions engineering at MapR and Cisco, where she helped many companies build and deploy cloud-based AI solutions using AWS and Kubernetes.

Chris Fregly

Chris Fregly is a Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is co-author of the O'Reilly Book, "Data Science on AWS."

Chris is also the founder of many global meetups focused on Apache Spark, TensorFlow, and Kubeflow. He regularly speaks at AI and Machine Learning conferences across the world, including O’Reilly AI & Strata, Open Data Science Conference (ODSC), and GPU Technology Conference (GTC).

Previously, Chris was Founder at PipelineAI where he worked with many AI-first startups and enterprises to continuously deploy ML/AI Pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker.

Popular Talks
Building an End-to-End Pipeline with BERT, TensorFlow, and Amazon SageMaker

In this talk, I will build an end-to-end AI/ML pipeline for natural language processing with SageMaker.  Attendees will learn how to do the following:

* Ingest data into S3 using Amazon Athena and the Parquet data format
* Visualize data with pandas, matplotlib, and AWS Data Wrangler on SageMaker notebooks
* Analyze data with the Deequ library, Apache Spark, and SageMaker Processing Jobs
* Perform feature engineering on a raw dataset using Scikit-Learn and SageMaker Processing Jobs
* Train a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
* Find the best hyper-parameters using SageMaker Hyper-Parameter Optimization Service (HPO) 
* Deploy a model to a REST Inference Endpoint using SageMaker Endpoints
* Perform batch inference on a model using SageMaker Batch Transform
* Automate the entire process using Step Functions, EventBridge, and S3 Triggers

This talk is based on our workshop here:   https://github.com/data-science-on-aws/workshop

You can RSVP to the workshop here:   https://www.eventbrite.com/e/full-day-workshop-kubeflow-bert-gpu-tensorflow-keras-sagemaker-tickets-63362929227

AI and Machine Learning, Open Source, Natural Language Processing (NLP), Natural Language Understanding (NLU), and BERT
Quantum Computing with Amazon Braket

In this talk, I describe some fundamental principles of quantum computing, including qubits, superposition, and entanglement.  I will demonstrate how to perform secure quantum computing tasks across many Quantum Processing Units (QPUs) using Amazon Braket, IAM, and S3.
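To make superposition and entanglement concrete, here is a minimal sketch in plain Python that simulates two qubits as a list of four amplitudes. It is an illustration only, not Amazon Braket code: the function names are hypothetical, and the simulation assumes real-valued amplitudes and qubit 0 as the leftmost bit. A Hadamard gate puts the first qubit into superposition, and a CNOT gate then entangles it with the second qubit, producing a Bell state.

```python
import math


def apply_hadamard(state, qubit, n_qubits):
    """Apply a Hadamard gate to one qubit of an n-qubit state vector."""
    h = 1 / math.sqrt(2)
    bit = 1 << (n_qubits - 1 - qubit)  # assumption: qubit 0 is the leftmost bit
    new_state = [0.0] * len(state)
    for index, amplitude in enumerate(state):
        if amplitude == 0:
            continue
        zero_index = index & ~bit
        one_index = index | bit
        if index & bit:
            # |1> becomes (|0> - |1>) / sqrt(2)
            new_state[zero_index] += h * amplitude
            new_state[one_index] -= h * amplitude
        else:
            # |0> becomes (|0> + |1>) / sqrt(2)
            new_state[zero_index] += h * amplitude
            new_state[one_index] += h * amplitude
    return new_state


def apply_cnot(state, control, target, n_qubits):
    """Flip the target qubit in every basis state where the control qubit is 1."""
    control_bit = 1 << (n_qubits - 1 - control)
    target_bit = 1 << (n_qubits - 1 - target)
    new_state = [0.0] * len(state)
    for index, amplitude in enumerate(state):
        if index & control_bit:
            new_state[index ^ target_bit] += amplitude
        else:
            new_state[index] += amplitude
    return new_state


def probabilities(state):
    """Measurement probability of each basis state (|00>, |01>, |10>, |11>)."""
    return [abs(amplitude) ** 2 for amplitude in state]


state = [1.0, 0.0, 0.0, 0.0]                         # start in |00>
state = apply_hadamard(state, qubit=0, n_qubits=2)   # superposition on qubit 0
bell = apply_cnot(state, control=0, target=1, n_qubits=2)  # entangle the pair
print(probabilities(bell))
```

After both gates, only |00> and |11> have non-zero probability, each close to one half, so measuring one qubit fixes the outcome of the other.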

AI and Machine Learning, Quantum Computing, Amazon Braket, QPU
Controlling and Optimizing Costs on Amazon SageMaker

In this talk, we show how to control and optimize your Amazon SageMaker costs across multiple features including SageMaker Notebooks, Training Jobs, Hyper-Parameter Tuning, Batch Predictions, Deployment Endpoints, and GPUs.

AI and Machine Learning, Amazon SageMaker, Cost Optimization, GPU
Smokey and the Multi-Armed Bandit featuring BERT Reynolds and Reinforcement Learning

Using the popular Hugging Face Transformers open source library for BERT, I will train and deploy multiple natural language understanding (NLU) models and compare them in live production using reinforcement learning to dynamically shift traffic to the winning model.

Next, I will describe the differences between A/B and multi-armed bandit tests including exploration-exploitation, reward-maximization, and regret-minimization.

Last, I will dive deep into the details of scaling a multi-armed bandit architecture on AWS using a real-time, stream-based text classifier with TensorFlow, PyTorch, and BERT on 150+ million reviews from the Amazon Customer Reviews Dataset.
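The exploration-exploitation trade-off can be sketched with an epsilon-greedy bandit, one of the simplest multi-armed bandit strategies. This is a hedged illustration in plain Python, not the architecture from the talk: the class and function names, the two variants, and their reward rates are all made up, and the talk may use a different bandit algorithm.

```python
import random


class EpsilonGreedyRouter:
    """Routes each request to one model variant using epsilon-greedy selection."""

    def __init__(self, model_names, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {name: 0 for name in model_names}
        self.mean_reward = {name: 0.0 for name in model_names}

    def choose(self):
        # Explore: with probability epsilon, pick a variant at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        # Exploit: otherwise pick the variant with the best observed mean reward.
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, name, reward):
        # Incremental mean, so no per-request history needs to be stored.
        self.counts[name] += 1
        self.mean_reward[name] += (reward - self.mean_reward[name]) / self.counts[name]


def simulate(true_reward_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate traffic against variants with hidden, made-up reward rates."""
    router = EpsilonGreedyRouter(list(true_reward_rates), epsilon=epsilon, seed=seed)
    feedback_rng = random.Random(seed + 1)
    for _ in range(rounds):
        name = router.choose()
        reward = 1.0 if feedback_rng.random() < true_reward_rates[name] else 0.0
        router.update(name, reward)
    return router


# Hypothetical variants: model_b is rewarded far more often than model_a.
router = simulate({"model_a": 0.2, "model_b": 0.8})
print(router.counts)
```

An A/B test would keep a fixed traffic split for the whole experiment. Here the split shifts toward the better-performing variant as evidence accumulates, while the small epsilon keeps a trickle of traffic on the other variant in case its performance changes.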

AI and Machine Learning, Open Source, Natural Language Processing (NLP), Natural Language Understanding (NLU), and BERT
From Your Laptop To The Cloud:  Scaling Machine Learning the Easy Way

Working with Jupyter notebooks, I will show you how you can get off your laptop in just a few clicks and run your first ML model in the cloud.

AI and Machine Learning, Jupyter Notebooks, Amazon SageMaker, Model Scaling, Model Development
1-Click Model Training, Tuning, and Deploying with Amazon SageMaker Autopilot

Typical approaches to automated machine learning (AutoML) don’t provide insights into the data or logic used to create models, forcing you to compromise on accuracy. Join us as I introduce Amazon SageMaker Autopilot, a fully managed AutoML service that generates machine learning models and provides complete control and visibility into the process. Learn how Autopilot automatically inspects raw data, picks the best set of algorithms, trains multiple models, tunes them, and ranks them based on performance. The result is a recommendation for the best performing model and visibility into the logic and code for how the model was created. Autopilot offers the best combination of automatic model creation with control and visibility for true 1-Click Machine Learning.

AI and Machine Learning, Automated Machine Learning, Model Training, Hyper-Parameter Tuning, Model Deploying, 1-Click, Amazon Autopilot
Beginner's Guide to Natural Language Processing (NLP) and Natural Language Understanding (NLU)

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on a machine's ability to read, understand, and derive meaning from human languages.  It is without doubt one of the longest-studied fields, with research publications dating back to the early 1900s.  Fast-forward to 2020, and the field is seeing groundbreaking research, with new language models arriving nearly every month.  In this session, I will introduce you to Amazon Comprehend, a fully managed NLP service that finds insights and relationships in text without requiring any machine learning experience.

AI and Machine Learning, Natural Language Processing (NLP), BERT, Amazon Comprehend
Scaling Data Science with Open Source on Amazon Web Services

Accessing data is the most important part of data science. In this workshop, I download, ingest, and analyze many aspects of a public dataset using S3, Athena, Redshift, and SageMaker Notebooks. I will highlight various AWS open source projects such as Deequ and Data Wrangler to improve the data science experience on AWS.

AI and Machine Learning, Big Data, Open Source, Data Science, Amazon S3, Athena, Redshift, SageMaker
Putting Your Machine Learning on Autopilot

A typical machine learning workflow consists of many steps including data analysis, feature engineering, model training and model tuning. What if our machine learning platform could perform these tasks for us and generate high-quality model candidates ready for review and deployment? In this session, I will discuss the concept of Automated Machine Learning (AutoML) and how the latest advances in AutoML allow you to put your machine learning models into autopilot mode while maintaining full visibility and control. I will demonstrate AutoML using Amazon SageMaker Autopilot, a fully managed AutoML service offered by AWS.

AI and Machine Learning, Automated Machine Learning, Amazon Autopilot
Owning Your Own (Data) Lake House

The amount of data generated by IoT, smart devices, cloud applications, and social media is growing exponentially. You need ways to easily and cost-effectively analyze all of this data with minimal time-to-insight, regardless of the format or where the data is stored. In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake. This allows you to make this data available easily to other analytics and machine learning tools rather than locking it in a new silo.

Big Data, Data Lake, Data Warehouse, Amazon S3, Athena, Redshift, and Redshift Spectrum
Running Federated Queries Across Your Data Lake and Data Warehouse with Amazon S3, Athena, Redshift, and Spectrum

In this talk, I demonstrate how to run federated queries across your unstructured data lake and structured data warehouse using Redshift Spectrum.

AI and Machine Learning, Big Data, Data Lake, Data Warehouse, Amazon S3, Athena, Redshift, and Redshift Spectrum
Building End-to-End Machine Learning Workflows with Kubernetes, Kubeflow Pipelines, and BERT

Kubeflow is a popular open-source machine learning (ML) toolkit for Kubernetes users who want to build custom ML pipelines. Kubeflow Pipelines is an add-on to Kubeflow that lets you build and deploy portable and scalable end-to-end ML workflows. In this session, I show you how to get started with Kubeflow Pipelines on AWS. I also demonstrate how you can integrate powerful Amazon SageMaker features such as data labeling, large-scale hyperparameter tuning, distributed training jobs, and secure and scalable model deployment using Amazon SageMaker Components for Kubeflow Pipelines.

AI and Machine Learning, Open Source, Kubernetes, Kubeflow Pipelines, SageMaker Components for Kubeflow Pipelines, Amazon Elastic Kubernetes Service (EKS)
Creating an FAQ Chatbot with Slack, Amazon Lex, and Kendra

In this talk, we demonstrate how to build a chatbot to handle common questions such as "Where can I get the slides?",  "Is this recorded?", and "Where can I get the recording?"  We will use Slack with Amazon Lex, Kendra, and Lambda Functions to build the FAQ chatbot.

AI and Machine Learning, Natural Language Processing (NLP), Natural Language Understanding (NLU), and BERT
Amazon SageMaker Studio:   The First IDE for Machine Learning to Build, Train, and Deploy Your Models Quickly

Machine learning (ML) is a complex, iterative, often time-consuming process. One difficult aspect is the lack of integration between the workflow steps and the tools to accomplish them. Join us as I introduce Amazon SageMaker Studio, the first fully integrated development environment (IDE) for ML that makes it easy to build, train, tune, debug, deploy, and monitor ML models at scale. It pulls together the ML workflow steps in a unified, visual interface. Because the steps are performed and tracked within one environment, the non-linear and iterative nature of ML development is greatly simplified. You can quickly move between steps, compare results, adjust inputs and parameters, and iterate faster with Amazon SageMaker Studio.

AI and Machine Learning, Integrated Development Environment (IDE), Model Development Lifecycle (MDLC)
Use Natural Language Processing (NLP) and BERT to Power Intelligent Applications

AWS brings natural language and text analysis technologies within the reach of every developer through pre-trained AI services. Learn how to modernize, adding intelligence to any application with machine learning services that provide language and chatbot functions. See how others are defining and building the next generation of apps that can interact with the world around us.

AI and Machine Learning, Natural Language Processing (NLP), BERT, Amazon SageMaker
Build, Train, and Deploy an AI/ML Pipeline with BERT, Amazon SageMaker, and Step Functions

In this talk, I demonstrate how to deploy a continuous AI/ML pipeline using Amazon SageMaker and Step Functions.  I will build, train, tune, and deploy a BERT model using the public Amazon Customer Reviews Dataset with 150+ million Amazon.com product reviews from 1995-2015.

AI and Machine Learning, Big Data, AI/ML Pipelines, Amazon SageMaker, Step Functions
Find the Best Predictive Model for Your Dataset with Hyper-Parameter Tuning and Experiment Tracking on Amazon SageMaker

In this talk, I describe how to tune and track your AI and machine learning models with Amazon SageMaker Hyper-Parameter Tuning and Experiment Tracking.

AI and Machine Learning, Hyper-Parameter Tuning, Experiment Tracking, Amazon SageMaker
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Models Using Amazon SageMaker Endpoints and Model Monitor

In this talk, I describe how to deploy a model into production and monitor its performance using SageMaker Model Monitor. With Model Monitor, I can detect if a model's predictive performance has degraded - and alert an on-call data scientist to take action and improve the model at 2am while the DevOps folks sleep soundly through the night.

AI and Machine Learning, Model Deployment, Anomaly Detection, Amazon SageMaker Endpoints, and Model Monitor
Visualize BERT Attention for Natural Language Understanding (NLU) Use Cases using Amazon SageMaker Model Debugger

BERT is a revolutionary AI/ML model for Natural Language Processing (NLP) and Natural Language Understanding (NLU). In this talk, I describe how to use Amazon SageMaker Model Debugger to visualize how BERT learns to understand the language of a given set of documents.

AI and Machine Learning, Model Training, Model Debugging, Natural Language Processing (NLP), Natural Language Understanding (NLU), and BERT
Analyze and Detect Anomalies in Large Data sets Using AWS Open Source and Amazon SageMaker

Data quality issues can lead to poor accuracy and bias in your predictive models.  In my experience, this is the number one issue with machine learning pipelines based on unstructured data generated from modern applications.  In this talk, we download, ingest, and analyze many aspects of the Amazon Customer Reviews Dataset including 150 million reviews from 1995-2015.  We will use S3, Athena, Redshift, and SageMaker along with various open source projects from AWS including Data Wrangler and Deequ to improve the data science experience on AWS.  Data Wrangler helps with data ingest, ETL, and feature engineering.  Deequ analyzes large datasets for inconsistencies such as missing values, unexpected data types, and other statistical anomalies.
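The kinds of checks described above (missing values and unexpected data types) can be sketched in a few lines of plain Python. This is not the Deequ API, which runs on Apache Spark. The function names are hypothetical, and the sample rows are made up for illustration, with column names chosen to resemble a reviews dataset.

```python
def completeness(rows, column):
    """Fraction of rows where the column holds a non-empty value."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(column) not in (None, ""))
    return present / len(rows)


def type_violations(rows, column, expected_type):
    """Indexes of rows whose value is present but has an unexpected type."""
    return [
        index
        for index, row in enumerate(rows)
        if row.get(column) is not None and not isinstance(row[column], expected_type)
    ]


# Made-up sample rows; a real run would scan the full dataset.
reviews = [
    {"review_id": "R1", "star_rating": 5, "review_body": "Great"},
    {"review_id": "R2", "star_rating": "4", "review_body": ""},
    {"review_id": "R3", "star_rating": None, "review_body": "Broke after a week"},
    {"review_id": "R4", "star_rating": 2, "review_body": "Okay"},
]

report = {
    "star_rating_completeness": completeness(reviews, "star_rating"),
    "review_body_completeness": completeness(reviews, "review_body"),
    "star_rating_type_violations": type_violations(reviews, "star_rating", int),
}
print(report)
```

On this sample, one star rating is missing, one review body is empty, and one rating arrives as a string instead of an integer. A pipeline could fail or quarantine a batch when such metrics fall below an agreed threshold.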

AI and Machine Learning, Big Data, Open Source, Apache Spark, Data Quality, Anomaly Detection
Human-in-the-Loop Active Learning and Augmented AI

Humans are not being replaced by AI. In fact, they work very well together. AI takes care of the tedious tasks while humans focus on the complex tasks that require human intelligence. This talk will demonstrate human-in-the-loop workflows using Amazon Augmented AI (A2I). AI will perform most of the predictions and involve me only when it is not confident about a prediction. I then use active learning to train a new model that improves the AI's confidence on future predictions.
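The routing idea behind a human-in-the-loop workflow can be sketched as a confidence threshold. This is a plain Python illustration and does not call the Amazon A2I API: the function names, the 0.8 threshold, and the sample predictions are all assumptions made for the example.

```python
def route_predictions(predictions, threshold=0.8):
    """Split predictions into auto-accepted ones and ones that need a human."""
    auto_accepted, needs_review = [], []
    for item in predictions:
        if item["confidence"] >= threshold:
            auto_accepted.append(item)
        else:
            needs_review.append(item)
    return auto_accepted, needs_review


def build_training_examples(needs_review, human_labels):
    """Pair each reviewed item with its human label for the next training run."""
    return [
        {"text": item["text"], "label": human_labels[item["id"]]}
        for item in needs_review
        if item["id"] in human_labels
    ]


# Made-up model outputs for three reviews.
predictions = [
    {"id": 1, "text": "Love it", "label": "positive", "confidence": 0.97},
    {"id": 2, "text": "Not what I expected, but fine", "label": "negative", "confidence": 0.55},
    {"id": 3, "text": "Terrible", "label": "negative", "confidence": 0.91},
]

auto_accepted, needs_review = route_predictions(predictions)
human_labels = {2: "neutral"}  # label supplied by a human reviewer
new_examples = build_training_examples(needs_review, human_labels)
print(len(auto_accepted), new_examples)
```

Only the low-confidence prediction reaches a human, and the corrected label becomes a new training example. Retraining on such examples is what lets the model handle similar inputs with more confidence later.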

AI and Machine Learning, AI/ML Pipelines, Augmented AI, Human-in-the-Loop, Human Intelligence
Understanding Natural Language with Amazon Comprehend and Kendra

In this talk, we describe how to use Comprehend and Kendra to analyze documents and derive valuable insights from text-based data sources including S3, Salesforce, SharePoint, MySQL, and many others.  I will demonstrate how to build a custom text classifier using Amazon Comprehend and the public Amazon Customer Reviews Dataset with 150+ million reviews from 1995-2015.

AI and Machine Learning, Natural Language Processing (NLP), Natural Language Understanding (NLU), Enterprise Search, Amazon Comprehend, Kendra
