Amazon SageMaker Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and analyze data. You can integrate a Data Wrangler data preparation flow into your machine learning (ML) workflows to simplify and streamline data pre-processing and feature engineering with little to no code. You can also add custom Python scripts to tailor the workflow.
Core functionalities
Import - connect to and import data from Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, and Databricks
Data Flow - create a data flow to define a series of ML data prep steps. Use a flow to:
combine datasets from different data sources
identify the number and type of transformations you want to apply to datasets
define a data prep workflow that can be integrated into an ML pipeline
Transform - clean and featurize your dataset:
Clean and transform your dataset using standard transforms such as string, vector, and numeric data formatting tools
Featurize your data using transforms such as text and date/time embedding and categorical encoding
Generate data insights - automatically verify data quality and detect anomalies in your data with the Data Wrangler Data Quality and Insights Report
Analyze - inspect your data as you build the flow:
Analyze features in the dataset at any point in the flow
Data Wrangler includes built-in data visualization tools such as scatter plots and histograms
Data Wrangler includes data analysis tools such as target leakage analysis and quick modeling to understand feature correlation
Export - export your data preparation workflow to:
Amazon S3 bucket
Amazon SageMaker Pipelines - a model building pipeline that automates model deployment
Amazon SageMaker Feature Store - store the features and their data in a centralized location
Python script - store the data and their transformations in a Python script for your custom workflows
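The Transform step above (date/time featurization and categorical encoding) can be approximated outside Data Wrangler with plain pandas. The sketch below is only an illustration, not Data Wrangler's own implementation: the function name featurize and the columns signup_date, plan, and monthly_spend are hypothetical, and one-hot encoding is just one of several possible categorical encodings.

```python
import pandas as pd


def featurize(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative featurization; column names are assumptions, not from the source."""
    out = df.copy()

    # Date/time featurization: split a timestamp into numeric parts.
    ts = pd.to_datetime(out["signup_date"])
    out["signup_year"] = ts.dt.year
    out["signup_month"] = ts.dt.month
    out["signup_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out = out.drop(columns=["signup_date"])

    # Categorical encoding: one-hot encode the hypothetical "plan" column.
    out = pd.get_dummies(out, columns=["plan"], prefix="plan", dtype=int)
    return out


# Small made-up dataset used only to exercise the function.
sample = pd.DataFrame(
    {
        "signup_date": ["2023-01-15", "2023-06-30", "2024-02-01"],
        "plan": ["basic", "pro", "basic"],
        "monthly_spend": [10.0, 45.5, 12.0],
    }
)

features = featurize(sample)
print(features)
```

Keeping the logic in a single function makes it easier to reuse the same steps in a script exported from a flow or in a custom transform.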
Core activities
Upload the dataset to Amazon S3 and import it
Analyze the data using Data Wrangler Analysis
Define data flow using Data Wrangler data transforms
Prepare and visualize
Data exploration
Drop unused columns
Clean up missing values
Custom Pandas transform: encoding
Custom SQL
Save the flow
Export flow to notebook
Train a classifier
Shut down Data Wrangler
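The data preparation activities in this list (drop unused columns, clean up missing values, encode with custom pandas code) can be sketched in plain pandas as follows. This is a minimal sketch under stated assumptions: the function prepare, the sample columns, and the choice of median/mode imputation and integer category codes are all illustrative and are not taken from the course material. Inside a Data Wrangler custom pandas transform the dataset is usually available as a DataFrame (commonly named df), so similar statements could be adapted there.

```python
import pandas as pd


def prepare(df: pd.DataFrame, unused_columns: list, target: str) -> pd.DataFrame:
    """Hypothetical helper mirroring three of the activities listed above."""
    # 1. Drop unused columns.
    out = df.drop(columns=unused_columns)

    # 2. Clean up missing values: median for numeric columns,
    #    most frequent value for everything else (an assumed strategy).
    feature_cols = [c for c in out.columns if c != target]
    for col in feature_cols:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Rows without a label cannot be used for training.
    out = out.dropna(subset=[target])

    # 3. Custom pandas encoding: map each category to an integer code.
    for col in feature_cols:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].astype("category").cat.codes

    return out


# Made-up sample data used only to exercise the function.
raw = pd.DataFrame(
    {
        "passenger_id": [1, 2, 3, 4],
        "age": [22.0, None, 35.0, 58.0],
        "embarked": ["S", "C", None, "S"],
        "survived": [0, 1, 1, 0],
    }
)

prepared = prepare(raw, unused_columns=["passenger_id"], target="survived")
print(prepared)
```

The prepared frame contains only numeric columns and no missing values, which is the shape most classifiers expect before the training step.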
Practical Implementations
Students enrolling in any AI-related course at Carnegie Training Institute have access to practical, working implementation guidelines