Technology

Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and analyze data. You can integrate a Data Wrangler data preparation flow into your machine learning (ML) workflows to simplify and streamline data pre-processing and feature engineering with little to no code. You can also add custom Python scripts to customize workflows.
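For the custom-script option, Data Wrangler's custom transform step accepts a Python (Pandas) snippet that receives the current dataset as a DataFrame named `df` and uses whatever you assign back to `df` as the result. A minimal sketch, assuming a hypothetical `price` column (the stand-in DataFrame below replaces the dataset Data Wrangler would supply):

```python
import pandas as pd

# Stand-in for the dataset that Data Wrangler exposes as `df`
# inside a custom transform step.
df = pd.DataFrame({"price": ["$10", "$25", None]})

# Strip the currency symbol and cast to a numeric dtype;
# missing values stay missing (NaN).
df["price"] = (
    df["price"]
    .str.replace("$", "", regex=False)
    .astype(float)
)
```

In the Data Wrangler UI you would paste only the transformation lines; reassigning `df` is what emits the transformed dataset to the next step in the flow.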

Core functionalities

  1. Import - connect to and import data from Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, and Databricks
  2. Data Flow - create a data flow to define a series of ML data prep steps. Use the flow to:
    1. combine datasets from different data sources
    2. identify the number and type of transformations you want to apply to datasets
    3. define a data prep workflow that can be integrated into an ML pipeline
  3. Transform -
    1. clean and transform your dataset using standard transforms like string, vector, and numeric data formatting tools
    2. featurize your data using transforms like text and date/time embedding and categorical encoding
  4. Generate data insights - automatically verify data quality and detect anomalies in your data with the Data Wrangler Data Quality and Insights Report
  5. Analyze -
    1. analyze features in the dataset at any point in the flow
    2. Data Wrangler includes built-in data visualization tools like scatter plots and histograms
    3. Data Wrangler includes data analysis tools like target leakage analysis and quick modeling to understand feature correlation
  6. Export - export data preparation workflow to:
    1. Amazon S3 bucket
    2. Amazon SageMaker model building pipeline (using SageMaker Pipelines to automate model building and deployment)
    3. Amazon SageMaker Feature Store - store the features and their data in a centralized location
    4. Python script - export the data flow steps as a Python script for your custom workflows
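The featurize transforms listed above (categorical encoding, date/time features) can be sketched in plain Pandas, which is roughly what an exported Python script produces. The column names below are hypothetical:

```python
import pandas as pd

# Hypothetical dataset with one categorical and one timestamp column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "signup": pd.to_datetime(["2024-01-15", "2024-06-01", "2024-12-31"]),
})

# Categorical encoding: one-hot encode the `color` column.
df = pd.get_dummies(df, columns=["color"], prefix="color")

# Date/time featurization: expand the timestamp into numeric parts
# a model can consume, then drop the raw timestamp.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek
df = df.drop(columns=["signup"])
```

In Data Wrangler itself these are point-and-click transforms; the code view is what you would see after exporting the flow for a custom workflow.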

Core activities

  1. Upload dataset to Amazon S3 and import
  2. Analyze the data using Data Wrangler Analysis
  3. Define data flow using Data Wrangler data transforms
    1. Prepare and visualize
    2. Data exploration
    3. Drop unused columns
    4. Clean up missing values
    5. Custom Pandas transform for encoding
    6. Custom SQL
    7. Save the flow
  4. Export flow to notebook
  5. Train a classifier
  6. Shut down Data Wrangler
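Steps 3 through 5 above can be sketched end to end in plain Python, which mirrors what the exported notebook runs. The dataset, column names, and choice of scikit-learn's `LogisticRegression` are all assumptions for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset standing in for the data imported from S3.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],           # unused identifier column
    "age": [22, None, 35, 41, None, 29],
    "plan": ["basic", "pro", "basic", "pro", "basic", "pro"],
    "churned": [0, 1, 0, 1, 0, 1],      # target label
})

# Drop unused columns.
df = df.drop(columns=["id"])

# Clean up missing values by imputing the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Custom Pandas encoding of the categorical column.
df["plan"] = df["plan"].map({"basic": 0, "pro": 1})

# Train a simple classifier on the prepared features.
X, y = df[["age", "plan"]], df["churned"]
model = LogisticRegression().fit(X, y)
```

Each commented step corresponds to one node in the Data Wrangler flow; the training step is what the exported notebook hands off to a SageMaker training job or, as here, a local estimator.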

Practical Implementations

  • Students enrolled in any AI-related course from Carnegie Training Institute have access to practical, working implementation guidelines

Sources

  1. Amazon SageMaker Data Wrangler documentation
