Wipro - AWS Data Platform Engineering

AWS | Data Engineering | Terraform | Python Automation

Built Terraform-driven AWS data platform infrastructure and Python automation for Alight Solutions, supporting governed ingestion, transformation, metadata discovery, analytics delivery, and forecasting workflows.

Business Context

Alight Solutions needed reliable AWS data platform components for ingestion, transformation, cataloging, analytics, caching, governance, and ML-ready workflows across enterprise datasets.

The Challenge

The work required reusable infrastructure, secure access controls, data validation, monitoring, and repeatable daily refresh workflows while protecting sensitive data and supporting downstream Tableau, analytics, and machine learning consumers.
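A repeatable daily refresh usually means that rerunning a job for the same business date must not load those rows twice. The sketch below shows one common way to get that: a delete-then-insert for a single date, run inside one transaction. It is a minimal sketch, and several details are assumptions for illustration rather than specifics of the Alight project: the `load_date` column, the staging and target table names, and the use of a generic DB-API 2.0 connection.

```python
from datetime import date
from typing import Any, Sequence

# (SQL text, bind parameters) pairs; a hypothetical helper type for this sketch.
Statement = tuple[str, tuple[Any, ...]]


def refresh_statements(
    target: str, staging: str, run_date: date, placeholder: str = "%s"
) -> list[Statement]:
    """Build a delete-then-insert for one business date (assumed `load_date` column).

    Rerunning for the same date replaces that slice instead of appending a
    second copy. Table names are interpolated into the SQL, so they must come
    from trusted configuration, never from user input. The date is bound as a
    parameter using the driver's placeholder style.
    """
    day = (run_date.isoformat(),)
    return [
        (f"DELETE FROM {target} WHERE load_date = {placeholder}", day),
        (f"INSERT INTO {target} SELECT * FROM {staging} WHERE load_date = {placeholder}", day),
    ]


def run_refresh(conn: Any, statements: Sequence[Statement]) -> None:
    """Execute all statements in one transaction on a DB-API 2.0 connection."""
    cursor = conn.cursor()
    try:
        for sql, params in statements:
            cursor.execute(sql, params)
        conn.commit()  # the delete and the insert become visible together
    except Exception:
        conn.rollback()  # leave the target as it was before this run
        raise
```

The default `%s` placeholder matches psycopg2-style drivers; other drivers may use a different style, which is why it is a parameter. On Redshift, `TRUNCATE` commits the transaction it runs in, so the sketch deletes only the date slice instead of truncating the table, keeping the swap atomic.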

Approach & Architecture

Led and mentored a 7-member data engineering team. Wrote, deployed, and managed reusable Terraform modules across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis. Built Glue/PySpark and SQL validation checks, Python batch jobs, and custom PII scanning with Lambda and EventBridge Scheduler. Applied Lake Formation, IAM, KMS, and Secrets Manager controls and configured CloudWatch and CloudTrail for monitoring and auditability.
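The custom PII scanning run by Lambda on an EventBridge Scheduler schedule could take roughly the shape below. This is a minimal sketch, not the production scanner. The two regex detectors (SSN-like and email-like strings), the `dataset` and `records` event fields, and the helper `scan_record` are hypothetical; only the `lambda_handler(event, context)` entry point follows the conventional Lambda Python signature. It also scans rows passed in the event, whereas a real job would more likely read sampled objects from S3.

```python
import re
from typing import Any

# Hypothetical detectors for this sketch: US SSN-like and email-like strings only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def scan_record(record: dict[str, Any]) -> dict[str, list[str]]:
    """Return {column: [pii_types]} for string columns that match any pattern."""
    findings: dict[str, list[str]] = {}
    for column, value in record.items():
        if not isinstance(value, str):
            continue  # only free-text values are scanned in this sketch
        hits = [name for name, pattern in PII_PATTERNS.items() if pattern.search(value)]
        if hits:
            findings[column] = hits
    return findings


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Aggregate findings per column across all records in the assumed payload."""
    flagged: dict[str, set[str]] = {}
    for record in event.get("records", []):
        for column, types in scan_record(record).items():
            flagged.setdefault(column, set()).update(types)
    return {
        "dataset": event.get("dataset"),
        "flagged_columns": {col: sorted(types) for col, types in sorted(flagged.items())},
    }
```

EventBridge Scheduler can pass a fixed JSON input to its Lambda target, which is one way a payload like the assumed `{"dataset": ..., "records": [...]}` could arrive. Regex detectors both miss and over-match, so findings like these would more likely feed a review step than trigger automatic changes.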

Architecture notes

Data from 10+ applications landed in S3. Glue Crawlers inferred schemas, updated partitions, and registered metadata in Glue Data Catalog. Athena and Redshift SQL validated and analyzed curated datasets. Python batch jobs refreshed Redshift datasets for reporting and Tableau dashboard consumption. Lake Formation, IAM, KMS, and Secrets Manager controlled access and protected sensitive datasets.
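Glue crawlers treat Hive-style `key=value` folder names as partition columns, so landing each application's files under a predictable prefix lets the crawler register named partitions instead of generic names such as `partition_0`. The sketch below builds and parses such prefixes. It is minimal and assumes a hypothetical `raw` zone and `source_app`/`year`/`month`/`day` partition keys, because the project's actual S3 layout is not described here.

```python
from datetime import date


def landing_prefix(bucket: str, source_app: str, run_date: date, zone: str = "raw") -> str:
    """Build a Hive-style partitioned S3 prefix (hypothetical layout).

    Each key=value folder becomes a partition column when a Glue crawler
    catalogs the table rooted at s3://<bucket>/<zone>/.
    """
    return (
        f"s3://{bucket}/{zone}/source_app={source_app}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/"
    )


def partition_values(s3_key: str) -> dict[str, str]:
    """Extract the key=value partition segments from an object key or prefix."""
    return dict(segment.split("=", 1) for segment in s3_key.split("/") if "=" in segment)
```

For example, `landing_prefix("example-data-lake", "payroll", date(2024, 3, 5))` yields a prefix ending in `source_app=payroll/year=2024/month=03/day=05/`. Once those partitions are in the Data Catalog, an Athena query filtering on `year`, `month`, and `day` can skip partitions outside the requested range.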


What I Owned

Reusable Terraform modules for AWS data platform components across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis
Glue/PySpark and SQL-based source-to-target validation, schema checks, null checks, duplicate key checks, and accepted value checks
Custom PII scanning process using Python, AWS Lambda, and EventBridge Scheduler
Lake Formation, IAM, KMS, and Secrets Manager controls for governed access and encrypted data pipelines
CloudWatch and CloudTrail monitoring for pipeline execution, access activity, failures, and auditability
Daily Python batch jobs that refreshed Redshift datasets for reporting and Tableau dashboards
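The validation checks listed above reduce to a handful of simple rules. In the project they ran as Glue/PySpark and SQL; the sketch below restates the same rules (row-count reconciliation, schema, null, duplicate key, accepted values) in plain Python over lists of dicts so the logic is easy to read and test. It is a minimal sketch: the column names, business key, and allowed values a caller passes in are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable

Row = dict[str, Any]


def validate(
    source_rows: list[Row],
    target_rows: list[Row],
    expected_columns: set[str],
    not_null: Iterable[str],
    key_columns: tuple[str, ...],
    accepted_values: dict[str, set[Any]],
) -> list[str]:
    """Return failure messages; an empty list means the load passed every check."""
    failures: list[str] = []

    # Source-to-target reconciliation: row counts should agree after the load.
    if len(source_rows) != len(target_rows):
        failures.append(f"row_count: source={len(source_rows)} target={len(target_rows)}")

    # Schema check: each target row must carry exactly the expected columns.
    bad_schema = sum(1 for row in target_rows if set(row) != expected_columns)
    if bad_schema:
        failures.append(f"schema: {bad_schema} row(s) do not match expected columns")

    # Null check: required columns must be present and non-null.
    for column in not_null:
        nulls = sum(1 for row in target_rows if row.get(column) is None)
        if nulls:
            failures.append(f"null: {column} has {nulls} null value(s)")

    # Duplicate key check: the business key must identify exactly one row.
    key_counts = Counter(tuple(row.get(c) for c in key_columns) for row in target_rows)
    duplicates = [key for key, n in key_counts.items() if n > 1]
    if duplicates:
        failures.append(f"duplicate_key: {len(duplicates)} key(s) repeated, e.g. {duplicates[0]}")

    # Accepted value check: non-null values must come from the allowed set
    # (nulls are left to the null check above).
    for column, allowed in accepted_values.items():
        seen = {row.get(column) for row in target_rows if row.get(column) is not None}
        unexpected = seen - allowed
        if unexpected:
            failures.append(
                f"accepted_values: {column} has unexpected {sorted(map(str, unexpected))}"
            )

    return failures
```

In PySpark the same rules would typically become aggregations over the target DataFrame, for example `count()` for reconciliation, `groupBy(...).count()` filtered on counts above 1 for duplicate keys, and `isin` for accepted values.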

Outcomes

Prepared curated S3 and Redshift datasets for reporting, governance, analytics, and Amazon SageMaker model workflows
Improved visibility into Glue, S3, Athena, Redshift, and security-related services through CloudWatch and CloudTrail
Enabled internal users and downstream teams to consume trusted, governed datasets
Reduced manual data refresh effort through scheduled Python batch jobs
Documented data ownership, definitions, and access needs with application developers and product owners

Lessons Learned

Enterprise data platforms need the same engineering discipline as application platforms: reusable infrastructure, validation, access governance, monitoring, and clear ownership documentation.
