AWS | Data Engineering | Terraform | Python Automation

Wipro

DevOps Engineer - AWS Data Platform · Navi Mumbai, India · February 2024 - April 2025

Built Terraform-driven AWS data platform infrastructure and Python automation for Alight Solutions, supporting governed ingestion, transformation, metadata discovery, analytics delivery, and forecasting workflows.

AWS | Terraform | Python | Glue | Athena | Redshift | Lake Formation | PySpark | Lambda | CloudWatch | DevOps | Data Platform

Business Context

Alight Solutions needed reliable AWS data platform components for ingestion, transformation, cataloging, analytics, caching, governance, and ML-ready workflows across enterprise datasets.

The Challenge

The work required reusable infrastructure, secure access controls, data validation, monitoring, and repeatable daily refresh workflows while protecting sensitive data and supporting downstream Tableau, analytics, and machine learning consumers.

Approach & Architecture

Led and mentored a 7-member data engineering team. Wrote, deployed, and managed reusable Terraform modules across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis. Built Glue/PySpark and SQL validation checks, Python batch jobs, and custom PII scanning with Lambda and EventBridge Scheduler. Applied Lake Formation, IAM, KMS, and Secrets Manager controls and configured CloudWatch and CloudTrail for monitoring and auditability.
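The custom PII scanning step can be illustrated with a minimal sketch, assuming a Lambda handler that receives text records in its event on each scheduled run. The two regex patterns, the event shape (`records`, `id`, `text`), and all function names are hypothetical and chosen for illustration only. A real scanner would likely read objects from S3 and cover many more PII types.

```python
import re

# Hypothetical patterns for illustration; not the project's actual rule set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> dict[str, int]:
    """Count how many times each PII pattern matches in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


def lambda_handler(event, context):
    """Entry point a scheduler could invoke; the event shape is an assumption."""
    records = event.get("records", [])
    findings = []
    for record in records:
        counts = scan_text(record["text"])
        # Keep only the patterns that actually matched.
        hits = {name: count for name, count in counts.items() if count}
        if hits:
            findings.append({"id": record["id"], "hits": hits})
    return {"scanned": len(records), "findings": findings}
```

Returning a summary rather than the matched values keeps sensitive strings out of logs, which fits the goal of protecting the data being scanned.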

Architecture notes

Data from 10+ applications landed in S3. Glue Crawlers inferred schemas, updated partitions, and registered metadata in Glue Data Catalog. Athena and Redshift SQL validated and analyzed curated datasets. Python batch jobs refreshed Redshift datasets for reporting and Tableau dashboard consumption. Lake Formation, IAM, KMS, and Secrets Manager controlled access and protected sensitive datasets.
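The daily refresh flow can be sketched as below, assuming each run loads one full snapshot stored as Parquet under a date-partitioned S3 prefix. The `dt=YYYY-MM-DD` layout, the bucket, table, and role names, and both helper functions are hypothetical. The sketch only builds the SQL statements; executing them against Redshift is left out.

```python
import re
from datetime import date

# Allow only simple identifiers such as "schema.table" in generated SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def partition_prefix(bucket: str, dataset: str, run_date: date) -> str:
    """Build the S3 prefix for one day's snapshot (assumed Hive-style layout)."""
    return f"s3://{bucket}/{dataset}/dt={run_date.isoformat()}/"


def build_refresh_statements(table: str, s3_prefix: str, iam_role_arn: str) -> list[str]:
    """Return SQL that replaces the table contents with one snapshot."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return [
        "BEGIN;",
        # DELETE (not TRUNCATE) so the swap stays inside one transaction.
        f"DELETE FROM {table};",
        f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET;",
        "COMMIT;",
    ]
```

Wrapping the delete and the load in one transaction means reporting queries see either the previous snapshot or the new one, never an empty table.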

Tools & Stack

AWS | Terraform | Python | PySpark | S3 | AWS Glue | Athena | Redshift | Lake Formation | Lambda | Step Functions | Redis | CloudWatch | CloudTrail | IAM | KMS | Secrets Manager | Tableau | SageMaker

What I Owned

  • Reusable Terraform modules for AWS data platform components across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis
  • Glue/PySpark and SQL-based source-to-target validation, schema checks, null checks, duplicate key checks, and accepted value checks
  • Custom PII scanning process using Python, AWS Lambda, and EventBridge Scheduler
  • Lake Formation, IAM, KMS, and Secrets Manager controls for governed access and encrypted data pipelines
  • CloudWatch and CloudTrail monitoring for pipeline execution, access activity, failures, and auditability
  • Daily Python batch jobs that refreshed Redshift datasets for reporting and Tableau dashboards
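
The validation checks listed above can be illustrated in plain Python. This is a simplified sketch, not the Glue/PySpark or SQL used on the project: rows are assumed to be dictionaries, and the function names and sample columns are hypothetical.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def schema_check(rows: list[Row], expected_columns: set[str]) -> list[int]:
    """Return indexes of rows whose columns differ from the expected set."""
    return [i for i, row in enumerate(rows) if set(row) != expected_columns]


def null_check(rows: list[Row], column: str) -> list[int]:
    """Return indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def duplicate_key_check(rows: list[Row], key_columns: list[str]) -> list[tuple]:
    """Return key tuples that appear in more than one row."""
    counts = Counter(tuple(row.get(col) for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


def accepted_value_check(rows: list[Row], column: str, accepted: set) -> list[int]:
    """Return indexes of rows whose column value is outside the accepted set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in accepted]


# Hypothetical sample data for illustration.
sample_rows = [
    {"id": 1, "status": "active", "email": "a@example.com"},
    {"id": 2, "status": "closed", "email": None},
    {"id": 2, "status": "unknown", "email": "c@example.com"},
]
```

Each check returns the offending rows or keys rather than a bare pass/fail flag, so a failed run can report exactly what to investigate.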

Outcomes

  • Prepared curated S3 and Redshift datasets for reporting, governance, analytics, and Amazon SageMaker model workflows
  • Improved visibility into Glue, S3, Athena, Redshift, and security-related services through CloudWatch and CloudTrail
  • Enabled internal users and downstream teams to consume trusted, governed datasets
  • Reduced manual data refresh effort through scheduled Python batch jobs
  • Documented data ownership, definitions, and access needs with application developers and product owners

Lessons Learned

Enterprise data platforms need the same engineering discipline as application platforms: reusable infrastructure, validation, access governance, monitoring, and clear ownership documentation.