AWS | Data Engineering | Terraform | Python Automation
Wipro
DevOps Engineer - AWS Data Platform · Navi Mumbai, India · February 2024 - April 2025
Built Terraform-driven AWS data platform infrastructure and Python automation for Alight Solutions, supporting governed ingestion, transformation, metadata discovery, analytics delivery, and forecasting workflows.
Business Context
Alight Solutions needed reliable AWS data platform components for ingestion, transformation, cataloging, analytics, caching, governance, and ML-ready workflows across enterprise datasets.
The Challenge
The work required reusable infrastructure, secure access controls, data validation, monitoring, and repeatable daily refresh workflows while protecting sensitive data and supporting downstream Tableau, analytics, and machine learning consumers.
Approach & Architecture
Led and mentored a 7-member data engineering team. Wrote, deployed, and managed reusable Terraform modules across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis. Built Glue/PySpark and SQL validation checks, Python batch jobs, and custom PII scanning with Lambda and EventBridge Scheduler. Applied Lake Formation, IAM, KMS, and Secrets Manager controls and configured CloudWatch and CloudTrail for monitoring and auditability.
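The custom PII scanning step can be illustrated with a minimal sketch. The source does not describe the scanner's rules or its input, so everything here is an assumption for illustration: the regex patterns, the `scan_text` helper, and the `columns` key in the event payload are all hypothetical. The handler is shaped like a Lambda entry point that a schedule could invoke with a JSON payload.

```python
import re

# Illustrative patterns only; the real scanner's rules are not described in the source.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_text(text):
    """Return {pattern_name: match_count} for every pattern that matched."""
    findings = {}
    for name, pattern in PII_PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            findings[name] = count
    return findings


def lambda_handler(event, context):
    """Hypothetical entry point: the payload maps column names to sampled values."""
    report = {}
    for column, values in event.get("columns", {}).items():
        findings = scan_text(" ".join(str(v) for v in values))
        if findings:
            report[column] = findings
    return {"flagged_columns": report}
```

In a deployed version the sampled values would more likely be read from S3 or queried through Athena than passed in the event; that part is omitted so the sketch stays self-contained.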
Architecture notes
Data from 10+ applications landed in S3. Glue Crawlers inferred schemas, updated partitions, and registered metadata in Glue Data Catalog. Athena and Redshift SQL validated and analyzed curated datasets. Python batch jobs refreshed Redshift datasets for reporting and Tableau dashboard consumption. Lake Formation, IAM, KMS, and Secrets Manager controlled access and protected sensitive datasets.
What I Owned
- Reusable Terraform modules for AWS data platform components across S3, Glue, Athena, Redshift, Lambda, Step Functions, and Redis
- Glue/PySpark and SQL-based source-to-target validation, schema checks, null checks, duplicate key checks, and accepted value checks
- Custom PII scanning process using Python, AWS Lambda, and EventBridge Scheduler
- Lake Formation, IAM, KMS, and Secrets Manager controls for governed access and encrypted data pipelines
- CloudWatch and CloudTrail monitoring for pipeline execution, access activity, failures, and auditability
- Daily Python batch jobs that refreshed Redshift datasets for reporting and Tableau dashboards
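The validation checks listed above (source-to-target, null, duplicate key, and accepted value checks) follow a common pattern that can be sketched in plain Python. This is not the Glue/PySpark or SQL code from the project; it swaps in lists of dictionaries so the logic runs anywhere, and the function names and sample columns (`id`, `status`) are hypothetical.

```python
from collections import Counter


def null_check(rows, column):
    """Count rows where the column is missing or None."""
    return sum(1 for row in rows if row.get(column) is None)


def duplicate_keys(rows, key_columns):
    """Return the key tuples that appear more than once."""
    counts = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def unaccepted_values(rows, column, accepted):
    """Return distinct non-null values that fall outside the accepted set."""
    return sorted(
        {
            row[column]
            for row in rows
            if row.get(column) is not None and row[column] not in accepted
        }
    )


def source_to_target(source_rows, target_rows, key_columns):
    """Compare row counts and list source keys that never reached the target."""
    source_keys = {tuple(r.get(c) for c in key_columns) for r in source_rows}
    target_keys = {tuple(r.get(c) for c in key_columns) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(source_keys - target_keys),
    }


# Sample data for illustration only.
source = [{"id": 1, "status": "active"}, {"id": 2, "status": "closed"}]
target = [{"id": 1, "status": "active"}, {"id": 1, "status": "pending"}]
print(duplicate_keys(target, ["id"]))
print(unaccepted_values(target, "status", {"active", "closed"}))
print(source_to_target(source, target, ["id"]))
```

Matching row counts alone can hide a dropped row that is offset by a duplicate, as in the sample data, which is why the sketch reports missing keys and duplicate keys separately.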
Outcomes
- Prepared curated S3 and Redshift datasets for reporting, governance, analytics, and Amazon SageMaker model workflows
- Improved visibility into Glue, S3, Athena, Redshift, and security-related services through CloudWatch and CloudTrail
- Enabled internal users and downstream teams to consume trusted, governed datasets
- Reduced manual data refresh effort through scheduled Python batch jobs
- Documented data ownership, definitions, and access needs with application developers and product owners
Lessons Learned
Enterprise data platforms need the same engineering discipline as application platforms: reusable infrastructure, validation, access governance, monitoring, and clear ownership documentation.