Open to opportunities

Trushna Patel

@trushnapatel

Data engineer and Python developer specializing in scalable ETL solutions.

United States

What I'm looking for

I am looking for a hands-on data engineering role centered on scalable ETL and cloud-native architectures on AWS, with a strong focus on data quality and automation. I work best in collaborative Agile teams and want opportunities to optimize performance and cost while enabling analytics and ML workflows.

I am a data engineer and Python developer with 7+ years of experience building and maintaining ETL pipelines, data warehouses, and production data platforms. I hold an M.S. in Information System Technology and have a strong foundation in big data analytics and cloud computing.

My work emphasizes performance and cost optimization: I architected PySpark pipelines and Delta Lake solutions that improved processing times and reduced compute costs while enforcing data quality and PHI masking for regulated healthcare datasets.

I have built serverless and orchestration solutions on AWS (Lambda, Step Functions, S3, EMR, Glue) and automated infrastructure and data workflows with Boto3, CI/CD practices, and unit testing. I also engineered an AI-powered meeting-summary automation using Google Gemini and Python, reducing manual documentation time by 70%.

I bring experience across real-time streaming (Kafka, Kinesis), data migration, BI/dashboarding, and data engineering tooling. I seek roles where I can drive scalable, secure, and cost-efficient data solutions that support analytics and product teams.

Experience

Work history, roles, and key accomplishments

Current

Sr Technical Data Analyst

hmetrix

May 2025 - Present (1 year 3 months)

Architected and optimized high-volume PySpark ETL/ELT pipelines for healthcare claims and patient records, reducing processing time by 40% and cluster costs by 25% while achieving 99.9% data quality for regulatory reporting.

PySpark, Delta Lake, Data Engineering, ETL, Data Quality, PHI Masking, Python, AWS

AWS Data Engineer / Python Developer

Oracle

Apr 2021 - Apr 2024 (3 years)

Developed and enhanced Python and Spark ETL pipelines and AWS serverless workflows, reducing processing time by 35% and optimizing Spark workloads to cut query times from 1.5 hours to 37 minutes.

Python, Apache Spark, AWS Lambda, Step Functions, S3, DynamoDB, ETL

AWS Data Engineer

Oracle

Apr 2021 - Apr 2024 (3 years)

Developed and enhanced Python and Spark ETL pipelines, reducing processing time by 35%, and optimized Spark workloads to cut average query times from 1.5 hours to 37 minutes. Built Lambda and Step Functions orchestrations for SQS-driven workflows and automated resource management with Boto3.

Amazon SQS, Python, Apache Spark, AWS Lambda, Step Functions, S3, DynamoDB, ETL

Python / AWS Data Engineer

TCS

Oct 2019 - Apr 2021 (1 year 6 months)

Led development of Spark-based ETL pipelines into Redshift and implemented real-time streaming with Kafka and Spark Streaming to process up to 20M records, writing Parquet outputs to S3 and scheduling EMR jobs for daily processing.

Apache Spark, Kafka, Redshift, S3, Python, Data Ingestion

Data Engineer

Amar Technology

Jun 2014 - Oct 2016 (2 years 4 months)

Assisted in maintaining and optimizing ETL pipelines with Spark and Python, developed Tableau dashboards, and deployed ETL jobs on AWS Glue to improve transformation performance.

Apache Spark, Python, AWS Glue, Tableau, ETL, PySpark, Data Processing, Parquet