Open to opportunities

Shrikant Pandey

@shrikantpandey

Senior Data Engineer optimizing big-data pipelines for analytics and cost.

India

What I'm looking for

I’m looking for a data engineering role where I can build and optimize cloud ETL/ELT pipelines, reduce runtime and cost, and integrate LLM-powered workflows, while collaborating with teams and mentoring engineers to deliver reliable analytics at scale.

I’m a Senior Data Engineer with 7+ years of experience designing, developing, and optimizing big data pipelines, ETL/ELT workflows, and cloud-based data platforms. I build reliable data pipelines that power large-scale analytics and business intelligence, with a strong focus on performance tuning and cost optimization.

In my recent roles, I’ve led end-to-end ingestion and transformation systems across financial, healthcare, supply chain, and pharmaceutical domains. I built a Claude-powered LLM pipeline that extracted stock market news from 1,000+ daily images, structured the output into CSV, and loaded it into MySQL. I also designed scalable Databricks and AWS Glue ETL pipelines to ingest 50M+ records daily into Snowflake and Redshift.

I’m known for measurable impact: I cut PySpark job runtime by 60%, improved the reliability of batch processes, and reduced Snowflake costs by 70% through query tuning, data modeling best practices, and warehouse optimization. I combine strong data engineering fundamentals (data modeling, warehousing, and CI/CD) with hands-on cloud execution (S3, EMR, Lambda, Glue, Athena, DynamoDB) and AI/ML integration (LLM integration, generative AI, prompt engineering). I’ve also led and mentored a small engineering team to deliver critical milestones.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Amtex System

Nov 2023 - Present (2 years 9 months)

Built an end-to-end ETL pipeline that extracts stock market news from 1,000+ daily images using Claude and loads the results into MySQL. Designed Databricks/AWS Glue pipelines ingesting 50M+ records daily into Snowflake and Redshift. Cut PySpark runtime by 60% and reduced Snowflake costs by 70%.

Python, SQL, PySpark, Databricks, AWS Glue, Snowflake, AWS Redshift, ETL, Performance Optimization

Data Engineer

Brillio

Aug 2022 - Oct 2023 (1 year 2 months)

Developed a healthcare analytics platform with automated integration of structured and semi-structured data using AWS Glue, S3, and RDS MySQL. Built PySpark ETL pipelines on AWS Glue to process JSON/CSV/Parquet into Bronze/Silver/Gold layers, improving data quality by 90%, and delivered curated datasets to AWS Redshift to accelerate reporting.

PySpark, Python, AWS Glue, S3, AWS RDS, Data Modeling, ETL, ELT, AWS Redshift, Data Quality, CI/CD

Data Engineer

MothersonSumi Infotech & Designs Ltd.

Jun 2021 - Jul 2022 (1 year 1 month)

Built a high-throughput supply chain analytics pipeline on AWS Glue processing 10M+ records daily. Improved Apache Spark performance by 20% on a 5 TB dataset (12B+ records) and created reusable Python automation scripts that eliminated 95% of manual effort.

Python, SQL, AWS Glue, Apache Spark, PySpark, Performance Optimization, Data Pipelines, Automation, S3, Data Warehouse

Data Engineer

Syngene International Limited

Jun 2019 - Jun 2021 (2 years)

Designed and automated an ETL pipeline on AWS Glue to ingest gene and molecular data into AWS S3 and SQL Server for analytics. Managed ingestion of 50 GB+ per day (10M+ records) while maintaining scalability, data quality, and reliability.