Open to opportunities

Vijay Shankar

@vijayshankar1

I’m a Data Engineer building scalable batch/real-time pipelines across Azure and GCP.

India

What I'm looking for

I want to build reliable batch and real-time data platforms on Azure/GCP, using CDC, lakehouse patterns, and strong governance. I’m excited about fraud/ML-ready pipelines, automated CI/CD, and performance tuning that delivers measurable latency and runtime improvements.

I’m a Big Data Engineer with 5+ years of experience across fraud detection, insurance, and financial services domains. I build and optimize batch and real-time data pipelines at scale, with hands-on expertise in streaming architectures, CDC-based ingestion, and Medallion Lakehouse design. I take ownership across the full data engineering lifecycle—from raw ingestion and transformation through performance tuning, data quality enforcement, and CI/CD deployment across GCP and Azure cloud environments.

In my current Stripe-focused Real-Time Fraud Detection work at Infosys, I engineered high-throughput enterprise CDC pipelines using Debezium into a Delta Lake Medallion architecture on GCS, reducing fraud audit query latency by 40% with Z-Order clustering. I also delivered a high-availability Lambda architecture on GCP Dataflow consuming production events from Kafka and Pub/Sub, streamlined feature delivery into BigQuery, and operationalized batch AI/ML feature ingestion via Vertex AI Pipelines. I’ve enforced enterprise governance using Google Cloud Dataplex, built CI/CD schema-contract validation to prevent breaking changes from reaching production, and provisioned secure multi-environment infrastructure with Terraform, VPC Service Controls, and CMEK via Cloud KMS. I also optimized autoscaling and Spark execution to decrease overnight batch runtimes by 35%.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer

Infosys

Feb 2024 - Present (2 years 5 months)

Engineered high-throughput CDC ingestion using Debezium into a GCS-hosted Delta Lake Medallion architecture, reducing fraud audit query latency by 40% via Z-Order clustering. Built real-time fraud metrics with GCP Dataflow consuming Kafka and Pub/Sub events, streamlined feature delivery to BigQuery for downstream model training, and automated ingestion for an LLM-assisted Fraud Operations Engine.

Apache Spark, PySpark, Debezium, Delta Lake, Google Cloud Dataflow, Apache Kafka, BigQuery, Terraform, Airflow, Dataplex

Data Engineer

Wipro

Dec 2020 - Jan 2024 (3 years 1 month)

Built Azure Databricks PySpark ingestion pipelines for an insurance claims platform, reducing CRM/policy replication latency from 4 hours to under 30 minutes using Fivetran and achieving sub-5-minute end-to-end latency with Event Hub streaming. Delivered regulatory-ready historical tracking with Delta SCD Type 2 (7 years), optimized Spark jobs to cut average runtimes by 30%, and implemented Hadoop