TARUN KALVA
@tarunkalva
Senior Data Engineer specializing in cloud-native, big data, and LLM-powered analytics solutions.
What I'm looking for
I am a Senior Data Engineer with 9+ years of experience building scalable cloud-native and big-data platforms across AWS, Azure, and GCP, specializing in ETL/ELT, Snowflake, Databricks, and PySpark. I design high-performance data warehouses, optimize large Spark workloads, and deliver self-service BI with Tableau, Power BI, and Looker.
I build production-ready ML/AI data pipelines and LLM-powered services using LangChain, Hugging Face, OpenAI, Vertex AI, and SageMaker, and I integrate vector stores (Pinecone, Weaviate, FAISS) into RAG workflows. I also implement data governance, RBAC, and compliance controls (SOX, HIPAA, NIST), and automate infrastructure checks with Terraform and Python.
I have led large migrations from Oracle, Teradata, and on-prem systems into cloud-first platforms (Snowflake, Synapse) and modernized legacy stacks using Airflow, Glue, ADF, Dataiku, and Databricks, enabling real-time analytics, cost optimization, and reliable data products for finance and healthcare stakeholders.
Experience
Senior Data Engineer
Capital One
Jan 2024 - Present (1 year 9 months)
Designed and implemented Foundry-based production data pipelines and Snowflake data warehouses for risk analytics, reducing Spark job runtimes by 40% and Tableau refresh times by 35%, and enabled LLM-powered query microservices for analysts.
Azure Data Engineer
UnitedHealthcare
Aug 2021 - Dec 2023 (2 years 4 months)
Architected and implemented large-scale Azure data platform solutions (ADF, Databricks, Synapse) for healthcare analytics, improving data accuracy by 20% and reducing Snowflake compute costs by 40% through query and clustering optimizations.
Data Engineer
Marathon Petroleum
Apr 2018 - Jul 2021 (3 years 3 months)
Led migrations from Oracle and Teradata to AWS S3 and Snowflake, and built PySpark/AWS Glue ETL pipelines processing IoT and transactional data, reducing EMR costs by 20% and enabling near-real-time analytics.
Software Developer
Grantley
Sep 2015 - Dec 2017 (2 years 3 months)
Designed and automated cross-cloud ETL pipelines (AWS/GCP) using Airflow, Glue, and PySpark; migrated on-prem workloads to the cloud; and implemented CI/CD, reducing manual intervention by 50%.
Education
TARUN hasn't added their education
Tech stack
Matillion
Azure Synapse
Apache Spark
AWS Glue
Talend
Bokeh
Microsoft Azure
Amazon S3
GitHub
Kubernetes
AWS CodePipeline
Jenkins
Salesforce
NumPy
Pandas
PySpark
Dataiku
dbt
DB
Sqoop
MySQL
PostgreSQL
MongoDB
SQLite
Cassandra
Hadoop
HBase
Gmail
Node.js
Yarn
Google Analytics
Databricks
Terraform
AWS CloudFormation
Jira
Java
JSON
Perl
PowerShell
XML
TensorFlow
scikit-learn
Kafka
RabbitMQ
Apache NiFi
FastAPI
OpenTelemetry
OpenTracing
Azure Active Directory
GraphQL
ws
OpenSearch
Avro
AWS Lambda
Azure SQL Database
pytest
JUnit
TestNG
OAuth2
Airflow
Time Analytics
Root Cause
erwin Data Modeler
Luigi
SQL
XGBoost
Hugging Face
LangChain
Weaviate
Foundry
Pinecone
Delta Lake
Great Expectations
Trino
GitHub Copilot
Dynatrace
Cosmos
