Karan Jung Karki
@karanjungkarki
Senior Data Engineer specializing in scalable ETL, cloud data platforms, and real-time analytics.
What I'm looking for
I am a Senior Data Engineer with 7+ years of experience designing and delivering scalable, high-performance data solutions across banking, e-commerce, regulatory compliance, and operational risk. I build automated, fault-tolerant ETL workflows using Spark, Python, and SQL, optimize performance for petabyte-scale workloads, and implement secure, governed data architectures on AWS, Azure, and GCP.
I lead cross-functional teams to deliver end-to-end data products, from ingestion through analytics and BI, and have reduced processing times by up to 40% through Spark optimization and pipeline automation. I prioritize data quality, security, and compliance while enabling real-time ML and analytics via containerization, orchestration, and CI/CD pipelines.
Experience
Work history, roles, and key accomplishments
Architected and optimized petabyte-scale ETL pipelines using PySpark and Airflow, reducing batch processing time by 40% and improving financial query responsiveness for data science and reporting teams. Led cross-functional teams to deliver a HIPAA-aligned enterprise data lake and implemented Delta Lake with Snowflake for ACID-compliant real-time and batch processing.
Data Engineer II
Capital One
May 2020 - Aug 2022 (2 years 3 months)
Led migration of on-prem data warehouses to an AWS data lake, reducing processing time and costs, and implemented CI/CD pipelines to streamline deployments and reduce manual errors. Built scalable ETL pipelines with PySpark, EMR, Lambda, and Kafka to process millions of records daily and enabled real-time analytics via Spark Streaming.
Data Engineer
Signify Health
Jan 2019 - Apr 2020 (1 year 3 months)
Modernized data warehouse by building HDFS/Hive archival and ingestion pipelines, improving storage efficiency and read performance, and implemented validation and lineage tracking to ensure data quality for healthcare reporting. Migrated legacy environments and developed PySpark apps and Sqoop-based ingestion to support patient analytics and operational dashboards.
Education
Degrees, certifications, and relevant coursework
University of North Texas
Bachelor's in Business Computer Information Systems
Completed a Bachelor's degree in Business Computer Information Systems focusing on information systems, data management, and business applications.
Tech stack
Software and tools used professionally
Azure Synapse
Apache Spark
GitHub
GitLab
Kubernetes
Jenkins
GitHub Actions
GitLab CI
NumPy
Pandas
PySpark
Sqoop
PostgreSQL
MongoDB
Cassandra
Hadoop
Gmail
Django
Spring Boot
Databricks
Terraform
AWS CloudFormation
Azure DevOps
Jira
JavaScript
Java
JSON
XML
TensorFlow
PyTorch
scikit-learn
Kafka
Azure Service Bus
FastAPI
Linux
pytest
Airflow
Apache Oozie
s3-lambda
erwin Data Modeler
SQL
LangChain
Delta Lake
