Suman Kafle
@sumankafle1
Senior Data Engineer designing scalable data platforms and real-time pipelines across healthcare and finance.
What I'm looking for
I’m a Senior Data Engineer with 8+ years of experience designing scalable data platforms and real-time data pipelines across healthcare and financial domains. I lead end-to-end architecture and delivery, integrating complex source systems so teams can move faster with reliable, analytics-ready data.
At Pfizer, I’ve built Azure-based platforms on Databricks, Delta Lake, and Snowflake, and implemented scalable ELT workflows with PySpark, SQL, and dbt. I architected a Medallion Architecture (Bronze/Silver/Gold), engineered high-performance Spark pipelines, and optimized star-schema dimensional models for analytics and self-service reporting.
I also deliver batch and streaming solutions using Azure Data Factory, Apache Airflow, Kafka, and Azure Stream Analytics, while maintaining strong data governance, security, and reliability (RBAC, encryption, Azure Active Directory, HIPAA-compliant handling). Earlier roles at Discover, NCR, and Bank of America broadened my experience across AWS ingestion, orchestration, warehouse management, and legacy ETL/data warehouse systems.
Experience
Work history, roles, and key accomplishments
Owned the architecture and delivery of an Azure-based data platform using Databricks, Delta Lake, and Snowflake to enable scalable analytics for healthcare and workforce operational data. Built batch and streaming ELT pipelines with dbt and Medallion Architecture, improving data quality, lineage, and pipeline reliability while enabling near real-time ingestion and ML-ready datasets.
Data Engineer
Discover Financial Services
Jul 2020 - Feb 2023 (2 years 7 months)
Built and maintained AWS-based data ingestion pipelines for financial transaction and account data to support enterprise analytics and downstream applications. Developed batch and near real-time pipelines using Glue/EMR and Kinesis/Kafka, managed Snowflake and Redshift warehouses with star schema modeling, and improved pipeline reliability using Airflow and monitoring/alerting.
Developed and optimized big data processing solutions using Hadoop and Spark to support financial transaction datasets with accuracy and consistency. Engineered ETL workflows with Informatica PowerCenter and IBM DataStage, supported on-prem data warehouse solutions (Teradata/Oracle Exadata), and enhanced real-time integration using Kafka while improving reporting through legacy BI tools.
Built Python-based backend services for internal banking systems, including data processing and operational reporting workflows across teams. Optimized SQL for large transactional datasets, automated ETL tasks with Python, and supported CI/CD with Jenkins and Git while debugging production issues and improving monitoring and data validation.
Education
Degrees, certifications, and relevant coursework
University of New Mexico
Bachelor of Computer Science, Computer Science
Bachelor of Computer Science with a minor in Mathematics from the University of New Mexico.
Tech stack
Software and tools used professionally
Amazon Redshift
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
AWS IAM
GitHub
Kubernetes
Jenkins
GitHub Actions
PySpark
dbt
PostgreSQL
Microsoft SQL Server
Hadoop
Gmail
Databricks
Terraform
Java
AWS CloudTrail
Kafka
Azure Monitor
Linux
Azure Active Directory
AppDynamics
Airflow
SQL
AWS KMS
Delta Lake
Dynatrace
Azure Data Factory
MicroStrategy
