Open to opportunities

Suman Kafle

@sumankafle1

Senior Data Engineer designing scalable data platforms and real-time pipelines across healthcare and finance.

United States

What I'm looking for

I’m looking for a senior data engineering role where I can own end-to-end data platform architecture, build batch and streaming pipelines, and strengthen governance, reliability, and cost efficiency while partnering with product and analytics teams.

I’m a Senior Data Engineer with 8+ years of experience designing scalable data platforms and real-time data pipelines across healthcare and financial domains. I lead end-to-end architecture and delivery, integrating complex source systems so teams can move faster with reliable, analytics-ready data.

At Pfizer, I’ve built Azure-based platforms on Databricks, Delta Lake, and Snowflake, and implemented scalable ELT workflows with PySpark, SQL, and dbt. I designed a Medallion Architecture (Bronze/Silver/Gold), engineered high-performance Spark pipelines, and optimized star-schema dimensional models for analytics and self-service reporting.

I also deliver batch and streaming solutions using Azure Data Factory, Apache Airflow, Kafka, and Azure Stream Analytics, while maintaining strong data governance, security, and reliability (RBAC, encryption, Azure Active Directory, HIPAA-compliant handling). Earlier roles at Discover, NCR, and Bank of America broadened my experience across AWS ingestion, orchestration, warehouse management, and legacy ETL/data warehouse systems.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Pfizer

Mar 2023 - Present (3 years 4 months)

Owned the architecture and delivery of an Azure-based data platform using Databricks, Delta Lake, and Snowflake to enable scalable analytics for healthcare and workforce operational data. Built batch and streaming ELT pipelines with dbt on a Medallion Architecture, improving data quality, lineage, and pipeline reliability while enabling near real-time ingestion and ML-ready datasets.

Databricks, Delta Lake, Snowflake, dbt, PySpark, Azure Data Factory, Apache Airflow, Kafka, Azure Stream Analytics, Terraform

Data Engineer

Discover Financial Services

Jul 2020 - Feb 2023 (2 years 7 months)

Built and maintained AWS-based data ingestion pipelines for financial transaction and account data to support enterprise analytics and downstream applications. Developed batch and near real-time pipelines using Glue/EMR and Kinesis/Kafka, managed Snowflake and Redshift warehouses with star schema modeling, and improved pipeline reliability using Airflow and monitoring/alerting.

AWS Glue, Amazon EMR, PySpark, Snowflake, Amazon Redshift, Apache Airflow, AWS Kinesis, Kafka

Big Data Developer

NCR Corporation

Aug 2019 - Jun 2020 (10 months)

Developed and optimized big data processing solutions using Hadoop and Spark to support financial transaction datasets with accuracy and consistency. Engineered ETL workflows with Informatica PowerCenter and IBM DataStage, supported on-prem data warehouse solutions (Teradata/Oracle Exadata), and enhanced real-time integration using Kafka while improving reporting through legacy BI tools.

Apache Hadoop, Apache Spark, Informatica, IBM DataStage, Oracle Database, Teradata, Exadata, Apache Kafka

Software Engineer

Bank of America

Nov 2017 - Jun 2019 (1 year 7 months)

Built Python-based backend services for internal banking systems, including data processing and operational reporting workflows across teams. Optimized SQL for large transactional datasets, automated ETL tasks with Python, and supported CI/CD with Jenkins and Git while debugging production issues and improving monitoring and data validation.