Open to opportunities

Jay Bergen

@jaybergen

Senior Data Engineer specializing in machine learning and data pipelines.

United States

What I'm looking for

I am looking for a role that challenges me technically and allows for growth in machine learning and data engineering.

With over 9 years of experience in data engineering and machine learning, I excel at transforming messy data into actionable insights. My expertise lies in building robust data pipelines and real-time systems that enhance operational efficiency across various industries, including healthcare and fintech.

At CitiusTech, I designed and optimized terabyte-scale data pipelines, developed LLM-powered document parsing systems, and engineered fraud detection models that prevented millions in fraudulent payouts. I work with a wide range of tools and platforms, including Apache Spark, Kafka, and AWS, to deliver reliable, high-quality data solutions.

I am passionate about mentoring junior engineers and collaborating with cross-functional teams to implement AI-powered solutions. I thrive in environments where data integrity and reliability are paramount, and I am committed to ensuring that data works effectively for all stakeholders.

Experience

Work history, roles, and key accomplishments

Senior Data Engineer (ML focus)

CitiusTech

May 2021 - Jun 2025 (4 years 1 month)

Designed, built, and optimized terabyte-scale data pipelines using Databricks, Apache Spark, and Azure Data Factory, ensuring seamless ingestion, transformation, and storage of structured and unstructured healthcare data. Developed and deployed LLM-powered document parsing systems leveraging OCR, deep learning models, and Graph Convolutional Networks (GCNs), improving data extraction accuracy by 4

Python SQL Databricks Apache Spark Apache Kafka Azure Data Factory Azure Synapse Snowflake TensorFlow PyTorch Power BI Looker MLflow CI/CD Pipelines HIPAA

Data Engineer (ML focus)

Sift

May 2019 - Apr 2021 (1 year 11 months)

Designed and implemented real-time fraud detection pipelines using Azure Synapse, Apache Spark, and Kafka, analyzing millions of transactions daily and reducing fraudulent activity by 30%. Developed high-performance ETL workflows in PySpark and SQL, increasing data processing efficiency by 50% for e-commerce datasets.

Azure Synapse Apache Spark Kafka SQL Server Python DBT Power BI REST APIs Airflow Azure Monitor GitHub Actions Bitbucket Jenkins MLflow Great Expectations

Data Engineer

Rivery

Nov 2017 - Apr 2019 (1 year 5 months)

Developed financial ETL pipelines using Azure Data Factory, Python, and SQL, ensuring seamless aggregation of transactional data from multiple banking systems. Built and deployed fraud detection models using Isolation Forests and One-Class SVM, successfully reducing financial fraud risks.

Azure Data Factory SQL Python Snowflake Power BI Hadoop Spark Isolation Forest REST APIs Azure Monitor

Junior Data Engineer

Fivetran

Feb 2017 - Oct 2017 (8 months)

Assisted in building Azure-based data ingestion pipelines, supporting large-scale ML projects. Developed ETL scripts for data normalization, improving query performance.

SQL Python Azure Data Factory Power BI Git data preprocessing

Education

Degrees, certifications, and relevant coursework

National University of Singapore

Master's degree, Information Science

2015 - 2016

Completed a Master's degree in Information Science at the National University of Singapore, deepening expertise in advanced topics and research methodologies.