Jay Bergen
@jaybergen
Senior Data Engineer specializing in machine learning and data pipelines.
What I'm looking for
With over 9 years of experience in data engineering and machine learning, I excel at transforming messy data into actionable insights. My expertise lies in building robust data pipelines and real-time systems that enhance operational efficiency across various industries, including healthcare and fintech.
At CitiusTech, I designed and optimized terabyte-scale data pipelines, developed LLM-powered document parsing systems, and engineered fraud detection models that prevented millions in fraudulent payouts. My technical skills span tools and platforms including Apache Spark, Kafka, and AWS, which I use to deliver reliable, high-quality data solutions.
I am passionate about mentoring junior engineers and collaborating with cross-functional teams to implement AI-powered solutions. I thrive in environments where data integrity and reliability are paramount, and I am committed to ensuring that data works effectively for all stakeholders.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer (ML focus)
CitiusTech
May 2021 - Jun 2025 (4 years 1 month)
Designed, built, and optimized terabyte-scale data pipelines using Databricks, Apache Spark, and Azure Data Factory, ensuring seamless ingestion, transformation, and storage of structured and unstructured healthcare data. Developed and deployed LLM-powered document parsing systems leveraging OCR, deep learning models, and Graph Convolutional Networks (GCNs), improving data extraction accuracy by 4
Data Engineer (ML focus)
Sift
May 2019 - Apr 2021 (1 year 11 months)
Designed and implemented real-time fraud detection pipelines using Azure Synapse, Apache Spark, and Kafka, analyzing millions of transactions daily and reducing fraudulent activity by 30%. Developed high-performance ETL workflows in PySpark and SQL, increasing data processing efficiency by 50% for e-commerce datasets.
Data Engineer
Rivery
Nov 2017 - Apr 2019 (1 year 5 months)
Developed financial ETL pipelines using Azure Data Factory, Python, and SQL, ensuring seamless aggregation of transactional data from multiple banking systems. Built and deployed fraud detection models using Isolation Forests and One-Class SVM, successfully reducing financial fraud risks.
Junior Data Engineer
Fivetran
Feb 2017 - Oct 2017 (8 months)
Assisted in building Azure-based data ingestion pipelines, supporting large-scale ML projects. Developed ETL scripts for data normalization, improving query performance.
Education
Degrees, certifications, and relevant coursework
National University of Singapore
Master's degree, Information Science
2015 - 2016
Completed a Master's degree in Information Science at the National University of Singapore, deepening expertise in advanced topics and research methodologies.
National University of Singapore
Bachelor of Science, Information Science
2011 - 2015
Studied Information Science at the National University of Singapore, focusing on foundational concepts and applications within the field.
Tech stack
Software and tools used professionally
Fivetran
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
AtScale
GitHub
Bitbucket
Azure Repos
Kubernetes
Jenkins
GitHub Actions
Salesforce
PySpark
Debezium
dbt
DB
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
Gmail
Databricks
Zendesk
Redis
Terraform
Java
TensorFlow
PyTorch
MLflow
scikit-learn
HubSpot
Kafka
Apache NiFi
Grafana
Prometheus
OpenTelemetry
Azure Monitor
Windows
Datadog
ClickUp
GraphQL
Google Cloud Pub/Sub
pytest
OAuth2
Airflow
Apache Beam
SQL
Dagster
