Derek Dai
@derekdai
Senior Data Analytics Engineer building scalable ETL/ELT and real-time analytics on AWS.
What I'm looking for
I’m a Senior Data Analytics Engineer with 10+ years of experience building scalable ETL/ELT pipelines, real-time analytics platforms, and cloud-based data solutions. I specialize in Python, SQL, PySpark, and streaming architectures, with a strong focus on data modeling, performance optimization, and data governance.
At Whatnot, I designed and scaled end-to-end analytics pipelines for the live-commerce marketplace using Apache Kafka, dbt, Snowflake, and AWS. I built high-performance ELT workflows and dimensional data models that reduced dashboard latency by 60% and improved data reliability for Product, Growth, and Executive teams. I also developed a real-time streaming analytics architecture with Kafka Streams, Spark Structured Streaming, and AWS S3 that reduced incident detection time by 45% and improved operational monitoring.
At Amazon, I delivered scalable ETL/ELT pipelines using Python, PySpark, AWS Glue, Apache Airflow, and Amazon S3—enabling near real-time analytics and reducing manual reporting efforts by 70%. I optimized large-scale SQL workloads in Amazon Redshift, Athena, and Hive to improve dashboard response times by 45%. I also engineered streaming analytics pipelines with Amazon Kinesis, Kafka, Spark Streaming, and AWS Lambda to improve delivery accuracy and reduce incident response time by 40%, while strengthening reliability through automated validation and Git-based CI/CD workflows.
Earlier roles strengthened my breadth: I automated market-data pipelines and reporting workflows, and built centralized analytics marts and dashboards that improved operational decision-making. Across my career, I’ve created trusted semantic layers and self-service analytics by standardizing KPIs and taxonomies, implementing robust data quality frameworks, and optimizing compute costs (including a 30% reduction in Snowflake compute costs). I bring an engineering mindset to analytics so that teams can move faster with governed, trustworthy data that powers product and ML outcomes.
Experience
Work history, roles, and key accomplishments
Designed and scaled end-to-end analytics pipelines for Whatnot’s live-commerce marketplace using Kafka, dbt, Snowflake, and AWS to deliver near real-time engagement and GMV visibility. Reduced dashboard latency by 60% and improved incident detection time by 45% while strengthening data reliability and governance for product and executive teams.
Built scalable ETL/ELT pipelines with Python, PySpark, AWS Glue, Airflow, and S3 to enable near real-time analytics and reduce manual reporting by 70% for operations and business teams. Improved dashboard response times by 45% and reduced incident response time by 40% through Kinesis/Lambda streaming analytics and reliability optimizations across AWS services.
Data Engineer
Clutter
Feb 2020 - Aug 2022 (2 years 6 months)
Developed scalable ETL pipelines with Python, PySpark, Airflow, S3, Glue, and Redshift to process logistics, warehouse, and customer data while improving reliability and reducing manual reporting effort. Implemented near real-time streaming with Kafka/Kinesis and automated CI/CD data-quality workflows (Terraform, Docker, Kubernetes, Jenkins) to reduce pipeline failures and improve stability during
Data Analyst
Wag Labs
Jun 2018 - Jan 2020 (1 year 7 months)
Built marketplace dashboards in SQL, Python, Redshift, S3, and Tableau to track bookings, cancellations, utilization, and regional demand. Analyzed customer and product behavior with Mixpanel, Firebase, Amplitude, Looker, and Optimizely to improve onboarding funnels, retention, and A/B test outcomes.
Data Acquisition Services Analyst
Nisa Investment Advisors, LLC
Apr 2017 - May 2018 (1 year 1 month)
Automated market-data pipelines with SQL, Python, SSIS, Bloomberg API, and Refinitiv to improve pricing accuracy and reduce overnight processing failures by 35%. Built reporting and reconciliation workflows using Tableau, Excel VBA, SSRS, Control-M, and SQL Server to reduce manual work and improve SLA performance.
Consultant
Revenue Solutions
Jun 2014 - Mar 2017 (2 years 9 months)
Built tax compliance dashboards with Oracle, SQL Server, Tableau, Informatica, and Cognos to automate reporting, improve audit visibility, and reduce manual effort for revenue agencies. Supported legacy tax system migration by developing ETL pipelines and validating financial data using Python, PL/SQL, SSIS, Hadoop, AWS EC2/S3, and JIRA.
Education
Degrees, certifications, and relevant coursework
Washington University in St. Louis
Bachelor of Computer Science, Computer Science
Earned a Bachelor of Computer Science degree at Washington University in St. Louis.
Tech stack
Software and tools used professionally
Amazon Redshift
Apache Spark
AWS Glue
Amazon QuickSight
Amazon S3
GitHub
Kubernetes
Jenkins
GitHub Actions
NumPy
Pandas
PySpark
dbt
PostgreSQL
Hadoop
Gmail
Google Analytics
Mixpanel
Terraform
Jira
Kafka
Datadog
Amazon Kinesis
Firebase
AWS Lambda
Kafka Streams
Airflow
Time Analytics
Optimizely
Amazon Athena
SQL
Great Expectations
Increase
