Samip Subedi
@samipsubedi1
Senior Data Engineer with expertise in cloud-based data solutions.
What I'm looking for
I am a Senior Data Engineer with over 6 years of experience in designing, implementing, and managing cloud-based data architectures and ETL pipelines. My expertise lies in refactoring legacy workflows using Python and building scalable data pipelines with technologies like Apache Spark and Databricks. I have a strong background in cloud and hybrid architectures, particularly with AWS and Azure, where I have successfully migrated and integrated complex data engineering workflows.
Throughout my career, I have developed ETL/ELT solutions using AWS Glue, Azure Data Factory, and other tools, optimizing data ingestion and transformation processes. I am adept at implementing data governance and security measures, ensuring compliance with regulations like HIPAA and GDPR. My collaborative work with data scientists has led to improved clinical outcome predictions and enhanced data analytics capabilities across various domains.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
Johnson & Johnson
Jan 2023 - Present (2 years 7 months)
Designed scalable ETL pipelines with PySpark, Databricks, and Google Cloud Dataflow that process over 5 TB of healthcare data daily in a hybrid cloud environment. Automated infrastructure deployment using Terraform, CloudFormation, and GitLab CI/CD, reducing manual provisioning effort by 70%.
Data Engineer
BofA
May 2020 - Present (5 years 3 months)
Designed scalable ETL pipelines using Azure Data Factory, Matillion, and Apache Airflow, automating ingestion from RDBMS and APIs into Azure Data Lake. Built modular data processing workflows using Databricks Notebooks, applying PySpark to transform customer and policyholder data across 10+ business domains.
Data Engineer
Charles Schwab
Jul 2018 - Present (7 years 1 month)
Built and optimized ETL pipelines using AWS Glue, Informatica PowerCenter, and Cleo Integration Cloud to integrate third-party logistics and supplier feeds into enterprise data lakes. Engineered Hadoop-based pipelines using Apache Pig, MapReduce, and Hive, improving product catalog ingestion speed by 30%.
Education
Degrees, certifications, and relevant coursework
Houston Christian University
Master of Business Administration, Data Analytics
Focused on Data Analytics, gaining expertise in advanced analytical techniques and their application in business contexts. Developed skills in data-driven decision-making and strategic business intelligence.
Tech stack
Software and tools used professionally
Matillion
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
Talend
AWS Step Functions
GitHub
GitLab
Kubernetes
Jenkins
GitLab CI
NumPy
Pandas
PySpark
dbt
Sqoop
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
HBase
Gmail
Databricks
Terraform
Azure DevOps
Jira
Java
JSON
XML
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
Grafana
Kibana
OpenTelemetry
Azure Monitor
Google Cloud Dataflow
Elasticsearch
Avro
AWS Lambda
Airflow
s3-lambda
SQL
Hugging Face
