Shikha Sharma
@shikhasharma4
Data engineer and analyst building scalable cloud data platforms, lakehouses, and GenAI-enabled analytics.
What I'm looking for
I’m a Data Engineer/Data Analyst with 6+ years of experience building scalable batch and real-time data platforms across healthcare, fintech, and retail. I design high-performance data lakehouse and data warehouse architectures that process multi-terabyte datasets end to end, from ingestion through governance to analytics.
I specialize in AWS and Azure ecosystems, Apache Spark (PySpark, Spark SQL), Databricks, Snowflake, Delta Lake, and Kafka, with strong data modeling foundations in dimensional modeling (Star/Snowflake), Data Vault, and medallion architecture. In HIPAA-governed environments, I implement RBAC, IAM policies, KMS encryption, row-level security, dynamic/column-level masking, and audit logging to ensure secure PHI handling.
I also enable Machine Learning and GenAI solutions using MLflow, SageMaker, Azure ML, and LLM integrations (AWS Bedrock, Hugging Face, LangChain). I build production-grade pipelines with CI/CD, Infrastructure as Code (Terraform, CloudFormation), and observability (CloudWatch, Grafana, OpenTelemetry, Elasticsearch) so teams can move faster without sacrificing reliability.
Experience
Work history, roles, and key accomplishments
Designed and implemented an AWS/Azure data platform processing 15+ TB/day of healthcare claims and provider data. Built PySpark/Databricks lakehouse pipelines (Delta Lake, Hudi) and Kafka streaming that reduced downstream data latency to under 30 minutes, and enabled HIPAA-compliant PHI governance across multi-tenant environments.
Engineered Azure-based clinical and research data pipelines using ADF/Synapse/ADLS Gen2 and optimized PySpark workloads on Databricks for HL7/JSON/XML datasets. Migrated workloads to Synapse and Delta Lake, supporting 300M+ patient records, and implemented PHI-compliant access controls and streaming ingestion for near real-time monitoring dashboards.
Data Engineer
Amount
May 2020 - Mar 2022 (1 year 10 months)
Architected batch and streaming data pipelines with PySpark and Databricks to process 2+ TB/day of lending and credit risk data. Implemented Kafka streaming and Airflow-based ELT to Snowflake, reducing pipeline latency from hours to a sub-hour SLA while enforcing ACID storage with Delta Lake and applying data validation for governance.
Analyzed retail and supply chain datasets using advanced SQL across Teradata, Oracle, and SQL Server to support merchandising and inventory planning. Built automated Power BI/Tableau dashboards and ETL workflows (Hive, Sqoop, Hadoop) to consolidate POS and logistics data, enabling SKU-level trend analysis across thousands of stores.
Education
Degrees, certifications, and relevant coursework
University of the Cumberlands
Master's degree in Business Analytics
Completed a master's program in business analytics at the University of the Cumberlands.
Tech stack
Software and tools used professionally
Amazon Redshift
Azure Synapse
Apache Spark
AWS Glue
GitHub
Jenkins
GitHub Actions
NumPy
Pandas
PySpark
dbt
Sqoop
PostgreSQL
MongoDB
Hadoop
Gmail
Databricks
Terraform
AWS CloudFormation
Azure DevOps
Jira
Java
JSON
XML
MLflow
scikit-learn
Kafka
Grafana
OpenTelemetry
Elasticsearch
pytest
Airflow
Root Cause
Amazon EMR
SQL
Amazon SageMaker
Hugging Face
LangChain
Delta Lake
Apache Hudi
GitHub Copilot
Bash
Depot
Dynamic
Column
Factory
Movement
