Open to opportunities

Shikha Sharma

@shikhasharma4

Data engineer and analyst building scalable cloud data platforms, lakehouses, and GenAI-enabled analytics.

United States

What I'm looking for

I’m looking for a team where I can build secure, scalable cloud data lakehouses and real-time pipelines, strengthen data governance (HIPAA/RBAC), and apply GenAI/ML to deliver measurable analytics with strong engineering practices.

I’m a Data Engineer/Data Analyst with 6+ years of experience building scalable batch and real-time data platforms across healthcare, fintech, and retail. I design high-performance data lakehouse and data warehouse architectures that process multi-terabyte datasets end to end, from ingestion to governance and analytics.

I specialize in AWS and Azure ecosystems, Apache Spark (PySpark, Spark SQL), Databricks, Snowflake, Delta Lake, and Kafka, with strong data modeling foundations in dimensional modeling (Star/Snowflake), Data Vault, and medallion architecture. In HIPAA-governed environments, I implement RBAC, IAM policies, KMS encryption, row-level security, dynamic/column-level masking, and audit logging to ensure secure PHI handling.

I also enable Machine Learning and GenAI solutions using MLflow, SageMaker, Azure ML, and LLM integrations (AWS Bedrock, Hugging Face, LangChain). I build production-grade pipelines with CI/CD, Infrastructure as Code (Terraform, CloudFormation), and observability (CloudWatch, Grafana, OpenTelemetry, Elasticsearch) so teams can move faster without sacrificing reliability.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer / Data Analyst

UnitedHealth Group

Oct 2024 - Present (1 year 9 months)

Designed and implemented an AWS/Azure data platform processing 15+ TB/day of healthcare claims and provider data. Built PySpark/Databricks lakehouse pipelines (Delta Lake, Hudi) and Kafka streaming that reduced downstream data latency to under 30 minutes, and enabled HIPAA-compliant PHI governance across multi-tenant environments.

Kafka, AWS, Azure, PySpark, Databricks, Delta Lake, Snowflake, Terraform, Data Governance

Data Engineer

Mayo Clinic

Apr 2022 - Sep 2024 (2 years 5 months)

Engineered Azure-based clinical and research data pipelines using ADF/Synapse/ADLS Gen2 and optimized PySpark workloads on Databricks for HL7/JSON/XML datasets. Migrated workloads to Synapse and Delta Lake, supporting 300M+ patient records, and implemented PHI-compliant access controls and streaming ingestion for near real-time monitoring dashboards.

Azure Data Factory, Azure Synapse, Azure ADLS Gen2, PySpark, Databricks, Delta Lake, Azure Event Hubs, dbt

Data Engineer

Amount

May 2020 - Mar 2022 (1 year 10 months)

Architected batch and streaming data pipelines with PySpark and Databricks to process 2+ TB of lending and credit risk data daily. Implemented Kafka streaming and Airflow-based ELT to Snowflake, reducing pipeline latency from hours to a sub-hour SLA while enforcing ACID-compliant storage with Delta Lake and applying data validation for governance.

Apache Spark, PySpark, Databricks, ELT to Snowflake, Delta Lake, Terraform, Kafka, Airflow

Data Analyst

Home Depot

Feb 2019 - Apr 2020 (1 year 2 months)

Analyzed retail and supply chain datasets using advanced SQL across Teradata, Oracle, and SQL Server to support merchandising and inventory planning. Built automated Power BI/Tableau dashboards and ETL workflows (Hive, Sqoop, Hadoop) to consolidate POS and logistics data, enabling SKU-level trend analysis across thousands of stores.

Power BI, Tableau, Hive, Sqoop, Hadoop, Teradata, Oracle