Open to opportunities

matheus aragão

@matheusaragaofs

I’m a Senior Data Engineer building scalable Lakehouse data products with Databricks, Azure, and AWS.

Brazil

What I'm looking for

I’m looking for a role where I can design governed Lakehouse architectures, optimize Spark pipelines, and deliver near-real-time analytics with strong data quality, lineage, and compliance, especially in regulated, data-sensitive environments.

I’m a Data Engineer with 5+ years of experience building scalable data pipelines and Lakehouse architectures across Databricks, Azure, and AWS. I use PySpark, Python, and SQL to deliver large-scale data transformation and reliable data products.

In my current role, I designed Lakehouse architectures using Medallion-style processing and exposed curated Gold Delta tables through Microsoft Fabric’s SQL Analytics Endpoint and OneLake, reducing data delivery latency by 35% for real-time municipal updates. I also implemented Delta Lake and Unity Catalog for governance and regulatory compliance, applying RLS, column-level permissions, and data anonymization/masking with auditability via audit logging and time travel.
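The Unity Catalog governance pattern described above can be sketched in Databricks SQL. This is a minimal illustration only; the schema, table, column, and group names are hypothetical:

```sql
-- Column mask: only the compliance group sees raw tax IDs (names are illustrative).
CREATE OR REPLACE FUNCTION gold.gov.mask_tax_id(tax_id STRING)
RETURN CASE WHEN is_account_group_member('compliance') THEN tax_id ELSE '***' END;

ALTER TABLE gold.city.residents ALTER COLUMN tax_id SET MASK gold.gov.mask_tax_id;

-- Row-level security: admins see everything; other users see only rows for
-- municipalities whose group they belong to.
CREATE OR REPLACE FUNCTION gold.gov.municipality_filter(municipality STRING)
RETURN is_account_group_member('admins')
    OR is_member(CONCAT('muni_', municipality));

ALTER TABLE gold.city.residents
  SET ROW FILTER gold.gov.municipality_filter ON (municipality);
```

Because the filter and mask are attached to the table in Unity Catalog, they apply uniformly to every consumer, including the Fabric SQL Analytics Endpoint and Power BI.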

I optimize performance through Databricks cluster tuning and Spark job improvements (including Z-Order clustering and Parquet-to-Delta migration), achieving a 40% reduction in processing time and faster downstream Power BI responses via DirectQuery. I’ve also built RAG pipelines on Delta Lake for natural-language querying and delivered analytics dashboards with star-schema semantic models and DAX time intelligence.
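The Parquet-to-Delta migration and Z-Order step mentioned above look roughly like this in Databricks SQL (the storage path and column names are assumptions for illustration):

```sql
-- Convert an existing Parquet directory to Delta in place (hypothetical path).
CONVERT TO DELTA parquet.`/mnt/lake/gold/events`;

-- Co-locate frequently filtered columns so reads can skip unrelated files.
OPTIMIZE delta.`/mnt/lake/gold/events` ZORDER BY (municipality_id, event_date);
```

Z-Ordering on the columns most often used in WHERE clauses is what lets DirectQuery workloads prune files instead of scanning the whole table.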

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

4 Smart Cloud

Oct 2023 - Present (2 years 5 months)

Designed and implemented scalable Lakehouse architectures on Databricks and Azure, enforcing medallion-layer data quality and publishing curated Gold Delta tables via Fabric/OneLake, reducing delivery latency by 35% for real-time municipal updates. Implemented Unity Catalog governance (RLS, permissions, masking) and optimized Spark performance (Z-Order, shuffle tuning, Parquet→Delta), cutting processing time by 40%.


Data Engineer

Tracking Trade

Mar 2021 - Oct 2023 (2 years 7 months)

Built AWS ETL/ELT pipelines using Glue and PySpark to integrate retail data into a partitioned S3 data lake, processing millions of records daily. Designed data models (PostgreSQL plus star/snowflake analytics structures), implemented IAM and S3 access controls with masking, and developed Python/Node.js APIs with validation and rate limiting, reducing operational costs by 15%.
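A minimal sketch of the partitioned S3 data-lake layout this describes, expressed as an Athena/Glue external table (the bucket, database, and columns are hypothetical):

```sql
-- External table over partitioned Parquet data in S3 (illustrative names).
CREATE EXTERNAL TABLE retail.sales (
  order_id  string,
  sku       string,
  amount    decimal(10, 2)
)
PARTITIONED BY (ingest_date string)
STORED AS PARQUET
LOCATION 's3://example-retail-lake/sales/';

-- Register newly landed daily partitions so queries can prune by date.
MSCK REPAIR TABLE retail.sales;
```

Partitioning by ingest date is what keeps daily scans bounded when the lake grows to millions of records per day.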

Education

Degrees, certifications, and relevant coursework


Universidade Federal de Pernambuco (UFPE)

Bachelor of Science, Computer Science

2020 - 2025

