About This Role
Saaf Finance is building the AI workforce for the mortgage industry. We are an AI startup integrated with a top-10 mortgage lender, American Heritage Lending (AHL). Together, we combine AHL’s 15+ years of mortgage origination expertise with AI-native innovation to redefine what’s possible in mortgage lending.
As a Senior DevOps Engineer, you will own the infrastructure, deployment pipelines, and reliability practices that keep our platform running and our engineering team shipping fast. We are an AI-native engineering team: AI-assisted tools are a regular part of how we build, deploy, and operate infrastructure. We expect engineers to use these tools thoughtfully and effectively as part of their daily workflow, from writing Terraform modules to debugging production incidents and building agentic workflow infrastructure.
Key Responsibilities
Infrastructure & Cloud Operations
Design, build, and maintain production-grade AWS infrastructure using Infrastructure-as-Code (Terraform preferred).
Architect and manage serverless and containerized environments that balance cost, performance, and reliability.
Implement and maintain networking, security groups, IAM policies, and cloud resource configurations following least-privilege principles.
CI/CD & Deployment
Own and evolve the CI/CD pipeline ecosystem, primarily using GitHub Actions, to enable fast, safe, and repeatable deployments.
Implement deployment strategies (blue-green, canary, rolling) that minimize risk and downtime.
Automate build, test, and release workflows across multiple services and environments.
AI-Integrated DevOps
Leverage AI-assisted tools (code generation, intelligent autocomplete, automated IaC authoring) as a regular part of your infrastructure workflow to accelerate delivery and reduce configuration errors.
Use AI tools to support incident diagnosis, log analysis, runbook generation, and documentation of infrastructure decisions.
Evaluate and integrate emerging AI tools and practices into the team's DevOps processes.
Build and support the infrastructure layer for agentic workflows, including compute orchestration, autoscaling, and cost-efficient execution of AI-powered automation.
Monitoring, Observability & Incident Management
Design and maintain monitoring, logging, and alerting systems that provide clear visibility into platform health and performance.
Implement distributed tracing and structured logging across services and multi-step workflows.
Lead incident response, conduct post-mortems, and drive reliability improvements based on findings.
Security & Compliance
Apply cloud security best practices across all infrastructure, including secrets management, encryption, network segmentation, and access controls.
Design secure secrets and configuration management for agentic processes, including API keys, model tokens, and external service credentials.
Ensure infrastructure meets financial regulatory and compliance requirements with full auditability.
Data Infrastructure Support
Support and maintain infrastructure for data engineering workflows, including Snowflake environments, ETL/ELT pipelines, and dbt execution.
Manage serverless event-driven pipelines and orchestration tools (Step Functions, Temporal, or similar).
Team & Process
Collaborate with product engineers, data engineers, and founders to ensure infrastructure supports rapid iteration and reliable delivery.
Document infrastructure decisions, runbooks, and operational procedures to support team knowledge sharing and onboarding.
Regularly review and improve operational workflows, automation coverage, and infrastructure cost efficiency.
Qualifications
Required
4+ years of experience in DevOps, SRE, or similar infrastructure-focused roles.
Proficient in AWS with strong Infrastructure-as-Code experience (Terraform preferred).
Strong CI/CD expertise with GitHub Actions.
Experience with containerization and serverless architectures.
Skilled in monitoring, logging, and incident management.
Strong scripting and automation skills in Bash, Python, or Node.js.
Knowledge of cloud security principles, least privilege, and compliance requirements.
Experience with Snowflake and data engineering workflows (ETL/ELT, dbt).
Exposure to Kubernetes and container orchestration tools.
Understanding of serverless event-driven pipelines (Step Functions, Temporal).
Demonstrated, regular use of AI-powered development tools (e.g., Cursor, GitHub Copilot, Claude Code, or similar) to accelerate infrastructure authoring, debugging, or documentation workflows.
Startup mindset: hands-on, resourceful, and comfortable operating in a fast-paced environment.
Preferred
Experience with event-driven workflow orchestration tools such as Step Functions, Temporal, Airflow, or Prefect.
Familiarity with agentic workflow patterns, including integrating API-based decision points, asynchronous task handling, and dynamic routing of requests.
Understanding of infrastructure requirements for AI-powered automation, including latency optimization, autoscaling strategies, and cost-efficient compute for high-throughput processes.
Experience designing secure secrets and configuration management systems for agentic workloads, including API keys, model tokens, and external service credentials.
Experience implementing observability for multi-step workflows, including distributed tracing, structured logging, and audit-friendly data pipelines.
Experience with prompt engineering for IaC generation, incident analysis, or building AI-powered operational tooling.
Prior early-stage startup experience is highly preferred.
Benefits
Competitive salary
High ownership from day one — your work will directly shape core systems and products
Fast-paced environment with quick decision cycles and minimal bureaucracy
Remote-first team with flexibility on work hours and location
Direct access to founders and cross-functional teams — no layers, no silos
Clear expectations, regular feedback, and support for professional growth
Work on real problems in a complex, high-impact industry
