Open to opportunities

Tony Chen

@tonychen

Senior software engineer building large-scale distributed systems and intelligent data platforms.

United States

What I'm looking for

I’m looking to build reliable, cloud-native AI and data platforms—owning backend services, RAG/LLM pipelines, and operational analytics—while collaborating closely with teams that value observability, performance, and real engineering impact.

I’m a Senior Software Engineer with 8+ years of experience designing and building large-scale distributed systems and intelligent data platforms. I specialize in backend engineering, full-stack development, and AI-driven systems across cloud infrastructure environments.

I’ve developed machine learning pipelines, Retrieval-Augmented Generation (RAG) systems, and operational analytics platforms that process large volumes of telemetry and service data. At Amazon Web Services, I helped build an internal LLM-powered enterprise knowledge assistant by designing RAG pipelines with Amazon Bedrock and LangChain, creating vector retrieval pipelines for semantic search, and delivering AI-generated responses via Python, FastAPI, and PostgreSQL.

I also build systems with reliability engineering and cloud-native architecture at the core, using AWS, Kubernetes, and modern AI frameworks. My work on incident detection dashboards and storage reliability monitoring has centered on operational analytics, observability, and performance optimization, so that engineering teams can investigate issues faster and make better decisions.

Experience

Work history, roles, and key accomplishments


Senior Software Engineer

Jan 2023 - Present (3 years 3 months)

Contributed to an internal LLM-powered enterprise knowledge assistant by designing and implementing RAG pipelines with Amazon Bedrock and LangChain. Built embedding-based vector retrieval and backend services using Python, FastAPI, and PostgreSQL to orchestrate multi-turn knowledge queries and deliver AI-generated responses.


Software Development Engineer II

Jan 2019 - Dec 2022 (3 years 11 months)

Developed a full-stack incident detection and response platform to monitor service telemetry, investigate operational incidents, and track distributed system activity. Designed RESTful backend microservices and React.js/TypeScript dashboards to visualize health metrics, alerts, and operational events in real time.


Software Development Engineer I

Aug 2017 - Dec 2018 (1 year 4 months)

Worked on internal storage reliability monitoring tools, including probabilistic durability modeling using Markov chains and coordination with erasure coding techniques. Built Python/SQL analysis utilities and internal services for storage lifecycle metadata, inconsistency detection, safe garbage collection, and reliability simulations.

Education

Degrees, certifications, and relevant coursework


University of California, Berkeley

Bachelor of Science, Electrical Engineering and Computer Science

2013 - 2017

Earned a Bachelor of Science in Electrical Engineering and Computer Science at UC Berkeley (2013–2017).
