Tony Chen
@tonychen
Senior software engineer building large-scale distributed systems and intelligent data platforms.
What I'm looking for
I’m a Senior Software Engineer with 8+ years of experience designing and building large-scale distributed systems and intelligent data platforms. I specialize in backend engineering, full-stack development, and AI-driven systems across cloud infrastructure environments.
I’ve developed machine learning pipelines, Retrieval-Augmented Generation (RAG) systems, and operational analytics platforms that process large volumes of telemetry and service data. At Amazon Web Services, I helped build an internal LLM-powered enterprise knowledge assistant by designing RAG pipelines with Amazon Bedrock and LangChain, creating vector retrieval pipelines for semantic search, and delivering AI-generated responses via Python, FastAPI, and PostgreSQL.
I also build systems with reliability engineering and cloud-native architecture at the core, using AWS, Kubernetes, and modern AI frameworks. From incident detection dashboards to storage reliability monitoring, I’ve focused on operational analytics, observability, and performance optimization so that engineering teams can investigate issues faster and make better decisions.
Experience
Work history, roles, and key accomplishments
Contributed to an internal LLM-powered enterprise knowledge assistant by designing and implementing RAG pipelines with Amazon Bedrock and LangChain. Built embedding-based vector retrieval and backend services using Python, FastAPI, and PostgreSQL to orchestrate multi-turn knowledge queries and deliver AI-generated responses.
Developed a full-stack incident detection and response platform to monitor service telemetry, investigate operational incidents, and track distributed system activity. Designed RESTful backend microservices and React.js/TypeScript dashboards to visualize health metrics, alerts, and operational events in real time.
Worked on internal storage reliability monitoring tools, including probabilistic durability modeling using Markov chains and coordination with erasure coding techniques. Built Python/SQL analysis utilities and internal services for storage lifecycle metadata, inconsistency detection, safe garbage collection, and reliability simulations.
Integrated open-source ML libraries (Apache Spark, scikit-learn) with the SAP HANA Predictive Analysis Library, then trained and tested predictive models for end-to-end predictive analysis use cases.
Software Engineering Intern
Databiology
Jun 2015 - Aug 2015 (2 months)
Designed testing plans, protocols, and documentation for prototype applications for an enterprise platform supporting streaming, genomics analysis, and workflow scheduling. Integrated IBM Platform Computing with Databiology for Enterprise using SOAP APIs with Python and Bash automation scripts.
Education
Degrees, certifications, and relevant coursework
University of California, Berkeley
Bachelor of Science, Electrical Engineering and Computer Science
2013 - 2017
