# Arize Phoenix — Open Source LLM Tracing, Evaluation, and AI App Observability
Arize Phoenix is an open source platform for tracing and evaluating LLM applications and agents. Built by Arize AI, Phoenix provides span-level visibility, structured evaluations, and experiment workflows to help teams instrument, debug, and improve AI systems. Run it self-hosted or in the cloud via Phoenix Cloud’s free tier.
- Website: [phoenix.arize.com](https://phoenix.arize.com)
- Docs: [Arize Phoenix documentation](https://arize.com/docs/phoenix)
- GitHub: [Arize-ai/phoenix](https://github.com/Arize-ai/phoenix) (7.3k+ stars; active releases and issues)
- Tracing overview: [LLM tracing and observability](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix)

## What Phoenix Does

- **LLM/Agent Tracing:** End-to-end span traces with inputs/outputs, latency, token usage, errors, and metadata. Built on open standards for easy instrumentation. See [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces).
- **Evaluations & Testing:** Score outputs with model judges and ground-truth checks, run test sets, compare prompts/models, and track experiments. Includes **RAG relevance** checks and dataset management. Start with [evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations).
- **Observability for AI Apps:** Metrics, visualizations, and workflows that shorten the debug/iterate loop for agents, RAG pipelines, and production monitoring.
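The ground-truth side of that evaluation workflow can be sketched in plain Python. This is illustrative only, not the Phoenix API: score each model output against a reference answer, then aggregate over a test set — the loop Phoenix structures, stores, and visualizes for you.

```python
# Illustrative sketch (NOT the Phoenix SDK): a minimal ground-truth
# evaluation run over a small test set.
from dataclasses import dataclass


@dataclass
class EvalRecord:
    input: str
    output: str      # model answer under test
    reference: str   # ground-truth answer


def exact_match(record: EvalRecord) -> float:
    """Simplest ground-truth check: 1.0 on a normalized exact match."""
    return 1.0 if record.output.strip().lower() == record.reference.strip().lower() else 0.0


def run_eval(records: list[EvalRecord]) -> float:
    """Mean score across the test set."""
    scores = [exact_match(r) for r in records]
    return sum(scores) / len(scores)


records = [
    EvalRecord("2+2?", "4", "4"),
    EvalRecord("Capital of France?", "paris", "Paris"),
    EvalRecord("Largest planet?", "Saturn", "Jupiter"),
]
print(run_eval(records))  # prints 0.6666666666666666 (2 of 3 records match)
```

In a real Phoenix workflow the scorer would more often be an LLM judge or a relevance check rather than exact match, and the per-record scores would be logged alongside traces for comparison across prompts and models.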
## Why It Stands Out

- **Open source, OSS-first features** with an optional managed cloud. Explore the [GitHub project](https://github.com/Arize-ai/phoenix) and [releases](https://github.com/Arize-ai/phoenix/releases).
- **Open standards:** Native support for [OpenTelemetry](https://opentelemetry.io) and [OpenInference](https://github.com/Arize-ai/openinference) to fit modern stacks.
- **Fast instrumentation:** Minimal code changes to trace popular frameworks; see the [tracing explainer](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix).

## Key Features

- Tracing and spans for agents/tools, with inputs/outputs, latency, tokens, errors, and metadata
- Evaluations: LLM-as-judge, relevance/correctness checks, production evals, and **A/B comparisons** across prompts and models
- Experiment tracking and a **prompt workflow** for testing provider changes; see [provider setup](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
- RAG diagnostics: retrieval and answer relevance, dataset curation, and regression tests
- Dashboards and visualizations for debugging and performance tuning
- Tutorials and cookbooks for hands-on workflows: [Phoenix cookbook](https://arize.com/docs/phoenix/cookbook)
- User guide: [Phoenix user guide](https://arize.com/docs/phoenix/user-guide)
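To make the span model above concrete, here is a minimal, library-free sketch of the kind of record a tracer emits per pipeline step — name, latency, and arbitrary attributes such as inputs or token counts. Phoenix's actual SDK builds on OpenTelemetry rather than this toy collector.

```python
# Illustrative sketch (NOT the Phoenix SDK): collect one span-like
# record per step of a small RAG-style pipeline.
import time
from contextlib import contextmanager

SPANS = []  # in a real tracer, spans are exported to a collector


@contextmanager
def span(name, **attributes):
    """Record a named step's wall-clock latency plus any attributes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "name": name,
            "latency_ms": (time.perf_counter() - start) * 1000,
            **attributes,
        })


with span("retrieve", query="What is Phoenix?"):
    docs = ["Phoenix is an open source observability platform."]

with span("generate", tokens=42):
    answer = f"Answer based on {len(docs)} document(s)."

print([s["name"] for s in SPANS])  # prints ['retrieve', 'generate']
```

A real trace nests spans (agent → tool → LLM call) and records errors too; the value of the platform is in storing, visualizing, and evaluating these records at scale.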
## Deployment, Cloud, and Licensing

Deployment options:

- **Phoenix Cloud:** Free instances with 10 GB of storage; see [environments](https://arize.com/docs/phoenix/environments).
- **Self-hosted:** Deploy via Docker or Kubernetes; see [self-hosting](https://arize.com/docs/phoenix/self-hosting).

License: **Elastic License 2.0**. Self-hosting is free and permitted. Details in the [license docs](https://arize.com/docs/phoenix/self-hosting/license).
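The self-hosted route can be sketched with Docker. Image name and ports below follow the public Docker Hub listing; treat this as a local trial setup, and adjust tags, volumes, and persistence for production.

```shell
# Local self-hosting sketch (assumes Docker is installed).
docker pull arizephoenix/phoenix:latest
docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
# UI at http://localhost:6006; port 4317 receives OTLP gRPC traces.
```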
## Integrations

- LLM providers: [OpenAI, Azure OpenAI, Anthropic, Google AI Studio](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
- Frameworks:
  - Python: [LangChain](https://arize.com/docs/phoenix/integrations/python/langchain), [LlamaIndex](https://arize.com/docs/phoenix/integrations/python/llamaindex)
  - Java: [LangChain4j tracing](https://arize.com/docs/phoenix/integrations/java/langchain4j/langchain4j-tracing)
  - JS/TS: LangChain.js (via OpenTelemetry/OpenInference)
- Meta-frameworks/routers: [LiteLLM](https://arize.com/docs/phoenix/integrations/llm-providers/litellm/litellm-tracing)
- Google Gen AI SDK: [tracing integration](https://arize.com/docs/phoenix/integrations/llm-providers/google-gen-ai/google-genai-tracing)
- Open standards: [OpenTelemetry and OpenInference](https://github.com/Arize-ai/openinference)
## Who It’s For

- **AI engineers and agent developers** needing span-level traces and quick feedback loops
- **ML engineers building RAG systems** requiring relevance checks and dataset workflows
- Teams standardizing on **OpenTelemetry/OpenInference**
- Organizations that prefer **open source** with a self-hosting option

## Common Use Cases

- Debug agent tools and toolchains with span traces and error insights
- Track prompt/model changes with experiments and test sets
- Evaluate RAG pipelines for retrieval and answer relevance
- Monitor latency and token usage across development and production
- Collect human feedback and compare model-graded scores on datasets
## User Sentiment Snapshot

Pros:

- Open source with robust OSS features; noted in [community threads](https://www.reddit.com/r/LLMDevs/comments/1jb1knr/why_the_heck_is_llm_observation_and_management).
- Strong tracing and evals for agents and RAG; praised for workflow testing in [user reviews](https://www.reddit.com/r/AI_Agents/comments/1glwb2x/how_are_you_testingevaluating_your_llm_workflows).
- Fits modern observability stacks via open standards; see [industry discussion](https://www.reddit.com/r/Rag/comments/1gghx59/industry_standard_observability_tool) and [OpenInference](https://github.com/Arize-ai/openinference).
- Helpful visualizations and easy integrations, called out on [G2](https://www.g2.com/products/arize-ai/reviews).

Cons:

- Noticeable learning curve for new users; highlighted on [AWS Marketplace](https://aws.amazon.com/marketplace/reviews/reviews-list/prodview-kjmocii4mcw4s) and [G2](https://www.g2.com/products/arize-ai/reviews).
- Self-hosting requires infrastructure work; mentioned in a [third-party comparison](https://www.braintrust.dev/articles/arize-phoenix-vs-braintrust).
- Often compared with Langfuse/LangSmith for prompt management and monitoring; expect tradeoffs. See the [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives) and a [Reddit roundup](https://www.reddit.com/r/AIQuality/comments/1nidcyt/comparison_of_top_llm_evaluation_platforms).
## Company Background: Arize AI

- Company: [Arize AI](https://arize.com/about-us), based in Berkeley, CA
- Founders: [Aparna Dhinakaran](https://www.linkedin.com/in/aparnadhinakaran), [Jason Lopatecki](https://www.linkedin.com/in/jason-lopatecki-9509941)
- Team size: 51–200 employees (LinkedIn: [Arize AI](https://www.linkedin.com/company/arizeai))
- Funding: $38M Series B led by TCV in 2022
## Getting Started

- Start here: [Phoenix homepage](https://phoenix.arize.com)
- Quick setup and environments: [Phoenix Cloud and self-hosting](https://arize.com/docs/phoenix/environments)
- Self-hosting guide: [Deploy via Docker/Kubernetes](https://arize.com/docs/phoenix/self-hosting)
- Tracing your app: [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces)
- Evaluations: [Get started with evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations)
- Tutorials: [Cookbook and workflows](https://arize.com/docs/phoenix/cookbook)
- Community article: [Hugging Face guide to tracing and evaluating agents](https://huggingface.co/blog/smolagents-phoenix)
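For a local first run, a minimal quickstart looks roughly like the following. It assumes the PyPI package name `arize-phoenix` and its bundled `phoenix` CLI; check the docs above for the current install path.

```shell
# Local quickstart sketch: install Phoenix and launch the server.
pip install arize-phoenix
phoenix serve
# Then open http://localhost:6006 and point your app's traces at it.
```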
## Pricing and Free Tier

Phoenix Cloud offers a free instance with **10 GB of storage**; see [environments and free tier](https://arize.com/docs/phoenix/environments).

## Security and Data Control

Security depends on your deployment model. Self-hosting provides full data control. Review the [self-hosting docs](https://arize.com/docs/phoenix/self-hosting) and [configuration](https://arize.com/docs/phoenix/self-hosting/configuration).

## Comparisons and Alternatives

Phoenix vs. Langfuse/LangSmith: Phoenix emphasizes **open standards**, **OSS-first features**, and **evaluations**. See the [Phoenix vs. LangSmith FAQ](https://arize.com/docs/phoenix/resources/frequently-asked-questions/open-source-langsmith-alternative-arize-phoenix-vs.-langsmith) and the [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives).

---
Keywords: Arize Phoenix, LLM observability, LLM tracing, LLM evaluation, RAG evaluation, AI agent tracing, OpenTelemetry, OpenInference, open source AI observability, Phoenix Cloud, self-hosted LLM monitoring.