Phoenix (Arize AI)

Ship Agents that Work. Arize is the AI and agent engineering platform: one place for development, observability, and evaluation.

Founded: 2020
Location: Berkeley, CA
Employees: 145
Funding: $138M total

Arize Phoenix — Open Source LLM Tracing, Evaluation, and AI App Observability

Arize Phoenix is an open source platform for tracing and evaluating LLM applications and agents. Built by Arize AI, Phoenix provides span-level visibility, structured evaluations, and experiment workflows to help teams instrument, debug, and improve AI systems. Run it self-hosted or in the cloud via Phoenix Cloud’s free tier.

  • Website: [phoenix.arize.com](https://phoenix.arize.com)
  • Docs: [Arize Phoenix documentation](https://arize.com/docs/phoenix)
  • GitHub: [Arize-ai/phoenix](https://github.com/Arize-ai/phoenix) (7.3k+ stars; active releases and issues)
  • Tracing overview: [LLM tracing and observability](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix)

What Phoenix Does

  • **LLM/Agent Tracing:** End-to-end span traces with inputs/outputs, latency, token usage, errors, and metadata. Built on open standards for easy instrumentation. See [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces) and the setup sketch after this list.
  • **Evaluations & Testing:** Score outputs with model judges and ground-truth checks, run test sets, compare prompts/models, and track experiments. Includes **RAG relevance** checks and dataset management. Start with [evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations).
  • **Observability for AI Apps:** Metrics, visualizations, and workflows that shorten the debug/iterate loop for agents, RAG pipelines, and production monitoring.
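
To make the tracing bullet concrete, here is a minimal setup sketch in Python. It is an illustration, not the only path: it assumes a local Phoenix instance on the default port, the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages, and a made-up project name; adapt the endpoint for Phoenix Cloud or another host.

```python
# Minimal tracing sketch (assumptions: local Phoenix at localhost:6006;
# packages: arize-phoenix-otel, openinference-instrumentation-openai).
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Create an OpenTelemetry tracer provider that exports spans to Phoenix.
tracer_provider = register(
    project_name="my-agent",  # hypothetical project name
    endpoint="http://localhost:6006/v1/traces",  # local Phoenix collector
)

# Patch the OpenAI client so each request emits a span with inputs/outputs,
# latency, and token counts; no further per-call changes are needed.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

Once this runs, OpenAI calls made anywhere in the process appear as spans in the Phoenix UI.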

Why It Stands Out

  • **Open source first:** core features ship in the OSS project, with an optional managed cloud. Explore the [GitHub project](https://github.com/Arize-ai/phoenix) and [releases](https://github.com/Arize-ai/phoenix/releases).
  • **Open standards:** Native support for [OpenTelemetry](https://opentelemetry.io) and [OpenInference](https://github.com/Arize-ai/openinference) to fit modern stacks.
  • **Fast instrumentation:** Minimal code changes to trace popular frameworks; see the [tracing explainer](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix).

Key Features

  • Tracing and spans for agents/tools, inputs/outputs, latency, tokens, errors, and metadata
  • Evaluations: LLM judge, relevance/correctness checks, production evals, and **A/B comparisons** across prompts and models (see the relevance-eval sketch after this list)
  • Experiment tracking and **prompt workflow** for testing provider changes; see [provider setup](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
  • RAG diagnostics: retrieval and answer relevance, dataset curation, and regression tests
  • Dashboards and visualizations for debugging and performance tuning
  • Tutorials and cookbooks for hands-on workflows: [Phoenix cookbook](https://arize.com/docs/phoenix/cookbook)
  • User guide: [Phoenix user guide](https://arize.com/docs/phoenix/user-guide)
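
As a concrete example of the evaluation features above, here is a hedged sketch of a model-graded RAG relevance check using `phoenix.evals`. The template and rails names follow the library's built-ins; the one-row DataFrame and choice of judge model are illustrative.

```python
# RAG relevance eval sketch (assumes: arize-phoenix-evals, pandas, and an
# OpenAI API key for the judge model; the sample row is illustrative).
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    llm_classify,
)

# Each row pairs a user query ("input") with one retrieved document
# ("reference") for the judge to grade.
df = pd.DataFrame({
    "input": ["What is Phoenix?"],
    "reference": ["Phoenix is an open source LLM tracing and eval platform."],
})

rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())  # e.g. relevant/unrelated
results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),  # judge model; swap providers freely
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,  # have the judge justify each label
)
print(results[["label", "explanation"]])
```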

Deployment, Cloud, and Licensing

  • Deployment options:
      • **Phoenix Cloud:** free instances with 10 GB of storage; see [environments](https://arize.com/docs/phoenix/environments).
      • **Self-hosted:** deploy via Docker or Kubernetes; see [self-hosting](https://arize.com/docs/phoenix/self-hosting). A quick local-launch sketch follows this list.
  • License: **Elastic License 2.0**; self-hosting is free and permitted. Details in the [license docs](https://arize.com/docs/phoenix/self-hosting/license).
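
For a first look without Docker or the cloud, Phoenix can also run in-process (for example, in a notebook). A sketch, assuming only the `arize-phoenix` package:

```python
# Launch a throwaway local Phoenix instance; data lives in-process, so this
# is for experimentation, not production (use Docker/Kubernetes for that).
import phoenix as px

session = px.launch_app()  # starts the Phoenix server and UI locally
print(session.url)         # typically http://localhost:6006
```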

Integrations

  • LLM providers: [OpenAI, Azure OpenAI, Anthropic, Google AI Studio](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
  • Frameworks:
      • Python: [LangChain](https://arize.com/docs/phoenix/integrations/python/langchain), [LlamaIndex](https://arize.com/docs/phoenix/integrations/python/llamaindex) (instrumentation sketch after this list)
      • Java: [LangChain4j tracing](https://arize.com/docs/phoenix/integrations/java/langchain4j/langchain4j-tracing)
      • JS/TS: LangChain.js (via OpenTelemetry/OpenInference)
  • Meta frameworks/routers: [LiteLLM](https://arize.com/docs/phoenix/integrations/llm-providers/litellm/litellm-tracing)
  • Google Gen AI SDK: [tracing integration](https://arize.com/docs/phoenix/integrations/llm-providers/google-gen-ai/google-genai-tracing)
  • Open standards: [OpenTelemetry and OpenInference](https://github.com/Arize-ai/openinference)
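
Framework integrations follow the same one-call pattern as provider instrumentation. A sketch for LangChain, assuming the `openinference-instrumentation-langchain` package and a default local Phoenix endpoint:

```python
# LangChain tracing sketch (assumes: arize-phoenix-otel and
# openinference-instrumentation-langchain; endpoint defaults to localhost).
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register(project_name="my-agent")  # hypothetical name

# After this call, chains, tools, retrievers, and LLM calls executed by
# LangChain emit spans to Phoenix without per-call code changes.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```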

Who It’s For

  • **AI engineers and agent developers** needing span-level traces and quick feedback loops
  • **ML engineers building RAG systems** requiring relevance checks and dataset workflows
  • Teams standardizing on **OpenTelemetry/OpenInference**
  • Organizations that prefer **open source** with a self-hosting option

Common Use Cases

  • Debug agent tools/toolchains with span traces and error insights
  • Track prompt/model changes with experiments and test sets
  • Evaluate RAG pipelines for retrieval and answer relevance
  • Monitor latency and token usage across dev and production contexts (see the span-analysis sketch after this list)
  • Collect human feedback and compare model-graded scores on datasets
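
For the monitoring use case, traces can be pulled back out for offline analysis. A sketch, assuming a local Phoenix instance that has already collected spans; the token-count column name follows OpenInference conventions and may differ in your data:

```python
# Pull recorded spans into a DataFrame and roll up latency per span kind.
import phoenix as px

client = px.Client()  # assumes Phoenix at the default http://localhost:6006
spans = client.get_spans_dataframe()  # one row per span

# Compute latency from the span timestamps and summarize by span kind.
spans["latency_s"] = (spans["end_time"] - spans["start_time"]).dt.total_seconds()
print(spans.groupby("span_kind")["latency_s"].describe())

# Token usage, where present, lives in OpenInference attribute columns,
# e.g. "attributes.llm.token_count.total" (verify against your columns).
```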

User Sentiment Snapshot

  • Pros:
      • Open source with robust OSS features; noted in [community threads](https://www.reddit.com/r/LLMDevs/comments/1jb1knr/why_the_heck_is_llm_observation_and_management).
      • Strong tracing and evals for agents and RAG; praised for workflow testing in [user reviews](https://www.reddit.com/r/AI_Agents/comments/1glwb2x/how_are_you_testingevaluating_your_llm_workflows).
      • Fits modern observability stacks via open standards; see [industry discussion](https://www.reddit.com/r/Rag/comments/1gghx59/industry_standard_observability_tool) and [OpenInference](https://github.com/Arize-ai/openinference).
      • Helpful visualizations and easy integrations called out on [G2](https://www.g2.com/products/arize-ai/reviews).
  • Cons:
      • Noticeable learning curve for new users; highlighted on [AWS Marketplace](https://aws.amazon.com/marketplace/reviews/reviews-list/prodview-kjmocii4mcw4s) and [G2](https://www.g2.com/products/arize-ai/reviews).
      • Self-hosting requires infrastructure work; mentioned in a [third-party comparison](https://www.braintrust.dev/articles/arize-phoenix-vs-braintrust).
      • Often compared with Langfuse and LangSmith on prompt management and monitoring, with tradeoffs either way. See the [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives) and a [Reddit roundup](https://www.reddit.com/r/AIQuality/comments/1nidcyt/comparison_of_top_llm_evaluation_platforms).

Company Background: Arize AI

  • Company: [Arize AI](https://arize.com/about-us), based in Berkeley, CA
  • Founders: [Aparna Dhinakaran](https://www.linkedin.com/in/aparnadhinakaran), [Jason Lopatecki](https://www.linkedin.com/in/jason-lopatecki-9509941)
  • Team size: 51–200 employees (LinkedIn: [Arize AI](https://www.linkedin.com/company/arizeai))
  • Funding: $38M Series B led by TCV in 2022; $138M raised in total

Getting Started

  • Start here: [Phoenix homepage](https://phoenix.arize.com)
  • Quick setup and environments: [Phoenix Cloud and self-hosting](https://arize.com/docs/phoenix/environments) (end-to-end sketch after this list)
  • Self-hosting guide: [Deploy via Docker/Kubernetes](https://arize.com/docs/phoenix/self-hosting)
  • Tracing your app: [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces)
  • Evaluations: [Get started with evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations)
  • Tutorials: [Cookbook and workflows](https://arize.com/docs/phoenix/cookbook)
  • Community article: [Hugging Face guide to tracing and evaluating agents](https://huggingface.co/blog/smolagents-phoenix)
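
Tying the steps above together, here is an end-to-end quickstart sketch: launch a local Phoenix instance, instrument the OpenAI client, make one call, and open the UI. It assumes the `arize-phoenix`, `arize-phoenix-otel`, `openinference-instrumentation-openai`, and `openai` packages plus an `OPENAI_API_KEY`; the project name and prompt are illustrative.

```python
# End-to-end quickstart sketch: local Phoenix + auto-instrumented OpenAI.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

session = px.launch_app()  # local Phoenix server and UI
tracer_provider = register(project_name="quickstart")  # hypothetical name
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# This call is traced automatically and appears as a span in Phoenix.
client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print("View the trace at", session.url)
```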

Pricing and Free Tier

  • Phoenix Cloud offers a free instance with **10 GB storage**; see [environments and free tier](https://arize.com/docs/phoenix/environments).

Security and Data Control

  • Security depends on your deployment. Self-hosting provides full data control. Review [self-hosting docs](https://arize.com/docs/phoenix/self-hosting) and [configuration](https://arize.com/docs/phoenix/self-hosting/configuration).

Comparisons and Alternatives

  • Phoenix vs. Langfuse/LangSmith: Phoenix emphasizes **open standards**, **OSS-first features**, and **evaluations**. See [Phoenix vs. LangSmith FAQ](https://arize.com/docs/phoenix/resources/frequently-asked-questions/open-source-langsmith-alternative-arize-phoenix-vs.-langsmith) and [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives).

---

    Keywords: Arize Phoenix, LLM observability, LLM tracing, LLM evaluation, RAG evaluation, AI agent tracing, OpenTelemetry, OpenInference, open source AI observability, Phoenix Cloud, self-hosted LLM monitoring.

    Related Companies

    Galileo

    Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products supports builders across the new AI development workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by hundreds of AI teams, from startups to Fortune 50 enterprises including Twilio, Comcast, and HP.

    HoneyHive

    HoneyHive is the leading AI observability and evals platform, trusted by teams from next-gen AI startups to Fortune 100 enterprises. We make it easy and repeatable for modern AI teams to debug, evaluate, and monitor AI agents, and deploy them to production with confidence. HoneyHive’s founding team brings AI and infrastructure expertise from Microsoft, OpenAI, Amazon, Amplitude, New Relic, and Sisu. The company is based in New York and San Francisco.

    Humanloop

    Humanloop is the LLM evals platform for enterprises. Teams at Gusto, Vanta, and Duolingo use Humanloop to ship reliable AI products. We enable you to adopt best practices for prompt management, evaluation, and observability.

    Langfuse

    Langfuse is the **most popular open source LLMOps platform**. It helps teams collaboratively develop, monitor, evaluate, and debug AI applications. Langfuse can be **self-hosted** in minutes and is battle-tested and used in production by thousands of users, from YC startups to large companies like Khan Academy or Twilio. Langfuse builds on a proven track record of reliability and performance.

    Developers can trace any large language model or framework using our SDKs for Python and JS/TS, our open API, or our native integrations (OpenAI, LangChain, LlamaIndex, Vercel AI SDK). Beyond tracing, developers use **Langfuse Prompt Management, its open APIs, and testing and evaluation pipelines** to improve the quality of their applications.

    Product managers can **analyze, evaluate, and debug AI products** by accessing detailed metrics on costs, latencies, and user feedback in the Langfuse Dashboard. They can bring **humans into the loop** by setting up annotation workflows for human labelers to score their application. Langfuse can also be used to **monitor security risks** through security frameworks and evaluation pipelines.

    Langfuse enables **non-technical team members** to iterate on prompts and model configurations directly within the Langfuse UI or to use the Langfuse Playground for fast prompt testing. Langfuse is **open source** and we are proud to have a fantastic community on GitHub and Discord that provides help and feedback. Do get in touch with us!

    LangSmith

    LangChain provides the agent engineering platform and open source frameworks developers need to ship reliable agents fast.

    Portkey

    AI Gateway, Guardrails, and Governance. Processing 14 Billion+ LLM tokens every day. Backed by Lightspeed.