Phoenix (Arize AI)

Ship Agents that Work. Arize is the AI and agent engineering platform: one place for development, observability, and evaluation.

Founded: 2020
Location: Berkeley, CA
Employees: 145
Funding: $138M total

Arize Phoenix — Open Source LLM Tracing, Evaluation, and AI App Observability

Arize Phoenix is an open source platform for tracing and evaluating LLM applications and agents. Built by Arize AI, Phoenix provides span-level visibility, structured evaluations, and experiment workflows to help teams instrument, debug, and improve AI systems. Run it self-hosted or in the cloud via Phoenix Cloud’s free tier.

  • Website: [phoenix.arize.com](https://phoenix.arize.com)
  • Docs: [Arize Phoenix documentation](https://arize.com/docs/phoenix)
  • GitHub: [Arize-ai/phoenix](https://github.com/Arize-ai/phoenix) (7.3k+ stars; active releases and issues)
  • Tracing overview: [LLM tracing and observability](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix)

What Phoenix Does

  • **LLM/Agent Tracing:** End-to-end span traces with inputs/outputs, latency, token usage, errors, and metadata. Built on open standards for easy instrumentation. See [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces) and the setup sketch after this list.
  • **Evaluations & Testing:** Score outputs with model judges and ground-truth checks, run test sets, compare prompts/models, and track experiments. Includes **RAG relevance** checks and dataset management. Start with [evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations).
  • **Observability for AI Apps:** Metrics, visualizations, and workflows that shorten the debug/iterate loop for agents, RAG pipelines, and production monitoring.
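
To make the tracing bullet concrete, here is a minimal setup sketch in Python. It is an illustration, not the only path: it assumes a local Phoenix instance on the default port, the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages, and a made-up project name; adapt the endpoint for Phoenix Cloud or another host.

```python
# Minimal tracing sketch (assumptions: local Phoenix at localhost:6006;
# packages: arize-phoenix-otel, openinference-instrumentation-openai).
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Create an OpenTelemetry tracer provider that exports spans to Phoenix.
tracer_provider = register(
    project_name="my-agent",  # hypothetical project name
    endpoint="http://localhost:6006/v1/traces",  # local Phoenix collector
)

# Patch the OpenAI client so each request emits a span with inputs/outputs,
# latency, and token counts; no further per-call changes are needed.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

Once this runs, OpenAI calls made anywhere in the process appear as spans in the Phoenix UI.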

Why It Stands Out

  • **Open source first:** core features ship in the OSS project, with an optional managed cloud. Explore the [GitHub project](https://github.com/Arize-ai/phoenix) and [releases](https://github.com/Arize-ai/phoenix/releases).
  • **Open standards:** Native support for [OpenTelemetry](https://opentelemetry.io) and [OpenInference](https://github.com/Arize-ai/openinference) to fit modern stacks.
  • **Fast instrumentation:** Minimal code changes to trace popular frameworks; see the [tracing explainer](https://phoenix.arize.com/llm-tracing-and-observability-with-arize-phoenix).

Key Features

  • Tracing and spans for agents/tools, inputs/outputs, latency, tokens, errors, and metadata
  • Evaluations: LLM judge, relevance/correctness checks, production evals, and **A/B comparisons** across prompts and models (see the relevance-eval sketch after this list)
  • Experiment tracking and **prompt workflow** for testing provider changes; see [provider setup](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
  • RAG diagnostics: retrieval and answer relevance, dataset curation, and regression tests
  • Dashboards and visualizations for debugging and performance tuning
  • Tutorials and cookbooks for hands-on workflows: [Phoenix cookbook](https://arize.com/docs/phoenix/cookbook)
  • User guide: [Phoenix user guide](https://arize.com/docs/phoenix/user-guide)
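
As a concrete example of the evaluation features above, here is a hedged sketch of a model-graded RAG relevance check using `phoenix.evals`. The template and rails names follow the library's built-ins; the one-row DataFrame and choice of judge model are illustrative.

```python
# RAG relevance eval sketch (assumes: arize-phoenix-evals, pandas, and an
# OpenAI API key for the judge model; the sample row is illustrative).
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    llm_classify,
)

# Each row pairs a user query ("input") with one retrieved document
# ("reference") for the judge to grade.
df = pd.DataFrame({
    "input": ["What is Phoenix?"],
    "reference": ["Phoenix is an open source LLM tracing and eval platform."],
})

rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())  # e.g. relevant/unrelated
results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),  # judge model; swap providers freely
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,  # have the judge justify each label
)
print(results[["label", "explanation"]])
```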

Deployment, Cloud, and Licensing

  • Deployment options:
      • **Phoenix Cloud:** free instances with 10 GB of storage; see [environments](https://arize.com/docs/phoenix/environments).
      • **Self-hosted:** deploy via Docker or Kubernetes; see [self-hosting](https://arize.com/docs/phoenix/self-hosting). A quick local-launch sketch follows this list.
  • License: **Elastic License 2.0**; self-hosting is free and permitted. Details in the [license docs](https://arize.com/docs/phoenix/self-hosting/license).
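
For a first look without Docker or the cloud, Phoenix can also run in-process (for example, in a notebook). A sketch, assuming only the `arize-phoenix` package:

```python
# Launch a throwaway local Phoenix instance; data lives in-process, so this
# is for experimentation, not production (use Docker/Kubernetes for that).
import phoenix as px

session = px.launch_app()  # starts the Phoenix server and UI locally
print(session.url)         # typically http://localhost:6006
```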

Integrations

  • LLM providers: [OpenAI, Azure OpenAI, Anthropic, Google AI Studio](https://arize.com/docs/phoenix/prompt-engineering/how-to-prompts/configure-ai-providers)
  • Frameworks:
      • Python: [LangChain](https://arize.com/docs/phoenix/integrations/python/langchain), [LlamaIndex](https://arize.com/docs/phoenix/integrations/python/llamaindex) (instrumentation sketch after this list)
      • Java: [LangChain4j tracing](https://arize.com/docs/phoenix/integrations/java/langchain4j/langchain4j-tracing)
      • JS/TS: LangChain.js (via OpenTelemetry/OpenInference)
  • Meta frameworks/routers: [LiteLLM](https://arize.com/docs/phoenix/integrations/llm-providers/litellm/litellm-tracing)
  • Google Gen AI SDK: [tracing integration](https://arize.com/docs/phoenix/integrations/llm-providers/google-gen-ai/google-genai-tracing)
  • Open standards: [OpenTelemetry and OpenInference](https://github.com/Arize-ai/openinference)
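
Framework integrations follow the same one-call pattern as provider instrumentation. A sketch for LangChain, assuming the `openinference-instrumentation-langchain` package and a default local Phoenix endpoint:

```python
# LangChain tracing sketch (assumes: arize-phoenix-otel and
# openinference-instrumentation-langchain; endpoint defaults to localhost).
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register(project_name="my-agent")  # hypothetical name

# After this call, chains, tools, retrievers, and LLM calls executed by
# LangChain emit spans to Phoenix without per-call code changes.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```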

Who It’s For

  • **AI engineers and agent developers** needing span-level traces and quick feedback loops
  • **ML engineers building RAG systems** requiring relevance checks and dataset workflows
  • Teams standardizing on **OpenTelemetry/OpenInference**
  • Organizations that prefer **open source** with a self-hosting option

Common Use Cases

  • Debug agent tools/toolchains with span traces and error insights
  • Track prompt/model changes with experiments and test sets
  • Evaluate RAG pipelines for retrieval and answer relevance
  • Monitor latency and token usage across dev and production contexts (see the span-analysis sketch after this list)
  • Collect human feedback and compare model-graded scores on datasets
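
For the monitoring use case, traces can be pulled back out for offline analysis. A sketch, assuming a local Phoenix instance that has already collected spans; the token-count column name follows OpenInference conventions and may differ in your data:

```python
# Pull recorded spans into a DataFrame and roll up latency per span kind.
import phoenix as px

client = px.Client()  # assumes Phoenix at the default http://localhost:6006
spans = client.get_spans_dataframe()  # one row per span

# Compute latency from the span timestamps and summarize by span kind.
spans["latency_s"] = (spans["end_time"] - spans["start_time"]).dt.total_seconds()
print(spans.groupby("span_kind")["latency_s"].describe())

# Token usage, where present, lives in OpenInference attribute columns,
# e.g. "attributes.llm.token_count.total" (verify against your columns).
```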

User Sentiment Snapshot

  • Pros:
      • Open source with robust OSS features; noted in [community threads](https://www.reddit.com/r/LLMDevs/comments/1jb1knr/why_the_heck_is_llm_observation_and_management).
      • Strong tracing and evals for agents and RAG; praised for workflow testing in [user reviews](https://www.reddit.com/r/AI_Agents/comments/1glwb2x/how_are_you_testingevaluating_your_llm_workflows).
      • Fits modern observability stacks via open standards; see [industry discussion](https://www.reddit.com/r/Rag/comments/1gghx59/industry_standard_observability_tool) and [OpenInference](https://github.com/Arize-ai/openinference).
      • Helpful visualizations and easy integrations called out on [G2](https://www.g2.com/products/arize-ai/reviews).
  • Cons:
      • Noticeable learning curve for new users; highlighted on [AWS Marketplace](https://aws.amazon.com/marketplace/reviews/reviews-list/prodview-kjmocii4mcw4s) and [G2](https://www.g2.com/products/arize-ai/reviews).
      • Self-hosting requires infrastructure work; mentioned in a [third-party comparison](https://www.braintrust.dev/articles/arize-phoenix-vs-braintrust).
      • Often compared with Langfuse and LangSmith on prompt management and monitoring, with tradeoffs either way. See the [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives) and a [Reddit roundup](https://www.reddit.com/r/AIQuality/comments/1nidcyt/comparison_of_top_llm_evaluation_platforms).

Company Background: Arize AI

  • Company: [Arize AI](https://arize.com/about-us), based in Berkeley, CA
  • Founders: [Aparna Dhinakaran](https://www.linkedin.com/in/aparnadhinakaran), [Jason Lopatecki](https://www.linkedin.com/in/jason-lopatecki-9509941)
  • Team size: 51–200 employees (LinkedIn: [Arize AI](https://www.linkedin.com/company/arizeai))
  • Funding: $38M Series B led by TCV in 2022; $138M raised in total

Getting Started

  • Start here: [Phoenix homepage](https://phoenix.arize.com)
  • Quick setup and environments: [Phoenix Cloud and self-hosting](https://arize.com/docs/phoenix/environments) (end-to-end sketch after this list)
  • Self-hosting guide: [Deploy via Docker/Kubernetes](https://arize.com/docs/phoenix/self-hosting)
  • Tracing your app: [LLM traces](https://arize.com/docs/phoenix/tracing/llm-traces)
  • Evaluations: [Get started with evaluations](https://arize.com/docs/phoenix/get-started/get-started-evaluations)
  • Tutorials: [Cookbook and workflows](https://arize.com/docs/phoenix/cookbook)
  • Community article: [Hugging Face guide to tracing and evaluating agents](https://huggingface.co/blog/smolagents-phoenix)
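
Tying the steps above together, here is an end-to-end quickstart sketch: launch a local Phoenix instance, instrument the OpenAI client, make one call, and open the UI. It assumes the `arize-phoenix`, `arize-phoenix-otel`, `openinference-instrumentation-openai`, and `openai` packages plus an `OPENAI_API_KEY`; the project name and prompt are illustrative.

```python
# End-to-end quickstart sketch: local Phoenix + auto-instrumented OpenAI.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

session = px.launch_app()  # local Phoenix server and UI
tracer_provider = register(project_name="quickstart")  # hypothetical name
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# This call is traced automatically and appears as a span in Phoenix.
client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print("View the trace at", session.url)
```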

Pricing and Free Tier

  • Phoenix Cloud offers a free instance with **10 GB storage**; see [environments and free tier](https://arize.com/docs/phoenix/environments).

Security and Data Control

  • Security depends on your deployment. Self-hosting provides full data control. Review [self-hosting docs](https://arize.com/docs/phoenix/self-hosting) and [configuration](https://arize.com/docs/phoenix/self-hosting/configuration).

Comparisons and Alternatives

  • Phoenix vs. Langfuse/LangSmith: Phoenix emphasizes **open standards**, **OSS-first features**, and **evaluations**. See [Phoenix vs. LangSmith FAQ](https://arize.com/docs/phoenix/resources/frequently-asked-questions/open-source-langsmith-alternative-arize-phoenix-vs.-langsmith) and [Langfuse FAQ](https://langfuse.com/faq/all/best-phoenix-arize-alternatives).

---

    Keywords: Arize Phoenix, LLM observability, LLM tracing, LLM evaluation, RAG evaluation, AI agent tracing, OpenTelemetry, OpenInference, open source AI observability, Phoenix Cloud, self-hosted LLM monitoring.

    Related Companies

    Galileo

    Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products supports builders across the new AI development workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by hundreds of AI teams, from startups to Fortune 50 enterprises including Twilio, Comcast, and HP.

    HoneyHive

    HoneyHive is the leading AI observability and evals platform, trusted by teams from next-gen AI startups to Fortune 100 enterprises. We make it easy and repeatable for modern AI teams to debug, evaluate, and monitor AI agents, and deploy them to production with confidence. HoneyHive’s founding team brings AI and infrastructure expertise from Microsoft, OpenAI, Amazon, Amplitude, New Relic, and Sisu. The company is based in New York and San Francisco.

    Humanloop

    Humanloop is the LLM evals platform for enterprises. Teams at Gusto, Vanta, and Duolingo use Humanloop to ship reliable AI products. We enable you to adopt best practices for prompt management, evaluation, and observability.

    Langfuse

    Langfuse is the **most popular open source LLMOps platform**. It helps teams collaboratively develop, monitor, evaluate, and debug AI applications. Langfuse can be **self-hosted** in minutes and is battle-tested and used in production by thousands of users, from YC startups to large companies like Khan Academy or Twilio. Langfuse builds on a proven track record of reliability and performance.

    Developers can trace any large language model or framework using our SDKs for Python and JS/TS, our open API, or our native integrations (OpenAI, LangChain, LlamaIndex, Vercel AI SDK). Beyond tracing, developers use **Langfuse Prompt Management, its open APIs, and testing and evaluation pipelines** to improve the quality of their applications.

    Product managers can **analyze, evaluate, and debug AI products** by accessing detailed metrics on costs, latencies, and user feedback in the Langfuse Dashboard. They can bring **humans into the loop** by setting up annotation workflows for human labelers to score their application. Langfuse can also be used to **monitor security risks** through security frameworks and evaluation pipelines.

    Langfuse enables **non-technical team members** to iterate on prompts and model configurations directly within the Langfuse UI or to use the Langfuse Playground for fast prompt testing. Langfuse is **open source** and we are proud to have a fantastic community on GitHub and Discord that provides help and feedback. Do get in touch with us!

    LangSmith

    LangChain provides the agent engineering platform and open source frameworks developers need to ship reliable agents fast.

    Portkey

    AI Gateway, Guardrails, and Governance. Processing 14 Billion+ LLM tokens every day. Backed by Lightspeed.