Resolve.ai is positioning itself as a Series A horizontal AI infrastructure play, building foundational capabilities around knowledge graphs.
As agentic architectures emerge as the dominant build pattern, Resolve.ai is positioned to benefit from enterprise demand for autonomous workflow solutions. The timing aligns with broader market readiness for AI systems that can execute multi-step tasks without human intervention.
Resolve.ai provides an AI platform that automates software production operations and incident management for engineering teams.
Resolve.ai combines domain credibility (OpenTelemetry co-creators and ex-Splunk observability leadership), deep tool, telemetry, and code integrations, a multi-agent reasoning/execution layer, and a privacy-first architecture that allows enterprises to safely grant agents operational access without data leakage or cross-tenant model use.
Resolve appears to maintain structured, permission-aware representations of production entities and their relationships (services, infra, runbooks, telemetry, chats). This is likely implemented as an index or graph-like knowledge store to support scoped queries, entity linking, and causal timeline construction.
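A permission-aware entity store of this kind can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `EntityGraph` class, the entity ids, the relation names, and the role names are hypothetical and do not reflect Resolve's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Toy permission-aware entity store (hypothetical, not Resolve's schema)."""

    # entity id -> (entity kind, roles allowed to read it)
    entities: dict = field(default_factory=dict)
    # entity id -> list of (relation, target entity id)
    edges: dict = field(default_factory=dict)

    def add_entity(self, entity_id: str, kind: str, readers: set) -> None:
        self.entities[entity_id] = (kind, readers)

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, entity_id: str, role: str) -> list:
        """Return only the linked entities that `role` is allowed to read."""
        if role not in self.entities[entity_id][1]:
            return []
        return [
            (relation, target)
            for relation, target in self.edges.get(entity_id, [])
            if role in self.entities[target][1]
        ]


graph = EntityGraph()
graph.add_entity("svc:checkout", "service", {"sre", "payments-dev"})
graph.add_entity("runbook:checkout-latency", "runbook", {"sre", "payments-dev"})
graph.add_entity("chat:incident-room", "chat", {"sre"})
graph.link("svc:checkout", "documented_by", "runbook:checkout-latency")
graph.link("svc:checkout", "discussed_in", "chat:incident-room")

# A payments developer sees the runbook but not the SRE-only chat.
print(graph.neighbors("svc:checkout", "payments-dev"))
```

Filtering at query time, rather than at ingestion, is one way a scoped query could respect per-entity permissions; whether Resolve does this is not stated in the content.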
Emerging pattern with potential to unlock new application categories.
They convert freeform user intent into executable artifacts (kubectl, PRs, scripts). Likely implemented with LLM-driven prompt-to-command/code generation plus environment-aware templating and safety checks to create working code and infra commands.
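The templating and safety-check stage described above might look roughly like the following sketch. It assumes an upstream model has already reduced the user's intent to an action name plus parameters; the `TEMPLATES` allowlist, `BLOCKED_VERBS`, and `render_command` are hypothetical names, not Resolve's implementation.

```python
import shlex

# Hypothetical allowlist: action name -> kubectl command template.
TEMPLATES = {
    "restart_deployment": "kubectl rollout restart deployment/{name} -n {namespace}",
    "show_pods": "kubectl get pods -n {namespace}",
}

# Verbs this sketch refuses to emit, whatever the template says.
BLOCKED_VERBS = {"delete", "drain", "exec"}


def render_command(action: str, **params: str) -> list:
    """Render an allowlisted template into an argv list, quoting parameters."""
    if action not in TEMPLATES:
        raise ValueError(f"action not allowlisted: {action}")
    # Quote each parameter so it can only ever occupy a single argument.
    quoted = {key: shlex.quote(value) for key, value in params.items()}
    argv = shlex.split(TEMPLATES[action].format(**quoted))
    if len(argv) > 1 and argv[1] in BLOCKED_VERBS:
        raise ValueError(f"blocked verb: {argv[1]}")
    return argv


command = render_command("restart_deployment", name="checkout", namespace="prod")
print(command)
```

Returning an argv list instead of a shell string means a parameter such as `prod; rm -rf /` stays one literal argument rather than becoming a second command.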
Emerging pattern with potential to unlock new application categories.
There are strong operational and policy guardrails described (access controls, no data mixing, no write access). However, the content does not explicitly describe a secondary LLM layer that checks or filters outputs. The guardrails appear emphasized at infra/policy level; model-level safety validators are plausible but not explicit.
Accelerates AI deployment in compliance-heavy industries. Creates new category of AI safety tooling.
Resolve describes an orchestration of specialized agents/models that handle different subtasks (investigation, telemetry parsing, remediation). This implies a router/orchestrator selecting among task-specific or fine-tuned models (an explicit micro-model mesh or ensemble of expert models).
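The router idea can be sketched as follows. This is an illustration under stated assumptions: the three agent functions are stand-ins for model calls, and keyword matching is a deliberately simple substitute for whatever selection logic Resolve actually uses, which the content does not describe.

```python
from typing import Callable


# Stand-ins for specialized agents; a real system would call a model here.
def investigation_agent(task: str) -> str:
    return f"investigation: {task}"


def telemetry_agent(task: str) -> str:
    return f"telemetry: {task}"


def remediation_agent(task: str) -> str:
    return f"remediation: {task}"


# Hypothetical routing table: keyword -> agent, checked in order.
ROUTES: list = [
    ("metric", telemetry_agent),
    ("log", telemetry_agent),
    ("rollback", remediation_agent),
    ("restart", remediation_agent),
]


def route(task: str) -> str:
    """Send the task to the first matching agent, else to investigation."""
    lowered = task.lower()
    for keyword, agent in ROUTES:
        if keyword in lowered:
            return agent(task)
    return investigation_agent(task)


print(route("Parse error logs for checkout"))
print(route("Rollback release 42"))
print(route("Why is checkout slow?"))
```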
Cost-effective AI deployment for mid-market. Creates opportunity for specialized model providers.
Resolve.ai builds on Anthropic models (Claude Sonnet 4.6 and Opus 4.6) and Anthropic infrastructure. The technical approach is described as hybrid.
Resolve claims exclusive per-customer fine-tuning/adaptation, but the content specifies no low-level technique (LoRA, delta tuning). The training data appears to be customer-specific operational data (runbooks, chats, incidents, telemetry-derived summaries), as implied by "Builds a living model...captures tribal knowledge" and "Your Data Trains Only Your Models".
Controller/orchestrator that spawns specialized agents which operate tools and query systems in parallel to test multiple hypotheses and build causal timelines; agents are tool-aware (can run kubectl, update JIRA, generate PRs).
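Parallel hypothesis testing of this kind can be sketched with `asyncio`. The hypotheses, signal names, and `check_hypothesis` helper below are made up for illustration; in a real system each check would be a tool call (a telemetry query, a `kubectl` read, and so on) rather than a set lookup.

```python
import asyncio

# Made-up signals the agents are assumed to have observed.
OBSERVED_SIGNALS = {"deploy_at_12:01", "error_rate_spike_at_12:03"}

# Hypothetical hypotheses: name -> the signal that would support it.
HYPOTHESES = {
    "bad deploy": "deploy_at_12:01",
    "database saturation": "db_cpu_above_90",
    "upstream outage": "upstream_5xx",
}


async def check_hypothesis(name: str, required_signal: str) -> tuple:
    """One 'agent': checks whether the supporting signal was observed."""
    await asyncio.sleep(0)  # stands in for an awaited tool call
    return name, required_signal in OBSERVED_SIGNALS


async def investigate() -> dict:
    """Controller: run every hypothesis check concurrently and collect verdicts."""
    results = await asyncio.gather(
        *(check_hypothesis(name, signal) for name, signal in HYPOTHESES.items())
    )
    return dict(results)


verdicts = asyncio.run(investigate())
print(verdicts)
```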
Co-creator of OpenTelemetry; led Splunk Observability as GM; Chief Architect at Splunk Observability; long-standing focus on production systems and observability; educated at University of Illinois Urbana-Champaign
Previously: Splunk, VMware
Co-creator of OpenTelemetry; led Splunk Observability as Chief Architect; experienced in enterprise software and production tooling; educated at University of Illinois Urbana-Champaign
Previously: Splunk, VMware
Strong: founders' backgrounds in OpenTelemetry and Splunk Observability align closely with building AI-driven production reliability and incident response platforms at scale.
content marketing
Target: enterprise
custom
hybrid
• Meir Amiel testimonial
• Logos: Coinbase, Zscaler, Toast (and others)
• MTTR reduction and production incident improvements
Automated on-call incident investigation and triage with AI agents
The emphasis on metadata-only access and configurable scraping frequency is distinct from many RAG implementations that default to ingesting wide swaths of data; this reduces data exposure and aligns retrieval with least-privilege principles.
Packaging a configurable satellite that can either scrape metadata or act as a proxy and explicitly enforce no-write/no-persist rules is a strong enterprise pattern for regulated environments; it's a pragmatic hybrid of on-prem control with cloud-managed models.
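The two satellite modes and the no-write rule can be illustrated with a small sketch. The `SatelliteConfig` fields, the split between metadata and payload fields, and the read-only method list are all assumptions; the content does not document Resolve's actual configuration format.

```python
from dataclasses import dataclass

# Hypothetical set of fields treated as metadata rather than raw payload.
METADATA_FIELDS = {"service", "timestamp", "severity", "trace_id"}
# HTTP methods the gateway lets through under a no-write rule.
READ_ONLY_METHODS = {"GET", "HEAD"}


@dataclass(frozen=True)
class SatelliteConfig:
    mode: str  # "metadata_only" or "proxy"
    scrape_interval_seconds: int = 300


def forward_record(config: SatelliteConfig, record: dict) -> dict:
    """In metadata-only mode, strip everything except metadata fields."""
    if config.mode == "metadata_only":
        return {key: value for key, value in record.items() if key in METADATA_FIELDS}
    return dict(record)


def allow_request(method: str) -> bool:
    """Enforce no-write: only read-only methods pass the gateway."""
    return method.upper() in READ_ONLY_METHODS


record = {"service": "checkout", "severity": "error", "message": "card declined for user 42"}
config = SatelliteConfig(mode="metadata_only")
print(forward_record(config, record))
print(allow_request("GET"), allow_request("POST"))
```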
The explicit claim of exclusive fine-tuning combined with a no-raw-data policy addresses a hard enterprise tension (models that learn from customers while preserving privacy) — the implementation details are critical and noteworthy if true.
Resolve.ai operates in a competitive landscape that includes PagerDuty, Datadog, Splunk (including VictorOps/On-Call capabilities).
Differentiation (vs. PagerDuty): Resolve.ai emphasizes autonomous multi-agent investigations that reason across code, telemetry and infrastructure and can take remediation actions (PRs, kubectl, scripts). It also markets a privacy-first architecture (no data ingestion, tenant-specific fine-tuning) and deeper code/system-level integrations beyond PagerDuty's incident routing and runbook automation.
Differentiation (vs. Datadog): Datadog focuses on metrics/traces/logs and dashboards; Resolve.ai layers agentic AI on top of observability to autonomously investigate incidents, build causal timelines linking code changes to telemetry, and operate tools. Resolve markets tighter integration with code, PR generation, and automated remediation workflows rather than primarily telemetry visualization.
Differentiation (vs. Splunk): Resolve.ai claims to be built specifically for autonomous production operations with multi-agent reasoning and agentic tool execution. Founders are ex-Splunk and OpenTelemetry co-creators, positioning Resolve as more focused on LLM agents that act across code/infra/telemetry rather than Splunk's broader data platform approach.
Privacy-first 'satellite' gateway that can be configured in metadata-only mode or act as a secure proxy — enabling deep runtime access without centralizing raw customer data. This is a distinct engineering tradeoff (edge proxy + ephemeral context) versus SaaS ingestion-first models.
Per-customer 'living model' and exclusive fine-tuning claims: they emphasize no cross-customer models and no raw-data ingestion while still offering continuous learning from runbooks/chats/incidents. That implies on-prem or customer-specific parameter-efficient fine-tuning (LoRA/adapter-style) or encrypted/ephemeral embedding strategies rather than standard multi-tenant training.
Multi-agent orchestration focused on parallel hypothesis-driven incident investigations: an orchestration layer that spawns specialized agents (telemetry, code, infra, knowledge), coordinates async evidence collection, and synthesizes causal timelines across traces, logs, and commits.
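The synthesis step, where evidence from several agents is merged into one timeline, can be sketched as below. The `Event` type and the sample events are made up, and ordering by timestamp is only a stand-in for real causal inference, which the content does not detail.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


@dataclass(frozen=True)
class Event:
    at: datetime
    source: str  # "trace", "log", or "commit"
    summary: str


def build_timeline(*streams: list) -> list:
    """Merge per-agent event streams into one chronological list."""
    return sorted(chain.from_iterable(streams), key=lambda event: event.at)


# Made-up sample evidence from three hypothetical agents.
commits = [Event(datetime(2024, 1, 1, 12, 1), "commit", "deploy checkout 1.4.2")]
logs = [Event(datetime(2024, 1, 1, 12, 3), "log", "error rate spike")]
traces = [Event(datetime(2024, 1, 1, 12, 2), "trace", "p99 latency doubles")]

timeline = build_timeline(logs, traces, commits)
for event in timeline:
    print(event.at.time(), event.source, event.summary)
```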
Tool-operating agents with action gating: the system not only analyzes but generates executable remediation (kubectl, PRs, Jira updates) under strict RBAC and SSO-controlled service accounts — coupling inference with audited effectors and safety constraints.
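Action gating with an audit trail could take a shape like the following sketch. The role names, permission sets, and `ActionGate` class are hypothetical; the content says only that actions run under RBAC and SSO-controlled service accounts, not how the checks are implemented.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical role -> permitted actions mapping.
ROLE_PERMISSIONS = {
    "viewer": {"read_logs"},
    "operator": {"read_logs", "open_pr", "update_ticket"},
    "admin": {"read_logs", "open_pr", "update_ticket", "run_kubectl"},
}


@dataclass
class ActionGate:
    """Checks the role before running an effector and records every attempt."""

    audit_log: list = field(default_factory=list)

    def execute(self, role: str, action: str, effector: Callable[[], str]) -> str:
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Record the attempt whether or not it is allowed.
        self.audit_log.append((role, action, "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{role} may not perform {action}")
        return effector()


gate = ActionGate()
print(gate.execute("operator", "open_pr", lambda: "pr-opened"))
try:
    gate.execute("viewer", "run_kubectl", lambda: "command-ran")
except PermissionError as error:
    print(error)
print(gate.audit_log)
```

Logging before the permission check raises means denied attempts are audited too, which is usually the part reviewers care about most.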
Deep integration with observability standards and lineage: co-creators of OpenTelemetry on the team signals a likely tight coupling to instrumentation/trace semantics (e.g., deterministic mappings from spans to service ownership and code commits), which reduces brittle mapping problems other vendors face.
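A deterministic mapping from span metadata to ownership and code could be sketched as follows. `service.name` and `service.version` are standard OpenTelemetry resource attributes; the `OWNERS` and `RELEASES` registries, the team name, and the commit hash are made-up placeholders, and nothing here is confirmed as Resolve's approach.

```python
# Hypothetical registries a platform team might maintain.
OWNERS = {"checkout": "payments-team"}
RELEASES = {("checkout", "1.4.2"): "a1b2c3d"}  # (service, version) -> commit


def resolve_span(resource_attributes: dict) -> dict:
    """Look up owner and commit from OpenTelemetry resource attributes."""
    service = resource_attributes.get("service.name")
    version = resource_attributes.get("service.version")
    return {
        "service": service,
        "owner": OWNERS.get(service),
        "commit": RELEASES.get((service, version)),
    }


span_resource = {"service.name": "checkout", "service.version": "1.4.2"}
print(resolve_span(span_resource))
```

Because the lookup keys come from instrumentation rather than from free-text matching, an unknown service or version yields `None` instead of a wrong guess.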
If Resolve.ai achieves its technical roadmap, it could become foundational infrastructure for the next generation of AI applications. Success here would accelerate the timeline for downstream companies to build reliable, production-grade AI products. Failure or pivot would signal continued fragmentation in the AI tooling landscape.
“Multi-agent system that connects code, services, infrastructure, and telemetry. Operates tools and reasons through complex problems like your expert engineers.”
“AI for prod. It works across your code, infrastructure, telemetry, and knowledge to help engineering teams run production more reliably and efficiently.”
“On-call 24/7 AI that is always debugging, triaging and debugging incidents with full context across telemetry, infra, code, and tools.”
“Autonomously investigates incidents and builds initial theories before your on-call engineer even looks. Gets you to root cause.”
“Executes remediation actions: generates Git PRs, kubectl commands, code fixes, or scripts that work with your setup.”
“AI agents deployed into complex, domain-specific workflows don't work out of the box. Here's why Forward Deployed Engineering is the critical path to enterprise AI adoption.”