Specialty Operations · Coming soon

DevOps / SRE

Infrastructure reliability, on-call response, incident management, deploy pipeline ownership, observability stack.

Runs the production-reliability layer: owns deploy pipelines, monitors SLOs, responds to incidents by executing runbooks, maintains observability instrumentation, and runs post-mortems. Distinct from the Automation Engineer, which handles workflow automation; this agent covers production infrastructure.

Built for

  • B2B SaaS in production
  • Founder with paying customers and uptime expectations
  • Agency hosting client infrastructure

Under the hood

Primary model

claude-sonnet-4-6

Auxiliary models

claude-opus-4-7

Vector store

pgvector

Multimodal

Text only

What it ships with

  • Deploy pipeline ownership
  • SLO monitoring with auto-escalation
  • Incident-response with runbook execution
  • Post-mortem authoring
  • Observability instrumentation
  • Cost monitoring on cloud spend
  • Security baseline maintenance
  • Disaster-recovery testing
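
As an illustration of what "SLO monitoring with auto-escalation" could look like, here is a minimal sketch based on error-budget burn rate. The page does not describe the agent's actual escalation logic; the `SloWindow` type, the function names, and the thresholds (`page_at`, `ticket_at`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def burn_rate(window: SloWindow, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly as fast
    as the SLO permits; higher values mean it is being spent faster.
    """
    if window.total_requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% target
    observed = window.failed_requests / window.total_requests
    return observed / error_budget


def escalation_level(rate: float, page_at: float = 10.0, ticket_at: float = 2.0) -> str:
    """Map a burn rate to an action. Thresholds are illustrative assumptions."""
    if rate >= page_at:
        return "page"
    if rate >= ticket_at:
        return "ticket"
    return "ok"
```

With a 99.9% target, 50 failures in 10,000 requests burns the budget at roughly five times the allowed rate, which this sketch would route to a ticket rather than a page.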

Primary responsibilities

  1. Deploy pipeline
  2. SLO monitoring
  3. Incident response
  4. Post-mortems
  5. Observability

Secondary responsibilities

  • Cost monitoring
  • DR testing

Workflows

  1. Loop 1

    Deploy: gate → ship → monitor → confirm

  2. Loop 2

    Incident: detect → page → execute runbook → post-mortem

  3. Loop 3

    Weekly: SLO review + DR drill
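
The deploy loop (gate → ship → monitor → confirm) can be sketched as a small control flow. This is a hedged illustration, not the agent's implementation: the callables `gate`, `ship`, `is_healthy`, and `rollback` are hypothetical hooks, and rolling back on a failed health check is an assumption, since the page does not say what happens when monitoring fails.

```python
from typing import Callable


def run_deploy(
    gate: Callable[[], bool],
    ship: Callable[[], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
    checks: int = 3,
) -> str:
    """Run one pass of the gate -> ship -> monitor -> confirm loop."""
    # Gate: refuse to ship if pre-deploy checks fail.
    if not gate():
        return "blocked"

    # Ship: perform the release.
    ship()

    # Monitor: poll health a fixed number of times after the release.
    for _ in range(checks):
        if not is_healthy():
            # Assumption: an unhealthy release is rolled back.
            rollback()
            return "rolled_back"

    # Confirm: every health check passed.
    return "confirmed"
```

In practice the hooks would wrap calls to the CI system and the monitoring stack; keeping them as plain callables makes the loop easy to exercise with stubs.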

How we measure it

  • MTTR on incidents
  • SLO attainment
  • Deploy frequency
  • Change failure rate
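
For readers who want the arithmetic behind three of these metrics, here is a minimal sketch. The `Incident` and `Deploy` record shapes are assumptions made for the example; the page does not specify how the agent stores or computes these values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    detected_at: datetime
    resolved_at: datetime


@dataclass
class Deploy:
    shipped_at: datetime
    caused_failure: bool


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: average of (resolved - detected)."""
    if not incidents:
        raise ValueError("MTTR is undefined with no incidents")
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Fraction of deploys that caused a failure in production."""
    if not deploys:
        raise ValueError("change failure rate is undefined with no deploys")
    return sum(d.caused_failure for d in deploys) / len(deploys)


def deploys_per_week(deploys: list[Deploy], period_days: int) -> float:
    """Deploy frequency, normalized to a seven-day week."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deploys) / (period_days / 7)
```

For example, two incidents lasting 30 and 90 minutes give an MTTR of one hour, and one failed deploy out of four gives a change failure rate of 0.25.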

Integrations

Tools this agent connects to. OAuth scopes are minimum-necessary by default.

  • aws
  • gcp
  • vercel
  • datadog
  • sentry
  • pagerduty
  • github-actions
  • terraform

Data sources

Information this agent reads at runtime. All scoped to your organization.

  • infrastructure-config
  • monitoring-streams
  • incident-history

Compliance

  • SOC2
  • ISO27001

ROI

How the math works

A DevOps engineer or SRE costs $180–280k fully loaded. On-call coverage typically requires several engineers in rotation.

Human equivalent: DevOps engineer or on-call SRE rotation ($180–280k each)

Risks & mitigations

What could go wrong

  • Auto-execution of destructive commands: mitigated by human-in-the-loop (HITL) approval on destructive operations, plus an audit log
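
One way the mitigation above could work is sketched below: commands that match a destructive pattern require an approval callback before they run, and every decision is appended to an audit log. The pattern list, the function names, and the log record shape are assumptions for illustration, not the product's actual rules.

```python
import re
from datetime import datetime, timezone
from typing import Callable

# Illustrative patterns only; a real deny-list would be broader and curated.
DESTRUCTIVE_PATTERNS = [
    r"\bterraform\s+destroy\b",
    r"\brm\s+-rf\b",
    r"\bdrop\s+(table|database)\b",
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)


def execute_with_hitl(
    command: str,
    run: Callable[[str], None],
    approve: Callable[[str], bool],
    audit_log: list[dict],
) -> bool:
    """Run a command, asking a human first if it looks destructive.

    Returns True if the command was executed, False if it was refused.
    """
    destructive = is_destructive(command)
    # Non-destructive commands skip the approval step entirely.
    approved = approve(command) if destructive else True

    # Record every decision, including refusals, before executing.
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "destructive": destructive,
        "approved": approved,
    })

    if approved:
        run(command)
    return approved
```

Writing the audit entry before execution means a refused or crashed command still leaves a record of what was attempted.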

Tags

#devops #sre #incident-response #observability #deploys
