# Reliability and SLA

Service Level Commitments

## Service Level Agreement

Sistava commits to 99.95% monthly availability for the webapp and API, which corresponds to about 21.6 minutes of allowable downtime in a 30-day month. Background work targets 99.9% availability, the marketing site targets 99.99%, and customer file storage targets eleven nines (99.999999999%) of durability through redundant object storage.
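The downtime figure follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 30-day month (the function name `downtime_budget_minutes` is illustrative and not part of any Sistava tooling):

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Return the allowable downtime in minutes for a given availability target.

    Assumes a month of `days` days; a 31-day month yields a slightly larger budget.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # The three availability targets named in the text above.
    for target in (99.95, 99.9, 99.99):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} min per 30-day month")
```

Under this assumption, the 99.9% target for background work allows about twice the downtime of the webapp and API target, and the 99.99% marketing site target allows about a fifth of it.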

Internally we steer to a higher Service Level Objective than the public SLA so issues surface before they affect customers. Latency commitments include p95 under 300 ms and p99 under 1 second on the API.
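For readers less familiar with percentile targets, the sketch below shows one way to check a batch of latency samples against the p95 and p99 thresholds above. It uses the nearest-rank percentile definition, which is an assumption made here; the source does not say how Sistava computes percentiles, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_targets(
    samples_ms: Sequence[float],
    p95_target_ms: float = 300,   # p95 under 300 ms, from the text above
    p99_target_ms: float = 1000,  # p99 under 1 second, from the text above
) -> bool:
    """Return True when both percentile targets are met for the given samples."""
    return (
        percentile(samples_ms, 95) < p95_target_ms
        and percentile(samples_ms, 99) < p99_target_ms
    )
```

A single slow request does not breach the targets on its own: with 100 samples, up to five may exceed 300 ms before the p95 check fails.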

## Monitoring and Alerting

The platform is instrumented with continuous monitoring across infrastructure, application, and business metrics. Critical incidents page our on-call rotation in real time through a dedicated alert channel. We target a mean time to detect of under 5 minutes, and a mean time to resolve of under 60 minutes for P1 incidents.
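As an illustration of how the detection and resolution targets above could be measured, here is a minimal sketch that computes both means from incident timestamps. The `Incident` record and its field names are hypothetical, and measuring time to resolve from the start of the incident (rather than from detection) is an assumption; the source does not define either.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, List


@dataclass
class Incident:
    started_at: datetime   # when the fault began (hypothetical field)
    detected_at: datetime  # when monitoring raised the alert (hypothetical field)
    resolved_at: datetime  # when service was restored (hypothetical field)


def _mean_minutes(deltas: Iterable[timedelta]) -> float:
    return mean(d.total_seconds() for d in deltas) / 60


def mttd_minutes(incidents: List[Incident]) -> float:
    """Mean time to detect, in minutes."""
    return _mean_minutes(i.detected_at - i.started_at for i in incidents)


def mttr_minutes(incidents: List[Incident]) -> float:
    """Mean time to resolve, in minutes, measured from incident start (an assumption)."""
    return _mean_minutes(i.resolved_at - i.started_at for i in incidents)
```

Comparing `mttd_minutes(...) < 5` and `mttr_minutes(...) < 60` over a reporting period would then indicate whether the stated targets were met for that period.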

A public status page at status.sista.ai shows current platform health and a history of incidents. Major incidents trigger customer communication through both the status page and direct email.

## Backups and Disaster Recovery

Customer data is backed up daily, and backups are retained for 14 days in a separate geographic failure domain. Continuous transaction-log archiving keeps our recovery point objective under 5 minutes for the primary database. Our recovery time objective for a full restore is under 4 hours, validated through a documented procedure.
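The two objectives measure different things: the recovery point objective bounds how much recent data could be lost, while the recovery time objective bounds how long a restore takes. A minimal sketch of that distinction, assuming three hypothetical timestamps taken from a restore drill (none of these names come from Sistava's procedure):

```python
from datetime import datetime, timedelta
from typing import Tuple

RPO = timedelta(minutes=5)  # recovery point objective from the text above
RTO = timedelta(hours=4)    # recovery time objective from the text above


def data_loss_window(last_archived_at: datetime, failure_at: datetime) -> timedelta:
    """Data written after the last archived transaction log is what a failure could lose."""
    return max(failure_at - last_archived_at, timedelta(0))


def within_objectives(
    last_archived_at: datetime,
    failure_at: datetime,
    restored_at: datetime,
) -> Tuple[bool, bool]:
    """Return (rpo_met, rto_met) for a single failure or drill."""
    rpo_met = data_loss_window(last_archived_at, failure_at) < RPO
    rto_met = (restored_at - failure_at) < RTO
    return rpo_met, rto_met
```

For example, a failure 3 minutes after the last archived log segment, restored 2.5 hours later, would satisfy both objectives under these definitions.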

Deployments are zero-downtime by default. Planned maintenance, when required, is announced at least 7 days in advance through the status page.

## Incident Response

Every incident is classified by severity, communicated through the status page, and followed by a post-mortem when material. We track incident frequency and recurrence as a key reliability metric and feed every learning back into platform hardening.

Service credits for SLA breaches are available to enterprise customers per contract. Contact security@sista.ai to discuss specific reliability requirements.

## What this means for customers

- Public 99.95% uptime SLA for the webapp and API
- Eleven nines of durability for customer file storage
- Recovery point under 5 minutes, recovery time under 4 hours
- Real-time monitoring with on-call rotation for critical incidents
- Public status page and incident communication