Monitoring
This guide covers strategies for monitoring your Palpo server so that it stays reliable and performant.
Key Metrics to Monitor
Server Health
Application Metrics
Database Metrics
Logging Configuration
Basic Logging
Configure logging in your palpo.toml:
Structured Logging
For production, use JSON format for easier parsing:
Log output:
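Once logs are emitted as JSON, each line can be parsed on its own, which makes quick ad hoc analysis straightforward. The following is a minimal Python sketch that counts log lines per severity. The `level` field name and the sample lines are assumptions for illustration, not Palpo's documented log schema; adjust them to match what your server actually writes.

```python
import json
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str], level_key: str = "level") -> Counter:
    """Count log lines per severity; lines that are not JSON objects go to 'unparsed'."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["unparsed"] += 1
            continue
        if not isinstance(record, dict):
            counts["unparsed"] += 1
            continue
        # Normalise case so "error" and "ERROR" land in the same bucket.
        counts[str(record.get(level_key, "unknown")).upper()] += 1
    return counts


# Hypothetical sample lines; real field names depend on your logging setup.
sample = [
    '{"level": "INFO", "message": "server started"}',
    '{"level": "error", "message": "request failed"}',
    "plain text line",
]
level_counts = count_levels(sample)
```

In practice the same function can be fed an open log file object, since iterating a file yields one line at a time.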
Log Rotation
Use logrotate for managing log files:
Health Checks
HTTP Health Endpoint
Check server health:
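A minimal Python sketch of such a check, using only the standard library, might look like the following. It treats any 2xx response as healthy. The URL in the comment is an assumption: it presumes Palpo serves the standard Matrix `/_matrix/client/versions` endpoint on a host and port of your choosing, so substitute whatever endpoint your deployment exposes.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no valid response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx or 5xx status.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed URL, and similar.
        return None


def is_healthy(status: Optional[int]) -> bool:
    """Treat any 2xx response as healthy; everything else, including no response, is not."""
    return status is not None and 200 <= status < 300


# Example (assumed URL, adjust to your deployment):
#   is_healthy(fetch_status("https://matrix.example.com/_matrix/client/versions"))
```

A wrapper script can exit non-zero when `is_healthy` returns `False`, which is the convention that cron jobs, systemd units, and container health checks generally rely on.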
Systemd Health Check
Docker Health Check
Prometheus Metrics
Exposing Metrics
Palpo can expose Prometheus-compatible metrics. Enable them in your configuration:
Common Metrics
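Prometheus-compatible endpoints serve metrics in a plain-text exposition format, one sample per line, which is easy to inspect by hand or with a short script. The sketch below parses that format into a dictionary. The metric names in the sample are hypothetical placeholders, not a list of the metrics Palpo actually exports, and the parser deliberately ignores edge cases such as `}` inside label values.

```python
import re
from typing import Dict

# name, optional {labels}, then the value; a trailing timestamp is ignored.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text: str) -> Dict[str, float]:
    """Map 'name{labels}' to its value, skipping # HELP and # TYPE comment lines."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        try:
            samples[name + (labels or "")] = float(value)
        except ValueError:
            continue  # value was not a number; skip the malformed line
    return samples


# Hypothetical exposition text; real metric names come from your server.
exposition = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{status="200"} 1027
http_requests_total{status="500"} 3
process_resident_memory_bytes 5.24288e+07
"""
metrics = parse_metrics(exposition)
```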
Prometheus Configuration
Grafana Dashboard
Key Panels
- Request Rate: requests per second
- Response Time: p50, p95, and p99 latencies
- Error Rate: 4xx and 5xx responses
- Active Users: concurrent connected users
- Federation Health: queue size and delivery rate
- Resource Usage: CPU, memory, and disk
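To make the response-time and error-rate panels concrete, here is a small Python sketch of how those figures can be derived from raw samples. It uses the nearest-rank definition of a percentile, which is one of several common definitions; Prometheus and Grafana typically estimate percentiles from histogram buckets instead, so treat this as an illustration of the concept rather than of what the dashboard computes.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to avoid floating-point error in the rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def error_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses whose status is 4xx or 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 600)
    return errors / len(status_codes)


# Illustrative latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

The gap between p50 and p99 is often more informative than either number alone: a stable median with a rising p99 points to a subset of slow requests rather than a uniformly slow server.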
Sample Dashboard JSON
Alerting
Alert Examples
High Error Rate:
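The logic behind an error-rate alert can be sketched in Python as follows. Request and error totals are usually exported as ever-increasing counters, so the rate over a window is the ratio of their increases between two scrapes. The 5% threshold and the function names are illustrative assumptions, not values taken from Palpo.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A lower current value means the process restarted and the counter
    began again from zero, so the current value is the whole increase.
    """
    return current if current < previous else current - previous


def error_ratio(prev_total: float, cur_total: float,
                prev_errors: float, cur_errors: float) -> float:
    """Share of requests in the window that were errors."""
    total = counter_increase(prev_total, cur_total)
    if total == 0:
        return 0.0  # no traffic in the window, so nothing to alert on
    return counter_increase(prev_errors, cur_errors) / total


def high_error_rate(prev_total: float, cur_total: float,
                    prev_errors: float, cur_errors: float,
                    threshold: float = 0.05) -> bool:
    """True when the error share exceeds the (assumed) 5% threshold."""
    return error_ratio(prev_total, cur_total, prev_errors, cur_errors) > threshold
```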
Slow Response Time:
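A single slow measurement rarely justifies paging someone, so latency alerts are commonly required to hold for several consecutive evaluations before they fire. The Python sketch below shows that idea. The sample values, the one-second threshold, and the three-sample window are assumptions chosen for illustration.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], threshold: float,
                     min_consecutive: int) -> bool:
    """True if the most recent min_consecutive samples all exceed threshold."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        return False  # not enough history yet to call it sustained
    return all(value > threshold for value in samples[-min_consecutive:])


# Hypothetical p95 latencies in seconds, one per evaluation interval.
recent_p95 = [0.2, 1.5, 1.6, 1.7]
should_alert = sustained_breach(recent_p95, threshold=1.0, min_consecutive=3)
```

Raising `min_consecutive` trades detection speed for fewer false alarms; the right balance depends on how quickly slow responses start to hurt your users.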
Federation Queue Growing:
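Whether a queue is growing is a question about its trend rather than its current size. One simple way to estimate the trend, sketched below in Python, is the least-squares slope of recent queue sizes against their sample index. The minimum slope is an assumed tuning parameter, and the sample values are made up.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def queue_growing(sizes: Sequence[float], min_slope: float = 1.0) -> bool:
    """True when the queue gains at least min_slope items per sample on average."""
    return slope(sizes) >= min_slope


# Hypothetical queue sizes sampled at a fixed interval.
growth = slope([10, 20, 30, 40])
```

Fitting a line smooths over single spikes, so a brief burst that drains again produces a slope near zero instead of triggering the alert.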
Low Disk Space:
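For hosts without a metrics pipeline, a disk-space check can also be scripted directly. The Python sketch below uses `shutil.disk_usage` from the standard library. The 10% default is an assumed threshold, and the path to check depends on where your deployment keeps its data.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing path that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report a zero total
    return usage.free / usage.total


def low_disk_space(path: str, min_free: float = 0.10) -> bool:
    """True when less than min_free (assumed default: 10%) of the disk is free."""
    return free_fraction(path) < min_free
```

Check the filesystem that holds the database and media storage, not only the root filesystem, since those are usually the ones that fill up.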
External Monitoring
Uptime Monitoring Services
Monitor your server's availability from outside your own infrastructure with a service such as:
- UptimeRobot
- Pingdom
- StatusCake
- Better Uptime
Endpoint to monitor:
Federation Tester
Test federation connectivity:
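One way to script this check is to query the public Matrix Federation Tester and read its JSON report. The Python sketch below rests on two assumptions that should be verified before relying on it: that the report is served at `https://federationtester.matrix.org/api/report` with a `server_name` query parameter, and that the report contains a top-level boolean field named `FederationOK`.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the public tester's JSON API.
TESTER_API = "https://federationtester.matrix.org/api/report"


def report_url(server_name: str) -> str:
    """Build the report URL for a given Matrix server name."""
    return TESTER_API + "?" + urllib.parse.urlencode({"server_name": server_name})


def federation_ok(report: dict) -> bool:
    """True only if the report explicitly says federation works (assumed field name)."""
    return report.get("FederationOK") is True


def fetch_report(server_name: str, timeout: float = 30.0) -> dict:
    """Download and decode the report; raises on network or HTTP errors."""
    with urllib.request.urlopen(report_url(server_name), timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a network request):
#   federation_ok(fetch_report("example.com"))
```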
Troubleshooting with Monitoring
High CPU Usage
- Check for slow queries in the database
- Review active requests
- Look for federation issues
- Check for runaway processes
Memory Leaks
- Monitor memory over time
- Check for growing connection pools
- Review long-running operations
- Consider a scheduled restart as a stopgap if needed
Slow Responses
- Check database query times
- Review disk I/O
- Check network latency
- Look for lock contention
Federation Issues
- Monitor federation queue size
- Check destination server health
- Review error logs for specific failures
- Reset problematic connections via the Admin API
Best Practices
- Set up alerts before problems occur: don't wait for users to report issues
- Monitor trends, not just thresholds: a gradual increase may indicate a developing problem
- Keep historical data: it is useful for capacity planning and debugging
- Document your monitoring setup: others need to understand and maintain it
- Test your alerts: ensure they fire when expected
- Have runbooks: document response procedures for common alerts