# Metrics Configuration

Sockudo can expose performance metrics that can be scraped by monitoring systems like Prometheus. This allows you to observe the server's behavior, track performance, and set up alerts.

Metrics configuration is managed under the `metrics` object in your `config.json`.
## Main Metrics Settings

- JSON Key (Parent): `metrics`

### `metrics.enabled`

- JSON Key: `enabled`
- Environment Variable: `METRICS_ENABLED` (set to `true` or `1`)
- Type: `boolean`
- Description: Enables or disables the metrics exposition system. If disabled, the metrics endpoint will not be available.
- Default Value: `true`
### `metrics.driver`

- JSON Key: `driver`
- Environment Variable: `METRICS_DRIVER`
- Type: `enum` (string)
- Description: Specifies the metrics system driver to use.
- Default Value: `"prometheus"`
- Possible Values:
  - `"prometheus"`: Exposes metrics in a Prometheus-compatible format.
### `metrics.host`

- JSON Key: `host`
- Environment Variable: `METRICS_HOST`
- Type: `string`
- Description: The IP address the metrics server will listen on. Use `0.0.0.0` to listen on all available network interfaces.
- Default Value: `"0.0.0.0"`
### `metrics.port`

- JSON Key: `port`
- Environment Variable: `METRICS_PORT`
- Type: `integer` (u16)
- Description: The port number the metrics server will listen on. This is typically different from the main application port.
- Default Value: `9601`
**Example (`config.json`):**

```json
{
  "metrics": {
    "enabled": true,
    "driver": "prometheus",
    "host": "0.0.0.0",
    "port": 9601,
    "prometheus": {
      "prefix": "sockudo_"
    }
  }
}
```
**Example (Environment Variables):**

```bash
export METRICS_ENABLED=true
export METRICS_HOST="0.0.0.0"
export METRICS_PORT=9601
```
## Prometheus Configuration (`metrics.prometheus`)

These settings are applicable if `metrics.driver` is set to `"prometheus"`.

- JSON Key (Parent Object): `metrics.prometheus`

### `metrics.prometheus.prefix`

- JSON Key: `prefix`
- Environment Variable: `PROMETHEUS_METRICS_PREFIX`
- Type: `string`
- Description: A prefix added to all metric names exposed by Sockudo. Useful for namespacing in a shared Prometheus instance.
- Default Value: `"sockudo_"`
**Example (`config.json`):**

```json
{
  "metrics": {
    "enabled": true,
    "driver": "prometheus",
    "host": "0.0.0.0",
    "port": 9601,
    "prometheus": {
      "prefix": "my_company_sockudo_"
    }
  }
}
```
**Environment Variables:**

```bash
METRICS_ENABLED=true
METRICS_DRIVER=prometheus
METRICS_HOST="0.0.0.0"
METRICS_PORT=9601
PROMETHEUS_METRICS_PREFIX="my_company_sockudo_"
```
## Available Metrics

Sockudo exposes metrics for monitoring various aspects of the server's performance. The following metrics are currently implemented and available for use:
### Connection Metrics

- `sockudo_connected`: Current number of active WebSocket connections
- `sockudo_new_connections_total`: Total number of WebSocket connections established
- `sockudo_new_disconnections_total`: Total number of WebSocket disconnections
- `sockudo_connection_errors_total`: Total number of connection errors
### Message Metrics

- `sockudo_ws_messages_sent_total`: Total number of messages sent by the server
- `sockudo_ws_messages_received_total`: Total number of messages received from clients
- `sockudo_socket_transmitted_bytes`: Total bytes transmitted via WebSocket connections
- `sockudo_socket_received_bytes`: Total bytes received via WebSocket connections
### HTTP API Metrics

- `sockudo_http_calls_received_total`: Total number of HTTP API requests
- `sockudo_http_received_bytes`: Total bytes received by the HTTP API
- `sockudo_http_transmitted_bytes`: Total bytes sent by the HTTP API
### Channel Metrics

- `sockudo_active_channels`: Current number of active channels
- `sockudo_channel_subscriptions_total`: Total number of channel subscriptions
- `sockudo_channel_unsubscriptions_total`: Total number of channel unsubscriptions
### Rate Limiting Metrics

- `sockudo_rate_limit_triggered_total`: Number of times rate limits were triggered
- `sockudo_rate_limit_checks_total`: Total number of rate limit checks performed
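These two counters can be combined into a rejection ratio. A minimal PromQL sketch, assuming the default `sockudo_` prefix:

```
# Share of rate limit checks that triggered a limit over the last 5 minutes
sum(rate(sockudo_rate_limit_triggered_total[5m]))
  / sum(rate(sockudo_rate_limit_checks_total[5m]))
```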
### Horizontal Adapter Metrics

- `sockudo_horizontal_adapter_resolve_time`: Resolve time for requests to other nodes (histogram)
- `sockudo_horizontal_adapter_resolved_promises`: Promises fulfilled by other nodes
- `sockudo_horizontal_adapter_uncomplete_promises`: Promises not entirely fulfilled by other nodes
- `sockudo_horizontal_adapter_sent_requests`: Total requests sent to other nodes
- `sockudo_horizontal_adapter_received_requests`: Total requests received from other nodes
- `sockudo_horizontal_adapter_received_responses`: Total responses received from other nodes
### Broadcast Performance Metrics (v2.6.1+)

- `sockudo_broadcast_latency_ms`: End-to-end latency for broadcast messages in milliseconds
  - Type: Histogram
  - Description: Measures the complete time from when a broadcast is initiated until it is delivered to all recipients
  - Labels:
    - `app_id`: The application identifier
    - `port`: The server port
    - `channel_type`: Type of channel (`public`, `private`, `presence`, `encrypted`)
    - `recipient_count_bucket`: Size category of the broadcast:
      - `xs`: 1-10 recipients
      - `sm`: 11-100 recipients
      - `md`: 101-1000 recipients
      - `lg`: 1001-10000 recipients
      - `xl`: 10000+ recipients
  - Histogram Buckets (in milliseconds): 0.5, 1.0, 2.5, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0
  - Use Cases:
    - Monitor broadcast performance across different channel sizes
    - Identify performance degradation for large broadcasts
    - Track latency distribution patterns
    - Set up alerts for slow broadcasts
## Planned Metrics

The following metrics are planned for future releases but are not currently implemented. They are listed here to provide visibility into the roadmap and help you plan your monitoring strategy.

**Note:** These metrics are not available yet. Attempting to query them will result in no data. The examples and alert rules in this documentation focus only on available metrics.
### Message Processing (Planned)

- `sockudo_client_events_total`: Total number of client events processed
- `sockudo_broadcast_messages_total`: Total number of messages broadcast to channels

### HTTP API Performance (Planned)

- `sockudo_http_request_duration_seconds`: HTTP request duration histogram
- `sockudo_http_response_size_bytes`: HTTP response size histogram

### Channel Presence (Planned)

- `sockudo_presence_members`: Current number of members in presence channels

### Queue Metrics (Planned)

- `sockudo_queue_jobs_processed_total`: Total number of queue jobs processed
- `sockudo_queue_jobs_failed_total`: Total number of failed queue jobs
- `sockudo_queue_active_jobs`: Current number of jobs in the queue
- `sockudo_queue_job_duration_seconds`: Queue job processing time histogram

### Webhook Metrics (Planned)

- `sockudo_webhooks_sent_total`: Total number of webhooks sent
- `sockudo_webhooks_failed_total`: Total number of failed webhooks
- `sockudo_webhook_duration_seconds`: Webhook request duration histogram

### Cache Metrics (Planned)

- `sockudo_cache_hits_total`: Total number of cache hits
- `sockudo_cache_misses_total`: Total number of cache misses
- `sockudo_cache_operations_total`: Total number of cache operations
- `sockudo_cache_memory_usage_bytes`: Current cache memory usage

### Adapter Metrics (Planned)

- `sockudo_adapter_operations_total`: Total number of adapter operations
- `sockudo_adapter_errors_total`: Total number of adapter errors
- `sockudo_adapter_latency_seconds`: Adapter operation latency histogram
## Accessing Metrics

When enabled, metrics are available at the following endpoint:

```
http://<metrics.host>:<metrics.port>/metrics
```

For example, with default settings: `http://localhost:9601/metrics`
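A quick way to confirm the endpoint is responding, assuming the default host, port, and `sockudo_` prefix:

```bash
# Fetch the metrics page and show the first few Sockudo series
curl -s http://localhost:9601/metrics | grep '^sockudo_' | head
```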
### Example Metrics Output

```
# HELP sockudo_connected Current number of active connections
# TYPE sockudo_connected gauge
sockudo_connected{app_id="demo-app",port="6001"} 42

# HELP sockudo_ws_messages_sent_total Total messages sent
# TYPE sockudo_ws_messages_sent_total counter
sockudo_ws_messages_sent_total{app_id="demo-app",port="6001"} 1234

# HELP sockudo_broadcast_latency_ms End-to-end latency for broadcast messages in milliseconds
# TYPE sockudo_broadcast_latency_ms histogram
sockudo_broadcast_latency_ms_bucket{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md",le="1"} 850
sockudo_broadcast_latency_ms_bucket{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md",le="2.5"} 920
sockudo_broadcast_latency_ms_bucket{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md",le="5"} 980
sockudo_broadcast_latency_ms_bucket{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md",le="10"} 995
sockudo_broadcast_latency_ms_bucket{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md",le="+Inf"} 1000
sockudo_broadcast_latency_ms_sum{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md"} 2341.5
sockudo_broadcast_latency_ms_count{app_id="demo-app",port="6001",channel_type="public",recipient_count_bucket="md"} 1000
```
## Security Considerations

### Network Access

The metrics endpoint should be secured and only accessible to monitoring systems. Binding to `127.0.0.1` limits access to the local machine:

```json
{
  "metrics": {
    "host": "127.0.0.1",
    "port": 9601
  }
}
```
### Firewall Configuration

Configure your firewall to restrict access to the metrics port:

```bash
# Allow only the monitoring server
iptables -A INPUT -p tcp --dport 9601 -s 10.0.1.100 -j ACCEPT
iptables -A INPUT -p tcp --dport 9601 -j DROP
```
### Reverse Proxy Protection

Use a reverse proxy to add authentication:

```nginx
server {
    listen 9602;

    location /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:9601/metrics;
    }
}
```
## Integration with Monitoring Systems

### Prometheus Configuration

Add a scrape job to your `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'sockudo'
    static_configs:
      - targets: ['localhost:9601']
        labels:
          instance: 'sockudo-1'
          environment: 'production'
    scrape_interval: 15s
    metrics_path: /metrics
    scheme: http
```
For multiple Sockudo instances:

```yaml
scrape_configs:
  - job_name: 'sockudo'
    static_configs:
      - targets:
          - 'sockudo-1.example.com:9601'
          - 'sockudo-2.example.com:9601'
          - 'sockudo-3.example.com:9601'
        labels:
          environment: 'production'
    scrape_interval: 15s
    metrics_path: /metrics
```
### Kubernetes Service Discovery

```yaml
scrape_configs:
  - job_name: 'sockudo'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: sockudo
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
```
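For these relabel rules to pick up your pods, the Sockudo pods need a matching label and annotations. A minimal sketch of the pod template metadata (the surrounding Deployment manifest is assumed, not shown):

```yaml
# Illustrative pod template metadata; the label and annotations
# correspond to the keep/replace rules in the scrape config above.
metadata:
  labels:
    app: sockudo
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9601"
```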
### Docker Compose with Prometheus

```yaml
version: '3.8'

services:
  sockudo:
    image: sockudo/sockudo:latest
    ports:
      - "6001:6001"
      - "9601:9601"
    environment:
      - METRICS_ENABLED=true
    labels:
      - "prometheus.io/scrape=true"
      - "prometheus.io/port=9601"

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
```
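Inside the Compose network, Prometheus reaches Sockudo by its service name rather than `localhost`. A minimal sketch of the mounted `prometheus.yml` for this setup:

```yaml
# Minimal scrape config for the Compose example above;
# 'sockudo:9601' resolves through the Compose network's service DNS.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'sockudo'
    static_configs:
      - targets: ['sockudo:9601']
```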
### Grafana Dashboard

#### Key Panels for a Sockudo Dashboard

- Connection Overview
  - Active connections gauge
  - Connection rate graph
  - Connections per app
- Message Throughput
  - Messages sent/received rate
  - Client events rate
  - Broadcast message rate
- HTTP API Performance
  - Request rate
  - Response time percentiles
  - Error rate
- Channel Activity
  - Active channels
  - Subscription/unsubscription rates
  - Presence channel members
- System Health
  - Rate limit triggers
  - Queue depth (if using queues)
  - Cache hit rate
  - Webhook success rate
#### Example Grafana Queries

```
# Active connections per app
sockudo_connected

# Message rate (5-minute average)
rate(sockudo_ws_messages_sent_total[5m])

# HTTP API calls rate
rate(sockudo_http_calls_received_total[5m])

# Broadcast latency percentiles by recipient count bucket (v2.6.1+)
histogram_quantile(0.50, sum by (le, recipient_count_bucket) (rate(sockudo_broadcast_latency_ms_bucket[5m])))
histogram_quantile(0.95, sum by (le, recipient_count_bucket) (rate(sockudo_broadcast_latency_ms_bucket[5m])))
histogram_quantile(0.99, sum by (le, recipient_count_bucket) (rate(sockudo_broadcast_latency_ms_bucket[5m])))

# Average broadcast latency by channel type
sum by (channel_type) (rate(sockudo_broadcast_latency_ms_sum[5m]))
  / sum by (channel_type) (rate(sockudo_broadcast_latency_ms_count[5m]))

# Broadcast latency heatmap (for Grafana heatmap panel)
rate(sockudo_broadcast_latency_ms_bucket[5m])
```
## Alerting Rules

### Prometheus Alerting Rules

```yaml
groups:
  - name: sockudo_alerts
    rules:
      - alert: SockudoHighConnections
        expr: sockudo_connected > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count on {{ $labels.instance }}"
          description: "Active connections: {{ $value }}"

      - alert: SockudoInstanceDown
        expr: up{job="sockudo"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Sockudo instance down"
          description: "Instance {{ $labels.instance }} is down"

      # Broadcast Performance Alerts (v2.6.1+)
      - alert: SockudoHighBroadcastLatency
        expr: histogram_quantile(0.95, rate(sockudo_broadcast_latency_ms_bucket[5m])) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High broadcast latency on {{ $labels.instance }}"
          description: "95th percentile broadcast latency is {{ $value }}ms for {{ $labels.recipient_count_bucket }} recipient bucket"

      - alert: SockudoVeryHighBroadcastLatency
        expr: histogram_quantile(0.99, rate(sockudo_broadcast_latency_ms_bucket[5m])) > 500
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Very high broadcast latency on {{ $labels.instance }}"
          description: "99th percentile broadcast latency is {{ $value }}ms for {{ $labels.recipient_count_bucket }} recipient bucket"
```
## Performance Impact

### Metrics Collection Overhead

- CPU: Minimal overhead (typically < 1%)
- Memory: Small memory footprint for metric storage
- Network: The metrics endpoint is only accessed when scraped

### Optimization Tips

- Adjust the scrape interval based on your needs (15s-60s is typical)
- Use recording rules for complex queries (see the sketch below)
- Monitor metrics cardinality to avoid high-cardinality labels
- Configure appropriate retention for historical data
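As a sketch of the recording-rule suggestion above, this rule precomputes the 95th percentile broadcast latency so dashboards can query it cheaply (the rule name is illustrative):

```yaml
# Illustrative Prometheus recording rule: precompute p95 broadcast latency
# per recipient bucket from the raw histogram on each evaluation interval.
groups:
  - name: sockudo_recording_rules
    rules:
      - record: sockudo:broadcast_latency_ms:p95
        expr: histogram_quantile(0.95, sum by (le, recipient_count_bucket) (rate(sockudo_broadcast_latency_ms_bucket[5m])))
```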
## Troubleshooting

### Common Issues

#### Metrics Endpoint Not Accessible

- Check that metrics are enabled: `"enabled": true`
- Verify the host and port configuration
- Check firewall rules
- Test the endpoint: `curl http://localhost:9601/metrics`

#### No Metrics Data

- Verify Sockudo is receiving traffic
- Check the metric prefix configuration
- Ensure Prometheus is scraping correctly
- Check Sockudo logs for metric errors

#### High Cardinality Issues

- Monitor the number of unique label combinations
- Avoid user IDs or session IDs as labels
- Use histogram buckets appropriately
- Consider metric sampling for high-volume metrics
Debug Commands
# Check if metrics endpoint is working
curl http://localhost:9601/metrics | head -20
# Check specific metrics
curl http://localhost:9601/metrics | grep sockudo_active_connections
# Verify Prometheus scraping
curl http://prometheus:9090/api/v1/targets
# Check metrics in Prometheus
curl 'http://prometheus:9090/api/v1/query?query=sockudo_active_connections'
## Best Practices

### Metrics Design

- Use consistent naming conventions
- Include relevant labels (`app_id`, `instance`, etc.)
- Avoid high-cardinality labels
- Use appropriate metric types (counter, gauge, histogram)

### Monitoring Strategy

- Monitor key business metrics (connections, messages)
- Set up meaningful alerts with appropriate thresholds
- Use dashboards for operational visibility
- Review metrics and alerts regularly

### Security

- Restrict access to the metrics endpoint
- Use authentication in sensitive environments
- Monitor metrics access logs
- Keep the monitoring stack up to date with security patches
The metrics system provides valuable insights into Sockudo's performance and health, enabling proactive monitoring and troubleshooting of your real-time messaging infrastructure.