guide/configuration/metrics.md

Metrics Configuration

Sockudo can expose performance metrics that can be scraped by monitoring systems like Prometheus. This allows you to observe the server's behavior, track performance, and set up alerts.

Metrics configuration is managed under the metrics object in your config.json.

Main Metrics Settings

JSON Key (Parent): metrics

`metrics.enabled`

JSON Key: enabled
Environment Variable: METRICS_ENABLED (Set to true or 1)
Type: boolean
Description: Enables or disables the metrics exposition system. If disabled, the metrics endpoint will not be available.
Default Value: true

`metrics.driver`

JSON Key: driver
Environment Variable: METRICS_DRIVER
Type: enum (string)
Description: Specifies the metrics system driver to use.
Default Value: "prometheus"
Possible Values:
- "prometheus": Exposes metrics in a Prometheus-compatible format.

`metrics.host`

JSON Key: host
Environment Variable: METRICS_HOST
Type: string
Description: The IP address the metrics server will listen on. Use 0.0.0.0 to listen on all available network interfaces.
Default Value: "0.0.0.0"

`metrics.port`

JSON Key: port
Environment Variable: METRICS_PORT
Type: integer (u16)
Description: The port number the metrics server will listen on. This is typically different from the main application port.
Default Value: 9601
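
Because the port is a u16, a quick range check before starting the server can catch a mistyped METRICS_PORT. The helper below is a minimal sketch and not part of Sockudo: the function name `valid_port` is hypothetical, and it rejects 0 on the assumption that you want a fixed, known port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a whole number in 1..65535.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Usage: fall back to the documented default when METRICS_PORT is unset.
if valid_port "${METRICS_PORT:-9601}"; then
  echo "metrics port ok: ${METRICS_PORT:-9601}"
else
  echo "invalid METRICS_PORT: ${METRICS_PORT}" >&2
fi
```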

Example (config.json):

json

{
  "metrics": {
    "enabled": true,
    "driver": "prometheus",
    "host": "0.0.0.0",
    "port": 9601,
    "prometheus": {
      "prefix": "sockudo_"
    }
  }
}

Example (Environment Variables):

bash

export METRICS_ENABLED=true
export METRICS_HOST="0.0.0.0"
export METRICS_PORT=9601

Prometheus Configuration (`metrics.prometheus`)

These settings are applicable if metrics.driver is set to "prometheus".

JSON Key (Parent Object): metrics.prometheus

`metrics.prometheus.prefix`

JSON Key: prefix
Environment Variable: PROMETHEUS_METRICS_PREFIX
Type: string
Description: A prefix that will be added to all metric names exposed by Sockudo. Useful for namespacing in a shared Prometheus instance.
Default Value: "sockudo_"

Example (config.json):

json

{
  "metrics": {
    "enabled": true,
    "driver": "prometheus",
    "host": "0.0.0.0",
    "port": 9601,
    "prometheus": {
      "prefix": "my_company_sockudo_"
    }
  }
}

Example (Environment Variables):

bash

METRICS_ENABLED=true
METRICS_DRIVER=prometheus
METRICS_HOST="0.0.0.0"
METRICS_PORT=9601
PROMETHEUS_METRICS_PREFIX="my_company_sockudo_"
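
Changing the prefix changes every metric name, so dashboards and alert expressions need the same prefix. The sketch below assumes the prefix is prepended verbatim to a base name such as `active_connections` (inferred from the default names in this guide); `metric_name` is a hypothetical helper, not a Sockudo command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the full metric name for a base name, using the
# configured prefix or the documented default "sockudo_".
metric_name() {
  printf '%s%s\n' "${PROMETHEUS_METRICS_PREFIX:-sockudo_}" "$1"
}

metric_name active_connections
# With PROMETHEUS_METRICS_PREFIX="my_company_sockudo_" this prints
# my_company_sockudo_active_connections
```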

Available Metrics

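Sockudo exposes metrics covering connections, messages, the HTTP API, channels, rate limiting, queues, webhooks, the cache, and the adapter. The names below use the default prefix sockudo_; a custom metrics.prometheus.prefix replaces it.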

Connection Metrics

sockudo_active_connections: Current number of active WebSocket connections
sockudo_total_connections: Total number of WebSocket connections established
sockudo_connection_errors_total: Total number of connection errors
sockudo_connections_per_app: Active connections per application

Message Metrics

sockudo_messages_sent_total: Total number of messages sent by the server
sockudo_messages_received_total: Total number of messages received from clients
sockudo_client_events_total: Total number of client events processed
sockudo_broadcast_messages_total: Total number of messages broadcast to channels

HTTP API Metrics

sockudo_http_requests_total: Total number of HTTP API requests
sockudo_http_request_duration_seconds: HTTP request duration histogram
sockudo_http_response_size_bytes: HTTP response size histogram

Channel Metrics

sockudo_active_channels: Current number of active channels
sockudo_channel_subscriptions_total: Total number of channel subscriptions
sockudo_channel_unsubscriptions_total: Total number of channel unsubscriptions
sockudo_presence_members: Current number of members in presence channels

Rate Limiting Metrics

sockudo_rate_limit_triggered_total: Number of times rate limits were triggered
sockudo_rate_limit_checks_total: Total number of rate limit checks performed

Queue Metrics (if queue is enabled)

sockudo_queue_jobs_processed_total: Total number of queue jobs processed
sockudo_queue_jobs_failed_total: Total number of failed queue jobs
sockudo_queue_active_jobs: Current number of jobs in the queue
sockudo_queue_job_duration_seconds: Queue job processing time histogram

Webhook Metrics

sockudo_webhooks_sent_total: Total number of webhooks sent
sockudo_webhooks_failed_total: Total number of failed webhooks
sockudo_webhook_duration_seconds: Webhook request duration histogram

Cache Metrics

sockudo_cache_hits_total: Total number of cache hits
sockudo_cache_misses_total: Total number of cache misses
sockudo_cache_operations_total: Total number of cache operations
sockudo_cache_memory_usage_bytes: Current cache memory usage

Adapter Metrics

sockudo_adapter_operations_total: Total number of adapter operations
sockudo_adapter_errors_total: Total number of adapter errors
sockudo_adapter_latency_seconds: Adapter operation latency histogram

Accessing Metrics

When enabled, metrics are available at the following endpoint:

http://<metrics.host>:<metrics.port>/metrics

For example, with default settings: http://localhost:9601/metrics
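
Note that the default host, 0.0.0.0, is a bind address rather than a destination, which is why the example above uses localhost. A minimal sketch of deriving the URL from the same environment variables follows; `metrics_url` is a hypothetical helper, and mapping 0.0.0.0 to 127.0.0.1 assumes you are checking from the machine that runs Sockudo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the scrape URL from METRICS_HOST and METRICS_PORT.
metrics_url() {
  local host="${METRICS_HOST:-0.0.0.0}"
  local port="${METRICS_PORT:-9601}"
  # 0.0.0.0 means "all interfaces" when binding; use loopback for local checks.
  if [ "$host" = "0.0.0.0" ]; then
    host="127.0.0.1"
  fi
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

metrics_url
```

The result can be passed straight to curl, for example `curl -s "$(metrics_url)" | head -20`.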

Example Metrics Output

# HELP sockudo_active_connections Current number of active connections
# TYPE sockudo_active_connections gauge
sockudo_active_connections{app_id="demo-app"} 42

# HELP sockudo_messages_sent_total Total messages sent
# TYPE sockudo_messages_sent_total counter
sockudo_messages_sent_total{app_id="demo-app",channel_type="public"} 1234

# HELP sockudo_http_request_duration_seconds HTTP request duration
# TYPE sockudo_http_request_duration_seconds histogram
sockudo_http_request_duration_seconds_bucket{method="POST",endpoint="/events",le="0.1"} 100
sockudo_http_request_duration_seconds_bucket{method="POST",endpoint="/events",le="0.5"} 150
sockudo_http_request_duration_seconds_bucket{method="POST",endpoint="/events",le="1.0"} 200
sockudo_http_request_duration_seconds_sum{method="POST",endpoint="/events"} 45.2
sockudo_http_request_duration_seconds_count{method="POST",endpoint="/events"} 200
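
Each sample line is a metric name, an optional label set in braces, and a value. As a rough illustration of reading that format without Prometheus, the sketch below sums one metric across all of its label sets. `metric_total` is a hypothetical helper, the second app_id is invented for the example, and the parser assumes samples carry no trailing timestamp.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum every sample of one metric across its label sets.
# Reads exposition text on stdin; assumes no trailing timestamps.
metric_total() {
  awk -v name="$1" '
    /^#/ || NF < 2 { next }              # skip HELP/TYPE comments and blank lines
    {
      sample = $1
      sub(/[{].*$/, "", sample)          # strip the {label="value"} part
      if (sample == name) total += $NF   # the value is the last field
    }
    END { print total + 0 }
  '
}

metric_total sockudo_active_connections <<'EOF'
# TYPE sockudo_active_connections gauge
sockudo_active_connections{app_id="demo-app"} 42
sockudo_active_connections{app_id="other-app"} 8
EOF
```

Run as written, this prints 50. Against a live server, pipe `curl -s http://localhost:9601/metrics` into the function instead of the here-document.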

Security Considerations

Network Access

The metrics endpoint should be reachable only by your monitoring systems. Binding it to 127.0.0.1 restricts it to local access:

json

{
  "metrics": {
    "host": "127.0.0.1",
    "port": 9601
  }
}

Firewall Configuration

Configure your firewall to restrict access to the metrics port:

bash

# Allow only monitoring server
iptables -A INPUT -p tcp --dport 9601 -s 10.0.1.100 -j ACCEPT
iptables -A INPUT -p tcp --dport 9601 -j DROP

Reverse Proxy Protection

Use a reverse proxy to add authentication:

nginx

server {
    listen 9602;
    location /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:9601/metrics;
    }
}

Integration with Monitoring Systems

Prometheus Configuration

Add a scrape job to your prometheus.yml:

yaml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'sockudo'
    static_configs:
      - targets: ['localhost:9601']
        labels:
          instance: 'sockudo-1'
          environment: 'production'
    scrape_interval: 15s
    metrics_path: /metrics
    scheme: http

For multiple Sockudo instances:

yaml

scrape_configs:
  - job_name: 'sockudo'
    static_configs:
      - targets: 
          - 'sockudo-1.example.com:9601'
          - 'sockudo-2.example.com:9601'
          - 'sockudo-3.example.com:9601'
        labels:
          environment: 'production'
    scrape_interval: 15s
    metrics_path: /metrics

Kubernetes Service Discovery

yaml

scrape_configs:
  - job_name: 'sockudo'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: sockudo
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
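
The address rule depends on how Prometheus builds the string it matches: the values of source_labels are joined with `;`, and the regex must match the whole result. For the pattern above to match, that string has to look like `address;port`, so source_labels must list `__address__` first and the port annotation second. The sketch below reproduces the rewrite with sed so it can be tried locally. `rewrite_address` is a hypothetical helper, and the pattern is adapted because sed -E has no non-capturing groups.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the relabel rule: $1 is the discovered address,
# $2 is the port annotation; prints the rewritten scrape address.
rewrite_address() {
  # sed -E lacks (?:...), so the optional ":port" is an ordinary group and the
  # annotation port is \3 here rather than $2 as in the Prometheus rule.
  printf '%s;%s\n' "$1" "$2" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

rewrite_address 10.0.0.7:6001 9601   # discovered address already has a port
rewrite_address 10.0.0.7 9601        # discovered address has no port
```

Both calls print 10.0.0.7:9601, which is the value the rule writes back to `__address__`.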

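Docker Compose with Prometheus

When Prometheus runs in the same Compose project, it reaches Sockudo by service name, so the scrape target in prometheus.yml should be sockudo:9601 rather than localhost:9601.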

yaml

version: '3.8'
services:
  sockudo:
    image: sockudo/sockudo:latest
    ports:
      - "6001:6001"
      - "9601:9601"
    environment:
      - METRICS_ENABLED=true
    labels:
      - "prometheus.io/scrape=true"
      - "prometheus.io/port=9601"

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'

Grafana Dashboard

Key Panels for Sockudo Dashboard

Connection Overview

- Active connections gauge
- Connection rate graph
- Connections per app

Message Throughput

- Messages sent/received rate
- Client events rate
- Broadcast message rate

HTTP API Performance

- Request rate
- Response time percentiles
- Error rate

Channel Activity

- Active channels
- Subscription/unsubscription rates
- Presence channel members

System Health

- Rate limit triggers
- Queue depth (if using queues)
- Cache hit rate
- Webhook success rate

Example Grafana Queries

promql

# Active connections per app
sockudo_active_connections

# Message rate (5-minute average)
rate(sockudo_messages_sent_total[5m])

# 95th percentile response time
histogram_quantile(0.95, rate(sockudo_http_request_duration_seconds_bucket[5m]))

# Error rate percentage
sum(rate(sockudo_http_requests_total{status=~"5.."}[5m])) / sum(rate(sockudo_http_requests_total[5m])) * 100

# Cache hit rate
rate(sockudo_cache_hits_total[5m]) / (rate(sockudo_cache_hits_total[5m]) + rate(sockudo_cache_misses_total[5m])) * 100
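
To make the percentile query less opaque: histogram_quantile finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below applies that idea to the three buckets from the example output earlier in this guide. `bucket_quantile` is a hypothetical helper. It works on raw cumulative counts rather than rates, and it ignores the le="+Inf" bucket that real Prometheus histograms always end with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate quantile $1 from cumulative histogram buckets.
# stdin: one "upper_bound cumulative_count" pair per line, sorted by bound.
bucket_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; cnt[NR] = $2 }
    END {
      rank = q * cnt[NR]            # position of the quantile among all observations
      lo = 0; below = 0             # lower bound and cumulative count of the previous bucket
      for (i = 1; i <= NR; i++) {
        if (cnt[i] >= rank) {
          width = cnt[i] - below
          frac = (width > 0) ? (rank - below) / width : 1
          printf "%.3f\n", lo + (le[i] - lo) * frac
          exit
        }
        lo = le[i]; below = cnt[i]
      }
    }
  '
}

bucket_quantile 0.95 <<'EOF'
0.1 100
0.5 150
1.0 200
EOF
```

With 200 observations the 95th percentile is rank 190, which falls in the 0.5 to 1.0 bucket (counts 150 to 200), so the estimate is 0.5 + 0.5 * (190 - 150) / 50 and the script prints 0.900.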

Alerting Rules

Prometheus Alerting Rules

yaml

groups:
  - name: sockudo_alerts
    rules:
      - alert: SockudoHighErrorRate
        expr: sum by (instance) (rate(sockudo_http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(sockudo_http_requests_total[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on Sockudo instance {{ $labels.instance }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: SockudoHighConnections
        expr: sockudo_active_connections > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count on {{ $labels.instance }}"
          description: "Active connections: {{ $value }}"

      - alert: SockudoInstanceDown
        expr: up{job="sockudo"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Sockudo instance down"
          description: "Instance {{ $labels.instance }} is down"

      - alert: SockudoHighLatency
        expr: histogram_quantile(0.95, rate(sockudo_http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency on Sockudo"
          description: "95th percentile latency is {{ $value }}s"

      - alert: SockudoQueueBacklog
        expr: sockudo_queue_active_jobs > 1000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High queue backlog"
          description: "Queue has {{ $value }} pending jobs"

Performance Impact

Metrics Collection Overhead

- CPU: Minimal overhead (typically < 1%)
- Memory: Small footprint for metric storage
- Network: The metrics endpoint generates traffic only when it is scraped

Optimization Tips

- Adjust the scrape interval to your needs (15s-60s is typical)
- Use recording rules for complex queries
- Monitor metrics cardinality and avoid high-cardinality labels
- Configure appropriate retention for historical data

Troubleshooting

Common Issues

Metrics Endpoint Not Accessible

- Check that metrics are enabled: "enabled": true
- Verify the host and port configuration
- Check firewall rules
- Test the endpoint: curl http://localhost:9601/metrics

No Metrics Data

- Verify that Sockudo is receiving traffic
- Check the metric prefix configuration
- Ensure Prometheus is scraping correctly
- Check the Sockudo logs for metric errors

High Cardinality Issues

- Monitor the number of unique label combinations
- Avoid user IDs or session IDs as labels
- Use histogram buckets appropriately
- Consider metric sampling for high-volume metrics

Debug Commands

bash

# Check if metrics endpoint is working
curl http://localhost:9601/metrics | head -20

# Check specific metrics
curl http://localhost:9601/metrics | grep sockudo_active_connections

# Verify Prometheus scraping
curl http://prometheus:9090/api/v1/targets

# Check metrics in Prometheus
curl 'http://prometheus:9090/api/v1/query?query=sockudo_active_connections'
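
For the cardinality checks mentioned under Common Issues, counting series per metric name in the raw output is a quick first look. The sketch below is one way to do that. `series_per_metric` is a hypothetical helper, and the label values in the sample are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count sample lines (series) per metric name.
# Reads exposition text on stdin and prints "count name", largest count first.
series_per_metric() {
  awk '
    /^#/ || NF < 2 { next }                       # skip comments and blank lines
    { name = $1; sub(/[{].*$/, "", name); count[name]++ }
    END { for (n in count) print count[n], n }
  ' | sort -k1,1nr -k2,2
}

series_per_metric <<'EOF'
sockudo_messages_sent_total{app_id="demo-app",channel_type="public"} 1234
sockudo_messages_sent_total{app_id="demo-app",channel_type="private"} 56
sockudo_messages_sent_total{app_id="demo-app",channel_type="presence"} 7
sockudo_active_channels 3
EOF
```

Against a live server, replace the here-document with `curl -s http://localhost:9601/metrics | series_per_metric | head`. Histogram buckets appear under their `_bucket` name, one series per bucket and label set.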

Best Practices

Metrics Design

- Use consistent naming conventions
- Include relevant labels (app_id, instance, etc.)
- Avoid high-cardinality labels
- Use appropriate metric types (counter, gauge, histogram)

Monitoring Strategy

- Monitor key business metrics (connections, messages)
- Set up meaningful alerts with appropriate thresholds
- Use dashboards for operational visibility
- Review metrics and alerts regularly

Security

- Restrict access to the metrics endpoint
- Use authentication in sensitive environments
- Monitor metrics access logs
- Apply security updates to the monitoring stack regularly

The metrics system provides valuable insights into Sockudo's performance and health, enabling proactive monitoring and troubleshooting of your real-time messaging infrastructure.
