Chapter 13: Introspection and Monitoring

Running a database in production requires visibility. You need to understand how it's performing, how much memory it's using, identify slow operations, and integrate it into your existing monitoring infrastructure. SpinelDB provides a powerful suite of tools for introspection and monitoring, compatible with the standard Redis commands and modern observability platforms like Prometheus.

1. General Server Statistics (`INFO`)

The INFO command is your primary tool for getting a comprehensive snapshot of the server's state and statistics. It returns a human-readable text block containing sections of information.

Command: INFO [section]

You can request all information at once or ask for a specific section, such as server, replication, memory, or stats.

Example Session

127.0.0.1:7878> INFO
# Server
spineldb_version:0.1.0
tcp_port:7878

# Replication
role:master
master_replid:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2
master_repl_offset:123456
connected_slaves:1

# Memory
used_memory:536870912
used_memory_human:512.00M
maxmemory:1073741824

# Stats
total_connections_received:150
total_commands_processed:12543

... and more sections ...

# Requesting a specific section
127.0.0.1:7878> INFO memory
# Memory
used_memory:536870912
used_memory_human:512.00M
maxmemory:1073741824

2. Analyzing Command Latency (`SLOWLOG` and `LATENCY`)

Slow commands can be a major source of performance issues, as they can block other clients from being served. SpinelDB provides tools to identify and analyze these slow operations.

`SLOWLOG`: Finding Slow Commands

The SLOWLOG keeps a running log of commands that exceeded a configured execution time threshold.

Commands:

SLOWLOG GET [count]: Retrieves the count most recent slow log entries (default is 10).
SLOWLOG LEN: Returns the number of entries in the slow log.
SLOWLOG RESET: Clears the slow log.

Example Session

# Get the two most recent slow commands
127.0.0.1:7878> SLOWLOG GET 2
1) 1) (integer) 128      # Unique entry ID
   2) (integer) 1679591234 # Unix timestamp
   3) (integer) 15234      # Execution time in microseconds
   4) 1) "KEYS"
      2) "*"
2) 1) (integer) 127
   2) (integer) 1679591100
   3) (integer) 8500
   4) 1) "SINTERSTORE"
      2) "result-key"
      3) "set1"
      4) "set2"

This is an invaluable tool for debugging performance bottlenecks in your application.

`LATENCY`: Advanced Latency Analysis

The LATENCY command provides tools for more advanced, real-time latency analysis.

Commands:

LATENCY HISTORY <event>: Shows a time-series graph of latency for a specific command (e.g., get, set).
LATENCY DOCTOR: Provides a human-readable report with suggestions based on the server's observed latency patterns.

Example Session

127.0.0.1:7878> LATENCY DOCTOR
SpinelDB Latency Doctor
- Max latency so far: 15234 microseconds.
- High latency is often caused by:
  - Slow commands. Use SLOWLOG to inspect your slow commands.
  - AOF fsync blocking the main thread. Check your fsync policy.
  - High system load. Check CPU and I/O usage.

3. Monitoring with Prometheus (`/metrics`)

For modern, automated monitoring, SpinelDB includes a built-in Prometheus exporter. When enabled in config.toml, SpinelDB starts a small HTTP server on a separate port that exposes a /metrics endpoint.

Configuration

Enable the metrics server in your config.toml:

# In your config.toml

[metrics]
enabled = true
port = 8878

After restarting the server, you can access the metrics endpoint with a tool like curl or by pointing your Prometheus instance to it.

curl http://127.0.0.1:8878/metrics

Exposed Metrics

The endpoint exposes a wide range of useful metrics, including:

spineldb_connected_clients: The current number of connected clients (Gauge).
spineldb_memory_used_bytes: Total memory used by the database (Gauge).
spineldb_commands_processed_total: A running count of all commands (Counter).
spineldb_cache_hits_total & spineldb_cache_misses_total: Counters for the Intelligent Caching Engine.
spineldb_command_latency_seconds: A histogram of command latencies, perfect for calculating percentiles (e.g., p99) in Grafana.
...and many more.

Integrating SpinelDB with Prometheus and Grafana provides a powerful, industry-standard solution for visualizing trends, creating dashboards, and setting up alerts to ensure your database instances are always healthy and performant.

⬅️ Previous Chapter: 12. Publish/Subscribe Messaging➡️ Next Chapter: 14. Persistence and Backup

1. General Server Statistics (INFO)​

Example Session​

2. Analyzing Command Latency (SLOWLOG and LATENCY)​

SLOWLOG: Finding Slow Commands​

Example Session​

LATENCY: Advanced Latency Analysis​

Example Session​

3. Monitoring with Prometheus (/metrics)​

Configuration​

Exposed Metrics​

1. General Server Statistics (`INFO`)

Example Session

2. Analyzing Command Latency (`SLOWLOG` and `LATENCY`)

`SLOWLOG`: Finding Slow Commands

Example Session

`LATENCY`: Advanced Latency Analysis

Example Session

3. Monitoring with Prometheus (`/metrics`)

Configuration

Exposed Metrics