Observability

ToolJet supports OpenTelemetry (OTEL) for comprehensive observability, enabling you to monitor application performance, track query executions, and analyze system health through metrics.

Categories of Metrics

  1. App-Based Metrics - Monitor the performance and reliability of individual ToolJet applications. These metrics include detailed labels such as app_name, query_name, environment, query_text, and query_mode (SQL/GUI) for fine-grained analysis.
    • Query Executions: Track total query executions per application
    • Query Duration: Measure query execution times with histogram buckets
    • Query Failures: Monitor failed queries with error categorization
    • Success Rates: Application-level success rate percentages
    • App Usage: Track application access and interaction events

  2. Platform-Based Metrics - Monitor the overall health and performance of your ToolJet instance:
    • HTTP Server Metrics: Request rates, response times, status codes
    • API Performance: Endpoint-specific latency and throughput
    • Database Operations: Query execution times and connection health
    • Node.js Runtime: Event loop delays, garbage collection, memory usage
    • V8 Memory: Heap usage and external memory tracking

Configuration

Enable OpenTelemetry by setting the following environment variables in your ToolJet deployment:

Required Variables

# Enable OpenTelemetry metrics collection
ENABLE_OTEL=true

Optional Variables

# OTLP Endpoint Configuration
OTEL_EXPORTER_OTLP_TRACES=http://localhost:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS=http://localhost:4318/v1/metrics

# Service Identification
OTEL_SERVICE_NAME=tooljet

# Authentication (if required by your OTEL collector)
OTEL_EXPORTER_OTLP_HEADERS=api-key=your-api-key

# Advanced Configuration
# Enable debug logging for OTEL
OTEL_LOG_LEVEL=debug
# Activity window, in minutes, for concurrent user tracking (default: 5)
OTEL_ACTIVE_USER_WINDOW_MINUTES=5
# Maximum tracked users/sessions (default: 10000)
OTEL_MAX_TRACKED_USERS=10000

# WARNING: High cardinality. Only enable for debugging.
# Includes the actual query text in metrics (default: false).
# If enabled in production, use an OTEL Collector to drop the label.
OTEL_INCLUDE_QUERY_TEXT=false

For a complete list of OpenTelemetry environment variables, refer to the OpenTelemetry documentation.
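
The OTEL_EXPORTER_OTLP_HEADERS value follows the OpenTelemetry convention of comma-separated key=value pairs. The sketch below shows how such a string breaks down into individual headers, which can help when sanity-checking a value before deploying. The helper name parse_otlp_headers is hypothetical and the sample values are made up; this illustrates the format only and is not ToolJet's own parsing code.

```python
from __future__ import annotations


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Split a 'k1=v1,k2=v2' string into a header dict (hypothetical helper)."""
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed header entry: {pair!r}")
        # Only the first '=' separates key from value, so base64 padding survives.
        headers[key.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    print(parse_otlp_headers("api-key=your-api-key"))
    print(parse_otlp_headers("Authorization=Basic dXNlcjpwYXNz,x-tenant=demo"))
```

Splitting on the first "=" only is a deliberate choice: Base64 credentials often end in "=" padding, which a naive split would truncate.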

Setup Examples

Local OTEL Collector

Deploy an OpenTelemetry Collector alongside ToolJet to receive and forward metrics:

# docker-compose.yml excerpt
otel-collector:
  image: otel/opentelemetry-collector-contrib:latest
  command: ["--config=/etc/otel-collector-config.yaml"]
  volumes:
    - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
  ports:
    - "4318:4318" # OTLP HTTP receiver
    - "8889:8889" # Prometheus exporter

Grafana Cloud

Configure ToolJet to send metrics directly to Grafana Cloud:

ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-credentials>
OTEL_SERVICE_NAME=tooljet-production
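
The <base64-encoded-credentials> placeholder is the Base64 encoding of a user:password pair, as in any HTTP Basic header. The sketch below assumes that the pair is your Grafana Cloud instance ID and an access token, which is the usual convention for the Grafana Cloud OTLP gateway; confirm the exact credentials in your Grafana Cloud account. The helper name and the credentials shown are placeholders.

```python
import base64


def basic_auth_header_value(instance_id: str, token: str) -> str:
    """Return 'Basic <base64(instance_id:token)>' (hypothetical helper)."""
    pair = f"{instance_id}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    value = basic_auth_header_value("123456", "example-token")
    print(f"OTEL_EXPORTER_OTLP_HEADERS=Authorization={value}")
```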

Datadog

Send metrics to Datadog using the OTLP endpoint:

ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://api.datadoghq.com/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://api.datadoghq.com/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=dd-api-key=<your-datadog-api-key>
OTEL_SERVICE_NAME=tooljet

New Relic

Configure for New Relic OTLP endpoint:

ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://otlp.nr-data.net:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://otlp.nr-data.net:4318/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=api-key=<your-newrelic-license-key>
OTEL_SERVICE_NAME=tooljet

Grafana Dashboards

ToolJet provides two pre-built Grafana dashboards for visualizing metrics:

Per-App Metrics Dashboard

Download the dashboard:

curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-app-dashboard.json

This dashboard focuses on application-specific metrics and includes:

  • App Overview: Total query executions, success rate gauge, p95 latency, failure counts
  • Query Performance: Execution rates by query, latency percentiles, data source breakdown
  • Top Queries: Most executed queries, slowest queries (p95), most failed queries
  • Environment Filtering: Filter by app name, environment (production/staging/development), and mode (view/edit)

When query text collection is enabled (OTEL_INCLUDE_QUERY_TEXT=true), the dashboard shows query text alongside environment names, so you can debug without consulting logs.

Platform Ultimate Dashboard

Download the dashboard:

curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-platform-dashboard.json

This dashboard provides comprehensive platform monitoring:

  • System Health: P95 response time, request rate, error rate, total requests
  • API Analytics: Traffic distribution, top endpoints by hits, slowest endpoints
  • Performance Trends: Multi-percentile response time analysis (P50, P95, P99)
  • Status Codes: Success/error distribution over time
  • Database Performance: Query execution times, connection health
  • Runtime Metrics: Node.js event loop, GC performance, V8 memory usage
  • Distributed Tracing: Integration with Jaeger for trace viewing

Importing Dashboards

To import the Grafana dashboards:

  1. Download the dashboard JSON files:
    # Download App-Based Metrics Dashboard
    curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-app-dashboard.json

    # Download Platform-Based Metrics Dashboard
    curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-platform-dashboard.json
  2. Open Grafana and navigate to Dashboards → Import
  3. Click Upload JSON file and select the downloaded dashboard JSON file
  4. Select your Prometheus data source
  5. Click Import

The dashboards will be immediately available with real-time data from your ToolJet instance.

Production Considerations

High Cardinality Warning

The app-based metrics can optionally include a query_text label that contains the actual SQL or query content. By default, this is disabled to prevent high cardinality issues.

Enabling Query Text (For Debugging Only)

To enable query text in metrics for debugging purposes:

OTEL_INCLUDE_QUERY_TEXT=true

Warning: Enabling query_text creates high-cardinality time series that can significantly degrade Prometheus storage and query performance. Enable it only temporarily, while debugging specific query issues.
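
To see why query_text is costly, it helps to estimate series counts. In Prometheus, every distinct combination of label values is a separate time series, and a histogram emits one series per bucket plus the _sum and _count series. The sketch below computes an upper bound (not every combination occurs in practice) using made-up counts and a hypothetical helper; real numbers depend on your apps and on the configured bucket layout.

```python
from __future__ import annotations


def histogram_series(label_value_counts: dict[str, int], buckets: int) -> int:
    """Upper bound on the series produced by one histogram metric.

    label_value_counts maps a label name to its number of distinct values.
    buckets is the number of configured bucket boundaries (excluding +Inf).
    """
    combos = 1
    for count in label_value_counts.values():
        combos *= count
    # Per label combination: one series per bucket, the +Inf bucket, _sum, _count.
    return combos * (buckets + 3)


if __name__ == "__main__":
    base = {"app_name": 20, "query_name": 10, "environment": 3}
    print(histogram_series(base, buckets=10))
    # A label with 50 distinct values multiplies the upper bound by 50.
    print(histogram_series({**base, "query_text": 50}, buckets=10))
```

Because the counts multiply, a label whose values change with every execution (for example, SQL with interpolated parameters) can dominate the total on its own.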

Production Best Practices

If you must enable query_text in production:

  1. Use an OTEL Collector to drop the label before metrics reach Prometheus:
    # otel-collector-config.yaml
    processors:
      attributes:
        actions:
          - key: query_text
            action: delete

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [attributes]
          exporters: [prometheus]
  2. Alternative: Hash the query text so that label values are short, fixed-length, and free of raw SQL. Each distinct query still produces its own hash, so this reduces label size but not the number of series:
    processors:
      transform:
        metric_statements:
          - context: datapoint
            statements:
              - set(attributes["query_text"], SHA256(attributes["query_text"]))
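
For intuition about the hashing alternative, the sketch below mimics it in Python with hashlib on made-up query strings. It assumes the collector's SHA256 function yields a hex digest comparable to hashlib's; the helper name is illustrative. The point is only that every value becomes a fixed 64-character string while distinct queries remain distinct.

```python
import hashlib


def hash_label(value: str) -> str:
    """Return the hex SHA-256 digest of a label value (illustrative helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    queries = [
        "SELECT * FROM orders WHERE id = 1",
        "SELECT * FROM orders WHERE id = 2",
        "SELECT * FROM orders WHERE id = 1",  # repeat of the first query
    ]
    hashed = [hash_label(q) for q in queries]
    # The number of distinct values is the same before and after hashing.
    print(len(set(queries)), len(set(hashed)))
    # Every digest has the same length regardless of the query length.
    print({len(h) for h in hashed})
```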

Performance Impact

OpenTelemetry metrics collection has minimal performance impact:

  • Metric collection is asynchronous and non-blocking
  • Histogram buckets are pre-configured for optimal performance
  • Observable gauges (like success rates) are updated on a 15-minute interval

Sampling and Filtering

For high-volume deployments, consider:

  • Filtering environments: Only collect metrics from production environments
  • Sampling queries: Use OTEL Collector sampling for high-frequency queries
  • Aggregation: Pre-aggregate metrics at the collector level before storage

Troubleshooting

Metrics Not Appearing

  1. Verify ENABLE_OTEL=true is set
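  2. Check that the OTEL collector endpoint is reachable. The OTLP HTTP receiver accepts only POST requests, so a 405 Method Not Allowed response to these GET requests still confirms that the collector is listening: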
    curl http://localhost:4318/v1/metrics
    curl http://localhost:4318/v1/traces
  3. Review ToolJet server logs for OTEL connection errors
  4. Verify OTEL collector configuration and Prometheus scrape targets
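
The reachability check in step 2 can also be scripted. The sketch below uses only the Python standard library and a hypothetical probe helper. It treats any HTTP response as proof that something is listening; a 405 in particular is expected for a GET against an OTLP HTTP receiver. The host and port are the ones from the curl commands and may differ in your deployment.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status of a GET to url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, and similar


if __name__ == "__main__":
    for path in ("/v1/metrics", "/v1/traces"):
        status = probe("http://localhost:4318" + path)
        result = "unreachable" if status is None else f"reachable (HTTP {status})"
        print(path, result)
```

HTTPError is caught before URLError because it is a subclass of it; reversing the order would report an answering server as unreachable.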

High Memory Usage

If you experience high memory usage:

  1. Remove high-cardinality labels like query_text using OTEL Collector processors
  2. Reduce histogram bucket counts if needed
  3. Implement metric filtering at the collector level
  4. Consider using remote write to offload storage

Missing Labels or Metrics

Ensure you're using ToolJet version 3.16.0-LTS or higher, which includes the full OTEL implementation with both app-based and platform-based metrics.

Additional Resources