volar-docs/docs/telemetry-and-observability.md
2025-11-09 22:22:52 -06:00

Telemetry & Observability for Volar Language Servers

Robust observability helps teams detect regressions, diagnose user issues, and justify performance work. This guide covers the main observability capabilities available to Volar-based servers: telemetry events, logging, progress reporting, health probes, and integration tips for popular editors.

Observability Building Blocks

| Tool | Purpose | APIs |
| --- | --- | --- |
| Console logging | Developer-facing output in editor logs | connection.console.{info,warn,error} |
| Telemetry events | Structured analytics consumed by clients | connection.telemetry.logEvent |
| Work done progress | User-facing progress bars for long tasks | connection.window.createWorkDoneProgress |
| Diagnostics refresh | Force client to re-query diagnostics | connection.languages.diagnostics.refresh() |
| Custom notifications | Surface warnings/errors to users | connection.window.showWarningMessage |

Logging Strategies

1. Namespaced Logs

Prefix logs with your server/component so users can filter easily:

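const log = (level: 'info' | 'warn' | 'error', message: string, payload?: unknown) => {
  // Compare against undefined so falsy payloads such as 0 or false are still logged.
  const suffix = payload !== undefined ? ` ${JSON.stringify(payload)}` : '';
  const text = `[json-yaml:${level}] ${message}${suffix}`;
  connection.console[level](text);
};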
  • Use info for high-level events (project load, schema cache hits).
  • Use warn for recoverable issues (fallback to default schema).
  • Use error for critical failures—ideally accompanied by a user notification.

2. Log Levels via Settings

Expose a logLevel configuration (off | error | warn | info | debug) so users can control verbosity:

if (config.logLevel === 'debug') {
  log('info', 'Schema fetch start', { uri });
}
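
The snippet above gates a single call site. A minimal sketch of applying the same setting to every message is a logger factory that ranks levels and drops anything more verbose than the configured one. The names `createLogger`, `ConsoleLike`, and `rank` are hypothetical, and `ConsoleLike` is only a structural stand-in for `connection.console`:

```typescript
type LogLevel = 'off' | 'error' | 'warn' | 'info' | 'debug';
type Severity = Exclude<LogLevel, 'off'>;

// Higher rank means more verbose. A message passes when its rank does not exceed the configured level.
const rank: Record<LogLevel, number> = { off: 0, error: 1, warn: 2, info: 3, debug: 4 };

// Structural stand-in for connection.console (assumption: only these three channels are used).
interface ConsoleLike {
  error(message: string): void;
  warn(message: string): void;
  info(message: string): void;
}

function createLogger(sink: ConsoleLike, getLevel: () => LogLevel, namespace = 'json-yaml') {
  return (severity: Severity, message: string, payload?: unknown): void => {
    if (rank[severity] > rank[getLevel()]) return; // filtered out by the user's setting
    const suffix = payload === undefined ? '' : ` ${JSON.stringify(payload)}`;
    const text = `[${namespace}:${severity}] ${message}${suffix}`;
    // Debug messages are routed through info() because the sink has only three channels.
    sink[severity === 'debug' ? 'info' : severity](text);
  };
}
```

Passing `getLevel` as a function rather than a fixed value lets a configuration change take effect without recreating the logger.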

3. Structured Payloads

Include relevant context in JSON to aid parsing:

{
  "timestamp": "...",
  "event": "schema.fetch",
  "uri": "http://schemas/foo.json",
  "durationMs": 123,
  "success": true
}
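
One way to produce entries in this shape consistently is a small builder that every call site shares. This is a sketch only: `buildEntry` and its parameters are hypothetical names, and the clock is injectable so the output is reproducible in tests:

```typescript
// Builds one JSON log line with the fields shown above.
// `startedAtMs` is when the operation began; `nowMs` defaults to the current time.
function buildEntry(
  event: string,
  context: Record<string, unknown>,
  startedAtMs: number,
  success: boolean,
  nowMs: number = Date.now(),
): string {
  return JSON.stringify({
    ...context, // caller-supplied context such as { uri }
    timestamp: new Date(nowMs).toISOString(),
    event,
    durationMs: nowMs - startedAtMs,
    success,
  });
}
```

Because the fixed fields are written after the spread, a stray `event` or `timestamp` key in `context` cannot overwrite them.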

Telemetry Events

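Telemetry is optional and client-controlled. LSP does not define a standard client capability for telemetry/event notifications, so the check below relies on a custom experimental.telemetry flag that your client must set in its initialize request: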

const telemetrySupported = server.initializeParams.capabilities?.experimental?.telemetry === true;

if (telemetrySupported) {
  connection.telemetry.logEvent({
    type: 'json-yaml.schemaFetch',
    uri,
    durationMs,
    success,
  });
}

Event Design Principles

  1. No PII: never include actual source code or user-specific paths unless hashed/anonymized.
  2. Actionable: log events that can drive product decisions (schema fetch failures, TypeScript reloads, “workspace diagnostics took > 5s”).
  3. Stable schema: define event names and payload shapes up front; changing them frequently breaks dashboards.

| Event | Trigger | Payload |
| --- | --- | --- |
| schema.fetch | Schema download (success + failure) | { uri, durationMs, success } |
| diagnostics.publish | After sendDiagnostics | { uri, count, durationMs } |
| workspaceDiagnostics.run | Workspace diagnostics completed | { documentCount, durationMs } |
| configuration.apply | New config applied | { success, changedKeys } |
| takeOverMode.warning | Detected conflicting TS server | { message } |
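
The "stable schema" and "no PII" principles can both be enforced in code. The sketch below encodes the events listed above as a discriminated union, so a renamed field fails at compile time, and replaces raw URIs with a truncated SHA-256 digest. The field types are assumptions inferred from the payload names, and `hashUri` and `anonymize` are hypothetical helpers:

```typescript
import { createHash } from 'node:crypto';

// Event names and payload keys mirror the list above; the field types are assumed.
type TelemetryEvent =
  | { type: 'schema.fetch'; uri: string; durationMs: number; success: boolean }
  | { type: 'diagnostics.publish'; uri: string; count: number; durationMs: number }
  | { type: 'workspaceDiagnostics.run'; documentCount: number; durationMs: number }
  | { type: 'configuration.apply'; success: boolean; changedKeys: string[] }
  | { type: 'takeOverMode.warning'; message: string };

// Replace a raw URI with a short, stable digest so events for the same file
// can still be correlated without exposing the path itself.
function hashUri(uri: string): string {
  return createHash('sha256').update(uri).digest('hex').slice(0, 16);
}

// Returns a copy with the URI hashed; events without a URI pass through unchanged.
function anonymize(event: TelemetryEvent): TelemetryEvent {
  return 'uri' in event ? { ...event, uri: hashUri(event.uri) } : event;
}
```

An unsalted hash can still be matched against a list of known paths, so mixing in a per-installation salt is worth considering if paths are sensitive.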

Sample Telemetry Wiring (VS Code Extension)

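// Runs in the VS Code extension (client) process. TelemetryReporter comes from
// @vscode/extension-telemetry; its constructor arguments vary between versions.
const telemetryReporter = new TelemetryReporter('volar-extension', version, aiKey);

// `client` is assumed to be the extension's LanguageClient. onTelemetry fires for each
// telemetry/event notification the server sends via connection.telemetry.logEvent.
client.onTelemetry((event: AnyEvent) => {
  telemetryReporter.sendTelemetryEvent(event.type, sanitize(event));
});

function sanitize(event: AnyEvent): Record<string, string> {
  const { uri: _uri, ...rest } = event; // drop raw file paths
  const properties: Record<string, string> = { timestamp: new Date().toISOString() };
  for (const [key, value] of Object.entries(rest)) {
    properties[key] = String(value); // reporter properties must be strings
  }
  return properties;
}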
  • Use Azure App Insights, Segment, or any analytics platform that matches your privacy requirements.
  • Strip PII (file paths, code snippets) before forwarding events.

Dashboard Example

Track key metrics with a dashboard (e.g., Grafana/Looker):

| Widget | Description |
| --- | --- |
| Schema Fetch Success % | success count / total by day; alert if < 95%. |
| Diagnostics Duration P95 | Box plot of diagnostics.publish.durationMs. |
| Workspace Diagnostics Runs | Number per workspace; spikes may indicate user confusion. |
| Top Error Messages | Grouped count of takeOverMode.warning and other errors. |

Use these dashboards to catch regressions (e.g., schema outages, slow diagnostics) before users report them.
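
Dashboards normally compute these aggregates for you, but the arithmetic behind the first two widgets is small enough to sketch. The version below uses the nearest-rank method for percentiles; `successRate` and `percentile` are hypothetical helper names, and treating an empty sample as 100% is an assumption:

```typescript
// Percentage of successful fetches. An empty sample is treated as healthy (assumption).
function successRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) return 100;
  const ok = outcomes.filter(Boolean).length;
  return (ok / outcomes.length) * 100;
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile needs at least one sample');
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in integers where possible.
  const rank = Math.ceil((p * sorted.length) / 100);
  const clamped = Math.min(Math.max(rank, 1), sorted.length);
  return sorted[clamped - 1];
}

// The 95% alert threshold from the widget list above.
function shouldAlert(outcomes: boolean[]): boolean {
  return successRate(outcomes) < 95;
}
```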

Work Done Progress & Notifications

For long-running operations (initial project load, large workspace diagnostics), display progress so users know work is happening.

async function withProgress(title: string, task: () => Promise<void>) {
  const progress = await connection.window.createWorkDoneProgress();
  progress.begin(title, 0);
  try {
    await task();
    progress.report(100, 'Complete');
  } finally {
    progress.done();
  }
}

Also use connection.window.showWarningMessage / showErrorMessage for issues requiring user intervention (missing schema files, invalid config).

Health Checks & Metrics

Consider exposing internal health metrics for CI or headless environments:

connection.onRequest('volar/health', async () => ({
  openDocuments: server.documents.all().length,
  workspaceFolders: server.workspaceFolders.all.length,
  lastDiagnosticsMs: metrics.lastDiagnosticsDuration,
}));

Integrate this request into smoke tests or CI to ensure the server responds with sane values.
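
What counts as "sane" depends on the project, so the check below is only a sketch. It assumes the response has the three fields returned above and uses a hypothetical 5000 ms ceiling for diagnostics. In a real smoke test the report would come from sending the volar/health request over a client connection:

```typescript
interface HealthReport {
  openDocuments: number;
  workspaceFolders: number;
  lastDiagnosticsMs: number;
}

// Returns a list of problems; an empty list means the report looks sane.
function checkHealth(report: HealthReport, maxDiagnosticsMs = 5000): string[] {
  const problems: string[] = [];
  const fields: Array<keyof HealthReport> = ['openDocuments', 'workspaceFolders', 'lastDiagnosticsMs'];
  for (const field of fields) {
    const value = report[field];
    if (!Number.isFinite(value) || value < 0) {
      problems.push(`${field} is not a non-negative number: ${value}`);
    }
  }
  if (report.lastDiagnosticsMs > maxDiagnosticsMs) {
    problems.push(`lastDiagnosticsMs exceeds ${maxDiagnosticsMs} ms`);
  }
  return problems;
}
```

A CI job can fail the build whenever the returned list is non-empty and print the entries as the failure message.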

Editor-Specific Considerations

VS Code

  • Use connection.window.showInformationMessage sparingly; prefer progress notifications and logs.
  • Provide commands that surface diagnostics/profiling info for debugging (e.g., “Volar: Show Server Logs”).

Neovim

  • Expose an RPC command to toggle verbose logging at runtime.
  • Append logs to a file (e.g., ~/.cache/volar-nvim/volar.log) so users can share them in bug reports.

CLI

  • Provide --log-level and --log-file flags.
  • For headless usage, print JSON logs to stdout so downstream automation can parse them.
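
A minimal sketch of both CLI points, assuming the flags accept either `--flag value` or `--flag=value`. The helpers `parseCliOptions` and `jsonLine` are hypothetical, and a real CLI would more likely use an argument-parsing library:

```typescript
interface CliOptions {
  logLevel: string;
  logFile?: string;
}

// Extracts --log-level and --log-file from an argv-style array; other arguments are ignored.
function parseCliOptions(argv: string[]): CliOptions {
  const options: CliOptions = { logLevel: 'info' }; // default level is an assumption
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    const eq = arg.indexOf('=');
    const flag = eq === -1 ? arg : arg.slice(0, eq);
    if (flag !== '--log-level' && flag !== '--log-file') continue;
    // Inline form "--flag=value", otherwise consume the next argument.
    const value = eq === -1 ? argv[++i] : arg.slice(eq + 1);
    if (value === undefined) break; // flag given without a value
    if (flag === '--log-level') options.logLevel = value;
    else options.logFile = value;
  }
  return options;
}

// One JSON object per line, suitable for printing to stdout in headless runs.
function jsonLine(level: string, message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ ...fields, timestamp: new Date().toISOString(), level, message });
}
```

In a Node entry point this would be called as `parseCliOptions(process.argv.slice(2))`.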

Sampling & Rate Limiting

  • For frequent events (per-keystroke completions), sample logs/telemetry to reduce noise:
if (Math.random() < 0.1) {
  connection.telemetry.logEvent({ type: 'completion.run', durationMs });
}
  • Throttle repeated warnings (e.g., schema fetch failures) to once per URI per minute to avoid spamming logs.
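
The throttling rule can be sketched as a keyed gate that lets one message through per key per interval. `createThrottle` is a hypothetical helper, and the clock is injectable so the behaviour can be tested without waiting:

```typescript
// Returns a function that answers "may I log for this key right now?".
// At most one call per key returns true within each interval.
function createThrottle(intervalMs = 60000, now: () => number = Date.now) {
  const lastAllowed = new Map<string, number>();
  return (key: string): boolean => {
    const current = now();
    const previous = lastAllowed.get(key);
    if (previous !== undefined && current - previous < intervalMs) {
      return false; // still inside the quiet period for this key
    }
    lastAllowed.set(key, current);
    return true;
  };
}
```

A call site would wrap the warning, for example `if (allow(uri)) log('warn', 'Schema fetch failed', { uri })`. The map grows with the number of distinct keys, so a long-running server may want to evict old entries.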

Error Reporting

When an unexpected exception occurs:

  1. Log the stack via connection.console.error.
  2. Emit telemetry (if enabled) with a sanitized version of the error.
  3. Notify the user when action is needed (“Failed to load schema; see Output ▸ Volar for details”).
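
The three steps above can be sketched as a single helper. The `ErrorSinks` interface is a hypothetical stand-in for the connection methods named in the comments, and the `error.unexpected` event name is illustrative, not one of the events defined earlier:

```typescript
interface ErrorSinks {
  logError(message: string): void;     // e.g. connection.console.error
  logEvent?(event: object): void;      // e.g. connection.telemetry.logEvent; omit when telemetry is off
  notifyUser?(message: string): void;  // e.g. connection.window.showErrorMessage
}

function reportError(error: unknown, sinks: ErrorSinks, userMessage?: string): void {
  const err = error instanceof Error ? error : new Error(String(error));
  // 1. Full stack goes to the local log only.
  sinks.logError(err.stack ?? err.message);
  // 2. Telemetry gets just the error class name, since messages and stacks can contain file paths.
  sinks.logEvent?.({ type: 'error.unexpected', name: err.name });
  // 3. Notify only when the caller supplies an actionable message.
  if (userMessage) sinks.notifyUser?.(userMessage);
}
```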

Observability Checklist

  1. Logging: Namespaced helper with log levels; logs include key metadata.
  2. Telemetry: Optional, PII-free events for actions that matter (schema fetch, diagnostics).
  3. Progress: Work done progress notifications for long-running operations.
  4. Notifications: Friendly user messages for actionable issues.
  5. Health endpoint: Optional volar/health request for automated monitors.
  6. Configurable verbosity: Users can toggle log level / telemetry participation.
  7. Sampling: Applied to high-frequency events to avoid flooding logs.
  8. Documentation: Tell users how to capture logs/telemetry, attach them to bug reports, and opt out if desired.

Instrument early and consistently: observability is far easier to add when your server is small than when you're firefighting production issues.