volar-docs/docs/telemetry-and-observability.md

# Telemetry & Observability for Volar Language Servers

> [Docs Index](README.md) • [Repo README](../README.md) • [Performance Guide](performance-and-debugging.md) • [Error Handling](error-handling-and-resilience.md)

Robust observability helps teams detect regressions, diagnose user issues, and justify performance work. This guide covers every capability available to Volar-based servers: telemetry events, logging, progress reporting, health probes, and integration tips for popular editors.

## Observability Building Blocks

| Tool | Purpose | APIs |
| --- | --- | --- |
| Console logging | Developer-facing output in editor logs | `connection.console.{info,warn,error}` |
| Telemetry events | Structured analytics consumed by clients | `connection.telemetry.logEvent` |
| Work done progress | User-facing progress bars for long tasks | `connection.window.createWorkDoneProgress` |
| Diagnostics refresh | Force client to re-query diagnostics | `connection.languages.diagnostics.refresh()` |
| Custom notifications | Surface warnings/errors to users | `connection.window.showWarningMessage` |

## Logging Strategies

### 1. Namespaced Logs

Prefix logs with your server/component so users can filter easily:

```ts
const log = (level: 'info' | 'warn' | 'error', message: string, payload?: unknown) => {
  const text = `[json-yaml:${level}] ${message}${payload ? ` ${JSON.stringify(payload)}` : ''}`;
  connection.console[level](text);
};
```

- Use `info` for high-level events (project load, schema cache hits).
- Use `warn` for recoverable issues (fallback to default schema).
- Use `error` for critical failures—ideally accompanied by a user notification.

### 2. Log Levels via Settings

Expose a `logLevel` configuration (`off | error | warn | info | debug`) so users can control verbosity:

```ts
if (config.logLevel === 'debug') {
  log('info', 'Schema fetch start', { uri });
}
```

### 3. Structured Payloads

Include relevant context in JSON to aid parsing:

```json
{
  "timestamp": "...",
  "event": "schema.fetch",
  "uri": "http://schemas/foo.json",
  "durationMs": 123,
  "success": true
}
```

## Telemetry Events

Telemetry is optional and client-controlled; always check capabilities:

```ts
const telemetrySupported = server.initializeParams.capabilities?.experimental?.telemetry === true;

if (telemetrySupported) {
  connection.telemetry.logEvent({
    type: 'json-yaml.schemaFetch',
    uri,
    durationMs,
    success,
  });
}
```

### Event Design Principles

1. **No PII** – never include actual source code or user-specific paths unless hashed/anonymized.
2. **Actionable** – log events that can drive product decisions (schema fetch failures, TypeScript reloads, “workspace diagnostics took > 5s”).
3. **Stable schema** – define event names and payload shapes up front; changing them frequently breaks dashboards.

### Recommended Events

| Event | Trigger | Payload |
| --- | --- | --- |
| `schema.fetch` | Schema download (success + failure) | `{ uri, durationMs, success }` |
| `diagnostics.publish` | After `sendDiagnostics` | `{ uri, count, durationMs }` |
| `workspaceDiagnostics.run` | Workspace diagnostics completed | `{ documentCount, durationMs }` |
| `configuration.apply` | New config applied | `{ success, changedKeys }` |
| `takeOverMode.warning` | Detected conflicting TS server | `{ message }` |

### Sample Telemetry Wiring (VS Code Extension)

```ts
const telemetryReporter = new TelemetryReporter('volar-extension', version, aiKey);

connection.telemetry.logEvent = (event) => {
  telemetryReporter.sendTelemetryEvent(event.type, sanitize(event));
};

function sanitize(event: AnyEvent) {
  return {
    ...event,
    uri: undefined, // avoid sending raw file paths
    timestamp: new Date().toISOString(),
  };
}
```

- Use Azure App Insights, Segment, or any analytics platform that matches your privacy requirements.
- Strip PII (file paths, code snippets) before forwarding events.

### Dashboard Example

Track key metrics with a dashboard (e.g., Grafana/Looker):

| Widget | Description |
| --- | --- |
| Schema Fetch Success % | `success` count / total by day; alert if < 95%. |
| Diagnostics Duration P95 | Box plot of `diagnostics.publish.durationMs`. |
| Workspace Diagnostics Runs | Number per workspace; spikes may indicate user confusion. |
| Top Error Messages | Grouped count of `takeOverMode.warning` and other errors. |

Use these dashboards to catch regressions (e.g., schema outages, slow diagnostics) before users report them.

## Work Done Progress & Notifications

For long-running operations (initial project load, large workspace diagnostics), display progress so users know work is happening.

```ts
async function withProgress(title: string, task: () => Promise<void>) {
  const progress = await connection.window.createWorkDoneProgress();
  progress.begin(title, 0);
  try {
    await task();
    progress.report(100, 'Complete');
  } finally {
    progress.done();
  }
}
```

Also use `connection.window.showWarningMessage` / `showErrorMessage` for issues requiring user intervention (missing schema files, invalid config).

## Health Checks & Metrics

Consider exposing internal health metrics for CI or headless environments:

```ts
connection.onRequest('volar/health', async () => ({
  openDocuments: server.documents.all().length,
  workspaceFolders: server.workspaceFolders.all.length,
  lastDiagnosticsMs: metrics.lastDiagnosticsDuration,
}));
```

Integrate this request into smoke tests or CI to ensure the server responds with sane values.

## Editor-Specific Considerations

### VS Code

- Use `connection.window.showInformationMessage` sparingly; prefer progress notifications and logs.
- Provide commands that surface diagnostics/profiling info for debugging (e.g., “Volar: Show Server Logs”).

### Neovim

- Expose an RPC command to toggle verbose logging at runtime.
- Append logs to a file (e.g., `~/.cache/volar-nvim/volar.log`) so users can share them in bug reports.

### CLI

- Provide `--log-level` and `--log-file` flags.
- For headless usage, print JSON logs to stdout so downstream automation can parse them.

## Sampling & Rate Limiting

- For frequent events (per-keystroke completions), sample logs/telemetry to reduce noise:

```ts
if (Math.random() < 0.1) {
  connection.telemetry.logEvent({ type: 'completion.run', durationMs });
}
```

- Throttle repeated warnings (e.g., schema fetch failures) to once per URI per minute to avoid spamming logs.

## Error Reporting

When an unexpected exception occurs:

1. Log the stack via `connection.console.error`.
2. Emit telemetry (if enabled) with a sanitized version of the error.
3. Notify the user when action is needed (“Failed to load schema; see Output ▸ Volar for details”).

## Observability Checklist

1. **Logging** – Namespaced helper with log levels; logs include key metadata.
2. **Telemetry** – Optional, PII-free events for actions that matter (schema fetch, diagnostics).
3. **Progress** – Work done progress notifications for long-running operations.
4. **Notifications** – Friendly user messages for actionable issues.
5. **Health endpoint** – Optional `volar/health` request for automated monitors.
6. **Configurable verbosity** – Users can toggle log level / telemetry participation.
7. **Sampling** – Applied to high-frequency events to avoid flooding logs.
8. **Documentation** – Tell users how to capture logs/telemetry, attach them to bug reports, and opt out if desired.

Instrument early and consistently—observability is far easier to add when your server is small than when you’re firefighting production issues.