Files
volar-docs/docs/telemetry-and-observability.md
2025-11-09 22:22:52 -06:00

207 lines
7.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Telemetry & Observability for Volar Language Servers
> [Docs Index](README.md) • [Repo README](../README.md) • [Performance Guide](performance-and-debugging.md) • [Error Handling](error-handling-and-resilience.md)
Robust observability helps teams detect regressions, diagnose user issues, and justify performance work. This guide covers every capability available to Volar-based servers: telemetry events, logging, progress reporting, health probes, and integration tips for popular editors.
## Observability Building Blocks
| Tool | Purpose | APIs |
| --- | --- | --- |
| Console logging | Developer-facing output in editor logs | `connection.console.{info,warn,error}` |
| Telemetry events | Structured analytics consumed by clients | `connection.telemetry.logEvent` |
| Work done progress | User-facing progress bars for long tasks | `connection.window.createWorkDoneProgress` |
| Diagnostics refresh | Force client to re-query diagnostics | `connection.languages.diagnostics.refresh()` |
| Custom notifications | Surface warnings/errors to users | `connection.window.showWarningMessage` |
## Logging Strategies
### 1. Namespaced Logs
Prefix logs with your server/component so users can filter easily:
```ts
const log = (level: 'info' | 'warn' | 'error', message: string, payload?: unknown) => {
const text = `[json-yaml:${level}] ${message}${payload ? ` ${JSON.stringify(payload)}` : ''}`;
connection.console[level](text);
};
```
- Use `info` for high-level events (project load, schema cache hits).
- Use `warn` for recoverable issues (fallback to default schema).
- Use `error` for critical failures—ideally accompanied by a user notification.
### 2. Log Levels via Settings
Expose a `logLevel` configuration (`off | error | warn | info | debug`) so users can control verbosity:
```ts
if (config.logLevel === 'debug') {
log('info', 'Schema fetch start', { uri });
}
```
### 3. Structured Payloads
Include relevant context in JSON to aid parsing:
```json
{
"timestamp": "...",
"event": "schema.fetch",
"uri": "http://schemas/foo.json",
"durationMs": 123,
"success": true
}
```
## Telemetry Events
Telemetry is optional and client-controlled; always check capabilities:
```ts
const telemetrySupported = server.initializeParams.capabilities?.experimental?.telemetry === true;
if (telemetrySupported) {
connection.telemetry.logEvent({
type: 'json-yaml.schemaFetch',
uri,
durationMs,
success,
});
}
```
### Event Design Principles
1. **No PII** never include actual source code or user-specific paths unless hashed/anonymized.
2. **Actionable** log events that can drive product decisions (schema fetch failures, TypeScript reloads, “workspace diagnostics took > 5s”).
3. **Stable schema** define event names and payload shapes up front; changing them frequently breaks dashboards.
### Recommended Events
| Event | Trigger | Payload |
| --- | --- | --- |
| `schema.fetch` | Schema download (success + failure) | `{ uri, durationMs, success }` |
| `diagnostics.publish` | After `sendDiagnostics` | `{ uri, count, durationMs }` |
| `workspaceDiagnostics.run` | Workspace diagnostics completed | `{ documentCount, durationMs }` |
| `configuration.apply` | New config applied | `{ success, changedKeys }` |
| `takeOverMode.warning` | Detected conflicting TS server | `{ message }` |
### Sample Telemetry Wiring (VS Code Extension)
```ts
const telemetryReporter = new TelemetryReporter('volar-extension', version, aiKey);
connection.telemetry.logEvent = (event) => {
telemetryReporter.sendTelemetryEvent(event.type, sanitize(event));
};
function sanitize(event: AnyEvent) {
return {
...event,
uri: undefined, // avoid sending raw file paths
timestamp: new Date().toISOString(),
};
}
```
- Use Azure App Insights, Segment, or any analytics platform that matches your privacy requirements.
- Strip PII (file paths, code snippets) before forwarding events.
### Dashboard Example
Track key metrics with a dashboard (e.g., Grafana/Looker):
| Widget | Description |
| --- | --- |
| Schema Fetch Success % | `success` count / total by day; alert if < 95%. |
| Diagnostics Duration P95 | Box plot of `diagnostics.publish.durationMs`. |
| Workspace Diagnostics Runs | Number per workspace; spikes may indicate user confusion. |
| Top Error Messages | Grouped count of `takeOverMode.warning` and other errors. |
Use these dashboards to catch regressions (e.g., schema outages, slow diagnostics) before users report them.
## Work Done Progress & Notifications
For long-running operations (initial project load, large workspace diagnostics), display progress so users know work is happening.
```ts
async function withProgress(title: string, task: () => Promise<void>) {
const progress = await connection.window.createWorkDoneProgress();
progress.begin(title, 0);
try {
await task();
progress.report(100, 'Complete');
} finally {
progress.done();
}
}
```
Also use `connection.window.showWarningMessage` / `showErrorMessage` for issues requiring user intervention (missing schema files, invalid config).
## Health Checks & Metrics
Consider exposing internal health metrics for CI or headless environments:
```ts
connection.onRequest('volar/health', async () => ({
openDocuments: server.documents.all().length,
workspaceFolders: server.workspaceFolders.all.length,
lastDiagnosticsMs: metrics.lastDiagnosticsDuration,
}));
```
Integrate this request into smoke tests or CI to ensure the server responds with sane values.
## Editor-Specific Considerations
### VS Code
- Use `connection.window.showInformationMessage` sparingly; prefer progress notifications and logs.
- Provide commands that surface diagnostics/profiling info for debugging (e.g., Volar: Show Server Logs”).
### Neovim
- Expose an RPC command to toggle verbose logging at runtime.
- Append logs to a file (e.g., `~/.cache/volar-nvim/volar.log`) so users can share them in bug reports.
### CLI
- Provide `--log-level` and `--log-file` flags.
- For headless usage, print JSON logs to stdout so downstream automation can parse them.
## Sampling & Rate Limiting
- For frequent events (per-keystroke completions), sample logs/telemetry to reduce noise:
```ts
if (Math.random() < 0.1) {
connection.telemetry.logEvent({ type: 'completion.run', durationMs });
}
```
- Throttle repeated warnings (e.g., schema fetch failures) to once per URI per minute to avoid spamming logs.
## Error Reporting
When an unexpected exception occurs:
1. Log the stack via `connection.console.error`.
2. Emit telemetry (if enabled) with a sanitized version of the error.
3. Notify the user when action is needed (“Failed to load schema; see Output Volar for details”).
## Observability Checklist
1. **Logging** Namespaced helper with log levels; logs include key metadata.
2. **Telemetry** Optional, PII-free events for actions that matter (schema fetch, diagnostics).
3. **Progress** Work done progress notifications for long-running operations.
4. **Notifications** Friendly user messages for actionable issues.
5. **Health endpoint** Optional `volar/health` request for automated monitors.
6. **Configurable verbosity** Users can toggle log level / telemetry participation.
7. **Sampling** Applied to high-frequency events to avoid flooding logs.
8. **Documentation** Tell users how to capture logs/telemetry, attach them to bug reports, and opt out if desired.
Instrument early and consistentlyobservability is far easier to add when your server is small than when youre firefighting production issues.