Error Handling & Resilience Playbook


Language servers run continuously inside users' editors and CI pipelines. Robust error handling keeps them stable even when schemas fail to download, configuration is malformed, or files vanish mid-request. This guide catalogs the failure modes we've encountered and how to mitigate each of them gracefully.

Guiding Principles

  1. Fail soft: surface actionable warnings, but keep the server alive.
  2. Fallbacks everywhere: default to cached schemas, previous configuration, or safe defaults.
  3. Short-circuit: bail out early when prerequisites (schema, file content) aren't available rather than letting errors cascade.

Common Failure Modes & Mitigations

Schema Fetch Failures

Symptoms: diagnostics missing, completion items blank, repeated network errors.

Mitigations:

  • Cache schemas on disk (~/.cache/volar-schemas) or in memory (Map).
  • Retry with exponential backoff; after a few retries, fall back to the last known good schema (a retry sketch follows the example below).
  • Log via connection.console.warn and notify the user if the failure persists.
  • Allow users to disable remote fetches (config.schemas.allowRemote = false) for air-gapped environments.
// Fetch a schema, falling back to the cached copy (or a safe default) on any failure.
async function fetchSchema(uri: string): Promise<string> {
  try {
    const response = await fetch(uri);
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return await response.text();
  } catch (err) {
    log('warn', 'Schema fetch failed', { uri, err: String(err) });
    // Prefer the last successfully fetched schema; otherwise fall back to a built-in default.
    return cache.get(uri) ?? DEFAULT_SCHEMA;
  }
}
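
The bullets above mention exponential backoff. A minimal retry sketch, assuming the same cache, log, and DEFAULT_SCHEMA helpers as the example above (fetchSchemaWithRetry and the delay values are illustrative):
async function fetchSchemaWithRetry(uri: string, maxAttempts = 3): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch(uri);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const text = await response.text();
      cache.set(uri, text); // remember the last good schema for offline fallback
      return text;
    } catch (err) {
      log('warn', 'Schema fetch failed, retrying', { uri, attempt, err: String(err) });
      // Back off 0.5s, 1s, 2s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 500));
    }
  }
  return cache.get(uri) ?? DEFAULT_SCHEMA;
}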

Missing Files / Race Conditions

When editors delete or move files while diagnostics are running:

  • Re-fetch the document from server.documents right before publishing diagnostics, and ensure its version still matches (see the sketch below).
  • Wrap file reads in try/catch and skip processing if the file disappears.
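
A hedged sketch of that version check, assuming a vscode-languageserver-style documents manager and a validateTextDocument helper (both names are illustrative):
async function publishDiagnosticsSafely(uri: string, versionAtStart: number) {
  // Re-fetch right before publishing; the file may have been closed, deleted, or edited.
  const document = documents.get(uri);
  if (!document || document.version !== versionAtStart) {
    return; // stale request; bail out quietly instead of publishing outdated diagnostics
  }
  const diagnostics = await validateTextDocument(document);
  connection.sendDiagnostics({ uri, version: document.version, diagnostics });
}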

Invalid User Configuration

If users supply malformed settings:

  • Validate against a JSON schema; on failure, log and revert to defaults.
  • Notify users via connection.window.showErrorMessage (“Invalid volarJsonYaml.schemas: expected array, received string”); a sketch follows this list.
  • Never crash the server due to config errors.
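
A minimal sketch of that validation-plus-notification flow, assuming a Zod-style settingsSchema and a defaultSettings object (both hypothetical names):
function loadSettings(raw: unknown) {
  const parsed = settingsSchema.safeParse(raw); // hypothetical Zod schema describing valid settings
  if (!parsed.success) {
    const summary = parsed.error.issues.map((issue) => issue.message).join('; ');
    connection.window.showErrorMessage(`Invalid volarJsonYaml settings: ${summary}`);
    return defaultSettings; // never crash; keep serving with known-good defaults
  }
  return parsed.data;
}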

TypeScript Project Reloads

  • Guard project.reload() with a debounce; rapid reloads can starve the event loop (see the sketch below).
  • Catch errors from TypeScript's API and surface them with actionable messaging (e.g., “Failed to read tsconfig.json: …”).
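
One way to debounce reloads, assuming a project object with the reload() method mentioned above (the timer bookkeeping and delay are illustrative):
let reloadTimer: ReturnType<typeof setTimeout> | undefined;

function scheduleProjectReload(delayMs = 250) {
  if (reloadTimer) clearTimeout(reloadTimer); // collapse bursts of reload requests into one
  reloadTimer = setTimeout(() => {
    try {
      project.reload();
    } catch (err) {
      connection.window.showErrorMessage(`Failed to read tsconfig.json: ${String(err)}`);
    }
  }, delayMs);
}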

Plugin Exceptions

  • Wrap plugin hooks in try/catch to prevent a single plugin from taking down the server.
// Run a plugin hook defensively: log the exception and return a fallback instead of crashing.
function safeInvoke<T>(fn: () => T, fallback: T) {
  try {
    return fn();
  } catch (err) {
    log('error', 'Plugin error', { err: String(err) });
    return fallback;
  }
}
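
Call sites can then route hook invocations through the guard, for example (provideHover is an illustrative hook name):
const hover = safeInvoke(() => plugin.provideHover?.(document, position), undefined);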

Memory Pressure

  • Dispose snapshots (snapshot.dispose?.()) when removing virtual files.
  • Avoid keeping entire project graphs in memory; rely on Volar's internal caches.
  • Expose a volar/clearCaches command for debugging sessions; reset caches when memory spikes (see the sketch below).
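
One way to wire that up, assuming a vscode-languageserver connection plus in-memory schemaCache and virtualFiles maps (only the volar/clearCaches name comes from the bullet above; the rest is illustrative):
connection.onRequest('volar/clearCaches', () => {
  schemaCache.clear();
  for (const file of virtualFiles.values()) {
    file.snapshot.dispose?.(); // release snapshots tied to removed virtual files
  }
  virtualFiles.clear();
  return { cleared: true };
});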

Fallback Configuration Strategy

  1. Load default config.
  2. Merge user config (server.configurations).
  3. Validate; if validation fails, log error and keep defaults.
  4. Apply new config atomically (e.g., swap references to schema caches).
// Validate the raw settings payload; keep the known-good defaults when parsing fails.
function applyConfiguration(rawConfig: unknown) {
  const parsed = schema.safeParse(rawConfig);
  if (!parsed.success) {
    log('warn', 'Invalid config, using defaults', { issues: parsed.error.issues });
    return defaultConfig;
  }
  return { ...defaultConfig, ...parsed.data };
}
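
Step 4 (applying atomically) can be as simple as swapping one reference, so in-flight requests keep reading a consistent snapshot of the old config (activeConfig is an illustrative name):
let activeConfig = defaultConfig;

connection.onDidChangeConfiguration((change) => {
  activeConfig = applyConfiguration(change.settings);
});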

Editor Recovery Patterns

VS Code

  • Use connection.languages.diagnostics.refresh after recovering from an error (e.g., the schema server returns to service) to prompt the client for fresh diagnostics (see the sketch below).
  • Provide a command “Reload Volar Server” that restarts the process if the user wants a clean slate.
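
With pull diagnostics, the refresh is a one-liner once the server detects recovery (onSchemaServiceRecovered is an illustrative hook):
async function onSchemaServiceRecovered() {
  // Ask the client to re-pull diagnostics for open documents now that data is available again.
  await connection.languages.diagnostics.refresh();
}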

Neovim / CLI

  • Document a :VolarRestart command or CLI flag (volar --restart) to drop caches.
  • If the server exits due to an unrecoverable error, ensure the client auto-restarts it with limited retries to avoid infinite loops.

Testing Resilience

  1. Simulate network failures by mocking fetch to throw errors (see the test sketch after this list).
  2. Provide fixtures with malformed configuration and assert the server falls back to defaults without crashing.
  3. Force server.documents.get to return undefined mid-request to confirm handlers bail out gracefully.
  4. Run stress tests opening/closing hundreds of files to ensure watchers and caches don't leak.
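
A hedged test sketch for item 1, assuming Vitest and the fetchSchema/cache helpers from earlier (import paths and names are illustrative):
import { describe, expect, it, vi } from 'vitest';
import { cache, fetchSchema } from '../src/schemas'; // assumed module under test

describe('schema fetch resilience', () => {
  it('falls back to the cached schema when the network is down', async () => {
    vi.stubGlobal('fetch', vi.fn().mockRejectedValue(new Error('ECONNREFUSED')));
    cache.set('https://example.com/schema.json', '{"type":"object"}');

    const schema = await fetchSchema('https://example.com/schema.json');

    expect(schema).toBe('{"type":"object"}');
  });
});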

Monitoring & Alerts

  • Emit telemetry for repeated failures (e.g., five schema fetch failures in a row) and consider surfacing a notification prompting the user to check their network or configuration (see the sketch below).
  • Track server restarts if the host environment allows; multiple restarts in quick succession indicate a crash loop.
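
A minimal counter for consecutive failures might look like this (the threshold and the reportTelemetry helper are illustrative):
let consecutiveSchemaFailures = 0;
const FAILURE_ALERT_THRESHOLD = 5;

function recordSchemaFetchResult(succeeded: boolean) {
  consecutiveSchemaFailures = succeeded ? 0 : consecutiveSchemaFailures + 1;
  if (consecutiveSchemaFailures === FAILURE_ALERT_THRESHOLD) {
    reportTelemetry('schema-fetch-failure-streak', { count: consecutiveSchemaFailures });
    connection.window.showWarningMessage(
      'Volar could not download JSON schemas. Check your network or set schemas.allowRemote to false.'
    );
  }
}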

Quick Reference Checklist

  • Cache schemas and support offline mode.
  • Validate configuration and fall back to defaults on errors.
  • Guard plugin hooks and log exceptions.
  • Compare document versions before publishing diagnostics.
  • Dispose snapshots and purge caches when documents close.
  • Provide commands or RPC hooks to restart/clear caches.
  • Surface actionable notifications for persistent issues.
  • Cover failure scenarios in tests (network, config, missing files).

Designing for resilience upfront saves countless hours later. Treat every external dependency (network, file system, user config) as unreliable, and make fallbacks explicit so your language server remains stable under real-world conditions.