# Benchmarking Volar-Based Language Servers
Reliable performance requires repeatable measurements. This guide explains how to benchmark a Volar-powered language server—whether for regression testing, capacity planning, or comparing implementation choices. We cover synthetic microbenchmarks, realistic end-to-end scenarios, metrics to track, and tooling suggestions.
## Benchmarking Goals
- Latency – response time for common requests (diagnostics, completion, hover, definition).
- Throughput – how many requests per second the server can handle under load.
- Resource usage – CPU, memory, and IO characteristics.
- Stability – behavior under long-running or adverse conditions (massive workspaces, schema outages).
## Benchmarking Environments
| Environment | Use Case |
|---|---|
| Local machine | Quick iteration, microbenchmarks |
| Dedicated CI runner | Repeatable, no background processes |
| Containerized setup | Shareable configuration, reproducible |
Ensure you run benchmarks on idle machines to reduce noise (disable background indexing, close unrelated apps).
## Metrics to Capture
- Latency (ms): time from request start to response.
- P99 latency: tail latency (99th percentile) for each request type; see the sketch after this list.
- CPU usage (%): average and peaks.
- Memory usage (MB): total process memory, heap snapshots.
- Garbage Collection stats: number/duration of GC cycles.
- Cache hit rate: schema cache hits, TypeScript project reuse, etc.
- Errors per minute: schema fetch failures, exceptions.
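Percentiles can be derived from the raw latency samples your harness records. A minimal sketch (the `percentile` helper and the sample values are illustrative, not part of any particular tool):

```ts
// Nearest-rank percentile (p between 0 and 1) over latency samples in ms.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1);
  return sorted[index];
}

const latencies = [12, 15, 14, 90, 13, 16, 14, 200, 15, 13];
const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(`mean: ${mean.toFixed(1)}ms, p95: ${percentile(latencies, 0.95)}ms, p99: ${percentile(latencies, 0.99)}ms`);
```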
## Tooling

### 1. LSP Bench Harness

Use `lsp-bench` or similar tools to replay LSP traces:

```bash
npx @vscode/lsp-bench run \
  --server "node dist/server.js" \
  --trace ./fixtures/sample.trace.json \
  --out ./bench/results.json
```

- Prepare trace files capturing representative user sessions (diagnostics, completion, hover, rename).
- `lsp-bench` reports per-method latency, throughput, and timeouts.
### 2. Custom Harness
Write a Node script to issue requests in sequence or parallel:
```ts
import { MessageConnection } from 'vscode-jsonrpc';
import { Position } from 'vscode-languageserver-protocol';

// Measure completion latency for a series of cursor positions in one document.
// `connection` is a client-side JSON-RPC connection to the server (see below).
async function benchmarkCompletion(connection: MessageConnection, docUri: string, positions: Position[]): Promise<number[]> {
  const latencies: number[] = [];
  for (const pos of positions) {
    const start = performance.now();
    await connection.sendRequest('textDocument/completion', { textDocument: { uri: docUri }, position: pos });
    latencies.push(performance.now() - start);
  }
  return latencies;
}
```
- Use `performance.now()` or `process.hrtime.bigint()` for high-resolution timing.
- Run requests concurrently to test throughput (e.g., `Promise.all` on multiple files), as sketched below.
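If you are not replaying recorded traces, the harness can spawn the server itself and talk to it over stdio with `vscode-jsonrpc`. A minimal sketch, assuming a stdio-capable build at `dist/server.js`; a complete harness would also perform the `initialize`/`initialized` handshake before sending requests:

```ts
import { spawn } from 'node:child_process';
import {
  createMessageConnection,
  StreamMessageReader,
  StreamMessageWriter,
} from 'vscode-jsonrpc/node';
import { Position } from 'vscode-languageserver-protocol';

// Launch the server over stdio; the path and --stdio flag depend on your build.
const server = spawn('node', ['dist/server.js', '--stdio']);
const connection = createMessageConnection(
  new StreamMessageReader(server.stdout!),
  new StreamMessageWriter(server.stdin!)
);
connection.listen();

// Throughput check: issue completion requests for several documents at once
// and compare total wall-clock time against the sum of sequential latencies.
async function benchmarkThroughput(docUris: string[], position: Position): Promise<number> {
  const start = performance.now();
  await Promise.all(
    docUris.map((uri) =>
      connection.sendRequest('textDocument/completion', {
        textDocument: { uri },
        position,
      })
    )
  );
  return performance.now() - start;
}
```

How much the concurrent run beats the sequential total tells you how well the server overlaps request handling.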
### 3. Profilers

- `node --prof` to generate V8 CPU profiles; use `node --prof-process` to inspect them (example below).
- `node --inspect` for interactive CPU/memory profiling (e.g., via Chrome DevTools).
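For example, to capture and inspect a CPU profile of the server process (the isolate log file name is generated by V8 and varies per run):

```bash
# Start the server under the V8 sampling profiler and drive it with your harness,
# then post-process the generated isolate log into a readable report.
node --prof dist/server.js
node --prof-process isolate-0x*-v8.log > profile.txt
```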
### 4. OS Metrics

- macOS: Instruments, Activity Monitor.
- Linux: `perf`, `htop`, `pidstat`.
- Windows: Performance Monitor.
## Benchmark Scenarios

- Cold start: time from process launch to the first diagnostic result on a large workspace (see the sketch after this list).
- Hot completion: repeated `textDocument/completion` calls while typing in large files.
- Full diagnostics: `workspace/diagnostic` on a monorepo.
- Schema outage: simulate offline schema servers to ensure fallback paths are fast.
- Take Over Mode: measure TS + Volar integration performance with `.ts`/`.vue` editing.
- Concurrent editing: multiple documents edited simultaneously, triggering overlapping diagnostics.
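The cold-start scenario can be scripted directly against the protocol. A sketch, reusing the `connection` from the custom harness above and assuming the server pushes `textDocument/publishDiagnostics`; URIs and file contents are placeholders:

```ts
import { MessageConnection } from 'vscode-jsonrpc';

// Time from the initialize handshake to the first publishDiagnostics
// notification for a freshly opened document.
async function measureColdStart(connection: MessageConnection, docUri: string, text: string): Promise<number> {
  const start = performance.now();
  const firstDiagnostics = new Promise<number>((resolve) => {
    connection.onNotification('textDocument/publishDiagnostics', () => {
      resolve(performance.now() - start);
    });
  });
  await connection.sendRequest('initialize', {
    processId: process.pid,
    rootUri: 'file:///path/to/fixture-workspace', // placeholder workspace
    capabilities: {},
  });
  await connection.sendNotification('initialized', {});
  await connection.sendNotification('textDocument/didOpen', {
    textDocument: { uri: docUri, languageId: 'vue', version: 1, text },
  });
  return firstDiagnostics;
}
```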
## Benchmark Workflow

- Prepare fixtures – real-world repositories or synthetic workloads (e.g., 1,000 `.vue` files).
- Record baseline – run benchmarks on the `main` branch to capture baseline metrics.
- Introduce changes – branch with your modifications.
- Rerun benchmarks – identical environment, compare results.
- Analyze – highlight regressions/improvements with absolute numbers and percentages.
- Automate – integrate into CI to catch regressions before merging.
## Reporting

Include:

- Method: e.g., `textDocument/completion`.
- Sample size: number of requests.
- Mean/P95/P99 latency.
- CPU/memory usage snapshots.
- Notes on environment (machine specs, Node version).
Example summary:
| Scenario | Baseline P95 | New P95 | Delta |
|---|---|---|---|
| Completion (component props) | 42ms | 38ms | -9.5% |
| Hover | 31ms | 34ms | +9.7% |
| Cold diagnostics (monorepo) | 3.2s | 2.8s | -12.5% |
## Tips & Best Practices
- Isolate background noise – disable incremental builds, watchers, or other processes during runs.
- Warm caches – run a warm-up cycle before recording metrics to simulate typical usage.
- Measure tails – averages hide spikes; always record P95/P99.
- Instrument in code – log per-request durations (e.g., around `collectDiagnostics`) to correlate with client-side measurements; see the sketch after this list.
- Version your traces – keep benchmark traces under version control so future runs stay consistent.
- Automate regression detection – fail CI if key metrics regress beyond a threshold (e.g., P95 > baseline + 20%).
- Consider user hardware – run benchmarks on lower-spec machines to mimic real users (e.g., 4-core laptops).
- Monitor GC – long GC pauses can inflate tail latency; capture heap snapshots and consider tuning Node’s GC flags if needed.
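For the in-code instrumentation tip, a small timing wrapper is usually enough. The wrapper below and its log format are illustrative, and `collectDiagnostics` stands in for whatever handler you want to measure:

```ts
// Wrap an async handler and log its duration to stderr (stdout stays free for
// the LSP protocol when the server runs over stdio).
function timed<A extends unknown[], R>(name: string, fn: (...args: A) => Promise<R>): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const start = performance.now();
    try {
      return await fn(...args);
    } finally {
      console.error(`[bench] ${name} took ${(performance.now() - start).toFixed(1)}ms`);
    }
  };
}

// Hypothetical usage: wrap the handler once at startup.
// const collectDiagnostics = timed('collectDiagnostics', rawCollectDiagnostics);
```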
## Sample CI Step

```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run build
      - run: npx @vscode/lsp-bench run --server "node dist/server.js" --trace ./bench/trace.json --out bench/results.json
      - run: node scripts/compare-benchmarks.js bench/results.json bench/baseline.json
```
The comparison script should exit non-zero if metrics exceed allowed regression thresholds.
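A sketch of such a script, assuming the results files are JSON arrays of `{ method, p95 }` entries (adjust the field names to whatever your harness actually emits; shown in TypeScript for consistency with the harness snippets, while the CI step above calls a compiled `.js` build):

```ts
import { readFileSync } from 'node:fs';

// Assumed result shape; align with your harness's actual output format.
interface BenchResult { method: string; p95: number; }

const [currentPath, baselinePath] = process.argv.slice(2);
const current: BenchResult[] = JSON.parse(readFileSync(currentPath, 'utf8'));
const baseline: BenchResult[] = JSON.parse(readFileSync(baselinePath, 'utf8'));

const MAX_REGRESSION = 1.2; // fail if P95 grows more than 20% over baseline
let failed = false;

for (const entry of current) {
  const base = baseline.find((b) => b.method === entry.method);
  if (!base) continue; // no baseline yet for this method
  const ratio = entry.p95 / base.p95;
  console.log(`${entry.method}: ${base.p95}ms -> ${entry.p95}ms (${((ratio - 1) * 100).toFixed(1)}%)`);
  if (ratio > MAX_REGRESSION) {
    console.error(`Regression: ${entry.method} exceeded the allowed threshold`);
    failed = true;
  }
}

process.exit(failed ? 1 : 0);
```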
## Conclusion
Benchmarking isn’t a one-off task—embed it into your development lifecycle to catch regressions early and keep the Volar experience snappy. By combining realistic traces, automated harnesses, and clear reporting, you can confidently evolve your language server while preserving top-tier performance.