ADR-010: E2E Test Infrastructure

Status

Accepted

Context

jamjam is a P2P audio communication application where audio quality and low latency are critical requirements (as defined in ADR-008). Manual testing is insufficient to guarantee:

  1. Audio quality meets PESQ/MOS thresholds across all presets
  2. Cross-platform connections work reliably (Linux ↔ macOS ↔ Windows)
  3. Multi-user scenarios (up to 8 participants) remain stable
  4. Latency stays within ADR-008 specifications

We need an automated E2E test infrastructure that:

  • Runs on every PR (lightweight tests)
  • Runs nightly (full test suite)
  • Provides objective audio quality metrics
  • Tests cross-platform interoperability
  • Scales to 8-node mesh topology

Decision

Test Architecture

Four-layer test pyramid:

┌───────────────────────────────────────┐
│ System E2E (Nightly, VPS cluster)     │ PESQ, 8-node mesh
├───────────────────────────────────────┤
│ Integration E2E (PR, self-hosted)     │ Cross-platform
├───────────────────────────────────────┤
│ Component Integration (PR, GH)        │ Audio+Network loopback
├───────────────────────────────────────┤
│ Unit Tests (Every commit)             │ 117+ tests
└───────────────────────────────────────┘

Infrastructure

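| Environment    | Runner Type         | Tests                         |
|----------------|---------------------|-------------------------------|
| GitHub Actions | ubuntu-latest       | Unit, loopback, network-local |
| Self-hosted    | Linux/macOS/Windows | Cross-platform integration    |
| VPS Cluster    | 8+ nodes            | Full mesh, PESQ evaluation    |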

Virtual Audio Devices

| Platform | Solution                 |
|----------|--------------------------|
| Linux    | PipeWire null-audio-sink |
| macOS    | BlackHole 2ch            |
| Windows  | VB-Audio Virtual Cable   |

Audio Quality Metrics

  • PESQ (ITU-T P.862): MOS-LQO score (approximately 1.0 to 4.5, higher is better)
  • Latency: Cross-correlation measurement
  • Packet loss: Counted during transmission
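The cross-correlation latency measurement can be illustrated with a minimal sketch. It assumes the reference signal and the captured signal are mono `f32` buffers at the same sample rate, and it uses a brute-force search rather than an FFT. The names `estimate_delay_samples` and `samples_to_ms` are hypothetical and are not taken from the jamjam codebase.

```rust
/// Estimate how many samples `captured` lags behind `reference` by
/// brute-force cross-correlation over lags `0..=max_lag`.
/// Hypothetical helper for illustration; not the project's implementation.
pub fn estimate_delay_samples(
    reference: &[f32],
    captured: &[f32],
    max_lag: usize,
) -> Option<usize> {
    if reference.is_empty() || captured.len() < reference.len() {
        return None;
    }
    // Largest lag at which the whole reference still overlaps the capture.
    let last_lag = max_lag.min(captured.len() - reference.len());
    let mut best: Option<(usize, f64)> = None;
    for lag in 0..=last_lag {
        // Dot product of the reference with the capture window at this lag.
        let score: f64 = reference
            .iter()
            .zip(&captured[lag..lag + reference.len()])
            .map(|(r, c)| f64::from(*r) * f64::from(*c))
            .sum();
        // Keep the first lag with the strictly highest score.
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((lag, score));
        }
    }
    best.map(|(lag, _)| lag)
}

/// Convert a delay in samples to milliseconds.
pub fn samples_to_ms(samples: usize, sample_rate_hz: u32) -> f64 {
    samples as f64 * 1000.0 / f64::from(sample_rate_hz)
}
```

At 48 kHz one sample is about 0.02 ms, so a lag of 96 samples corresponds to 2 ms. A periodic reference such as a pure sine produces ambiguous peaks, so an impulse or sweep is a more suitable probe for this kind of measurement.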

Thresholds from ADR-008:

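| Preset            | Min MOS | Max Latency |
|-------------------|---------|-------------|
| zero-latency      | 4.0     | 2 ms        |
| ultra-low-latency | 3.8     | 5 ms        |
| balanced          | 3.5     | 15 ms       |
| high-quality      | 4.2     | 30 ms       |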
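A test can turn these thresholds into a pass/fail check. The sketch below hard-codes the four presets and their limits as listed in this ADR. The `Threshold` struct and the `threshold_for` and `meets_threshold` functions are hypothetical names used only for illustration.

```rust
/// Quality limits for one preset, as listed in this ADR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Threshold {
    pub min_mos: f64,
    pub max_latency_ms: f64,
}

/// Look up the limits for a preset name; `None` for unknown presets.
/// Hypothetical helper, not part of the jamjam codebase.
pub fn threshold_for(preset: &str) -> Option<Threshold> {
    let (min_mos, max_latency_ms) = match preset {
        "zero-latency" => (4.0, 2.0),
        "ultra-low-latency" => (3.8, 5.0),
        "balanced" => (3.5, 15.0),
        "high-quality" => (4.2, 30.0),
        _ => return None,
    };
    Some(Threshold { min_mos, max_latency_ms })
}

/// True when both the measured MOS and the measured latency are within limits.
pub fn meets_threshold(preset: &str, mos: f64, latency_ms: f64) -> Option<bool> {
    threshold_for(preset).map(|t| mos >= t.min_mos && latency_ms <= t.max_latency_ms)
}
```

Returning `Option<bool>` keeps an unknown preset name distinct from a failed measurement, so a typo in a test matrix does not silently pass or fail.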

Feature Flags

[features]
e2e-loopback = [] # Audio loopback tests (no network)
e2e-network-local = [] # Local network tests (localhost)
e2e-remote = [] # Remote node tests (VPS cluster)
e2e-full = ["e2e-loopback", "e2e-network-local", "e2e-remote"]
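One way test code might consume these flags is through the standard `cfg!` macro and `#[cfg(...)]` attributes. The following is a sketch only: the function `enabled_suites` and the module `loopback_tests` are hypothetical, and the actual gating in tests/e2e may differ.

```rust
/// Names of the E2E suites compiled into this build, derived from the
/// Cargo features declared above. Hypothetical helper for illustration.
pub fn enabled_suites() -> Vec<&'static str> {
    let mut suites = Vec::new();
    if cfg!(feature = "e2e-loopback") {
        suites.push("loopback");
    }
    if cfg!(feature = "e2e-network-local") {
        suites.push("network-local");
    }
    if cfg!(feature = "e2e-remote") {
        suites.push("remote");
    }
    suites
}

// This module is compiled only for test builds with `e2e-loopback` enabled,
// so a plain `cargo test` skips it entirely.
#[cfg(all(test, feature = "e2e-loopback"))]
mod loopback_tests {
    #[test]
    fn runs_only_with_loopback_feature() {
        assert!(super::enabled_suites().contains(&"loopback"));
    }
}
```

With this layout, `cargo test --features e2e-loopback` would build and run the gated module, while `--features e2e-full` would enable all three suites because `e2e-full` lists them as dependencies.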

Directory Structure

tests/e2e/
├── Cargo.toml
├── src/
│   ├── lib.rs
│   ├── orchestrator.rs        # Multi-node coordination
│   ├── node.rs                # Remote node management
│   ├── audio_injection.rs     # Virtual audio control
│   ├── quality.rs             # PESQ/latency evaluation
│   └── scenarios/
│       ├── loopback.rs
│       ├── two_node.rs
│       ├── cross_platform.rs
│       └── eight_node.rs
└── scripts/
    ├── setup-virtual-audio-linux.sh
    ├── setup-virtual-audio-macos.sh
    └── setup-virtual-audio-windows.ps1

Workflow Triggers

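| Workflow                 | Trigger           | Tests                   |
|--------------------------|-------------------|-------------------------|
| e2e-pr.yml               | PR, push to main  | loopback, network-local |
| e2e-nightly.yml          | Daily 2:00 AM JST | All presets, two-node   |
| e2e-nightly.yml (manual) | workflow_dispatch | Full matrix, 8-node     |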

Consequences

Benefits

  1. Objective quality assurance: PESQ provides industry-standard audio quality metrics
  2. Cross-platform confidence: Automated testing of all OS combinations
  3. Regression detection: Quality degradation caught before merge
  4. Scalability testing: 8-node mesh validates production scenarios

Costs

  1. Infrastructure cost: ~$300/month for VPS cluster (nightly tests)
  2. Maintenance: Virtual audio setup varies by OS
  3. Complexity: Multi-node orchestration requires careful synchronization

Risks

  1. Flaky tests: Network tests may be sensitive to timing
    • Mitigation: Retry transient failures a bounded number of times and increase timeouts for network steps
  2. Platform differences: Audio behavior varies by OS
    • Mitigation: Platform-specific thresholds if needed
  3. CI environment limitations: GitHub Actions lacks real audio devices
    • Mitigation: Use virtual audio devices for PR tests
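The retry mitigation for flaky network tests can be sketched as a small bounded-retry helper. It assumes a fixed delay between attempts and uses only the standard library. The `retry` function is a hypothetical illustration, not the project's retry logic.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` until it succeeds or `max_attempts` is reached, sleeping for
/// `delay` between attempts. `op` receives the 1-based attempt number.
/// `op` always runs at least once, even if `max_attempts` is 0.
/// Hypothetical helper for illustration.
pub fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

Keeping the attempt count small and logging each failed attempt helps here, because unbounded or silent retries can hide a genuine latency or connectivity regression.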

Implementation Notes

Phase 1 (Complete)

  • Create tests/e2e/ directory structure
  • Implement virtual audio setup scripts
  • Add feature flags to Cargo.toml
  • Create loopback test scenarios
  • Create e2e-pr.yml workflow

Phase 2 (Complete)

  • Integrate PESQ evaluation (Python wrapper + correlation-based fallback)
  • Audio injection/capture with virtual devices (VirtualAudioManager)
  • Two-node localhost tests with real audio path
  • Reference audio fixtures generator (sine, sweep, noise, impulse, speech-like)
  • Latency measurement via cross-correlation
  • Self-hosted runner setup documentation
  • e2e-nightly.yml workflow for full test suite
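As an illustration of the reference audio fixtures listed above, the sketch below generates two of the simpler signals, a sine tone and a unit impulse, as mono `f32` buffers. The function names and parameters are assumptions made for this example; the project's actual generator is not shown in this ADR.

```rust
use std::f32::consts::TAU;

/// Generate a mono sine tone. Hypothetical fixture helper for illustration.
pub fn sine(freq_hz: f32, sample_rate_hz: u32, duration_ms: u32, amplitude: f32) -> Vec<f32> {
    // Number of samples, computed in u64 to avoid overflow.
    let n = (u64::from(sample_rate_hz) * u64::from(duration_ms) / 1000) as usize;
    (0..n)
        .map(|i| {
            let t = i as f32 / sample_rate_hz as f32;
            amplitude * (TAU * freq_hz * t).sin()
        })
        .collect()
}

/// Generate `len` samples of silence with a single unit impulse at index `at`.
/// If `at` is out of range the buffer stays silent.
pub fn impulse(len: usize, at: usize) -> Vec<f32> {
    let mut samples = vec![0.0f32; len];
    if let Some(sample) = samples.get_mut(at) {
        *sample = 1.0;
    }
    samples
}
```

An impulse is a natural probe for cross-correlation latency measurement because it yields a single sharp peak, whereas a steady sine is better suited to checking level and distortion.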

Phase 3 (Future)

  • Set up self-hosted runners (Linux/macOS/Windows)
  • Cross-platform test orchestration
  • 8-node VPS cluster provisioning

References

  • ADR-008: Zero-latency mode and audio quality requirements
  • ITU-T P.862: PESQ algorithm specification
  • PipeWire: Linux audio framework
  • BlackHole: macOS virtual audio
  • VB-Audio: Windows virtual audio