ADR-010: E2E Test Infrastructure
Status
Accepted
Context
jamjam is a P2P audio communication application in which audio quality and low latency are critical requirements (as defined in ADR-008). Manual testing alone cannot guarantee that:
- Audio quality meets PESQ/MOS thresholds across all presets
- Cross-platform connections work reliably (Linux ↔ macOS ↔ Windows)
- Multi-user scenarios (up to 8 participants) remain stable
- Latency stays within ADR-008 specifications
We need an automated E2E test infrastructure that:
- Runs on every PR (lightweight tests)
- Runs nightly (full test suite)
- Provides objective audio quality metrics
- Tests cross-platform interoperability
- Scales to 8-node mesh topology
Decision
Test Architecture
Four-layer test pyramid:
┌───────────────────────────────────────┐
│ System E2E (Nightly, VPS cluster)     │  PESQ, 8-node mesh
├───────────────────────────────────────┤
│ Integration E2E (PR, self-hosted)     │  Cross-platform
├───────────────────────────────────────┤
│ Component Integration (PR, GH)        │  Audio+Network loopback
├───────────────────────────────────────┤
│ Unit Tests (Every commit)             │  117+ tests
└───────────────────────────────────────┘
Infrastructure
| Environment | Runner Type | Tests |
|---|---|---|
| GitHub Actions | ubuntu-latest | Unit, loopback, network-local |
| Self-hosted | Linux/macOS/Windows | Cross-platform integration |
| VPS Cluster | 8+ nodes | Full mesh, PESQ evaluation |
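To give a sense of what the full-mesh row implies for orchestration, the sketch below enumerates the peer-to-peer links the cluster must establish. It assumes "full mesh" means every node connects directly to every other node; the function name `mesh_pairs` and the zero-based node indices are illustrative and not part of the jamjam codebase.

```rust
/// Enumerate every unordered pair of nodes in a full mesh.
/// Nodes are identified by index 0..nodes; each pair (a, b) has a < b.
pub fn mesh_pairs(nodes: usize) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for a in 0..nodes {
        for b in (a + 1)..nodes {
            pairs.push((a, b));
        }
    }
    pairs
}
```

For 8 nodes this yields 8 × 7 / 2 = 28 links, each of which would need its own quality and latency measurement in a full-mesh run.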
Virtual Audio Devices
| Platform | Solution |
|---|---|
| Linux | PipeWire null-audio-sink |
| macOS | BlackHole 2ch |
| Windows | VB-Audio Virtual Cable |
Audio Quality Metrics
- PESQ (ITU-T P.862, mapped to MOS-LQO per P.862.1): score from roughly 1.0 to 4.5, higher is better
- Latency: Measured by cross-correlating the reference signal with the captured output
- Packet loss: Counted during transmission
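The cross-correlation latency measurement can be sketched as follows. This is a minimal brute-force version, not the code in quality.rs: it assumes mono `f32` buffers recorded at the same sample rate, and the names `estimate_lag` and `lag_to_ms` are hypothetical. A production implementation would more likely use an FFT-based correlation for long signals.

```rust
/// Estimate how many samples `captured` lags behind `reference` by
/// brute-force cross-correlation over lags 0..=max_lag.
/// Returns None if either buffer is empty.
pub fn estimate_lag(reference: &[f32], captured: &[f32], max_lag: usize) -> Option<usize> {
    if reference.is_empty() || captured.is_empty() {
        return None;
    }
    let mut best: Option<(usize, f64)> = None;
    for lag in 0..=max_lag.min(captured.len() - 1) {
        // Correlate reference[i] with captured[i + lag] over the overlap.
        let overlap = reference.len().min(captured.len() - lag);
        let score: f64 = (0..overlap)
            .map(|i| reference[i] as f64 * captured[i + lag] as f64)
            .sum();
        // Keep the first lag with the strictly highest correlation.
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((lag, score));
        }
    }
    best.map(|(lag, _)| lag)
}

/// Convert a lag in samples to milliseconds.
pub fn lag_to_ms(lag_samples: usize, sample_rate_hz: u32) -> f64 {
    lag_samples as f64 * 1000.0 / sample_rate_hz as f64
}
```

As an illustration, at an assumed sample rate of 48 kHz a lag of 96 samples corresponds to 2 ms, so sample-level resolution is fine enough to check millisecond-scale latency limits.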
Thresholds from ADR-008:
| Preset | Min MOS | Max Latency |
|---|---|---|
| zero-latency | 4.0 | 2ms |
| ultra-low-latency | 3.8 | 5ms |
| balanced | 3.5 | 15ms |
| high-quality | 4.2 | 30ms |
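One way a test could encode the table above is shown below. This is a sketch only: the `Preset` enum, `Threshold` struct, and `check` function are hypothetical names, and the numbers are copied directly from the table rather than from ADR-008 itself.

```rust
/// Presets listed in the threshold table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Preset {
    ZeroLatency,
    UltraLowLatency,
    Balanced,
    HighQuality,
}

/// Pass/fail limits for one preset.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Threshold {
    pub min_mos: f64,
    pub max_latency_ms: f64,
}

impl Preset {
    /// Limits copied from the table above.
    pub fn threshold(self) -> Threshold {
        let (min_mos, max_latency_ms) = match self {
            Preset::ZeroLatency => (4.0, 2.0),
            Preset::UltraLowLatency => (3.8, 5.0),
            Preset::Balanced => (3.5, 15.0),
            Preset::HighQuality => (4.2, 30.0),
        };
        Threshold { min_mos, max_latency_ms }
    }
}

/// Compare measured values against the preset's limits.
pub fn check(preset: Preset, mos: f64, latency_ms: f64) -> Result<(), String> {
    let t = preset.threshold();
    if mos < t.min_mos {
        return Err(format!("{preset:?}: MOS {mos} is below minimum {}", t.min_mos));
    }
    if latency_ms > t.max_latency_ms {
        return Err(format!(
            "{preset:?}: latency {latency_ms} ms exceeds maximum {} ms",
            t.max_latency_ms
        ));
    }
    Ok(())
}
```

Keeping the limits in a single `match` means a change to ADR-008 touches one place in the test code, and the error string names which limit was violated.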
Feature Flags
[features]
e2e-loopback = [] # Audio loopback tests (no network)
e2e-network-local = [] # Local network tests (localhost)
e2e-remote = [] # Remote node tests (VPS cluster)
e2e-full = ["e2e-loopback", "e2e-network-local", "e2e-remote"]
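The flags above can gate code at compile time through Rust's standard `cfg` mechanism. The sketch below is illustrative and not taken from the jamjam sources: `enabled_tiers` and the `loopback_tests` module are hypothetical names. Because `e2e-full` lists the other three features, enabling it turns on every branch.

```rust
/// Names of the E2E tiers compiled into this build.
pub fn enabled_tiers() -> Vec<&'static str> {
    let mut tiers = Vec::new();
    if cfg!(feature = "e2e-loopback") {
        tiers.push("loopback");
    }
    if cfg!(feature = "e2e-network-local") {
        tiers.push("network-local");
    }
    if cfg!(feature = "e2e-remote") {
        tiers.push("remote");
    }
    tiers
}

// This module is compiled only for test builds with the loopback feature on.
#[cfg(all(test, feature = "e2e-loopback"))]
mod loopback_tests {
    #[test]
    fn loopback_tier_is_reported() {
        assert!(super::enabled_tiers().contains(&"loopback"));
    }
}
```

With this layout, running `cargo test --features e2e-loopback` from the tests/e2e/ crate would compile and run the gated module, while a plain `cargo test` would skip it entirely.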
Directory Structure
tests/e2e/
├── Cargo.toml
├── src/
│   ├── lib.rs
│   ├── orchestrator.rs       # Multi-node coordination
│   ├── node.rs               # Remote node management
│   ├── audio_injection.rs    # Virtual audio control
│   ├── quality.rs            # PESQ/latency evaluation
│   └── scenarios/
│       ├── loopback.rs
│       ├── two_node.rs
│       ├── cross_platform.rs
│       └── eight_node.rs
└── scripts/
    ├── setup-virtual-audio-linux.sh
    ├── setup-virtual-audio-macos.sh
    └── setup-virtual-audio-windows.ps1
Workflow Triggers
| Workflow | Trigger | Tests |
|---|---|---|
| e2e-pr.yml | PR, push to main | loopback, network-local |
| e2e-nightly.yml | Daily at 02:00 JST (17:00 UTC) | All presets, two-node |
| e2e-nightly.yml (manual) | workflow_dispatch | Full matrix, 8-node |
Consequences
Benefits
- Objective quality assurance: PESQ provides industry-standard audio quality metrics
- Cross-platform confidence: Automated testing of all OS combinations
- Regression detection: Quality degradation caught before merge
- Scalability testing: 8-node mesh validates production scenarios
Costs
- Infrastructure cost: ~$300/month for VPS cluster (nightly tests)
- Maintenance: Virtual audio setup varies by OS
- Complexity: Multi-node orchestration requires careful synchronization
Risks
- Flaky tests: Network tests may be sensitive to timing
  - Mitigation: Use retry logic, increase timeouts
- Platform differences: Audio behavior varies by OS
  - Mitigation: Platform-specific thresholds if needed
- CI environment limitations: GitHub Actions lacks real audio devices
  - Mitigation: Use virtual audio devices for PR tests
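The retry mitigation for flaky network tests could look roughly like the helper below. It is a sketch under assumptions: the name `retry`, the doubling backoff, and the attempt counter passed to the closure are illustrative choices, not the project's actual retry policy.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay between attempts. `op` receives the 1-based attempt number and
/// always runs at least once. The last error is returned on failure.
pub fn retry<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the final error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Bounding the attempts keeps a genuinely broken connection failing quickly instead of hanging the job. Logging each failed attempt would also help keep retries from masking a real regression.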
Implementation Notes
Phase 1 (Complete)
- Create tests/e2e/ directory structure
- Implement virtual audio setup scripts
- Add feature flags to Cargo.toml
- Create loopback test scenarios
- Create e2e-pr.yml workflow
Phase 2 (Complete)
- Integrate PESQ evaluation (Python wrapper + correlation-based fallback)
- Audio injection/capture with virtual devices (VirtualAudioManager)
- Two-node localhost tests with real audio path
- Reference audio fixtures generator (sine, sweep, noise, impulse, speech-like)
- Latency measurement via cross-correlation
- Self-hosted runner setup documentation
- e2e-nightly.yml workflow for full test suite
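As an illustration of the reference audio fixtures mentioned in Phase 2, the sketch below generates two of the listed signal types, a sine tone and an impulse. The function names, the mono `f32` sample format, and the parameter choices are assumptions for the example and do not describe the actual generator.

```rust
use std::f64::consts::TAU;

/// Generate `num_samples` of a sine tone at `freq_hz`.
/// The phase is computed in f64 to limit rounding drift on long buffers.
pub fn sine(freq_hz: f64, sample_rate_hz: u32, num_samples: usize, amplitude: f32) -> Vec<f32> {
    (0..num_samples)
        .map(|n| {
            let phase = TAU * freq_hz * n as f64 / sample_rate_hz as f64;
            amplitude * phase.sin() as f32
        })
        .collect()
}

/// Generate silence with a single unit impulse at `position`.
/// If `position` is out of range the buffer stays silent.
pub fn impulse(num_samples: usize, position: usize) -> Vec<f32> {
    let mut buf = vec![0.0f32; num_samples];
    if let Some(sample) = buf.get_mut(position) {
        *sample = 1.0;
    }
    buf
}
```

An impulse is convenient for latency measurement because its cross-correlation peak is unambiguous, whereas a pure sine repeats every period and can produce several equally strong peaks.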
Phase 3 (Future)
- Set up self-hosted runners (Linux/macOS/Windows)
- Cross-platform test orchestration
- 8-node VPS cluster provisioning