Benchmark Methodology
This page explains how TUnit's performance benchmarks are conducted to ensure fair, accurate, and reproducible results.
Core Principles
1. Real-World Scenarios
Benchmarks test realistic patterns, not artificial micro-benchmarks:
- Actual assertion logic
- Real data source patterns
- Typical setup/teardown workflows
- Common parallelization strategies
2. Fair Comparison
Every framework implements identical test logic, as illustrated in the sketch after this list:
- Same test methods
- Same data inputs
- Same assertion complexity
- Equivalent configuration
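As an illustration of what "identical test logic" means, here is a rough sketch of the same addition check written for TUnit and xUnit; the method names are illustrative and not taken from the benchmark suite.
// TUnit version (assertions are awaited)
[Test]
[Arguments(2, 3, 5)]
public async Task AddsTwoNumbers(int a, int b, int expected)
{
    await Assert.That(a + b).IsEqualTo(expected);
}

// Equivalent xUnit version of the same logic
[Theory]
[InlineData(2, 3, 5)]
public void AddsTwoNumbers(int a, int b, int expected)
{
    Assert.Equal(expected, a + b);
}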
3. Statistical Rigor
All benchmarks use BenchmarkDotNet, the industry-standard .NET benchmarking library (a minimal example follows the list below):
- Multiple iterations per benchmark
- Statistical outlier detection
- Warm-up phase excluded from measurements
- Standard deviation and median reported
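As a rough, illustrative sketch (the class and method names below are not the actual benchmark code), a BenchmarkDotNet benchmark looks like this; the library itself performs the warm-up, iteration, and outlier handling listed above:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class ExampleBenchmarks
{
    // BenchmarkDotNet invokes this method many times per iteration and reports
    // mean, median, and standard deviation across iterations.
    [Benchmark]
    public int SumNumbers()
    {
        var total = 0;
        for (var i = 0; i < 1_000; i++) total += i;
        return total;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ExampleBenchmarks>();
}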
Test Categories
Runtime Benchmarks
DataDrivenTests
Purpose: Measure parameterized test performance
What's tested:
[Test]
[Arguments(1, 2, 3)]
[Arguments(4, 5, 9)]
// ... 50 argument sets
public async Task TestAddition(int a, int b, int expected)
{
    await Assert.That(a + b).IsEqualTo(expected);
}
Why it matters: Most test suites use parameterized tests extensively.
AsyncTests
Purpose: Measure async/await pattern performance
What's tested:
[Test]
public async Task TestAsyncOperation()
{
    var result = await SimulateAsyncWork();
    await Assert.That(result).IsNotNull();
}
Why it matters: Modern .NET is async-first.
ScaleTests
Purpose: Measure scalability with large test counts
What's tested:
- 1000+ test methods
- Parallel execution
- Memory efficiency
Why it matters: Enterprise codebases have thousands of tests.
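The scale suite is made up of generated tests rather than hand-written ones; as a rough illustration of the shape of the workload (the method names below are made up), it is comparable to a project containing on the order of a thousand small tests such as:
// The scale benchmark contains 1,000+ small methods of roughly this shape,
// all eligible to run in parallel under TUnit's defaults.
[Test]
public async Task Scale_Test_0001()
{
    await Assert.That(1 + 1).IsEqualTo(2);
}

[Test]
public async Task Scale_Test_0002()
{
    await Assert.That(2 + 2).IsEqualTo(4);
}

// ... and so on, up to 1,000+ test methods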
MatrixTests
Purpose: Measure combinatorial test generation
What's tested:
[Test]
[Matrix("Create", "Update", "Delete")] // Operation
[Matrix("User", "Admin", "Guest")] // Role
public void TestPermissions(string op, string role)
{
// 9 test combinations
}
Why it matters: Matrix testing is common for comprehensive coverage.
MassiveParallelTests
Purpose: Stress test parallel execution
What's tested:
- 100+ tests running concurrently
- Resource contention
- Thread safety
Why it matters: Parallel execution is TUnit's default behavior.
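A minimal sketch of this kind of workload, assuming TUnit's [Repeat] attribute expands into independent test cases that the default scheduler runs in parallel (the field and method names are illustrative):
private static int _completedCount;

// 100 repeats of this test become separate test cases; because TUnit runs
// tests in parallel by default, they execute concurrently and contend on
// the shared counter.
[Test]
[Repeat(100)]
public async Task ConcurrentWorkload()
{
    await Task.Delay(10); // simulated work
    Interlocked.Increment(ref _completedCount);
    await Assert.That(_completedCount > 0).IsTrue();
}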
Build Benchmarks
Purpose: Measure compilation time impact
What's tested:
- Clean build time
- Incremental build time
- Source generator overhead
Why it matters: Fast builds improve developer productivity.
Environment
Hardware
- Platform: GitHub Actions Ubuntu runners
- Consistency: Same hardware for all frameworks
- Reproducibility: Daily automated runs
Software
- Framework Versions: Latest stable releases
- .NET Version: .NET 10 (latest)
- OS: Ubuntu Latest
Configuration
- Release Mode: All tests compiled with optimizations
- Native AOT: Separate TUnit_AOT benchmark
- Default Settings: No special framework configuration
Measurement Process
1. Build Phase
# Build all frameworks identically
dotnet build -c Release -p:TestFramework=TUNIT
dotnet build -c Release -p:TestFramework=XUNIT3
dotnet build -c Release -p:TestFramework=NUNIT
dotnet build -c Release -p:TestFramework=MSTEST
2. Execution Phase
[Benchmark]
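// Each framework's pre-built test binary is launched as a separate process
// via the CliWrap library; ExecuteBufferedAsync (from CliWrap.Buffered) waits
// for the run to finish and captures its output.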
public async Task TUnit()
{
    await Cli.Wrap("UnifiedTests.exe")
        .WithArguments(["--filter", "TestCategory"])
        .ExecuteBufferedAsync();
}
3. Analysis Phase
- BenchmarkDotNet collects metrics
- Statistical analysis performed
- Results exported to markdown, CSV, and HTML (see the exporter sketch below)
- Historical trends tracked
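The report formats listed under Viewing Results below correspond to BenchmarkDotNet exporters. As a rough sketch (the class name is illustrative; the attributes come from BenchmarkDotNet.Attributes), they can be enabled on the benchmark class:
[MarkdownExporter]  // produces the *.md report
[CsvExporter]       // produces the *.csv data
[HtmlExporter]      // produces the *.html report
public class RuntimeBenchmarks
{
    // [Benchmark] methods go here
}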
What Gets Measured
Primary Metrics
Mean Execution Time
- Definition: Average time across all iterations
- Unit: Milliseconds (ms) or Seconds (s)
- Lower is better
Median Execution Time
- Definition: Middle value, less affected by outliers
- Unit: Milliseconds (ms) or Seconds (s)
- More stable than mean
Standard Deviation
- Definition: Measure of result consistency
- Unit: Same as mean
- Lower is better (more consistent)
Derived Metrics
Speedup Factor
Speedup = (Other Framework Time) / (TUnit Time)
Example: if another framework takes 5 s and TUnit takes 2 s, the speedup is 5 / 2 = 2.5x, i.e. TUnit is 2.5 times faster.
AOT Improvement
AOT Speedup = (TUnit JIT Time) / (TUnit AOT Time)
Example: if the JIT run takes 8 s and the Native AOT run takes 2 s, the AOT speedup is 8 / 2 = 4x, i.e. Native AOT is 4 times faster than JIT.
Benchmark Automation
Daily Execution
Benchmarks run automatically every 24 hours via GitHub Actions.
Process
- Build: Compile all framework versions
- Execute: Run benchmarks in isolated processes
- Analyze: Parse BenchmarkDotNet output
- Publish: Update documentation automatically
- Track: Store historical trends
Artifacts
All raw benchmark results are available as GitHub Actions artifacts for 90 days.
Reproducibility
Running Locally
# 1. Navigate to benchmark project
cd tools/speed-comparison
# 2. Build all frameworks
dotnet build -c Release
# 3. Run specific benchmark
cd Tests.Benchmark
dotnet run -c Release -- --filter "*RuntimeBenchmarks*"
Viewing Results
Results are generated in BenchmarkDotNet.Artifacts/results/:
- Markdown reports (*.md)
- CSV data (*.csv)
- HTML reports (*.html)
Limitations & Caveats
What Benchmarks Don't Measure
- IDE Integration: Benchmarks don't measure test discovery in IDEs
- Debugger Performance: Debug mode performance is not measured
- Real I/O: Most tests use in-memory operations to avoid I/O variance
- External Dependencies: No database, network, or file system calls
Variance Factors
Results can vary based on:
- Hardware configuration
- Background processes
- OS scheduling
- .NET runtime version
- Test complexity
Interpreting Results
- Relative Performance: Compare frameworks, not absolute times
- Your Mileage May Vary: Real-world results depend on test characteristics
- Trends Matter More: Watch for performance regressions over time
Transparency
Open Source
All benchmark code is open source; the benchmark projects live under tools/speed-comparison in the TUnit repository.
Community Verification
Found an issue with the benchmarks? Open an issue or submit a PR!
Further Reading
Last updated: 2025-11-13