Benchmark Methodology

This page explains how TUnit's performance benchmarks are conducted to ensure fair, accurate, and reproducible results.

Core Principles

1. Real-World Scenarios

Benchmarks test realistic patterns, not artificial micro-benchmarks:

  • Actual assertion logic
  • Real data source patterns
  • Typical setup/teardown workflows
  • Common parallelization strategies

2. Fair Comparison

Every framework implements identical test logic (see the sketch after this list):

  • Same test methods
  • Same data inputs
  • Same assertion complexity
  • Equivalent configuration
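
For illustration, here is the same logical test written for two of the frameworks, each using its own standard assertion API:

// TUnit
[Test]
public async Task Adds() => await Assert.That(1 + 2).IsEqualTo(3);

// xUnit
[Fact]
public void Adds() => Assert.Equal(3, 1 + 2);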

3. Statistical Rigor

All benchmarks use BenchmarkDotNet, the industry-standard .NET benchmarking library (a configuration sketch follows this list):

  • Multiple iterations per benchmark
  • Statistical outlier detection
  • Warm-up phase excluded from measurements
  • Standard deviation and median reported
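
As a minimal sketch, a benchmark class configured along these lines provides the warm-up, iteration, and diagnostic behavior described above. The specific job values are illustrative, not the exact settings used:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Warm-up iterations run first and are excluded from the reported statistics;
// the measured iterations then feed the mean, median, and standard deviation.
[SimpleJob(warmupCount: 3, iterationCount: 15)]
[MemoryDiagnoser]
public class RuntimeBenchmarks
{
    [Benchmark]
    public void RunSuite()
    {
        // Invoke the workload under measurement here.
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<RuntimeBenchmarks>();
}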

Test Categories

Runtime Benchmarks

DataDrivenTests

Purpose: Measure parameterized test performance

What's tested:

[Test]
[Arguments(1, 2, 3)]
[Arguments(4, 5, 9)]
// ... 50 argument sets
public async Task TestAddition(int a, int b, int expected)
{
    await Assert.That(a + b).IsEqualTo(expected);
}

Why it matters: Most test suites use parameterized tests extensively.


AsyncTests

Purpose: Measure async/await pattern performance

What's tested:

[Test]
public async Task TestAsyncOperation()
{
    var result = await SimulateAsyncWork();
    await Assert.That(result).IsNotNull();
}

Why it matters: Modern .NET is async-first.


ScaleTests

Purpose: Measure scalability with large test counts

What's tested:

  • 1000+ test methods
  • Parallel execution
  • Memory efficiency

Why it matters: Enterprise codebases have thousands of tests.
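
As a sketch of how such a suite can be generated (assuming TUnit's MethodDataSource attribute; the names and the assertion used are illustrative), a single data source can drive a four-digit test count:

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class ScaleTests
{
    // 1000 generated cases; TUnit schedules them in parallel by default.
    public static IEnumerable<int> Thousand() => Enumerable.Range(0, 1000);

    [Test]
    [MethodDataSource(nameof(Thousand))]
    public async Task Scaled(int value)
    {
        await Assert.That(value).IsGreaterThanOrEqualTo(0);
    }
}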


MatrixTests

Purpose: Measure combinatorial test generation

What's tested:

[Test]
[MatrixDataSource]
public void TestPermissions(
    [Matrix("Create", "Update", "Delete")] string op,
    [Matrix("User", "Admin", "Guest")] string role)
{
    // 3 operations x 3 roles = 9 test combinations
}

Why it matters: Matrix testing is common for comprehensive coverage.


MassiveParallelTests

Purpose: Stress test parallel execution

What's tested:

  • 100+ tests running concurrently
  • Resource contention
  • Thread safety

Why it matters: Parallel execution is TUnit's default behavior.
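
One way to sketch that kind of load, assuming TUnit's Repeat attribute (shown purely for illustration):

using System.Threading;

public class MassiveParallelTests
{
    private static int _counter;

    // TUnit runs tests in parallel by default, so repeating this test
    // 100 times exercises contention on the shared counter across threads.
    [Test]
    [Repeat(100)]
    public void IncrementsSharedState()
    {
        Interlocked.Increment(ref _counter);
    }
}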


Build Benchmarks

Purpose: Measure compilation time impact

What's tested:

  • Clean build time
  • Incremental build time
  • Source generator overhead

Why it matters: Fast builds improve developer productivity.
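
A rough local approximation (a sketch, not the actual harness) is to time a clean build and then an incremental rebuild:

using System;
using System.Diagnostics;

public static class BuildTimer
{
    // Runs "dotnet <args>" and returns the wall-clock duration.
    public static TimeSpan Time(string args)
    {
        var sw = Stopwatch.StartNew();
        using var process = Process.Start("dotnet", args);
        process.WaitForExit();
        return sw.Elapsed;
    }

    public static void Main()
    {
        Time("clean");                              // reset build outputs
        var clean = Time("build -c Release");       // clean build: full compile
        var incremental = Time("build -c Release"); // incremental: up-to-date checks only
        Console.WriteLine($"clean={clean}, incremental={incremental}");
    }
}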

Environment

Hardware

  • Platform: GitHub Actions Ubuntu runners
  • Consistency: Same hardware for all frameworks
  • Reproducibility: Daily automated runs

Software

  • Framework Versions: Latest stable releases
  • .NET Version: .NET 10 (latest)
  • OS: Ubuntu Latest

Configuration

  • Release Mode: All tests compiled with optimizations
  • Native AOT: Separate TUnit_AOT benchmark
  • Default Settings: No special framework configuration

Measurement Process

1. Build Phase

# Build all frameworks identically
dotnet build -c Release -p:TestFramework=TUNIT
dotnet build -c Release -p:TestFramework=XUNIT3
dotnet build -c Release -p:TestFramework=NUNIT
dotnet build -c Release -p:TestFramework=MSTEST

2. Execution Phase

using CliWrap;
using CliWrap.Buffered;

[Benchmark]
public async Task TUnit()
{
    // Launch the framework's test binary in an isolated process via CliWrap.
    await Cli.Wrap("UnifiedTests.exe")
        .WithArguments(["--filter", "TestCategory"])
        .ExecuteBufferedAsync();
}

3. Analysis Phase

  • BenchmarkDotNet collects metrics
  • Statistical analysis performed
  • Results exported to markdown (see the exporter sketch below)
  • Historical trends tracked
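
BenchmarkDotNet can produce those exports directly; for example, exporter attributes on the benchmark class (standard BenchmarkDotNet API, shown as a sketch):

using BenchmarkDotNet.Attributes;

[MarkdownExporterAttribute.GitHub] // GitHub-flavored markdown report
[CsvExporter]                      // raw CSV data for historical trend tracking
public class RuntimeBenchmarks
{
    // ... benchmarks ...
}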

What Gets Measured

Primary Metrics

Mean Execution Time

  • Definition: Average time across all iterations
  • Unit: Milliseconds (ms) or Seconds (s)
  • Lower is better

Median Execution Time

  • Definition: Middle value, less affected by outliers
  • Unit: Milliseconds (ms) or Seconds (s)
  • More stable than mean

Standard Deviation

  • Definition: Measure of result consistency
  • Unit: Same as mean
  • Lower is better (more consistent)

Derived Metrics

Speedup Factor

Speedup = (Other Framework Time) / (TUnit Time)

Example: if NUnit takes 5.0 s and TUnit takes 2.0 s, the speedup is 5.0 / 2.0 = 2.5, reported as "2.5x faster".

AOT Improvement

AOT Speedup = (TUnit JIT Time) / (TUnit AOT Time)

Example: if the JIT build takes 2.0 s and the Native AOT build takes 0.5 s, the AOT speedup is 2.0 / 0.5 = 4, reported as "4x faster with AOT".

Benchmark Automation

Daily Execution

Benchmarks run automatically every 24 hours via GitHub Actions.

Process

  1. Build: Compile all framework versions
  2. Execute: Run benchmarks in isolated processes
  3. Analyze: Parse BenchmarkDotNet output
  4. Publish: Update documentation automatically
  5. Track: Store historical trends

Artifacts

All raw benchmark results are available as GitHub Actions artifacts for 90 days.

Reproducibility

Running Locally

# 1. Navigate to benchmark project
cd tools/speed-comparison

# 2. Build all frameworks
dotnet build -c Release

# 3. Run specific benchmark
cd Tests.Benchmark
dotnet run -c Release -- --filter "*RuntimeBenchmarks*"

Viewing Results

Results are generated in BenchmarkDotNet.Artifacts/results/:

  • Markdown reports (*.md)
  • CSV data (*.csv)
  • HTML reports (*.html)

Limitations & Caveats

What Benchmarks Don't Measure

❌ IDE Integration: Benchmarks don't measure test discovery in IDEs

❌ Debugger Performance: Debug mode performance is not measured

❌ Real I/O: Most tests use in-memory operations to avoid I/O variance

❌ External Dependencies: No database, network, or file system calls

Variance Factors

Results can vary based on:

  • Hardware configuration
  • Background processes
  • OS scheduling
  • .NET runtime version
  • Test complexity

Interpreting Results

  • Relative Performance: Compare frameworks, not absolute times
  • Your Mileage May Vary: Real-world results depend on test characteristics
  • Trends Matter More: Watch for performance regressions over time

Transparency

Open Source

All benchmark code is open source; the benchmark projects live under tools/speed-comparison in the TUnit repository.

Community Verification

Found an issue with the benchmarks? Open an issue or submit a PR!


Last updated: 2025-11-13