Benchmark Methodology

This page explains how TUnit's performance benchmarks are conducted to ensure fair, accurate, and reproducible results.

Core Principles

1. Real-World Scenarios

Benchmarks test realistic patterns, not artificial micro-benchmarks:

Actual assertion logic
Real data source patterns
Typical setup/teardown workflows
Common parallelization strategies

2. Fair Comparison

Every framework implements identical test logic:

Same test methods
Same data inputs
Same assertion complexity
Equivalent configuration

3. Statistical Rigor

All benchmarks use BenchmarkDotNet, the industry-standard .NET benchmarking library:

Multiple iterations per benchmark
Statistical outlier detection
Warm-up phase excluded from measurements
Standard deviation and median reported

Test Categories

Runtime Benchmarks

DataDrivenTests

Purpose: Measure parameterized test performance

What's tested:

[Test]
[Arguments(1, 2, 3)]
[Arguments(4, 5, 9)]
// ... 50 argument sets
public async Task TestAddition(int a, int b, int expected)
{
    // TUnit assertions are awaitable and must be awaited to execute
    await Assert.That(a + b).IsEqualTo(expected);
}

Why it matters: Most test suites use parameterized tests extensively.

AsyncTests

Purpose: Measure async/await pattern performance

What's tested:

[Test]
public async Task TestAsyncOperation()
{
    var result = await SimulateAsyncWork();
    await Assert.That(result).IsNotNull();
}

Why it matters: Modern .NET is async-first.
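
The snippet above calls `SimulateAsyncWork` without showing it. A minimal sketch of what such a helper might look like follows, assuming it only needs to force an asynchronous continuation and return a non-null value. The class name `AsyncSketch` and the returned string are hypothetical, not taken from the benchmark suite.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-in: the real benchmark helper is not shown in this page.
public static class AsyncSketch
{
    // Yields once so the caller really resumes asynchronously,
    // then returns a non-null value for the assertion to check.
    public static async Task<string> SimulateAsyncWork()
    {
        await Task.Yield();
        return "done";
    }
}
```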

ScaleTests

Purpose: Measure scalability with large test counts

What's tested:

1000+ test methods
Parallel execution
Memory efficiency

Why it matters: Enterprise codebases have thousands of tests.

MatrixTests

Purpose: Measure combinatorial test generation

What's tested:

[Test]
[MatrixDataSource]
public void TestPermissions(
    [Matrix("Create", "Update", "Delete")] string op,   // Operation
    [Matrix("User", "Admin", "Guest")] string role)     // Role
{
    // 3 operations x 3 roles = 9 test combinations
}

Why it matters: Matrix testing is common for comprehensive coverage.
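
To make the count of 9 combinations concrete, the sketch below enumerates the Cartesian product of the two value sets in plain C#. It only illustrates what combinatorial expansion produces; it is not TUnit's generator, and `MatrixSketch` and `Combine` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: not part of TUnit, only illustrates matrix expansion.
public static class MatrixSketch
{
    // Returns every (operation, role) pair: the Cartesian product of both sets.
    public static List<(string Op, string Role)> Combine(
        IEnumerable<string> operations, IEnumerable<string> roles)
    {
        return operations
            .SelectMany(op => roles.Select(role => (Op: op, Role: role)))
            .ToList();
    }
}
```

With three operations and three roles, `Combine` yields 3 × 3 = 9 pairs. Adding a third parameter with three values would raise that to 27, which is why matrix tests are a useful stress case for test generation.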

MassiveParallelTests

Purpose: Stress test parallel execution

What's tested:

100+ tests running concurrently
Resource contention
Thread safety

Why it matters: Parallel execution is TUnit's default behavior.
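
As an illustration of the kind of contention such a stress test exercises, the sketch below runs many tasks that all update one shared counter. It is a hypothetical example in plain C#, not code from the benchmark suite; `ContentionSketch` and `Run` are made-up names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration: not taken from the TUnit benchmark suite.
public static class ContentionSketch
{
    // Starts `taskCount` tasks; each increments a shared counter
    // `incrementsPerTask` times, then the final count is returned.
    public static int Run(int taskCount, int incrementsPerTask)
    {
        int counter = 0;

        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() =>
            {
                for (int i = 0; i < incrementsPerTask; i++)
                {
                    // Atomic increment: avoids lost updates under contention.
                    Interlocked.Increment(ref counter);
                }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return counter;
    }
}
```

Because every update is atomic, the result always equals `taskCount * incrementsPerTask`. Replacing `Interlocked.Increment` with a plain `counter++` would allow lost updates, which is the sort of thread-safety defect that only shows up under heavy parallel load.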

Build Benchmarks

Purpose: Measure compilation time impact

What's tested:

Clean build time
Incremental build time
Source generator overhead

Why it matters: Fast builds improve developer productivity.

Environment

Hardware

Platform: GitHub Actions Ubuntu runners
Consistency: Same hardware for all frameworks
Reproducibility: Daily automated runs

Software

Framework Versions: Latest stable releases
.NET Version: .NET 10 (latest)
OS: Ubuntu Latest

Configuration

Release Mode: All tests compiled with optimizations
Native AOT: Separate TUnit_AOT benchmark
Default Settings: No special framework configuration

Measurement Process

1. Build Phase

# Build all frameworks identically
dotnet build -c Release -p:TestFramework=TUNIT
dotnet build -c Release -p:TestFramework=XUNIT3
dotnet build -c Release -p:TestFramework=NUNIT
dotnet build -c Release -p:TestFramework=MSTEST

2. Execution Phase

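using CliWrap;
using CliWrap.Buffered; // provides ExecuteBufferedAsync

[Benchmark]
public async Task TUnit()
{
    // Each invocation launches the test executable as a separate process
    await Cli.Wrap("UnifiedTests.exe")
        .WithArguments(["--filter", "TestCategory"])
        .ExecuteBufferedAsync();
}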

3. Analysis Phase

BenchmarkDotNet collects metrics
Statistical analysis performed
Results exported to markdown
Historical trends tracked

What Gets Measured

Primary Metrics

Mean Execution Time

Definition: Average time across all iterations
Unit: Milliseconds (ms) or Seconds (s)
Lower is better

Median Execution Time

Definition: Middle value, less affected by outliers
Unit: Milliseconds (ms) or Seconds (s)
More stable than mean

Standard Deviation

Definition: Measure of result consistency
Unit: Same as mean
Lower is better (more consistent)
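
The three primary metrics can be sketched in a few lines of plain C#. BenchmarkDotNet computes these values itself; the helper below is a hypothetical illustration (`StatsSketch` is a made-up name) and assumes the sample standard deviation, which divides by n - 1 and therefore needs at least two samples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class StatsSketch
{
    // Arithmetic mean of all samples.
    public static double Mean(IReadOnlyList<double> samples) => samples.Average();

    // Middle value of the sorted samples.
    public static double Median(IReadOnlyList<double> samples)
    {
        double[] sorted = samples.OrderBy(x => x).ToArray();
        int mid = sorted.Length / 2;
        // Even count: average the two middle values.
        return sorted.Length % 2 == 0
            ? (sorted[mid - 1] + sorted[mid]) / 2.0
            : sorted[mid];
    }

    // Sample standard deviation (assumption: divides by n - 1).
    public static double StdDev(IReadOnlyList<double> samples)
    {
        double mean = Mean(samples);
        double sumOfSquares = samples.Sum(x => (x - mean) * (x - mean));
        return Math.Sqrt(sumOfSquares / (samples.Count - 1));
    }
}
```

For the made-up samples 10, 12, 11, 13 and 54 ms, `Mean` returns 20 while `Median` returns 12. The single slow run drags the mean upward but leaves the median almost untouched, which is why both are reported.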

Derived Metrics

Speedup Factor

Speedup = (Other Framework Time) / (TUnit Time)

Example: "2.5x faster" means the other framework took 2.5 times as long as TUnit to run the same tests.

AOT Improvement

AOT Speedup = (TUnit JIT Time) / (TUnit AOT Time)

Example: "4x faster with AOT" means the JIT build took 4 times as long as the Native AOT build to run the same tests.
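
Both derived metrics above are the same ratio with different inputs. A minimal sketch follows; `SpeedupSketch` is a hypothetical name, and the timings in the usage note are invented for illustration, not measured results.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SpeedupSketch
{
    // Ratio of baseline time to candidate time.
    // Values above 1 mean the candidate is faster than the baseline.
    public static double Speedup(double baselineMs, double candidateMs)
    {
        if (candidateMs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(candidateMs));
        }
        return baselineMs / candidateMs;
    }
}
```

If another framework took 500 ms and TUnit took 200 ms, `Speedup(500, 200)` gives 2.5. For the AOT comparison, pass the JIT time as the baseline and the AOT time as the candidate, so `Speedup(800, 200)` gives 4.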

Benchmark Automation

Daily Execution

Benchmarks run automatically every 24 hours via GitHub Actions.

Process

Build: Compile all framework versions
Execute: Run benchmarks in isolated processes
Analyze: Parse BenchmarkDotNet output
Publish: Update documentation automatically
Track: Store historical trends

Artifacts

All raw benchmark results are available as GitHub Actions artifacts for 90 days.

Reproducibility

Running Locally

# 1. Navigate to benchmark project
cd tools/speed-comparison

# 2. Build all frameworks
dotnet build -c Release

# 3. Run specific benchmark
cd Tests.Benchmark
dotnet run -c Release -- --filter "*RuntimeBenchmarks*"

Viewing Results

Results are generated in BenchmarkDotNet.Artifacts/results/:

Markdown reports (*.md)
CSV data (*.csv)
HTML reports (*.html)

Limitations & Caveats

What Benchmarks Don't Measure

❌ IDE Integration: Benchmarks don't measure test discovery in IDEs

❌ Debugger Performance: Debug mode performance is not measured

❌ Real I/O: Most tests use in-memory operations to avoid I/O variance

❌ External Dependencies: No database, network, or file system calls

Variance Factors

Results can vary based on:

Hardware configuration
Background processes
OS scheduling
.NET runtime version
Test complexity

Interpreting Results

Relative Performance: Compare frameworks, not absolute times
Your Mileage May Vary: Real-world results depend on test characteristics
Trends Matter More: Watch for performance regressions over time

Transparency

Open Source

All benchmark code is open source.

Community Verification

Found an issue with the benchmarks? Open an issue or submit a PR!
