Regular v Compiled v Source Generated regex for validating SSNs in .NET 8




Date Added (UTC):

15 Apr 2024 @ 00:10

Date Updated (UTC):

15 Apr 2024 @ 00:10


.NET Version(s):

.NET 8

Tag(s):

#Regex #SourceGenerators


Added By:
Ireland
.NET Developer and tech lead from Ireland!

Benchmark Results:





Benchmark Code:



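using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Reports;
using System.Text.RegularExpressions;

namespace RegexBenchmark;

[Config(typeof(Config))]
[HideColumns(Column.Job, Column.RatioSD, Column.AllocRatio)]
[MemoryDiagnoser]
// Fails the run if the benchmarks return different values, so each
// benchmark returns its match count instead of discarding IsMatch results.
[ReturnValueValidator(failOnError: true)]
public partial class RegexBenchmarks
{
    // US Social Security Number format: ddd-dd-dddd
    private const string pattern = @"^\d{3}-\d{2}-\d{4}$";
    private const RegexOptions regexOptions = RegexOptions.CultureInvariant;

    // Constructed once per benchmark instance, outside the measured iterations.
    // RegexOptions.Compiled generates IL for the pattern at construction time.
    private readonly Regex standardRegex = new Regex(pattern, regexOptions);
    private readonly Regex compiledRegex = new Regex(pattern, regexOptions | RegexOptions.Compiled);

    // This attribute triggers the regex source generator (.NET 7+), which emits
    // the implementation of this partial method at build time.
    [GeneratedRegex(pattern, regexOptions)]
    private static partial Regex GeneratedRegex();

    // Three inputs match the pattern and three do not.
    private readonly string[] testInputs = new[]
    {
        "123-45-6789", "987-65-4321", "000-00-0000", "not-a-ssn", "123-456789", "123-45-678"
    };

    [Benchmark(Baseline = true)]
    public int TestStandardRegex()
    {
        int matches = 0;
        foreach (var input in testInputs)
        {
            if (standardRegex.IsMatch(input)) matches++;
        }
        return matches;
    }

    [Benchmark]
    public int TestCompiledRegex()
    {
        int matches = 0;
        foreach (var input in testInputs)
        {
            if (compiledRegex.IsMatch(input)) matches++;
        }
        return matches;
    }

    [Benchmark]
    public int TestGeneratedRegex()
    {
        // GeneratedRegex() returns a cached instance, so this call is cheap.
        var generatedRegex = GeneratedRegex();
        int matches = 0;
        foreach (var input in testInputs)
        {
            if (generatedRegex.IsMatch(input)) matches++;
        }
        return matches;
    }

    private class Config : ManualConfig
    {
        public Config()
        {
            // Report ratios to the baseline as trends (faster/slower).
            SummaryStyle =
                SummaryStyle.Default.WithRatioStyle(RatioStyle.Trend);
        }
    }
}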

// .NET 8 Lowered C#, IL and Jit Asm code unavailable due to errors:
error CS8795: Partial method 'RegexBenchmarks.GeneratedRegex()' must have an implementation part because it has accessibility modifiers.
// The implementation part of GeneratedRegex() is emitted by the regex source generator,
// which evidently did not run when this output was produced; a .NET 8 project builds the code as written.
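The three benchmark methods differ only in which `Regex` instance they call, so they should accept exactly the same inputs. As a quick correctness check outside BenchmarkDotNet, here is a minimal sketch, assuming a .NET 7+ SDK project so that the regex source generator supplies the `partial` method. The class `SsnRegexAgreement` and its members are hypothetical names, not part of the original benchmark.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: checks that the interpreted, compiled and
// source-generated forms of the benchmark's pattern agree on each input.
public static partial class SsnRegexAgreement
{
    private const string Pattern = @"^\d{3}-\d{2}-\d{4}$";
    private const RegexOptions Options = RegexOptions.CultureInvariant;

    private static readonly Regex Standard = new(Pattern, Options);
    private static readonly Regex Compiled = new(Pattern, Options | RegexOptions.Compiled);

    // Implementation supplied at build time by the regex source generator (.NET 7+).
    [GeneratedRegex(Pattern, Options)]
    private static partial Regex Generated();

    // Same inputs as the benchmark: three valid, three invalid.
    public static readonly string[] Inputs =
    {
        "123-45-6789", "987-65-4321", "000-00-0000", "not-a-ssn", "123-456789", "123-45-678"
    };

    // Counts how many inputs the given regex accepts.
    public static int CountMatches(Regex regex, string[] inputs) =>
        inputs.Count(input => regex.IsMatch(input));

    public static int StandardCount(string[] inputs) => CountMatches(Standard, inputs);
    public static int CompiledCount(string[] inputs) => CountMatches(Compiled, inputs);
    public static int GeneratedCount(string[] inputs) => CountMatches(Generated(), inputs);

    // True when all three engines give the same answer for every input.
    public static bool AllAgree(string[] inputs) =>
        inputs.All(input =>
        {
            bool standard = Standard.IsMatch(input);
            return standard == Compiled.IsMatch(input)
                && standard == Generated().IsMatch(input);
        });
}
```

Running a check like this before benchmarking surfaces a broken pattern or an options mismatch as a plain failure rather than as a misleading timing.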


Benchmark Description:


The benchmark code measures and compares three ways of evaluating the same regular expression in .NET: a standard (interpreted) regex, a regex created with `RegexOptions.Compiled`, and a source-generated regex. The configuration hides some default columns and enables the memory diagnoser, so allocations are reported alongside execution time. The code does not pin a runtime; the target framework comes from the project, and this benchmark was run on .NET 8.

### Benchmark Setup Overview

- **.NET Version**: .NET 8. Source-generated regexes (`[GeneratedRegex]`) require .NET 7 or later.
- **Configuration**: A custom `Config` reports ratios to the baseline as trends (faster/slower) and hides the `Job`, `RatioSD` and `AllocRatio` columns for clarity.
- **Memory Diagnoser**: Enabled to report memory allocated per operation.
- **ReturnValueValidator**: Checks that all benchmarks in the class return equal values and, with `failOnError: true`, fails the run if they do not. It can only compare results when the benchmark methods return a value; `void` methods give it nothing to check.

### Benchmark Methods Rationale

#### 1. `TestStandardRegex`

- **Purpose**: Measures matching with a standard `Regex` created without `RegexOptions.Compiled`. It is the baseline for comparison.
- **Performance Aspect**: The pattern is parsed once, when the `Regex` is constructed; each `IsMatch` call then runs the regex interpreter over that parsed form. The benchmark therefore measures interpretation overhead, not parsing.
- **Expected Insights**: Shows how matching performs under default options and provides the reference point for the other two approaches.

#### 2. `TestCompiledRegex`

- **Purpose**: Measures the effect of `RegexOptions.Compiled`, which generates IL for the pattern when the `Regex` is constructed; the JIT then compiles that IL to native code.
- **Performance Aspect**: Compilation trades a more expensive construction for faster matching. Because both `Regex` instances are created in field initializers, construction happens outside the measured iterations, so this benchmark captures the matching speed but not the compilation cost.
- **Expected Insights**: Should show the speed-up of compiled matching over interpretation. Whether it pays off in an application depends on whether enough matches are performed to amortize the construction cost, which this benchmark does not measure.

#### 3. `TestGeneratedRegex`

- **Purpose**: Measures a source-generated regex, a feature of .NET 7 and later in which the `[GeneratedRegex]` attribute makes a source generator emit C# matching code at build time.
- **Performance Aspect**: Tests the runtime speed and memory usage of that generated code. `GeneratedRegex()` returns a cached instance, so calling it inside the benchmark adds little overhead.
- **Expected Insights**: Expected to give the best or near-best performance in both speed and memory usage, because it needs neither runtime IL generation nor interpretation. It shows the benefit of adopting newer .NET features in performance-critical code.

### General Insights

Running these benchmarks shows the trade-offs between ease of use, runtime performance and memory efficiency when working with regular expressions in .NET. The standard regex is the most straightforward but potentially the slowest, especially for complex patterns or large inputs. Compiled regexes match faster at the cost of a more expensive construction, which suits scenarios where the same pattern is used extensively. Source-generated regexes aim to combine the best of both by moving code generation to build time, offering fast matching with minimal startup overhead.
Understanding these trade-offs is crucial for optimizing applications that rely heavily on regular expression matching, allowing developers to choose the most appropriate approach based on their specific requirements and constraints.
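To give a feel for what specialized matching code for this particular pattern involves, here is a hand-written, allocation-free check over a `ReadOnlySpan<char>`. It is an illustrative sketch, not the code the regex source generator actually emits, and `SsnFormat.IsMatch` is a hypothetical name. It mirrors two details of the pattern that are easy to miss: `\d` matches any Unicode decimal digit (the same set `char.IsDigit` accepts), and `$` without `RegexOptions.Multiline` also matches just before a single trailing `\n`.

```csharp
using System;

// Hypothetical, hand-written equivalent of ^\d{3}-\d{2}-\d{4}$ with CultureInvariant.
// Illustrative only: this is not the source generator's output.
public static class SsnFormat
{
    public static bool IsMatch(ReadOnlySpan<char> input)
    {
        // '$' (without Multiline) also matches before a final '\n',
        // so ignore one trailing newline before checking the body.
        if (input.Length == 12 && input[11] == '\n')
        {
            input = input[..11];
        }

        // The body "ddd-dd-dddd" is exactly 11 characters long.
        if (input.Length != 11)
        {
            return false;
        }

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool ok = (i == 3 || i == 6)
                ? c == '-'
                : char.IsDigit(c); // \d is Unicode category Nd, the set char.IsDigit accepts
            if (!ok)
            {
                return false;
            }
        }

        return true;
    }
}
```

For strict validation, the pattern can use `\z` instead of `$` to reject the trailing newline, and `[0-9]` instead of `\d` to accept only ASCII digits; changing the pattern keeps all three benchmarked engines in agreement.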

