Regex .NET 8 v .NET 7 performance improvement




Date Added (UTC):

08 Apr 2024 @ 00:28

Date Updated (UTC):

13 Apr 2024 @ 00:23


.NET Version(s):

.NET 7 .NET 8

Tag(s):

#Regex #SourceGenerators


Added By:

Ireland    
.NET Developer and tech lead from Ireland!

Benchmark Results:





Benchmark Code:



using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Environments; 
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Reports;
using System.Net.Http;
using System.Text.RegularExpressions;

[MemoryDiagnoser]
[Config(typeof(Config))]
[HideColumns("Error", "StdDev", "Median", "RatioSD")]
public class RegexBenchmark
{
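    // Input text ("The Adventures of Sherlock Holmes" from Project Gutenberg),
    // downloaded once when the type is initialized, outside the measured method.
    private static readonly string s_haystack =
        new HttpClient().GetStringAsync("https://www.gutenberg.org/files/1661/1661-0.txt").Result;

    // Matches any run of five consecutive ASCII digits; compiled to IL for faster matching.
    private readonly Regex _regex = new Regex("[0-9]{5}", RegexOptions.Compiled);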

    [Benchmark]
    public int Count() => _regex.Count(s_haystack);

    private class Config : ManualConfig
    {
        public Config()
        {
            AddJob(Job.Default.WithId(".NET 7").WithRuntime(CoreRuntime.Core70).AsBaseline());
            AddJob(Job.Default.WithId(".NET 8").WithRuntime(CoreRuntime.Core80));

            SummaryStyle =
                SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage);
        }
    }
}

// .NET 7, .NET 8
public int Count()
{
    return _regex.Count(s_haystack);
}

// .NET 7
.method public hidebysig 
    instance int32 Count () cil managed 
{
    .custom instance void [BenchmarkDotNet.Annotations]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor(int32, string) = (
        01 00 1f 00 00 00 01 5f 00 00
    )
    // Method begins at RVA 0x208e
    // Code size 17 (0x11)
    .maxstack 8

    // sequence point: (line 32, col 27) to (line 32, col 51) in _
    IL_0000: ldarg.0
    IL_0001: ldfld class [System.Text.RegularExpressions]System.Text.RegularExpressions.Regex RegexBenchmark::_regex
    IL_0006: ldsfld string RegexBenchmark::s_haystack
    IL_000b: callvirt instance int32 [System.Text.RegularExpressions]System.Text.RegularExpressions.Regex::Count(string)
    IL_0010: ret
}

// .NET 8
.method public hidebysig 
    instance int32 Count () cil managed 
{
    .custom instance void [BenchmarkDotNet.Annotations]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor(int32, string) = (
        01 00 1f 00 00 00 01 5f 00 00
    )
    // Method begins at RVA 0x2050
    // Code size 17 (0x11)
    .maxstack 8

    // sequence point: (line 32, col 27) to (line 32, col 51) in _
    IL_0000: ldarg.0
    IL_0001: ldfld class [System.Text.RegularExpressions]System.Text.RegularExpressions.Regex RegexBenchmark::_regex
    IL_0006: ldsfld string RegexBenchmark::s_haystack
    IL_000b: callvirt instance int32 [System.Text.RegularExpressions]System.Text.RegularExpressions.Regex::Count(string)
    IL_0010: ret
}

// .NET 7 Jit Asm Code unavailable due to errors:
Type RegexBenchmark has a static constructor, which is not supported by SharpLab JIT decompiler.
// .NET 8 Jit Asm Code unavailable due to errors:
Type RegexBenchmark has a static constructor, which is not supported by SharpLab JIT decompiler.


Benchmark Description:


https://github.com/dotnet/runtime/pull/76859

This benchmark measures the execution time and memory usage of a regular expression (`Regex`) search over a large text in .NET. It is built with BenchmarkDotNet, a .NET library for writing accurate and repeatable benchmarks. The sections below cover the setup and the rationale behind the benchmark method.

### General Setup

- **.NET Versions**: The benchmark runs against two runtimes, .NET 7 and .NET 8. Comparing the two shows whether the newer runtime introduces a performance improvement or a regression.
- **BenchmarkDotNet Configuration**: The nested `Config` class adds two jobs with default settings, one per runtime, with .NET 7 marked as the baseline. It also sets the ratio style to percentage. The `[HideColumns]` attribute on the benchmark class hides the Error, StdDev, Median, and RatioSD columns, which keeps the summary table easier to read.
- **Memory Diagnoser**: The `[MemoryDiagnoser]` attribute enables memory diagnostics, so the results also report the allocations made by the benchmarked code.
- **Data Source**: The input is the full text of "The Adventures of Sherlock Holmes", fetched from Project Gutenberg over HTTP. The download happens once in a static field initializer, so it is not part of the measured method. A full-length book stands in for the large documents that regex code often processes in practice.

### Benchmark Method: `Count`

- **Purpose**: The `Count` method measures how long it takes to count every match of a pattern in a large text. The pattern `[0-9]{5}` matches any run of five consecutive digits. It is not anchored, so a longer run of digits can produce several non-overlapping matches.
- **Performance Aspect**: The method exercises both the speed of the search and the memory it allocates when the regex is constructed with `RegexOptions.Compiled`. Compiling a regex costs more at construction time but usually speeds up matching, which pays off when the same regex is used many times.
- **Importance**: Text processing is common in applications, and an inefficient regex can cause significant slowdowns and extra memory usage. This benchmark shows how well the .NET runtime optimizes regex matching over large inputs.
- **Expected Results/Insights**: The results show how the execution time and memory usage of the same regex operation differ between .NET 7 and .NET 8. An improvement in .NET 8 would point to optimizations in the regex engine or the runtime. A regression would highlight code that needs attention when upgrading.

In summary, this is a targeted benchmark of regex matching in .NET, focused on execution speed and memory usage. Comparing the results across runtime versions helps developers decide when to upgrade and where to optimize.
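
To make the pattern's matching behavior concrete, here is a minimal sketch that runs the benchmark's pattern against a short sample string. The class name `FiveDigitDemo`, the sample text, and the second pattern with lookarounds are illustrative assumptions and are not part of the benchmark. `Regex.Count` requires .NET 7 or later.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FiveDigitDemo
{
    // Same pattern and options as the benchmark's _regex field.
    private static readonly Regex s_fiveDigits =
        new Regex("[0-9]{5}", RegexOptions.Compiled);

    // Hypothetical variant (not in the benchmark): lookarounds reject
    // five-digit windows that sit inside a longer run of digits.
    private static readonly Regex s_exactlyFive =
        new Regex("(?<![0-9])[0-9]{5}(?![0-9])", RegexOptions.Compiled);

    // Counts non-overlapping five-digit windows anywhere in the text.
    public static int CountRuns(string text) => s_fiveDigits.Count(text);

    // Counts only digit runs whose total length is exactly five.
    public static int CountExact(string text) => s_exactlyFive.Count(text);

    public static void Main()
    {
        const string sample = "zip 90210, serial 1234567890, year 1892";

        // 90210, 12345 and 67890 match; 1892 has only four digits.
        Console.WriteLine(CountRuns(sample));  // prints 3

        // Only 90210 is a standalone five-digit run.
        Console.WriteLine(CountExact(sample)); // prints 1
    }
}
```

The benchmark itself uses only the first form, so the count it reports includes five-digit windows taken from longer digit runs.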

