.NET Performance Optimization: Using Structs Instead of Classes

Last updated 5/5/2022 9:33 PM

Author: InCerry

Source: https://www.cnblogs.com/InCerry/archive/2022/05/05/Dotnet-Opt-Perf-Use-Struct-Instead-Of-Class.html

License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Disclaimer: The copyright of this blog belongs to "InCerry".

1. Preface

We know that one obvious difference between C# and Java is that C# allows custom value types, which is the focus of today’s discussion: struct. Why did Microsoft add struct when we already have the more convenient class? This is precisely the performance optimization tip we’re going to talk about today: use structs instead of classes.

So what are the benefits of using structs instead of classes? In what scenarios should we use structs instead of classes? Today’s article will answer these questions one by one.

Note: All examples in this article are based on the x64 platform.

2. Real-World Case

Let’s take a real-world system as an example. Everyone knows the flight ticket booking process: first, choose the departure and arrival cities and airports (the route), then select a preferred flight and cabin based on your desired date and time, and finally pay.

![[https://img1.dotnet9.com/2022/05/1601.jpg]]

2.1 Memory Usage

There are approximately 49 airlines in China, over 8,000 routes, an average of 20 flights per route, and an average of 10 fare combinations per flight (economy, first class, plus various discount tiers). OTAs (Online Travel Agencies) generally allow booking flights up to one year in advance, so a platform might hold 8,000 × 20 × 10 × 365 = 584 million price records, that is, on the order of 500 million (all figures above come from the internet; actual business data is not disclosed).

To enable faster flight searches, OTAs load popular route price data from the database into memory (memory is much faster than network and disk transfers, as shown in the table below). Even taking only 20%, that’s about 100 million records in memory.
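
The estimate above is plain multiplication. A minimal sketch, using only the rough public figures quoted in the text (the class, constant, and method names are made up for illustration):

```csharp
using System;

public static class RecordEstimate
{
    // Rough public figures quoted in the article, not real business data.
    public const long Routes = 8_000;
    public const long FlightsPerRoute = 20;
    public const long FaresPerFlight = 10;
    public const long BookableDays = 365;

    // Total number of price records the platform might hold.
    public static long TotalRecords() =>
        Routes * FlightsPerRoute * FaresPerFlight * BookableDays;

    // Number of records kept in memory for a given percentage, e.g. 20 for the popular 20%.
    public static long CachedRecords(int percent) =>
        TotalRecords() * percent / 100;

    public static void Main()
    {
        Console.WriteLine($"Total records:  {TotalRecords()}");
        Console.WriteLine($"Cached records: {CachedRecords(20)}");
    }
}
```

The product is 584,000,000 and 20% of it is 116,800,000, which the rest of the article rounds to 100 million.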

| Operation | Latency |
| --- | --- |
| Execute instruction | 1/1,000,000,000 sec = 1 ns |
| Read data from L1 cache | 0.5 ns |
| Branch prediction miss | 5 ns |
| Read data from L2 cache | 7 ns |
| Mutex lock/unlock | 25 ns |
| Read data from main memory (RAM) | 100 ns |
| Send 2KB data over 1Gbps network | 20,000 ns |
| Read 1MB of data from memory | 250,000 ns |
| Disk seek (mechanical hard drive) | 8,000,000 ns |
| Read 1MB of data from disk | 20,000,000 ns |
| Send a packet from the US to Europe and back | 150 ms = 150,000,000 ns |

Suppose we have the following class with these properties (in reality, it’s much more complex, with storage dimensions like route and date, and different selling rules for different flights; we simplify here for demonstration). How much memory would 100 million such records occupy?

public class FlightPriceClass
{
    /// <summary>
    /// Airline two-letter code, e.g., Air China: CA
    /// </summary>
    public string Airline { get; set; }

    /// <summary>
    /// Departure airport three-letter code, e.g., Shanghai Hongqiao International Airport: SHA
    /// </summary>
    public string Start { get; set; }

    /// <summary>
    /// Arrival airport three-letter code, e.g., Beijing Capital International Airport: PEK
    /// </summary>
    public string End { get; set; }

    /// <summary>
    /// Flight number, e.g., CA0001
    /// </summary>
    public string FlightNo { get; set; }

    /// <summary>
    /// Cabin code, e.g., Y
    /// </summary>
    public string Cabin { get; set; }

    /// <summary>
    /// Price in yuan
    /// </summary>
    public decimal Price { get; set; }

    /// <summary>
    /// Departure date, e.g., 2017-01-01
    /// </summary>
    public DateOnly DepDate { get; set; }

    /// <summary>
    /// Departure time, e.g., 08:00
    /// </summary>
    public TimeOnly DepTime { get; set; }

    /// <summary>
    /// Arrival date, e.g., 2017-01-01
    /// </summary>
    public DateOnly ArrDate { get; set; }

    /// <summary>
    /// Arrival time, e.g., 08:00
    /// </summary>
    public TimeOnly ArrTime { get; set; }
}

We can write a Benchmark to see how much space 1 million records require, then extrapolate to 100 million.

// Pre-generate 1 million random records so that data generation does not skew the results.
// BaseTime is a DateTime field defined elsewhere (not shown here).
public static readonly FlightPriceClass[] FlightPrices = Enumerable.Range(0,
        1_000_000
    ).Select(index =>
        new FlightPriceClass
        {
            Airline = $"C{(char)(index % 26 + 'A')}",
            Start = $"SH{(char)(index % 26 + 'A')}",
            End = $"PE{(char)(index % 26 + 'A')}",
            FlightNo = $"{index % 1000:0000}",
            Cabin = $"{(char)(index % 26 + 'A')}",
            Price = index % 1000,
            DepDate = DateOnly.FromDateTime(BaseTime.AddHours(index)),
            DepTime = TimeOnly.FromDateTime(BaseTime.AddHours(index)),
            ArrDate = DateOnly.FromDateTime(BaseTime.AddHours(3 + index)),
            ArrTime = TimeOnly.FromDateTime(BaseTime.AddHours(3 + index)),
        }).ToArray();

// Store using class
[Benchmark]
public FlightPriceClass[] GetClassStore()
{
    var arrays = new FlightPriceClass[FlightPrices.Length];
    for (int i = 0; i < FlightPrices.Length; i++)
    {
        var item = FlightPrices[i];
        arrays[i] = new FlightPriceClass
        {
            Airline = item.Airline,
            Start = item.Start,
            End = item.End,
            FlightNo = item.FlightNo,
            Cabin = item.Cabin,
            Price = item.Price,
            DepDate = item.DepDate,
            DepTime = item.DepTime,
            ArrDate = item.ArrDate,
            ArrTime = item.ArrTime
        };
    }
    return arrays;
}

The results are shown in the image below.

![[https://img1.dotnet9.com/2022/05/1602.png]]

From the above chart, we can see that 1 million records require about 107MB of memory, so one object occupies approximately 112 bytes. For 100 million objects, that’s about 10.4GB. This is already quite large. Are there any ways to reduce memory usage? Some might suggest:
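
The extrapolation above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the MB reported by the benchmark are 1024-based, and the class and method names are made up for illustration.

```csharp
using System;

public static class MemoryExtrapolation
{
    // Bytes per record, given a measured total in MiB for a known record count.
    public static double BytesPerRecord(double measuredMiB, long count) =>
        measuredMiB * 1024 * 1024 / count;

    // Projected size in GiB for a target number of records.
    public static double ProjectedGiB(double bytesPerRecord, long targetCount) =>
        bytesPerRecord * targetCount / (1024.0 * 1024 * 1024);

    public static void Main()
    {
        double perRecord = BytesPerRecord(107, 1_000_000);   // about 112 bytes
        double total = ProjectedGiB(112, 100_000_000);       // about 10.4 GiB
        Console.WriteLine($"{perRecord:F1} bytes per record, {total:F1} GiB for 100 million records");
    }
}
```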

  • Use int for string IDs
  • Use long for timestamps
  • Compress data using algorithms like zip
  • etc.

For now, we won’t use these methods. In line with the title of this article, you can probably guess the approach: use a struct instead of a class. We define a struct with the same fields, as shown below.

[StructLayout(LayoutKind.Auto)]
public struct FlightPriceStruct
{
    // Properties identical to the class
    ......
}

We can use Unsafe.SizeOf to check the memory size required by the value type, like this:

![[https://img1.dotnet9.com/2022/05/1603.png]]
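
Since the call itself only appears in the screenshot, here is a minimal sketch of how Unsafe.SizeOf (from System.Runtime.CompilerServices) can be used. SampleStruct is a hypothetical stand-in with the same kinds of fields, not the article's FlightPriceStruct, and the sizes in the comments assume .NET 6 or later on x64.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in with the same kinds of fields as the article's struct.
[StructLayout(LayoutKind.Auto)]
public struct SampleStruct
{
    public string Code;      // reference: 8 bytes on x64
    public decimal Price;    // 16 bytes
    public DateOnly Date;    // 4 bytes
    public TimeOnly Time;    // 8 bytes
}

public static class SizeOfDemo
{
    public static void Main()
    {
        Console.WriteLine(Unsafe.SizeOf<decimal>());      // 16
        Console.WriteLine(Unsafe.SizeOf<DateOnly>());     // 4
        Console.WriteLine(Unsafe.SizeOf<TimeOnly>());     // 8
        // The struct total also includes any alignment padding chosen by the runtime.
        Console.WriteLine(Unsafe.SizeOf<SampleStruct>());
    }
}
```

For the article's FlightPriceStruct, the same call gives the figure shown in the screenshot.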

We can see the struct needs only 88 bytes, about 21% less than the 112 bytes required per class instance (put the other way round, the class version is 27% larger). Let’s see how much memory it saves in practice.

![[https://img1.dotnet9.com/2022/05/1604.png]]

The results are great! Memory is indeed reduced by about 21%, as calculated. Additionally, assignment is 57% faster and, more importantly, fewer GC collections occur.

So why can a struct save so much memory? Let’s discuss the difference in how structs and classes store data. The diagram below shows the storage layout for an array of classes.

[Article Image - Class.drawio]

We can see that an array of classes stores only references (pointers) to the objects, not the data itself. Each reference type instance has the following:

  • Object header: 8 bytes. In CoreCLR, this holds additional information attached to the object, such as the lock state or the cached hash code.
  • Method table pointer: 8 bytes. Points to the type’s description data, also known as the Method Table (MT), which contains GC info, field definitions, method definitions, etc.
  • Object placeholder: 8 bytes. The current GC requires that every object has at least one pointer-sized field. If the class is empty, besides the object header and method table pointer, it will still take up 8 bytes. If not empty, it stores the first field.

That means an empty class with nothing defined requires at least 24 bytes: 8 bytes object header + 8 bytes method table pointer + 8 bytes object placeholder.

Back to our case: since it’s not an empty class, each object needs an extra 16 bytes for the object header and method table pointer, in addition to the data storage. Moreover, the array needs 8 bytes for the pointer to the object, so each object stored in the array requires an extra 24 bytes. Now let’s look at value types (structs).

[Article Image - Struct.drawio]

From the above diagram, we can see that in a value type array, the data is stored directly in the array without indirection. Therefore, for the same data, each struct element saves 24 bytes (no object header, no method table pointer, and no pointer to the instance).

Also, the array itself (which is a reference type) has its own 24-byte overhead, and its object placeholder stores the array size (the first field of the array type).
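
The per-element overhead described above can be observed directly. The following is a rough sketch, not the article's benchmark: PairClass and PairStruct are hypothetical types with 16 bytes of data each, and GC.GetAllocatedBytesForCurrentThread is used to see how many bytes each variant allocates.

```csharp
using System;

public sealed class PairClass { public long A; public long B; }
public struct PairStruct { public long A; public long B; }

public static class OverheadDemo
{
    // Bytes allocated on this thread while building an array of n class instances.
    public static long MeasureClassArray(int n)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var items = new PairClass[n];
        for (int i = 0; i < n; i++) items[i] = new PairClass { A = i, B = i };
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        GC.KeepAlive(items);
        return allocated;
    }

    // Bytes allocated on this thread while building an array of n struct values.
    public static long MeasureStructArray(int n)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var items = new PairStruct[n];
        for (int i = 0; i < n; i++) items[i] = new PairStruct { A = i, B = i };
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        GC.KeepAlive(items);
        return allocated;
    }

    public static void Main()
    {
        const int n = 100_000;
        Console.WriteLine($"class:  {MeasureClassArray(n) / (double)n:F1} bytes per element");
        Console.WriteLine($"struct: {MeasureStructArray(n) / (double)n:F1} bytes per element");
    }
}
```

On x64, the reasoning above suggests roughly 40 bytes per element for the class version (16 data + 16 overhead + 8 for the reference) and roughly 16 for the struct version; the exact numbers depend on the runtime.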

We can use the NuGet package ObjectLayoutInspector to print the layout information. The class layout is shown below; besides the 88 bytes for data storage, there is an extra 16 bytes.

![[https://img1.dotnet9.com/2022/05/1607.png]]

The struct layout is shown below; each struct contains only actual data storage with no extra overhead.

![[https://img1.dotnet9.com/2022/05/1608.png]]

Can we save even more memory? On a 64-bit platform, a reference (pointer) is 8 bytes. Strings in C# are UTF-16, so each character takes 2 bytes. For fields such as airline and airport codes, which are at most 4 characters long, we can store the characters inline in fixed-size char buffers, which take no more space than the pointer itself. Let’s modify the code.

// Skip local variable initialization
[SkipLocalsInit]
// Adjust layout to Explicit for custom layout
[StructLayout(LayoutKind.Explicit, CharSet = CharSet.Unicode)]
public struct FlightPriceStructExplicit
{
    // Need to manually specify offsets
    [FieldOffset(0)]
    // Airline stored as two characters
    public unsafe fixed char Airline[2];

    // Since airline uses 4 bytes, start airport offset by 4 bytes
    [FieldOffset(4)]
    public unsafe fixed char Start[3];

    // Similarly, start airport uses 6 bytes, offset by 10 bytes
    [FieldOffset(10)]
    public unsafe fixed char End[3];

    [FieldOffset(16)]
    public unsafe fixed char FlightNo[4];

    [FieldOffset(24)]
    public unsafe fixed char Cabin[2];

    // decimal 16 bytes
    [FieldOffset(28)]
    public decimal Price;

    // DateOnly 4 bytes
    [FieldOffset(44)]
    public DateOnly DepDate;

    // TimeOnly 8 bytes
    [FieldOffset(48)]
    public TimeOnly DepTime;
    [FieldOffset(56)]
    public DateOnly ArrDate;
    [FieldOffset(60)]
    public TimeOnly ArrTime;

}

Now let’s check the layout information for this new struct.

![[https://img1.dotnet9.com/2022/05/1609.png]]

We can see it now only needs 68 bytes. The last 4 bytes are for address alignment because the CPU word size is 64 bits; we don’t need to worry about that. By our calculation, this saves about 23% compared to the 88-byte version (88 is 29% more than 68). However, because the fields are unsafe fixed char buffers, we cannot assign strings to them directly; we have to copy the characters in. Here’s the code:

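// Extension method that copies a string's characters into a fixed buffer.
// Note: there is no bounds check, so the caller must make sure dest has room for str.Length chars.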
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe void SetTo(this string str, char* dest)
{
    fixed (char* ptr = str)
    {
        Unsafe.CopyBlock(dest, ptr, (uint)(Unsafe.SizeOf<char>() * str.Length));
    }
}

// Benchmark method
public static unsafe FlightPriceStructExplicit[] GetStructStoreStructExplicit()
{
    var arrays = new FlightPriceStructExplicit[FlightPrices.Length];
    for (int i = 0; i < FlightPrices.Length; i++)
    {
        ref var item = ref FlightPrices[i];
        arrays[i] = new FlightPriceStructExplicit
        {
            Price = item.Price,
            DepDate = item.DepDate,
            DepTime = item.DepTime,
            ArrDate = item.ArrDate,
            ArrTime = item.ArrTime
        };
        ref var val = ref arrays[i];
        // Need to fix pointers first, then assign
        fixed (char* airline = val.Airline)
        fixed (char* start = val.Start)
        fixed (char* end = val.End)
        fixed (char* flightNo = val.FlightNo)
        fixed (char* cabin = val.Cabin)
        {
            item.Airline.SetTo(airline);
            item.Start.SetTo(start);
            item.End.SetTo(end);
            item.FlightNo.SetTo(flightNo);
            item.Cabin.SetTo(cabin);
        }
    }
    return arrays;
}

Let’s run the benchmark again to see whether we save about 23% of the memory.

![[https://img1.dotnet9.com/2022/05/1610.png]]

Yes, from 84MB to 65MB, a saving of approximately 23%. That’s pretty good and meets expectations.

However, we notice that Gen0, Gen1, and Gen2 collections still occur many times. Because this is managed memory, the GC will scan these 65MB during collection, which could lengthen stop-the-world (STW) pauses. Since this data is cached and won’t change or be freed for a while, can we tell the GC not to scan it? Yes: we can allocate unmanaged memory directly with the Marshal class, similar to using malloc in C.

// Allocate unmanaged memory
// Parameter: number of bytes to allocate
// Return: pointer to the memory
IntPtr Marshal.AllocHGlobal(int cb);

// Free allocated unmanaged memory
// Parameter: pointer address of the memory allocated by Marshal
void Marshal.FreeHGlobal(IntPtr hglobal);
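
Before moving to the benchmark, here is a minimal sketch of the allocate, use, free pattern with these two calls. It deliberately avoids unsafe code by using Marshal.WriteInt64 and Marshal.ReadInt64; the class and method names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedDemo
{
    // Stores count long values in unmanaged memory, reads them back, and returns their sum.
    public static long RoundTripSum(int count)
    {
        IntPtr buffer = Marshal.AllocHGlobal(count * sizeof(long));
        try
        {
            for (int i = 0; i < count; i++)
                Marshal.WriteInt64(buffer, i * sizeof(long), i);

            long sum = 0;
            for (int i = 0; i < count; i++)
                sum += Marshal.ReadInt64(buffer, i * sizeof(long));
            return sum;
        }
        finally
        {
            // The GC never frees this block, so it must be released explicitly.
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripSum(1000)); // 499500
    }
}
```

The try/finally ensures the block is released even if an exception is thrown while it is in use.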

Let’s modify the Benchmark code to use unmanaged memory.

// Define out ptr parameter to return the pointer
public static unsafe int GetStructStoreUnManageMemory(out IntPtr ptr)
{
    // Allocate memory using AllocHGlobal with size = struct size * count
    var unManagerPtr = Marshal.AllocHGlobal(Unsafe.SizeOf<FlightPriceStructExplicit>() * FlightPrices.Length);
    ptr = unManagerPtr;
    // Assign the memory space to a Span<FlightPriceStructExplicit> array
    var arrays = new Span<FlightPriceStructExplicit>(unManagerPtr.ToPointer(), FlightPrices.Length);
    for (int i = 0; i < FlightPrices.Length; i++)
    {
        ref var item = ref FlightPrices[i];
        arrays[i] = new FlightPriceStructExplicit
        {
            Price = item.Price,
            DepDate = item.DepDate,
            DepTime = item.DepTime,
            ArrDate = item.ArrDate,
            ArrTime = item.ArrTime
        };
        ref var val = ref arrays[i];
        fixed (char* airline = val.Airline)
        fixed (char* start = val.Start)
        fixed (char* end = val.End)
        fixed (char* flightNo = val.FlightNo)
        fixed (char* cabin = val.Cabin)
        {
            item.Airline.SetTo(airline);
            item.Start.SetTo(start);
            item.End.SetTo(end);
            item.FlightNo.SetTo(flightNo);
            item.Cabin.SetTo(cabin);
        }
    }
    // Return length
    return arrays.Length;
}

// Remember: unmanaged memory must be manually released when not needed
[Benchmark]
public void GetStructStoreUnManageMemory()
{
    _ = FlightPriceCreate.GetStructStoreUnManageMemory(out var ptr);
    // Free unmanaged memory
    Marshal.FreeHGlobal(ptr);
}

Let’s check the Benchmark results.

![[https://img1.dotnet9.com/2022/05/1611.png]]

The result is amazing! No space is allocated on the managed heap, assignment speed is much faster than before, and when GC occurs later, it does not need to scan this memory, reducing GC pressure. This result is quite satisfactory.

Now, storing 100 million records would take about 6.3GB. With the other methods mentioned above, we could reduce it further. For example, as in the code below, replace strings with small integer IDs, store amounts as integers in fen (cents), and store times as timestamps.

[StructLayout(LayoutKind.Explicit, CharSet = CharSet.Unicode)]
[SkipLocalsInit]
public struct FlightPriceStructExplicit
{
    // Use byte to identify airline (range 0~255)
    [FieldOffset(0)]
    public byte Airline;

    // Use unsigned short for airport and flight number (2^16 range)
    [FieldOffset(1)]
    public UInt16 Start;

    [FieldOffset(3)]
    public UInt16 End;

    [FieldOffset(5)]
    public UInt16 FlightNo;

    [FieldOffset(7)]
    public byte Cabin;

    // Instead of decimal, store price in cents
    [FieldOffset(8)]
    public long PriceFen;

    // Use timestamp instead
    [FieldOffset(16)]
    public long DepTime;

    [FieldOffset(24)]
    public long ArrTime;
}

The final result: each record only needs 32 bytes, so storing 100 million records would be less than 3GB.

![[https://img1.dotnet9.com/2022/05/1612.png]]
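
The article does not show how values are converted into this compact form. One possible sketch follows; every name in it is hypothetical, the ID table is a plain in-memory dictionary that is not thread-safe, and the timestamp is assumed to be Unix seconds with the date and time treated as UTC.

```csharp
using System;
using System.Collections.Generic;

public static class CompactEncoding
{
    private static readonly Dictionary<string, ushort> AirportIds = new();
    private static readonly List<string> AirportCodes = new();

    // Assigns each distinct airport code a small integer ID (hypothetical lookup table).
    public static ushort GetAirportId(string code)
    {
        if (!AirportIds.TryGetValue(code, out ushort id))
        {
            id = checked((ushort)AirportCodes.Count);
            AirportIds[code] = id;
            AirportCodes.Add(code);
        }
        return id;
    }

    // Reverse lookup from ID back to the airport code.
    public static string GetAirportCode(ushort id) => AirportCodes[id];

    // Price in yuan to whole fen (1 yuan = 100 fen), and back.
    public static long ToFen(decimal yuan) => (long)decimal.Round(yuan * 100m);
    public static decimal ToYuan(long fen) => fen / 100m;

    // Date and time to a Unix timestamp in seconds, treating the values as UTC.
    public static long ToTimestamp(DateOnly date, TimeOnly time) =>
        new DateTimeOffset(date.ToDateTime(time), TimeSpan.Zero).ToUnixTimeSeconds();

    public static void Main()
    {
        ushort sha = GetAirportId("SHA");
        Console.WriteLine($"{sha} -> {GetAirportCode(sha)}");
        Console.WriteLine(ToFen(499.99m));
        Console.WriteLine(ToTimestamp(new DateOnly(2017, 1, 1), new TimeOnly(8, 0)));
    }
}
```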

We will not continue discussing these approaches in this article.

2.2 Computational Speed

Now, are there any downsides to using structs? Let’s look at computation. The task is simple: count the records that meet certain criteria. The class and the plain struct both define the methods below. The Explicit struct is a special case: its fields are fixed buffers, so we compare them through Span.

// Methods defined by class and struct (in reality, filtering can be more complex)
// Compare airline
public bool EqualsAirline(string airline)
{
    return Airline == airline;
}
// Compare departure airport
public bool EqualsStart(string start)
{
    return Start == start;
}
// Compare arrival airport
public bool EqualsEnd(string end)
{
    return End == end;
}
// Compare flight number
public bool EqualsFlightNo(string flightNo)
{
    return FlightNo == flightNo;
}
// Is price less than a specified value
public bool IsPriceLess(decimal min)
{
    return Price < min;
}
// For the Explicit struct, define a SpanEquals extension method
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe bool SpanEquals(this string str, char* dest, int length)
{
    // Compare two arrays using Span
    return new Span<char>(dest, length).SequenceEqual(str.AsSpan());
}

// Implementation methods as below
public static unsafe bool EqualsAirline(FlightPriceStructExplicit item, string airline)
{
    // Pass the required length
    return airline.SpanEquals(item.Airline, 2);
}
// Similar for other methods, not repeated
public static unsafe bool EqualsStart(FlightPriceStructExplicit item, string start)
{
    return start.SpanEquals(item.Start, 3);
}
public static unsafe bool EqualsEnd(FlightPriceStructExplicit item, string end)
{
    return end.SpanEquals(item.End, 3);
}
public static unsafe bool EqualsFlightNo(FlightPriceStructExplicit item, string flightNo)
{
    return flightNo.SpanEquals(item.FlightNo, 4);
}
public static unsafe bool EqualsCabin(FlightPriceStructExplicit item, string cabin)
{
    return cabin.SpanEquals(item.Cabin, 2);
}
public static bool IsPriceLess(FlightPriceStructExplicit item, decimal min)
{
    return item.Price < min;
}
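
One caveat with the fixed-length comparison above: when the stored value is shorter than its buffer (for example a one-letter cabin code such as "Y" in the 2-char Cabin buffer), the zero padding makes the lengths differ, so SequenceEqual returns false. A minimal sketch of a padding-aware comparison, using a char array as a stand-in for the fixed buffer (EqualsPadded is a hypothetical helper, not part of the article's code):

```csharp
using System;

public static class PaddedCompare
{
    // Compares a fixed-length, zero-padded buffer with a code, ignoring the trailing padding.
    public static bool EqualsPadded(ReadOnlySpan<char> buffer, string code) =>
        buffer.TrimEnd('\0').SequenceEqual(code.AsSpan());

    public static void Main()
    {
        char[] cabin = { 'Y', '\0' };   // stand-in for the 2-char Cabin buffer holding "Y"
        ReadOnlySpan<char> stored = cabin;

        Console.WriteLine(stored.SequenceEqual("Y".AsSpan())); // False: lengths 2 and 1 differ
        Console.WriteLine(EqualsPadded(stored, "Y"));          // True
    }
}
```

Fields whose values always fill the buffer, such as the two-letter airline code, are not affected.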

The final Benchmark code is as follows. The logic is the same for each storage type. Since 1 million records run too quickly, we use 1.5 million records for each storage type.

// Initialize required data to avoid impacting the test
private static readonly FlightPriceClass[] FlightPrices = FlightPriceCreate.GetClassStore();
private static readonly FlightPriceStruct[] FlightPricesStruct = FlightPriceCreate.GetStructStore();
private static readonly FlightPriceStructUninitialized[] FlightPricesStructUninitialized =
    FlightPriceCreate.GetStructStoreUninitializedArray();
private static readonly FlightPriceStructExplicit[] FlightPricesStructExplicit =
    FlightPriceCreate.GetStructStoreStructExplicit();
// Unmanaged memory is special; only the pointer address needs to be stored
private static IntPtr _unManagerPtr;
private static readonly int FlightPricesStructExplicitUnManageMemoryLength =
    FlightPriceCreate.GetStructStoreUnManageMemory(out _unManagerPtr);
[Benchmark(Baseline = true)]
public int GetClassStore()
{
    var caAirline = 0;
    var shaStart = 0;
    var peaStart = 0;
    var ca0001FlightNo = 0;
    var priceLess500 = 0;
    for (int i = 0; i < FlightPrices.Length; i++)
    {
        // Simple filtering logic
        var item = FlightPrices[i];
        if (item.EqualsAirline("CA")) caAirline++;
        if (item.EqualsStart("SHA")) shaStart++;
        if (item.EqualsEnd("PEA")) peaStart++;
        if (item.EqualsFlightNo("0001")) ca0001FlightNo++;
        if (item.IsPriceLess(500)) priceLess500++;
    }
    Debug.WriteLine($"{caAirline},{shaStart},{peaStart},{ca0001FlightNo},{priceLess500}");
    return caAirline + shaStart + peaStart + ca0001FlightNo + priceLess500;
}
[Benchmark]
public int GetStructStore()
{
    var caAirline = 0;
    var shaStart = 0;
    var peaStart = 0;
    var ca0001FlightNo = 0;
    var priceLess500 = 0;
    for (int i = 0; i < FlightPricesStruct.Length; i++)
    {
        var item = FlightPricesStruct[i];
        if (item.EqualsAirline("CA")) caAirline++;
        if (item.EqualsStart("SHA")) shaStart++;
        if (item.EqualsEnd("PEA")) peaStart++;
        if (item.EqualsFlightNo("0001")) ca0001FlightNo++;
        if (item.IsPriceLess(500)) priceLess500++;
    }
    Debug.WriteLine($"{caAirline},{shaStart},{peaStart},{ca0001FlightNo},{priceLess500}");
    return caAirline + shaStart + peaStart + ca0001FlightNo + priceLess500;
}
[Benchmark]
public int GetFlightPricesStructExplicit()
{
    var caAirline = 0;
    var shaStart = 0;
    var peaStart = 0;
    var ca0001FlightNo = 0;
    var priceLess500 = 0;
    for (int i = 0; i < FlightPricesStructExplicit.Length; i++)
    {
        var item = FlightPricesStructExplicit[i];
        if (FlightPriceStructExplicit.EqualsAirline(item, "CA")) caAirline++;
        if (FlightPriceStructExplicit.EqualsStart(item, "SHA")) shaStart++;
        if (FlightPriceStructExplicit.EqualsEnd(item, "PEA")) peaStart++;
        if (FlightPriceStructExplicit.EqualsFlightNo(item, "0001")) ca0001FlightNo++;
        if (FlightPriceStructExplicit.IsPriceLess(item, 500)) priceLess500++;
    }
    Debug.WriteLine($"{caAirline},{shaStart},{peaStart},{ca0001FlightNo},{priceLess500}");
    return caAirline + shaStart + peaStart + ca0001FlightNo + priceLess500;
}
[Benchmark]
public unsafe int GetFlightPricesStructExplicitUnManageMemory()
{
    var caAirline = 0;
    var shaStart = 0;
    var peaStart = 0;
    var ca0001FlightNo = 0;
    var priceLess500 = 0;
    var arrays = new Span<FlightPriceStructExplicit>(_unManagerPtr.ToPointer(), FlightPricesStructExplicitUnManageMemoryLength);
    for (int i = 0; i < arrays.Length; i++)
    {
        var item = arrays[i];
        if (FlightPriceStructExplicit.EqualsAirline(item, "CA")) caAirline++;
        if (FlightPriceStructExplicit.EqualsStart(item, "SHA")) shaStart++;
        if (FlightPriceStructExplicit.EqualsEnd(item, "PEA")) peaStart++;
        if (FlightPriceStructExplicit.EqualsFlightNo(item, "0001")) ca0001FlightNo++;
        if (FlightPriceStructExplicit.IsPriceLess(item, 500)) priceLess500++;
    }
    Debug.WriteLine($"{caAirline},{shaStart},{peaStart},{ca0001FlightNo},{priceLess500}");
    return caAirline + shaStart + peaStart + ca0001FlightNo + priceLess500;
}

The Benchmark results are as follows.

![[https://img1.dotnet9.com/2022/05/1613.png]]

We can see that using a simple struct is slightly slower than using a class, but the Explicit layout and unmanaged memory versions are much slower, more than double the time. Can’t we have both low memory and high speed?

Let’s analyze why the latter two approaches are slower. The reason is value copying. In C#, arguments are passed by value by default: for a reference type only the reference is copied, while for a value type the whole instance is copied.

  • When calling a method with a reference type, only the reference is copied, and its length equals the pointer size: 4 bytes on a 32-bit system, 8 bytes on a 64-bit system.
  • When calling a method with a value type, the value itself is copied. For example, if the value takes 4 bytes, 4 bytes are copied. This is an advantage when the size is less than or equal to the pointer size, but becomes a disadvantage when it is larger.

Our structs are much larger than the CPU word size (64-bit, 8 bytes), and the later code implementations performed multiple value copies, which slowed down the overall speed.

Is there a way to avoid value copying? Yes, value types in C# can also be passed by reference using the ref keyword. Just add ref where value copying occurs. The code is as follows:

// Modify comparison methods to support pass by reference
// Add ref
public static unsafe bool EqualsAirlineRef(ref FlightPriceStructExplicit item, string airline)
{
    // Since it’s a reference, we need to fix the pointer
    fixed(char* ptr = item.Airline)
    {
        return airline.SpanEquals(ptr, 2);
    }
}

// Also modify the Benchmark code to pass by reference
[Benchmark]
public unsafe int GetStructStoreUnManageMemoryRef()
{
    var caAirline = 0;
    var shaStart = 0;
    var peaStart = 0;
    var ca0001FlightNo = 0;
    var priceLess500 = 0;
    var arrays = new Span<FlightPriceStructExplicit>(_unManagerPtr.ToPointer(), FlightPricesStructExplicitUnManageMemoryLength);
    for (int i = 0; i < arrays.Length; i++)
    {
        // Get direct reference from array
        ref var item = ref arrays[i];
        // Pass reference directly
        if (FlightPriceStructExplicit.EqualsAirlineRef(ref item, "CA")) caAirline++;
        if (FlightPriceStructExplicit.EqualsStartRef(ref item, "SHA")) shaStart++;
        if (FlightPriceStructExplicit.EqualsEndRef(ref item, "PEA")) peaStart++;
        if (FlightPriceStructExplicit.EqualsFlightNoRef(ref item, "0001")) ca0001FlightNo++;
        if (FlightPriceStructExplicit.IsPriceLessRef(ref item, 500)) priceLess500++;
    }
    Debug.WriteLine($"{caAirline},{shaStart},{peaStart},{ca0001FlightNo},{priceLess500}");
    return caAirline + shaStart + peaStart + ca0001FlightNo + priceLess500;
}
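
The difference between passing a struct by value and by reference can also be seen in isolation. The sketch below uses a hypothetical 32-byte Counter struct: the by-value call works on a copy, the ref call works on the caller's instance, and the in modifier passes a read-only reference.

```csharp
using System;

public struct Counter
{
    public long A, B, C, D;   // 32 bytes, larger than a pointer
}

public static class PassingDemo
{
    // Receives a copy: the caller's struct is not affected.
    public static void IncrementByValue(Counter c) => c.A++;

    // Receives a reference: no copy is made, and the caller's struct is modified.
    public static void IncrementByRef(ref Counter c) => c.A++;

    // 'in' passes a read-only reference: no copy of the 32 bytes and no mutation.
    public static long Sum(in Counter c) => c.A + c.B + c.C + c.D;

    public static void Main()
    {
        var counter = new Counter { A = 1, B = 2, C = 3, D = 4 };

        IncrementByValue(counter);
        Console.WriteLine(counter.A);       // 1

        IncrementByRef(ref counter);
        Console.WriteLine(counter.A);       // 2

        Console.WriteLine(Sum(in counter)); // 11
    }
}
```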

Let’s run the benchmark again. Our Explicit struct takes the lead, being 33% faster than using classes. The unmanaged memory version from the previous round also performed well, ranking second.

![[https://img1.dotnet9.com/2022/05/1614.png]]

So why, when both are passed by reference, is the class slower? This relates to more fundamental CPU knowledge. Besides basic computation units, the CPU has L1, L2, and L3 caches, as shown below.

![[https://img1.dotnet9.com/2022/05/1615.png]]

![[https://img1.dotnet9.com/2022/05/1616.png]]

This is related to CPU performance. Remember the table at the beginning of the article? The CPU’s internal cache is the fastest. The first reason is that struct arrays store data in contiguous memory addresses, which is very cache-friendly. In contrast, class objects are reference types and require pointers for access, which is less cache-friendly.

The second reason is that accessing reference types requires dereferencing: finding the corresponding memory data through the pointer, which structs do not need.
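
Both effects can be tried out with a rough sketch like the one below. It is not the article's benchmark: PriceClass and PriceStruct are hypothetical one-field types, the class references are shuffled with a fixed seed so that array order no longer matches heap order, and Stopwatch timings are only indicative (BenchmarkDotNet is the better tool for real measurements).

```csharp
using System;
using System.Diagnostics;

public sealed class PriceClass { public long Fen; }
public struct PriceStruct { public long Fen; }

public static class LocalityDemo
{
    public static long SumClasses(PriceClass[] items)
    {
        long sum = 0;
        for (int i = 0; i < items.Length; i++) sum += items[i].Fen; // one dereference per element
        return sum;
    }

    public static long SumStructs(PriceStruct[] items)
    {
        long sum = 0;
        for (int i = 0; i < items.Length; i++) sum += items[i].Fen; // read straight from the array
        return sum;
    }

    public static PriceClass[] BuildClasses(int n)
    {
        var items = new PriceClass[n];
        for (int i = 0; i < n; i++) items[i] = new PriceClass { Fen = i };
        // Shuffle the references so that consecutive elements point to scattered objects.
        var random = new Random(42);
        for (int i = n - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (items[i], items[j]) = (items[j], items[i]);
        }
        return items;
    }

    public static PriceStruct[] BuildStructs(int n)
    {
        var items = new PriceStruct[n];
        for (int i = 0; i < n; i++) items[i].Fen = i;
        return items;
    }

    public static void Main()
    {
        const int n = 5_000_000;
        var classes = BuildClasses(n);
        var structs = BuildStructs(n);

        var sw = Stopwatch.StartNew();
        long a = SumClasses(classes);
        long classMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long b = SumStructs(structs);
        long structMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"class: {classMs} ms, struct: {structMs} ms, sums equal: {a == b}");
    }
}
```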

How can we verify this? BenchmarkDotNet can collect hardware counters for this. Simply add the NuGet package BenchmarkDotNet.Diagnostics.Windows and add the following code to the class being benchmarked:

[HardwareCounters(
    HardwareCounter.LlcMisses, // Last-level cache (LLC) misses
    HardwareCounter.LlcReference)]  // Last-level cache (LLC) references (accesses)
public class SpeedBench : IDisposable
{
    ......
}

The results are shown below. Because additional Windows ETW information needs to be collected, it runs a bit slower.

![[https://img1.dotnet9.com/2022/05/1617.png]]

From the above chart, we can see that reference types have the highest number of cache misses and many last-level cache references, which slows down performance.

As shown in the diagram below, sequential storage of structs is more efficient for memory access than the scattered access pattern of reference types. Additionally, the smaller the object size, the more cache-friendly it is.

[Article Image - Class Cache.drawio]

[Article Image - Struct Cache.drawio]

3. Summary

In this article, we discussed how using structs instead of classes can significantly reduce memory usage and speed up computation (about 33% faster than the class version in our benchmark). We also touched on the simple use of unmanaged memory in .NET. Structs are something I really like; they offer very efficient storage structures and excellent performance. However, you should not convert all classes to structs, as they have different applicable scenarios.

So when should we use structs and when should we use classes? Microsoft provides the following guidelines.

✔️ CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.

❌ AVOID defining a struct unless the type has all of the following characteristics:

  • It logically represents a single value, similar to primitive types (int, double, etc.) – like our cache data, which is mostly primitive types.
  • It has an instance size less than 16 bytes – value copying overhead is significant, though now with ref there are more applicable scenarios.
  • It is immutable – in our example, the cached data does not change, so it has this characteristic.
  • It will not have to be boxed frequently – frequent boxing/unboxing has a large performance impact. In our scenario, the structs are stored in arrays and passed with ref, never converted to object or an interface, so this is not a concern.

In all other cases, you should define your types as classes.
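
As an illustration of a type that fits all four characteristics, here is a sketch of a small immutable struct. Money is a hypothetical example, not a type from the article: it is 8 bytes, represents a single value, is declared readonly, and implements IEquatable<Money> so that equality checks between two values do not box.

```csharp
#nullable enable
using System;

// Hypothetical example type: 8 bytes, immutable, represents a single value.
public readonly struct Money : IEquatable<Money>
{
    public long Fen { get; }

    public Money(long fen) => Fen = fen;

    // Strongly typed Equals: comparing two Money values does not box.
    public bool Equals(Money other) => Fen == other.Fen;

    public override bool Equals(object? obj) => obj is Money other && Equals(other);

    public override int GetHashCode() => Fen.GetHashCode();

    // Returns a new value instead of mutating the existing one.
    public Money Add(Money other) => new Money(Fen + other.Fen);
}

public static class MoneyDemo
{
    public static void Main()
    {
        var price = new Money(49_900);
        var tax = new Money(5_000);
        Money total = price.Add(tax);

        Console.WriteLine(total.Fen);                       // 54900
        Console.WriteLine(total.Equals(new Money(54_900))); // True
    }
}
```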

In fact, from these approaches, we can see that C# is a language with a low barrier to entry but a high ceiling. You can use C#’s syntactic features to quickly turn requirements into code. And when performance becomes a bottleneck, you can write C# code almost like C++ code, achieving performance comparable to C++.

4. Appendix
