C# Object Binary Serialization Optimization: Achieving Extreme Compression with Bitfield Techniques

Demonstrates how to convert C# objects into binary form and optimize them to reduce packet size in network transmission.

Last updated 1/22/2024 12:33 AM
沙漠尽头的狼
Category: .NET
Tags: .NET, C#, Binary

1. Introduction

In operating systems, process information is crucial for system monitoring and performance analysis. Suppose we need to develop a monitoring program that captures process information from the current operating system and transmits it efficiently to other endpoints (such as a server or a monitoring console). The key challenge is converting the captured process objects into binary data while keeping each packet as small as possible. This article works through that optimization step by step, ending with bit fields for the binary serialization of C# objects.

[Figure: operating system process information]

First, here are example field definitions for a process object. To transmit this object over the network (TCP/UDP), we need to convert it to binary, and the goal is the smallest possible packet.

| Field Name | Description | Example |
| --- | --- | --- |
| PID | Process ID | 10565 |
| Name | Process name | 码坊 |
| Publisher | Publisher | 沙漠尽头的狼 |
| CommandLine | Command line | dotnet CodeWF.Tools.dll |
| CPU | CPU (total processing utilization across all cores) | 2.3% |
| Memory | Memory (physical memory used by the process) | 0.1% |
| Disk | Disk (total utilization across all physical drives) | 0.1 MB/s |
| Network | Network (network utilization on the primary network) | 0 Mbps |
| GPU | GPU (highest utilization across all GPU engines) | 2.2% |
| GPUEngine | GPU engine | GPU 0 - 3D |
| PowerUsage | Power usage (impact of CPU, disk, and GPU on power consumption) | Low |
| PowerUsageTrend | Power usage trend (impact of CPU, disk, and GPU over time) | Very low |
| Type | Process type | Application |
| Status | Process status | Efficiency mode |

2. Optimization Process

2.1. Process Object Definition and Preliminary Analysis

We determined the data type for each field based on the example values.

| Field Name | Data Type | Description | Example |
| --- | --- | --- | --- |
| PID | int | Process ID | 10565 |
| Name | string? | Process name | 码坊 |
| Publisher | string? | Publisher | 沙漠尽头的狼 |
| CommandLine | string? | Command line | dotnet CodeWF.Tools.dll |
| CPU | string? | CPU (total processing utilization across all cores) | 2.3% |
| Memory | string? | Memory (physical memory used by the process) | 0.1% |
| Disk | string? | Disk (total utilization across all physical drives) | 0.1 MB/s |
| Network | string? | Network (network utilization on the primary network) | 0 Mbps |
| GPU | string? | GPU (highest utilization across all GPU engines) | 2.2% |
| GPUEngine | string? | GPU engine | GPU 0 - 3D |
| PowerUsage | string? | Power usage (impact of CPU, disk, and GPU on power consumption) | Low |
| PowerUsageTrend | string? | Power usage trend (impact of CPU, disk, and GPU over time) | Very low |
| Type | string? | Process type | Application |
| Status | string? | Process status | Efficiency mode |

Create a C# class SystemProcess to represent process information:

public class SystemProcess
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public string? CPU { get; set; }
    public string? Memory { get; set; }
    public string? Disk { get; set; }
    public string? Network { get; set; }
    public string? GPU { get; set; }
    public string? GPUEngine { get; set; }
    public string? PowerUsage { get; set; }
    public string? PowerUsageTrend { get; set; }
    public string? Type { get; set; }
    public string? Status { get; set; }
}

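Define the test data (a field of the unit test class). The Chinese string values are kept as they are, because the serialized sizes measured below depend on them:

private SystemProcess _codeWFObject = new SystemProcess()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = "2.3%",
    Memory = "0.1%",
    Disk = "0.1 MB/秒",          // "0.1 MB/s"
    Network = "0 Mbps",
    GPU = "2.2%",
    GPUEngine = "GPU 0 - 3D",
    PowerUsage = "低",           // "Low"
    PowerUsageTrend = "非常低",  // "Very low"
    Type = "应用",               // "Application"
    Status = "效率模式"          // "Efficiency mode"
};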

2.2. Excluding JSON Serialization

Converting the object to a JSON string is the most common approach in web development because it is concise and easy to handle on both the frontend and backend:

using System.Text;
using System.Text.Json;
using Xunit;
using Xunit.Abstractions; // ITestOutputHelper lives here in xUnit v2

public class SystemProcessUnitTest
{
    private readonly ITestOutputHelper _testOutputHelper;

    // Initialized with the test data defined earlier; the initializer is omitted here
    private SystemProcess _codeWFObject = new SystemProcess();

    public SystemProcessUnitTest(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
    }

    /// <summary>
    /// JSON serialization size test
    /// </summary>
    [Fact]
    public void Test_SerializeJsonData_Success()
    {
        // Length of the JSON text in characters
        var jsonData = JsonSerializer.Serialize(_codeWFObject);
        _testOutputHelper.WriteLine($"Json length: {jsonData.Length}");

        // Length of the same text encoded as UTF-8 bytes
        var jsonDataBytes = Encoding.UTF8.GetBytes(jsonData);
        _testOutputHelper.WriteLine($"JSON binary length: {jsonDataBytes.Length}");
    }
}

Standard Output:
Json length: 366
JSON binary length: 366

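Although JSON serialization is very popular in web development for its simplicity and ease of processing, in TCP/UDP network transmission it inflates the packet unnecessarily, because every field name is repeated in every object. Therefore, we rule out JSON and look for a more efficient binary serialization method.

The serialized output is shown below, indented for readability; the measured string is compact, without whitespace. By default, System.Text.Json escapes non-ASCII characters as \uXXXX sequences, so the output is pure ASCII, which is why the character count and the UTF-8 byte count are both 366. Note that the Name value in this capture decodes to the four-character 码界工坊 rather than the 码坊 used in the test data above; the 366-byte figure matches the four-character name.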

{
  "PID": 10565,
  "Name": "\u7801\u754C\u5DE5\u574A",
  "Publisher": "\u6C99\u6F20\u5C3D\u5934\u7684\u72FC",
  "CommandLine": "dotnet CodeWF.Tools.dll",
  "CPU": "2.3%",
  "Memory": "0.1%",
  "Disk": "0.1 MB/\u79D2",
  "Network": "0 Mbps",
  "GPU": "2.2%",
  "GPUEngine": "GPU 0 - 3D",
  "PowerUsage": "\u4F4E",
  "PowerUsageTrend": "\u975E\u5E38\u4F4E",
  "Type": "\u5E94\u7528",
  "Status": "\u6548\u7387\u6A21\u5F0F"
}
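To illustrate why the character count and the byte count of the JSON output coincide, here is a minimal sketch that serializes one property twice: once with the default System.Text.Json settings and once with JavaScriptEncoder.UnsafeRelaxedJsonEscaping. The anonymous object is only for illustration and is not part of the article's test code.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Illustrative object; not the article's SystemProcess class.
var sample = new { Name = "码坊" };

// Default options escape non-ASCII characters, so the output is pure ASCII.
string escaped = JsonSerializer.Serialize(sample);

// The relaxed encoder leaves the CJK characters unescaped.
var relaxedOptions = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
string relaxed = JsonSerializer.Serialize(sample, relaxedOptions);

int escapedChars = escaped.Length;
int escapedBytes = Encoding.UTF8.GetByteCount(escaped);
int relaxedChars = relaxed.Length;
int relaxedBytes = Encoding.UTF8.GetByteCount(relaxed);

Console.WriteLine($"{escaped} chars={escapedChars} bytes={escapedBytes}");
Console.WriteLine($"{relaxed} chars={relaxedChars} bytes={relaxedBytes}");
```

With escaping, each Chinese character costs 6 bytes (\uXXXX); without it, each costs 3 bytes in UTF-8. Either way the field names still travel with every object, which is the overhead the binary formats below avoid.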

2.3. Binary Serialization Using BinaryWriter

Here we use SerializeHelper, the binary serialization helper class from a previous article by the site owner. It converts objects to binary data with BinaryWriter and deserializes them with BinaryReader.

First, we make the SystemProcess class implement the empty interface INetObject and add the NetHeadAttribute attribute to the class. The attribute defines a packet header so that the receiver can tell which network object it is deserializing when several kinds are in use. The header adds a few bytes to each serialized object, mainly for the system ID, the network object ID, the object version number, and other auxiliary serialization fields.

/// <summary>
/// Network object serialization interface
/// </summary>
public interface INetObject
{
}
[NetHead(1, 1)]
public class SystemProcess : INetObject
{
    // Property definitions omitted
}

Then, we write a test method to verify the correctness of serialization and deserialization, and print the length of the serialized binary data.

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
}
Standard Output: 
Binary length after serialization: 152

This is less than half the size of the JSON output (152 bytes versus 366), even though the binary form carries a few extra header fields. The unit test above also verifies that the data survives deserialization. We will continue optimizing from this baseline.
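The saving comes from writing values in a fixed order with no field names. The following minimal sketch shows the idea with BinaryWriter and BinaryReader directly. The helpers WritePidAndName and ReadPidAndName are hypothetical names for this illustration; this is not the SerializeHelper implementation, and it omits the NetHead packet header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes two fields in a fixed order, without field names.
static byte[] WritePidAndName(int pid, string name)
{
    using var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
    {
        writer.Write(pid);   // 4 bytes
        writer.Write(name);  // length prefix, then the UTF-8 bytes of the string
    }
    return stream.ToArray();
}

// Hypothetical helper: reads the fields back in the same order.
static (int Pid, string Name) ReadPidAndName(byte[] buffer)
{
    using var reader = new BinaryReader(new MemoryStream(buffer), Encoding.UTF8);
    int pid = reader.ReadInt32();
    string name = reader.ReadString();
    return (pid, name);
}

byte[] payload = WritePidAndName(10565, "码坊");
var decoded = ReadPidAndName(payload);

Console.WriteLine($"Payload length: {payload.Length}");
Console.WriteLine($"PID={decoded.Pid}, Name={decoded.Name}");
```

Here the payload is 11 bytes: 4 for the int, 1 for the string length prefix, and 6 for the two Chinese characters in UTF-8. The reader must know the field order in advance, which is the trade-off for dropping the field names.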

2.4. Data Type Adjustments

To shrink the binary data further, we adjust the data types. Looking at the example process data, some fields can be represented more compactly. For example, CPU utilization can be sent as just the number (e.g., 2.3) without the percent sign, and the process type can be sent as an enum value instead of a descriptive string. Such adjustments reduce packet size.

| Field Name | Data Type | Description | Example |
| --- | --- | --- | --- |
| PID | int | Process ID | 10565 |
| Name | string? | Process name | 码坊 |
| Publisher | string? | Publisher | 沙漠尽头的狼 |
| CommandLine | string? | Command line | dotnet CodeWF.Tools.dll |
| CPU | float | CPU (total processing utilization across all cores) | 2.3 |
| Memory | float | Memory (physical memory used by the process) | 0.1 |
| Disk | float | Disk (total utilization across all physical drives) | 0.1 |
| Network | float | Network (network utilization on the primary network) | 0 |
| GPU | float | GPU (highest utilization across all GPU engines) | 2.2 |
| GPUEngine | byte | GPU engine, 0: None, 1: GPU 0 - 3D | 1 |
| PowerUsage | byte | Power usage (impact of CPU, disk, and GPU), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high | 1 |
| PowerUsageTrend | byte | Power usage trend (impact over time), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high | 0 |
| Type | byte | Process type, 0: Application, 1: Background process | 0 |
| Status | byte | Process status, 0: Normal, 1: Efficiency mode, 2: Suspended | 1 |

Modify the class definition and the test data accordingly:

[NetHead(1, 2)]
public class SystemProcess2 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public float CPU { get; set; }
    public float Memory { get; set; }
    public float Disk { get; set; }
    public float Network { get; set; }
    public float GPU { get; set; }
    public byte GPUEngine { get; set; }
    public byte PowerUsage { get; set; }
    public byte PowerUsageTrend { get; set; }
    public byte Type { get; set; }
    public byte Status { get; set; }
}
/// <summary>
/// Test data with conventionally optimized field data types
/// </summary>
private SystemProcess2 _codeWFObject2 = new SystemProcess2()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = 2.3f,
    Memory = 0.1f,
    Disk = 0.1f,
    Network = 0,
    GPU = 2.2f,
    GPUEngine = 1,
    PowerUsage = 1,
    PowerUsageTrend = 0,
    Type = 0,
    Status = 1
};

Add a unit test as follows:

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes2_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject2, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess2>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(2.2f, deserializeObj.GPU);
}

Test result:

Standard Output: 
Binary length after serialization: 99

Packet size drops by about one-third, from 152 bytes to 99 bytes, because several fields changed from string? to float or byte.
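On the sending side, this step implies converting the display strings into the compact types before serialization. A minimal sketch of such a conversion follows; the helper ParsePercent and the label array are assumptions made for illustration and do not appear in the article's code. The codes follow the PowerUsage mapping in the table above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: "2.3%" -> 2.3f (4 bytes instead of a length-prefixed string).
static float ParsePercent(string text) =>
    float.Parse(text.TrimEnd('%'), CultureInfo.InvariantCulture);

// Index in this array is the byte code from the table: 0: Very low ... 4: Very high.
string[] powerUsageLabels = { "Very low", "Low", "Medium", "High", "Very high" };

float cpu = ParsePercent("2.3%");
int labelIndex = Array.IndexOf(powerUsageLabels, "Low");
if (labelIndex < 0) throw new ArgumentException("Unknown power usage label");
byte powerUsage = (byte)labelIndex;

Console.WriteLine($"CPU={cpu}, PowerUsage={powerUsage}");
```

The receiver needs the same label table to turn the byte back into display text, so both ends must agree on the mapping.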

2.5. Further Data Type Adjustments and Bit-Field Optimization

Going a step further, we introduce bit fields. Bit fields give finer control over how many bits each field occupies, which reduces the binary data size further. We redefine the field rules and pack the enum-valued fields and the scaled numeric fields into bit fields, which significantly reduces packet size.

Comparing the previous table with the one below, the adjustments fall into two kinds of data type change, with the following rules:

  • First type: Some fields are just enum values, currently stored as a byte (8 bits). The process type, for instance, has only two states (0: Application, 1: Background process), so 1 bit is enough; power usage has only 5 states, so 3 bits are enough (3 bits can represent up to 8 states).
  • Second type: For some float fields, one decimal place is all the precision we need in practice. The values are treated as percentages, so they never exceed 100.0 (i.e., 100%). We round to one decimal place, multiply by 10, and send an integer: 23.3% is sent as 233, and the maximum is 1000 (i.e., 100.0%). The receiving process divides by 10 before using the value. The field can therefore shrink from a float (4 bytes, 32 bits) to 10 bits (which hold values from 0 to 1023).
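The two rules above can be checked with a few lines of arithmetic. This is a minimal sketch; BitsNeeded, ToTenths, and FromTenths are hypothetical helper names introduced for illustration only.

```csharp
using System;

// Smallest number of bits that can distinguish `states` different values.
static int BitsNeeded(int states)
{
    int bits = 0;
    while ((1 << bits) < states) bits++;
    return bits;
}

// Store a percentage (0.0 to 100.0) as an integer count of tenths of a percent.
static int ToTenths(double percent) =>
    (int)Math.Round(percent * 10, MidpointRounding.AwayFromZero);

// Recover the percentage on the receiving side.
static double FromTenths(int tenths) => tenths / 10.0;

int typeBits = BitsNeeded(2);        // process type: 2 states
int powerBits = BitsNeeded(5);       // power usage: 5 states
int percentBits = BitsNeeded(1001);  // tenths of a percent: 0..1000
int encoded = ToTenths(23.3);
double decodedPercent = FromTenths(encoded);

Console.WriteLine($"{typeBits} {powerBits} {percentBits} {encoded} {decodedPercent}");
```

The sketch yields 1 bit for the process type, 3 bits for power usage, and 10 bits for a percentage stored in tenths, matching the sizes used in the layout that follows.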

According to this rule, we redefine the field rules as follows:

| Field Name | Data Type | Description | Example |
| --- | --- | --- | --- |
| PID | int | Process ID | 10565 |
| Name | string? | Process name | 码坊 |
| Publisher | string? | Publisher | 沙漠尽头的狼 |
| CommandLine | string? | Command line | dotnet CodeWF.Tools.dll |
| Data | byte[8] | Fixed-size fields packed into 8 bytes (the reason for 8 bytes is explained below). Note: during serialization, an additional 4 bytes store the byte[] length, so the Data field occupies 12 bytes in total | |

Detailed description of the fixed fields (Data):

| Field Name | Offset (bits) | Size (bits) | Description | Example |
| --- | --- | --- | --- | --- |
| CPU | 0 | 10 | CPU (total processing utilization across all cores), last digit indicates decimal place, e.g., 23 means 2.3% | 23 |
| Memory | 10 | 10 | Memory (physical memory used by the process), last digit indicates decimal place, e.g., 1 means 0.1%, value computed from basic info | 1 |
| Disk | 20 | 10 | Disk (total utilization across all physical drives), last digit indicates decimal place, e.g., 1 means 0.1% | 1 |
| Network | 30 | 10 | Network (network utilization on the primary network), last digit indicates decimal place, e.g., 253 means 25.3% | 0 |
| GPU | 40 | 10 | GPU (highest utilization across all GPU engines), last digit indicates decimal place, e.g., 253 means 25.3% | 22 |
| GPUEngine | 50 | 1 | GPU engine, 0: None, 1: GPU 0 - 3D | 1 |
| PowerUsage | 51 | 3 | Power usage (impact of CPU, disk, and GPU), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high | 1 |
| PowerUsageTrend | 54 | 3 | Power usage trend (impact over time), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high | 0 |
| Type | 57 | 1 | Process type, 0: Application, 1: Background process | 0 |
| Status | 58 | 2 | Process status, 0: Normal, 1: Efficiency mode, 2: Suspended | 1 |

The table above shows the bit-field rules for the fixed-size example fields. Offset is the field's starting position in the Data byte array, counted in bits, and Size is the number of bits the field occupies. For example, the Memory field starts at bit 10 and is 10 bits wide, so it occupies bits 10 through 19 of Data.

Thus the 10 fixed-size fields, originally 25 bytes (five 4-byte floats plus five 1-byte fields), shrink to 8 bytes: each float goes from 32 bits to 10 bits, and each single-byte field goes from 8 bits to 1, 2, or 3 bits. In total, 200 bits (25 × 8) become 60 bits, which round up to 64 bits (8 bytes) because the smallest unit of network transmission is a byte.
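The offsets and the 8-byte total follow mechanically from the field sizes. As a quick check, this sketch walks the sizes from the table and accumulates the offsets; the tuple array is just a convenient way to hold the table for the illustration.

```csharp
using System;

// Field names and bit widths copied from the table above.
(string Name, int Size)[] fields =
{
    ("CPU", 10), ("Memory", 10), ("Disk", 10), ("Network", 10), ("GPU", 10),
    ("GPUEngine", 1), ("PowerUsage", 3), ("PowerUsageTrend", 3), ("Type", 1), ("Status", 2)
};

int nextOffset = 0;
foreach (var field in fields)
{
    // Each field starts where the previous one ended.
    Console.WriteLine($"{field.Name,-16} bits {nextOffset}..{nextOffset + field.Size - 1}");
    nextOffset += field.Size;
}

int totalBits = nextOffset;
int totalBytes = (totalBits + 7) / 8; // round up to whole bytes
Console.WriteLine($"Total: {totalBits} bits -> {totalBytes} bytes");
```

The layout leaves 4 spare bits in the last byte, so a small field could be added later without growing the packet.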

Modify the class definition as follows; pay attention to the comments in the code:

[NetHead(1, 3)]
public class SystemProcess3 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    private byte[]? _data;
    /// <summary>
    /// Serialization: this is the actual data that needs to be serialized
    /// </summary>
    public byte[]? Data
    {
        get => _data;
        set
        {
            _data = value;

            // Key: convert byte array to object during deserialization for program use (bit-field operations)
            _processData = _data?.ToFieldObject<SystemProcessData>();
        }
    }

    private SystemProcessData? _processData;

    /// <summary>
    /// Process data: adding NetIgnoreMember will ignore this field during serialization
    /// </summary>
    [NetIgnoreMember]
    public SystemProcessData? ProcessData
    {
        get => _processData;
        set
        {
            _processData = value;

            // Key: convert object to byte array (bit-field serialization)
            _data = _processData?.FieldObjectBuffer();
        }
    }
}

public record SystemProcessData
{
    [NetFieldOffset(0, 10)] public short CPU { get; set; }
    [NetFieldOffset(10, 10)] public short Memory { get; set; }
    [NetFieldOffset(20, 10)] public short Disk { get; set; }
    [NetFieldOffset(30, 10)] public short Network { get; set; }
    [NetFieldOffset(40, 10)] public short GPU { get; set; }
    [NetFieldOffset(50, 1)] public byte GPUEngine { get; set; }
    [NetFieldOffset(51, 3)] public byte PowerUsage { get; set; }
    [NetFieldOffset(54, 3)] public byte PowerUsageTrend { get; set; }
    [NetFieldOffset(57, 1)] public byte Type { get; set; }
    [NetFieldOffset(58, 2)] public byte Status { get; set; }
}
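The article does not show the declaration of NetFieldOffsetAttribute. Judging only from how it is used (a two-argument constructor, plus the Offset and Size members read by the helper code later on), it might look roughly like the sketch below. The declaration and the DemoData record are assumptions for illustration, not the author's actual code.

```csharp
using System;
using System.Reflection;

// Read the attribute back through reflection, as a bit-field helper would.
PropertyInfo property = typeof(DemoData).GetProperty(nameof(DemoData.Memory))!;
NetFieldOffsetAttribute layout = property.GetCustomAttribute<NetFieldOffsetAttribute>()!;
Console.WriteLine($"Memory: offset {layout.Offset}, size {layout.Size}");

// Assumed shape of the attribute (a guess based on its usage in the article).
[AttributeUsage(AttributeTargets.Property)]
public sealed class NetFieldOffsetAttribute : Attribute
{
    public NetFieldOffsetAttribute(int offset, int size)
    {
        Offset = offset;
        Size = size;
    }

    public int Offset { get; } // start position, in bits
    public int Size { get; }   // width, in bits
}

// Hypothetical record used only for this demonstration.
public record DemoData
{
    [NetFieldOffset(10, 10)] public short Memory { get; set; }
}
```

Keeping the layout in attributes means the offsets live next to the properties they describe, at the cost of a reflection lookup on every conversion.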

Add the test data and a unit test as follows:

/// <summary>
/// Extreme optimized field data types
/// </summary>
private SystemProcess3 _codeWFObject3 = new SystemProcess3()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    ProcessData = new SystemProcessData()
    {
        CPU = 23,
        Memory = 1,
        Disk = 1,
        Network = 0,
        GPU = 22,
        GPUEngine = 1,
        PowerUsage = 1,
        PowerUsageTrend = 0,
        Type = 0,
        Status = 1
    }
};

/// <summary>
/// Extreme binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes3_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject3, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess3>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(23, deserializeObj.ProcessData.CPU);
    Assert.Equal(1, deserializeObj.ProcessData.PowerUsage);
}

Test output:

Standard Output: 
Binary length after serialization: 86

The size drops from 99 to 86 bytes, a saving of 13 bytes per object, which matters in constrained network environments. Across 1 million objects, for example, that adds up to about 12.4 MB (13,000,000 bytes). The bit-field serialization and deserialization code is not explained in detail here, since a line-by-line walkthrough would be dry; the code looks like this:

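using System;
using System.Reflection;

// Declared static because C# extension methods must be defined in a static class.
public static partial class SerializeHelper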
{
    public static byte[] FieldObjectBuffer<T>(this T obj) where T : class
    {
        var properties = typeof(T).GetProperties();
        var totalSize = 0;

        // Calculate total bit length
        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            totalSize = Math.Max(totalSize, offsetAttribute.Offset + offsetAttribute.Size);
        }

        var bufferLength = (int)Math.Ceiling((double)totalSize / 8);
        var buffer = new byte[bufferLength];

        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            dynamic value = property.GetValue(obj)!; // Use dynamic to get property value
            SetBitValue(ref buffer, value, offsetAttribute.Offset, offsetAttribute.Size);
        }

        return buffer;
    }

    public static T ToFieldObject<T>(this byte[] buffer) where T : class, new()
    {
        var obj = new T();
        var properties = typeof(T).GetProperties();

        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            dynamic value = GetValueFromBit(buffer, offsetAttribute.Offset, offsetAttribute.Size,
                property.PropertyType);
            property.SetValue(obj, value);
        }

        return obj;
    }

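    /// <summary>
    /// Writes the low <paramref name="size"/> bits of <paramref name="value"/> into the buffer,
    /// least significant bit first. Handles fields that span at most two bytes,
    /// which holds for every field in the layout used here.
    /// </summary>
    /// <param name="buffer">Target byte array, assumed to start zero-filled</param>
    /// <param name="value">Value to write</param>
    /// <param name="offset">Start position in the buffer, in bits</param>
    /// <param name="size">Field width, in bits</param>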
    private static void SetBitValue(ref byte[] buffer, int value, int offset, int size)
    {
        var mask = (1 << size) - 1;
        buffer[offset / 8] |= (byte)((value & mask) << (offset % 8));
        if (offset % 8 + size > 8)
        {
            buffer[offset / 8 + 1] |= (byte)((value & mask) >> (8 - offset % 8));
        }
    }

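    /// <summary>
    /// Reads a bit field from the buffer and converts it to the property type.
    /// Like SetBitValue, it handles fields that span at most two bytes.
    /// </summary>
    /// <param name="buffer">Source byte array</param>
    /// <param name="offset">Start position in the buffer, in bits</param>
    /// <param name="size">Field width, in bits</param>
    /// <param name="propertyType">Type of the target property (e.g., short or byte)</param>
    /// <returns>The field value converted to <paramref name="propertyType"/></returns>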
    private static dynamic GetValueFromBit(byte[] buffer, int offset, int size, Type propertyType)
    {
        var mask = (1 << size) - 1;
        var bitValue = (buffer[offset / 8] >> (offset % 8)) & mask;
        if (offset % 8 + size > 8)
        {
            bitValue |= (buffer[offset / 8 + 1] << (8 - offset % 8)) & mask;
        }

        dynamic result = Convert.ChangeType(bitValue, propertyType); // Convert based on property type
        return result;
    }
}
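The helper code above touches at most two bytes per field. That is enough for this layout, but a 10-bit field starting at bit 7, for example, would cover three bytes. As a hedged alternative, the sketch below packs and unpacks one bit at a time using the same bit order (least significant bit first). WriteBits and ReadBits are hypothetical names for this illustration and are not part of SerializeHelper.

```csharp
using System;

// Hypothetical bit-by-bit writer: bit i of the value goes to bit (offset + i) of the buffer.
static void WriteBits(byte[] buffer, int value, int offset, int size)
{
    for (int i = 0; i < size; i++)
    {
        int bitIndex = offset + i;
        int mask = 1 << (bitIndex % 8);
        if (((value >> i) & 1) != 0)
            buffer[bitIndex / 8] |= (byte)mask;           // set the bit
        else
            buffer[bitIndex / 8] &= (byte)(~mask & 0xFF); // clear the bit
    }
}

// Hypothetical bit-by-bit reader, the inverse of WriteBits.
static int ReadBits(byte[] buffer, int offset, int size)
{
    int value = 0;
    for (int i = 0; i < size; i++)
    {
        int bitIndex = offset + i;
        int bit = (buffer[bitIndex / 8] >> (bitIndex % 8)) & 1;
        value |= bit << i;
    }
    return value;
}

// Sample values, offsets, and sizes from the article's layout and test data.
(int Value, int Offset, int Size)[] fields =
{
    (23, 0, 10), (1, 10, 10), (1, 20, 10), (0, 30, 10), (22, 40, 10),
    (1, 50, 1), (1, 51, 3), (0, 54, 3), (0, 57, 1), (1, 58, 2)
};

var data = new byte[8];
foreach (var field in fields)
    WriteBits(data, field.Value, field.Offset, field.Size);

int cpu = ReadBits(data, 0, 10);
int gpu = ReadBits(data, 40, 10);
int status = ReadBits(data, 58, 2);
Console.WriteLine($"CPU={cpu}, GPU={gpu}, Status={status}");
Console.WriteLine(BitConverter.ToString(data));

// A 10-bit field starting at bit 7 covers three bytes and still round-trips.
var wide = new byte[3];
WriteBits(wide, 1023, 7, 10);
int wideValue = ReadBits(wide, 7, 10);
Console.WriteLine($"Wide field: {wideValue}");
```

The bit-by-bit loop is slower than the byte-wise version, so it is a trade of speed for generality; for a fixed layout like the one in this article, the two-byte version is sufficient.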

3. Optimization Results and Summary

Through step-by-step optimization, we reduced the packet from 366 bytes with JSON serialization to 152 bytes with ordinary binary serialization, then to 99 bytes by adjusting field data types, and finally to 86 bytes with bit fields. The saving is significant for network transmission, especially when large volumes of data are transferred.
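For a sense of scale, the sizes reported by the unit tests can be turned into percentage reductions relative to JSON. The sketch below does only that arithmetic; ReductionPercent is a hypothetical helper name.

```csharp
using System;

// Sizes in bytes, as reported by the unit tests in this article.
int json = 366, binary = 152, typed = 99, bitField = 86;

// Percentage saved relative to the starting size, rounded to one decimal place.
static double ReductionPercent(int before, int after) =>
    Math.Round((before - after) * 100.0 / before, 1);

double binarySaving = ReductionPercent(json, binary);
double typedSaving = ReductionPercent(json, typed);
double bitFieldSaving = ReductionPercent(json, bitField);

// 13 bytes saved per object by the last step, over one million objects, in MiB.
double lastStepMiB = Math.Round((typed - bitField) * 1_000_000 / (1024.0 * 1024.0), 1);

Console.WriteLine($"Binary: {binarySaving}%  Typed: {typedSaving}%  Bit field: {bitFieldSaving}%");
Console.WriteLine($"Last step over 1M objects: {lastStepMiB} MiB");
```

Relative to JSON, the three stages save roughly 58.5%, 73.0%, and 76.5% respectively.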

This article used a worked example to explore ways of optimizing the binary serialization of C# objects. With bit fields, we compressed the packet as far as the chosen layout allows, improving network transmission efficiency. For client/server (C/S) development, this kind of optimization is both satisfying and a good illustration of pushing for maximum performance.

Finally, the test source code is available on GitHub for readers' reference and study.

Bonus: The repository also contains the case code from the previous article "C# Million Object Serialization Deep Analysis: How to Achieve a Perfect Balance Between Speed and Volume in Network Transmission", as well as TCP/UDP server and client debugging programs.
