C# and LINQ have been together since C# 3.0. LINQ makes working with data sources, both local and remote, a pleasant experience with its functional-style approach. However, LINQ does have its pitfalls in the form of unnecessary memory allocations and, in a few cases, sub-par algorithmic performance.

When writing abstractions on top of remote data access patterns, sometimes called the Data Access Layer, we might run into subtle bugs when using these abstractions in our application code. I will detail one such bug today.

The Bug

The job of the code which the bug had infested was simple:

  1. Fetch data from the database.
  2. Update the data.
  3. Return the updated data.

During our testing we found that our application had difficulties in carrying out this simple task: We were not receiving the updated data in our client!

Given that we were receiving some data, we were able to rule out any issue with our database. We believed that we were not correctly updating the data before we sent it to the client.

The Investigation

We started out by observing the code which was serving the client’s request:

public async Task<List<Data>> SendDataToClient()
{
	var dbResult = dataAccessLayer.GetDbData<Data>();
	var data = dbResult.ToList();
	await UpdateData(data);
	return dbResult.ToList();
}

We checked the code of UpdateData to ensure that we were correctly updating the data. We could even observe that data had the updated information.

The Fix

The fix was really easy: We simply had to change the return statement:

public async Task<List<Data>> SendDataToClient()
{
	var dbResult = dataAccessLayer.GetDbData<Data>();
	var data = dbResult.ToList();
	await UpdateData(data);
	return data; // The fix
}

Congratulations, bug fixed!

That’s it?

Yes, that’s it for the bug.

No, there’s more at play here.

Notice that both data and dbResult serve the same purpose: providing access to a collection (or sequence) of Data objects. We also know that Data is a class, and classes are reference types. This means that any change to an object in data should also have been reflected in the same object in dbResult.

This example demonstrates that behavior:

public class Data
{
	public string Property { get; set; }
}

var entity = new Data
{
	Property = "Nothing change me",
};

// Returns an IEnumerable<Data>
var dbResult = Enumerable.Repeat(entity, 1);

var data = dbResult.ToList();
data[0].Property = "Change is the only constant";

// Prints "Change is the only constant"
Console.WriteLine(dbResult.First().Property);

The above example looks similar to the erroneous code which spawned this post but they differ in their results. What could be the reason for this?

Peeling the Layer

The key insight to understand this behavior lies in the data access layer. Let’s look at the method which returns the entities:

public IEnumerable<T> GetDbData<T>()
{
	var result = client.Query<T>(_ => true);
	return result.Select(r => r); // The key!
}

(For the purpose of this post, I have chosen Azure Table Storage, as it has an open-source emulator available, but this access pattern is common in many of the .NET SDKs for working with remote data sources.)

As we can see, we query the database for our entities using the client and then return an IEnumerable<T> by performing a Select operation on the IEnumerable<T> returned by the client.
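To make the laziness of Select concrete, here is a minimal sketch. It does not use the real database client: Source() is a hypothetical stand-in written as an iterator method, and SourceEnumerations is a counter added purely for illustration. It shows that Select by itself runs nothing, and that every enumeration goes back to the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the source sequence has been enumerated.
    public static int SourceEnumerations;

    // Hypothetical stand-in for a remote query (an assumption, not the real
    // client): the body of an iterator method runs again every time the
    // returned sequence is enumerated.
    public static IEnumerable<int> Source()
    {
        SourceEnumerations++;
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        var projected = Source().Select(x => x); // nothing has run yet
        Console.WriteLine(SourceEnumerations);   // prints 0

        var first = projected.ToList();          // enumerates the source once
        var second = projected.ToList();         // enumerates the source again
        Console.WriteLine(SourceEnumerations);   // prints 2
    }
}
```

Whether a given SDK's query object behaves like this iterator depends on the SDK; the sketch only demonstrates the LINQ side of the behavior.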

We used this returned IEnumerable<Data> in two ways:

  1. Store the sequence items into a local list by calling ToList().
  2. Return it as a response for the client.

We expected to be working with the same data in both of these operations, but that is not the case!

LINQ's Select is lazily evaluated: it runs nothing until the sequence is enumerated, and every enumeration goes back to the client's query. When we first call ToList() on the IEnumerable<Data>, the (database) client returns a new sequence of objects, which we store in a list.

When we return the response, the second ToList() call enumerates the query again, and the client again returns a new sequence of objects.

As the client creates new objects each time the query is enumerated, we are working with different objects on each enumeration.

The following diagram illustrates what’s happening:

[Figure: diagram showing that each enumeration of the query produces a new set of objects]

For each operation, we are working with new objects!
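The behavior can be reproduced without a database. The sketch below is an approximation under stated assumptions: Query() is a hypothetical stand-in for the client that builds new Data objects on every enumeration, and GetDbData() mirrors the data access layer method shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public string Property { get; set; } = "";
}

public static class FreshObjectsDemo
{
    // Hypothetical stand-in for the database client (an assumption): it
    // creates new Data objects on every enumeration, as a client that
    // re-runs the query and deserializes the response would.
    public static IEnumerable<Data> Query()
    {
        yield return new Data { Property = "original" };
    }

    // Mirrors the data access layer: a lazy Select over the query.
    public static IEnumerable<Data> GetDbData() => Query().Select(r => r);

    public static void Main()
    {
        var dbResult = GetDbData();
        var data = dbResult.ToList();      // first enumeration
        data[0].Property = "updated";      // changes the object held in data

        var returned = dbResult.ToList();  // second enumeration, new objects
        Console.WriteLine(returned[0].Property);                  // prints "original"
        Console.WriteLine(ReferenceEquals(data[0], returned[0])); // prints "False"
    }
}
```

A ReferenceEquals check like the one above is a quick way to confirm during debugging whether two sequences share their objects.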

If we had our data access layer code like the following, then the original usage would’ve worked fine:

public IEnumerable<T> GetDbData<T>()
{
	var result = client.Query<T>(_ => true).ToList();
	return result.Select(r => r);
}

The above code will store all the results into a local list and then return an IEnumerable<T> by using that list. Any change to an object in the sequence will persist, because ultimately the object in the list is being modified and every enumeration of the sequence reads from the same list.
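As a rough check of that claim, the sketch below materializes a simulated query once and then enumerates the result twice. As before, Query() is a hypothetical stand-in for the client, not a real SDK call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public string Property { get; set; } = "";
}

public static class MaterializedDemo
{
    // Hypothetical stand-in for the database client (an assumption): new
    // objects on every enumeration.
    public static IEnumerable<Data> Query()
    {
        yield return new Data { Property = "original" };
    }

    // Mirrors the revised data access layer: the query is enumerated once,
    // here, and the lazy Select then reads from the in-memory list.
    public static IEnumerable<Data> GetDbData()
    {
        var result = Query().ToList();
        return result.Select(r => r);
    }

    public static void Main()
    {
        var dbResult = GetDbData();
        var data = dbResult.ToList();
        data[0].Property = "updated";

        Console.WriteLine(dbResult.First().Property);                  // prints "updated"
        Console.WriteLine(ReferenceEquals(data[0], dbResult.First())); // prints "True"
    }
}
```

The trade-off is that the query now runs as soon as GetDbData() is called, and the whole result set is held in memory for as long as the returned sequence is referenced.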

Closing Words