Entity Framework Cache Busting

The DbContext in Entity Framework 6 automatically caches data that it retrieves from your database. This is useful, but sometimes data changes outside your context (perhaps by another user) and you end up with stale data. How can you force Entity Framework to reload the updated data from the database, and when should you do this?

There are several ways to manage this, depending on which version of Entity Framework you are using and what type of application you are writing.

  1. Disable Tracking using AsNoTracking()
  2. Throw away the DbContext and create a new one
  3. Use an ObjectQuery instead of a DbQuery and set MergeOption
  4. Refresh the Entities
  5. Detach the Entities
  6. Call GetDatabaseValues to get the updated values for a single Entity
  7. Use the stale data

Code related to this post can be found at https://github.com/codethug/EFCaching

Problem: Caching in Entity Framework 6

Before we get to the solution, we’ll take a look at the problem. How does Entity Framework handle caching in general? Let’s get a list of customers from the database.

var context = new MyDbContext();
var customers = context.Customers.Where(c => c.State == "VA")
    .Take(2).ToList();

If we enumerate this customers object, we’ll see these results:

ID    Name  State
---   ----  -----
850   Sam   VA
851   Sue   VA

And if we use SQL Profiler, we can see that the LINQ query is translated by EF to this SQL query, which is sent to the database:

SELECT TOP (2)
    [Extent1].[CustomerId] AS [CustomerId],
    [Extent1].[Name] AS [Name],
    [Extent1].[State] AS [State],
    -- a bunch of other columns --
FROM [dbo].[Customers] AS [Extent1]
WHERE [Extent1].[State] = 'VA'

That’s not surprising. Next, we’ll update an existing record outside of the context we created, as if another user had updated this record. It turns out we had one of our customers’ names wrong, so the other user corrected it:

UPDATE Customers SET Name = 'Susan' WHERE CustomerID = 851

We’ll try to get the updated data, reusing the context from before:

customers = context.Customers.Where(c => c.State == "VA").Take(2).ToList();

If we check out SQL Profiler, sure enough, the exact same SQL Query as above is sent to the database again - and that SQL query returns the name ‘Susan’ for CustomerID 851. So far, so good. The updated data from the database is returned.

However, check out what happens if we then enumerate our customers object:

ID    Name  State
---   ----  -----
850   Sam   VA
851   Sue   VA

The name is still the old name - ‘Sue’. Why am I not seeing the name ‘Susan’? What’s going on?

It turns out that Entity Framework uses the Identity Map pattern. This means that once an entity with a given key is loaded into the context’s cache, later queries do not, by default, replace it with a newer copy for as long as that context exists. So when we hit the database a second time to get the customers, Entity Framework retrieved the updated 851 record from the database, but because customer 851 was already loaded in the context, it ignored the newer record.
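To make that behaviour concrete, here is a minimal, self-contained sketch of an identity map. It does not use Entity Framework at all: the IdentityMap class and its Track method are hypothetical names invented for this illustration, and the real change tracker is considerably more involved. The sketch only shows the rule that matters here, namely that the first instance registered for a key wins.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical, simplified stand-in for a context's cache.
// This is an illustration of the pattern, not EF's implementation.
public class IdentityMap
{
    private readonly Dictionary<int, Customer> _tracked =
        new Dictionary<int, Customer>();

    // Called for each row that a query materializes. If an entity with
    // the same key is already tracked, the tracked instance is returned
    // and the freshly read row is discarded.
    public Customer Track(Customer fromDatabase)
    {
        Customer existing;
        if (_tracked.TryGetValue(fromDatabase.Id, out existing))
        {
            return existing;
        }

        _tracked[fromDatabase.Id] = fromDatabase;
        return fromDatabase;
    }
}

public static class Program
{
    public static void Main()
    {
        var map = new IdentityMap();

        // First query: customer 851 is read as "Sue" and becomes tracked
        var first = map.Track(new Customer { Id = 851, Name = "Sue" });

        // Second query: the database now says "Susan", but 851 is already tracked
        var second = map.Track(new Customer { Id = 851, Name = "Susan" });

        Console.WriteLine(second.Name);                    // prints "Sue"
        Console.WriteLine(ReferenceEquals(first, second)); // prints "True"
    }
}
```

The second call still had to read the newer row, just as the second SQL query above still ran, but the caller only ever sees the instance that was tracked first.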

That’s the problem. What can we do about this? We have several options.

1. Disable Tracking using AsNoTracking()

You can instruct Entity Framework to bypass the cache when making a query by using the AsNoTracking() method:

var customers = context.Customers.AsNoTracking()
    .Where(c => c.State == "VA").Take(2).ToList();

This will cause Entity Framework to retrieve the data from the database, map it to the appropriate C# classes, and return a collection of them to you. Nothing is added to the context’s cache, and nothing is read from the cache.

This can be a good option if these entities are read-only. If you do end up editing entities retrieved with AsNoTracking() and need to save them, you can attach the edited entities to the context and mark them as modified.
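As a rough sketch of that attach-and-save step, the helper below reads a customer without tracking, edits it, and then attaches it so that SaveChanges() writes the edit. The Customer and MyDbContext classes mirror the ones used in this post, but their exact shape here is an assumption, and CustomerEditor.RenameCustomer is a hypothetical helper name. Running it requires the EntityFramework 6 package and a database for the context to connect to.

```csharp
using System.Data.Entity;
using System.Linq;

// Assumed shapes for the entity and context used throughout this post
public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string State { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public static class CustomerEditor
{
    // Hypothetical helper, for illustration only
    public static void RenameCustomer(int id, string newName)
    {
        using (var context = new MyDbContext())
        {
            // Read without tracking: the context keeps no reference to this entity
            var customer = context.Customers
                .AsNoTracking()
                .First(c => c.ID == id);

            customer.Name = newName;

            // Attach() starts tracking the entity as Unchanged, so the edit
            // made above is not detected automatically. Setting the state to
            // Modified tells EF to send every property in the UPDATE.
            context.Customers.Attach(customer);
            context.Entry(customer).State = EntityState.Modified;

            context.SaveChanges();
        }
    }
}
```

Because the whole entity is marked as modified, the UPDATE includes every column, not just the one that changed. That is usually acceptable for small entities, but it can overwrite another user's change to a different column.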

More Information: http://www.c-sharpcorner.com/UploadFile/ff2f08/entity-framework-and-asnotracking/

2. Throw away the DbContext and create a new one

We saw the problem above because we reused the context between the two queries. If we throw away the first context and use a new context for the second query, then nothing is cached and we get the updated data on the second query.

using (var context1 = new MyDbContext())
{
    var customers = context1.Customers.Where(c => c.State == "VA")
        .Take(2).ToList();
}

// Update a customer between these two calls

using (var context2 = new MyDbContext())
{
    var updatedCustomers = context2.Customers.Where(c => c.State == "VA")
        .Take(2).ToList();
}

If the data in the database is updated between the two queries, the second query will return the updated data, because context2 has a brand new, empty cache.

How long should you hang on to a context before you throw it away? Here is Microsoft’s recommendation on how long to hang on to a context that you create:

When working with Web applications, use a context instance per request.
When working with Windows Presentation Foundation (WPF) or Windows Forms, use a context instance per form. This lets you use change-tracking functionality that context provides.
https://msdn.microsoft.com/en-gb/data/jj729737.aspx

3. Use an ObjectQuery instead of a DbQuery and set MergeOption

You probably have a DbContext that has DbSet properties, like this:

public class MyDbContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

When you then query for some customers, you’ll receive back an IQueryable<Customer> object (an object with a type of DbQuery<Customer>).

var context = new MyDbContext();
// virginiaCustomers is a DbQuery<Customer>
// (which is an IQueryable<Customer>)
var virginiaCustomers = context.Customers.Where(c => c.State == "VA");

However, if we create an ObjectQuery instead of a DbQuery, we can set the MergeOption property on the ObjectQuery. This can be set to one of 4 enumerated values: AppendOnly, NoTracking, OverwriteChanges, and PreserveChanges. Using NoTracking does effectively the same thing as the AsNoTracking() extension method we just discussed. In our case, we want to use OverwriteChanges, which does the following:

OverwriteChanges: Objects that do not exist in the object context are attached to the context. If an object is already in the context, the current and original values of object’s properties in the entry are overwritten with data source values. The state of the object’s entry is set to Unchanged, no properties are marked as modified.
https://msdn.microsoft.com/en-us/library/system.data.objects.mergeoption%28v=vs.110%29.aspx

// Requires System.Data.Entity.Infrastructure (IObjectContextAdapter) and,
// in EF6, System.Data.Entity.Core.Objects (ObjectQuery, MergeOption)
using (var context = new MyDbContext())
{
    // Get ObjectContext from DbContext
    var objectContext = ((IObjectContextAdapter)context).ObjectContext;

    // Construct an ObjectQuery
    var customers = objectContext.CreateObjectSet<Customer>()
        .Where(c => c.State == "VA").Take(2);

    // Set the MergeOption property
    ((ObjectQuery<Customer>)customers).MergeOption =
        MergeOption.OverwriteChanges;

    // Do something with your data
    foreach (var customer in customers) { ... }
}

First, we retrieve the ObjectContext from the DbContext. DbContext is a wrapper around ObjectContext that makes many things easier, and allows for a Code First EF approach. However, sometimes you need to get at the underlying ObjectContext to do things that DbContext doesn’t support. This is one of those times.

Once we have the ObjectContext, we create an ObjectSet<Customer>, off which we can build a LINQ query. Once we’ve written our query, we have an ObjectQuery<Customer>, but the LINQ query returns it as an IQueryable<Customer>. Once we cast it to an ObjectQuery<Customer>, we can access the MergeOption property and set it to MergeOption.OverwriteChanges.

That will allow you to force EF to update the data in the cache with data from the database using a specific LINQ query.

4. Refresh the Entities

Another way to get Entity Framework to update entities in the cache is to reload them from the database. The first way to do this is to call Reload() on each entity’s entry, which issues one SQL command for each entity that you want to refresh.

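// Use any filter that LINQ to Entities can translate. ToList() runs the
// query up front, before Reload() issues its own queries
var customersToRefresh = context.Customers
    .Where(c => c.State == "VA").ToList();
foreach (var customer in customersToRefresh)
{
    context.Entry(customer).Reload();
}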

This will suffice if you only have a couple of entities to refresh, but if you have a large collection to reload, you’ll hit the database for each and every entity that you reload, which can be a serious performance problem.

You can also reload data using the Refresh method on the ObjectContext.

var objectContext = ((IObjectContextAdapter) context).ObjectContext;
var objectsToRefresh = new Customer[] { jim, sue };
objectContext.Refresh(RefreshMode.StoreWins, objectsToRefresh);

This is better than calling Reload() on each entity, because it batches all of the lookups into a single query. However, it still has to look up every entity by its ID, which can also cause performance problems if you’re refreshing more than a few. Here is what the SQL looks like for Refresh():

SELECT
    [Extent1].[ID] AS [ID],
    [Extent1].[Name] AS [Name],
    [Extent1].[State] AS [State]
FROM [dbo].[Customers] AS [Extent1]
WHERE [Extent1].[ID] IN (124,123)

5. Detach the Entities

You can explicitly remove an entity from the cache. Then, the next time you retrieve it from the database, the data from the database will be loaded into the cache.

// Load sue; if she is already tracked, this returns the cached entity
var sue = context.Customers.Where(c => c.ID == 15).First();
// Detach sue, removing her entity from the cache
context.Entry(sue).State = EntityState.Detached;
// Reload the data from the database into the cache
sue = context.Customers.Where(c => c.ID == 15).First();

6. Call GetDatabaseValues to get the updated values for a single Entity

Another way to bypass the DbContext cache is to call GetDatabaseValues(). This will not affect the cache at all, but will query the database for the latest data for a particular entity and return it as a DbPropertyValues object, a dictionary-like collection keyed by property name.

// Retrieve stale entity from cache
var sue = context1.Customers.First(c => c.ID == 15);
Console.WriteLine(sue.Name); // outputs "Sue", the stale data in the cache

// Use GetDatabaseValues to get the current
// database values (ignoring the cache)
DbPropertyValues sueDbValues = context1.Entry(sue)
    .GetDatabaseValues();
// outputs "Susan", the latest data from the database
Console.WriteLine(sueDbValues["Name"]);

For more information, see https://msdn.microsoft.com/en-us/data/jj592677.aspx.

7. Use the stale data

Sometimes, the correct approach is to use the stale data in the cache. You don’t always need the latest, most accurate data. Sometimes old data is good enough.

Caution: Deleted Data

If another user has deleted data in the database and you have entities for those deleted items, the entities will not automatically disappear or be set to null. They won’t even automatically have their Entity State set to EntityState.Deleted. In order to get the news that these entities have been deleted, you must use one of the methods in this post to get the updated data and notice that the entity no longer exists in the result set.
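One way to check a single tracked entity for deletion is GetDatabaseValues(), which returns null when the entity no longer exists in the database. The sketch below wraps that check in a helper; StaleEntityChecks and DetachIfDeleted are hypothetical names made up for this example, and the code assumes EF6 (the System.Data.Entity namespace).

```csharp
using System.Data.Entity;

public static class StaleEntityChecks
{
    // Hypothetical helper: returns true (and stops tracking the entity)
    // if the row behind a tracked entity has been deleted by someone else.
    public static bool DetachIfDeleted<TEntity>(DbContext context, TEntity entity)
        where TEntity : class
    {
        var entry = context.Entry(entity);

        // Queries the database by key. The result is null when the
        // entity does not currently exist in the database.
        var databaseValues = entry.GetDatabaseValues();
        if (databaseValues != null)
        {
            return false;
        }

        // The row is gone, so remove the stale entity from the cache
        entry.State = EntityState.Detached;
        return true;
    }
}
```

A call such as StaleEntityChecks.DetachIfDeleted(context, sue) costs one database round trip per entity, so it suits checking a handful of entities rather than a large collection.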

A similar caution applies to related data. Suppose you retrieve Customer entities from your database and also Include related Invoice entities, and another user then updates the related data, making your Invoice entities stale. If you use the strategies in this post to force the context to update its cache with the latest Customer entities, it will not automatically update the related Invoice entities.

// Retrieve a customer with related invoices
// (the lambda form of Include requires "using System.Data.Entity;")
var customer = context.Customers
    .Where(c => c.State == "VA")
    .Include(c => c.Invoices)
    .First();

// Some other user updates one of the Invoices you retrieved.
// Your invoice data is now stale

// Force the context to reload your customer the next time you query for it
context.Entry(customer).State = EntityState.Detached;

// Reload the customer. You'll still have stale Invoice data
customer = context.Customers
    .Where(c => c.State == "VA")
    .Include(c => c.Invoices)
    .First();

If you want to get the latest Invoice data, you’ll need to clear the cache for the Invoices before you query the context again.

// Detach your invoices from the context, forcing them
// to be reloaded on the next query. ToList() takes a copy first,
// because detaching an invoice can remove it from customer.Invoices
// while the collection is being enumerated
foreach (var invoice in customer.Invoices.ToList())
{
    context.Entry(invoice).State = EntityState.Detached;
}

// Reload the customer. This time you'll see the updated Invoice entities
customer = context.Customers
    .Where(c => c.State == "VA")
    .Include(c => c.Invoices)
    .First();

Good News: Added Data

The good news is that if another user adds data to the database, Entity Framework’s caching mechanism allows it to pick up on those changes, even if you don’t use any of the ideas outlined in this post. If you have a context where you query the database, another user adds a record, and then you reuse your original context to query the database again, the second query will pick up any new records that match the query.

using (var context1 = new EFTestContext())
{
    // Get count of customers in Virginia
    var numInVABefore = context1.Customers.Where(c => c.State == "VA")
        .ToList().Count;

    // Add new customer using other context (simulating a second user)
    int jamesID;
    using (var context2 = new EFTestContext())
    {
        var james = context2.AddCustomer("James", "VA");
        context2.SaveChanges();
        jamesID = james.ID;
    }

    // Verify new record exists in database
    using (var context3 = new EFTestContext())
    {
        var jamesExists = context3.Customers
            .FirstOrDefault(c => c.ID == jamesID);
        jamesExists.Should().NotBeNull();
    }

    // And our original context shows the new record when queried
    var numInVAAfter = context1.Customers.Where(c => c.State == "VA")
        .ToList().Count;
    numInVAAfter.Should().Be(numInVABefore + 1);
}

References: