Monday, July 06, 2009

LINQ, Persistence Ignorance, and Testing with Data Context/Entity Framework

One thing I really try to avoid when writing tests is testing things that just aren't going to break, or that are at least of no concern to the application code I wrote. Database IO certainly fits this description, and yet I have found myself testing it as a consequence of wanting to test the LINQ-based statements embedded in my data access code.

Other people have encountered and attempted to solve the same problem, and a few hours of reading their work showed that removing database IO from my tests, and improving the design of the data access layer along the way, requires implementing some persistence-ignorance pattern such as the Repository.

In short, a repository allows business code to ask questions of the data layer without knowing anything more than the interfaces of the data-storage objects. Want all the customers with outstanding invoices? Get a list of ICustomer objects back from the CustomerRepository.CustomersWithOutstandingInvoices method. As a business layer developer I have no idea what you did to return this list of customers, and I don't care. You don't expose your DataContext (or other DAL implementation) to me, and I don't write LINQ directly against the Customers table, because the management of the DataContext and the particulars of locating outstanding invoices may be too wonderful for me to know. It's pretty standard separation of concerns, and very simple to implement. There are, however, a few issues unique to a LINQ-centric world:

  1. How can I take advantage of delayed execution?
  2. How can LINQ best be supported outside the Repository?
  3. How can I mock up data-centric tests when I don't have a database to execute against?
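The repository idea described above can be sketched in C# as a minimal, in-memory example. ICustomer's members (Name, OutstandingInvoiceCount), the ICustomerRepository interface, and the list-backed CustomerRepository are all hypothetical stand-ins for an implementation backed by a DataContext or ObjectContext; the point is only that business code sees interfaces and nothing else.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity interface: the only view business code gets.
public interface ICustomer
{
    string Name { get; }
    int OutstandingInvoiceCount { get; }
}

// Hypothetical repository contract exposed to the business layer.
public interface ICustomerRepository
{
    IList<ICustomer> CustomersWithOutstandingInvoices();
}

// Concrete entity; in a real DAL this would be an EF or LINQ to SQL entity.
public class Customer : ICustomer
{
    public string Name { get; set; }
    public int OutstandingInvoiceCount { get; set; }
}

// Stand-in for a context-backed repository. The data source is private,
// so callers cannot write LINQ against the "Customers table" directly.
public class CustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers;

    public CustomerRepository(IEnumerable<Customer> customers)
    {
        _customers = new List<Customer>(customers);
    }

    public IList<ICustomer> CustomersWithOutstandingInvoices()
    {
        // How "outstanding" is decided stays inside the repository.
        return _customers
            .Where(c => c.OutstandingInvoiceCount > 0)
            .Cast<ICustomer>()
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        ICustomerRepository repository = new CustomerRepository(new[]
        {
            new Customer { Name = "Contoso", OutstandingInvoiceCount = 2 },
            new Customer { Name = "Fabrikam", OutstandingInvoiceCount = 0 }
        });

        // Business code asks a question and knows nothing about the DAL.
        foreach (ICustomer customer in repository.CustomersWithOutstandingInvoices())
        {
            Console.WriteLine(customer.Name);
        }
    }
}
```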

If you hide LINQ execution behind a repository, you don't really want to hand out ObjectQuery objects or Table<T> references, because doing so ties callers right back to the implementation you were hiding. That being the case, you really need to execute the queries before they leave the repository*. Performance should then come from good caching techniques, and from limiting the scope of queries through tightly defined access to the data.
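As a minimal sketch of executing the query before it leaves the repository, the hypothetical OrderRepository below calls ToList() inside its method, so callers receive a plain IList<Order> rather than an ObjectQuery or Table<T>. In-memory data stands in for the database, and the predicate counter exists only to show when the query actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this illustration.
public class Order
{
    public int Id { get; set; }
    public bool IsPaid { get; set; }
}

public class OrderRepository
{
    private readonly List<Order> _orders;

    public OrderRepository(IEnumerable<Order> orders)
    {
        _orders = new List<Order>(orders);
    }

    // Counts predicate evaluations so we can observe when the query executes.
    public int PredicateCalls { get; private set; }

    // ToList() forces execution inside the repository, so callers get a
    // plain list and never see a provider-specific query object.
    public IList<Order> UnpaidOrders()
    {
        return _orders
            .Where(o => { PredicateCalls++; return !o.IsPaid; })
            .ToList();
    }

    public void Add(Order order)
    {
        _orders.Add(order);
    }
}

public static class Program
{
    public static void Main()
    {
        var repository = new OrderRepository(new[]
        {
            new Order { Id = 1, IsPaid = false },
            new Order { Id = 2, IsPaid = true }
        });

        IList<Order> unpaid = repository.UnpaidOrders();
        Console.WriteLine(repository.PredicateCalls); // 2: already executed

        repository.Add(new Order { Id = 3, IsPaid = false });
        Console.WriteLine(unpaid.Count); // 1: the caller holds a snapshot
    }
}
```

The trade-off is that any further LINQ the caller applies runs in memory against the returned list; it can no longer be pushed down to the data source.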

*update* It isn't really necessary to disconnect the query in a way that breaks delayed execution. You can keep all the benefits and still hand back an IQueryable<T>. However, it is then up to developer discipline not to manipulate the query further in ways that rely on the underlying data provider's specific types (for example, casting it back to ObjectQuery<T> to call Include).
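A minimal sketch of the IQueryable<T> alternative: the hypothetical ProductRepository returns IQueryable<Product>, here produced by AsQueryable() over an in-memory list rather than by an EF ObjectQuery<T>. The caller composes another Where clause, and nothing executes until Count() is called, so an item added after composition is still seen.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this illustration.
public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class ProductRepository
{
    private readonly List<Product> _products = new List<Product>();

    public void Add(Product product)
    {
        _products.Add(product);
    }

    // Hands back IQueryable<T> without executing anything. An EF-backed
    // version would return an ObjectQuery<T>; here AsQueryable() wraps
    // the in-memory list instead.
    public IQueryable<Product> All()
    {
        return _products.AsQueryable();
    }
}

public static class Program
{
    public static void Main()
    {
        var repository = new ProductRepository();
        repository.Add(new Product { Name = "Widget", Price = 5m });

        // The caller composes further LINQ; nothing has executed yet.
        IQueryable<Product> cheap = repository.All().Where(p => p.Price < 10m);

        // Added after the query was composed but before it is enumerated.
        repository.Add(new Product { Name = "Gadget", Price = 7m });

        // Execution happens here, so both products are counted.
        Console.WriteLine(cheap.Count()); // 2
    }
}
```

Against EF, the same composed query would be translated by the EF provider, which is why callers should stay on the IQueryable<T> surface rather than casting to provider-specific types.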

LINQ can work on any IEnumerable<T> or IQueryable<T>, so handing back objects that can be used in LINQ query syntax outside the context is as easy as returning List<T> or IQueryable<T> references. Referring back to issue 1 above: if the repository has already executed the query (returning a List<T>, say), these post-repository queries run in memory, disconnected from the database.
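For example, a List<T> handed back from a repository can be queried with ordinary LINQ query syntax. In this sketch the static InvoiceRepository and its OpenInvoices() method are hypothetical stand-ins for a repository method that has already executed its query; the grouping below runs as LINQ to Objects, entirely in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this illustration.
public class Invoice
{
    public string Customer { get; set; }
    public decimal Amount { get; set; }
}

public static class InvoiceRepository
{
    // Stand-in for a repository method whose query has already executed.
    public static List<Invoice> OpenInvoices()
    {
        return new List<Invoice>
        {
            new Invoice { Customer = "Contoso", Amount = 100m },
            new Invoice { Customer = "Fabrikam", Amount = 40m },
            new Invoice { Customer = "Contoso", Amount = 25m }
        };
    }
}

public static class Program
{
    public static void Main()
    {
        List<Invoice> invoices = InvoiceRepository.OpenInvoices();

        // Plain LINQ query syntax over the returned List<T>; no database
        // or data context is involved at this point.
        var totals =
            from i in invoices
            group i by i.Customer into g
            orderby g.Key
            select new { Customer = g.Key, Total = g.Sum(x => x.Amount) };

        foreach (var t in totals)
        {
            Console.WriteLine("{0}: {1}", t.Customer, t.Total);
        }
    }
}
```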

When it comes to testing, we played around with a number of different mocks and tricks to fool LINQ into executing disconnected: fake data contexts, entities without database connection strings, overrides and events that modify queries, and so on. It turns out that the simplest way (in my view) is to use the repository to remove the database from the equation; that was, after all, the goal of this whole exercise. However, rather than expose a repository directly to business logic code, I prefer to put what I will call a Provider in front of the repository. This ‘Provider’ is really just another sort of repository, as some envision the concept, except that it contains all the LINQ queries and entity-specific methods. It takes an IRepository implementation that exposes only simple CRUD operations, and all knowledge of the DAL implementation stays inside that simple repository. Why?

  1. It’s easier to create a general purpose IRepository interface for use in any project.
  2. With a simple IRepository interface you can create a simple general-purpose FakeRepository for use in testing.
  3. Once your LINQ statements depend on a single interface, you can inject that dependency and thereby replace the source of data during testing. You can test your LINQ query logic without going to the database.
  4. The logic used to obtain entity objects in methods like “CustomersWithOutstandingInvoices” still knows nothing about the DAL, pushing persistence ignorance as far up as we can.
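A general-purpose IRepository and FakeRepository along these lines might look like the sketch below. The member names (Get<T>(), Add<T>(), Remove<T>(), SaveChanges()) and the Customer entity are assumptions for illustration, not the API of the downloadable assembly; only Get<T>() appears in this post. The fake keeps one List<T> per entity type and exposes it through AsQueryable(), so LINQ written against Get<T>() runs unchanged in memory.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical general-purpose contract: CRUD only, no entity-specific queries.
public interface IRepository
{
    IQueryable<T> Get<T>() where T : class;
    void Add<T>(T entity) where T : class;
    void Remove<T>(T entity) where T : class;
    void SaveChanges();
}

// In-memory fake: one List<T> per entity type, keyed by CLR type.
public class FakeRepository : IRepository
{
    private readonly Dictionary<Type, IList> _sets = new Dictionary<Type, IList>();

    // Returns the list for T, creating it on first use.
    private List<T> SetFor<T>() where T : class
    {
        IList set;
        if (!_sets.TryGetValue(typeof(T), out set))
        {
            set = new List<T>();
            _sets[typeof(T)] = set;
        }
        return (List<T>)set;
    }

    public IQueryable<T> Get<T>() where T : class
    {
        return SetFor<T>().AsQueryable();
    }

    public void Add<T>(T entity) where T : class
    {
        SetFor<T>().Add(entity);
    }

    public void Remove<T>(T entity) where T : class
    {
        SetFor<T>().Remove(entity);
    }

    public void SaveChanges()
    {
        // Nothing to flush: the lists are the data store.
    }
}

// Hypothetical entity used only for this illustration.
public class Customer
{
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        IRepository repository = new FakeRepository();
        repository.Add(new Customer { Name = "Contoso" });
        repository.Add(new Customer { Name = "Fabrikam" });

        // The same LINQ you would write against an EF-backed repository.
        var names = repository.Get<Customer>()
            .Where(c => c.Name.StartsWith("C"))
            .Select(c => c.Name)
            .ToList();

        Console.WriteLine(string.Join(", ", names.ToArray()));
    }
}
```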

How does this all come together?

  1. You create a concrete repository that has a concrete ObjectContext or DataContext.
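  2. With EF ObjectContexts (my preference), you implement the Get<T>() method on your repository using CreateQuery:
    // Assumes each entity set is named after its CLR type and that the
    // context's DefaultContainerName is set. ObjectQuery<T> already
    // implements IQueryable<T>, so no cast is needed.
    public IQueryable<T> Get<T>() where T : class
    {
        return _context.CreateQuery<T>(typeof(T).Name);
    }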


  3. You create a concrete ‘Provider’ that takes an injectable IRepository object in its constructor.

  4. Write your Provider's LINQ queries against the repository's Get<T>() method.

  5. Write tests against your Provider, injecting a FakeRepository that uses lists as its data source. Add objects through your Provider (or directly to the FakeRepository) in test setup, and confirm proper retrieval/manipulation in your tests.
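The steps above can be sketched end to end without a database. Everything here is hypothetical: a trimmed IRepository (only Get<T>() and Add<T>()), a list-backed FakeRepository, Customer and Invoice entities, and a CustomerProvider whose CustomersWithOutstandingInvoices() query is written only against Get<T>(). Main plays the role of a test: it seeds the fake in setup and checks what the Provider returns. A concrete EF-backed repository would implement the same interface, for instance with the CreateQuery-based Get<T>() from step 2.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Trimmed, hypothetical repository contract: just enough for this example.
public interface IRepository
{
    IQueryable<T> Get<T>() where T : class;
    void Add<T>(T entity) where T : class;
}

// List-backed fake that tests inject in place of the real repository.
public class FakeRepository : IRepository
{
    private readonly Dictionary<Type, IList> _sets = new Dictionary<Type, IList>();

    private List<T> SetFor<T>() where T : class
    {
        IList set;
        if (!_sets.TryGetValue(typeof(T), out set))
        {
            set = new List<T>();
            _sets[typeof(T)] = set;
        }
        return (List<T>)set;
    }

    public IQueryable<T> Get<T>() where T : class { return SetFor<T>().AsQueryable(); }
    public void Add<T>(T entity) where T : class { SetFor<T>().Add(entity); }
}

// Hypothetical entities.
public class Customer { public int Id { get; set; } public string Name { get; set; } }
public class Invoice { public int CustomerId { get; set; } public bool IsPaid { get; set; } }

// Provider: takes an injected IRepository and writes its LINQ only against
// Get<T>(), never against a concrete DataContext or ObjectContext.
public class CustomerProvider
{
    private readonly IRepository _repository;

    public CustomerProvider(IRepository repository)
    {
        if (repository == null) throw new ArgumentNullException("repository");
        _repository = repository;
    }

    public IList<Customer> CustomersWithOutstandingInvoices()
    {
        var withUnpaid =
            (from c in _repository.Get<Customer>()
             join i in _repository.Get<Invoice>() on c.Id equals i.CustomerId
             where !i.IsPaid
             select c).Distinct();
        return withUnpaid.OrderBy(c => c.Name).ToList();
    }
}

// Plays the role of a test: seed the fake, then check the Provider.
public static class Program
{
    public static void Main()
    {
        var fake = new FakeRepository();
        fake.Add(new Customer { Id = 1, Name = "Contoso" });
        fake.Add(new Customer { Id = 2, Name = "Fabrikam" });
        fake.Add(new Customer { Id = 3, Name = "Adatum" });
        fake.Add(new Invoice { CustomerId = 1, IsPaid = false });
        fake.Add(new Invoice { CustomerId = 1, IsPaid = false });
        fake.Add(new Invoice { CustomerId = 2, IsPaid = true });
        fake.Add(new Invoice { CustomerId = 3, IsPaid = false });

        var provider = new CustomerProvider(fake);
        foreach (Customer c in provider.CustomersWithOutstandingInvoices())
        {
            Console.WriteLine(c.Name); // Adatum, then Contoso
        }
    }
}
```

The Distinct() call keeps a customer with several unpaid invoices from appearing twice. In a real project the body of Main would live in a test method of whichever unit-testing framework you use.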



You can download our assembly with an IRepository/FakeRepository implementation from:

http://atgitesting.codeplex.com/


Inspiration and understanding from:

The Repository Pattern Explained

Andrew Peters - Fixing Leaky Repository Abstractions with LINQ

Diego Vega - Unit Testing Your Entity Framework Domain Classes

Rick Strahl - Dynamic Queries and LINQ Expressions