9/10/2010 Update #3: It appears that MS has deleted the bug on Microsoft Connect. Here is a cached version indexed from Google where you can see it has 34 upvotes, and 22 users saying they can reproduce. I blogged about this in January 2010, and the bug was opened on 3/17/2010. MS didn’t respond until 6/29/2010, when they said:
Sorry for not responding here. We are aware of performance issues with TPT hierarchies. We are currently investigating solutions and expect to make improvements in this area in a future release. This is definitely at the top of our query/perf optimizations list. I know this isn’t a great answer, and is not something that helps you today. Will keep you posted as we find out more, however. Hope this helps. Entity Framework TeamApparently “keep you posted” means “we’ll delete your posts”. I don’t want to jump to any conclusions, but it is certainly very odd. Especially when you consider that they have mentioned the problem on their own blog (though they don’t nearly describe how bad it actually is).
3/17/2010 Update #2: I have submitted this problem as a bug on Microsoft Connect. If you’ve been bitten by this bug, vote it up (click the “I can too” link indicating that you can reproduce the bug), and leave some helpful comments.
3/17/2010 Update: I’ve put together a quick sample program that illustrates this problem, if you want to try it for yourself.
Download EFInheritanceTest.zip.
The data project has the included SQL scripts to create a simple EFTest database, with a base class table, some sub class tables, and a bit of dummy data.
Download EFInheritanceTest.zip.
The data project has the included SQL scripts to create a simple EFTest database, with a base class table, some sub class tables, and a bit of dummy data.
I’ll keep the intro short and sweet. Everyone says you’ll eventually get burned by Microsoft data access strategies, because they eventually get abandoned. Well, my turn finally came. And it’s not that Entity Framework is being abandoned, it’s just that it DOESN’T WORK.
</intro>
Regarding inheritance mapping, EF advertises that it offers you 3 solutions:
- TPT: Table per type
- TPH: Table per hierarchy
- TPC: Table per concrete class
Let’s say I have a line of business application where users answer a bunch of Yes/No questions, and depending on their answers, fill out a series of forms, each form having a unique set of fields that we want to capture end-user data for.
Q: Did you open a bank account?
Yes
Name of bank:
Date account opened:
Purpose of account:
etc.
EVERY form has a common set of fields as well: The end-user that the particular instance belongs to, the date it was created, whatever. Now, when I have a User object, and I want to see all the forms he’s filled out, I don’t want to have to access 20 different collections of the User object, one collection for each form type:
User user = User.GetByKey(20); user.NewAccountForms; user.TaxForms; user.ReplacedLimbsForms; user.CentralServicesForms;
I want an IList<FormBase> or something similar. This means I need to use inheritance, and each form needs to be derived from my FormBase class (actually named FormInstance in this case).
If you need this type of design, or think that your software at some point in the future will require this type of design, DO NOT use the Entity Framework. The Table-Per-Type inheritance is 100% unusable. If you feel that one of the other forms of inheritance suits you, well, you may be crazy, but just the same….tread lightly. (And please don’t have 50 different discriminator values in a table that’s 200 columns wide…the SQL gods will strike you down).
So what’s wrong with EF’s TPT inheritance? The SQL that EF generates. With some simple testing (2 or 3 subclasses) everything may appear to be fine. In fact, you may even be impressed that EF manages to fetch all the data you need in a single trip, indeed, in a single query. Wow, their SQL generation must be top notch… But once you start adding some more subclasses, surprise! Your application grinds to a halt. So how bad is it? I decided to find out definitively.
Results
The setup is very basic. Here’s the Linq query I’m executing against my model:var q = from insts in context.FormInstances select insts; List<FormInstances> instances = q.ToList();Very basic. Give me the FormInstances. FormInstance is the base class. I started with just a single derived class, ran my test 500 times, averaging the time it took to execute my simple query. Then I added a 2nd derived class, a 3rd one, etc etc. The base class has less than 10 fields. Each derived class (to keep things consistent) is exactly the same, with 3 simple fields. Also, the tables are empty. Yes, empty. No data! So the performance numbers here strictly represent the time it takes EF to generate its abhorrent SQL, and the amount of time it takes SQL Server to grind it out.
Most of these are averaged over 500 executions. Some of the later ones were fewer, because it started to take over an hour to execute.
| Num Derived Classes | Avg ms for query generation and execution | Lines of SQL Generated |
| 1 | 16 | 80 |
| 2 | 35 | 106 |
| 3 | 67 | 152 |
| 4 | 119 | 214 |
| 5 | 203 | 292 |
| 6 | 340 | 386 |
| 7 | 523 | 496 |
| 8 | 784 | 622 |
| 9 | 1137 | 764 |
| 10 | 1608 | 922 |
| 11 | 2198 | 1096 |
| 12 | 2958 | 1286 |
| 13 | 3900 | 1492 |
| 14 | 4828 | 1686 |
| 15 | 5962 | 1892 |
| 20 | 17412 | 3302 |
| 25 | 41206 | 5112 |
| 30 | 108807 | 7862 |
Here’s what it looks like graphed:
Ouch! That sure doesn’t look linear folks… And guess what, my “I got burned” experience doesn’t have 30 form tables. It has 89. Based on the trend outlined above, it would take 97 minutes to generate and execute the SQL to select the data from those 89 tables. Funny, because 30 SQL statements querying those tables, even with WHERE fk IN(SELECT id FROM BaseTable) clauses on each one only take 132ms on the same computer…And it returns the same data.
Pragmatists vs Idealists
Joel Spolsky has some great words (scroll down to “The Two Forces at Microsoft”) regarding two developer camps at Microsoft; the Pragmatists, and the Idealists. Windows XP was a product of the Pragmatists; Vista, of the Idealists. Enough said. While the SQL generated by EF in cases such as these may call to the inner relational mathematician idealist in us, the result is impossible to read or easily understand, takes a long time to be generated, and even longer to execute. In other words, it’s 100% unusable.Now, as a matter of research, I have to create this same test in NHibernate, to see what the generated SQL looks like, and to see how well it performs. I’ll update this post when I have that data. Until then, if you decide to use EF’s inheritance “features”, here is a list of movies that are all 97 minutes long, so that your end users will have something to do while your site grates off your SQL Server’s face from the comfort of their browsers.
- The Lost Boys
- Christmas Vacation
- The Virgin Suicides
- Event Horizon
- eXistenZ
- Confidence
- Better Off Dead…
- Carrie
15 Comments
Great post! Unfortunately I’ve run into the same problem… But let’s try to clarify something here: Where exactly is the bottleneck? I don’t disagree with your numbers, but for the time under “Avg ms for query generation and execution”, how much of that is actually spent in the CLR generating the query and materializing the object, and how much is spent in SQL?
Using a CLR profiler and SQL Server Profiler, it seems to me as though the real bottleneck is in the time it takes for EF to generate the query and materialize the subsequent object(s).
Example for querying an entity with about 5 or 6 subclasses and several associations:
TSource System.Linq.Queryable.FirstOrDefault(IQueryable) – 2,303 ms
Number of SQL lines in query – 1084
Time taken to execute query in SQL – 111 ms
So it takes just over 2 seconds to bring back my object. Despite the fact that the generated SQL is an ugly, outer join filled mess, it really isn’t all that bad performance wise (relatively speaking, I can deal with 111 ms queries). But whatever EF is doing behind the scenes is causing the real slow down.
This is with EFv1 – so I wonder how EFv2 compares. I haven’t installed VS2010 yet because I haven’t seen any evidence that it fixes the problem.
@DK: I did do a little differentiation initially, splitting the measurements between the time it takes EF to generate SQL and the time it takes SQL Server to execute it, and in the simpler cases, SQL Server does execute the sql quickly enough. However, in some of my later cases, where you have 30+ subclasses, even SQL server takes 30+ seconds to execute your queries, even when no data is coming back. It’s just a mess all around. It’s not fixed in EFv2 or EFv4 either. Because the problem affected both EF sql generation, and SQL Server execution times, I just combined the times together to make a simpler case.
I’t not a problem of time. It’s a problem of sql complexity & scalability.
We use TPT with 3 derived tables, the execution plan is huge and contains joins with missing join predicade warning.
That kinda of select cannot be used in production environment. Ii is tottaly unmanageble.
We’ll try and select from the base table the old fashioned way (sqlcommand)
i was thinking about sql entities and the oftype operator, but it didnt help.
any other ideea?
OfType should actually produce pretty decent sequel if you can use it. The problem with OfType is that you can only select objects of a single subclass at a time, like this:
var q =
from subclass in context.BaseClass.OfType<SubClass>()
select subclass;
That would return all instances of SubClass, and the SQL would be reasonably simple. The problem is http://samscode.com/wp-admin/#comments-formthat that isn’t terribly useful. You can only fetch instances of 1 subclass at a time, and you have to know the type ahead of time (at compile time). You’re not really getting any of the advantages of inheritance or polymorphism.
And unfortunately no, I don’t really have any other ideas on how to fix this (aside from my initial advice: if you need to use inheritance, don’t use EF). I think I’ll head over to Microsoft Connect and try to submit a bug report regarding the performance and the complexity of the SQL in these scenarios. Maybe we can get them to acknowledge the problem.
I submitted this problem as a bug on Microsoft Connect.
https://connect.microsoft.com/data/feedback/details/542660/tpt-table-per-type-inheritance-performance
Go vote it up if you’ve been plagued by this problem.
I’m interested in this problem as i’m working on an EF project using inheritance so I ran your test with VS2010 RC (quad core, 10gb ram, 64 bit win7). I don’t have the time right now to investigate the results but thought i’d post them here for you. Feel free to email me.
——————————————————————————–
With the simple model (1 base class, 3 subclasses)….
——————————————————————————–
*** Query 1:
*** 64 lines of generated SQL, 6 selects, 2 joins, 3 unions
*** Generated by EF in 306 ms, Executed by SQL Server in 90 ms
Query results for all shapes: 3 shapes returned.
Color: Black
EFInheritanceTest.Data.Model.Triangle
Color: Green
EFInheritanceTest.Data.Model.Circle
Color: Red
EFInheritanceTest.Data.Model.Square
*** Query 2:
*** 52 lines of generated SQL, 5 selects, 2 joins, 3 unions
*** Generated by EF in 10 ms, Executed by SQL Server in 7 ms
Query results for the 1 shape:
Color: Red
EFInheritanceTest.Data.Model.Square
*** Query 3:
*** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
*** Generated by EF in 53 ms, Executed by SQL Server in 31 ms
Query results for the 1 square:
Color: Red
EFInheritanceTest.Data.Model.Square
——————————————————————————–
With the larger model (1 base class, 12 subclasses)….
——————————————————————————–
*** Query 1:
*** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
*** Generated by EF in 655 ms, Executed by SQL Server in 396 ms
Query results for all shapes: 3 shapes returned.
Color: Black
EFInheritanceTest.Data.Model_MoreShapes.Triangle
Color: Green
EFInheritanceTest.Data.Model_MoreShapes.Circle
Color: Red
EFInheritanceTest.Data.Model_MoreShapes.Square
*** Query 2:
*** 637 lines of generated SQL, 23 selects, 2 joins, 12 unions
*** Generated by EF in 192 ms, Executed by SQL Server in 187 ms
Query results for the 1 shape:
Color: Black
EFInheritanceTest.Data.Model_MoreShapes.Triangle
*** Query 3:
*** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
*** Generated by EF in 69 ms, Executed by SQL Server in 31 ms
Query results for the 1 square:
Color: Red
EFInheritanceTest.Data.Model_MoreShapes.Square
Thanks for posting those results.
*** Query 1:
*** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
*** Generated by EF in 655 ms, Executed by SQL Server in 396 ms
That’s just insane. This is exactly what I’m talking about. That’s for a simple base class, with 12 simple subclasses, and a query that only returns 3 records.
How many subclasses do you have in your EF project that’s using inheritance? Have you ran into any problems yet?
Hi Sam!
Thanks for sharing your analysis…
Definitely EF needs an urgent tuning if it wants to be considered a serious player in real world applications. That’s why in our team we call it: “lentity” which in Spanish resembles the word “lento” (slow).
Thanks,
Julio César.
Hi Sam,
More results about the EF test in my box:
Intel Core Duo 2 Ghz
3GB RAM
Windows XP Professional SP2
SQL Server 2005 SP2 Developer Edition
——————————————————————————–
With the simple model (1 base class, 3 subclasses)….
——————————————————————————–
*** Query 1:
*** 64 lines of generated SQL, 6 selects, 2 joins, 3 unions
*** Generated by EF in 418 ms, Executed by SQL Server in 149 ms
Query results for all shapes: 3 shapes returned.
Color: Black
EFInheritanceTest.Data.Model.Triangle
Color: Green
EFInheritanceTest.Data.Model.Circle
Color: Red
EFInheritanceTest.Data.Model.Square
*** Query 2:
*** 52 lines of generated SQL, 5 selects, 2 joins, 3 unions
*** Generated by EF in 11 ms, Executed by SQL Server in 12 ms
Query results for the 1 shape:
Color: Red
EFInheritanceTest.Data.Model.Square
*** Query 3:
*** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
*** Generated by EF in 58 ms, Executed by SQL Server in 36 ms
Query results for the 1 square:
Color: Red
EFInheritanceTest.Data.Model.Square
——————————————————————————–
With the larger model (1 base class, 12 subclasses)….
——————————————————————————–
*** Query 1:
*** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
*** Generated by EF in 846 ms, Executed by SQL Server in 510 ms
Query results for all shapes: 3 shapes returned.
Color: Black
EFInheritanceTest.Data.Model_MoreShapes.Triangle
Color: Green
EFInheritanceTest.Data.Model_MoreShapes.Circle
Color: Red
EFInheritanceTest.Data.Model_MoreShapes.Square
*** Query 2:
*** 637 lines of generated SQL, 23 selects, 2 joins, 12 unions
*** Generated by EF in 324 ms, Executed by SQL Server in 324 ms
Query results for the 1 shape:
Color: Black
EFInheritanceTest.Data.Model_MoreShapes.Triangle
*** Query 3:
*** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
*** Generated by EF in 79 ms, Executed by SQL Server in 37 ms
Query results for the 1 square:
Color: Red
EFInheritanceTest.Data.Model_MoreShapes.Square
Press any key to continue . . .
It’s look like MS deleted bug from MS Connect…
No bug report – no problem)
FYI, I have submitted a feedback item to the Improvement Program on Connect that provides details of this incident and suggests preventing Microsoft employees from deleting valid items without providing a reason, based on published guidelines on what is considered unacceptable in bug reports. Find it, and vote on it, here: https://connect.microsoft.com/Connect/feedback/details/625776/prevent-microsoft-from-deleting-feedback-items-without-explanation
Of course, they might well delete or ignore this item as well, but it’s worth a shot, right?
LOL…Had the same problem but with database projects and generation of the delta scripts via the schema comparison.
They acknowledged there was a problem than closed.
This is madness. I fully agree with your post, and in fact I always pull it up when I want to convince people *not* to use the EF at the moment? It’s just not very good at some really important things.
So, it seems Microsoft have deleted your Connect issue? Really? If that’s true, it’s madness. I’ve heard no news about this problem being addressed, which is quite disconcerting. An update from yourself on what you now know would be much appreciated.
Okay, so I just noticed your Update #3. Not sure how I missed that. Very odd indeed. Have you heard any news about it since?
I have tested this scenario with 19 derived tables using TPT in Entity Framework 4.2 The time to compile and execute the query went from 8,457ms to 3,334ms.
2 Trackbacks
[...] Performance By Sam | Published: September 10, 2010 Just a quick post today. In January 2010, I wrote about the problem of Table-Per-Type (TPT) inheritance performance with Entity Framework (both v1 and v4). I had opened up a bug on Microsoft Connect in March, and MS finally got around to [...]
[...] Hey, what’s with the “new”? Well, if you are using any sort of entity inheritance, your child entities will use the new keyword to mask any references from the parent class. This will come up if you are using either of the inheritance schemes available in EF4, which are table-per-hierarchy (TPH) and table-per-type (TPT). If you are using TPT, be aware that there are significant performance considerations. [...]