SQL Server Query Optimization – Sargable Functions

A function used in a SQL join or predicate is sargable (Search ARGument ABLE) if the query optimizer can make efficient use of indexes to speed up the query. Not all Transact-SQL functions are sargable, so care must be taken to avoid non-sargable functions when writing queries.

In this blog I’m going to demonstrate some workarounds for non-sargable pitfalls.

Setup

Let’s create a table, load it with data (10,000 rows), and create some non-clustered indexes to target when optimizing our queries.

CREATE TABLE Product (
	ProductId int IDENTITY(1,1) NOT NULL,
	ProductName nvarchar(100) NOT NULL,
	ProductLine nchar(2) NULL,
	CreatedDT datetime,
	CONSTRAINT PK_Product PRIMARY KEY CLUSTERED
	(
		ProductId ASC
	)
) ON [PRIMARY];
GO

INSERT INTO Product (ProductName, ProductLine, CreatedDT)
VALUES		('Product 1', 'L1', DATEADD(year, -1, GETDATE())), (' Product 1 ', 'L1', DATEADD(year, -2, GETDATE())),
			('Product 2', 'L1', DATEADD(year, -3, GETDATE())), (' Product 2 ', 'L1', DATEADD(year, -4, GETDATE())),
			('Product 3', 'L2', DATEADD(year, -5, GETDATE())), (' Product 3 ', 'L2', DATEADD(year, -6, GETDATE())),
			('Product 4', 'L2', DATEADD(year, -7, GETDATE())), (' Product 4 ', 'L3', DATEADD(year, -8, GETDATE())),
			('Product 5', 'L4', DATEADD(year, -9, GETDATE())), (' Product 5 ', NULL, DATEADD(year, -10, GETDATE()));
GO 1000

CREATE NONCLUSTERED INDEX IX_Product_ProductName ON dbo.Product
(
	ProductName ASC
) INCLUDE (ProductId) ON [PRIMARY];

CREATE NONCLUSTERED INDEX IX_Product_CreatedDT ON dbo.Product
(
	CreatedDT ASC
) INCLUDE (ProductId, ProductName) ON [PRIMARY];

CREATE NONCLUSTERED INDEX IX_Product_ProductLine ON dbo.Product
(
	ProductLine ASC
) INCLUDE (ProductId, ProductName) ON [PRIMARY];

LTRIM and RTRIM

First off, we want to optimize a query that returns Product records with ProductName equal to “Product 1”. But note that a subset of the products is prefixed and suffixed with spaces, and we want to return those too.

Here’s how NOT to do it –

SELECT	ProductId, ProductName
FROM	Product
WHERE	LTRIM(RTRIM(ProductName)) = 'Product 1';

We are targeting the IX_Product_ProductName index, and the query plan shows that the optimizer has chosen to use this index.

query1

However, it is scanning all 10,000 rows of the index in order to return the expected 2000. The LTRIM and RTRIM functions are not sargable.

If ProductName was only prefixed with spaces, and not suffixed, then we could optimize the query by using the LIKE operator.

SELECT	ProductId, ProductName
FROM	Product
WHERE	ProductName LIKE 'Product 1%';

But as we need to account for the suffixed spaces, this will not do for us.

To optimize the query, we add a computed column which uses the LTRIM and RTRIM functions to strip the spaces, and then create a non-clustered index on this column –

ALTER TABLE Product ADD ProductNameTrimmed AS LTRIM(RTRIM(ProductName));

CREATE NONCLUSTERED INDEX IX_Product_ProductNameTrimmed ON dbo.Product
(
	ProductNameTrimmed ASC
) INCLUDE (ProductId, ProductName) ON [PRIMARY];
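As an aside, the computed column above is non-persisted: the trimmed value is stored in the index, but computed on the fly when read from the base table. If queries that can’t use the index will also reference the column, it can optionally be persisted so the value is stored in the row as well. A sketch, using a hypothetical column name –

```sql
-- Hypothetical persisted variant: stores the trimmed value in the row
-- rather than computing it on each read. LTRIM and RTRIM are deterministic,
-- so the column may be both persisted and indexed.
ALTER TABLE Product ADD ProductNameTrimmedP AS LTRIM(RTRIM(ProductName)) PERSISTED;
```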

If we now modify the query to use the computed column in the predicate –

SELECT	ProductId, ProductName
FROM	Product
WHERE	ProductNameTrimmed = 'Product 1';

The query optimizer is now able to perform a seek on the non-clustered index –

query2

The number of logical reads has been reduced from 50 to 16, and the execution time from 193ms to 67ms. And if the query returned a smaller subset of records, the performance improvement would be even greater.

DateTime Functions

If we now query on a specific CreatedDT datetime –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	CreatedDT = '2014-11-21 14:08:42.593';

The optimizer performs an efficient seek on the IX_Product_CreatedDT index.

However, if we want to return rows for a date, irrespective of time, we may try the following widely used query –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	DATEADD(dd, 0, DATEDIFF(dd, 0, CreatedDT)) = '2014-11-21';

But the DATEADD and DATEDIFF functions are not sargable so the query optimizer is unable to perform a seek on the IX_Product_CreatedDT non-clustered index. Instead it scans the clustered index, reading all 10,000 rows.

Instead let’s try the following –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	CONVERT(date, CreatedDT) = '2014-11-21';

The query optimizer is now able to perform the seek on the index.

query3

The number of logical reads has been reduced from 54 to 8. The CONVERT function, in this case, is sargable.

If we look at the details of the index seek, we see that the query optimizer is using the following seek predicate –

Seek Keys[1]: 
Start: [Sandbox].[dbo].[Product].CreatedDT > Scalar Operator([Expr1005]), 
End: [Sandbox].[dbo].[Product].CreatedDT < Scalar Operator([Expr1006])

So it is translating the CONVERT predicate into a range seek with greater-than and less-than bounds. This gives us a clue as to another method for enabling the optimizer to efficiently utilize the index – a half-open date range (note the >= on the lower bound, so that rows created exactly at midnight are included) –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	CreatedDT >= '2014-11-21' AND CreatedDT < '2014-11-22';

Now let’s try to return all rows created in the year 2014 –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	YEAR(CreatedDT) = 2014;

Not surprisingly, the query optimizer scans all rows as the YEAR function is not sargable.

To enable the optimizer to use the IX_Product_CreatedDT index we can re-write the query as a half-open date range –

SELECT	ProductID, ProductName, CreatedDT
FROM	Product
WHERE	CreatedDT >= '2014-01-01' AND CreatedDT < '2015-01-01';

query4

The number of logical reads is reduced from 54 to 8.

ISNULL and COALESCE

The following query returns all Product records with ProductLine equal to “L4”.

SELECT	ProductID, ProductName, ProductLine
FROM	Product
WHERE	ProductLine = 'L4';

The query optimizer performs an efficient seek on the IX_Product_ProductLine index.

query5

The ProductLine column is nullable, and if we want to return NULL rows as well we could use an ISNULL or COALESCE expression as follows –

SELECT	ProductID, ProductName, ProductLine
FROM	Product
WHERE	COALESCE(ProductLine, 'L4') = 'L4';

However, the optimizer now chooses a non-optimal scan on the index.

query6

The COALESCE function, and likewise the ISNULL function, is not sargable. The fix is simply to change the predicate to use OR … IS NULL as shown below –

SELECT	ProductID, ProductName, ProductLine
FROM	Product
WHERE	ProductLine = 'L4' OR ProductLine IS NULL;

The optimizer now chooses an optimal seek on the index.

query7

Rule of Thumb

Sargable functions in SQL Server are few and far between. As a rule of thumb, I’d recommend that SQL Server developers without a great deal of experience in optimizing queries err on the side of caution by not applying functions, or operators, to the columns used in joins or predicates. If there is an imperative to do so, then consider whether there are alternatives, such as computed columns.
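To recap the pattern, here are two further illustrative examples against the Product table (hypothetical predicates, not taken from the queries above): move the function or operator off the column wherever possible –

```sql
-- Non-sargable: function applied to the indexed column.
SELECT ProductId, ProductName FROM Product WHERE LEFT(ProductName, 9) = 'Product 1';
-- Sargable rewrite: the optimizer can seek on IX_Product_ProductName.
SELECT ProductId, ProductName FROM Product WHERE ProductName LIKE 'Product 1%';

-- Non-sargable: arithmetic applied to the key column.
SELECT ProductId FROM Product WHERE ProductId + 1 = 100;
-- Sargable rewrite: do the arithmetic on the constant instead.
SELECT ProductId FROM Product WHERE ProductId = 99;
```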

Inheritance Modelling Patterns with Entity Framework 6 Code First

Introduction

In this post I’m going to demonstrate how to model inheritance using Entity Framework 6 Code First. In particular, I’m going to demonstrate the following alternative database mapping strategies –

  • Table Per Hierarchy (TPH)
  • Table Per Concrete Type (TPC)
  • Table Per Type (TPT)

My model is as follows –

contract

Contract, the base class, has a start date, an end date, and a unique identifier. CreditContract and PrepaymentContract derive from Contract and hence inherit its properties. In addition, CreditContract has credit limit and currency properties, whilst PrepaymentContract has prepayment amount and period properties.

Code First Classes

The first step is to create the model classes.

Contract

Contract is the base class for the hierarchy, hence the abstract modifier.

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace CodeFirstInheritance.Models
{
    public abstract class Contract
    {
        [Key]
        [DatabaseGeneratedAttribute(DatabaseGeneratedOption.Identity)]
        public int ContractID { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime StartDate { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime EndDate { get; set; }
    }
}

CreditContract

CreditContract is derived from Contract.

For simplicity I have modelled the CreditLimitCurrency property as a string with data annotations mapping it to a char(3) column. In practice it would be better to model this as an enum.

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace CodeFirstInheritance.Models
{
    public class CreditContract : Contract
    {
        [Required]
        public float CreditLimit { get; set; }

        [Column(TypeName = "char")]
        [StringLength(3)]
        [Required]
        public string CreditLimitCurrency { get; set; }
    }
}

PrepaymentContract

PrepaymentContract also derives from Contract.

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace CodeFirstInheritance.Models
{
    public class PrepaymentContract : Contract
    {
        [Required]
        public float PrepaymentAmount { get; set; }

        [Column(TypeName = "char")]
        [StringLength(1)]
        [Required]
        public string PrepaymentPeriod { get; set; }
    }
}

Table Per Hierarchy (TPH)

With the TPH mapping strategy, all classes in the inheritance tree are mapped to a single database table. A database field is used as a discriminator in determining which concrete type each database record should be mapped to.

To instruct Entity Framework to use this strategy, all that is required is to derive a class from the DbContext class and add a DbSet property for the Contract type, whilst NOT adding DbSet properties for CreditContract or PrepaymentContract.

I have also added an override of the OnModelCreating method. I’ll discuss this in a moment.

CodeFirstInheritanceDbContext

using System.Data.Entity;

namespace CodeFirstInheritance.Models
{
    public class CodeFirstInheritanceDbContext : DbContext
    {
        public DbSet<Contract> Contracts { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Contract>()
                .Map<CreditContract>(m => m.Requires("Discriminator").HasValue("C"))
                .Map<PrepaymentContract>(m => m.Requires("Discriminator").HasValue("P"));

            base.OnModelCreating(modelBuilder);
        }
    }
}

If we then enable migrations and add an initial migration, the following DbMigration class is generated –

namespace CodeFirstInheritance.Migrations
{
    using System;
    using System.Data.Entity.Migrations;

    public partial class Initial : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.Contracts",
                c => new
                    {
                        ContractID = c.Int(nullable: false, identity: true),
                        StartDate = c.DateTime(nullable: false),
                        EndDate = c.DateTime(nullable: false),
                        CreditLimit = c.Single(),
                        CreditLimitCurrency = c.String(maxLength: 3, fixedLength: true, unicode: false),
                        PrepaymentAmount = c.Single(),
                        PrepaymentPeriod = c.String(maxLength: 1, fixedLength: true, unicode: false),
                        Discriminator = c.String(nullable: false, maxLength: 128),
                    })
                .PrimaryKey(t => t.ContractID);

        }

        public override void Down()
        {
            DropTable("dbo.Contracts");
        }
    }
}

A single table, Contracts, is created with fields from all types. Of note is the Discriminator field, which we did not explicitly specify in our hierarchy. EF has generated this field for us and will use it to discriminate between credit and prepayment contract records.

If we update the database and then execute the following code –

using (var db = new CodeFirstInheritanceDbContext())
{
    CreditContract creditContract = new CreditContract();
    creditContract.StartDate = new DateTime(2015, 1, 1);
    creditContract.EndDate = new DateTime(2015, 12, 31);
    creditContract.CreditLimit = 100000;
    creditContract.CreditLimitCurrency = "USD";
    db.Contracts.Add(creditContract);

    PrepaymentContract prepaymentContract = new PrepaymentContract();
    prepaymentContract.StartDate = new DateTime(2015, 1, 1);
    prepaymentContract.EndDate = new DateTime(2015, 12, 31);
    prepaymentContract.PrepaymentAmount = 10000;
    prepaymentContract.PrepaymentPeriod = "M";
    db.Contracts.Add(prepaymentContract);

    db.SaveChanges();
}

The following records are added to the Contracts table –

tphresults

By default EF will use the full names of the types as the discriminator values. This may significantly increase the size of each record. EF allows us to override these values using the fluent API. This is what I did in the OnModelCreating method of the CodeFirstInheritanceDbContext class, in order to shorten the discriminator to a single character.

We can then query the model across contract types and for specific contract types as follows –

using (var db = new CodeFirstInheritanceDbContext())
{
    var countOfAllContract = db.Contracts.Count();
    var countOfCreditContracts = db.Contracts.OfType<CreditContract>().Count();
}

TPH is the simplest mapping strategy to implement. And as all data is stored in a single table there are no joins required when querying data. It is therefore the most performant strategy.

However, there are a few issues –

  • If you look back at the CreditContract and PrepaymentContract classes you’ll note that all the properties are decorated with the [Required] attribute. However, to fit records of both types into a single database table EF has had to generate the fields as nullable. Data integrity has potentially been compromised.
  • It is not clear from the database table which fields relate to which type. This may make life difficult when populating the table from other sources.
  • The table breaks 3rd normal form, as the values contained in the sub-type fields depend on the discriminator rather than only on the table key, which 3rd normal form requires.

Table Per Concrete Type (TPC)

With the TPC mapping strategy, each concrete class is mapped to its own database table. So we get 2 tables, CreditContract and PrepaymentContract, each having all the fields in its respective branch of the hierarchy. We do not get a table for the abstract base class.

Before we instruct EF to use this strategy, we need to make a small modification to the Contract class – change the type of the ContractID property from int to Guid. I’ll discuss why in a moment.

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace CodeFirstInheritance.Models
{
    public abstract class Contract
    {
        [Key]
        [DatabaseGeneratedAttribute(DatabaseGeneratedOption.Identity)]
        public Guid ContractID { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime StartDate { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime EndDate { get; set; }
    }
}

As per the TPH strategy, we derive a class from DbContext and add a DbSet property for the Contract type, whilst NOT adding DbSet properties for CreditContract or PrepaymentContract.

We then override the OnModelCreating method and add fluent API code to instruct EF to map the CreditContract entity, including its inherited properties, to a specific table. The table name does not have to be the same as the entity name.

We do the same for PrepaymentContract.

using System.Data.Entity;

namespace CodeFirstInheritance.Models
{
    public class CodeFirstInheritanceDbContext : DbContext
    {
        public DbSet<Contract> Contracts { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<CreditContract>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("CreditContract");
            });

            modelBuilder.Entity<PrepaymentContract>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("PrepaymentContract");
            });   
           
            base.OnModelCreating(modelBuilder);
        }
    }
}

If we then enable migrations and add an initial migration, we can see from the migration script that 2 separate tables are created on the database.

namespace CodeFirstInheritance.Migrations
{
    using System;
    using System.Data.Entity.Migrations;
    
    public partial class Initial : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.CreditContract",
                c => new
                    {
                        ContractID = c.Guid(nullable: false, identity: true),
                        StartDate = c.DateTime(nullable: false),
                        EndDate = c.DateTime(nullable: false),
                        CreditLimit = c.Single(nullable: false),
                        CreditLimitCurrency = c.String(nullable: false, maxLength: 3, fixedLength: true, unicode: false),
                    })
                .PrimaryKey(t => t.ContractID);
            
            CreateTable(
                "dbo.PrepaymentContract",
                c => new
                    {
                        ContractID = c.Guid(nullable: false, identity: true),
                        StartDate = c.DateTime(nullable: false),
                        EndDate = c.DateTime(nullable: false),
                        PrepaymentAmount = c.Single(nullable: false),
                        PrepaymentPeriod = c.String(nullable: false, maxLength: 1, fixedLength: true, unicode: false),
                    })
                .PrimaryKey(t => t.ContractID);
            
        }
        
        public override void Down()
        {
            DropTable("dbo.PrepaymentContract");
            DropTable("dbo.CreditContract");
        }
    }
}

Update the database and then execute the following code –

using (var db = new CodeFirstInheritanceDbContext())
{
    CreditContract creditContract = new CreditContract();
    creditContract.StartDate = new DateTime(2015, 1, 1);
    creditContract.EndDate = new DateTime(2015, 12, 31);
    creditContract.CreditLimit = 100000;
    creditContract.CreditLimitCurrency = "USD";
    db.Contracts.Add(creditContract);

    PrepaymentContract prepaymentContract = new PrepaymentContract();
    prepaymentContract.StartDate = new DateTime(2015, 1, 1);
    prepaymentContract.EndDate = new DateTime(2015, 12, 31);
    prepaymentContract.PrepaymentAmount = 10000;
    prepaymentContract.PrepaymentPeriod = "M";
    db.Contracts.Add(prepaymentContract);

    db.SaveChanges();
}

The CreditContract entity has been persisted to the CreditContract table –

tpc1

Whilst the PrepaymentContract entity has been persisted to the PrepaymentContract table –

tpc2

We can then query the entities exactly as we did when the TPH strategy was adopted. The consumer of the model does not need to be aware of the mapping strategy.

We changed the ContractID property from int to Guid because we are asking the database server to generate unique identifiers for our records. The concrete entity types are mapped to separate tables, and the database server does not (and should not) know this. Integer identity values are only unique within the context of a single table, so if the type were int we would soon experience errors similar to the following –

The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.

However, by changing the type to Guid we are, to all intents and purposes, generating unique values.

The downside of doing this is that SQL Server GUID fields are 16 bytes in length, compared to 4 bytes for an int.

Compared to TPH, TPC is less performant when querying across types as the queries require a UNION between the tables. However, it will be equally performant when querying a single type.

As TPC stores the data in separate tables, it allows for non-nullable columns, hence ensuring data integrity. And it is clear which properties are required by which type.
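Conceptually, querying across all contracts under TPC is answered with something like the following UNION ALL over the two tables (a sketch of the shape of the query, not the exact SQL that EF generates) –

```sql
SELECT ContractID, StartDate, EndDate FROM CreditContract
UNION ALL
SELECT ContractID, StartDate, EndDate FROM PrepaymentContract;
```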

Table Per Type (TPT)

With the Table Per Type (TPT) strategy, both abstract and concrete types are mapped to their own tables in the database, with the inheritance relationships represented by foreign key relationships.

To instruct EF to use this strategy we have 2 options.

The first option is to override the OnModelCreating method and add Fluent API code to instruct EF to map the CreditContract entity to a specific table, and the PrepaymentContract entity to a different specific table.

using System.Data.Entity;

namespace CodeFirstInheritance.Models
{
    public class CodeFirstInheritanceDbContext : DbContext
    {
        public DbSet<Contract> Contracts { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<CreditContract>().Map(m =>
            {
                m.ToTable("CreditContract");
            });

            modelBuilder.Entity<PrepaymentContract>().Map(m =>
            {
                m.ToTable("PrepaymentContract");
            });   
           
            base.OnModelCreating(modelBuilder);
        }
    }
}

The second option is to annotate the concrete types with the [Table()] attribute.

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace CodeFirstInheritance.Models
{
    public abstract class Contract
    {
        [Key]
        [DatabaseGeneratedAttribute(DatabaseGeneratedOption.Identity)]
        public int ContractID { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime StartDate { get; set; }

        [DataType(DataType.Date)]
        [Required]
        public DateTime EndDate { get; set; }
    }

    [Table("CreditContract")]
    public class CreditContract : Contract
    {
        [Required]
        public float CreditLimit { get; set; }

        [Column(TypeName = "char")]
        [StringLength(3)]
        [Required]
        public string CreditLimitCurrency { get; set; }
    }

    [Table("PrepaymentContract")]
    public class PrepaymentContract : Contract
    {
        [Required]
        public float PrepaymentAmount { get; set; }

        [Column(TypeName = "char")]
        [StringLength(1)]
        [Required]
        public string PrepaymentPeriod { get; set; }
    }
}

Enabling migrations and adding an initial migration gives us the following –

namespace CodeFirstInheritance.Migrations
{
    using System;
    using System.Data.Entity.Migrations;

    public partial class Initial : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.Contracts",
                c => new
                    {
                        ContractID = c.Int(nullable: false, identity: true),
                        StartDate = c.DateTime(nullable: false),
                        EndDate = c.DateTime(nullable: false),
                    })
                .PrimaryKey(t => t.ContractID);

            CreateTable(
                "dbo.CreditContract",
                c => new
                    {
                        ContractID = c.Int(nullable: false),
                        CreditLimit = c.Single(nullable: false),
                        CreditLimitCurrency = c.String(nullable: false, maxLength: 3, fixedLength: true, unicode: false),
                    })
                .PrimaryKey(t => t.ContractID)
                .ForeignKey("dbo.Contracts", t => t.ContractID)
                .Index(t => t.ContractID);

            CreateTable(
                "dbo.PrepaymentContract",
                c => new
                    {
                        ContractID = c.Int(nullable: false),
                        PrepaymentAmount = c.Single(nullable: false),
                        PrepaymentPeriod = c.String(nullable: false, maxLength: 1, fixedLength: true, unicode: false),
                    })
                .PrimaryKey(t => t.ContractID)
                .ForeignKey("dbo.Contracts", t => t.ContractID)
                .Index(t => t.ContractID);

        }

        public override void Down()
        {
            DropForeignKey("dbo.PrepaymentContract", "ContractID", "dbo.Contracts");
            DropForeignKey("dbo.CreditContract", "ContractID", "dbo.Contracts");
            DropIndex("dbo.PrepaymentContract", new[] { "ContractID" });
            DropIndex("dbo.CreditContract", new[] { "ContractID" });
            DropTable("dbo.PrepaymentContract");
            DropTable("dbo.CreditContract");
            DropTable("dbo.Contracts");
        }
    }
}

3 tables are created, one per type, as shown in the following database schema –

dbdiagram

Update the database and then execute the following code –

using (var db = new CodeFirstInheritanceDbContext())
{
    CreditContract creditContract = new CreditContract();
    creditContract.StartDate = new DateTime(2015, 1, 1);
    creditContract.EndDate = new DateTime(2015, 12, 31);
    creditContract.CreditLimit = 100000;
    creditContract.CreditLimitCurrency = "USD";
    db.Contracts.Add(creditContract);

    PrepaymentContract prepaymentContract = new PrepaymentContract();
    prepaymentContract.StartDate = new DateTime(2015, 1, 1);
    prepaymentContract.EndDate = new DateTime(2015, 12, 31);
    prepaymentContract.PrepaymentAmount = 10000;
    prepaymentContract.PrepaymentPeriod = "M";
    db.Contracts.Add(prepaymentContract);

    db.SaveChanges();
}

And this is what we get in our Contract table –

tpt1

And in the CreditContract table –

tpt2

And finally the PrepaymentContract table –

tpt3

TPT is the least performant of the strategies, but enables the highest degree of database integrity and normalization.
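For example, materializing the CreditContract entities under TPT requires a join between the base table and the sub-type table, along the lines of the following sketch (not the exact SQL that EF generates) –

```sql
SELECT c.ContractID, c.StartDate, c.EndDate,
       cc.CreditLimit, cc.CreditLimitCurrency
FROM dbo.Contracts c
INNER JOIN dbo.CreditContract cc ON cc.ContractID = c.ContractID;
```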


SQL Server Database Management Crib Sheet

I thought I’d share my crib sheet on SQL Server database management. I will expand on some of the recommendations at a later date.

Data and Log File Management
  1. Isolate data and log files from other application files.
    • Minimizes file fragmentation, which can degrade transaction performance.
  2. Create data and log files on separate volumes.
    • This helps to maximize performance and availability.
  3. When creating data and log files, size them according to foreseeable data growth.
    • If the files run out of space then the database server will need to grow the files by allocating additional space. This holds up other processes, and exacerbates file fragmentation.
  4. As a catch-all, enable a database’s auto-growth settings to avoid running out of space.
    • Ensure that the growth increments are large enough to prevent frequent auto-growth events.
    • An auto-growth event should be abnormal, and the aim should be to avoid them through regular maintenance and monitoring.
  5. Do not enable auto-shrink.
    • It leads to excessive file fragmentation.
  6. Regularly check for free space on the data, log, and backup volumes.
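The file sizes and growth settings above can be checked with a query along these lines (a sketch; sys.database_files reports sizes in 8 KB pages) –

```sql
-- Report each file's current size and growth settings for the current database.
-- size and growth are in 8 KB pages; is_percent_growth = 1 means growth is a percentage.
SELECT name, type_desc, size * 8 / 1024 AS size_mb,
       growth, is_percent_growth, physical_name
FROM sys.database_files;
```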
Index Management
  1. Regularly reorganize or rebuild indexes to avoid fragmentation.
  2. Consider determining which indexes should be rebuilt based on analyzing fragmentation.
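The reorganize-or-rebuild decision can be driven by sys.dm_db_index_physical_stats. A commonly used rule of thumb (reorganize at moderate fragmentation, rebuild at heavy fragmentation) looks like this in outline, using an index from the earlier examples –

```sql
-- Inspect fragmentation across the current database.
SELECT OBJECT_NAME(ips.object_id) AS table_name, i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5;

-- For moderate fragmentation:
ALTER INDEX IX_Product_ProductName ON dbo.Product REORGANIZE;
-- For heavy fragmentation:
ALTER INDEX IX_Product_ProductName ON dbo.Product REBUILD;
```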
Statistics Management
  1. Enable the database’s Auto Create Statistics option.
    • Up-to-date statistics enable the query optimizer to choose an optimal query plan.
    • When an index is created or rebuilt, a statistical histogram is generated for the first column in the index.
    • Additionally, if the Auto Create Statistics option is enabled for the database, then statistics will be created for non-indexed columns that are referenced in queries.
  2. Enable the Auto Update Statistics option.
    • To prevent statistics from becoming out-of-date and leading the optimizer to choose a sub-optimal query plan.
    • If the Auto Update Statistics option is enabled, then when data modifications exceed specific thresholds, the database server will automatically update statistics for the table.
  3. Regularly rebuild indexes (see above), which in turn rebuilds their statistics.
    • When statistics are created or updated automatically, only a subset of a column’s data is sampled rather than a full scan being performed. The sample may not be representative of the population.
    • However, when an index is created or rebuilt then a full scan is performed in generating the statistics.
  4. Regularly update statistics, but only for indexes that have not recently been rebuilt, and for column statistics.
    • Use the WITH FULLSCAN option where feasible.
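In T-SQL, updating statistics with a full scan looks like the following (table and index names taken from the earlier sargable-functions example) –

```sql
-- Update all statistics on a table with a full scan.
UPDATE STATISTICS dbo.Product WITH FULLSCAN;

-- Or target a single statistics object (here, the one backing an index).
UPDATE STATISTICS dbo.Product IX_Product_CreatedDT WITH FULLSCAN;
```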
Database Backups
  1. Set the database’s recovery model to Full or Bulk-logged where point-in-time recovery is required.
    • Applicable for most OLTP databases which experience regular transactions throughout the day.
  2. Set the database’s recovery model to Simple where point-in-time recovery is not required.
    • Data warehouses typically only experience regular scheduled updates. Point-in-time recovery is therefore not required.
    • The simple recovery model is easier to manage and may improve bulk copy performance.
  3. Schedule regular full back-ups.
    • A full back-up is resource intensive and may degrade database performance whilst executing. So if possible schedule the full back-up for an off-peak time.
  4. Intersperse full backups with regular differential back-ups.
  5. If the Full or Bulk-logged recovery model is being used, then intersperse the differential backups with log backups.
    • This is necessary for point-in-time recovery.
  6. Use backup compression to save space and also to speed up backups.
  7. To ensure the integrity of the backups, enable the following backup options –
    • “Perform checksum before writing to the media” option.
    • “Verify backup when finished” option.
  8. Schedule the removal of old backups.
    • But ensure that you have at least 2 available at any one time, in case one of the backups is corrupt.
  9. In addition to backing up application databases, regularly back up the system databases – master, model, and msdb.
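A minimal sketch of the backup options mentioned above (hypothetical database and file names) –

```sql
-- Full backup with compression and page checksums.
BACKUP DATABASE MyDatabase
TO DISK = N'D:\Backups\MyDatabase_Full.bak'
WITH COMPRESSION, CHECKSUM;

-- Verify the backup without restoring it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\MyDatabase_Full.bak'
WITH CHECKSUM;
```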
Disaster Recovery
  1. Determine a disaster recovery plan with the business. Document and test.

Heaps, Clustered, and Non-Clustered Indexes

I started programming back in the year dot, at the grand old age of 10. My father acquired a Sinclair ZX80 for his engineering company and I was fascinated by it. I was soon trawling through the 110 page user guide and teaching myself the exotically named Beginner’s All-purpose Symbolic Instruction Code. Peek’ing and poke’ing and beep’ing and flash’ing, and generally amazing all comers with my sophisticated programming skills – 10 PRINT “HELLO WORLD” 20 GOTO 10.

A couple of years later I was treated to a Commodore 64. It was a joint birthday and Christmas present, and at just under £200 (equivalent to £800 today) I was a very lucky boy. This time the manual was a whopping 490 pages long, and I read it from cover to cover again and again.

I then upgraded to a Commodore 128. The user manual was a disappointing 399 pages long. A step back from the Commodore 64. But what redeemed the situation was the 756 page “official” programmer’s reference guide, which I had to buy separately.

Computer magazines started to appear in the shops, but these focused on either basic programming tasks or games listings which you spent all day typing in and which never ever worked. The local library had a few incomprehensible manuals, too technologically specific to be of relevance to anyone. This just about exhausted the information that was available, and so, understandably, each and every user manual and reference guide that I had was well loved, and well worn.

What I’m trying to say – in a very roundabout way – is that back then there wasn’t much information around. Learning and understanding was pretty much a matter of trial and error. Lots of perspiration and the occasional bit of inspiration. Today, however, there is a wealth of credible resources available. And YouTube has become one of the best of these. There are videos on most topics – some good and some bad. Occasionally I come across a gem. And I wanted to share one of those gems with you.

It’s by Voluntary DBA, and it describes the differences between heaps, clustered, and non-clustered indexes. It’s an important topic, but one that typically is not described particularly well, IMO. Voluntary DBA does as good a job as I’ve seen, and it’s well worth watching if you’re new to the topic or need a refresher –

https://www.youtube.com/watch?v=ITcOiLSfVJQ