Specializations

Thursday, January 3, 2013

20 tips to write a good SQL SERVER stored procedure


1. Keywords - Use SQL keywords in capital letters to increase readability. Also use proper
indentation to increase readability.

2. SQL-92 - Always try to use ANSI 92 syntax. Till now the old syntax is working the old syntax
will be deprecated in the next release of MS SQL server.
As an example, for joining, use

SELECT * FROM employee e1 INNER JOIN employee _dtl e2
ON e1.id = e2.id
Instead of
SELECT * FROM employee e1, employee_dtl e2
WHERE e1.id = e2.id

Reference: http://bytes.com/topic/sql-server/answers/82859-sql-syntax-inner-outer-join-vs-where

3. Variables - Use as few as possible variables. It frees spaces in cache.

4. Dynamic Queries - Try to minimize the usage of dynamic queries. If you are using a dynamic
query like: SELECT * FROM mydb.dbo.emp where empid = @eid then there is no problem.
You can supply a value for the @eid parameter and there is no recompilation of the execution
plan in the database cache. But if you are using a SQL query like SELECT * FROM emp where
empid = " + @eid and supply a parameter (say 100), then the cache will keep the execution plan
for the value of 100 only. If the id changes (to say 101), it will recompile the statement. Hence,
this approach is slower than the previous one. (You can get the exact value of the SQL statement
from Profiler)

5. Fully Qualified Names - Always use the fully qualified name when calling stored procedures.
This would be the format database_name.schema_name.table_name. For example, use EXEC
master.dbo.Your_Proc_name instead of EXEC Your_Proc_name This is a very common
mistake, which causes an extra trip to the procedure cache to get the execution plan for
execution. Also try to use the schema name while creating a procedure. Like: CREATE
PROCEDURE dbo.Your_Proc_name instead of CREATE PROCEDURE Your_Proc_name

6.SET NOCOUNT ON - This suppresses the message that shows number of rows affected by
SQL statement. Otherwise this can cause extra network traffic and can have some serious impact
on performance when the procedure is called frequently.

Update: I would recommend you to use SET NOCOUNT ON for the shake of performance
unless there is a very good reason for using it.

7. The sp_ prefix - Don't use the "sp_" prefix in a stored procedure name as the "sp_" prefix is
reserved for system stored procedures. Any stored procedure that has the "sp_" prefix will cause
an extra lookup in the MASTER database If a stored procedure uses same name in both the user
database and a system database, the stored procedure in the user database will execute but first
it will find the procedure in resource database and then the user database (for SQL server 2005)
hence causing an extra burden to the server.

8.sp_executeSQL and the KEEPFIXED PLAN options- Both sp_executesql and the
KEEPFIXED PLAN option avoid the recompilation of a stored procedure. If you want to
provide parameterized dynamic SQL, then go for sp_executesql instead of EXEC(proc_name).
Here the execution plan for the procedure is stored with the variable name in cache memory.
When the variable values are supplied, then the values are simply mapped to the query, hence no
need for a recompilation.

Update: Keep in mind that recompilations are not always bad. Sometimes, using an old and
inefficient plan can make the procedure more slower.

Use the OPTION KEEPFIXED PLAN hint while selecting records from temporary tables. If the
query contains this hint, then its plan is not recompiled. For more information about procedure
recompilation, please go through the following article: http://technet.microsoft.com/en-us/
library/cc966425.aspx

CREATE
AS
CREATE
SELECT
INSERT
SELECT
OPTION

PROCEDURE my_proc

TABLE #t (a int )
* FROM #t
#t SELECT * from retest
COUNT(*) FROM #t WHERE a = 37
(KEEPFIXED PLAN)

As an example of sp_executesql, we can write:

sp_executesql N'SELECT * FROM mydb.dbo.emp where empid = @eid', N'@eid int', @eid=40

9. SELECT vs SET - A single SELECT statement can assign values to different variables and is
much faster than multiple SET statements assigning values to multiple different variables.

SELECT @Var1 = @Var1 + 1, @Var2 = @Var2 - 1

instead of

SET @Var1 = @Var1 + 1
SET @Var2 = @Var2 - 1

10. WHERE clauses - In a WHERE clause, the various operators used directly affect how fast a
query can run. Here are the conditional operators used in the WHERE clause, ordered by their
precedence.

=, >, <, >=, <=, <>, !=, !>, !<

for details, refer to the article: http://msdn.microsoft.com/en-us/library/ms190276.aspx

11. More WHERE clause hints - Avoid unnecessary conditions in the WHERE Clause. Also try
to avoid a function in the WHERE clause as it presents SQL engine to do index seek. Even it
forces SQL full index scans or even table scans.

Also, try to avoid IN. While checking the existence of some values, then use EXISTS instead of
IN. IN counts the NULL values also, but EXISTS not. EXISTS returns Boolean(Yes/No) but IN
returns all values hence result set for IN is heavier than EXISTS.

Here, teacher and student table has 1:m relationship and I want to find the teachers who have
students. Both the queries have the same result but the second query will run faster because of
EXISTS operator.

SELECT name
FROM teacher
WHERE teacher_id IN (SELECT teacher_id FROM student)
SELECT name
FROM teacher
WHERE EXISTS (SELECT 1 FROM student
where teacher.teacher_id = student.teacher_id)

12.CAST and CONVERT - Try to use CAST instead of CONVERT. CAST is ANSI-92 standard
but CONVERT works in MS SQL server only. Also, some CONVERT styles may be deprecated
in future MS SQL releases. It is better to use CONVERT only when you need to format the
DATETIME datatype with the style option. CAST cannot do this.

Reference: http://msdn.microsoft.com/en-us/library/ms187928.aspx

13. Avoid DISTINCT and ORDER BY - If you don't need the DISTINCT/ORDER BY clause,
then try to avoid so. Unnecessary DISTINCT or ORDER BY clauses cause extra work for the
database engine. Hence making performance slower. (Sometimes ORDER BY helps to speed up
the operation).

14. Avoid using cursors - Using cursors make the program slower as it works against SET based
SQL. Try to use temporary table/table variables with identity column and then iterate all the
tables using WHILE loop and a looping counter, which will map with the identity column. For
details, refer to my previous article.

15. SELECT statements - Try to use only the required number of columns in the SELECT clause
instead of using *. Using * returns all columns, which unnecessarily create a fat recordset.

16. Subquery vs JOINs - This is a famous debate in many online forums. In fact most sub queries
can be expressed as an equivalent form of JOIN. I have a good rule of thumb: subquery is faster
when we have to retrieve data from large number of tables because it becomes tedious to join
more tables. JOIN is faster to retrieve data from database when we have less number of tables.
But try to avoid correlated sub queries because it makes the query much slower.

17.CREATE TABLE vs. SELECT INTO - Select * INTO works fine for small tables, but when
dealing with large record sets or long-running queries, it creates locks on the system objects
within the tempdb database. As a result, other queries and procedures that need to create objects
within the tempdb database will have to wait for the long-running query to complete. This is
because when an object is created, an exclusive lock is taken against the sysobjects, syscolumns,
sysindexes tables.

Update: This problem has been greatly reduced from the MS SQL 7.0. For more details, please
refer to the following article:http://support.microsoft.com/kb/153441/EN-US/

Also, SELECT INTO not only copies data but also it copies the table structure, hence it performs
slower.

18. Try to use table variables instead of Temporary Tables - Temp tables can cause stored
procedures to recompile. (From SQL 2005, using temporary table not always causing
recompilations. But adding rows to temporary tables may cause recompilations). But table
variables were designed specifically to guard against stored procedure recompiles during
execution.

If the result set is not containing a huge number of records then you should stick to table
variable, otherwise temp table has its advantages. There is a misconception that temp
tables always use the tembdb database but table variable do not. Table variables also use
tempdb after a certain size. For more information refer to the following article : http://
www.sqlservercentral.com/articles/Temporary+Tables/66720/

19.Use proper indexes - You can use the help of the data tuning advisor, but it does not gives
the proper result all the time. Index scans and index seeks are much faster than table scans. So
identify the table scans from the execution plans. But when a table returns smaller number rows,
then it is better to use a table scan. You can see an excellent article on execution plans by G
Vijayakumara at the following link: http://www.sqlservercentral.com/articles/Administration/
executionplans/1345/

20. Use Profiler - The cachemiss event class indicates that the stored procedure is not in the
cache. If the SP:Cachemiss class occurs frequently, it can indicate that more memory should be
available to MS SQL server, thereby increasing the size of procedure cache. The cachehit event
class indicates that a stored procedure is in the cache.

Last but not the least, again I am repeating the same advice from my previous article. Keep these
guidelines in your mind, but don't hesitate to break them if needed. After all, performance is the
ultimate goal. If violating the general rule gives you good performance (it may happen based on
your database environment) then don't stick with the guidelines. Always use the trace files to get
the performance statistics and then apply the changes.

No comments:

Post a Comment