
Tuning SQL queries: how to write high-performance SQL statements (reprinted; original by zock)

  • 1. First, understand what an execution plan is

The execution plan is the query scheme the database works out from the SQL statement and the statistics of the tables involved; it is generated by the query optimizer. For example, if an SQL statement is used to fetch 1 record from a table of 100,000 records, the query optimizer will choose an "index seek"; if the table has since been archived and only a few thousand records remain, the optimizer will change the scheme and use a "full table scan" instead.

As you can see, the execution plan is not fixed; it is "personalized". Producing a correct "execution plan" depends on two important points:

( 1 ) Does the SQL statement clearly tell the query optimizer what it wants to do?

( 2 ) Are the database statistics the query optimizer obtains up to date and correct?
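
On point ( 2 ), a minimal sketch of how statistics can be refreshed and a plan inspected in SQL Server (the table name orderheader is borrowed from the examples later in this article):

UPDATE STATISTICS orderheader            -- refresh statistics for one table
EXEC sp_updatestats                      -- or refresh statistics database-wide

SET SHOWPLAN_TEXT ON
GO
-- With SHOWPLAN_TEXT on, the statement is not executed; SQL Server
-- only prints the plan the optimizer would use.
select * from orderheader where changetime > '2010-10-20 00:00:01'
GO
SET SHOWPLAN_TEXT OFF
GO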

  • 2. Write SQL statements in a consistent way

For the following two SQL statements, programmers think they are the same, but the database query optimizer treats them as different.


select*from dual

select*From dual
In fact, the query parser treats them as two different SQL statements that must be parsed twice, generating two execution plans. So as a programmer, you should make sure that the same query statement is written identically everywhere; even one extra space is too many. A sketch of how to see this in the plan cache follows.
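
A small sketch, assuming SQL Server 2005 or later, using the plan-cache DMVs (the LIKE filter is just for this example):

SELECT st.text, cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%dual%'

-- After running the two statements above, this should show two cached
-- plans whose text differs only in the capital F.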

  • 3. Don't write SQL statements that are too complex

I often see an SQL statement captured from the database that is two A4 pages long when printed out. Generally speaking, a statement this complex usually has problems. I took one such 2-page SQL statement to its original author, and he said it had become too long and even he couldn't read it any more. Imagine: if even the author may be confused by an SQL statement, the database will be confused too.

Normally, taking the result of a SELECT statement as a subset and then querying from that subset, one layer of nesting, is fairly common; but experience shows that beyond about three levels of nesting, the query optimizer easily gives a wrong execution plan. It simply gets dizzy. Artificial-intelligence things like this are, in the end, weaker than human judgment; if everyone reading the statement gets dizzy, I can assure you the database gets dizzy as well.

Additionally, execution plans can be reused, and the simpler an SQL statement is, the higher the chance its plan is reused. A complex SQL statement has to be re-parsed whenever even one character in it changes, and the resulting pile of garbage is stashed in memory. Imagine what this does to database efficiency.

  • 4. Use temporary tables to hold intermediate results

An important way to simplify SQL statements is to use temporary tables to hold intermediate results. But the benefits of temporary tables go far beyond that: the temporary result is held in tempdb, which avoids scanning the main table multiple times in the program, and it also greatly reduces blocking on the main table and increases concurrency performance. A sketch follows.
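
A minimal sketch, with hypothetical table and column names:

-- Pull the intermediate result into tempdb once...
SELECT orderid, contactid, changetime
INTO #recent_orders
FROM orderheader
WHERE changetime > '2010-10-20 00:00:01'

-- ...then run further queries against the temp table instead of
-- scanning orderheader again.
SELECT contactid, COUNT(*) AS cnt
FROM #recent_orders
GROUP BY contactid

DROP TABLE #recent_orders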

  • 5. SQL statements in OLTP systems must use bind variables

select*from orderheader where changetime > '2010-10-20 00:00:01'

select*from orderheader where changetime > '2010-09-22 00:00:01'
The query optimizer treats the two statements above as different SQL statements, so they must be parsed twice. If you use a bind variable instead:


select*from orderheader where changetime > @chgtime
The @chgtime variable can be passed any value, so a large number of similar queries can reuse the execution plan, which greatly reduces the burden of parsing SQL statements on the database. Parse once, reuse many times: that is the principle for improving database efficiency.
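
In SQL Server, one common way to get this binding is sp_executesql; a minimal sketch:

DECLARE @chgtime datetime
SET @chgtime = '2010-10-20 00:00:01'

-- Parameterized execution: every call that differs only in @chgtime
-- reuses the same cached execution plan.
EXEC sp_executesql
    N'select * from orderheader where changetime > @chgtime',
    N'@chgtime datetime',
    @chgtime = @chgtime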

  • 6. Bind variable peeking

Everything has two sides. Bind variables apply to most OLTP processing, but there is an exception: when the field in the where condition is a "skewed field".

A "skewed field" is a column in which the vast majority of values are the same; for example, in a census table, the "nation" column is more than 90% "Han". If an SQL statement needs to query the 30-year-old Han population, the "nation" column has to go into the where condition, and at that point a bind variable @nation creates a big problem.

Imagine that the first value passed in for @nation is "Han": the whole execution plan will inevitably choose a table scan. Then the second value passed in is "Buyi". By rights, since "Buyi" may account for only a tiny fraction of the rows, an index seek should be used; however, because the execution plan from the first parse is reused, the second call uses a table scan as well. This problem is the famous "bind variable peeking", and the recommendation is not to use bind variables for skewed fields. A sketch follows.
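
A minimal sketch using a hypothetical census table; the option (recompile) variant is my addition, not from the original article:

-- For a skewed field, inline the literal so each value gets its own plan:
select count(*) from census where age = 30 and nation = 'Han'    -- table scan is fine here
select count(*) from census where age = 30 and nation = 'Buyi'   -- can use an index seek

-- If the query must stay parameterized, SQL Server 2005+ can be told to
-- re-optimize on every call instead of reusing the peeked plan:
select count(*) from census where age = 30 and nation = @nation
option (recompile)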

  • 7. Use begin tran only when necessary

A single SQL statement in SQL Server is by default a transaction, and it is committed by default after the statement executes. This is actually a minimized form of begin tran, as if a begin tran were implied at the start of every statement and a commit implied at its end.

In some cases we need to declare begin tran explicitly, for example when an "insert, update, delete" action has to modify several tables at once: either every table is modified successfully or none of them is. begin tran plays exactly this role: it executes several SQL statements together and finally commits them together. The benefit is that data consistency is guaranteed, but nothing is perfect. The cost of begin tran is that none of the resources locked by its SQL statements can be released until the commit.

Clearly, if begin tran wraps too many SQL statements, database performance suffers badly. Before such a large transaction commits, it will block the other statements, producing a lot of blocks.

The principle for using begin tran is: while guaranteeing data consistency, the fewer SQL statements begin tran holds, the better! In some cases a trigger can synchronize the data, and begin tran is not needed at all. A sketch of a small transaction follows.
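
A minimal sketch with hypothetical tables: keep the transaction covering only the statements that must succeed or fail together.

BEGIN TRAN
    UPDATE orderheader SET status = 'shipped' WHERE orderid = 12345
    INSERT INTO orderlog (orderid, action) VALUES (12345, 'shipped')
COMMIT TRAN

-- Locks taken by both statements are held until COMMIT, so do any slow
-- reads or computation before BEGIN TRAN, not inside it.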

  • 8. Some SQL query statements should have nolock added

Adding nolock to SQL statements is an important way to improve SQL Server's concurrency performance. In Oracle there is no need to do this, because Oracle's structure is more reasonable: it has an undo tablespace that keeps a "before image" of the data, so if a modification has not committed yet, you read the pre-modification copy held in the undo tablespace. This way, Oracle's reads and writes can avoid affecting each other, and this is a widely praised aspect of Oracle. SQL Server's reads and writes block each other; to improve concurrency performance, nolock can be added to some queries so that reading allows concurrent writing, at the price of possibly reading uncommitted dirty data. There are three principles for using nolock; a sketch follows the list.

( 1 ) If the query result will be used for "insert, delete, or update", do not add nolock!

( 2 ) If the queried table is one where page splits happen frequently, use nolock with caution!

( 3 ) A temporary table can likewise keep a "before image" of the data, acting much like Oracle's undo tablespace.

Where a temporary table can be used to improve concurrency performance, use a temporary table rather than nolock.
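
A minimal sketch with hypothetical names: a read-only report query that tolerates dirty reads, so it does not block concurrent writers.

SELECT orderid, status
FROM orderheader WITH (NOLOCK)
WHERE changetime > '2010-10-20 00:00:01'

-- Do not feed this result into an insert/update/delete (principle 1).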

  • 9. A table whose clustered index is not on a sequential field is prone to page splits

For example, an order table has an order number orderid and a customer number contactid. Which field should the clustered index go on? For this table, order numbers are added sequentially: if the clustered index is put on orderid, new rows are all appended at the end, so page splits are unlikely. However, since most queries look orders up by customer number, it only makes sense to put the clustered index on contactid; and for the order table, contactid is not a sequential field.

For example, if "tom"'s contactid is 001, then tom's order information must all be placed on the first data page of this table. If tom places a new order today, the order information cannot go at the end of the table; it belongs on the first page! And if the first page is already full? Sorry, all the data in the table has to be moved back to make room for this record.

SQL Server's indexes are different from Oracle's. A SQL Server clustered index actually sorts the table in the order of the clustered-index field, equivalent to Oracle's index-organized table. The clustered index is a form of organization of the table itself, so its efficiency is very high. Precisely because of this, when a record is inserted, its position is not arbitrary: it must be placed, in order, on the data page where it belongs; if that data page has no free space, a page split occurs. So obviously, if the clustered index is not on a sequential field of the table, the table is prone to page splits.

I once encountered a case where a friend's insert efficiency dropped sharply after rebuilding the index on a table. The situation was probably like this: the table's clustered index was not on a sequential field, and the table was frequently archived, so the table's data existed in a sparse state. For example, tom had placed 20 orders while only 5 were from the last 3 months; with an archiving policy of keeping 3 months of data, his earlier orders were archived, leaving empty slots that could be reused when inserts occur. In this situation no page splits happened, because there was free space to use; however, query performance was low, because queries had to scan the empty space holding no data.

After the clustered index was rebuilt, the situation changed: rebuilding the clustered index rearranges the data in the table, so the original empty slots were gone, the page fill rate was very high, and inserts kept causing page splits, so performance dropped sharply.

For a table whose clustered index is not on a sequential field, should you give it a lower page fill rate? Should you avoid rebuilding the clustered index? These are questions worth considering; a sketch of one option follows.
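
A minimal sketch (hypothetical index name, SQL Server 2005+ syntax): rebuild the clustered index with a lower fill factor, leaving free space on each page to absorb out-of-order inserts without page splits.

ALTER INDEX IX_orderheader_contactid ON orderheader
REBUILD WITH (FILLFACTOR = 70)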

  • 10. With nolock, querying a table where page splits happen frequently easily produces skipped or repeated reads

With nolock, queries can run at the same time as "insert, delete, update" operations. But because those writes are happening concurrently, in some cases, once a data page is full, a page split is unavoidable, and a nolock query may be in progress at that moment. For example, a record that has already been read on page 100 may be moved to page 101 by a page split, and the nolock query then reads the same record again when it reaches page 101, producing a "repeated read". Similarly, if a record on page 100 has not been read yet when it is moved to page 99, the nolock query may miss that record, producing a "skipped read".

As mentioned above, after nolock was added, some operations started reporting errors. The likely cause is that the nolock query produced a repeated read: when 2 identical records are inserted into another table, a primary-key conflict naturally occurs.

  • 11. Be careful when using like for fuzzy queries

Sometimes fuzzy queries are needed, for example:


select*from contact where username like '%yue%'
Note the keyword %yue%: because there is a '%' in front of yue, the query is forced into a full table scan. Unless absolutely necessary, don't put a % in front of the keyword, as the sketch below shows.
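
A minimal sketch (assuming a hypothetical index on username): a leading % defeats the index, while a trailing-only % allows an index seek.

select*from contact where username like '%yue%'   -- leading %: full table scan
select*from contact where username like 'yue%'    -- no leading %: can use the index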

  • 12. Implicit conversion of data types affects query efficiency

On a SQL Server 2000 database, our program did not submit the value of a field as a strongly typed parameter, so SQL Server 2000 converted the data type automatically; the passed-in parameter then didn't match the type of the key field, and SQL Server 2000 could end up doing a full table scan. This problem has not been seen on SQL 2005, but you should still pay attention to it. A sketch follows.
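
A minimal sketch with hypothetical types: suppose orderid is a varchar key but an int literal is passed in. Because int outranks varchar in type precedence, the column side gets converted and the index can be lost.

select*from orderheader where orderid = 12345      -- int literal: implicit conversion of the column
select*from orderheader where orderid = '12345'    -- types match: index seek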

  • 13. The three ways SQL Server joins tables

( 1 ) merge join

( 2 ) nested loop join

( 3 ) hash join

SQL Server 2000 has only one join mode: nested loop join. If result set A is smaller, it is taken as the outer table by default; every record in A is scanned against B, and the number of rows actually scanned is roughly (rows in A) x (rows in B). So if both result sets are large, the join result is terrible.

SQL Server 2005 added merge join: if the join fields of tables A and B happen to be the fields their clustered indexes are on, then the rows are already in order and the two sides only need to be merged together, making the cost roughly (rows in A) plus (rows in B): addition instead of multiplication, so merge join does far better than nested loop join.

If there is no index on the join field, SQL 2000's efficiency is quite low, whereas SQL 2005 provides hash join, which is roughly equivalent to temporarily building an index on the result sets of A and B. So SQL 2005's efficiency is much higher than SQL 2000's, and I think this is one important reason.

To summarize, note the following points when joining tables (a sketch follows the list):

( 1 ) For join fields, try to choose the fields the clustered indexes are on.

( 2 ) Carefully consider the where conditions to minimize the size of the A and B result sets.

( 3 ) If many join fields are missing indexes and you're still using SQL Server 2000, upgrade soon.
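
A minimal sketch (hypothetical tables): the execution plan shows which physical join the optimizer picked, and a query hint can force one when comparing the three strategies.

select a.orderid, b.username
from orderheader a
inner join contact b on a.contactid = b.contactid
option (merge join)   -- try also: option (loop join) / option (hash join)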



