Skip to content

Exact String Searching without a Full-Text index

An annoying case when using standard LIKE wildcard searching of text columns is when you only want the exact word searched for e.g. “Search” and not “Searchable” or “Research

So, you add a space to sides of the searchterm:
WHERE MyDescription LIKE ‘% ‘ + @searchterm + ‘ %’

but, then you realise that this misses cases when the search term appears at the start or end of the column you are searching, so you try:
WHERE ‘ ‘ + MyDescription + ‘ ‘ LIKE ‘% ‘ + @SearchTerm + ‘ %’

usually, this would be a killer to any use of an index on the MyDescription column – but as we are searching for a string anywhere in the column then a non-FULLTEXT index wont help us anyway

Then you realise that you are missing cases when the search term is adjacent to a fullstop or some html
e.g.: “When I look, I search.” or “Why don’t I just use Solr for this search<somehtmltag>

So, I like to use this (for non-case, non-accent sensitive collations)
WHERE ‘ ‘ + MyDescription + ‘ ‘ LIKE ‘%[^a-z]’ + @SearchTerm + ‘[^a-z]%’

this excludes any occurrences of the search term within another word, but allows:

hyphens  e.g. re-search

punctuation e.g. search,  search. Search!

Numeric compounds e.g. Search101

Html tags e.g. search<somehtmltag>

However, this wont work for unicode foreign languages e.g. Rзsearch

need some way to translate [^a-z] to Russian.

WHERE N’ ‘ +mydescription + N’ ‘ LIKE N’%[^a-Я]’ + @SearchTerm + N'[^a-Я]%’

you just need to make sure you get the a from the russian alphabet – use the onboard keyboard with Russian language on windows

seems to work but needs some more testing… I cant decide how the LIKE interprets the range – presumably on Unicode value, but how does it do the accent sensitivity?

declare @SearchTerm NVARCHAR(50) = ‘search’
declare @t table(mydescription NVARCHAR(50))

insert into @t(mydescription) values(‘Search me’),(‘Research me’), (N’Rзsearch me’), (‘Résearch me’), (N’%search me’), (N’.search me’), (N'</br>search me’)
, (N’RΨsearch me’)

SELECT * FROM @t
WHERE N’ ‘ +mydescription + N’ ‘ LIKE N’%[^a-z]’ + @SearchTerm + N'[^a-z]%’

SELECT * FROM @t
WHERE N’ ‘ +mydescription + N’ ‘ LIKE N’%[^a-z^А-Я]’ + @SearchTerm + N'[^a-z^А-Я]%’

SELECT * FROM @t
WHERE N’ ‘ +mydescription + N’ ‘ LIKE N’%[^a-Я]’ + @SearchTerm + N'[^a-Я]%’

Snapshot Isolation

Gathering some info on using Snapshot Isolation levels in SQL statements…

When Snapshot isolation is enabled on a database, versions of data are kept in tempdb whilst transactions are running in order to much reduce readers & writers from blocking each other.

However, if Read Committed Snapshot (RCSI) option is Off then statements wont use this info by default.

This means most transactions run in the default isolation level (Read Committed) unless otherwise specified:

SELECT
case transaction_isolation_level
when 0 then ‘Unspecified’
when 1 then ‘ReadUncomitted’
when 2 then ‘ReadCommitted’
when 3 then ‘Repeatable’
when 4 then ‘Serializable’
when 5 then ‘Snapshot’
end as transaction_isolation_level
,*
FROM sys.dm_exec_sessions

Usage

Traditionally, developers reach for the ‘with (no lock)’ table hint when they want to exec a long-running select statement on a busy table without causing, or waiting for, locks.

This uses the ‘Read Uncomitted’ isolation level and allows dirty reads of uncommitted updates.

Overall, its fine for quick & dirty results, but far from ideal for results that you want to act upon with a degree of confidence as you can get incorrect results from updates that get rolled back, or duplicate data from page splits

Using Snapshot isolation lets you query without the problems due to ‘read Uncommitted’ and with the benefits of increased concurrency. It allows data to be read consistently as it was at the start of the transaction, as previous versions of rows at that time are snapshotted in tempdb whilst updates take place.

In order to get consistency over a batch of reads from separate tables then you should wrap the statements in a transaction, even if you are only reading the data!

If you should need to prevent anyone from updating any table whilst you are running the transaction, then you can hint ‘with (UPDLOCK, ROWLOCK)’

If using Snapshot isolation when updating data, you need to be careful. The default ‘Read committed’ is like checking out a file  in source control so that no one else can change it, Snapshot is like branching & merging. You can get update conflicts, or you can get results that are just different to those you would get normally. Needs consideration & testing.

example of conflict error, under Read Committed one transaction would block the other.

Msg 3960, Level 16, State 2, Line 15 Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table ‘dbo.tbl’ directly or indirectly in database ‘yourdbname’ to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.

Pooled Connections

One thing to note is that Isolation levels can persist across statements in a pooled connection! So, if you specify a change to the default level then make sure to set back to default at the end of the statement to avoid inadvertently using snapshot level on a query where no level has been specified.

Table Hints

If you add table hints within a query eg with (nolock) then this will override the level set at statement level – so, be sure and remove these hints if you want to use snapshot isolation.

Conflict with Temp Tables

If you see an error message like this:

“Snapshot isolation transaction failed in database ‘tempdb’ because the object accessed by the statement has been modified by a DDL statement in another concurrent transaction since the start of this transaction.  It is disallowed because the metadata is not versioned.  A concurrent update to metadata can lead to inconsistency if mixed with snapshot isolation.”

it can have a couple of sources: firstly if you are rebuilding indexes or actually changing the source tables with a DDL statement, or less obviously, if the statement includes a temporary table with a primary key. Havent found a way around this apart from skipping the Primary key/index.

References

http://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/

http://www.brentozar.com/isolation-levels-sql-server/

Link

statistics io parser

statistics io parser

neat way to aggregate total IO costs from STATISTICS IO ON

perf tuning by resource usage

http://michaeljswart.com/2013/09/my-queries-for-top-20-resource-hogs/

this nicely aggregates queries across databases

useful where you might have similar databases on the same server

SQL2008+

Left outer Joins Vs Not Exists

Aaron Bertrand @ SQL Performance with a great comparison of methods of getting data where other data doesn’t exist

I’m in the habit of using left outer joins, but, as this post explains, you have to be careful when looking at nullable columns

the kicker is the performance gains to be had with NOT EXISTS & EXCEPT

http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

Link

Speed up Backup & Restore

Speed up Backup & Restore

make use of extra CPUs & RAM and backup & restore much faster

Link

Implicit Conversions

Implicit Conversions

awesome