As a follow-up to my previous two articles (Complex grouping of strings in T-SQL and Working with delimited strings in T-SQL),
Requires Free Membership to View

Method #1: Using a table of ASCII code values
This method removes all characters other than [a to z] , [A to Z] and [0 to 9]. In other words, it removes all symbols ("#?&/() and so on. You can do this by creating a table with all bad ASCII code values and cycling through a string, replacing the bad codes with an empty string.
--Create table with all bad codes DECLARE @mycode INT CREATE TABLE #badcodes(badcode INT) SET @mycode=33 WHILE @mycode<=255 BEGIN IF (@mycode BETWEEN 33 AND 47) OR (@mycode BETWEEN 58 AND 64) OR (@mycode BETWEEN 58 AND 64) OR (@mycode BETWEEN 91 AND 96) OR (@mycode BETWEEN 123 AND 255) INSERT INTO #BADCODES VALUES(@mycode) SET @mycode=@mycode+1 END GO --strip out the characters DECLARE @test VARCHAR(100) SET @test='se+*ar%c&h;^da#tab~se' UPDATE #badcodes SET @test=REPLACE(@test,CHAR(badcode),'') PRINT @test
Method #2: Using specified patterns
For more info, check out SQL Server books online for help on "PATINDEX" and "STUFF".
DECLARE @pos INT DECLARE @myString VARCHAR(20) SET @myString='se+*ar%c&h;^da#tab~se' SET @pos = PATINDEX('%[^a-z0-9_]%',@myString) -- sets @pos to the position in @myString of the first character -- not (^) equal to a letter (a-z), a digit (0-9) or an underscore -- or sets @pos to 0 if no such character exists in @myString (in which -- case you are done) WHILE @pos > 0 BEGIN SET @myString = STUFF(@myString,@pos,1,'') -- overwrites the non-alphanumeric character at position @pos with '' SET @pos = PATINDEX('%[^a-z0-9_]%',@myString) -- looks for the next non-alphanumeric character END SELECT @myString
An example: how to replace non-alphanumeric characters from a column
Let's create a test table first:
--//////////////////////////////// Create table tblTest(col1 varchar(20),col2 varchar(20)) insert into tblTest values('one$','00#7') insert into tblTest values('tw*o','%420') insert into tblTest values('thre@e','11&1') --select * from tblTest col1 col2 -------------------- -------------------- one$ 00#7 tw*o %420 thre@e 11&1 --////////////////////////////////
Add one new column and insert the values by appending existing columns.
--//////////////////////////////// ALTER TABLE tblTest ADD Newcol varchar(40) update tblTest set Newcol=col1+col2 --select * from tblTest col1 col2 Newcol -------------------- -------------------- --------------- one$ 00#7 one$00#7 tw*o %420 tw*o%420 thre@e 11&1 thre@e11&1 --////////////////////////////////
Here's the solution. You can use one of the above two methods to remove non-alphanumeric characters from a table column. I have choosen Mehtod #2 to solve this example problem. You have to add row numbers to the table to loop through the table and update.
-- Add row number to each row SELECT IDENTITY(int,1,1) AS ID, * INTO #tblAlpha FROM tblTest --select * from #tblAlpha --update new column DECLARE @MaxRank int DECLARE @Counter int declare @pos int Declare @s varchar(100) Set nocount on SELECT @MaxRank = Max(ID) From #tblAlpha SET @Counter = 1 WHILE @Counter <= @MaxRank BEGIN select @s=newcol from #tblAlpha where id=@Counter set @pos = patindex('%[^a-z0-9_]%',@s) while @pos > 0 begin set @s = stuff(@s,@pos,1,'') set @pos = patindex('%[^a-z0-9_]%',@s) end update #tblAlpha set newcol=@s where id=@Counter SET @Counter = @Counter +1 END truncate table tbltest insert into tbltest select col1,col2,newcol from #tblAlpha drop table #tblAlpha select * from tbltest
Reader Feedback
Israel P. writes: This took over 24 hrs to run on my machine so it doesn't work for my purposes. I found this code much more effective ( it took only 35 mins to run):
while @@rowcount > 0 update user_list_original set fname = replace(fname, substring(fname, patindex('%[^a-zA-Z ]%', fname), 1), '') where patindex('%[^a-zA-Z ]%', fname) <> 0
Is there anything missing by using this? I have a table with millions of records that I need to 'clean'. Any pointers for cleaning data (emails, addresses, etc.)?
For More Information
- What do you think about this tip? E-mail the Editor at [email protected] with your feedback.
- The Best SQL Server Web Links: tips, tutorials, scripts, and more.
- Have an SQL Server tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical SQL Server questions--or help out your peers by answering them--in our live discussion forums.
- Check out our Ask the Experts feature: Our SQL, Database Design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.
This was first published in June 2002
Join the conversationComment
Share
Comments
Results
Contribute to the conversation