Below is a function that can be used to strip HTML from a string. It works by looping through the string until it meets an opening HTML tag, and ignores everyting until it meets a closing tag.
Finally it replaces the HTML characters before returning the result.
Create function SS_StripHTML(@html varchar(max)) returns varchar(max) as Begin declare @p int declare @nonspace bit declare @istag bit declare @chr varchar(1) declare @result varchar(max) set @p=1 set @istag=0 set @result='' set @nonspace=0
--Tidy up the Raw HTML before we start, not interested in carriage returns, --but we want to keep the <br> for later. set @html=replace(@html,char(13)+char(10),'') set @html=replace(@html,'<br>','{&&BR&&}')
loop around the text while @p<len(@html) begin set @chr=substring(@html,@p,1)
--is it the start of a HTML tag, if so do nothing until we meet a closing tag if @chr='<' set @istag=1 --is it a closing tag if @chr='>' begin set @istag=0 set @nonspace=0 end else begin --is it space, because HTML can contain lots of blank space between tags we --aren’t interested in, especially if its hand written HTML code. if ascii(@chr)>32 set @nonspace=1 --if its not space set a flag and from then on we can collect characters for --the result until we meet another HTML tag. if @istag=0 and @nonspace=1 set @result=@result+@chr end set @p=@p+1 end
--clean up the code and replace some of the HTML codes --string concats are to prevent errors displaying in this page but codes --can go into one string. set @result=replace(@result,'&nbs'+'p;',' '); set @result=replace(@result,'&am'+'p;','&'); set @result=replace(@result,'&l'+'t;','<'); set @result=replace(@result,'&g'+'t;','>'); set @result=replace(@result,'&quo'+'t;','"'); --add our line feeds back in set @result=replace(@result,'{&&BR&&}',char(13)+char(10));
return @result
end
To call the function use Select dbo.SS_StripHTML('<b>This is an example</br>')
|