Tuesday, August 6, 2013

SQL Server- User Defined Function to Parse HTML- Strip HTML -Without Using Regular Expression

This is a simple tip post. In this post I will show you a user defined function in SQL Server which parse the HTML and retrieve the Text only from it.

In previous posts, I explained Case Sensitive Search in SQL Server, Check Primary Key Existence in Table, Reset Identity Column in SQL Server, Insert Values into Identity column, Identity Column in SQL Server , STUFF Function, LEN Function, UNICODE Function, LEFT Function, CHARINDEX Function, Simple script to backup all SQL Server databases, Table-Valued Parameters and some other articles related to SQL ServerASP.Net, C#.

Following UDF takes the HTML as input and returns Text only from it. Please note you need to replace all the single quotes (if any) with two single quote (not double quote) in HTML before passing it as input to function.

CREATE FUNCTION [dbo].[udf_ParseHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT
    SET @Start = CHARINDEX('<',@HTMLText)
    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
    SET @Length = (@End - @Start) + 1
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1        
    END
    RETURN LTRIM(RTRIM(@HTMLText))
END
To test the above function, write the following SQl query-
SELECT dbo.udf_StripHTML1('<b>Dot Net World- </b><b><br><br><i>User Defined Function to Parse HTML</i></b>')
Output-
Above qquery will give the following output-
Parse HTML in SQL Server
I hope this will be helpful for you. I would like to have any feedback from you. Your valuable feedback, question, or comments about this article are always welcome.

No comments:

Post a Comment

^ Scroll to Top