How to: Identify Hyperlinks in an HTML String in Visual Basic

This example demonstrates a simple regular expression for identifying hyperlinks in an HTML document.

Example

This example uses the regular expression <A[^>]*?HREF\s*=\s*"([^"]+)"[^>]*?>([\s\S]*?)<\/A>, which means:

  1. The string "<A", followed by

  2. The smallest set of zero or more characters that does not include the character ">", followed by

  3. The string "HREF", followed by

  4. Zero or more white-space characters, followed by

  5. The character "=", followed by

  6. Zero or more white-space characters, followed by

  7. The quotation-mark character, followed by

  8. One or more characters other than the quotation-mark character (captured as group 1), followed by

  9. The quotation-mark character, followed by

  10. The smallest set of zero or more characters that does not include the character ">", followed by

  11. The character ">", followed by

  12. The smallest set of zero or more characters of any kind, including line breaks (captured as group 2), followed by

  13. The string "</A>".
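
To make the breakdown above concrete, the following sketch applies the same pattern to two sample strings. The module name PatternDemo, the constant HrefPattern, and the example.com URLs are illustrative assumptions, not part of the original example. It suggests that, with case-insensitive matching, the pattern accepts a lowercase tag, extra attributes, spaces around the equal sign, and a label that spans two lines, but does not accept an attribute value in single quotes.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Same pattern as in the text; HrefPattern is a name chosen for this sketch.
    Public Const HrefPattern As String =
        "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>"

    Sub Main()
        ' Lowercase tag, an extra attribute, spaces around "=",
        ' and a label that spans two lines.
        Dim sample As String =
            "<a class=""nav"" href = ""https://example.com/docs"">Read" &
            vbCrLf & "the docs</a>"

        Dim m As Match = Regex.Match(sample, HrefPattern, RegexOptions.IgnoreCase)
        Console.WriteLine(m.Success)          ' Whether the link was recognized
        Console.WriteLine(m.Groups(1).Value)  ' Destination (step 8)
        Console.WriteLine(m.Groups(2).Value)  ' Label (step 12)

        ' Single-quoted attribute value: steps 7 and 9 require the
        ' quotation-mark character, so this input is not matched.
        Dim singleQuoted As String = "<a href='https://example.com'>Home</a>"
        Console.WriteLine(
            Regex.IsMatch(singleQuoted, HrefPattern, RegexOptions.IgnoreCase))
    End Sub
End Module
```

Because the pattern is deliberately simple, links written with single quotes or without quotes are skipped; a full HTML parser is likely a better fit when the input is not under your control.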

The Regex object is initialized with the regular expression and the RegexOptions.IgnoreCase option, so that matching is case-insensitive.

The Regex object's Matches method returns a MatchCollection object that contains information about all the parts of the input string that the regular expression matches.

    ''' <summary>Identifies hyperlinks in HTML text.</summary>
    ''' <param name="htmlText">HTML text to parse.</param>
    ''' <remarks>This method displays the label and destination for
    ''' each link in the input text.</remarks>
    Sub IdentifyLinks(ByVal htmlText As String)
        ' Group 1 captures the HREF value; group 2 captures the link text.
        Dim hrefRegex As New Regex(
            "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>",
            RegexOptions.IgnoreCase)
        Dim output As String = ""
        ' Append the label and destination of every match to the output.
        For Each m As Match In hrefRegex.Matches(htmlText)
            output &= "Link label: " & m.Groups(2).Value & vbCrLf
            output &= "Link destination: " & m.Groups(1).Value & vbCrLf
        Next
        ' Show all links found in a single message box.
        MsgBox(output)
    End Sub

This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement (.NET Namespace and Type).
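
As a rough illustration of where the Imports statement goes, the following sketch shows a complete console program built around the same regular expression. The module name LinkDemo, the helper FindLinks, and the sample HTML are hypothetical names introduced here; the helper returns the matches as a list instead of calling MsgBox, which is a design choice of this sketch rather than of the original example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkDemo
    ' Hypothetical helper: returns one pair per link, with the destination
    ' as the key and the label as the value.
    Function FindLinks(ByVal htmlText As String) _
        As List(Of KeyValuePair(Of String, String))

        Dim hrefRegex As New Regex(
            "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>",
            RegexOptions.IgnoreCase)
        Dim links As New List(Of KeyValuePair(Of String, String))
        For Each m As Match In hrefRegex.Matches(htmlText)
            links.Add(New KeyValuePair(Of String, String)(
                m.Groups(1).Value, m.Groups(2).Value))
        Next
        Return links
    End Function

    Sub Main()
        ' Sample input (assumed for illustration): two links, one lowercase
        ' and one uppercase.
        Dim html As String =
            "<p><a href=""https://example.com/a"">First</a> and " &
            "<A HREF=""https://example.com/b"">Second</A></p>"

        For Each link As KeyValuePair(Of String, String) In FindLinks(html)
            Console.WriteLine("Link label: " & link.Value)
            Console.WriteLine("Link destination: " & link.Key)
        Next
    End Sub
End Module
```

Returning the results rather than displaying them keeps the matching logic usable from code that has no user interface.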

See also

Concepts

Example: Scanning for HREFs

Other resources

Parsing Strings in Visual Basic