Interface HTMLCleaner


  • @Role
    public interface HTMLCleaner
    Transforms any HTML content into valid XHTML that can then be fed, for example, to the XHTML Parser.
    Since:
    1.6M1
    Version:
    $Id: 7a5aea04496574c79f20aa94ecac9c1efc07d527 $
    • Method Detail

      • clean

        Document clean(Reader originalHtmlContent)
        Transforms any HTML content into valid XHTML that can then be fed, for example, to the XHTML Parser. The default configuration is applied when cleaning the original HTML (see getDefaultConfiguration()).
        Parameters:
        originalHtmlContent - the original content (HTML) to clean
        Returns:
        the cleaned HTML as a W3C DOM Document, which allows further transformations if needed
      • clean

        Document clean(Reader originalHtmlContent,
                       HTMLCleanerConfiguration configuration)
        Transforms any HTML content into valid XHTML. A specific cleaning configuration can be passed to control the cleaning process.
        Parameters:
        originalHtmlContent - the original HTML content to be cleaned.
        configuration - the configuration to use for cleaning the HTML content
        Returns:
        the cleaned HTML as a W3C DOM Document
        Since:
        1.8.1
      • getDefaultConfiguration

        HTMLCleanerConfiguration getDefaultConfiguration()
        Returns the default configuration used for cleaning, so that callers can customize it, for example by adding filters before or after the default ones, or by removing filters to fully control which filters are executed. This is intended for very specific use cases. In the majority of cases you should instead use the clean method that does not take a configuration.
        Returns:
        the default configuration that will be used to clean the original HTML
        Since:
        1.8.1
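
Both clean methods return a W3C DOM Document, so the result can be post-processed with the standard JDK XML APIs. The following is a minimal sketch of one such "further transformation": setting an attribute on the root element and serializing the tree back to a string. It uses only JDK classes. The class name CleanedDomExample and the helper parseWellFormed are hypothetical; parseWellFormed merely stands in for a Document obtained from HTMLCleaner.clean(...) and is not part of the XWiki API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CleanedDomExample {

    // Hypothetical stand-in for a Document returned by HTMLCleaner.clean(...).
    // It parses input that is ALREADY well-formed XHTML; it does no cleaning.
    static Document parseWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    // Serializes a DOM Document to a string using the JDK identity transformer.
    static String serialize(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Force XML output; otherwise a root element named "html" may
            // select the HTML output method.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseWellFormed("<html><body><p>Hello</p></body></html>");
        // Example of a further transformation on the DOM tree.
        document.getDocumentElement().setAttribute("lang", "en");
        System.out.println(serialize(document));
    }
}
```

In real code, the Document passed to serialize would come from one of the clean methods above rather than from parseWellFormed.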