Class URI


  • public class URI
    extends Object
    This class represents a generic URI, as defined in RFC-2396. This is similar to java.net.URL, with the following enhancements:
    • it doesn't require a URLStreamHandler to exist for the scheme; this allows this class to be used to hold any URI, construct absolute URIs from relative ones, etc.
    • it handles escapes correctly
    • equals() works correctly
    • relative URIs are correctly constructed
    • it has methods for accessing various fields such as userinfo, fragment, params, etc.
    • it handles less common forms of resources such as the "*" used in http URLs.

    The elements are always stored in escaped form.

    While RFC-2396 distinguishes between just two forms of URIs, those that follow the generic syntax and those that don't, this class knows about a third form, named semi-generic, used by quite a few popular schemes. Semi-generic syntax treats the path part as opaque, i.e. it has the form <scheme>://<authority>/<opaque> . Relative URIs of this type are only resolved as far as absolute paths; relative paths do not exist.

    Ideally, java.net.URL should subclass URI.

    Since:
    V0.3-1
    Version:
    0.3-3 06/05/2001
    Author:
    Ronald Tschalär
    See Also:
    rfc-2396
    • Field Detail

      • ENABLE_BACKWARDS_COMPATIBILITY

        public static final boolean ENABLE_BACKWARDS_COMPATIBILITY
        If true, then the parser will resolve certain URIs in a backwards-compatible (but technically incorrect) manner. Example:
         base   = http://a/b/c/d;p?q
         rel    = http:g
         result = http:g      (correct)
         result = http://a/b/c/g (backwards compatible)
         
        See rfc-2396, section 5.2, step 3, second paragraph.
        See Also:
        Constant Field Values
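        The strict result in the example above can be reproduced with the standard library's java.net.URI, which is used here only as a stand-in and is not the class documented on this page. The class name StrictResolutionSketch and the helper resolve are hypothetical. Because "http:g" carries a scheme, it counts as an absolute URI and is returned unchanged:

```java
import java.net.URI;

class StrictResolutionSketch {
    // Resolves rel against base with java.net.URI (a stand-in, not the class documented here).
    static String resolve(String base, String rel) {
        return URI.create(base).resolve(rel).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        // A reference with a scheme is absolute, so it is returned as given.
        System.out.println(resolve(base, "http:g")); // prints "http:g"
        // A reference without a scheme is merged with the base path.
        System.out.println(resolve(base, "g"));      // prints "http://a/b/c/g"
    }
}
```

        The backwards-compatible mode described above would instead treat "http:g" like the plain relative reference "g".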
      • defaultPorts

        protected static final Hashtable defaultPorts
      • usesGenericSyntax

        protected static final Hashtable usesGenericSyntax
      • usesSemiGenericSyntax

        protected static final Hashtable usesSemiGenericSyntax
      • alphanumChar

        protected static final BitSet alphanumChar
      • markChar

        protected static final BitSet markChar
      • reservedChar

        protected static final BitSet reservedChar
      • unreservedChar

        protected static final BitSet unreservedChar
      • uricChar

        protected static final BitSet uricChar
      • pcharChar

        protected static final BitSet pcharChar
      • userinfoChar

        protected static final BitSet userinfoChar
      • schemeChar

        protected static final BitSet schemeChar
      • hostChar

        protected static final BitSet hostChar
      • opaqueChar

        protected static final BitSet opaqueChar
      • reg_nameChar

        protected static final BitSet reg_nameChar
      • resvdSchemeChar

        public static final BitSet resvdSchemeChar
        list of characters which must not be unescaped when unescaping a scheme
      • resvdUIChar

        public static final BitSet resvdUIChar
        list of characters which must not be unescaped when unescaping a userinfo
      • resvdHostChar

        public static final BitSet resvdHostChar
        list of characters which must not be unescaped when unescaping a host
      • resvdPathChar

        public static final BitSet resvdPathChar
        list of characters which must not be unescaped when unescaping a path
      • resvdQueryChar

        public static final BitSet resvdQueryChar
        list of characters which must not be unescaped when unescaping a query string
      • escpdPathChar

        public static final BitSet escpdPathChar
        list of characters which must not be escaped when escaping a path
      • escpdQueryChar

        public static final BitSet escpdQueryChar
        list of characters which must not be escaped when escaping a query string
      • escpdFragChar

        public static final BitSet escpdFragChar
        list of characters which must not be escaped when escaping a fragment identifier
      • type

        protected int type
      • scheme

        protected String scheme
      • opaque

        protected String opaque
      • userinfo

        protected String userinfo
      • port

        protected int port
      • query

        protected String query
      • fragment

        protected String fragment
      • url

        protected URL url
    • Constructor Detail

      • URI

        public URI​(String uri)
            throws ParseException
        Constructs a URI from the given string representation. The string must be an absolute URI.
        Parameters:
        uri - a String containing an absolute URI
        Throws:
        ParseException - if no scheme can be found or a specified port cannot be parsed as a number
      • URI

        public URI​(URI base,
                   String rel_uri)
            throws ParseException
        Constructs a URI from the given string representation, relative to the given base URI.
        Parameters:
        base - the base URI, relative to which rel_uri is to be parsed
        rel_uri - a String containing a relative or absolute URI
        Throws:
        ParseException - if base is null and rel_uri is not an absolute URI, or if base is not null and the scheme is not known to use the generic syntax, or if a given port cannot be parsed as a number
      • URI

        public URI​(URL url)
            throws ParseException
        Construct a URI from the given URL.
        Parameters:
        url - the URL
        Throws:
        ParseException - if url.toExternalForm() generates an invalid string representation
      • URI

        public URI​(String scheme,
                   String host,
                   String path)
            throws ParseException
        Constructs a URI from the given parts, using the default port for this scheme (if known). The parts must be in unescaped form.
        Parameters:
        scheme - the scheme (sometimes known as protocol)
        host - the host
        path - the path part
        Throws:
        ParseException - if scheme is null
      • URI

        public URI​(String scheme,
                   String host,
                   int port,
                   String path)
            throws ParseException
        Constructs a URI from the given parts. The parts must be in unescaped form.
        Parameters:
        scheme - the scheme (sometimes known as protocol)
        host - the host
        port - the port
        path - the path part
        Throws:
        ParseException - if scheme is null
      • URI

        public URI​(String scheme,
                   String userinfo,
                   String host,
                   int port,
                   String path,
                   String query,
                   String fragment)
            throws ParseException
        Constructs a URI from the given parts. Any part except for the scheme may be null. The parts must be in unescaped form.
        Parameters:
        scheme - the scheme (sometimes known as protocol)
        userinfo - the userinfo
        host - the host
        port - the port
        path - the path part
        query - the query string
        fragment - the fragment identifier
        Throws:
        ParseException - if scheme is null
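        As a rough illustration of building a URI from unescaped parts, the sketch below uses the seven-argument constructor of the standard library's java.net.URI, which quotes illegal characters while assembling the string. It is a stand-in, not this class: java.net.URI treats a port of -1 as "undefined" and throws URISyntaxException rather than ParseException. The names FromPartsSketch and build are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FromPartsSketch {
    // Assembles a URI string from unescaped parts; illegal characters such as
    // the space in the path are percent-escaped by java.net.URI.
    static String build(String scheme, String userinfo, String host, int port,
                        String path, String query, String fragment) {
        try {
            return new URI(scheme, userinfo, host, port, path, query, fragment).toString();
        } catch (URISyntaxException e) {
            // Wrapped so that callers of this sketch need no checked exception handling.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints "http://example.com:8080/a%20b?x=1#top"
        System.out.println(build("http", null, "example.com", 8080, "/a b", "x=1", "top"));
    }
}
```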
      • URI

        public URI​(String scheme,
                   String opaque)
            throws ParseException
        Constructs an opaque URI from the given parts.
        Parameters:
        scheme - the scheme (sometimes known as protocol)
        opaque - the opaque part
        Throws:
        ParseException - if scheme is null
    • Method Detail

      • canonicalizePath

        public static String canonicalizePath​(String path)
        Removes all "/../" and "/./" sequences from path, where possible. Leading "/../" sequences are not removed.
        Parameters:
        path - the path to canonicalize
        Returns:
        the canonicalized path
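        One way to picture this canonicalization is a segment stack: "." is dropped, ".." cancels the previous segment, and a ".." with nothing left to cancel is kept. The sketch below is an independent illustration under those assumptions, not this class's implementation; its handling of edge cases (empty segments, a trailing "." or "..") is a guess modelled on RFC-2396 section 5.2. The name CanonicalizePathSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CanonicalizePathSketch {
    static String canonicalize(String path) {
        String[] segs = path.split("/", -1);    // limit -1 keeps trailing empty segments
        boolean absolute = path.startsWith("/");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < segs.length; i++) {
            String seg = segs[i];
            boolean last = (i == segs.length - 1);
            if (seg.equals(".")) {
                if (last) out.add("");           // "/a/." becomes "/a/"
            } else if (seg.equals("..")) {
                int top = out.size() - 1;
                // For an absolute path, index 0 holds the empty segment before the first "/".
                boolean atRoot = absolute && top == 0;
                if (top >= 0 && !atRoot && !out.get(top).equals("..")) {
                    out.remove(top);             // drop the segment that ".." cancels
                    if (last) out.add("");       // "/a/b/.." becomes "/a/"
                } else {
                    out.add("..");               // nothing to cancel: keep the leading ".."
                }
            } else {
                out.add(seg);
            }
        }
        return String.join("/", out);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("/a/b/../c/./d")); // prints "/a/c/d"
        System.out.println(canonicalize("/../a"));         // prints "/../a"
    }
}
```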
      • usesGenericSyntax

        public static boolean usesGenericSyntax​(String scheme)
        Returns:
        true if the scheme should be parsed according to the generic-URI syntax
      • usesSemiGenericSyntax

        public static boolean usesSemiGenericSyntax​(String scheme)
        Returns:
        true if the scheme should be parsed according to a semi-generic-URI syntax <scheme>://<hostport>/<opaque>
      • defaultPort

        public static final int defaultPort​(String protocol)
        Return the default port used by a given protocol.
        Parameters:
        protocol - the protocol
        Returns:
        the port number, or 0 if unknown
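        The contract (a table lookup that yields 0 for unknown protocols) can be sketched as follows. The table contents and the case-insensitive lookup are assumptions made for illustration; the actual entries of this class's defaultPorts table are not listed here. The name DefaultPortSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DefaultPortSketch {
    // Illustrative entries only; the real table may hold more or different schemes.
    private static final Map<String, Integer> PORTS = new HashMap<>();
    static {
        PORTS.put("http", 80);
        PORTS.put("https", 443);
        PORTS.put("ftp", 21);
    }

    // Returns the well-known port for the protocol, or 0 if it is not in the table.
    static int defaultPort(String protocol) {
        // Lower-casing the key is an assumption of this sketch.
        return PORTS.getOrDefault(protocol.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultPort("http"));    // prints 80
        System.out.println(defaultPort("unknown")); // prints 0
    }
}
```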
      • getScheme

        public String getScheme()
        Returns:
        the scheme (often also referred to as protocol)
      • getOpaque

        public String getOpaque()
        Returns:
        the opaque part, or null if this URI is generic
      • getHost

        public String getHost()
        Returns:
        the host
      • getPort

        public int getPort()
        Returns:
        the port, or -1 if it's the default port, or 0 if unknown
      • getUserinfo

        public String getUserinfo()
        Returns:
        the user info
      • getPath

        public String getPath()
        Returns:
        the path
      • getQueryString

        public String getQueryString()
        Returns:
        the query string
      • getPathAndQuery

        public String getPathAndQuery()
        Returns:
        the path and query
      • getFragment

        public String getFragment()
        Returns:
        the fragment
      • isGenericURI

        public boolean isGenericURI()
        Does the scheme specific part of this URI use the generic-URI syntax?

        In general, URIs are split into two categories: opaque-URI and generic-URI. The generic-URI syntax is the syntax most are familiar with from URLs such as ftp- and http-URLs, which is roughly:

         generic-URI = scheme ":" [ "//" server ] [ "/" ] [ path_segments ] [ "?" query ]
         
        (see RFC-2396 for exact syntax). Only URLs using the generic-URI syntax can be used to create and resolve relative URIs.

        Whether a given scheme is parsed according to the generic-URI syntax or whether it is treated as opaque is determined by an internal table of URI schemes.

        See Also:
        rfc-2396
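        The generic/opaque split can be observed with the standard library's java.net.URI, shown here as a stand-in only. Note one difference: java.net.URI decides from the syntax of the string (an absolute URI whose scheme-specific part does not begin with "/" is opaque), whereas this class consults a table of schemes. The name OpaqueCheckSketch is hypothetical.

```java
import java.net.URI;

class OpaqueCheckSketch {
    // True if java.net.URI classifies the given absolute URI as opaque.
    static boolean isOpaque(String uri) {
        return URI.create(uri).isOpaque();
    }

    public static void main(String[] args) {
        System.out.println(isOpaque("http://example.com/a/b?x=1")); // prints "false"
        System.out.println(isOpaque("mailto:user@example.com"));    // prints "true"
    }
}
```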
      • isSemiGenericURI

        public boolean isSemiGenericURI()
        Does the scheme specific part of this URI use the semi-generic-URI syntax?

        Many schemes which don't follow the full generic syntax actually follow a reduced form where the path part is treated as opaque. This is used for example by ldap, smtp, pop, etc., and is roughly

         semi-generic-URI = scheme ":" [ "//" server ] [ "/" [ opaque_path ] ]
         
        I.e. parsing is identical to the generic-syntax, except that the path part is not further parsed. URLs using the semi-generic-URI syntax can be used to create and resolve relative URIs with the restriction that all paths are treated as absolute.

        Whether a given scheme is parsed according to the semi-generic-URI syntax is determined by an internal table of URI schemes.

        See Also:
        isGenericURI()
      • toExternalForm

        public String toExternalForm()
        Returns:
        a string representation of this URI suitable for use in links, headers, etc.
      • toString

        public String toString()
        Returns the URI as a string. This differs from toExternalForm() in that all elements are unescaped before assembly. The result is not suitable for passing to other applications or for use in header fields, and is usually not what you want.
        Overrides:
        toString in class Object
        Returns:
        the URI as a string
        See Also:
        toExternalForm()
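        Why the unescaped form is risky can be shown with java.net.URI from the standard library (a stand-in, not this class), which exposes both the escaped and the unescaped path. Once "%2F" is decoded it can no longer be told apart from a real path separator. The name EscapedFormSketch and its helpers are hypothetical.

```java
import java.net.URI;

class EscapedFormSketch {
    // Path exactly as written in the URI, escapes intact.
    static String escapedPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Path with all %xx sequences decoded.
    static String unescapedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "http://example.com/a%20b%2Fc";
        System.out.println(escapedPath(uri));   // prints "/a%20b%2Fc"
        System.out.println(unescapedPath(uri)); // prints "/a b/c"
    }
}
```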
      • equals

        public boolean equals​(Object other)
        Overrides:
        equals in class Object
        Returns:
        true if other is either a URI or URL and it matches the current URI
      • hashCode

        public int hashCode()
        The hash code is calculated over scheme, host, path, and query.
        Overrides:
        hashCode in class Object
        Returns:
        the hash code
      • escape

        public static String escape​(String elem,
                                    BitSet allowed_char,
                                    boolean utf8)
        Escape any character not in the given character class. Characters greater than 255 are always escaped according to ??? .
        Parameters:
        elem - the string to escape
        allowed_char - the BitSet of all allowed characters
        utf8 - if true, disallowed characters are first UTF-8 encoded
        Returns:
        the string with all characters not in allowed_char escaped
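        A minimal sketch of this kind of escaping, covering only the case that corresponds to utf8 being true: every character outside the allowed set is UTF-8 encoded and each resulting byte is written as %XX. How this class treats characters above 255 when utf8 is false is not specified above, so the sketch does not attempt it. The names EscapeSketch and alphanum are hypothetical, and the use of uppercase hex digits is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class EscapeSketch {
    // Percent-escapes every code point that is not set in 'allowed'.
    static String escape(String elem, BitSet allowed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elem.length(); ) {
            int cp = elem.codePointAt(i);
            i += Character.charCount(cp);
            if (allowed.get(cp)) {
                sb.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        return sb.toString();
    }

    // ASCII letters and digits; a small stand-in for a character class such as alphanumChar.
    static BitSet alphanum() {
        BitSet set = new BitSet(128);
        set.set('a', 'z' + 1);
        set.set('A', 'Z' + 1);
        set.set('0', '9' + 1);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(escape("a b/c", alphanum())); // prints "a%20b%2Fc"
    }
}
```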
      • escape

        public static char[] escape​(char[] elem,
                                    BitSet allowed_char,
                                    boolean utf8)
        Escape any character not in the given character class. Characters greater than 255 are always escaped according to ??? .
        Parameters:
        elem - the array of characters to escape
        allowed_char - the BitSet of all allowed characters
        utf8 - if true, disallowed characters are first UTF-8 encoded
        Returns:
        the elem array with all characters not in allowed_char escaped
      • unescape

        public static final String unescape​(String str,
                                            BitSet reserved)
                                     throws ParseException
        Unescape escaped characters (i.e. %xx) except reserved ones.
        Parameters:
        str - the string to unescape
        reserved - the characters which may not be unescaped, or null
        Returns:
        the unescaped string
        Throws:
        ParseException - if the two digits following a `%' are not a valid hex number
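        The role of the reserved set can be sketched as follows: a %xx sequence is decoded unless the character it stands for is in reserved, in which case the escape is copied through untouched. This is an illustration under assumptions, not this class's code: it throws IllegalArgumentException where the documented method throws ParseException, and it maps each decoded byte directly to a char. The name UnescapeSketch is hypothetical.

```java
import java.util.BitSet;

class UnescapeSketch {
    static String unescape(String str, BitSet reserved) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '%') {
                sb.append(c);
                continue;
            }
            if (i + 2 >= str.length()) {
                throw new IllegalArgumentException("incomplete escape at index " + i);
            }
            int hi = Character.digit(str.charAt(i + 1), 16);
            int lo = Character.digit(str.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digits at index " + i);
            }
            int value = hi * 16 + lo;
            if (reserved != null && reserved.get(value)) {
                sb.append(str, i, i + 3);   // keep the escape for reserved characters
            } else {
                sb.append((char) value);    // decode everything else
            }
            i += 2;                         // skip the two hex digits just consumed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet reserved = new BitSet();
        reserved.set('/');
        System.out.println(unescape("a%20b%2Fc", reserved)); // prints "a b%2Fc"
        System.out.println(unescape("a%20b%2Fc", null));     // prints "a b/c"
    }
}
```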
      • main

        public static void main​(String[] args)
                         throws Exception
        Run test set.
        Throws:
        Exception - if any test fails