java.lang.Object
org.exoplatform.common.http.client.URI

public class URI extends Object
This class represents a generic URI, as defined in RFC-2396. This is similar to java.net.URL, with the following enhancements:
  • it doesn't require a URLStreamHandler to exist for the scheme; this allows this class to be used to hold any URI, construct absolute URIs from relative ones, etc.
  • it handles escapes correctly
  • equals() works correctly
  • relative URIs are correctly constructed
  • it has methods for accessing various fields such as userinfo, fragment, params, etc.
  • it handles less common forms of resources such as the "*" used in http URLs.

The elements are always stored in escaped form.

While RFC-2396 distinguishes between just two forms of URIs, those that follow the generic syntax and those that don't, this class knows about a third form, named semi-generic, used by quite a few popular schemes. Semi-generic syntax treats the path part as opaque, i.e. it has the form <scheme>://<authority>/<opaque>. Relative URIs of this type are resolved only as far as absolute paths; relative paths do not exist.

Ideally, java.net.URL should subclass URI.

Since:
V0.3-1
Version:
0.3-3 06/05/2001
Author:
Ronald Tschalär
  • Field Details

    • ENABLE_BACKWARDS_COMPATIBILITY

      public static final boolean ENABLE_BACKWARDS_COMPATIBILITY
      If true, then the parser will resolve certain URIs in a backwards-compatible (but technically incorrect) manner. Example:
       base   = http://a/b/c/d;p?q
       rel    = http:g
       result = http:g      (correct)
       result = http://a/b/c/g (backwards compatible)
       
      See RFC-2396, section 5.2, step 3, second paragraph.
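The rule above can be illustrated with java.net.URI (standard library, not this class), which resolves references strictly. A minimal sketch, assuming the backwards-compatible mode simply drops a relative reference's scheme when it matches the base scheme; the class and method names are hypothetical and this is not the library's implementation:

```java
import java.net.URI;

// Hypothetical helper illustrating RFC-2396, section 5.2, step 3.
final class CompatResolveSketch {
    /** Resolves rel against base, optionally stripping a redundant same-scheme prefix. */
    static String resolve(String base, String rel, boolean backwardsCompatible) {
        URI baseUri = URI.create(base);
        String relScheme = URI.create(rel).getScheme();
        String ref = rel;
        // Assumed compat behavior: "http:g" against an http base is treated
        // as the relative reference "g" (technically incorrect, but common).
        if (backwardsCompatible && relScheme != null
                && relScheme.equalsIgnoreCase(baseUri.getScheme())) {
            ref = rel.substring(relScheme.length() + 1);
        }
        // Strict mode: "http:g" is already absolute, so it is returned unchanged.
        return baseUri.resolve(ref).toString();
    }
}
```

With base http://a/b/c/d;p?q and rel http:g, strict mode yields http:g and compat mode yields http://a/b/c/g, matching the two results listed above.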
    • defaultPorts

      protected static final Hashtable defaultPorts
    • usesGenericSyntax

      protected static final Hashtable usesGenericSyntax
    • usesSemiGenericSyntax

      protected static final Hashtable usesSemiGenericSyntax
    • alphanumChar

      protected static final BitSet alphanumChar
    • markChar

      protected static final BitSet markChar
    • reservedChar

      protected static final BitSet reservedChar
    • unreservedChar

      protected static final BitSet unreservedChar
    • uricChar

      protected static final BitSet uricChar
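For orientation, the basic RFC-2396 character classes (section 2) could be built as BitSets as sketched below. The field names mirror the ones listed here, but their exact contents in this class are an assumption, and the holder class name is hypothetical:

```java
import java.util.BitSet;

// Hypothetical holder; contents follow RFC-2396, section 2.
final class CharClassSketch {
    static final BitSet alphanumChar = new BitSet(128);
    static final BitSet markChar = new BitSet(128);
    static final BitSet reservedChar = new BitSet(128);
    static final BitSet unreservedChar = new BitSet(128);
    static final BitSet uricChar = new BitSet(128);

    static {
        for (char c = 'a'; c <= 'z'; c++) alphanumChar.set(c);
        for (char c = 'A'; c <= 'Z'; c++) alphanumChar.set(c);
        for (char c = '0'; c <= '9'; c++) alphanumChar.set(c);
        for (char c : "-_.!~*'()".toCharArray()) markChar.set(c);     // mark
        for (char c : ";/?:@&=+$,".toCharArray()) reservedChar.set(c); // reserved
        unreservedChar.or(alphanumChar);                                // unreserved = alphanum | mark
        unreservedChar.or(markChar);
        uricChar.or(reservedChar);                                      // uric = reserved | unreserved | escaped
        uricChar.or(unreservedChar);
        uricChar.set('%'); // simplification: '%' is only legal as the start of a %xx escape
    }
}
```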
    • pcharChar

      protected static final BitSet pcharChar
    • userinfoChar

      protected static final BitSet userinfoChar
    • schemeChar

      protected static final BitSet schemeChar
    • hostChar

      protected static final BitSet hostChar
    • opaqueChar

      protected static final BitSet opaqueChar
    • reg_nameChar

      protected static final BitSet reg_nameChar
    • resvdSchemeChar

      public static final BitSet resvdSchemeChar
      list of characters which must not be unescaped when unescaping a scheme
    • resvdUIChar

      public static final BitSet resvdUIChar
      list of characters which must not be unescaped when unescaping a userinfo
    • resvdHostChar

      public static final BitSet resvdHostChar
      list of characters which must not be unescaped when unescaping a host
    • resvdPathChar

      public static final BitSet resvdPathChar
      list of characters which must not be unescaped when unescaping a path
    • resvdQueryChar

      public static final BitSet resvdQueryChar
      list of characters which must not be unescaped when unescaping a query string
    • escpdPathChar

      public static final BitSet escpdPathChar
      list of characters which must not be escaped when escaping a path
    • escpdQueryChar

      public static final BitSet escpdQueryChar
      list of characters which must not be escaped when escaping a query string
    • escpdFragChar

      public static final BitSet escpdFragChar
      list of characters which must not be escaped when escaping a fragment identifier
    • OPAQUE

      protected static final int OPAQUE
    • SEMI_GENERIC

      protected static final int SEMI_GENERIC
    • GENERIC

      protected static final int GENERIC
    • type

      protected int type
    • scheme

      protected String scheme
    • opaque

      protected String opaque
    • userinfo

      protected String userinfo
    • host

      protected String host
    • port

      protected int port
    • path

      protected String path
    • query

      protected String query
    • fragment

      protected String fragment
    • url

      protected URL url
  • Constructor Details

    • URI

      public URI(String uri) throws ParseException
      Constructs a URI from the given string representation. The string must be an absolute URI.
      Parameters:
      uri - a String containing an absolute URI
      Throws:
      ParseException - if no scheme can be found or a specified port cannot be parsed as a number
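A minimal sketch of the first parsing step, using the component-splitting regular expression from RFC-2396, Appendix B, and rejecting input without a scheme. The class name is hypothetical, IllegalArgumentException stands in for the ParseException thrown by the real constructor, and the regex only splits the string; it does not validate characters or ports:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical splitter based on RFC-2396, Appendix B.
final class AbsoluteUriSketch {
    private static final Pattern RFC2396 =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent parts are null. */
    static String[] split(String uri) {
        Matcher m = RFC2396.matcher(uri);
        if (!m.matches() || m.group(2) == null) {
            // The real constructor reports this as a ParseException.
            throw new IllegalArgumentException("no scheme found in: " + uri);
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }
}
```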
    • URI

      public URI(URI base, String rel_uri) throws ParseException
      Constructs a URI from the given string representation, relative to the given base URI.
      Parameters:
      base - the base URI, relative to which rel_uri is to be parsed
      rel_uri - a String containing a relative or absolute URI
      Throws:
      ParseException - if base is null and rel_uri is not an absolute URI, or if base is not null and the scheme is not known to use the generic syntax, or if a given port cannot be parsed as a number
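Relative resolution follows RFC-2396, section 5.2. java.net.URI (standard library, not this class) implements RFC-2396 resolution too, so it can be used to check a few of the RFC's Appendix C examples against the base http://a/b/c/d;p?q. The wrapper class is hypothetical:

```java
import java.net.URI;

// Hypothetical wrapper around java.net.URI.resolve(String).
final class RelativeResolveSketch {
    static String resolve(String base, String rel) {
        return URI.create(base).resolve(rel).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        for (String rel : new String[] { "g", "../g", "/g", "g?y", "//g" }) {
            System.out.println(rel + " -> " + resolve(base, rel));
        }
    }
}
```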
    • URI

      public URI(URL url) throws ParseException
      Construct a URI from the given URL.
      Parameters:
      url - the URL
      Throws:
      ParseException - if url.toExternalForm() generates an invalid string representation
    • URI

      public URI(String scheme, String host, String path) throws ParseException
      Constructs a URI from the given parts, using the default port for this scheme (if known). The parts must be in unescaped form.
      Parameters:
      scheme - the scheme (sometimes known as protocol)
      host - the host
      path - the path part
      Throws:
      ParseException - if scheme is null
    • URI

      public URI(String scheme, String host, int port, String path) throws ParseException
      Constructs a URI from the given parts. The parts must be in unescaped form.
      Parameters:
      scheme - the scheme (sometimes known as protocol)
      host - the host
      port - the port
      path - the path part
      Throws:
      ParseException - if scheme is null
    • URI

      public URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment) throws ParseException
      Constructs a URI from the given parts. Any part except for the scheme may be null. The parts must be in unescaped form.
      Parameters:
      scheme - the scheme (sometimes known as protocol)
      userinfo - the userinfo
      host - the host
      port - the port
      path - the path part
      query - the query string
      fragment - the fragment identifier
      Throws:
      ParseException - if scheme is null
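java.net.URI offers a comparable seven-argument constructor that also takes unescaped parts and quotes illegal characters itself. A sketch using it; the wrapper is hypothetical, and note that java.net.URI uses -1 (not a scheme default) to mean "no port":

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper around java.net.URI's seven-argument constructor.
final class FromPartsSketch {
    static String build(String scheme, String userinfo, String host, int port,
                        String path, String query, String fragment) {
        try {
            // Parts are passed unescaped; illegal characters (e.g. spaces) are quoted.
            return new URI(scheme, userinfo, host, port, path, query, fragment).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, build("http", null, "example.com", -1, "/a b", "q=1", null) produces http://example.com/a%20b?q=1.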
    • URI

      public URI(String scheme, String opaque) throws ParseException
      Constructs an opaque URI from the given parts.
      Parameters:
      scheme - the scheme (sometimes known as protocol)
      opaque - the opaque part
      Throws:
      ParseException - if scheme is null
  • Method Details

    • canonicalizePath

      public static String canonicalizePath(String path)
      Remove all "/../" and "/./" from path, where possible. Leading "/../" segments are not removed.
      Parameters:
      path - the path to canonicalize
      Returns:
      the canonicalized path
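A sketch of the canonicalization described above, working on the string directly. It is an assumption about behavior, not the library's code; in particular it leaves the first segment of a relative path (one with no preceding "/") untouched, and it does not handle a trailing "/." or "/.." without a final slash. The class name is hypothetical:

```java
// Hypothetical sketch of "/./" and "/seg/../" removal.
final class CanonicalizeSketch {
    static String canonicalizePath(String path) {
        String p = path;
        int idx;
        // Step 1: every "/./" collapses to "/".
        while ((idx = p.indexOf("/./")) >= 0) {
            p = p.substring(0, idx) + p.substring(idx + 2);
        }
        // Step 2: every "/seg/../" collapses to "/", where a real segment precedes it.
        int from = 0;
        while ((idx = p.indexOf("/../", from)) >= 0) {
            int prev = p.lastIndexOf('/', idx - 1);
            String seg = prev < 0 ? null : p.substring(prev + 1, idx);
            if (seg == null || seg.isEmpty() || seg.equals("..")) {
                from = idx + 1;  // leading "/../" (or "..", or empty segment): keep it
            } else {
                p = p.substring(0, prev) + p.substring(idx + 3);
                from = 0;        // rescan: removal may expose a new "/x/../"
            }
        }
        return p;
    }
}
```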
    • usesGenericSyntax

      public static boolean usesGenericSyntax(String scheme)
      Returns:
      true if the scheme should be parsed according to the generic-URI syntax
    • usesSemiGenericSyntax

      public static boolean usesSemiGenericSyntax(String scheme)
      Returns:
      true if the scheme should be parsed according to a semi-generic-URI syntax <scheme>://<hostport>/<opaque>
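The two predicates above consult an internal scheme table. A sketch of such a lookup, assuming the constants OPAQUE, SEMI_GENERIC and GENERIC listed among the fields; their numeric values and the table entries (taken from the schemes mentioned on this page) are hypothetical, and any unlisted scheme is assumed to be opaque:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical scheme table; the real class's entries and constant values may differ.
final class SyntaxTableSketch {
    static final int OPAQUE = 0, SEMI_GENERIC = 1, GENERIC = 2;

    private static final Set<String> GENERIC_SCHEMES = Set.of("http", "ftp");
    private static final Set<String> SEMI_GENERIC_SCHEMES = Set.of("ldap", "smtp", "pop");

    static int classify(String scheme) {
        String s = scheme.toLowerCase(Locale.ROOT); // scheme names are case-insensitive
        if (GENERIC_SCHEMES.contains(s)) return GENERIC;
        if (SEMI_GENERIC_SCHEMES.contains(s)) return SEMI_GENERIC;
        return OPAQUE;
    }
}
```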
    • defaultPort

      public static final int defaultPort(String protocol)
      Return the default port used by a given protocol.
      Parameters:
      protocol - the protocol
      Returns:
      the port number, or 0 if unknown
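A minimal sketch of a default-port lookup that returns 0 for unknown protocols. The table holds well-known IANA port numbers and is hypothetical; the real class may know more or fewer schemes:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical port table.
final class DefaultPortSketch {
    private static final Map<String, Integer> PORTS = Map.of(
        "http", 80, "https", 443, "ftp", 21, "smtp", 25, "pop", 110, "ldap", 389);

    static int defaultPort(String protocol) {
        return PORTS.getOrDefault(protocol.toLowerCase(Locale.ROOT), 0); // 0 = unknown
    }
}
```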
    • getScheme

      public String getScheme()
      Returns:
      the scheme (often also referred to as protocol)
    • getOpaque

      public String getOpaque()
      Returns:
      the opaque part, or null if this URI is generic
    • getHost

      public String getHost()
      Returns:
      the host
    • getPort

      public int getPort()
      Returns:
      the port, or -1 if it's the default port, or 0 if unknown
    • getUserinfo

      public String getUserinfo()
      Returns:
      the user info
    • getPath

      public String getPath()
      Returns:
      the path
    • getQueryString

      public String getQueryString()
      Returns:
      the query string
    • getPathAndQuery

      public String getPathAndQuery()
      Returns:
      the path and query
    • getFragment

      public String getFragment()
      Returns:
      the fragment
    • isGenericURI

      public boolean isGenericURI()
      Does the scheme specific part of this URI use the generic-URI syntax?

      In general, URIs are split into two categories: opaque-URI and generic-URI. The generic-URI syntax is the one most people know from URLs such as ftp- and http-URLs, and is roughly:

       generic-URI = scheme ":" [ "//" server ] [ "/" ] [ path_segments ] [ "?" query ]
       
      (see RFC-2396 for exact syntax). Only URLs using the generic-URI syntax can be used to create and resolve relative URIs.

      Whether a given scheme is parsed according to the generic-URI syntax or whether it is treated as opaque is determined by an internal table of URI schemes.

    • isSemiGenericURI

      public boolean isSemiGenericURI()
      Does the scheme specific part of this URI use the semi-generic-URI syntax?

      Many schemes that don't follow the full generic syntax actually follow a reduced form in which the path part is treated as opaque. This form is used, for example, by ldap, smtp, and pop, and is roughly:

       semi-generic-URI = scheme ":" [ "//" server ] [ "/" [ opaque_path ] ]

      I.e. parsing is identical to the generic syntax, except that the path part is not further parsed. URLs using the semi-generic-URI syntax can be used to create and resolve relative URIs, with the restriction that all paths are treated as absolute.

      Whether a given scheme is parsed according to the semi-generic-URI syntax is determined by an internal table of URI schemes.

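Because the opaque part has no segments, resolving a relative reference in semi-generic syntax reduces to a few prefix checks. A sketch under that reading of the text above; the helper and its signature are hypothetical, and it does not special-case query strings or fragments:

```java
// Hypothetical resolver for <scheme>://<authority>/<opaque> bases.
final class SemiGenericResolveSketch {
    static String resolve(String scheme, String authority, String rel) {
        if (rel.matches("[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return rel;                            // already absolute: has its own scheme
        }
        if (rel.startsWith("//")) {
            return scheme + ":" + rel;             // network-path reference: new authority
        }
        // Every path is treated as absolute; "g" is read as "/g".
        String path = rel.startsWith("/") ? rel : "/" + rel;
        return scheme + "://" + authority + path;
    }
}
```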
    • toURL

      public URL toURL() throws MalformedURLException
      Attempts to create a java.net.URL object from this URI.
      Returns:
      the URL
      Throws:
      MalformedURLException - if no handler is available for the scheme
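Whether the conversion succeeds depends on the URLStreamHandlers available in the JVM. The same failure can be observed with java.net.URI.toURL(); a sketch with a hypothetical wrapper that maps the exception to null:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical wrapper; expects an absolute URI string.
final class ToUrlSketch {
    static URL toUrlOrNull(String uri) {
        try {
            return URI.create(uri).toURL();
        } catch (MalformedURLException e) {
            return null; // e.g. no URLStreamHandler is registered for the scheme
        }
    }
}
```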
    • toExternalForm

      public String toExternalForm()
      Returns:
      a string representation of this URI suitable for use in links, headers, etc.
    • toString

      public String toString()
      Return the URI as a string. This differs from toExternalForm() in that all elements are unescaped before assembly. The result is not suitable for passing to other applications or for use in header fields, and is usually not what you want.
      Overrides:
      toString in class Object
      Returns:
      the URI as a string
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
      Returns:
      true if other is either a URI or URL and it matches the current URI
    • hashCode

      public int hashCode()
      The hash code is calculated over scheme, host, path, and query.
      Overrides:
      hashCode in class Object
      Returns:
      the hash code
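Since equal objects must have equal hash codes, a hash computed over a subset of the compared fields is valid as long as those fields are normalized the same way in both methods. A sketch of that pattern, assuming scheme and host compare case-insensitively; the class is hypothetical and ignores userinfo and fragment:

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical value class showing an equals/hashCode pair.
final class UriKeySketch {
    final String scheme, host, path, query;
    final int port;

    UriKeySketch(String scheme, String host, int port, String path, String query) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.path = path;
        this.query = query;
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof UriKeySketch)) return false;
        UriKeySketch u = (UriKeySketch) o;
        return Objects.equals(lower(scheme), lower(u.scheme))
            && Objects.equals(lower(host), lower(u.host))
            && port == u.port
            && Objects.equals(path, u.path)
            && Objects.equals(query, u.query);
    }

    // Hash over scheme, host, path, and query only; the port is compared in
    // equals but omitted here, which is still consistent.
    @Override public int hashCode() {
        return Objects.hash(lower(scheme), lower(host), path, query);
    }
}
```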
    • escape

      public static String escape(String elem, BitSet allowed_char, boolean utf8)
      Escape any character not in the given character class. Characters greater than 255 are always escaped according to ???.
      Parameters:
      elem - the string to escape
      allowed_char - the BitSet of all allowed characters
      utf8 - if true, will first UTF-8 encode disallowed characters
      Returns:
      the string with all characters not in allowed_char escaped
    • escape

      public static char[] escape(char[] elem, BitSet allowed_char, boolean utf8)
      Escape any character not in the given character class. Characters greater than 255 are always escaped according to ???.
      Parameters:
      elem - the array of characters to escape
      allowed_char - the BitSet of all allowed characters
      utf8 - if true, will first UTF-8 encode disallowed characters
      Returns:
      the elem array with all characters not in allowed_char escaped
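A sketch of the utf8 == true behavior: each code point outside the allowed set is UTF-8 encoded and every byte written as %XX. The non-UTF-8 handling of characters above 255 is not specified here, so the sketch does not model it; the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical percent-escaper (UTF-8 mode only).
final class EscapeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String escape(String elem, BitSet allowed) {
        StringBuilder sb = new StringBuilder(elem.length());
        for (int i = 0; i < elem.length(); ) {
            int cp = elem.codePointAt(i);
            i += Character.charCount(cp);
            if (allowed.get(cp)) {
                sb.appendCodePoint(cp);            // allowed: copy as-is
                continue;
            }
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {                  // one %XX per UTF-8 byte
                sb.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
            }
        }
        return sb.toString();
    }
}
```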
    • unescape

      public static final String unescape(String str, BitSet reserved) throws ParseException
      Unescape escaped characters (i.e. %xx) except reserved ones.
      Parameters:
      str - the string to unescape
      reserved - the characters which may not be unescaped, or null
      Returns:
      the unescaped string
      Throws:
      ParseException - if the two digits following a '%' are not a valid hex number
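A sketch of unescaping with a reserved set. Assumptions: each %xx decodes to the single char with that code (no UTF-8 decoding), the class name is hypothetical, and IllegalArgumentException stands in for ParseException:

```java
import java.util.BitSet;

// Hypothetical unescaper; reserved characters keep their %xx form.
final class UnescapeSketch {
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String unescape(String str, BitSet reserved) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '%') {
                sb.append(c);
                continue;
            }
            int hi = i + 1 < str.length() ? hexValue(str.charAt(i + 1)) : -1;
            int lo = i + 2 < str.length() ? hexValue(str.charAt(i + 2)) : -1;
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid escape at index " + i + ": " + str);
            }
            char decoded = (char) (hi * 16 + lo);
            if (reserved != null && reserved.get(decoded)) {
                sb.append(str, i, i + 3);          // keep the escape, e.g. "%2F"
            } else {
                sb.append(decoded);
            }
            i += 2;
        }
        return sb.toString();
    }
}
```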
    • main

      public static void main(String[] args) throws Exception
      Run test set.
      Throws:
      Exception - if any test fails