URLEncoder.java (6.0 JDK Core, package java.net)

        /*
         * Copyright 1995-2006 Sun Microsystems, Inc.  All Rights Reserved.
         * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
         *
         * This code is free software; you can redistribute it and/or modify it
         * under the terms of the GNU General Public License version 2 only, as
         * published by the Free Software Foundation.  Sun designates this
         * particular file as subject to the "Classpath" exception as provided
         * by Sun in the LICENSE file that accompanied this code.
         *
         * This code is distributed in the hope that it will be useful, but WITHOUT
         * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
         * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
         * version 2 for more details (a copy is included in the LICENSE file that
         * accompanied this code).
         *
         * You should have received a copy of the GNU General Public License version
         * 2 along with this work; if not, write to the Free Software Foundation,
         * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
         *
         * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
         * CA 95054 USA or visit www.sun.com if you need additional information or
         * have any questions.
         */

        package java.net;

        import java.io.UnsupportedEncodingException;
        import java.io.CharArrayWriter;
        import java.nio.charset.Charset;
        import java.nio.charset.IllegalCharsetNameException;
        import java.nio.charset.UnsupportedCharsetException;
        import java.util.BitSet;
        import java.security.AccessController;
        import sun.security.action.GetPropertyAction;

        /**
         * Utility class for HTML form encoding. This class contains static methods
         * for converting a String to the <CODE>application/x-www-form-urlencoded</CODE> MIME
         * format. For more information about HTML form encoding, consult the HTML 
         * <A HREF="http://www.w3.org/TR/html4/">specification</A>. 
         *
         * <p>
         * When encoding a String, the following rules apply:
         *
         * <p>
         * <ul>
         * <li>The alphanumeric characters &quot;<code>a</code>&quot; through
         *     &quot;<code>z</code>&quot;, &quot;<code>A</code>&quot; through
         *     &quot;<code>Z</code>&quot; and &quot;<code>0</code>&quot; 
         *     through &quot;<code>9</code>&quot; remain the same.
         * <li>The special characters &quot;<code>.</code>&quot;,
         *     &quot;<code>-</code>&quot;, &quot;<code>*</code>&quot;, and
         *     &quot;<code>_</code>&quot; remain the same. 
         * <li>The space character &quot;<code>&nbsp;</code>&quot; is
         *     converted into a plus sign &quot;<code>+</code>&quot;.
         * <li>All other characters are unsafe and are first converted into
         *     one or more bytes using some encoding scheme. Then each byte is
         *     represented by the 3-character string
         *     &quot;<code>%<i>xy</i></code>&quot;, where <i>xy</i> is the
         *     two-digit hexadecimal representation of the byte. 
         *     The recommended encoding scheme to use is UTF-8. However, 
         *     for compatibility reasons, if an encoding is not specified, 
         *     then the default encoding of the platform is used.
         * </ul>
         *
         * <p>
         * For example using UTF-8 as the encoding scheme the string &quot;The
         * string &#252;@foo-bar&quot; would get converted to
         * &quot;The+string+%C3%BC%40foo-bar&quot; because in UTF-8 the character
         * &#252; is encoded as two bytes C3 (hex) and BC (hex), and the
         * character @ is encoded as one byte 40 (hex).
         *
         * @author  Herb Jellinek
         * @version 1.38, 05/05/07
         * @since   JDK1.0
         */
        public class URLEncoder {
            static BitSet dontNeedEncoding;
            static final int caseDiff = ('a' - 'A');
            static String dfltEncName = null;

            static {

                /* The list of characters that are not encoded has been
                 * determined as follows:
                 *
                 * RFC 2396 states:
                 * -----
                 * Data characters that are allowed in a URI but do not have a
                 * reserved purpose are called unreserved.  These include upper
                 * and lower case letters, decimal digits, and a limited set of
                 * punctuation marks and symbols. 
                 *
                 * unreserved  = alphanum | mark
                 *
                 * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
                 *
                 * Unreserved characters can be escaped without changing the
                 * semantics of the URI, but this should not be done unless the
                 * URI is being used in a context that does not allow the
                 * unescaped character to appear.
                 * -----
                 *
                 * It appears that both Netscape and Internet Explorer escape
                 * all special characters from this list with the exception
                 * of "-", "_", ".", "*". While it is not clear why they are
                 * escaping the other characters, perhaps it is safest to
                 * assume that there might be contexts in which the others
                 * are unsafe if not escaped. Therefore, we will use the same
                 * list. It is also noteworthy that this is consistent with
                 * O'Reilly's "HTML: The Definitive Guide" (page 164).
                 *
                 * As a last note, Internet Explorer does not encode the "@"
                 * character which is clearly not unreserved according to the
                 * RFC. We are being consistent with the RFC in this matter,
                 * as is Netscape.
                 *
                 */

                dontNeedEncoding = new BitSet(256);
                int i;
                for (i = 'a'; i <= 'z'; i++) {
                    dontNeedEncoding.set(i);
                }
                for (i = 'A'; i <= 'Z'; i++) {
                    dontNeedEncoding.set(i);
                }
                for (i = '0'; i <= '9'; i++) {
                    dontNeedEncoding.set(i);
                }
                /* encoding a space to a + is done in the encode() method */
                dontNeedEncoding.set(' ');
                dontNeedEncoding.set('-');
                dontNeedEncoding.set('_');
                dontNeedEncoding.set('.');
                dontNeedEncoding.set('*');

                dfltEncName = (String) AccessController
                        .doPrivileged(new GetPropertyAction("file.encoding"));
            }

            /**
             * You can't call the constructor.
             */
            private URLEncoder() {
            }

            /**
             * Translates a string into <code>x-www-form-urlencoded</code>
             * format. This method uses the platform's default encoding
             * as the encoding scheme to obtain the bytes for unsafe characters.
             *
             * @param   s   <code>String</code> to be translated.
             * @deprecated The resulting string may vary depending on the platform's
             *             default encoding. Instead, use the encode(String,String)
             *             method to specify the encoding.
             * @return  the translated <code>String</code>.
             */
            @Deprecated
            public static String encode(String s) {

                String str = null;

                try {
                    str = encode(s, dfltEncName);
                } catch (UnsupportedEncodingException e) {
                    // The system should always have the platform default
                }

                return str;
            }

            /**
             * Translates a string into <code>application/x-www-form-urlencoded</code>
             * format using a specific encoding scheme. This method uses the
             * supplied encoding scheme to obtain the bytes for unsafe
             * characters.
             * <p>
             * <em><strong>Note:</strong> The <a href=
             * "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
             * World Wide Web Consortium Recommendation</a> states that
             * UTF-8 should be used. Not doing so may introduce
             * incompatibilities.</em>
             *
             * @param   s   <code>String</code> to be translated.
             * @param   enc   The name of a supported 
             *    <a href="../lang/package-summary.html#charenc">character
             *    encoding</a>.
             * @return  the translated <code>String</code>.
             * @exception  UnsupportedEncodingException
             *             If the named encoding is not supported
             * @see URLDecoder#decode(java.lang.String, java.lang.String)
             * @since 1.4
             */
            public static String encode(String s, String enc)
                    throws UnsupportedEncodingException {

                boolean needToChange = false;
                StringBuffer out = new StringBuffer(s.length());
                Charset charset;
                CharArrayWriter charArrayWriter = new CharArrayWriter();

                if (enc == null)
                    throw new NullPointerException("charsetName");

                try {
                    charset = Charset.forName(enc);
                } catch (IllegalCharsetNameException e) {
                    throw new UnsupportedEncodingException(enc);
                } catch (UnsupportedCharsetException e) {
                    throw new UnsupportedEncodingException(enc);
                }

                for (int i = 0; i < s.length();) {
                    int c = (int) s.charAt(i);
                    //System.out.println("Examining character: " + c);
                    if (dontNeedEncoding.get(c)) {
                        if (c == ' ') {
                            c = '+';
                            needToChange = true;
                        }
                        //System.out.println("Storing: " + c);
                        out.append((char) c);
                        i++;
                    } else {
                        // convert to external encoding before hex conversion
                        do {
                            charArrayWriter.write(c);
                            /*
                             * If this character represents the start of a Unicode
                             * surrogate pair, then pass in two characters. It's not
                             * clear what should be done if a char reserved in the
                             * surrogate range occurs outside of a legal surrogate
                             * pair. For now, just treat it as if it were any other
                             * character.
                             */
                            if (c >= 0xD800 && c <= 0xDBFF) {
                                /*
                                  System.out.println(Integer.toHexString(c) 
                                  + " is high surrogate");
                                 */
                                if ((i + 1) < s.length()) {
                                    int d = (int) s.charAt(i + 1);
                                    /*
                                      System.out.println("\tExamining " 
                                      + Integer.toHexString(d));
                                     */
                                    if (d >= 0xDC00 && d <= 0xDFFF) {
                                        /*
                                          System.out.println("\t" 
                                          + Integer.toHexString(d) 
                                          + " is low surrogate");
                                         */
                                        charArrayWriter.write(d);
                                        i++;
                                    }
                                }
                            }
                            i++;
                        } while (i < s.length()
                                && !dontNeedEncoding
                                        .get((c = (int) s.charAt(i))));

                        charArrayWriter.flush();
                        String str = new String(charArrayWriter.toCharArray());
                        byte[] ba = str.getBytes(charset);
                        for (int j = 0; j < ba.length; j++) {
                            out.append('%');
                            char ch = Character
                                    .forDigit((ba[j] >> 4) & 0xF, 16);
                            // converting to use uppercase letter as part of
                            // the hex value if ch is a letter.
                            if (Character.isLetter(ch)) {
                                ch -= caseDiff;
                            }
                            out.append(ch);
                            ch = Character.forDigit(ba[j] & 0xF, 16);
                            if (Character.isLetter(ch)) {
                                ch -= caseDiff;
                            }
                            out.append(ch);
                        }
                        charArrayWriter.reset();
                        needToChange = true;
                    }
                }

                return (needToChange ? out.toString() : s);
            }
        }
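
The encoding rules described in the class Javadoc can be checked with a short usage sketch. The class name `UrlEncoderDemo` and the helpers `encodeUtf8` and `decodeUtf8` are hypothetical names introduced here for illustration; only `URLEncoder.encode(String, String)` and `URLDecoder.decode(String, String)` come from the JDK. The sketch assumes UTF-8, the encoding the Javadoc recommends.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the JDK.
final class UrlEncoderDemo {

    // Illustrative helper: encodes with UTF-8 and hides the checked exception.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Illustrative helper: the inverse operation via URLDecoder.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The example from the class Javadoc; the string contains u-umlaut (U+00FC).
        String encoded = encodeUtf8("The string \u00FC@foo-bar");
        System.out.println(encoded);                  // The+string+%C3%BC%40foo-bar

        // Alphanumerics and the four marks . - * _ pass through unchanged.
        System.out.println(encodeUtf8("a-b_c.d*e"));  // a-b_c.d*e

        // Other RFC 2396 marks such as the tilde are still escaped.
        System.out.println(encodeUtf8("~"));          // %7E

        // A surrogate pair (U+1F600) is encoded as one four-byte UTF-8 sequence.
        System.out.println(encodeUtf8("\uD83D\uDE00")); // %F0%9F%98%80

        // Decoding with the same charset restores the original string.
        System.out.println(decodeUtf8(encoded).equals("The string \u00FC@foo-bar")); // true
    }
}
```

Passing the charset explicitly, rather than calling the deprecated one-argument `encode(String)`, keeps the result independent of the platform's `file.encoding` setting.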