Importance of URL Input Encoding and How To Set It

A while ago, we had a problem with international characters in URL parameters being mangled. Tracking it down, we discovered that we had simply relied on the container's default for the URL input encoding, and that can be a problem.

What URL Input Encoding Is

When an HTTP GET or POST request arrives at a servlet container, the container must decode the incoming bytes into characters using some character set. In the case of a POST request, the encoding can be declared in the charset parameter of the request's Content-Type header:

Content-Type: application/x-www-form-urlencoded; charset=charsetNameHere

Browsers often omit that parameter, however, so the web application developer can also set the encoding explicitly by calling setCharacterEncoding() on the request before any parameters are read.
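To make the POST case concrete, here is a minimal sketch of how a container might pick the charset from a Content-Type header and decode a form body with it. It is not taken from any real container: the class and method names are made up, the ISO-8859-1 fallback is an assumption (it matches the default the servlet specification describes for request bodies), and it relies on the Charset overload of java.net.URLDecoder.decode, which needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class FormBodyDecoding {

    // Assumed fallback when the Content-Type header carries no charset parameter.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    // Returns the charset named in a Content-Type header, or DEFAULT_CHARSET.
    static Charset charsetFrom(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Allow a quoted value such as charset="UTF-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return DEFAULT_CHARSET;
    }

    // Decodes an application/x-www-form-urlencoded body into name/value pairs.
    static Map<String, String> parseForm(String body, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // %C3%A6 is the UTF-8 encoding of U+00E6 (the Danish letter ae).
        String body = "fruit=%C3%A6bler";
        Charset declared = charsetFrom("application/x-www-form-urlencoded; charset=UTF-8");
        Charset missing = charsetFrom("application/x-www-form-urlencoded");
        System.out.println(declared + ": " + parseForm(body, declared).get("fruit"));
        System.out.println(missing + ": " + parseForm(body, missing).get("fruit"));
    }
}
```

With the charset declared, the value comes out as intended; without it, the same bytes are read as ISO-8859-1 and the first character turns into two.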

In the case of a GET request, the request carries no such information, so the container uses a default. I do not think the servlet specification mandates a specific default for this. In any case, it is wise to find out what your container uses, and to set it explicitly if it does not match what your clients send.
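The mangling we saw comes from the client and the container disagreeing about the charset. The sketch below simulates that for a single query parameter value using java.net.URLEncoder and java.net.URLDecoder (the Charset overloads, Java 10 or later); it does not talk to a real container, and the class name is made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QueryEncodingMismatch {

    // Simulates a client percent-encoding a value with clientCharset
    // and a container decoding it with containerCharset.
    static String roundTrip(String value, Charset clientCharset, Charset containerCharset) {
        String onTheWire = URLEncoder.encode(value, clientCharset);
        return URLDecoder.decode(onTheWire, containerCharset);
    }

    public static void main(String[] args) {
        String original = "\u00e6bler"; // starts with U+00E6 (the Danish letter ae)
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1};
        for (Charset client : charsets) {
            for (Charset container : charsets) {
                String result = roundTrip(original, client, container);
                System.out.printf("client %s, container %s: %s%n",
                        client, container, result.equals(original) ? "ok" : "mangled");
            }
        }
    }
}
```

Only the two matching combinations survive intact, which is why the container default has to match what your pages make the browser send.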

Setting Input Encoding on Tomcat

On Tomcat, this setting is the URIEncoding attribute of the Connector element in server.xml. Here is an example of a Connector element that includes it:

<Connector
    port="8080"
    maxThreads="150"
    minSpareThreads="25"
    maxSpareThreads="75"
    enableLookups="false"
    redirectPort="8443"
    acceptCount="100"
    debug="0"
    connectionTimeout="20000"
    disableUploadTimeout="true"
    URIEncoding="UTF-8" />

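If URIEncoding is not set, Tomcat (at least in the versions current at the time of writing) defaults to ISO-8859-1. Note that URIEncoding applies to the request URI and query string only; POST bodies are decoded using the request's character encoding.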
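If a parameter has already been decoded with ISO-8859-1 but the client actually sent UTF-8, the value can usually be repaired in application code. This works because ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one character, so no information is lost. The sketch below is a workaround under that assumption, not a substitute for setting URIEncoding; applying it to a value that was decoded correctly would corrupt it. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RedecodeParameter {

    // Re-interprets a value that was decoded as ISO-8859-1 but was really UTF-8.
    // getBytes(ISO_8859_1) recovers the original bytes one-to-one, and those
    // bytes are then decoded again as UTF-8.
    static String latin1MisreadAsUtf8(String misread) {
        byte[] originalBytes = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What ISO-8859-1 decoding of the query value %C3%A6bler produces.
        String mangled = "\u00c3\u00a6bler";
        String repaired = latin1MisreadAsUtf8(mangled);
        System.out.println(repaired.equals("\u00e6bler")); // prints true
    }
}
```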

Setting Input Encoding on WebLogic

On WebLogic, the setting is made in the weblogic.xml deployment descriptor, using the input-charset element inside charset-params. Here is a small example:

<charset-params>
    <input-charset>
        <resource-path>/some/url/resource</resource-path>
        <java-charset-name>UTF-8</java-charset-name>
    </input-charset>
</charset-params>

WebLogic appears to allow finer-grained control than Tomcat: each input-charset setting applies to a specific resource path, whereas Tomcat's URIEncoding applies to everything served through the connector.
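To illustrate what per-path configuration buys you, here is a hypothetical resolver that picks a charset by the longest registered path prefix and falls back to a default otherwise. It is only a sketch of the idea: it does not reproduce WebLogic's actual matching rules for resource-path, and every name in it is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class PathCharsetResolver {

    // Hypothetical mapping from a path prefix to the charset used below it.
    private final Map<String, Charset> byPrefix = new HashMap<>();
    private final Charset fallback;

    PathCharsetResolver(Charset fallback) {
        this.fallback = fallback;
    }

    PathCharsetResolver map(String pathPrefix, Charset charset) {
        byPrefix.put(pathPrefix, charset);
        return this;
    }

    // True if requestPath is the prefix itself or lies below it on a segment boundary.
    private static boolean covers(String prefix, String requestPath) {
        if (requestPath.equals(prefix)) {
            return true;
        }
        String dir = prefix.endsWith("/") ? prefix : prefix + "/";
        return requestPath.startsWith(dir);
    }

    // Picks the charset of the longest registered prefix covering requestPath,
    // or the fallback if no prefix matches.
    Charset resolve(String requestPath) {
        String best = null;
        for (String prefix : byPrefix.keySet()) {
            if (covers(prefix, requestPath)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? fallback : byPrefix.get(best);
    }

    public static void main(String[] args) {
        PathCharsetResolver resolver = new PathCharsetResolver(StandardCharsets.ISO_8859_1)
                .map("/some/url/resource", StandardCharsets.UTF_8);
        System.out.println(resolver.resolve("/some/url/resource/search")); // prints UTF-8
        System.out.println(resolver.resolve("/other/page"));               // prints ISO-8859-1
    }
}
```

A single connector-wide setting corresponds to the case where only the fallback is ever used.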

November 4, 2008
