Tech Per

27 Feb

4 Ways To Stream Pdf and Some Tips

When it comes to streaming pdf content to a browser, a good how-to or reference on doing it properly is surprisingly hard to find. It turns out that there are numerous ways to do this, each with its own pros and cons. In this post, I show 4 different ways to stream pdf content to the browser. This is not about how to construct pdf content, but about how to get it shown properly in the browser.

Direct Streaming Of Pdf Bytes

This is the easiest and probably also the most “correct” way, in the sense that it has the highest probability of working in a wide range of browsers (I think). With this method, you simply output the pdf bytes directly to the browser as a complete document, without html or anything like it, and then let the browser determine what to do with the content.

To let the browser know which type of content it is receiving, you need to set the Content-Type HTTP header to the value application/pdf before starting to stream any content. In Java, this is done by calling the setContentType() method on HttpServletResponse (also set the Content-Length header if possible, but more on that later).

Here is some code to do this from a Java servlet:

ByteArrayOutputStream pdfBytesStream = constructPdf();
response.setContentType("application/pdf");
pdfBytesStream.writeTo(response.getOutputStream());
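
To try the same idea without a servlet container, here is a minimal sketch using the HTTP server that ships with the JDK (com.sun.net.httpserver). This is a swapped-in technique for experimenting, not the servlet code above. The class name PdfDemoServer and the placeholder bytes returned by constructPdf() are assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the servlet example above.
final class PdfDemoServer {

    // Placeholder bytes standing in for a real generated pdf (assumption).
    static byte[] constructPdf() {
        return "%PDF-1.4\n% placeholder, not a valid document\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
    }

    // Starts a server on the loopback interface; port 0 picks any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/pdf/stream.pdf", PdfDemoServer::handle);
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange) throws IOException {
        byte[] pdf = constructPdf();
        // Tell the browser what it is receiving before any body bytes go out.
        exchange.getResponseHeaders().set("Content-Type", "application/pdf");
        // A positive length makes the server send a matching Content-Length.
        exchange.sendResponseHeaders(200, pdf.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(pdf);
        }
    }
}
```

Calling PdfDemoServer.start(8080) and visiting http://127.0.0.1:8080/pdf/stream.pdf should return the bytes with the pdf content type. The placeholder bytes are not a valid document, so a viewer will probably reject them.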

Embed Inside Other HTML Content

There are three html tags that can do this: embed, object and iframe. Using any of them makes it possible to have other content on the page along with the pdf. It also makes it possible to apply scripting to the pdf element, for example to show or hide it.

Using The embed Tag

The embed tag is an old tag for embedding content in an html page. It was non-standard in HTML 4 but, being old, it is also well supported today. It works in both IE and Firefox. Here is an example of its use:

<embed id="pdf" src="<%= request.getContextPath() %>/pdf/stream.pdf"
    width="100%" height="100%"
    type="application/pdf" />

Setting the “id” attribute makes it easy to apply scripting to the element.

Using The object Tag

This is the standard tag to use when embedding content in html. Like embed, it can be scripted, but it also supports other things, such as being declared in one place and activated elsewhere. Here is a simple example of the object tag:

<object id="pdf" border="0" width="100%" height="100%" type="application/pdf" data="<%= request.getContextPath() %>/pdf/stream.pdf" standby="Loading pdf...">
    Oops, you have no pdf viewer enabled.
</object>

Using The iframe Tag

You simply pop the direct pdf link into the src attribute of an iframe, and watch it load. Like this:

<iframe id="pdf" src="<%= request.getContextPath() %>/pdf/stream.pdf" frameborder="0" width="100%" height="100%">
    Oops, you have no support for iframes.
</iframe>

Tips

Here is some nice-to-know stuff that can help you serve pdf better, and possibly save your users some trouble.

Set Content-Length Header

If at all possible, it is a good idea to set the Content-Length header to the length of the pdf bytes, before streaming them directly. Some browsers might not show the pdf correctly (or at all), if this header is missing. Do it like this:

ByteArrayOutputStream pdfBytesStream = constructPdf();
response.setContentLength(pdfBytesStream.size());

Normally this is no problem, but it can be if your pdf document is very big, because you need to build the complete document on the server before you can start streaming it. At best, this is resource intensive. At worst, the browser may time out before any pdf data is received, if the pdf construction takes long to finish. Even so, this header is a near-must.
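
For a big document, one way to keep memory use down and still know the length is to write the pdf to a temporary file first and stream it from there. The following is a minimal sketch under that assumption. SpooledPdf and PdfWriter are hypothetical names, and PdfWriter stands in for whatever code produces the pdf.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names here are not from the example application.
final class SpooledPdf {

    // Stand-in for whatever code produces the pdf bytes (assumption).
    interface PdfWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Writes the generated pdf to a temp file so its size is known up front.
    static Path spool(PdfWriter writer) throws IOException {
        Path tmp = Files.createTempFile("pdf-", ".pdf");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            writer.writeTo(out);
        } catch (IOException | RuntimeException e) {
            // Do not leave a half-written file behind.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    // Copies the spooled file to the response stream, then removes it.
    // Returns the number of bytes copied.
    static long sendAndDelete(Path file, OutputStream responseOut) throws IOException {
        try {
            return Files.copy(file, responseOut);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In a servlet, the idea would be to pass Files.size(tmp) to response.setContentLengthLong() (Servlet 3.1 and later) and then call sendAndDelete(tmp, response.getOutputStream()). This trades memory for disk I/O. It does not help with the timeout problem, since nothing is sent until generation has finished.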

Turn Caching Off

If the pdf is dynamically generated, and as such could change content between invocations of the URL that generates it, caching of the pdf should be turned off. I know of two methods to do this. One way is to use the Expires and Cache-Control HTTP headers. Here is some Java code to set them from a servlet:

response.setHeader("Expires", "0");
response.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");

The other way is to use dummy parameters on the URL that generates the pdf. The simplest thing to do is to append something like ?dummy=<%= System.currentTimeMillis() %> to the URL of the pdf generation.
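
If the URL may already carry a query string or a fragment, the dummy parameter has to be joined with & instead of ? and placed before any #. Here is a small sketch of a helper that does this. CacheBuster and withDummyParam are hypothetical names, not part of the example application.

```java
// Hypothetical helper for building cache-busting pdf URLs.
final class CacheBuster {

    // Appends dummy=<nowMillis> to the URL, keeping any existing query
    // string and fragment intact.
    static String withDummyParam(String url, long nowMillis) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        // Use '&' when a query string is already present, '?' otherwise.
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        return base + separator + "dummy=" + nowMillis + fragment;
    }
}
```

In a page, it would be called with the current time, for example CacheBuster.withDummyParam(request.getContextPath() + "/pdf/stream.pdf", System.currentTimeMillis()).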

Suggest A Filename On Save

You can suggest a filename to be used if the client tries to save the streamed pdf, by setting the Content-Disposition HTTP header, like this:

response.setHeader("Content-Disposition", "inline; filename=SuggestedFilename.pdf");

Open Pdf In Standalone Reader

As such, there is no way for the server streaming the pdf to ensure that the pdf is opened in an external, standalone pdf application on the client. There might not even be such a beast there. But there is a small tip for something close. Using the same Content-Disposition header as above, you can mark the content as attachment instead of inline, which makes browsers pop up a “Save or open” dialog when the pdf URL is hit. Clicking “open” in this dialog opens the pdf in a standalone reader on most platforms. Like this:

response.setHeader("Content-Disposition", "attachment; filename=SuggestedFilename.pdf");
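
If the filename contains spaces, or comes from user input, it is safer to quote it and to replace characters that could break the header. Below is a minimal sketch of a helper that builds the header value for both the inline and the attachment case. ContentDisposition and headerValue are hypothetical names. Non-ASCII filenames need the extended filename* form, which this sketch does not handle.

```java
// Hypothetical helper for building Content-Disposition header values.
final class ContentDisposition {

    // Returns e.g. inline; filename="Report 2008.pdf"
    static String headerValue(boolean attachment, String filename) {
        // Replace line breaks, quotes, backslashes and slashes, which could
        // break out of the quoted string or look like a path.
        String safe = filename.replaceAll("[\\r\\n\"\\\\/]", "_");
        String type = attachment ? "attachment" : "inline";
        return type + "; filename=\"" + safe + "\"";
    }
}
```

From a servlet, it would be used as response.setHeader("Content-Disposition", ContentDisposition.headerValue(true, "SuggestedFilename.pdf")).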

Pdf Streaming URL Must Be Idempotent

There is a bug in Internet Explorer that sometimes makes it request the same pdf resource multiple times, even though you only visited the URL once. I have seen this myself when embedding pdf using the object tag, as shown above. So the URL being hit to get the pdf must be able to be called multiple times with the same parameters and produce the same result. If not, IE might show a blank page.
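
If the pdf generation itself is not repeatable (say it embeds a timestamp, or consumes something on first use), one way to get this property is to keep the bytes from the first request and replay them for repeated requests. This is a minimal sketch of that idea, not what the example application does. PdfCache and the request key are hypothetical, and the map grows without bound, so a real application would need some eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache that makes repeated requests return identical bytes.
final class PdfCache {

    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> generator;

    // The generator builds the pdf for a request key and must not return null.
    PdfCache(Function<String, byte[]> generator) {
        this.generator = generator;
    }

    // Generates the pdf on the first call for a key and replays the same
    // bytes on later calls with that key.
    byte[] get(String requestKey) {
        byte[] pdf = cache.computeIfAbsent(requestKey, generator);
        // Hand out a copy so callers cannot modify the cached bytes.
        return pdf.clone();
    }
}
```

The request key could be the request URI plus its query string, so that the same URL with the same parameters always maps to the same cached document.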

Map Stream URLs With a .pdf Extension

Consider mapping pdf-stream-producing resources in web.xml to URLs with a .pdf extension. Some browsers might be so stupid as to look at the filename (URL ending) of a resource, instead of the mime type, when determining what to do with it. This is easily done in web.xml, as shown here for a servlet:

<servlet>
    <servlet-name>pdfservlet</servlet-name>
    <servlet-class>net.techper.pdfstreaming.DirectStreamServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>pdfservlet</servlet-name>
    <url-pattern>/pdf/stream.pdf</url-pattern>
</servlet-mapping>

Example Code

I put together a small Java web application that shows how to stream pdf content, and you can download it here.

One Response to “4 Ways To Stream Pdf and Some Tips”

  1. Carlo Pasaol Says:

    I’ve done everything by the book but sometimes (yes, I say sometimes), all the user sees is a white div (containing only the ‘embed’ tag for the pdf) on the page.

    Adobe PDF Viewer version is 8. IE version is 6.
    I am VERY sure that ALL the bytes have been completely written onto the response.

    I have also observed IE requesting the PDF more than once. I can’t find any explanation for this as well…

© 2008 Tech Per
