Generating Dynamic PDF Documents using the Open Source Scryber Library
Introduction
Creating dynamic PDF documents has always been a chore, and creating good-looking dynamic PDF documents is even harder. It usually involves reams of custom code, hard-coded element placement, and at least a little knowledge of how PDF documents are structured.
With Scryber, writing PDF documents becomes as easy as creating HTML pages. Cascading styles and data binding are supported, along with generic layout components such as page headers, lines, text blocks, columns, containers, images, and fonts.
Background
Scryber is an open source PDF library, created by PerceiveIT Limited and released under the LGPL. This is less restrictive than the GPL and allows you to link to it from commercial as well as open source applications, provided you have not modified the original Scryber source code.
Overview
To show some of the capabilities of the Scryber library, this article takes the CodeProject New Articles RSS feed and builds a nicely formatted PDF document from its contents.
Downloading and Installing Scryber
Before starting, you will need to install the current Scryber framework:
- Visual Studio 2012 / .NET 4.0+
  The framework is available to download as a VSIX directly from within Visual Studio: search the online Extensions and Updates for Scryber, and you will be able to download the latest version directly.
  http://visualstudiogallery.msdn.microsoft.com/1f14c378-c102-4687-a16f-ce4eaaef845d
- NuGet package, .NET 4.0+
  The framework itself is published as a package on NuGet.org and can be installed from there. This does not include the item templates discussed below, so you will need to add your own XML files.
  http://www.nuget.org/packages/scryber/
- Older versions, including VS 2008 / VS 2010, .NET 3.5
  The installer for older versions is available to download from the CodePlex site. It is kept up to date at the moment, but this may cease in the near future.
  http://scryber.codeplex.com/releases/
The Scryber installer or VSIX will add the required assemblies and templates to your environment. The rest of the article focuses on using Visual Studio to generate the document from the feed.
The source code attached to this article is the final version of the project.
Our First PDF Document
The first thing we are going to need is a site. If you don’t have one, create a new web application in Visual Studio (.NET 3.5 for the older installer, .NET 4.0+ for the VSIX or NuGet package) and add a folder called ‘PDFs’ to it. Right-click on this folder and choose ‘Add -> New Item…’. If everything is installed correctly, there should be a project item group on the left called Scryber, containing a number of templates. Choose the PDF Document template, name it CodeProjectFeed.pdfx, and add it to the project.
The document should open in the XML editor as below:
<?xml version="1.0" encoding="utf-8" ?>
<?pdfx parser-mode="Strict" parser-log="True" ?>
<pdf:Document id="CodeProjectFeed"
xmlns:pdf="Scryber.Components, Scryber.Components"
xmlns:style="Scryber.Styles, Scryber.Styles"
xmlns:data="Scryber.Data, Scryber.Components"
auto-bind="true" >
<Styles>
<!-- Default page style -->
<style:Style applied-type="pdf:Page">
<style:Page size="A4" orientation="Portrait" />
</style:Style>
</Styles>
<Pages>
<!-- First page-->
<pdf:Page id="FirstPage" >
<Content>Hello World</Content>
</pdf:Page>
</Pages>
</pdf:Document>
This should automatically add the required references to the Scryber libraries or the Scryber NuGet package, depending on the version. If not, you will need to add ‘Scryber.Components’, ‘Scryber.Styles’, etc. from the install directory to the project references manually. The package will also add a number of extra sections to the web.config file to support Scryber.
Outputting the PDF
To generate the PDF file, we need to process the document on request. So let’s put a button on our default.aspx page (or any other page) to raise a postback event.
<asp:Button runat="server" ID="GeneratePDF" Text="Generate"
OnClick="GeneratePDF_Click" />
In the code-behind, handle this event by parsing the pdfx file and then writing the generated PDF to the response of the postback.
using System;
using Scryber.Components;
// ... inside the page's code-behind class ...
protected void GeneratePDF_Click(object sender, EventArgs e)
{
using(PDFDocument doc = PDFDocument.ParseDocument("~/PDFs/CodeProjectFeed.pdfx"))
{
doc.ProcessDocument(this.Response);
}
}
Now if we run our project and open our browser to this ASPX page, we should see the ‘Generate’ button. Click this button and our PDF should be returned as an attachment to be opened in the default PDF reader application.
Code Level Support
Because Scryber fully supports creating and modifying documents in code, we could also have done this entirely in the page's code-behind had we wanted to…
protected void GeneratePDF_Click(object sender, EventArgs e)
{
using(PDFDocument doc = new PDFDocument())
{
PDFPage page = new PDFPage();
page.PaperSize = Scryber.PaperSize.A4;
page.PaperOrientation = Scryber.PaperOrientation.Portrait;
doc.Pages.Add(page);
PDFLabel lbl = new PDFLabel();
lbl.Text = "Hello World";
page.Contents.Add(lbl);
// Generate inside the using block: doc is out of scope (and disposed) once the block ends.
doc.ProcessDocument(this.Response);
}
}
And we can also modify the contents after parsing if we need to:
protected void GeneratePDF_Click(object sender, EventArgs e)
{
using(PDFDocument doc = PDFDocument.ParseDocument("~/PDFs/CodeProjectFeed.pdfx"))
{
PDFLabel lbl = new PDFLabel();
lbl.Text = "Hello World Again";
lbl.Style.Fill.Color = System.Drawing.Color.Aquamarine;
lbl.Style.Position.PositionMode = Scryber.Drawing.PositionMode.Block;
doc.Pages[0].Contents.Add(lbl);
doc.ProcessDocument(this.Response);
}
}
…but let's leave the hand cranking for the moment.
Adding Some Real Content
It is easy to modify the content using the XML editor, and because the Scryber installer added its schemas to the Visual Studio XML schema library, we should also get IntelliSense when describing components.
For our RSS feed content, we are going to have a descriptive heading block and a group of items below it. Delete the Hello World text and add the following inside the <Content> element of the <pdf:Page>, so that the page looks like this:
<pdf:Page id="FirstPage" >
<Content>
<pdf:Div style:class="heading" >
<pdf:H1 text="This is the Title" ></pdf:H1>
<pdf:Label text="And this is going to be the description" /><BR/>
Date: <pdf:Label text="Today" /><BR/>
<pdf:Label text="CopyrightOwner" ></pdf:Label>
</pdf:Div>
</Content>
</pdf:Page>
The syntax is similar to HTML. Save the document and hit the Generate button again; there is no need to rebuild the project, as the pdfx file is parsed each time the button is clicked. We should see the expected content rendered to a PDF.
This static content is all very well, but we need the data to come from the feed itself, which looks like this:
<rss version="2.0">
<channel>
<title>CodeProject Latest Articles</title>
<link>http://www.codeproject.com</link>
<description>Latest Articles from CodeProject</description>
<language>en-us</language>
<image>
<title>CodeProject Latest Articles</title>
<url>http://www.codeproject.com/App_Themes/Std/Img/logo100x30.gif</url>
<link>http://www.codeproject.com</link>
<width>100</width>
<height>30</height>
<description>CodeProject</description>
</image>
<copyright>Copyright CodeProject, 1999-2013</copyright>
<webMaster>Webmaster@codeproject.com (Webmaster)</webMaster>
<lastBuildDate>Tue, 02 Apr 2013 09:50:10 GMT</lastBuildDate>
<ttl>20</ttl>
<generator>C# Hand-coded goodness</generator>
<item d3p1:type="item" xmlns:d3p1="http://www.w3.org/2001/XMLSchema-instance">
<title>Square Root algorithm for C</title>
<description>Square Root algorithm for C.</description>
<link>http://www.codeproject.com/Articles/570700/SquareplusRootplusalgorithmplusforplusC</link>
<author>Edison Heng</author>
<category>C</category>
<pubDate>Tue, 02 Apr 2013 09:16:00 GMT</pubDate>
<subject />
<guid>http://www.codeproject.com/Articles/570700/SquareplusRootplusalgorithmplusforplusC</guid>
</item>
<!-- ... more item elements ... -->
</channel>
</rss>
In our page, add an XMLDataSource pointing to the RSS feed URL, along with a bit of caching and an id attribute:
<data:XMLDataSource source-path="http://www.codeproject.com/WebServices/ArticleRSS.aspx"
cache-duration="20"
id="CPArticleSource" />
This specifies where the data comes from, which can be a local file or a remote source. It can even be set in code to some other file, or to a custom-loaded XPathNavigator on the XMLData property, before processing.
Then wrap our heading div in a ForEach component, specifying the id of the previously declared data source along with the XPath to the root channel element in the feed:
<data:ForEach datasource-id="CPArticleSource" select="rss/channel" >
<Template>
<pdf:Div style:class="heading" >
</pdf:Div>
</Template>
</data:ForEach>
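As mentioned above, the data source's XMLData property can be given a custom-loaded XPathNavigator before processing. The sketch below shows one way to build such a navigator with the standard System.Xml.XPath classes. FeedLoader and its methods are hypothetical names, and the final assignment to XMLData is deliberately left out because its exact type and usage are not confirmed here.

```csharp
using System.IO;
using System.Xml.XPath;

// Illustrative helper (not part of Scryber): loads RSS XML into an XPathNavigator.
public static class FeedLoader
{
    // Parse XML held in memory, e.g. a feed fetched or generated elsewhere.
    public static XPathNavigator FromString(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            // XPathDocument is a read-only store optimised for XPath queries.
            // It reads everything up front, so the reader can be disposed afterwards.
            return new XPathDocument(reader).CreateNavigator();
        }
    }

    // Parse a local file (or any URI the underlying XmlReader can resolve).
    public static XPathNavigator FromFile(string path)
    {
        return new XPathDocument(path).CreateNavigator();
    }
}
```

For example, FeedLoader.FromString(xml).SelectSingleNode("rss/channel/title").Value returns the channel title, using the same "rss/channel" path that the ForEach select attribute uses.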
To bind the data in the feed to each of the text components, use XPath expressions in the format {xpath:any valid xpath expression}. So our page content will be:
<pdf:Page id="FirstPage" >
<Content>
<data:ForEach datasource-id="CPArticleSource" select="rss/channel" >
<Template>
<!-- start of heading -->
<pdf:Div style:class="heading" >
<pdf:H1 text="{xpath:title}" ></pdf:H1>
<pdf:Label text="{xpath:description}" /><BR/>
Date: <pdf:Label text="{xpath:lastBuildDate}" /><BR/>
<pdf:Label text="{xpath:copyright}" ></pdf:Label>
</pdf:Div>
<!-- end of heading -->
</Template>
</data:ForEach>
<!-- xml data source for code project rss feed -->
<data:XMLDataSource source-path="http://www.codeproject.com/WebServices/ArticleRSS.aspx"
cache-duration="20"
id="CPArticleSource" />
</Content>
</pdf:Page>
Save the changes and generate the document with our button: the heading should now show the feed's title, description, build date, and copyright.
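To make the {xpath:...} binding syntax concrete, the following sketch evaluates such an expression against the current context node using the standard System.Xml.XPath API. XPathBinder is a hypothetical helper that only illustrates the idea; it is not how Scryber implements binding internally, and it treats anything that is not a single {xpath:...} expression as literal text.

```csharp
using System.Text.RegularExpressions;
using System.Xml.XPath;

// Hypothetical illustration of {xpath:...} binding; not Scryber's binder.
public static class XPathBinder
{
    // Matches an attribute value of the form {xpath:expression}.
    private static readonly Regex Binding = new Regex(@"^\{xpath:(?<expr>.+)\}$");

    // If the value is a binding expression, evaluate it relative to the
    // current context node; otherwise return the literal text unchanged.
    public static string Bind(string attributeValue, XPathNavigator context)
    {
        Match m = Binding.Match(attributeValue);
        if (!m.Success)
            return attributeValue;

        XPathNavigator node = context.SelectSingleNode(m.Groups["expr"].Value);
        return node == null ? string.Empty : node.Value;
    }
}
```

With the channel element as the context, "{xpath:title}" yields the channel title and "{xpath:image/url}" the logo URL, mirroring the heading bindings above, while a plain value such as "Today" passes through unchanged.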
Binding All the Items
Next, we need to include the individual items from the RSS feed in our document. After the heading div, but before the end of the Template, nest the following ForEach block:
<!-- repeating rss item blocks -->
<pdf:Div id="CPAllItems" >
<data:ForEach select="item" >
<Template>
<pdf:Div style:class="rss-item" >
<pdf:H2 text="{xpath:title}" ></pdf:H2>
<pdf:Label text="{xpath:description}" />
</pdf:Div>
</Template>
</data:ForEach>
</pdf:Div>
<!-- end of repeating rss item blocks -->
We do not need to specify the datasource-id, because the outer ForEach already provides a data context (the current channel element) that Scryber uses for binding, so select="item" picks out that channel's items. Save, then generate, and we should have a set of items within the document output, with all the text flowing nicely along the lines and down the page.
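How the inner ForEach picks up its context can be sketched with plain System.Xml.XPath: the outer select="rss/channel" is evaluated against the document root, and each inner select="item" against the current channel. This is a simplified, hypothetical model (NestedForEach.ItemTitles is an illustrative name), not Scryber's ForEach implementation.

```csharp
using System.Collections.Generic;
using System.Xml.XPath;

// Simplified model of two nested ForEach blocks; not Scryber's implementation.
public static class NestedForEach
{
    public static List<string> ItemTitles(XPathNavigator root)
    {
        var titles = new List<string>();

        // Outer ForEach: select="rss/channel", evaluated against the data source root.
        XPathNodeIterator channels = root.Select("rss/channel");
        while (channels.MoveNext())
        {
            XPathNavigator channel = channels.Current;

            // Inner ForEach: select="item" is relative to the current channel,
            // which is why no datasource-id is needed.
            XPathNodeIterator items = channel.Select("item");
            while (items.MoveNext())
            {
                // Equivalent of text="{xpath:title}" inside the item template.
                XPathNavigator title = items.Current.SelectSingleNode("title");
                titles.Add(title == null ? string.Empty : title.Value);
            }
        }
        return titles;
    }
}
```

Run against the sample feed shown earlier, this would return the title of each item in document order, such as "Square Root algorithm for C".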
Overflow Onto a New Page
Looking at the generated PDF, we can see that not all the items are rendered, because a pdf:Page is, by default, limited to generating a single page of content. There is, however, a pdf:Section component that allows generated content to flow onto more pages. If we change the declaration to a section, we should see all of the content rolling onto multiple pages:
<Pages>
<!-- First page-->
<pdf:Section id="FirstPage" >
<Content>
<!-- Content as per the original page -->
</Content>
</pdf:Section>
</Pages>
Save the file and generate again: a quick check of the output shows that the result now flows over two or more pages.
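To see why switching from pdf:Page to pdf:Section makes the difference, here is a deliberately simplified model of the two behaviours. It is not Scryber's layout engine: it assumes every block has a fixed height in points and is never split across pages, and FlowModel.Layout is a hypothetical name.

```csharp
using System.Collections.Generic;

// Simplified model of single-page (pdf:Page) vs overflowing (pdf:Section) layout.
// Assumption: blocks have fixed heights and are never split between pages.
public static class FlowModel
{
    // Returns the block indices placed on each page.
    public static List<List<int>> Layout(IList<double> blockHeights, double pageHeight, bool allowNewPages)
    {
        var pages = new List<List<int>>();
        var current = new List<int>();
        double used = 0;

        for (int i = 0; i < blockHeights.Count; i++)
        {
            // Block does not fit: close the current page (unless it is empty,
            // in which case an oversized block is placed on its own).
            if (used + blockHeights[i] > pageHeight && current.Count > 0)
            {
                pages.Add(current);
                if (!allowNewPages)
                    return pages; // Page behaviour: remaining blocks are not rendered.
                current = new List<int>();
                used = 0;
            }
            current.Add(i);
            used += blockHeights[i];
        }

        if (current.Count > 0)
            pages.Add(current);
        return pages;
    }
}
```

With three 300pt blocks and 700pt of usable height, the single-page mode keeps only the first two blocks, while the section mode moves the third onto a second page.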
Adding Some Style
We have a document at the moment that has most of the information we need, but boy is it ugly. Adding some style is easy with Scryber.
All the information about how to render a document and its components is driven from styles in a very similar way to CSS. Most common style values can be defined as attributes on the actual elements, but it is much more powerful and flexible to define them at the top level of the document or in externally referenced files.
Let’s add a referenced style set. First select the PDFs folder in our solution and choose Add -> New Item… From the Scryber group, select the PDF Styles template and call it CPStyles.psfx. Back in our pdfx file, add a reference to these styles in the Styles section at the top of the document, and remove the style definition for the page. The source must be relative to the referencing document or absolute to the root of the site.
<Styles>
<style:Styles-Ref source="CPStyles.psfx" />
</Styles>
Styles are built in order of declaration, and whether a style applies to a component is determined by one or more of the following criteria: applied-class, applied-id, and applied-type. All of the criteria specified on a style must be met for the style to be used on a component.
- applied-class matches based on the style:class value on a component. Multiple values can be specified on a component by separating them with a space.
- applied-id matches based on the id of the component. Ids must be unique within a single file (or template), but there can be multiple components with the same id in the final output.
- applied-type matches based on the runtime type of the component, and follows the rules of the namespace prefixes referencing code namespaces and assemblies (for example, pdf:H1).
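The matching rules above can be modelled in a few lines. The sketch below uses hypothetical StyleRule and ComponentInfo types rather than Scryber's own classes, and assumes, as in CSS, that when several matching styles set the same value, the one declared later wins. It does not model inheritance of values from parent components.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; they are not Scryber's classes.
public sealed class StyleRule
{
    public string AppliedClass { get; set; }   // null when not specified
    public string AppliedId { get; set; }      // null when not specified
    public string AppliedType { get; set; }    // e.g. "pdf:H1"; null when not specified
    public Dictionary<string, string> Values { get; } = new Dictionary<string, string>();
}

public sealed class ComponentInfo
{
    public string Id { get; set; }
    public string Type { get; set; }           // e.g. "pdf:Div"
    public string StyleClass { get; set; }     // space-separated, e.g. "rss-item heading"
}

public static class StyleMatcher
{
    // A rule applies only if every criterion it specifies is met.
    public static bool Matches(StyleRule rule, ComponentInfo c)
    {
        if (rule.AppliedType != null && rule.AppliedType != c.Type) return false;
        if (rule.AppliedId != null && rule.AppliedId != c.Id) return false;
        if (rule.AppliedClass != null)
        {
            string[] classes = (c.StyleClass ?? string.Empty)
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (!classes.Contains(rule.AppliedClass)) return false;
        }
        return true;
    }

    // Matching rules are applied in declaration order, so (by assumption)
    // a later rule overwrites a value set by an earlier one.
    public static Dictionary<string, string> Resolve(IEnumerable<StyleRule> rules, ComponentInfo c)
    {
        var result = new Dictionary<string, string>();
        foreach (StyleRule rule in rules.Where(r => Matches(r, c)))
        {
            foreach (KeyValuePair<string, string> kv in rule.Values)
                result[kv.Key] = kv.Value;
        }
        return result;
    }
}
```

For example, a pdf:Div with style:class="rss-item heading" matches a rule with applied-class="heading", while a pdf:H1 with no class matches only an applied-type="pdf:H1" rule.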
A detailed analysis of styles is beyond the scope of this article, but the rest of this section shows how they can be used and how powerful they are. In CPStyles.psfx, remove any existing style definitions and add the following within the Styles element:
<style:Style applied-type="pdf:Page" >
<style:Page size="A4" orientation="Landscape"/>
<style:Margins all="20pt" top="40pt" />
</style:Style>
<style:Style applied-class="heading" >
<style:Background color="#f90" />
<style:Fill color="black"/>
<style:Font bold="false" size="12pt" />
<style:Padding all="10pt"/>
<style:Margins bottom="20pt" />
</style:Style>
<style:Style applied-type="pdf:H1" >
<style:Fill color="white" />
<style:Font bold="false" size="30pt" />
</style:Style>
<style:Style applied-id="CPAllItems" >
<style:Columns count="2" alley-width="20pt"/>
</style:Style>
<style:Style applied-class="rss-item" >
<style:Padding all="5pt"/>
<style:Border sides="Bottom" color="#f90"/>
<style:Margins bottom="10pt"/>
<style:Font size="14pt" />
</style:Style>
<style:Style applied-type="pdf:H2" >
<style:Font size="20pt" bold="false" italic="false" />
<style:Fill color="#f90"/>
</style:Style>
We will go through each in a second, but for now save and generate the document to see the result.
As we can see, all pages are now in landscape orientation. The heading Div has an orange background, and its content has a black fill color and a 12 point font, inherited from the Div. The H1 component within the heading Div defines its own (white) fill color, which overrides the black. Within the CPAllItems applied style, two columns are created, and content flows down one column and then onto the next before moving on to the next page. Within each rss-item, we have added an orange bottom border, and an H2-specific style sets the CodeProject orange text fill color.
Adding Some Images and Links
At the moment, it is looking pretty good, but it would be nice to include the CodeProject logo and links to the items. We know that the logo's URL and link are included in the RSS feed data:
<image>
<title>CodeProject Latest Articles</title>
<url>http://www.codeproject.com/App_Themes/Std/Img/logo100x30.gif</url>
<link>http://www.codeproject.com</link>
<width>100</width>
<height>30</height>
<description>CodeProject</description>
</image>
This can be used to add an image to our PDF. We want to put it at the right of the heading without affecting the flow of the rest of the content, so we use relative positioning to place the image where required. As the page is A4 landscape (297mm wide), we can use this to work out where to place the image. Scryber understands inches (in) and millimeters (mm) as well as points (pt); points are the default unit if none is specified. In CPStyles.psfx, add:
<style:Style applied-class="right-image" >
<style:Position mode="Relative" x="240mm"
y="2mm" width="30mm"/>
</style:Style>
And in the heading div, add:
<pdf:Link action="Uri"
file="{xpath:image/link}" style:class="right-image" >
<pdf:Image src="{xpath:image/url}" />
</pdf:Link>
Viewing the final generated content shows us the logo to the top right with a link back to www.codeproject.com.
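The right-image style uses millimetres, while points are Scryber's default unit. The conversion is fixed (1in = 72pt and 1in = 25.4mm), so the values are easy to check. The Units helper below is hypothetical, and its column-width function assumes the columns share the remaining width equally, which is an assumption rather than documented Scryber behaviour.

```csharp
using System;

// Hypothetical unit helpers for checking style values; not part of Scryber.
public static class Units
{
    public const double PointsPerInch = 72.0;
    public const double MillimetresPerInch = 25.4;

    public static double MmToPoints(double mm) => mm / MillimetresPerInch * PointsPerInch;

    public static double InToPoints(double inches) => inches * PointsPerInch;

    // Width of each column, assuming the columns split the content width
    // (page width minus side margins) equally after subtracting the alleys.
    public static double ColumnWidth(double contentWidthPt, int count, double alleyPt)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        return (contentWidthPt - alleyPt * (count - 1)) / count;
    }
}
```

Under these assumptions, the 297mm page width is about 841.9pt and the 240mm x offset about 680.3pt; with the 20pt left and right page margins and the 20pt alley from the CPAllItems style, each of the two columns comes out at roughly 391pt wide.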
It is also possible to link to specific pages or named destinations in the current document, or even specific named destinations in other documents.
As a next step, we want each item to link to its article on the CodeProject site, so add the following link to the item block, below the description:
<pdf:Div style:h-align="Right" >
<pdf:Link action="Uri" file="{xpath:link}" new-window="true" >
<pdf:Label text="more..." />
</pdf:Link>
</pdf:Div>
The right alignment is an inline style that applies to the content within a block component rather than applying to the actual component itself.
You should now have the complete document dynamically loading all your content and rendering beautifully in a PDF reader.
Almost there, but there are a couple of things to do for completeness.
Stop Serving Files, Start Serving PDFs
This is done automatically for those using the VSIX / NuGet packages: the modifications were made to the web.config file when the package reference was added. For these projects you can already add a link pointing to your pdfx document, and the generated file will be downloaded to the browser. Try it now in your web page, then you can move on to Reusing Content.
<a href='PDFs/CodeProjectFeed.pdfx' >My Code Project PDF</a>
If you are using the .NET 3.5 / VS 2010 installer, the instructions below will help you achieve the same thing.
CodeProjectFeed.pdfx and CPStyles.psfx are actual files in the virtual directory. At the moment, if we point our browser at them, their content will be served; in our case, at http://localhost:3058/PDFs/CodeProjectFeed.pdfx.
Whilst this is not a major security risk now, we may later start to have references to XML files on our system or other sensitive information, so these files need to be blocked from being served via IIS. The easiest way to do this is the same way ASCX files are blocked: by adding handlers to the web.config. Scryber installed a helper config file in the install directory (‘C:\Program Files (x86)\Scryber\v0.8\Configuration’ by default). Open this file and you will find lots of juicy configuration options; at the bottom are the <system.web> and <system.webServer> sections that contain the handlers that should be used to block content in your web application. Copy the httpHandlers and handlers sections in turn and add them to the existing sections in your web application configuration.
<system.web>
<httpHandlers>
<add path="*.psfx" verb="*" type="System.Web.HttpForbiddenHandler" />
<add path="*.ppfx" verb="*" type="System.Web.HttpForbiddenHandler" />
<add path="*.pcfx" verb="*" type="System.Web.HttpForbiddenHandler" />
<add path="*.pdfx" verb="*" type="System.Web.HttpForbiddenHandler" />
</httpHandlers>
</system.web>
<system.webServer>
<handlers>
<add name="Scryber.Styles" path="*.psfx" verb="*"
type="System.Web.HttpForbiddenHandler" />
<add name="Scryber.Components.Page" path="*.ppfx"
verb="*" type="System.Web.HttpForbiddenHandler" />
<add name="Scryber.Components.UserComponent" path="*.pcfx"
verb="*" type="System.Web.HttpForbiddenHandler" />
<add name="Scryber.Components.Document" path="*.pdfx"
verb="*" type="System.Web.HttpForbiddenHandler"/>
</handlers>
</system.webServer>
Now when you navigate to the document or style files you should get a blocked content message. This will not affect the server side loading of content or referenced content (provided the referenced paths are relative).
Note: If you are not yet using the IIS integrated pipeline, you will have to alter the IIS settings so that requests for psfx, ppfx, pcfx and pdfx files are passed through to .NET.
Start Serving Files
If we can block content from being served, we can also allow it to be served, and Scryber supports this too. Change the handler for pdfx in both sections from the HttpForbiddenHandler to the following:
Scryber.Web.ScryberPDFHandlerFactory, Scryber.Components, Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe
You should then have an httpHandlers section as below:
<httpHandlers>
<add path="*.psfx" verb="*"
type="System.Web.HttpForbiddenHandler"/>
<add path="*.ppfx" verb="*"
type="System.Web.HttpForbiddenHandler"/>
<add path="*.pcfx" verb="*"
type="System.Web.HttpForbiddenHandler"/>
<add path="*.pdfx" verb="*"
type="Scryber.Web.ScryberPDFHandlerFactory, Scryber.Components,
Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe"/>
</httpHandlers>
And for your handlers section:
<handlers>
<add name="Scryber.Styles" path="*.psfx"
verb="*" type="System.Web.HttpForbiddenHandler"/>
<add name="Scryber.Components.Page" path="*.ppfx" verb="*"
type="System.Web.HttpForbiddenHandler"/>
<add name="Scryber.Components.UserComponent" path="*.pcfx" verb="*"
type="System.Web.HttpForbiddenHandler"/>
<add name="Scryber.Components.Document" path="*.pdfx" verb="*"
type="Scryber.Web.ScryberPDFHandlerFactory, Scryber.Components,
Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe"/>
</handlers>
You should be able to navigate directly to CodeProjectFeed.pdfx from your browser, and the content will be generated for you.
Now any link on your site pointing to CodeProjectFeed.pdfx will generate the dynamic document.
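The value that replaces HttpForbiddenHandler for pdfx files is an assembly-qualified type name: the full type name, a comma, then the assembly's display name (name, version, culture and public key token). The hypothetical helper below splits it using standard System.Reflection APIs, which can be handy when checking that the Version and PublicKeyToken in web.config match the Scryber.Components assembly your project actually references.

```csharp
using System;
using System.Reflection;

// Illustrative helper (not part of Scryber or ASP.NET): splits an
// assembly-qualified type name such as the pdfx handler's "type" value.
public static class QualifiedTypeName
{
    // Everything before the first comma is the full type name.
    // (Naive: generic type names containing nested brackets would need more care.)
    public static string TypeName(string qualified)
    {
        int comma = qualified.IndexOf(',');
        return (comma < 0 ? qualified : qualified.Substring(0, comma)).Trim();
    }

    // Everything after the first comma is the assembly display name,
    // which System.Reflection.AssemblyName can parse.
    public static AssemblyName AssemblyNameOf(string qualified)
    {
        int comma = qualified.IndexOf(',');
        if (comma < 0)
            throw new ArgumentException("No assembly name present.", nameof(qualified));
        return new AssemblyName(qualified.Substring(comma + 1).Trim());
    }
}
```

For the handler above, TypeName gives "Scryber.Web.ScryberPDFHandlerFactory", and the parsed assembly name reports Scryber.Components with version 0.8.0.0.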
Reusing Content
One of the great benefits of Scryber is the ability to split content into separate files and reference it. Pages can be referenced and included in multiple documents, and components can be referenced and reused in multiple pages. In our project, we can add a new CPFeedContents.pcfx file using the PDF User Component template and then reference it from our document instead.
Cut the contents from within the first ForEach template section and replace them with a component reference:
<pdf:Section id="FirstPage" >
<Content>
<data:ForEach datasource-id="CPArticleSource" select="rss/channel" >
<Template>
<pdf:Component-Ref source="CPFeedContents.pcfx"/>
</Template>
</data:ForEach>
<!-- xml data source for code project rss feed -->
<data:XMLDataSource source-path="http://www.codeproject.com/WebServices/ArticleRSS.aspx"
cache-duration="20"
id="CPArticleSource" />
</Content>
</pdf:Section>
And add the cut content into the component file.
<pdf:UserComponent id="FeedContents"
xmlns:pdf="Scryber.Components, Scryber.Components,
Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe"
xmlns:style="Scryber.Styles, Scryber.Styles,
Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe"
xmlns:data="Scryber.Data, Scryber.Components,
Version=0.8.0.0, Culture=neutral, PublicKeyToken=872cbeb81db952fe" >
<Content>
<!-- start of heading -->
<pdf:Div style:class="heading" >
<pdf:H1 text="{xpath:title}" ></pdf:H1>
<pdf:Label text="{xpath:description}" /><BR/>
Date: <pdf:Label text="{xpath:lastBuildDate}" /><BR/>
<pdf:Label text="{xpath:copyright}" ></pdf:Label>
<pdf:Link action="Uri"
file="{xpath:image/link}" style:class="right-image" >
<pdf:Image src="{xpath:image/url}" />
</pdf:Link>
</pdf:Div>
<!-- end of heading -->
<!-- repeating rss item blocks -->
<pdf:Div id="CPAllItems" >
<data:ForEach select="item" >
<Template>
<pdf:Div style:class="rss-item" >
<pdf:H2 text="{xpath:title}" ></pdf:H2>
<pdf:Label text="{xpath:description}" />
<pdf:Div style:h-align="Right" >
<pdf:Link action="Uri"
file="{xpath:link}" new-window="true" >
<pdf:Label text="more..." />
</pdf:Link>
</pdf:Div>
</pdf:Div>
</Template>
</data:ForEach>
</pdf:Div>
<!-- end of repeating rss item blocks -->
</Content>
</pdf:UserComponent>
When we generate again, there should be no visual difference in the PDF document, but we now have the main document content in a separate file. It would be a simple matter to add five more PDF documents using the same layout for the other CodeProject category feeds listed at http://www.codeproject.com/Info/opml.aspx.
Scryber Development Status
At the moment, Scryber is very definitely beta. There are plenty of ‘undocumented features’ to sort out, but we are getting there, and if something doesn't work initially there is usually a different way to achieve what is needed. A forum has been set up where we would really like to hear about issues, fixes and anything else about the libraries. We will also be creating more components for Scryber, such as tables, lists, and graphic paths, along with SQL and object sources, security, forms, JavaScript actions, etc. Documentation is limited to this article and a readme for now, but we hope the IntelliSense and templates help, and it will grow over time.
A Bit About the Badge
You will have noticed by now that there is a badge on every page of your document: “generated by Scryber”. You can alter the style of the badge and move it around the page using the <style:Badge> options, but you cannot get rid of it without breaking into the code. Scryber is open source and free to use, but we do ask that the badge is rendered on the page. If you need to remove the badge for some reason (and it is usually a commercial reason), we ask you to buy a license to do this. It has taken a long time to build Scryber, and we have only touched the surface of what is possible with the libraries. We think the badge is the best way to help us continue to work on Scryber and improve it.
Article History
- 1.4 - 1st April, 2014
  - Updated the installation options to include the VSIX and NuGet package options
  - (Hopefully) changed the article location within CodeProject to a more accessible path
- 1.3 - 6th October, 2013
  - Updated the PDF:Link element, as it no longer requires the innerContent element
- 1.2 - 23rd April, 2013
  - Added the final config entries based on reader feedback, and wrapped the PDFDocument processing in a using statement for best practice
- 1.1 - 10th April, 2013
  - Corrected spelling mistakes
- 1.0 - 5th April, 2013
  - Initial publication