Introduction

The history of JsaPar

In 2008 Jonas Stenberg, the creator of the JsaPar library, decided to develop a Java library for processing fixed width and delimited separated value files. At that time there were a few Java libraries available that could process these kinds of files, but not to his satisfaction.
With fixed width and delimited separated value files, it is hard to write a pure Java implementation without the code becoming messy and unreadable. Each time a developer needs to process these kinds of files, different requirements and specifications apply, so the resulting Java code is specific to that one situation only.

Complex fixed width or delimited separated value files bring a lot of processing complexity, which puts a burden on the developer who is responsible for processing these kinds of files correctly. Not only are the design and coding hard, but the maintenance of such a specific file processing solution is also time consuming, especially when the file specifications change over time (for example due to external regulations) and the solution has to be redesigned and modified.

The creator of the JsaPar library decided to try something different: design a universal piece of code that can be configured for different kinds of situations, without the cumbersome and repetitive coding in Java. In this way, the developer no longer has to write dozens of lines of burdensome Java code. Only a few lines of Java remain, plus an XML file that configures the processing library. The library does the hard part and the developer does the easy part. That is how the JsaPar library was born.

What is JsaPar?

JsaPar stands for Java schema based parser. It is an open source Java library written with one purpose in mind: to make it easy to process fixed width and delimited separated value data sources by means of a schema definition, written in the Extensible Markup Language (XML), that describes the structure of the data source to be processed. It moves the complexity of processing fixed width and/or delimited separated value data sources into a Java library and provides the developer with a more general Java interface (API) to handle the input and output between those data sources and Java objects.

The JsaPar library is written for a Java 5.0 or higher virtual machine and can be used in one of the following ways:
  • Document based, or
  • Line based using Java events (according to the Delegation Event Model)
The design of the library is focused on providing the user with a parser/producer that is flexible and easy to use. Simplicity and extensibility for the user are seen as the main goals of this library. This is reflected in the simple configuration options, which can be specified within a single XML file or within the Java code itself.
The library is released under the Apache License, Version 2.0, and can thus be used in both open source and closed source (e.g. commercial) software.

What does JsaPar solve?

It doesn't solve the programming task(s) of the developer, but it changes the focus of the programming task from "how to process this data?" to "what to do with this data?". The data still has to be handled by the client code, but the boring task of writing the glue logic between the Java world and the world of fixed width or delimited separated value data sources is made much easier. See it as a shift in responsibilities: the library does the tricky read/write work, so you as a developer only have to focus on how to deal with the data that is being read or written.

How does JsaPar work?

By specifying an XML file that is based on a particular XML schema for fixed width or delimited separated value data sources, the developer gives the library important information, such as: what kind of data source should be read or written, what the content of the data source looks like, what kinds of records can occur in the data source, and whether there are any control characters that must be taken into account when processing the data source. The schema definition describes the layout of the data source for the library, so that the developer is spared from writing Java code for processing each individual line of the data source.
The library uses the schema definition to translate the data source into Java Document, Line and Cell objects, which act as placeholders for the lines that are read or written. These Java objects (POJOs) are then handed to the Java client code for further processing.
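
To make the idea of a schema that drives the parsing more concrete, here is a minimal sketch in plain Java. It does not use the JsaPar API: the names SchemaSketch, CellSpec and parseLine are hypothetical and exist only for this illustration. The sketch assumes a fixed width layout in which each cell is described by a name and a width, and it turns one line into named cell values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these classes are hypothetical and are NOT the JsaPar API.
final class SchemaSketch {

    /** Hypothetical description of one fixed width cell: its name and its width in characters. */
    static final class CellSpec {
        final String name;
        final int width;

        CellSpec(String name, int width) {
            this.name = name;
            this.width = width;
        }
    }

    /** Splits one fixed width line into named, trimmed cell values, in schema order. */
    static Map<String, String> parseLine(List<CellSpec> schema, String line) {
        Map<String, String> cells = new LinkedHashMap<>();
        int pos = 0;
        for (CellSpec spec : schema) {
            // Tolerate lines that are shorter than the schema expects.
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + spec.width, line.length());
            cells.put(spec.name, line.substring(start, end).trim());
            pos += spec.width;
        }
        return cells;
    }

    public static void main(String[] args) {
        // The "schema": three cells with widths 8, 10 and 3.
        List<CellSpec> schema = new ArrayList<>();
        schema.add(new CellSpec("firstName", 8));
        schema.add(new CellSpec("lastName", 10));
        schema.add(new CellSpec("age", 3));

        // Build a sample line that matches the widths above.
        String line = String.format("%-8s%-10s%3s", "Erik", "Svensson", "45");
        System.out.println(parseLine(schema, line));
    }
}
```

The point of the sketch is that the layout lives in data (the list of cell descriptions) and not in hand-written substring calls, so a changed file specification only requires a changed schema.
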
The Document object covers the entire structure of the data source and holds all the lines that have been read as Line objects. Each Line object is made up of Cell objects of different types and with different formats. For huge files (in the order of gigabytes), the library can work event based, so that a Java Document object doesn't have to be constructed in memory at all. This is a major advantage over third party libraries that need to load the complete data source into memory before any operation can be applied to it.
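
The difference between building a complete document and working event based can be sketched in plain Java as follows. This is not the JsaPar API: LineEventSketch, LineListener and addLineListener are hypothetical names chosen for this illustration, and the sketch assumes Java 8 or later because it uses a lambda. Each line is handed to the registered listeners as soon as it has been read, so only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: hypothetical names, NOT the JsaPar API.
final class LineEventSketch {

    /** Hypothetical listener in the style of the delegation event model. */
    interface LineListener {
        void lineParsed(long lineNumber, String[] cells);
    }

    private final List<LineListener> listeners = new ArrayList<>();

    void addLineListener(LineListener listener) {
        listeners.add(listener);
    }

    /** Reads the source line by line and notifies the listeners; returns the number of lines read. */
    long parse(Reader source, String delimiter) {
        BufferedReader reader = new BufferedReader(source);
        long lineNumber = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                // Pattern.quote treats the delimiter literally; -1 keeps trailing empty cells.
                String[] cells = line.split(Pattern.quote(delimiter), -1);
                for (LineListener listener : listeners) {
                    listener.lineParsed(lineNumber, cells);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lineNumber;
    }

    public static void main(String[] args) {
        LineEventSketch parser = new LineEventSketch();
        long[] total = {0};
        // The listener sums the second cell of every line without storing any line.
        parser.addLineListener((lineNumber, cells) -> total[0] += Long.parseLong(cells[1]));
        long lines = parser.parse(new StringReader("apples;3\npears;4\n"), ";");
        System.out.println(lines + " lines, total quantity " + total[0]);
        // prints "2 lines, total quantity 7"
    }
}
```

A document based approach would instead add every parsed line to a list and return that list, which is convenient for small data sources but makes memory use grow with the size of the input.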

What are the capabilities of JsaPar?

The JsaPar library currently supports:
  • reading and writing of fixed width data sources (a.k.a. flat data sources) with fixed positions.
  • reading and writing of delimited separated value files (a.k.a. CSV data sources).
  • localization.
  • input is read from a java.io.Reader and output is written to a java.io.Writer. This means that the data does not have to come from or end up in a file.
  • processing small data sources by using a Document object for the representation of the data source in memory.
  • processing large data sources by sending an event for each line that has been processed successfully. Large data sources (for example: files of around 10 gigabytes) are processed without loading the entire data source into memory.
  • the Document class can be transformed into a Java object (using reflection) if the schema is carefully written.
  • Java objects can also be produced directly from the parser.
  • the schema can be expressed in XML notation or created directly within the Java code.
  • the file parsing schema contains information about how to parse each cell within a line regarding the data type and syntax.
  • parsing errors can be handled by throwing exceptions at the first error or the parsing errors can be collected during the parsing process so that the client code can deal with these errors later.
  • the Document class can be built from an XML file (according to an internal XML schema).
  • a list of Java objects can be converted into a file according to a schema, if the schema is carefully written.
  • a small library size: the JAR file is about 137 kBytes in total. A small size matters for applications that need to be small themselves, like apps for mobile devices.
  • a low runtime memory footprint when working with events (about 25 kByte in total).
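
Two of the capabilities above, reading from any java.io.Reader and choosing between stopping at the first error or collecting errors, can be illustrated with a small plain Java sketch. It does not use the JsaPar API: ErrorHandlingSketch and parseIntegers are hypothetical names, and the "null list means fail fast" convention is an assumption made only to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, NOT the JsaPar API.
final class ErrorHandlingSketch {

    /**
     * Parses one integer per line from any Reader.
     * If errors is null, the first bad line throws an exception; otherwise
     * bad lines are recorded in errors and parsing continues.
     */
    static List<Integer> parseIntegers(Reader source, List<String> errors) {
        List<Integer> values = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                try {
                    values.add(Integer.valueOf(line.trim()));
                } catch (NumberFormatException e) {
                    String message = "line " + lineNumber + ": not an integer: '" + line + "'";
                    if (errors == null) {
                        throw new IllegalArgumentException(message, e); // fail fast
                    }
                    errors.add(message); // collect and carry on
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // The input is an in-memory string, not a file: any Reader will do.
        String data = "10\nabc\n30\n";

        List<String> errors = new ArrayList<>();
        List<Integer> values = parseIntegers(new StringReader(data), errors);
        System.out.println(values + " with " + errors.size() + " error(s)");

        try {
            parseIntegers(new StringReader(data), null);
        } catch (IllegalArgumentException e) {
            System.out.println("stopped at first error: " + e.getMessage());
        }
    }
}
```

Collecting errors suits batch imports where one bad line should not discard the rest, while failing fast suits cases where a single malformed line makes the whole data source untrustworthy.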

What are the limitations of JsaPar?

Known limitations are:
  • The control character has to be the first character in the line. If your record structure has the control character at any other position in the line, you cannot use the control character to detect the record type.
  • When dealing with master-slave (a.k.a. multi-line) records, some records cannot be processed when the slave records also have some sort of control character that decides whether any follow-up slave records differ from the current one.
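
The first limitation is easier to picture with a small example of what detecting a record type by a leading control character means. The sketch below is plain Java and not the JsaPar API: ControlCharSketch, register and recordTypeOf are hypothetical names, and the control characters H, D and T are made up for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical names, NOT the JsaPar API.
final class ControlCharSketch {

    private final Map<Character, String> recordTypes = new LinkedHashMap<>();

    /** Associates a control character with the name of a record type. */
    void register(char controlCharacter, String recordType) {
        recordTypes.put(controlCharacter, recordType);
    }

    /** Returns the record type selected by the first character of the line, or null if unknown. */
    String recordTypeOf(String line) {
        if (line.isEmpty()) {
            return null;
        }
        return recordTypes.get(line.charAt(0));
    }

    public static void main(String[] args) {
        ControlCharSketch detector = new ControlCharSketch();
        detector.register('H', "header");
        detector.register('D', "detail");
        detector.register('T', "trailer");

        for (String line : new String[] {"H20240101", "DErik    045", "T0001", "X???"}) {
            System.out.println(line + " -> " + detector.recordTypeOf(line));
        }
    }
}
```

Because only the first character is inspected, a layout that stores the record type marker further into the line cannot be told apart this way, which is the situation the first limitation describes.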

Does JsaPar have any dependencies?

The JsaPar library has no runtime dependencies on external libraries. The provided test classes require JUnit version 4.x in order to run. As of version 1.6 of the JsaPar library, the binaries in the download package are built with Java 1.7 and with target compatibility level 1.7. Versions before 1.6 were built with Java 1.6 but with target compatibility level 1.5. If you want to use the latest version of the JsaPar library with an earlier version of Java, you have to download the source and compile your own version of the library. There is no guarantee that the library will work with versions of Java prior to 1.5.


Where to get the latest version of JsaPar?

The latest version can be downloaded from the JsaPar project home: http://jsapar.tigris.org/. The JsaPar library is also available in the central Maven repository as of version 1.5.0.

How to get started?

First of all, you should read the setting up article on this website to help you set up your development environment (Eclipse IDE) and your project to use the JsaPar library. Once that is done, you can read the getting started article, which introduces the basic API calls of the JsaPar library in conjunction with fixed width and/or delimited separated value data sources. After you have become familiar with the API, the basic features article and the advanced features article demonstrate the versatility of the JsaPar library in more complex situations. For help with processing large files, read the working with events article, which teaches you how to use the event mechanism of JsaPar in conjunction with your own code. The handling errors article discusses the options you have when file errors and/or document errors occur or when exceptions are thrown from within the JsaPar library. The document schemas article describes how to design your own schema, either in XML or from within your Java code, to instruct the JsaPar library on how to process the data source(s).

For your convenience, this website also provides how-tos, each of which covers a small topic and focuses on one specific task. As a last resort, you might find the answer you are looking for on the frequently asked questions page of this website.

All examples that are covered on this website can be downloaded from the code samples page as ZIP archives. Freely available programs to unzip these archives can be found with a search engine such as Google, or you can use the free 7-Zip.


Yours faithfully,
     JsaPar Developer.