POS TLog Parser


I am releasing this parser, which I coded in Perl, under the GPL.  Hopefully it will be useful to someone who does not work at a backwards company.  My company went out of business sucking up to Bill Gates and Micro$oft, and has forgone all claim or interest in this software.

This whole project was (for me, anyway) a demonstration of the superiority of open platforms and open-source software.  At my company, we had hundreds of IBM 4690 POS (Point of Sale) systems across the country running IBM's Supermarket Application (no, we weren't a supermarket - just retail).   The registers at each store send transactions in real time to this application, which appends them to a Transaction Log (henceforth, the TLog) in a proprietary IBM format.

The trick for me was discovering how to unpack "packed-decimal" fields in Perl (a short sketch of that appears after the list below).   IBM publishes a spec for the default record layouts, but leaves plenty of options for customization.   At project inception, the company purchased numerous high-dollar commercial and contractual services to put this together.  The TLog parsing job went to [a commercial company].  I had to match enough of that layout to stay compatible with the database infrastructure we had already built:
1) Each store's TLogs go in a separate directory, labeled "Store_###", under a defined root directory.
2) TLog files accumulate in each directory.  They have numerical names that grow sequentially.
3) TLogs are parsed into separate output files, each representing a different record type.
4) The output files and their layouts are described in an *.ini file.
5) A keyed binary file keeps track of the last TLog sequence parsed.
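
For anyone facing the same problem, here is a minimal sketch of the packed-decimal unpacking in Perl.  The sample bytes and the assumption of a trailing sign nibble are for illustration only; the real field widths and sign conventions come from the IBM layout spec and the layout descriptions.

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Unpack a packed-decimal (BCD) field: each byte holds two decimal
  # digits, so "H*" turns the raw bytes into the digit string directly.
  sub unpack_packed_decimal {
      my ($bytes) = @_;
      my $digits = unpack('H*', $bytes);

      # If the field carries a trailing sign nibble (common for signed
      # packed-decimal), peel it off: C or F means positive, D negative.
      my $sign = 1;
      if ($digits =~ /[cdf]$/i) {
          $sign = -1 if $digits =~ /d$/i;
          chop $digits;
      }
      return $sign * $digits;    # numeric context drops the leading zeros
  }

  # Example: bytes 0x00 0x12 0x34 0x5C unpack to +12345.
  my $raw = pack('H*', '0012345c');
  print unpack_packed_decimal($raw), "\n";    # prints 12345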

The Nightmare of Commercial Software

In practice, this approach had numerous drawbacks.  The program was not stable: it frequently hung, with no indication of what was wrong.  It was not robust either, and would crash or output garbage on any minor discrepancy in the data.   Fixing broken parsing runs was difficult and often caused more problems; editing the binary START file often failed.   The compiled program exhibited the typical symptoms of poor programming practices - memory leaks, hangs, slowdowns.   Evolving business processes meant new records and fields were constantly being added to the TLog, and each minor fix or enhancement turned into an expensive, delayed support request, often resulting in unexpected new problems that required a complete end-to-end QA pass.

A Better Way

Developing a Perl solution to duplicate [the commercial product] took me about two weeks on Linux, but the company required it to run on Windows.  Porting to Windows (and making it production-ready) took another six weeks.   First I had to duplicate the output of [the commercial product] for every conceivable record (which required adding silly hacks to my clean code to reproduce their bugs), and then modify it for new TLog requirements.   [the commercial company] could not deliver a reasonable estimate for the same enhancement to their code, so my script replaced it about three months after I had started development.   Of course, done right, my script offers many improvements over [the commercial product]:
1) The process is split into two distinct parts - an input filter that converts TLogs into simple text records, and an output filter that generates plain text in CSV-formatted files.  The input filter duplicates the functionality of TV++ (a Windows program), only better, and can be used for many other purposes.
2) The Perl process has been completely stable - it has never crashed or hung.  Believe it or not, it is also about twice as fast as the compiled [commercial product].  The DBAs are happy because they can call it as needed from MS SQL Server DTS packages.
3) Both input and output formats are described by .ini files.  Unlike [the commercial product]'s, they are functional descriptions of the formats: the data definitions drive the logic of the process (a sketch of the idea appears after this list).  About 95% of the changes made to the TLogs have been handled by simple edits to the ini files, made by non-programmers with any file editor.
4) Consequently, changes are free and take only a few minutes (instead of $50,000 and three months!).
5) A simple text file called START records the last TLog processed and its byte count (a sketch of this appears after the list as well).  Hint: if the same TLog grows, it is re-parsed; a parallel project parses the current day's TLogs throughout the day.  It is easy to change this file and re-parse any number of previous days' TLogs, e.g., if a format changes and the ini needs to be updated.  Note - corrupted or unrecognized formats are simply skipped and parsing continues, unlike [the commercial product], which would just crash.
6) Built into all my scripts is a complete debugging capability.  This proved invaluable for resolving parsing problems and isolating bizarre data from the POS controllers.   There was no such feature in [the commercial product].
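
To make item 3 concrete, here is a minimal sketch of a data definition driving the parsing logic.  The ini syntax shown (one "name = type, length" line per field), the section name, and the field handling are hypothetical stand-ins, not the script's actual format; the point is that the description, not hard-coded logic, determines how a record is split apart.

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Read one [section] of a layout ini into an ordered list of fields.
  # Each line in the section is assumed to look like:  NAME = type, length
  sub read_layout {
      my ($ini_file, $section) = @_;
      my (@fields, $in_section);
      open my $fh, '<', $ini_file or die "Cannot open $ini_file: $!";
      while (my $line = <$fh>) {
          chomp $line;
          if ($line =~ /^\[(.+)\]\s*$/) {
              $in_section = ($1 eq $section);
              next;
          }
          next unless $in_section;
          next unless $line =~ /^(\w+)\s*=\s*(\w+)\s*,\s*(\d+)\s*$/;
          push @fields, { name => $1, type => $2, len => $3 };
      }
      close $fh;
      return @fields;
  }

  # Split one fixed-layout binary record into named values, driven
  # entirely by the layout read from the ini - no record-specific code.
  sub parse_record {
      my ($layout, $record) = @_;
      my %values;
      my $offset = 0;
      for my $f (@$layout) {
          my $raw = substr($record, $offset, $f->{len});
          if ($f->{type} eq 'packed') {
              # BCD: the hex digits of the bytes are the decimal digits
              # (see the packed-decimal sketch earlier for sign handling).
              $values{ $f->{name} } = unpack('H*', $raw);
          }
          else {
              # Treat anything else as character data; trim trailing blanks.
              (my $text = $raw) =~ s/\s+$//;
              $values{ $f->{name} } = $text;
          }
          $offset += $f->{len};
      }
      return \%values;
  }

  # Usage sketch: read the layout once, then turn each record into a CSV row.
  # my @layout = read_layout('tlog_layout.ini', 'ITEM_SALE');
  # my $row    = parse_record(\@layout, $record_bytes);
  # print join(',', map { $row->{ $_->{name} } } @layout), "\n";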
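
And for item 5, a sketch of the START bookkeeping.  The two-line file format (TLog name, then byte count) and the Store_001 path are assumptions for illustration; what matters is that the checkpoint is plain text, so anyone can fix or rewind it with an ordinary editor.

  #!/usr/bin/perl
  use strict;
  use warnings;

  my $start_file = 'START';

  # Read the last TLog name and how many bytes of it were already parsed.
  sub read_start {
      return ('', 0) unless -e $start_file;
      open my $fh, '<', $start_file or die "Cannot read $start_file: $!";
      chomp(my $tlog  = <$fh> // '');
      chomp(my $bytes = <$fh> // 0);
      close $fh;
      return ($tlog, $bytes);
  }

  # Record where parsing stopped so the next run picks up from there.
  sub write_start {
      my ($tlog, $bytes) = @_;
      open my $fh, '>', $start_file or die "Cannot write $start_file: $!";
      print {$fh} "$tlog\n$bytes\n";
      close $fh;
  }

  my ($last_tlog, $last_bytes) = read_start();

  # TLog names grow sequentially, so anything that sorts after the last
  # name needs parsing, as does the last file itself if it has grown.
  for my $tlog (sort glob 'Store_001/*') {
      next if $tlog lt $last_tlog;
      next if $tlog eq $last_tlog && -s $tlog <= $last_bytes;
      # ... parse $tlog here ...
      write_start($tlog, -s $tlog);
  }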

Open Source vs. Proprietary Commercial Software

No legal encumbrances.  Vastly superior development and maintenance model.  Faster, more stable and robust, less error-prone.  Cross-platform.  Reusable components.  Complete control at no marginal cost.

Clearly you can see why management was so upset with my new program.  I had been an open-source advocate at the company for eight years; this program was an opportunity to prove the point, and it succeeded spectacularly.   As soon as they realized what I had done, they tried to replace it.  They tasked a team member with duplicating [the commercial product] using another commercial tool (some expensive data-mapping software called Mercator).  Nine months later, with the help of numerous others and vendor support, he had a working parser.   Tests were not convincing, as the [commercial product] output could not be duplicated.    Now, three years later, M$ has footed the bill to duplicate the parsing process yet again in SSIS for Yukon (SQL Server 2005):  Project REAL

Makes you wonder why they were so eager to jump out of the Ferrari they owned and back into a rental Pinto.

contact me - kdavenpo at tx dot rr dot com