Real Systems Do Extracts - Part 5
As you've heard repeatedly in this blog, Real Systems Do Extracts. In our OLS.Switch implementations, we've taken pains to work through the details of our extract and reporting requirements at the most granular level possible, and that has paid off in a good deal of stability. We're working from a strong base: I'm thankful to Alejandro for his brilliant, simple and extensible design of this important application component. It reflects the key to our working relationship: it works best when I describe the business requirements without offering design input. I come, quite unconsciously, encumbered with a fair amount of baggage from legacy payment systems. Much as I try to shake off that bias, it comes through in many of my suggestions. The beauty (and it is a beautiful thing... does that make me a geek? I suppose so) of Alejandro's design is something I never could have envisioned. There's surely a lesson there on how best to build your team to take on jPOS-fueled payment systems projects.
Unfortunately, real systems blow up, too. And, most typically, they like to crater at, oh, 2 AM. [NOTE: I'm only referring to the batch job ending abnormally. Our OLTP engine kept chugging away.] At our flagship client, the last blow-up in the nightly batch cycle was in early August 2007: eight and a half months of interruption-free sleep. You can't brag about a record like that, because I'm convinced the damn thing can hear me and will show me who's boss by going south that very night.
Well, the skein was emphatically broken on Thursday morning. Bottom line: this is IT. And, in the words of Forrest Gump, it happens. We talked about making the extract fail less destructively: logging the specific exception, excluding the offending record and moving on. We definitely don't want to do that... these are material items we'd be excluding, and perhaps forgetting forever if we're not diligent about checking the exception report each morning. As it is, the extract runs hands-off and requires no eyeballing in the morning. I'll take the pain of the occasional 2 AM phone call. These incidents find the weak points and give us an opportunity to firm them up. For the record, I've posted my analysis of what happened (with real client and file names masked). Anyone who has taken one of these calls can appreciate that what we did between 2 AM and 3 AM was triage. The 'real fix' ideas are something you follow up on within the next couple of days.
We're going to do a couple of additional things to clarify the fallout from the abnormal ends of our extracts:
- Right now, while the extract is in process, all output files have *.tmp suffixes. When it completes, they get *.txt (or, for some, *.csv) suffixes. This little technique lets our clients' DBAs key on the suffix change as the signal that they can PGP-encrypt the files and ship them via FTP to their external authorizers. The issue is that the rename happens even when the extract ends abnormally, so half-baked files get FTP'd. This is more an irritant than anything else: the files error out at FDR, AMEX, etc. because they lack file trailer records. We'll make a revision so that if the extract doesn't end normally, the files get *.bad suffixes instead.
- The error (see my analysis) currently gets logged in our main q2.log. Depending on the nature of the error, it can be hard to spot because all the OLTP logging swamps it. We're working to move all our batch components into a separate JVM (subject of a future post!), and at that point the error will stand out in a smaller, batch-only q2 log. Alejandro is going to take the separation one step further and write the error to a separate file.
- We'll also revise the extract to report the ID (row) of the record where we stumbled and to give some guidance as to which field blew us up. As you can see from my attached analysis, it took a bit of detective work this past Thursday to find the offending record and field!
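For readers who want to picture the three fixes together, here is a minimal Java sketch. It is not OLS.Switch code, and it does not use any jPOS API: the ExtractSketch class, Row record, ExtractException, 10-character fixed-width formatter, TRAILER record layout and batch-errors.log file name are all hypothetical. It writes to *.tmp, renames to *.txt only on a normal ending (and to *.bad on any abnormal one), and appends the failing row ID and field index to a separate error file. The rename uses java.nio.file.Files.move with StandardCopyOption.ATOMIC_MOVE; because the .tmp and final files live in the same directory, a watcher keying on the suffix sees either the old name or the new one, never something in between.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch of an extract that never hands off a half-baked file. */
public class ExtractSketch {

    /** Hypothetical extract row: database ID plus raw field values. */
    public record Row(long id, List<String> fields) {}

    /** Carries the row ID and field index where formatting failed. */
    public static class ExtractException extends RuntimeException {
        public final long rowId;
        public final int fieldIndex;

        public ExtractException(long rowId, int fieldIndex, Throwable cause) {
            super("Extract failed at row " + rowId + ", field " + fieldIndex, cause);
            this.rowId = rowId;
            this.fieldIndex = fieldIndex;
        }
    }

    /** Hypothetical fixed-width (10-char) formatter that rejects nulls and oversize values. */
    static String formatField(String value) {
        if (value == null) throw new IllegalArgumentException("null value");
        if (value.length() > 10) throw new IllegalArgumentException("too long: " + value);
        return String.format("%-10s", value);
    }

    /** Formats one row, tagging any failure with the row ID and field index. */
    static String formatRow(Row row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.fields().size(); i++) {
            try {
                sb.append(formatField(row.fields().get(i)));
            } catch (RuntimeException e) {
                throw new ExtractException(row.id(), i, e);
            }
        }
        return sb.toString();
    }

    /** Writes all detail rows, then the (hypothetical) trailer, to the .tmp file. */
    static void writeAll(Path tmp, List<Row> rows) throws IOException {
        try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (Row row : rows) {
                w.write(formatRow(row));
                w.write(System.lineSeparator());
            }
            // The trailer is only reached on a normal ending.
            w.write(String.format("TRAILER%08d", rows.size()) + System.lineSeparator());
        }
    }

    /** Appends the failure to a dedicated batch error file instead of the main log. */
    static void logError(Path errorLog, ExtractException e) throws IOException {
        String line = e.getMessage() + ": " + e.getCause().getMessage() + System.lineSeparator();
        Files.writeString(errorLog, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Final name: .txt for a normal ending, .bad otherwise. */
    static Path target(Path dir, String baseName, boolean ok) {
        return dir.resolve(baseName + (ok ? ".txt" : ".bad"));
    }

    /** Writes baseName.tmp, then renames it to .txt or .bad and returns the final path. */
    public static Path runExtract(Path dir, String baseName, List<Row> rows, Path errorLog)
            throws IOException {
        Path tmp = dir.resolve(baseName + ".tmp");
        boolean ok = false;
        try {
            writeAll(tmp, rows);
            ok = true;
        } catch (ExtractException e) {
            logError(errorLog, e);
        } finally {
            // Runs on every exit path (including IOExceptions, which still propagate),
            // so the "ready" suffix only ever marks a complete file.
            if (Files.exists(tmp)) {
                Files.move(tmp, target(dir, baseName, ok), StandardCopyOption.ATOMIC_MOVE);
            }
        }
        return target(dir, baseName, ok);
    }
}
```

In this sketch a DBA script keyed on *.txt simply never sees the failed file; the *.bad copy stays on disk for the morning post-mortem, and the error file names the exact row and field, which is the detective work we had to do by hand on Thursday.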