
15 posts from April 2008

Wednesday, April 30, 2008

Mechanical instructions on "Continuations"

Here are some mechanical instructions on implementing jPOS’ “transaction continuations” feature.

For an explanation as to why we use continuations in our OLS.Switch jPOS-based implementations, refer to my write-up here (in a nutshell, we're trying to solve a specific concern about smaller endpoints plugging up our switch and wreaking havoc on the more heavily-trafficked endpoints like FDR North and AMEX).  Alejandro introduces the ‘Continuations’ concept here.

Our main server listens on the incoming port like this (full contents of our 50_ev_server.xml):

<ev-server class="org.jpos.ev.Server" logger="Q2">
<property name="port"  value="33000" />
<property name="space" value="tspace:default" />
<property name="queue" value="RACEV.TXN" />
<property name="timeout" value="60000" />
</ev-server>

Then, in our *main* transaction manager (30_ev_txnmgr.xml), we take items off that queue like this:

<ev-txnmgr class="org.jpos.transaction.TransactionManager" logger="Q2">
<property name="sessions" value="96" />
<property name="space" value="tspace:default" />
<property name="queue" value="RACEV.TXN" />
<property name="debug" value="false" />

(those are the production settings, BTW)

Then, within the context of a 'switch' participant in 30_ev_txnmgr, we choose to move the processing of some transactions to another TransactionManager:

<!-- GreenDot Transactions -->
<property name="S.43" value="ForwardToGreenDotTxnMgr" />
<property name="S.44" value="ForwardToGreenDotTxnMgr" />
<property name="S.45" value="ForwardToGreenDotTxnMgr" />

Then, later in 30_ev_txnmgr, we have this group:

<group name="ForwardToGreenDotTxnMgr">
  <participant class="org.jpos.transaction.Forward">
   <property name="queue" value="GreenDot.TXN" />
   <property name="timeout" value="60000" />
  </participant>
</group>

Once we do that, we've freed up the session in the main transaction manager to service a new request off the server queue.  So, we shift the risk of slow endpoint response to an isolated transaction manager.
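To make the hand-off idea concrete, here is a minimal sketch in plain Java. It is an analogy only: it uses java.util.concurrent rather than any jPOS class, and the names ForwardSketch, dispatch and greenDotQueue are made up for illustration. The point it shows is that the main side only enqueues the slow-endpoint work and returns immediately, leaving the waiting to whoever drains the second queue.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative stand-in built on java.util.concurrent; none of this is jPOS API.
final class ForwardSketch {
    // Transaction codes the switch hands off, mirroring the S.43/S.44/S.45 entries above.
    private static final Set<String> FORWARDED = Set.of("43", "44", "45");

    // Plays the role of the GreenDot.TXN queue drained by the second manager's sessions.
    final BlockingQueue<String> greenDotQueue = new LinkedBlockingQueue<>();

    /**
     * Returns true when the transaction was queued for the isolated manager,
     * meaning the calling (main) session is free to pick up new work.
     */
    boolean dispatch(String txnCode, String txnId) {
        if (!FORWARDED.contains(txnCode)) {
            return false;                  // the main manager keeps processing it
        }
        return greenDotQueue.offer(txnId); // unbounded queue, so this does not block
    }

    public static void main(String[] args) {
        ForwardSketch sketch = new ForwardSketch();
        System.out.println(sketch.dispatch("43", "txn-1")); // forwarded
        System.out.println(sketch.dispatch("10", "txn-2")); // handled locally
        System.out.println(sketch.greenDotQueue.size());
    }
}
```

In the real configuration the second queue is served by its own, smaller pool of sessions, so a stalled endpoint can only exhaust that pool and not the main one.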

Then, finally, we have 30_greendot_txnmgr.xml, which starts off like this...

<greendot-txnmgr class="org.jpos.transaction.TransactionManager" logger="Q2">
<property name="sessions" value="16" />
<property name="queue" value="GreenDot.TXN" />
<property name="debug" value="false" />

<participant class="org.jpos.ev.Switch" logger="Q2" realm="switch">
<property name="S.43" value="CreateSV GDotActivate" />
<property name="S.44" value="CreateSV GDotRefresh" />
<property name="S.45" value="CreateSV GDotDeactivate" />
</participant>

Now featured on Glenbrook's Payment News

Glenbrook Partners' excellent Payment News site now features a new page that provides its readers a "current summary of the latest content from some of our favorite payments and banking blogs based upon their RSS feeds."  Dave, Alejandro and I appear on the list.  Our thanks to Glenbrook's Scott Loftesness, who put out the call and compiled the list.

Saturday, April 26, 2008

I want my DTV (New Initiatives, Part 1)

Here's the first of a series of three posts letting you know about some of the new initiatives we're tackling for our OLS.Switch clients.  This one has to do with compliance to the National Telecommunications and Information Administration ('NTIA') Coupon Program. 

The subtext to these posts goes to the heart of why, as a manager, you want to run your own payment switch.  In short, you control your own destiny.  You can respond to any new initiative with alacrity, rather than fret about when your processor or vendor will get around to thinking about you again (and look rather ineffectual when you try to explain to your line-of-business partner why it'll be another nine months and/or $200k to even get mindshare).  With the relevant hardware being a commodity these days, there's been a dramatic enlarging of the circle of enterprises who can and should consider running their own payment switch.  As I've mentioned ad nauseam on these pages, we've got a client running over 1,000,000 transactions a day at peak (soon to be their 'new normal') and doing it on $28,000 of core server technology.  I recall the not-too-distant days when a similar-sized enterprise required a $14m initial investment in hardware, software and other build-out considerations in order to get into the game.  Moore's Law + powerful open source building blocks like jPOS + convergence towards Intel-based server computing + the flexibility offered by JVM-based computing + knowledgeable payment systems professionals = why the hell wouldn't you consider running your own switch?

So, in this case, our client is running their version of OLS.Switch.  Among various other facets of functionality, the thing runs three-quarters of a million transactions a day through a Debit/EBT/Offline Debit/Credit payment gateway to FDR North.  Now comes the NTIA requirement.  Instead of me fumbling through a re-explanation of the program, here's a nice program summary from Visa:

After February 17, 2009, the Federal Communications Commission (‘FCC’) will require full power television stations across the nation to cease analog broadcasting and begin broadcasting solely digital transmissions. An estimated 20 million American households rely exclusively on over-the-air broadcasting received by analog televisions not connected to cable or satellite services. In order to continue to receive television programming, this change will require the acquisition of a digital-to-analog converter box, a digital television, or cable or satellite television service.

To help consumers defray the cost of acquiring converter boxes, the U.S. government has authorized the NTIA to create a digital-to-analog converter box coupon program. After January 1, 2008, the NTIA will issue up to 33.5 million electronic coupon cards with a value of $40 to be used toward the purchase of a Coupon Eligible Converter Box (‘CECB’). These cards will be distributed with open eligibility on a first-come, first-served basis initially. Once initial funds are expended, more coupons will be available to over-the-air-only households.

Retailers choosing to participate in the sale of these digital-to-analog converter boxes may want to accept and redeem these coupons, which will be issued in the form of a non-branded plastic mag-stripe card. In total, there are six coupon redemption alternatives that are available for retailers to choose from.  VisaNet will offer support for three of the six coupon redemption alternatives.

[For more information, see Visa Business Review, October 2007.]

As mentioned in the Visa passage, the NTIA program offers six approaches to coupon redemption.  Visa supports three of them:  UPC@Auth; UPC@Clearing; and Sales Detail Reporting (see pop-up at left which depicts Visa's compare/contrast look at the three options).  Our client was looking for the approach that was the least disruptive to its store systems infrastructure.  UPC@Auth would require the store systems team to change their interface to OLS.Switch to start providing us the UPC info.  So, that's out (too high on the change-o-meter, especially with all the other important initiatives going on).  And, in the OLS.Switch operating model - where we do host draft capture ('HDC') from our framework for nightly extract to FDR North - there's essentially NO difference impact-wise between UPC@Auth and UPC@Clearing.  We would need to collect the UPC at auth time either way.  Sales detail reporting is the winner here:

  1. Auths can come in just as they do today.  The coupon comes into OLS.Switch as a Visa or MasterCard-looking auth for $40.
  2. We put a small change into OLS.Switch to reject the transaction locally if != USD 40.00.
  3. We also made some entries to our BIN range table to identify the TV Converter transactions (we flag them with a specific 'cardBrand' of "TV").  These changes are made real time and require no down-time.
  4. We route all transactions where cardBrand = "TV" out via our already-in-place multi-channel jPOS MUX connection to FDR North.
  5. FDR North routes to TSYS, who - to the best of my knowledge - is providing online authorization of the TV Coupons on behalf of NTIA contract award winner CLC (the erstwhile Corporate Lodging Consultants).
  6. We record the authorization response in OLS.Switch's tranlog, then extract it that evening for inclusion into nightly settlement activity.  [In this case, we wrote a new internal extract program extension to create a new file for delivery to our client's Data Warehouse.  These are all Approvals where cardBrand = 'TV'.]
  7. [Not related to OLS.Switch...] Our client matches up this sales data we provide with a second asynchronous internal data stream from the stores that is already capturing UPC data; then, it will merge this data into something it'll deliver daily to CLC.
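Steps 2 and 3 above can be sketched roughly as follows. This is not the OLS.Switch code: the class and method names are invented, the BIN range is a made-up placeholder (the post does not give the real coupon BINs), and a real BIN table would live in the database rather than in constants.

```java
import java.math.BigDecimal;

// Hypothetical sketch of the local coupon check; not OLS.Switch code.
final class CouponCheckSketch {
    static final BigDecimal COUPON_AMOUNT = new BigDecimal("40.00");

    // Placeholder BIN range, invented for this example only.
    static final long TV_BIN_LOW = 999900L;
    static final long TV_BIN_HIGH = 999999L;

    /** Flags a card as "TV" when its first six digits fall in the placeholder range. */
    static String cardBrand(String pan) {
        long bin = Long.parseLong(pan.substring(0, 6)); // assumes a numeric PAN of 6+ digits
        return (bin >= TV_BIN_LOW && bin <= TV_BIN_HIGH) ? "TV" : "OTHER";
    }

    /** TV coupons must be exactly USD 40.00; every other card skips this check. */
    static boolean passesCouponCheck(String pan, String currency, BigDecimal amount) {
        if (!"TV".equals(cardBrand(pan))) {
            return true;
        }
        // compareTo ignores scale, so 40 and 40.00 are treated as equal.
        return "USD".equals(currency) && COUPON_AMOUNT.compareTo(amount) == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesCouponCheck("9999001234567890", "USD", new BigDecimal("40.00")));
        System.out.println(passesCouponCheck("9999001234567890", "USD", new BigDecimal("39.99")));
    }
}
```

Keeping the brand lookup separate from the amount rule matches the split described above: the table entries can change at run time, while the amount rule is the one small code change.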

Only in Step 6 were we required to make any programming change, and that was 20 minutes to change, test and ready for prod branch inclusion.  In fact, when the Coupon program was under discussion with our client, FDR and CLC, there was a big conference call about how to approach, how to test, which option to choose, would the transaction get through FDR without a UPC, etc., etc., etc.  Lots of talking.  Knowing how OLS.Switch would handle this, I recommended that our client immediately add the 'TV' to the cardtype/binrange tables and then quickly walk down the street to the store and try it in production and see what happens (see resulting full table at left - this is the cardtype + binrange tables presented as a cohesive whole in the OLS.Switch UI, which leverages the strengths of the jPOS-EE offering).

That's exactly what they did.  Later that day, FDR was able to confirm that they passed the transaction along successfully to TSYS, and CLC got TSYS to confirm that they'd received and processed the transaction without issue.  Our client shared a good line with me that "this is the first time we went into production without certification...without testing, in fact!"  And, here, it was a good thing.  Indeed, it's been another month or two ensuring proper registration with the right governmental agencies.

Tuesday, April 22, 2008

Next release and beyond

At any given time, we've got a lot going on at our OLS.Switch payment system client sites - new projects, conversions, fixes, ideas for enhancements or better operations...you name it.  If you're converting off of legacy payment application platforms, my experience is that the initial conversions are only the beginning of your journey.  Churchill said it best: "Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."

Sometimes it's easy to get a little overwhelmed by the rate of change.  I try to view it as a vote of confidence in our platform and the OLS team.  Still, these things can easily slip away from you control-wise.  As an example, since last Friday alone (not even three full working days), there have been over 50 separate changes inside of the SVN repository we maintain for one of our clients.  I'm in there doing stuff for our FSA project (subject of a future blog post).  Mike is making some reporting enhancements.  Dave is driving the MethCheck project and adding some other features.  Chuck is upgrading Discover from their v07.1 settlement format to v08.1.  Plus, we've always got some excellent ideas in process courtesy of Alejandro...invaluable stuff I'll categorize as 'jPOS Best Practices.'

To bring some semblance of order to all this activity, I keep a document under constant revision at our team's Basecamp site called 'Next Release and Beyond.'  This is implementation-specific (i.e., client-specific).  I showed it to Alejandro, whose reaction was "it's a confidence builder for the customer and a great guide for the development and operations team."  I was very pleased by that, because that's my intent.

The document tends to be very fluid, but in general you can see I have four categories:

  • Next Release - The changes in the next planned release.  When I wrote the version you see referenced in this post, we were less than 48 hours from the proposed test slot; so, we're fairly committed to the items you see here.
  • Next Next Release - These tend to be items where we're done development- and internal test-wise, but there's some external factor that will push the release out by a week.  Here, we need some coordinated activity from GreenDot on the first item and DBA involvement on the second item.
  • Beyond (Works in Progress) - Just like it says, these are items where we've made some type of formal check-in into our SVN trunk, but there's development and/or testing work left to go.
  • Beyond Beyond (Works in Discussion) - Here, we've kicked around some ideas and maybe even SVN-ed things into our internal Incubator repository (no visibility by OLS clients) or tried out concepts, but this is gestation-stage stuff with no formal SVN check-in into the client space.

Hello, Cracow

Part of what motivates my blog-writing is the readership I get from around the world.  Thanks so much for your interest.

One reader - or group of readers - holds some special interest at the moment: those from Cracow, Poland.  My OLS colleague, Dave Bergert, will be in Cracow from late June through mid-July.  He'd love to meet up while there with anyone interested in chatting about jPOS, payment systems, and specifically payment systems security.  Though the last topic is Dave's specific area of expertise, he's done a wide range of payment systems projects, both issuer- and acquirer-side, and is a great resource.  To set something up, you can contact Dave here.

Saturday, April 19, 2008

J2SE upgrades and jPOS

Here is a quick observation on the effect of taking your jPOS environment from a J2SE 1.4.2 to a J2SE 5 JVM.  Sun is EOL-ing 1.4.2, so we recently helped a client upgrade to JRE5.  When we did it, I noticed that our nightly extract/report cycle sped up by 20 - 25% each night.  That's not insignificant because I was growing a bit concerned about the length of these jobs - we write more than five million lines of intricately determined output each night at our largest OLS.Switch implementations.  Additionally, batch jobs like that only run on a single CPU, so they can only go so fast.  So, gaining back over 30 minutes on our largest days is something that pleases me to no end.

A more casual observation about online performance:  On Friday, April 11th at that same installation we had a problem develop on one of our replicated application nodes (I'll probably blog about that separately later).  We took the service down on that node while we investigated and remediated.  The load balancer swung all traffic to the other node.  Despite being a Friday afternoon peak (sustained volume was probably just short of 25 TPS), we ran comfortably on a single node, never exceeding 25% CPU.  This application server cost about $7,000 three years ago, so by Moore's Law those dollars today ought to buy a server 4x as powerful (maybe a little less to account for inflation).  [I'll throw in another observation that the database server never exceeds an average of 5% CPU consumption, even during the most strenuous of peaks.]

jPOS & SQL Server: A compilation of best practices

Because jPOS runs in a JVM and provides a wall of separation between itself and RDBMS vendor specifics via Object-Relational Mapping ('ORM'), we live in a world where we can test OLS.Switch locally on Linux/MySQL and then deploy that exact code at a customer site that may be running Windows Advanced Server 2003/MS SQL Server 2005.  For those of us who grew up with legacy payment systems tied at the hip to specific hardware/software architectures, this new reality is an everyday repeating joy.

That's not to say that our MS SQL Server experiences have been without speed bumps.  We've learned a lot about running jPOS with SQL Server.  I've mentioned our 'lessons learned' at various points.  In this post, I'm doing a quick brain dump to try to combine all the points into a single list of best practices:

  1. Love your DBA - Nothing on this list happens successfully without forming good, cooperative bonds with your DBA team.  They will be an instrumental part of your success.
  2. Keep your production tranlog thin - We keep eight days on our tranlog in production; the rest get culled each night and pushed to a historical repository via a nice elegant script written by the DBA team.
  3. Re-Index nightly - We check all our indices on all tables every night.  If any are fragmented beyond a certain threshold percentage, they get re-indexed.  Another nice script by the DBA team.  [I can't overstate the importance of this step.]
  4. Use SQL Server 2005 Virtual Server - We reference a Virtual Server in our connection.url, which makes node failover transparent to us.
  5. Keep up to date on jTDS releases - This is our JDBC 3.0 driver. We just upgraded from jTDS 1.2 to 1.2.2.  Pay attention to new features available in each release to determine their relevancy to you.
  6. Set prepareSQL=3 in the jTDS connection.url - Rid yourself of contention in SQL Server's tempDB space.  Enough said.
  7. Keep up to date with Microsoft releases - We're prepping for an upgrade to MS SQL Server 2005 SP 2 next week.  As prep, we did a complete regression and stress test.  [Our DBA notes there's no backout from this upgrade, so we're trying to make damn sure of no obvious side effects.]
  8. Separate your connection.url into a configuration file - We did this at the request of our customer's DBA for what he calls 'catastrophic event planning.'  Now, we can point to a fallback server in the event of a catastrophe without recompiling our code.
  9. Always test there - Since we test locally in conjunction with MySQL, we never take anything for granted: we test locally, then test at our client sites running MS SQL Server, and only then are we ready for production.
  10. Maintain your own MS SQL Server schema - In the early days, we'd let Hibernate generate the MS SQL Server script.  But I noticed too many quirks; specifically, it seemed to miss some of the alternate indexes we needed.  We still do the Hibernate generation thing for guidance, but we essentially have a hand-crafted and obsessively compiled and scrutinized MS SQL Server schema that we maintain separately.
  11. Focus on how the app accesses your data and design indices accordingly - We once implemented a wonderful 'findDuplicate' function which, ahem, on its first iteration ended up scanning the tranlog on each online transaction request.  Nasty stuff.  SQL Server's Query Optimizer picks one and only one index for each access of a table.  Design accordingly!  [Our 'findDuplicate' index ended up having eight components - create index perf_ix_dupcheck1 on tranlog (storeNumber, registerTranId, tenderNumber, internalresultCode, registerNumber, internalTranCode, revInd, reconId)]
  12. Set your connection.isolation level to 1 -  We do it like this:

    <property name="connection.isolation">1</property>

    This is critically important.  Without this in place, our DB connections from our jPOS application were rupturing mid-transaction when under duress and wreaking havoc.
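Items 6, 8 and 12 can be pictured together with a small sketch. Everything here is illustrative: the host name db-virtual, the database name switchdb and the class DbConfigSketch are invented, and the settings are read from an in-memory string standing in for an external file. The one hard fact it leans on is that the value 1 is JDBC's java.sql.Connection.TRANSACTION_READ_UNCOMMITTED constant.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.util.Properties;

// Hypothetical sketch: connection settings kept outside the code.
final class DbConfigSketch {
    // Stand-in for an external file; host and database names are made up.
    static final String SAMPLE =
        "connection.url=jdbc:jtds:sqlserver://db-virtual:1433/switchdb;prepareSQL=3\n"
        + "connection.isolation=1\n";

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Maps a numeric JDBC isolation level to a readable name. */
    static String isolationName(int level) {
        switch (level) {
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "SERIALIZABLE";
            default:                                      return "UNKNOWN(" + level + ")";
        }
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        int level = Integer.parseInt(props.getProperty("connection.isolation"));
        System.out.println(props.getProperty("connection.url"));
        System.out.println(isolationName(level));
    }
}
```

Pointing at a fallback server then means editing one line of configuration. Note that READ_UNCOMMITTED permits dirty reads, so whether level 1 suits a given application is a judgment call to make with your DBA.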

The New Normal

Thanks to the ongoing addition of new store locations (about 1,300 additional brick and mortar locations added since this time last year), the 'New Normal' at our flagship OLS.Switch installation is that a 'typical' Friday can have 880,837 transactions from 4,674 store locations across the four US continental time zones.  Coming soon: an ending point north of 5,000 locations and a New Normal of 1,000,000+ separate customer engagements handled each and every day.  In the recent past, those types of days were something we witnessed only on pre-Christmas, Valentine's Day and pre-Easter surges.

When thinking about these volumes, there's something else to comment on: the jPOS mailing list often features a particular type of request that says "I need jPOS to handle 8,000 TPS, so please do the needful and hand over your source code."

Some perspective:

> This retailer does > $20B in sales per year.

> They are in the Fortune 150.

> There are approximately 25,000 separate register lanes.

> In my example, the sustained peak (which we measure in 30-minute gulps - see pop-up to left) never touches 25 TPS.

> Even on the most frenetic day in our record books, the peak hovered near - but never surpassed - 40 TPS.

Your mileage may vary, but...I'm just saying.
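For anyone who wants to run the numbers, here is the back-of-the-envelope arithmetic as a tiny sketch. The class and method names are mine; the only inputs are the daily count quoted above and the 8,000 TPS figure from the mailing list request.

```java
// Back-of-the-envelope conversion between daily volume and transactions per second.
final class TpsSketch {
    static final long SECONDS_PER_DAY = 24L * 60L * 60L;

    /** Average rate if the day's transactions were spread evenly over 24 hours. */
    static double averageTps(long transactionsPerDay) {
        return (double) transactionsPerDay / SECONDS_PER_DAY;
    }

    /** Daily volume implied by sustaining a given rate around the clock. */
    static long transactionsPerDayAt(long tps) {
        return tps * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f%n", averageTps(880_837L));
        System.out.println(transactionsPerDayAt(8_000L));
    }
}
```

The 880,837-transaction Friday averages out to roughly 10 TPS over 24 hours, while 8,000 TPS held all day would be 691,200,000 transactions. Real traffic is peaky, which is why the sustained peak matters more than the average.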

Tuesday, April 15, 2008

HostRev=0 is Double Plus Good

We collect and distribute a small group of key stats for our flagship OLS.Switch payment system implementation.  [See pop-up on left - this is a nice little SQL-based report that the DBAs run there.]  It's not a big group of metrics, but it allows us to keep our finger on the pulse of what's going on.  It's not a technical set of metrics; it's more a managerial thing that keeps tabs on growth and provides a one-line answer to the question "What was performance like yesterday?" (or, the day before, or over the past week).

I mentioned in a recent post that we maintain 32 separate connections to remote authorizers in this acquirer-side implementation.  If we have a day with 825,000 transactions, we probably switched-out about 750,000 of those for remote authorization.  'HostRev' counts how many of those requests timed-out because we didn't get a response back from the issuer in time...or because we weren't able to get the request out the door before our threshold timer expired.

My favorite sight is the 'HostRev=0' day.  You can see that we've attained that metric three times over the last eight days.  That's a rare occurrence, to say the least.  Lots of things come together to achieve that:

  1. A good transaction multiplexer - Our jPOS-based transaction framework provides us with the best MUX available anywhere (we use Alejandro's QMUX).
  2. Good issuer performance - This one is out of our control, but to get that 0, all the issuers we're shipping transactions to have to get their replies back to us within 25 seconds.
  3. No problem at the Gateway - We gateway about 75% of those requests through FDR North, so they've had a pretty good run, too (let's give credit where it's due).
  4. Good internal functioning of our switch - We need to have the right number of threads defined and the right timers set in order to prevent backups in our queues.
  5. Firepower - We need to have appropriately-sized servers.  Luckily, you can get that firepower cheaply today (the site in this example is powered by four servers costing about $7,000 each).
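As a rough illustration of what the HostRev number measures, here is a sketch that counts switched-out requests with no reply inside the 25-second window. It is an assumption about the shape of the calculation, not the DBAs' actual SQL report; the class name and the use of null for "no reply at all" are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a HostRev-style count; the real report is SQL-based.
final class HostRevSketch {
    static final long TIMEOUT_MS = 25_000L;

    /**
     * Counts requests that either never got a reply (null) or got one
     * after the timeout window had already expired.
     */
    static long hostRev(List<Long> replyTimesMs) {
        return replyTimesMs.stream()
                .filter(ms -> ms == null || ms > TIMEOUT_MS)
                .count();
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it tolerates null entries.
        List<Long> sample = Arrays.asList(180L, 950L, null, 25_001L, 25_000L);
        System.out.println(hostRev(sample));
    }
}
```

A HostRev=0 day is simply one where this count comes out to zero across every remote authorizer.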

Saturday, April 12, 2008

Yes, there are two paths you can go by

This post harkens back to my original "Build vs. Buy" discussion regarding whether you should build a payment switch application using jPOS, or go the 'buy' route and take advantage of something like OLS.Switch, built on the jPOS-EE infrastructure.  I've talked before about the risks of the 'build' approach.  The best way to avoid those risks is to build a good team.  If you have that going for you, by all means go for it.  jPOS is brilliant at allowing you to leverage those skills to build a really great payment systems solution.

That said, I wrote that original Build vs. Buy thing more than three years ago and since that time we've now got about 6+ person-years of effort invested in iterative and continuous improvement in our product offering.  I took a shot once at tallying the various things we've done in OLS.Switch - I posted it under the heading of "The Evolution of jPOS: Your Complete Solution Framework."  These are things you can do, too, when you choose jPOS.  It is, in fact, a list of the things we've already done in OLS.Switch.  This reality has brought up some interesting evolutions in our sales cycles.  Of course, we see the 'traditional' opportunities where our solution is a really good fit, both in terms of system need and in our ability to provide any and all professional services required to do what I call the 'build to fit' exercise.  [Payment systems switches are never drop-ins.  Sorry if I've ruined that fantasy.]

But we also see situations where OLS.Switch isn't the best fit - either being too much of a square peg vs. a company's need for a round peg for its round hole, or - in the case of The Gladiators - a company having a super-strong team to handle the bulk of professional services to promulgate their very distinct vision.   Yet even The Gladiators can appreciate that we've implemented a number of very valuable features and fought and won the wars that they'll need to re-fight (especially in the case of these often very tricky authorizer interfaces).  In these situations, we've done very effective engagements where we provide our source code for mining, adaptation and integration.  These have proven to be very rewarding and fruitful projects for everyone involved.

So, when it comes to OLS engagements, yes, there are two paths you can go by.  And if you're currently going in a different direction, well, in the long run, there's still time to change the road you're on.  [Thanks to these guys for that wording.]

Real Systems Do Extracts - Part 5

As you've heard repeatedly in this blog, Real Systems Do Extracts.   In our OLS.Switch implementations, we've taken pains to grind through the details of our extract and reporting requirements at the most granular level possible.  That's resulted in a good deal of stability.  We're working from a strong base:  I'm thankful to Alejandro for his brilliant, simple and extensible design of this important application component.  It's reflective of the key to our working relationship:  it works best when I describe the business requirements without providing design input.  I come, in a very unconscious way, encumbered with a fair degree of baggage of legacy payment systems.  Much as I try to shake off that bias, it comes through on many of my suggestions.  The beauty (and it is a beautiful thing...does that make me a geek?  I suppose so) of Alejandro's design is something I never could have envisioned.  There's a lesson there for sure on how to best build your team to take on jPOS-fueled payment systems projects.

Unfortunately, real systems blow up, too.  And, most typically, they like to crater at, oh, 2 AM.  [NOTE:  I'm only referring to the batch job ending abnormally.  Our OLTP engine kept chugging away.]  At our flagship client, the last time we had a blow-up in the nightly batch cycle was early August, 2007...eight and a half months of interruption-free sleep.  You can't brag about records like that, because I think the damn thing can hear me and will show me who's boss by going south that very night.

Well, the skein was emphatically broken on Thursday morning.  Bottom line:  this is IT.  And, in the words of Forrest Gump, it happens (see pop-up at left).  We talked about not blowing up the extract in such a destructive way, but instead noting the specific exception, excluding the record and moving on.  We definitely don't want to do that: these are material items we'd be excluding and perhaps forgetting forever if we're not diligent in checking the exception report each morning.  As it is, the extract runs hands-off and requires no eyeballing in the morning.  I'll take the pain of the occasional 2 AM phone calls.  These incidents find the weak points & give us an opportunity to firm them up.  For the record, I've posted my analysis of what happened (with some masking of real client and file names).  I think anyone who has taken one of these calls can appreciate that what we did at 2 AM - 3 AM is triage.  The 'real fix' ideas are something you do as follow-up within the next couple of days.

We're going to do a couple of additional things to clarify the fallout from the abnormal ends of our extracts:

  • Right now, when the extract is in process, all outputted files have *.tmp suffixes.  When complete, they get *.txt (or *.csv for some) suffixes.  This is a little technique that allows our clients' DBAs to key on this suffix transformation for the moment when they can PGP-encrypt and ship files via FTP to their external authorizers.  The issue is that the transformation happens even when the extract ends abnormally, resulting in the FTP-ing of half-baked files.  This is an irritant more than anything else - these error out at FDR, AMEX, etc. because the files lack file trailer records.  We'll make a revision so that if there's not a normal ending, the files will get *.bad suffixes instead.
  • The error (see my analysis) is something that gets logged inside our main q2.log right now.  Depending upon the nature of the error, that can be hard to spot sometimes because we've got all the OLTP stuff swamping it.  We're working to separate all our batch components into a separate JVM (subject of a future post!) and at that time we'll see the error spotlighted in a smaller batch-only version of the q2 log.  Alejandro is going to take the separation one step further and write the error to a separate file.
  • We'll also make some revisions to reflect the ID (row) of the record where we stumbled as well as give some guidance as to the field that blew us up.  As you can see from my attached analysis, it took a bit of detective work to find the offending record and field this past Thursday!
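The suffix revision in the first bullet could look something like the sketch below. It is not our extract code: the class name, method names and sample file name are invented, and it assumes the only inputs needed are the *.tmp path and a flag saying whether the extract ended normally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the *.tmp -> *.txt / *.bad rename described above.
final class ExtractSuffixSketch {
    /** Works out the final file name without touching the file system. */
    static String finalName(String tmpName, boolean endedNormally, String normalSuffix) {
        if (!tmpName.endsWith(".tmp")) {
            throw new IllegalArgumentException("expected a .tmp file: " + tmpName);
        }
        String base = tmpName.substring(0, tmpName.length() - ".tmp".length());
        // Only a clean finish earns the suffix that triggers encryption and FTP.
        return base + (endedNormally ? normalSuffix : ".bad");
    }

    /** Renames the in-progress file in place and returns its new path. */
    static Path finish(Path tmpFile, boolean endedNormally, String normalSuffix) {
        String name = finalName(tmpFile.getFileName().toString(), endedNormally, normalSuffix);
        try {
            return Files.move(tmpFile, tmpFile.resolveSibling(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(finalName("settlement.tmp", true, ".txt"));
        System.out.println(finalName("settlement.tmp", false, ".txt"));
        // finish(Paths.get("settlement.tmp"), true, ".txt") would perform the actual rename.
    }
}
```

Because the downstream job keys on the final suffix, a *.bad file is simply never picked up, which is the whole point of the change.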

Saturday, April 05, 2008

Love Your DBA

In our payment systems projects, we work with a number of constituencies at our OLS.Switch client sites.  An incomplete list (arranged in no particular order) consists of:

  • The core application team (our everyday contacts)
  • The store systems team (we interact with these guys)
  • The application's line-of-business owner (the core application team are the stewards, but this is the person who decides features and directions)
  • Accounting
  • Fraud Control
  • Telecommunications
  • Data Security
  • Systems Administration
  • Database Administration

This post concerns DBAs.  I've mentioned before that "a talented DBA is a very important part of a successful jPOS implementation...I know this from experience."  That feeling was validated once again over the past couple of weeks.  I'm thankful that our flagship OLS.Switch client has a super-strong DBA team.  If anything, over the past year they've become an even stronger group, as their manager is re-tooling and expanding his team with MS SQL Server-specific talent.  That's a boon for us, as this client runs OLS.Switch on a Windows Advanced Server 2003/MS SQL Server 2005 configuration.  From this team, we get proactive monitoring of our operating environment as it relates to our consumption of DB resources.  A few weeks back they contacted me with concerns about occasional resource contention lockups in SQL Server.  I'd seen these, too, but was having trouble pinpointing the source, as I don't have the visibility in production that those guys have. 

Turns out, the issue came down to contention in SQL Server's tempDB space.  After getting a full debrief from those guys as to their findings, I did some research and found out that this was an issue encountered by others and with a very clear resolution.  It involves our use of the jTDS driver.  [We use jTDS for our SQL Server connectivity.  As best described on the jTDS Project's web site, it is "an open source 100% pure Java (type 4) JDBC 3.0 driver for Microsoft SQL Server (6.5, 7, 2000 and 2005)."] 

The key dialogue about this problem is something I found on the Hibernate forums.  Alin of the jTDS project provides the key insight: 

Updating jTDS to the latest version will fix the problem. tempdb is used by earlier versions because prepared statements were by default creating temporary stored procedures to "prepare" the query.  jTDS 1.1 uses sp_prepare and sp_execute by default (you can still switch to 3 other modes, read the jTDS FAQ for more detailed information); this doesn't use tempdb and isn't affected by transaction rollbacks.

The key passage of the FAQ is the one that describes the prepareSQL parameter:

prepareSQL (default - 3 for SQL Server, 1 for Sybase)

This parameter specifies the mechanism used for Prepared Statements.

Value   Description
0       SQL is sent to the server each time without any preparation, literals are inserted in the SQL (slower)
1       Temporary stored procedures are created for each unique SQL statement and parameter combination (faster)
2       sp_executesql is used (fast)
3       sp_prepare and sp_cursorprepare are used in conjunction with sp_execute and sp_cursorexecute (faster, SQL Server only)

We came into jTDS before their 1.1 driver, and hadn't made the switch from prepareSQL=1 mode to prepareSQL=3.  We made the switch in test and just confirmed in a stress test (we revved the thing up to 130 TPS sustained on a really small box) that we addressed the tempdb contention concerns.  The exact response from the DBA was:  "I am happy with the numbers I am seeing. CPU time is low and the tempdb issues we have in prod are not happening in test."  Woohoo!

So, this coming week, we'll upgrade to a production environment like this:

  • JRE 1.5 (we were on 1.4.2, but Sun is EOL-ing that).
  • MS SQL Server 2005 SP 2 (our stress test validated that proposed upgrade, too...we were on SP1).
  • jTDS 1.2.2 (we were on 1.2)
  • jTDS connection.url with prepareSQL=3.
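For anyone wanting to see what that last bullet amounts to, here is a minimal sketch in Java of a jTDS connection URL carrying the prepareSQL property.  The host, port and database name are placeholders (not our production values), and the class and method names are made up for illustration.  The URL shape (jdbc:jtds:sqlserver://host:port/database;property=value) and the prepareSQL values 0 through 3 are as described in the jTDS FAQ.

```java
// Hypothetical helper: assembles a jTDS URL for SQL Server with an explicit prepareSQL mode.
class JtdsUrlSketch {

    // prepareSql must be one of the four modes the jTDS FAQ lists (0, 1, 2 or 3).
    static String jtdsUrl(String host, int port, String database, int prepareSql) {
        if (prepareSql < 0 || prepareSql > 3) {
            throw new IllegalArgumentException("prepareSQL must be 0..3, got " + prepareSql);
        }
        // jTDS connection properties are appended after the database name, separated by ';'.
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database
                + ";prepareSQL=" + prepareSql;
    }

    public static void main(String[] args) {
        // Placeholder endpoint; mode 3 selects sp_prepare / sp_execute.
        System.out.println(jtdsUrl("dbhost.example.com", 1433, "exampledb", 3));
    }
}
```

Since 3 is already the documented default for SQL Server in jTDS 1.1 and later, spelling it out in the URL mostly serves to make the intent visible to whoever reads the configuration next.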

2 AM Saturday Phone Call

A couple of years back, when we were attempting to position OLS in the marketplace, I made special note of our team's operational experience as something that set us apart from a 'typical switch vendor.'  What I had said specifically was:

OLS’ competitive advantage is the collective experience and stability of our core team.  In contrast to the personnel churn that is a hallmark of major software and hardware suppliers in our market segment, OLS’ key employees have over 100 years of combined practice in the OLTP marketplace, designing, developing, installing and optimizing technologically superior systems for common business work processes.   

Moreover, the OLS team is ‘production-oriented.’  We understand the importance of up-time and the criticality of our customers’ systems.  We know that 2 A.M. phone calls are not uncommon…that installs and upgrades can take place at odd hours…that comprehensive ‘backout’ plans are a must…that problems usually can’t wait until morning.  This is more than an ‘approach.’  It’s an attitude.  It is critical to our success.  It does not come easily or cheaply.  Rather, it is hard-earned and well-honed over years of experience.

Okay, all well and good...but can I sue Hillary for stealing my idea?  She's tried to define herself (and her opponent in the process) with this 3 AM call on the red phone idea.  Of course, if Hillary gets a 3 AM phone call, it's most likely to be Bill after a night out with Ron Burkle, as this excellent skewering from Mike Luckovich deftly and hilariously points out.

PABP Prep

We're currently doing a PABP certification of our OLS.Switch payment system solution.  [Well, I mean we've engaged a QSA to perform it.]  There's substantial confusion between PCI requirements and the new PABP edicts.  That confusion is reinforced by things like searching for PABP in Wikipedia and landing directly on the PCI DSS page (with no mention at all of PABP as of today's post).  As my colleague Dave Bergert points out, PABP != PCI Compliance.

It's great having Dave on board to help us navigate through this process.  As a former QSA (with CISSP, CISA and CompTIA Security+ bona fides), he's spearheaded a critical pre-assessment of our offering.  He and Alejandro have shared some good ideas along the way on critical touchpoints like the treatment of user accounts and passwords, key management and SAF implications.  You can see the result of some of that dialogue in Dave's post detailing some of the changes he's made in relation to accounts & passwords.  [Some of those items will find their way back to the jPOS-EE SVN code base after the completion of the audit.]  Additionally, you can see that Alejandro has his finger on the pulse of a hot button PABP issue.

PABP goes beyond the software itself to examine your practices.  For example, a software vendor needs to track its customer issues, from open, to resolution, to close.  We've got those procedures in place, thankfully, and it's nice to see that diligence yield some additional benefits.  We use MySQL's Eventum to track issues, place Subversion reference numbers into all Eventum updates, and then tie everything together in the form of a Release Note.  [Here's a recent Release Note detailing the content, installation procedure and backout instructions pertaining to a relatively small, straightforward (to the point of automation) release of new features and fixes.]

Beyond all that process is the documenting of it, and that's what has been consuming a great majority of Dave's hours.

Can jPOS handle it? (part 5)

Here is part 5 of my continuing series supplying metrics to answer the class of jPOS questions that fall under the heading of "Can jPOS handle it?"  Typically, the questioner wants to know if jPOS can support their super-sized needs because they plan on running 8,000 TPS (or some eye-popping number like that).  I typically begin to address these questions by putting things in perspective, noting that one of our OLS.Switch implementations powers one of the US' largest retailers (4,566 locations as of yesterday) and sustained a 40 TPS peak (I define 'sustained' as the 30-minute measured rate) over an eight-hour, 45-minute period on a white hot December 24th.  [NOTE: There were only 4,000 stores on the system at that point.]

I was going to call this post "The New Normal."  With the significant uptick in store count, we're now seeing 'normal' weeks (i.e., no pre-holiday buying bulges) that look like this:

  • Monday, March 31st:  801,064
  • Tuesday, April 1st:  826,970
  • Wednesday, April 2nd:  805,684
  • Thursday, April 3rd:  824,969
  • Friday, April 4th:  852,341

...for a daily average of just over 822,205 transactions across an "average" week.  That's the new normal.  That number will go higher as our client gets the remaining 400 or so stores converted from their most recent M&A activity.
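The arithmetic behind that figure is easy to check.  Here is a small Java sketch that sums the five daily counts listed above and derives the daily average; the class and method names are invented for illustration.  It also converts the average into a transactions-per-second figure under the (unrealistic) assumption that traffic is spread evenly over a 24-hour day.  Real retail traffic bunches up during store hours, so treat that number as a floor for perspective, not as a measured rate.

```java
// Illustrative only: recomputes the weekday average from the five daily counts in the post.
class NewNormalMath {

    // Sum of all daily transaction counts.
    static long total(long[] daily) {
        long sum = 0;
        for (long d : daily) {
            sum += d;
        }
        return sum;
    }

    // Mean transactions per day.
    static double average(long[] daily) {
        return (double) total(daily) / daily.length;
    }

    // Mean TPS, ASSUMING transactions are spread evenly over 24 hours (86,400 seconds).
    static double evenSpreadTps(double perDay) {
        return perDay / 86_400.0;
    }

    public static void main(String[] args) {
        // Monday, March 31st through Friday, April 4th.
        long[] daily = {801_064L, 826_970L, 805_684L, 824_969L, 852_341L};
        double avg = average(daily);
        System.out.printf("total=%d average=%.1f evenSpreadTps=%.2f%n",
                total(daily), avg, evenSpreadTps(avg));
    }
}
```

The five days total 4,111,028 transactions, or 822,205.6 per day, which works out to roughly 9.5 TPS if averaged around the clock.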

Other metrics of note...

  • 4,566 separate retail locations over four time zones
  • 21,172 separate register (lane) locations
  • Acceptance of seven credit card brands, five stored value card brands, Debit (in either offline or online modes), EBT and some internal applications like Employee Verification, Discount Coupon, Shopping Card reverse lookup and custom Check Authorization
  • 25 separate external extracts and internal reports
  • Five million (approx) extract and report records written during each nightly batch cycle.
  • Over $30 million (USD) in extractable transactions on peak days
  • Relationships with eight distinct remote authorizers...in each case, a jPOS MUX supports two channel connections to each entity; plus, we replicate our configuration over two application nodes, meaning that we manage 32 (8x2x2) separate connections
  • Average sub-second response time (including remote auth time) for all transaction classes except Debit (where hardware-based PIN translation and the involvement of at least three separate institutions result in an average end-to-end time of approx. 1.2 seconds)

...all on a core hardware configuration costing less than $28,000 (measuring the cost of the four core servers, not including load balancers, networking or licenses for system software like MS SQL Server 2005).

