Evan Schuman’s excellent blog StorefrontBacktalk has been doing some top-notch reporting on an incident of online debit double-billing at Macy’s. In the latest piece – written by Evan and Fred J. Aun – they note that Macy’s VP of corporate communications told them that:
As we investigated the cause of the gateway issue, we discovered a formatting flaw in the software that runs our point-of-sale register system. This is proprietary software that we developed in-house to run our POS system. At a certain level of transaction volume, the formatting flaw caused our system not to recognize messages coming back into Macy’s gateway from our banks that a debit transaction had been approved. Absent that approval, the transaction was not accepted and the customer was asked to re-swipe their debit card–which led to the double debits when these additional transactions also went through successfully.
The piece is excellent. Schuman and Aun are both dogged investigators and great writers.
Here are a couple of other important passages from the article:
Scope: “The glitch…was limited to PIN debit transactions, ignoring signature debit transactions.”
Resolution: “Within a half-hour of noticing the first incident, Macy’s began shifting stores to an alternate gateway that was operating properly, and the shift of all stores was completed in less than two hours, said Macy’s officials.”
Other concern [1]: “[A]n even more vexing failure occurred in an application that was supposed to keep an eye out for identical sales tickets that are processed multiple times.”
Other concern [2]: “[The President of Macy’s credit and consumer services division] said his ‘immediate concern’ was why the automated reversal system didn’t work, as designed, to prevent double (and several triple) charges. The system was supposed to automatically send a credit to the bank for the second debit.”
Let’s break this down:
- It’s unlikely this was a problem with the “point-of-sale register system.” That phrase implies an in-store system. An institution the size of Macy’s will have a payment switch at its operations center through which it runs all transactions. The point-of-sale register system communicates with that centralized host, which in turn communicates with payment gateway providers like FDR and Fifth Third and with direct authorization links to AMEX and Discover. The problem was probably in this host application. We can say this because the problem got resolved by “shifting stores to an alternate gateway that was operating properly.” (‘Gateway’ here refers to Macy’s host.) Nothing at the store system level got changed to get things back on track.
- Regarding the scope, I guarantee you that credit and offline debit transactions got caught up in this crap, too. [See my Credit v. Debit – Part 2 post for background.] The difference is that credit and offline debit have a wide tolerance for processing error. You can screw up the online interaction (multiple times, in fact), but most times you’ll wind up with one good interaction – all the customer intended – and that’s the one that ends up in the nightly extract/settlement file, which is the financial letter of record. By contrast, PIN debit/EBT has very little tolerance for error, because the online approval is itself the financial transaction; there’s no later settlement file in which to quietly sort things out. If you screw up the online interaction…you are well and truly screwed. Take it from somebody who has the scars on his back to prove it.
- Regarding the concern about the ‘vexing failure’ in the component that was supposed to check for duplicates (see [1] above) – I suspect that the ‘Dup Check’ (as I like to call it) couldn’t have caught these. There’s a strong chance that the Macy’s payment switch had these original transaction attempts flagged as failures (see some plausible scenarios below). When the Dup Check does its ‘FindDuplicate,’ it’s going to look for an approved transaction that looks like the one coming through…but there’s no approval on file, only a failure.
- I share concern [2]: “why didn’t the automated reversal system work as designed?” I have some theories on that, which follow…
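To make the Dup Check point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `TxnRecord` fields, the `find_duplicate` function, and especially the assumption that the match only considers approved records (real duplicate-detection rules vary from switch to switch). It shows how a re-swipe passes the check when the switch logged the first attempt as a failure, even though the bank approved it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnRecord:
    """Hypothetical entry in the payment switch's transaction log."""
    card: str          # card token (never the raw PAN in practice)
    amount_cents: int
    store: str
    ticket: str        # sales ticket number from the register
    status: str        # how the switch recorded it: "APPROVED", "FAILED", ...


def find_duplicate(log, incoming):
    """Return a prior APPROVED record matching `incoming`, or None.

    Assumption: failed attempts are skipped because, as far as the switch
    knows, no money moved on them.
    """
    for rec in log:
        if (
            rec.status == "APPROVED"
            and rec.card == incoming.card
            and rec.amount_cents == incoming.amount_cents
            and rec.store == incoming.store
            and rec.ticket == incoming.ticket
        ):
            return rec
    return None


# The bank approved the first swipe, but the switch couldn't recognize the
# response and logged the attempt as FAILED.
log = [TxnRecord("tok-A", 5999, "0042", "T-1001", "FAILED")]
reswipe = TxnRecord("tok-A", 5999, "0042", "T-1001", "PENDING")
print(find_duplicate(log, reswipe))  # None, so the re-swipe goes through
```

Keying the check on approvals is sensible in normal operation; it only breaks down when the switch’s view of what happened diverges from the bank’s, which is the situation Macy’s described.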
Some guesses on what might have happened (these are independent theories – I’ve seen every one of these):
- Transaction responses from the authorizer were received late; reversals were placed into the Store and Forward (SAF) queue, but the queue/space was corrupted.
- A garbage record got stuck at the head of the SAF queue; no one could figure out how to remove it, so the solution was to move to the other server node with a clean SAF queue.
- Macy’s had a slowdown internally on their server and didn’t have a threshold mechanism in place to prevent sending out already-aged requests for remote authorization. [This one is a stone-cold killer.]
- The request/response match-up mechanism in the host system’s multiplexer quit working above a certain volume level due to a programmatic flaw (I’ve seen implementations where this gets so twisted up that killing the process is the only solution).
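Theories 1 and 2 both come down to a store-and-forward queue that stops draining. Here is a minimal Python sketch of head-of-line blocking, assuming a SAF processor that delivers reversals strictly in FIFO order and keeps retrying the head record until it is acknowledged. The `send_reversal` stand-in and its rule that a malformed record is always rejected are hypothetical.

```python
from collections import deque


def send_reversal(record):
    """Hypothetical send to the authorizer; True means acknowledged.

    Assumption: a malformed ("garbage") record is rejected on every attempt.
    """
    return record.get("valid", False)


def drain_saf(queue, max_attempts=3):
    """Forward queued reversals in strict FIFO order.

    The head must be acknowledged before anything behind it is sent, so a
    record that is never acknowledged blocks the whole queue.
    """
    sent = []
    attempts = 0
    while queue and attempts < max_attempts:
        if send_reversal(queue[0]):
            sent.append(queue.popleft())
            attempts = 0
        else:
            attempts += 1  # retry the same head; nothing behind it moves
    return sent


jammed = deque([{"id": "garbage"}, {"id": "R1", "valid": True}, {"id": "R2", "valid": True}])
print(drain_saf(jammed), len(jammed))  # [] 3
clean = deque([{"id": "R1", "valid": True}, {"id": "R2", "valid": True}])
print([r["id"] for r in drain_saf(clean)])  # ['R1', 'R2']
```

A real switch would space its retries out on a timer rather than looping immediately; the sketch leaves that out. Failing over to a node whose SAF queue is clean sidesteps the blocked head, which fits the fix described in theory 2.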
Note that in any one of those scenarios, the reversal processing would have failed: in scenarios 1 and 2, the reversals are generated, but the SAF is a jail; in scenario 3, no reversal is generated because things “worked” from the host application’s perspective (even though the store system had long since closed out the transaction); and in scenario 4, all hell has broken loose.
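Scenario 3 is the sneakiest, because the host never sees a failure. The Python sketch below shows the kind of threshold check whose absence is described there, plus a late-response rule that turns an orphaned approval into a reversal. The timeout values, the function names `should_forward` and `handle_response`, and the action strings are all hypothetical; real timeouts are set per network and per terminal.

```python
import time

# Hypothetical: the register gives up and asks for a re-swipe after this long.
STORE_TIMEOUT_SECONDS = 30.0
# Hypothetical budget for the round trip to the remote authorizer.
AUTH_ROUND_TRIP_SECONDS = 10.0


def should_forward(received_at, now=None):
    """Return True if a queued request can still be answered before the
    register times out; otherwise it should not be sent upstream at all."""
    now = time.monotonic() if now is None else now
    age = now - received_at
    return age + AUTH_ROUND_TRIP_SECONDS < STORE_TIMEOUT_SECONDS


def handle_response(received_at, responded_at, approved):
    """Decide what the host does with an authorizer response.

    An approval that arrives after the register has already given up is
    reversed, because the customer was told the transaction failed.
    """
    late = responded_at - received_at >= STORE_TIMEOUT_SECONDS
    if late:
        return "SEND_REVERSAL" if approved else "DISCARD"
    return "REPLY_TO_STORE"


print(should_forward(100.0, now=105.0))  # True
print(should_forward(100.0, now=125.0))  # False
print(handle_response(0.0, 45.0, approved=True))  # SEND_REVERSAL
```

Without either check, a request that sat in an internal backlog goes out, comes back approved, and gets logged as a success, so no reversal is ever generated.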