Semaphore Corporation continues to canonicalize addresses wrong and refuses to admit it

August 4, 2011

Long-time followers of my advocacy may recall my story from years ago of attempting to get the Social Security Administration (SSA) to deal with the fact that they sent several of my kids’ social security cards to the wrong address because the software they were using to canonicalize mailing addresses was buggy.

Some sleuthing on my part and help from people on the Web revealed that the most likely explanation for what was going on was that the Social Security Administration was using a software package from Semaphore Corporation to perform address canonicalization. To be clear, the evidence behind this theory was purely circumstantial — the SSA was canonicalizing my address wrong, and Semaphore Corp.’s software canonicalizes my address wrong in the same way.

Semaphore was notified in August 2005 that their software was canonicalizing my address and others incorrectly. As noted in my original story, their ridiculous response was:

The USPS has a large number of esoteric rules about which ZIP+4 to match when the address-city-state-ZIP inputs are incomplete or conflicting and ambiguous, and rules don’t even exist for many cases, so you’ll continue to see the logic evolve as CASS changes to include more of the above situations.

There are two reasons why their response was ridiculous:

  1. The U.S. Postal Service (USPS) is by definition the authority on address canonicalization, and they canonicalize my address correctly when given exactly the same input that Semaphore’s software canonicalizes incorrectly.
  2. Neither I nor anyone else was able to find a single other web site or software package that canonicalizes my address the way Semaphore’s software does — everything else out there canonicalizes it the same way the USPS does.

I originally posted my story in August 2005. Tonight, six years after I first posted it, I received a long, convoluted email message from an unnamed individual at Semaphore which can be briefly summarized as follows:

  • “Our software is still broken.”
  • “We’re still not going to admit that it’s broken…”
  • “…because we’re too stupid to understand that if our software behaves differently from the USPS’s own software and differently from every other software package in the world that does address canonicalization, that means ours is wrong and everybody else’s is right.”

But you don’t have to trust me. Below is their email message in full, with some commentary from me.


Someone pointed out your page at http://www.mit.edu/~jik/ssa-zip.html to us, and we noticed it contained a number of errors and misconceptions, such as:

“The software… has a bug…”
Most people think of CASS as software to simply standardize or validate addresses.  In reality, CASS software implements a large number of strange rules designed to make input addresses conform to USPS databases and regulations so the addresses qualify for certain postal bulk mail discounts.  Because of the complexity of those rules and the USPS databases, the required output for a given input is often surprising and non-obvious, and what might appear to be a bug is actually just a strange requirement of USPS regulations (or misinterpretation or ignorance of the required outputs).

For example, if the input is 3100 1ST AVE, BELLEVILLE NJ 07195 then CASS requires the ZIP code to be deleted (!), and many data managers might identify that as a bug [a meaningless example, since when you type that address in at http://zip4.usps.com/zip4/, it says that the address doesn’t exist, which is not actually the situation we’re discussing here].  Another example of strange CASS rules:  users often wish to combine street and box addresses on one line, such as 207 GRANADA DR PO BOX 2.  Sloppy record management often lets that degrade into something like 207 GRANADA DR BOX 2, at which point CASS requires BOX to be replaced with #, leaving 207 GRANADA DR # 2, and the original meaning is lost forever [another example which has nothing to do with the case we’re discussing].  A third example:  the ZIP+4 database says all house numbers from 1 to 39 (odd) on [elided], BRIGHTON get ZIP+4 code 02135-[elided].  However, if the input is 27 [elided] ST, BRIGHTON 02135-[elided] then CASS forces the +4 to be deleted due to DPV (again, a situation that at first glance might appear to be a bug but is actually a USPS regulation) [yet another meaningless example, since there is no house at number 27 on the street in question — it jumps right from 25 to 29, which I should know, since I live on the street]. [one cannot help but wonder whether Semaphore’s spokesman can cite, rather than three irrelevant examples, the specific CASS regulation which supposedly requires the valid mailing address, “Boston, MA 02135” to be replaced with the invalid mailing address, “Boston, MA 02109”]

Since the USPS databases change monthly, and the CASS regulations typically change yearly (for example “Cycle M” CASS rules ended 7/31/11, and “Cycle N” rules are now required as of 8/1/11), one must be careful when trying to distinguish bugs from requirements.  Also, although CASS software is now required to use DPV, LACSLink, and SuiteLink to generate actual CASS forms, it’s possible to operate CASS software without any of those extra databases if you don’t care about bulk mail discounts (although you get correspondingly less-information address correction).  That means it’s possible to get different output results depending on how many extra databases the user decides to install.

“Brighton is part of Boston. This means that “Brighton, MA 02135” and “Boston, MA 02135” are both legitimate ways to write my address.”
Actually, every mailing address has a single PREFERRED form as specified in the USPS database, and sometimes other acceptable but not preferred forms.  [none of this disagrees with anything I wrote, so it’s not obvious why this sentence started with “Actually”, nor does it have anything to do with the problem under discussion] (Note that your implication that the preferred city-state-ZIP can be determined without considering the address is not always true [I neither made nor intended any such implication, nor is it relevant to our discussion, which is about Semaphore’s failure to properly canonicalize specific addresses; it doesn’t matter whether any particular thing is “always true,” but rather about what the software is supposed to do in those specific cases]; see http://www.semaphorecorp.com/cgi/zip5.html for an explanation.  Also note that geographic boundaries don’t determine acceptable place names for ZIP codes, only the USPS city-state database determines that.)

You don’t list your address, but let’s assume it’s [elided] ST.  The ZIP+4 database indicates the preferred and standardized city-state-ZIP+4 for that address is BRIGHTON MA 02135-[elided].  Although the USPS city-state database lists BOSTON as a MAILABLE PLACE NAME for 02135, it is not preferred, and will often lead to confusion as you discovered.  (It’s not unusual for a variety of place names to be commonly associated with cities, but often the names are not listed by the USPS as acceptable inputs or mailable substitutes for the preferred forms.) [more evading the issue. whether or not writing “Boston” instead of “Brighton” will “lead to confusion,” it is still a valid mailing address according to the USPS and every other address canonicalization software in existence]

“…street names with overlapping numbers… in the same city…”
Although you might equate Boston and Brighton as the “same” city, the USPS database does not.  [elided] ST 02135 is explicitly linked to the BRIGHTON post office, and [elided] ST 02109 is explicitly linked to the BOSTON post office.  Those address/street records don’t overlap any more than a MAIN ST address in California overlaps a MAIN ST address in Massachusetts.  Furthermore, the DPV database (shipping for the last 10 years) lists only [elided] ST 02135 as a mailable address. [none of this is relevant]

“…enter my address with “Boston, MA” without the ZIP code, you’ll get back 02109.”
CASS software is also required to return the indication that [elided] ST 02109 is not a mailable address if DPV is installed.  If whomever or whatever invokes the CASS software ignores that warning, it’s certainly not a bug in the CASS software. [none of this is relevant]

“…software which standardizes addresses must include the ZIP code provided by the addressee when calling the address standardization API.”
Software can’t force anything from the addressee, especially if the addressee isn’t operating the software.  Although CASS software must recognize ZIP (and ZIP+4) codes provided on input, CASS software can’t require ZIP codes to be input, and indeed CASS requires software to be able to process addresses without ZIP codes (and ZIP codes input without cities).  If a data entry clerk chooses to leave off the ZIP code, or input only the ZIP and not the city, CASS software is still required to process the address as input.  If the clerk subsequently ignores the CASS results (eg, ignores the warning that [elided] ST 02109 is not mailable), there’s nothing CASS software can do about that either. [there is no evidence whatsoever that the SSA failed to input my ZIP code when canonicalizing my address. there is no need to theorize that this occurred, since as already explained, Semaphore’s software canonicalizes the address incorrectly even when the correct ZIP code is entered]

“I applied for a social security number… specifying “Boston, MA 02135” in my address.”
That’s essentially the source of your problem, because your address is listed and maintained by the USPS as [elided] ST BRIGHTON MA. [no, you idiots, the source of my problem is that your software canonicalizes my address wrong]

“Your computer system disregarded the valid ZIP code that was provided…”
Again, that’s assuming clerks aren’t taking shortcuts and just leaving off or even overriding the ZIP.  Note that the USPS.COM web site no longer even allows ZIP codes to be input for address lookups. [well, that’s just stupidity on the part of the USPS Web designers, but note that you can still get to the old form which accepts a ZIP code at http://zip4.usps.com/zip4/]

“…the simple idea that both “Boston, MA” and “Brighton, MA” are valid ways to write my address.”
That idea is actually TOO simple, because the cities are NOT equivalent as far as the USPS is concerned.  Insisting that they be considered interchangeable will probably just lead to frustration. [more absurd evasion. of course they aren’t equivalent, but specifying the ZIP code disambiguates them, so it’s simply wrong for software to throw away a valid ZIP code for my address and replace it with an invalid one]
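
[A minimal sketch of the rule argued for in the commentary above, not anyone's actual implementation: when the input city is a mailable place name for the input ZIP code, the ZIP code is kept and only the city name is normalized. The `CITY_STATE` table and the function name are hypothetical; the table encodes only the facts stated in this post (BRIGHTON preferred and BOSTON mailable for 02135, BOSTON for 02109), and returning the preferred name is an assumption about the desired output.]

```python
# Toy stand-in for the USPS city-state data (hypothetical structure).
# Only the place names mentioned in this post are included.
CITY_STATE = {
    "02135": {"preferred": "BRIGHTON", "mailable": {"BRIGHTON", "BOSTON"}},
    "02109": {"preferred": "BOSTON", "mailable": {"BOSTON"}},
}


def canonicalize_city_zip(city, state, zip5):
    """Normalize the city name without ever replacing a ZIP that is valid for it.

    Returns (preferred_city, state, zip5) when `city` is a mailable place
    name for `zip5`; raises otherwise so the caller can flag the record
    instead of silently rewriting the ZIP code.
    """
    entry = CITY_STATE.get(zip5)
    if entry is None:
        raise LookupError(f"ZIP {zip5} is not in the toy table")
    name = city.strip().upper()
    if name not in entry["mailable"]:
        raise ValueError(f"{name} is not a mailable place name for {zip5}")
    # The ZIP disambiguates the city, so it is passed through unchanged.
    return entry["preferred"], state.strip().upper(), zip5


print(canonicalize_city_zip("Boston", "MA", "02135"))
print(canonicalize_city_zip("Brighton", "MA", "02135"))
```

Under these assumptions both spellings of the city resolve to the same 02135 record, which is the behavior the commentary says the ZIP code should guarantee.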

“…it seems obvious to me that both the SSA’s and ZipCo’s software are broken.”
The chain of events probably involved a data entry clerk typing an address (or possibly a machine scanning the address) from a written document, which is probably submitted to an application, which then calls CASS software, which returns results to the application and/or operator.  What combination of address-city-state-ZIP were exchanged at each interface, and whether there were any modifications, is actually unknown. [and doesn’t matter, because we already know that Semaphore’s software does the wrong thing even if the data is entered correctly] Claiming the CASS software did not return the USPS-regulated outputs for whatever inputs it was given is actually the least-likely explanation of any subsequent problem, since it would be difficult for that kind of software to be certified [well, Semaphore’s software is certified, isn’t it? and Semaphore’s software is clearly returning the wrong answer as shown below, isn’t it, so forgive me for finding it hard to believe that it would be difficult for buggy software to be certified].  A more likely explanation is that operators aren’t fully aware of what CASS outputs indicate, or the operators aren’t using DPV, or the outputs are being ignored altogether.  In any case, there’s no actual evidence of a bug (ie, there’s nothing to indicate the CASS software isn’t doing exactly what the USPS requires) [well, um, except the evidence that the USPS and every other software package in the world do it differently from Semaphore and yield a valid address while Semaphore doesn’t. but sure, if you want to pretend it’s OK for your software to behave differently than everybody else’s in the world, by all means, go right ahead].  FYI, here are our outputs for [elided] ST using BOSTON and BRIGHTON 02135:

INPUT: [elided] st, boston ma 02135
Address (final) … [elided] St
City state ZIP (final) … Boston MA 02109
DPV confirm … N
DPV footnotes … AAM3
Dropped +4 … 5532
Edition … 201107
Error message … ZIP changed
Error numbers (detailed) … 14.2
Uncertified … U
Version … 99

INPUT: [elided] st, brighton ma 02135
Address (final) … [elided] St
Barcode digits … 02135[elided]7
Carrier route (final) … C022
Carrier route rate … B
Certified … C
City state ZIP (final) … Brighton MA 02135-[elided]
Congress code … 08
County code … 025
County name … Suffolk
Date certified … 20110804
DPV CMRA … N
DPV confirm … Y
DPV footnotes … AABB
Edition … 201107
ELOT sequence … 0021
ELOT sort … A
Fragment (house) … [elided]
Fragment (street) … [elided]
Fragment (suffix) … St
USPS address code … S
Vacant … N
Version … 99
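
[For anyone who calls software like this from another application, here is a rough sketch of reading a listing in the "Field … value" layout shown above and flagging results that should not be accepted blindly. The function names are hypothetical, and treating "DPV confirm … N" or the presence of an "Error message" field as a reason for review is an assumption based on the email's own description of those outputs, not on any documented API.]

```python
SEP = " \u2026 "  # the listings separate field and value with " … "


def parse_report(text):
    """Turn 'Field … value' lines into a dict; lines without the separator
    (such as the INPUT line or blank lines) are skipped."""
    report = {}
    for line in text.splitlines():
        field, sep, value = line.partition(SEP)
        if sep:
            report[field.strip()] = value.strip()
    return report


def needs_review(report):
    """Assumption: anything other than 'DPV confirm … Y', or any error
    message at all, deserves a human look before the address is used."""
    return report.get("DPV confirm") != "Y" or "Error message" in report


# A few fields copied from the first listing above.
SAMPLE = """\
City state ZIP (final) \u2026 Boston MA 02109
DPV confirm \u2026 N
Error message \u2026 ZIP changed
"""

report = parse_report(SAMPLE)
print(report["City state ZIP (final)"], needs_review(report))
```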

“We provided our correct mailing address in our application.”
If you had submitted BRIGHTON instead of BOSTON you probably wouldn’t have had any problems [probably not, but that’s not really the point].  If, unlike the USPS database, you consider BOSTON correct for your address [the USPS database does consider “Boston” correct for my address. as you said above, “Brighton” is preferred, but “Boston” is not wrong, and even the USPS’s own Web site accepts it], BRIGHTON could only be called “more” correct.

At best, the SSA, OPI, IG, congressmen and senators will probably never go beyond appearing to lend a sympathetic ear.  To get the database changed, or the CASS rules changed, you should just go straight to the AMS system and the CASS department.  See http://www.semaphorecorp.com/cgi/dirt.html for contact details.  In the mean time, we definitely recommend using BRIGHTON instead of BOSTON.
