Input Fields are the workhorses of forms. They can restrict the number of characters input and the type of data entered. They can choose to display or mask the input as in the case of passwords. While there are far more options than these, this small set of variables is more than enough for poorly-implemented forms to become nearly unusable, leading to frustration and outright silliness.
Some of this silliness has declined over the decades because developers have gravitated toward some standards of behavior. Sadly, one still to this day encounters forms with some of the same lameness that we saw in the 90's or 00's -- and we can't blame the COGs (Crusty Old Guys) for all of it. Too often, new developers fall into the same traps of bad form design, disregard of user experience, cool-philia, laziness or the "I am IT and I decide" mindset that has consistently been focused on the mechanics of a site even when in opposition to the success of people using a site. For some reason, I encounter these issues most frequently on banking and government sites, where some noob has clearly discovered the modern equivalent of the blink tag and can't resist inflicting their new-found toy on users.
Data like phone numbers, ZIP codes, and Social Security numbers are pretty standardized in the US. Phone numbers, for example, are expected to have 10 digits, but they may be formatted in a number of ways:
- (800) 555-1212 (traditional)
- 800-555-1212 (business traditional)
- 800.555.1212 (standard geek)
- 8005551212 (machine version)
While a company may want to format a phone number in a particular way, all they actually need is those 10 digits -- they shouldn't really be concerned about how a person might format their number, since it's so easy to parse out the digits and store just those; storing only digits allows more efficient sorting anyway. A number can then be reformatted as desired on subsequent read and display. Still, I constantly find forms that throw errors if a user includes hyphens, periods or, gasp, spaces in a phone number -- or doesn't include one of those. Some forms simply limit the phone number field to 10 characters and leave the user to guess what's going on, adding no benefit but inflicting a developer's idea of proper form entry.
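To make the "parse out the digits" point concrete, here is a minimal Python sketch. The function names are mine, purely for illustration, and it assumes plain 10-digit US numbers with no country code or extension.

```python
import re

def normalize_us_phone(raw: str) -> str:
    """Return just the 10 digits of a US phone number, ignoring punctuation and spaces."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits

def format_us_phone(digits: str) -> str:
    """Reformat stored digits for display in the traditional style."""
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

# All four styles from the list above reduce to the same stored value.
for entry in ["(800) 555-1212", "800-555-1212", "800.555.1212", "8005551212"]:
    print(format_us_phone(normalize_us_phone(entry)))  # prints (800) 555-1212 each time
```

The user types whatever feels natural, the database holds one canonical value, and the display format becomes a decision made at read time.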
The US Post Office instituted ZIP (Zone Improvement Plan) codes in 1963, and extended those codes with the "5+4" format in 1983. That was all before web forms existed, and the 5+4 format isn't widely used even now, but it still manages to cause some forms (or the developers behind them) indigestion.
Synchronicity is Key
One particularly idiotic situation occurred when I was logging in for the first time to an account that had been set up through a separate process. To verify my identity, I was asked to enter my ZIP code, at which point my verification was rejected. Checking back with the administrator, I found that my ZIP code had been recorded as a 5+4 value. The web form, however, only allowed 5 digits, making a match impossible unless the developers actually used some common sense. I've seen this occur as well when a form on a web page allows or requires something different than what is allowed on that same form on a mobile app.
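The "common sense" in question might look something like the sketch below. It is a guess at a sensible rule, not a description of how any particular site works: accept either 5 or 5+4 input and compare only the five-digit part. The names parse_zip and zips_match are hypothetical.

```python
import re
from typing import Optional, Tuple

def parse_zip(raw: str) -> Tuple[str, Optional[str]]:
    """Split a US ZIP code into its 5-digit part and optional +4 part."""
    digits = re.sub(r"\D", "", raw)  # ignore hyphens and spaces
    if len(digits) == 5:
        return digits, None
    if len(digits) == 9:
        return digits[:5], digits[5:]
    raise ValueError("a ZIP code needs 5 or 9 digits")

def zips_match(entered: str, on_file: str) -> bool:
    """Treat two ZIP codes as matching when their 5-digit parts agree (an assumed policy)."""
    return parse_zip(entered)[0] == parse_zip(on_file)[0]

# The record holds 5+4, the web form only accepts 5 digits: still a match.
print(zips_match("12345", "12345-6789"))
```

Whatever rule is chosen, the point is that the web form, the mobile app and the back office all need to apply the same one.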
The Future is Here (Occasionally)
"The future is already here -- it's just not very evenly distributed." - William Gibson
I sometimes run across forms that ask for the ZIP code first, then use that information to automatically populate the fields for city and state. Smart, and rare: a form that actually helps you fill in the blanks. With technology that's been in use since at least 1998 and information that's been available since 1963...
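As a rough illustration of how little is involved, here is a sketch of ZIP-first prefill in Python. The three-entry ZIP_TABLE is a stand-in I made up for a real postal dataset or lookup service, and prefill_city_state is a hypothetical name.

```python
from typing import Dict, Tuple

# Tiny stand-in for a real ZIP-to-city dataset (assumption: a real form
# would query a full postal database or service instead).
ZIP_TABLE: Dict[str, Tuple[str, str]] = {
    "90210": ("Beverly Hills", "CA"),
    "10001": ("New York", "NY"),
    "60601": ("Chicago", "IL"),
}

def prefill_city_state(form: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the form with empty city/state fields filled in from the ZIP."""
    filled = dict(form)
    match = ZIP_TABLE.get(form.get("zip", "")[:5])
    if match:
        city, state = match
        # Only fill blanks; never overwrite what the user already typed.
        if not filled.get("city"):
            filled["city"] = city
        if not filled.get("state"):
            filled["state"] = state
    return filled

print(prefill_city_state({"zip": "90210", "city": "", "state": ""}))
```

An unknown ZIP simply leaves the fields blank for the user to fill in, so the helper never gets in the way.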
Divertimento: Non-Regional Postal Codes
In Canada the amount of mail sent to Santa Claus increased every Christmas, up to the point that Canada Post decided to start an official Santa Claus letter-response program in 1983. Approximately one million letters come in to Santa Claus each Christmas, including from outside of Canada, and all of them are answered in the same languages in which they are written. Canada Post introduced a special address for mail to Santa Claus, complete with its own postal code:
NORTH POLE H0H 0H0
Gotta love Canadians.
Form input and the Database
Nearly all data input through forms ends up in a database. To store data, most databases receive those data in the form of an SQL (Structured Query Language) query. For example:
insert "xyz1234pplr" into table passwords where user = "John Yaya"
[Note to geeks: these examples are simplified for a lay audience; this is not a coding class, so please back off.]
That query is evaluated by a parser and the value is stored. That parser evaluation, however, is where we can get into trouble.
Code injection is one of the oldest, most common and most insidious forms of hacking. Parsers look for meaning as part of their evaluation process, and some characters can cause some really interesting things to happen. Take our example from above, and let's add something destructive:
insert "xyz1234pplr|'/bin/sh unlink /'" into table passwords where user = "John Yaya"
The vertical bar character "|" is used in some systems (like Unix variants) as what's called a "pipe", a way of passing the output from one program to be input into another program. The inserted phrase:
|'/bin/sh unlink /'
means "pass the command to the main system to unlink the filesystem root."
In times before developers and operating systems were careful about input (actually this is still a problem), this query might have been executed, resulting in the effective deletion of the entire filesystem on the database server. These days, rather than directly destroy a filesystem, someone might try to use injection to give themselves administrative privileges, and from there they can get into all sorts of trouble.
Input Character Restriction
Sadly, some IT groups continue the ancient (by IT standards) and lame (by any standard) practice of trying to protect against injection by restricting what characters can be entered into a form. So things like "|" or ">" might be disallowed. Problem is, good passwords these days make use of all sorts of punctuation.
Institutional Arrogance : Personal Capital IT
Recently, I opened a trial account with Personal Capital (PC), a Mint-like web site where you enter all your financial accounts and it advises you about your investments (I'm intentionally not including a link out here because I can't in good conscience make it easy for anyone to go there). But I was unable to add some of my accounts because the passwords appeared to be getting rejected. After some back and forth, I was informed that the IT staff at PC reads your passwords when you add a financial account and strips out any characters they think are unsafe -- even if those characters are perfectly fine with the financial institution. To be clear:
- You create an account with a bank, establish a login and password.
- You create an account with PC and add the bank and login credentials to your PC account.
- PC reads your password when you supply it and then strips out any characters they feel are objectionable according to their internal policy. You are never informed about this action, and the now-corrupt password is stored rather than the real password that you and your bank agreed on.
- PC uses the corrupt password to attempt to log into your bank.
- The bank rejects the corrupt password.
- On login rejection, PC displays a message to you indicating that the bank doesn't like your password.
- If this happens enough, the bank locks your account. It is then up to you to resolve the mystery with your bank, because at this point neither you nor your bank know why someone has been trying to log into your account with a bad password.
- At no time does PC let you know they caused the entire problem.
PC's support personnel were very defensive about their actions, maintaining that they're following OWASP guidelines and are justified in their policy. They informed me that the solution was for me to change all my bank passwords to conform to their (PC's) requirements.
One of the most myopic and arrogant IT abuses I've seen in a while.
A better method of defending against injection attacks is input escaping. Basically, before you parse any input from a field, you "escape", or defang, any characters you feel are dangerous. Escaping can take a lot of forms, but in essence it means flagging a character in a way that tells the system "this is just a character, don't use it as something else." For example, the vertical bar character "|" in some environments is used as a "pipe", a way of passing data from one program into another program. If that character is treated as a pipe when entered into a database, it might be interpreted as an operation, and text that seemed innocuous might get executed rather than simply entered as text. Our nefarious query from above becomes:
insert "xyz1234pplr\|'/bin/sh unlink /'" into table passwords where user = "John Yaya"
The added backslash "\" tells the parser that the vertical bar is just a vertical bar, and the pipe is neutralized. Smart systems store the escaped string in the database so that later retrievals are safe as well.
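For the geeks who didn't back off: in practice, many database libraries let you avoid hand-escaping altogether by using parameterized queries, a related technique and not the backslash approach described above. The sketch below uses Python's built-in sqlite3 module; the table layout and the store_password and read_password helpers are hypothetical. (A real system would store a hash rather than the password itself; the plain value is used here only to mirror the simplified example.)

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passwords (user TEXT PRIMARY KEY, password TEXT)")

def store_password(user: str, password: str) -> None:
    """Store the value using ? placeholders, so it is treated purely as data."""
    conn.execute(
        "INSERT OR REPLACE INTO passwords (user, password) VALUES (?, ?)",
        (user, password),
    )
    conn.commit()

def read_password(user: str):
    """Return the stored value for a user, or None if there is no such user."""
    row = conn.execute(
        "SELECT password FROM passwords WHERE user = ?", (user,)
    ).fetchone()
    return row[0] if row else None

# The nefarious input from above is stored as plain text and never executed.
hostile = "xyz1234pplr|'/bin/sh unlink /'"
store_password("John Yaya", hostile)
print(read_password("John Yaya") == hostile)  # prints True
```

Because the command and the data travel separately, no character needs to be banned from the form, and the user's punctuation-heavy password survives intact.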
It Really Isn't Rocket Science
Finding a well-built form, while not rare, does continue to be uncommon in my experience. As an engineer, I've embraced the rule of parsimony (or Occam's Razor if you prefer) from early in my career, which effectively means "don't make it fancier than it needs to be."
Modern techniques can provide all sorts of functionality to the user experience. The trick is keeping a firm eye on the goal: that technology should be used to enhance the experience, not provide a place to show off at the user's expense.