Along with the confusion over the term "end-to-end encryption," tokenization (or simply "tokens") is a term used to describe many things.  But what is a token, really?  The PCI Council does not provide any guidance other than the definition of an Index Token in the glossary:

A cryptographic token that replaces the PAN, based on a given index for an unpredictable value.

But even this does not really help us.  To make matters worse, the term "token" itself is defined in the PCI DSS Glossary in the context of a two-factor authentication device like SecurID.  I'm going to take a crack at defining it, discussing what the variants might be, and examining how they could be weaker or stronger.

Septa Token, by lindseywb

In the purest sense of the word, a token is a replacement value for another piece of data.  That is, instead of using 4111 1111 1111 1111 for a Visa card number, you would use some other value to represent that card number, and have a way to look up the original number should you need it.  The token could be alphanumeric, numeric only, or even binary.  Given the amount of existing data and the design of the applications using it, most tokens tend to take the form of a 16-digit numeric value.
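As a sketch of that idea, a 16-digit numeric token can be drawn from a cryptographically secure random source, so its digits carry no information about the card number it replaces (the function name here is illustrative, not from any product):

```python
import secrets

def generate_token() -> str:
    """Generate a random 16-digit numeric token.

    Each digit comes from a secure random source, so the result has no
    mathematical relationship to whatever value it will stand in for.
    """
    return "".join(str(secrets.randbelow(10)) for _ in range(16))

token = generate_token()
assert len(token) == 16 and token.isdigit()
```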

Regardless of the makeup of the token, there should not be any mathematical (or other) relationship between the token value and the original value.  That's where the word "index" comes into the Council's definition.  The problem with their definition is that it also includes the term "cryptographic," which could lead assessors and professionals alike to assume that some kind of cryptographic relationship can exist between the token and the original value.

The only relationship that should exist between a token and the original value is the index.  A schema for such an index might look like this:

CREATE TABLE token_index (
    original_value CHAR(16) PRIMARY KEY,
    token CHAR(16) NOT NULL UNIQUE
);

Generating tokens would happen outside the database layer in this case.  Making the original value the primary key prevents two tokens from representing the same original value.  It's been a long time since I have done database design, so I wouldn't suggest implementing this directly.  It is simply a way to illustrate a point.
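The index relationship above can be sketched in a few lines of Python.  This is a toy in-memory version purely to illustrate the mapping (a real system would use a hardened, access-controlled data store, and the class and method names are my own invention):

```python
import secrets

class TokenVault:
    """Illustrative index mapping original values to random tokens."""

    def __init__(self):
        self._by_pan = {}    # original_value -> token (PAN acts as the primary key)
        self._by_token = {}  # token -> original_value, for lookups

    def tokenize(self, pan: str) -> str:
        # One token per original value, mirroring the PRIMARY KEY constraint.
        if pan in self._by_pan:
            return self._by_pan[pan]
        # Retry on the (unlikely) collision so each token maps to one PAN.
        while True:
            token = "".join(str(secrets.randbelow(10)) for _ in range(16))
            if token not in self._by_token:
                break
        self._by_pan[pan] = token
        self._by_token[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; the index is the only link back."""
        return self._by_token[token]

vault = TokenVault()
t = vault.tokenize("4111111111111111")
assert vault.detokenize(t) == "4111111111111111"
```

Note that nothing in `tokenize` uses the PAN as an input to the token itself; the only connection between the two values is the index entry.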

Original values should not be reversible or derivable from token values.  If they are, that changes how you must protect the tokens, and they should in fact be called cipher text, NOT tokens.  If the two are cryptographically related, the possibility exists that analysis could be done to reverse the crypto operations.  Hashed values are commonly referred to as tokens, which is a dangerous association and, according to the definitions I am laying out, inaccurate.
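To see why a hash is not a token: a hash is deterministically derived from the PAN, so anyone holding the hash can test candidate card numbers and confirm the original.  A random token offers no such check.  A minimal sketch (the candidate PANs are illustrative test numbers):

```python
import hashlib

pan = "4111111111111111"
# A deterministic hash of the PAN -- derivable, so NOT a token by this definition.
hashed = hashlib.sha256(pan.encode()).hexdigest()

# An attacker with only the hash can guess-and-verify candidate PANs:
candidates = ["4111111111111111", "5500000000000004"]
recovered = [c for c in candidates
             if hashlib.sha256(c.encode()).hexdigest() == hashed]
assert recovered == ["4111111111111111"]
```

Since the PAN space is small enough to enumerate, this guess-and-verify attack is practical against an unsalted hash, which is exactly the derivative relationship a true token must not have.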

I suggest a token for PCI DSS be defined as:

Token: a replacement value for a PAN (or other sensitive data) that has no mathematical or otherwise derivative relationship to the PAN.

Therefore, for a value to be considered a token, it must have two properties:

  1. No relationship to the original value (i.e., not based on or a form of cryptographic cipher text)
  2. No ability to use the replacement value in place of a PAN to authorize a transaction (or otherwise process one outside of the token system), and therefore no value to someone who has access only to the token

Please don’t construe this as an endorsement (or anti-endorsement, as it were) of any given product, just a clarification of the term “token.”  Are other methods of rendering cardholder data unreadable under Requirement 3.4 that are incorrectly being called tokens still acceptable for compliance?

Certainly, but they are not tokens.
