Passwords as technical debt
Passwords aren't going away...
We're all familiar with passwords, and probably have hundreds of them[1] spread across many applications and websites. Technical debt, however, may be a new term to you. Wikipedia has this definition:
Technical debt (also known as design debt or code debt...) is a concept in software development that reflects the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.[2]
I'd personally expand that definition to allow for the fact that practices change, and what was fine a few years ago may now be a problem that you have to deal with. Time has caused you to accrue technical debt. In this blog post I'm going to talk about how passwords can be technical debt for your system.
Storing passwords
I've talked about storing passwords before in my post on how password cracking can help your organisation but, in summary, you've got a few options:
- Store the password unencrypted (plain text, e.g. tigersrule123)
- Store the password hashed
- Store the password encrypted
- Options 2 and 3 with a "salt"
- Use a password key derivation function and store the derived value
- Don't store the password at all (e.g. use OAuth)
Clearly option 1 shouldn't be chosen, but that's how a lot of systems started out. After storing the password in plain text came storing an MD5 hashed version of the password, then SHA1, then salting. Obviously moving between these schemes requires code changes.
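As a rough illustration of the key derivation option, here's a minimal sketch using PBKDF2 from Python's standard library. The function names, the iteration count and the 16 byte salt are my own assumptions for the example, not values taken from any particular system, so treat it as a starting point rather than a recommendation.

```python
import hashlib
import hmac
import os

# Illustrative work factor only (an assumption); tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key). The password itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key from the supplied password and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("tigersrule123")
```

Both the salt and the derived key need to be saved against the user, as the salt is required to verify the password at login.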
Plain text passwords
You may think storing plain text passwords is entirely a thing of the past, yet it still happens. I remember requesting a password reset email for a theatre booking website and seeing my password written in the email. This immediately prompted questions and it turned out that, as suspected, the password was indeed stored in plain text for end users (those wishing to book a theatre ticket). Frustratingly, passwords for system administrators were encrypted in the database - the developer clearly knew how to do it properly but chose not to. That was about 4 years ago, so fairly recent history.
Using old hash mechanisms
MD5 and SHA1 hash methods are not recommended for storing passwords for a number of reasons. Firstly, these mechanisms are considered broken (MD5 for years) and it's possible to produce a collision, where two different inputs give the same output. This is unlikely with a password but a consideration nonetheless.
Secondly, rainbow tables are readily available for unsalted MD5 and SHA1 hashes, which makes it easier for an attacker to determine the original password from a given hash.
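To show why precomputed tables are a problem, here's a simplified sketch. It uses a plain dictionary as a stand-in for a rainbow table (a real rainbow table is a more compact structure, but the effect for the attacker is similar), and the three password list is made up for the example.

```python
import hashlib
import os


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, used here only to demonstrate the weakness."""
    return hashlib.md5(data).hexdigest()


# Tiny stand-in for a precomputed table: hash -> original password.
common_passwords = ["password", "letmein", "tigersrule123"]
lookup = {md5_hex(p.encode()): p for p in common_passwords}

# Unsalted: the stolen hash is found directly in the table.
stolen_hash = md5_hex(b"tigersrule123")
recovered = lookup.get(stolen_hash)

# Salted: the same password no longer produces a value that's in the table.
salt = os.urandom(16)
salted_hash = md5_hex(salt + b"tigersrule123")
recovered_salted = lookup.get(salted_hash)
```

A salt only defeats the precomputed lookup. MD5 and SHA1 are still very fast to compute, so a salted hash can be attacked by guessing.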
Moving to a new scheme
This is where the technical debt comes in. Changing to a new scheme will require a re-write in the account creation, login and password change processes at least. Failing to update the scheme for every activity that touches the stored password will result in part of the system breaking. An example of a password save mechanism, from an old project of mine[3], is below.
Planning the move is a difficult process. If passwords are still stored as plain text in the database then it would be simple enough to just encrypt them using the new method. Once all the other logic (password change, reset, login) is updated the system can go live. Where the passwords are already hashed or protected in some way it won't be possible to just encrypt the stored value - we'd end up with a completely useless password that's unknown to the user.
A hybrid system could be implemented, perhaps with a table schema like this (very fake data):
id | username | password_hash | password_crypt
---|-----------|----------------------------------|----------------
1 | lizet | d4782b35b7c766aaef812818f4a44a38 |
2 | jonathan | 5af0a0feb2094f43bebb50c518c1ebfe | rcudth35489f34d
3 | stephanie | | oehd90h.[3tmej[
In the above example we can see a record created under the old scheme (ID 1), where there's only a hashed password. Entry 2 (jonathan) was present in the old scheme (hashed) and now has an encrypted password too, while entry 3 for stephanie has only existed since the password mechanism was changed. For this theoretical system the login process is:
- Check to see if the user has a value in password_crypt, if so use the new mechanism
- If no password_crypt, apply hashing algorithm and compare to password_hash to log the user in. Also take the provided password, encrypt with the new mechanism and save as password_crypt
- If this is a new user, just use the new mechanism and save as password_crypt
There's a problem with this though - only half the job has been done. It's necessary to remove the values saved in password_hash in order to prevent the hash being reversed (or looked up) later to obtain the user's password. Step two could be improved by adding "remove the password_hash value".
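The hybrid login process, including the removal of the old hash, could be sketched roughly as below. This is an illustration under several assumptions of mine: the old scheme is taken to be an unsalted MD5 hex digest, the new scheme is PBKDF2 stored as "salt$key" in hex, and the user record is a plain dictionary rather than a database row. All function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (an assumption)


def new_scheme_hash(password: str) -> str:
    """Derive a key with PBKDF2 and return it as 'salt_hex$key_hex'."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt.hex() + "$" + key.hex()


def new_scheme_verify(password: str, stored: str) -> bool:
    salt_hex, key_hex = stored.split("$")
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(key.hex(), key_hex)


def legacy_verify(password: str, stored_md5: str) -> bool:
    """Old scheme, assumed here to be an unsalted MD5 hex digest."""
    digest = hashlib.md5(password.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_md5)


def login(user: dict, password: str) -> bool:
    # Step 1: a password_crypt value means the user is already migrated.
    if user.get("password_crypt"):
        return new_scheme_verify(password, user["password_crypt"])
    # Step 2: fall back to the old hash, then migrate on success.
    if user.get("password_hash") and legacy_verify(password, user["password_hash"]):
        user["password_crypt"] = new_scheme_hash(password)
        user["password_hash"] = None  # remove the old value straight away
        return True
    return False


legacy_user = {
    "username": "lizet",
    "password_hash": hashlib.md5(b"tigersrule123").hexdigest(),
    "password_crypt": None,
}
```

New users never touch the legacy branch, as account creation would only ever call new_scheme_hash.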
Operating in hybrid mode should be a time limited problem, that is to say you don't want to run that way for very long. For as long as passwords stored "the old way" remain available they are at risk, and thus so is your users' data. At this point a decision has to be made: how long do you run in hybrid mode?
User experience trade offs
Once the time period expires it's necessary to clean up the database (remove all the old data) and the codebase (remove references to the old value, password_hash, including any logic for it). Any users that haven't logged in, and so haven't triggered a password re-save, will find their password won't work anymore - that's what the "forgot password" link is for...
Alternatively you could bite the bullet and get all of the users to create a new password via the forgotten password link but consider how that scales. In a small system with a handful of users there's not a huge problem - heck you could physically hand each user a temporary password and get them to change it. Now increase the number of users to a medium sized business, and then on to a large corporation. Addressing the technical debt will have a user experience trade off and it's important to get the right balance.
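For the time limited approach, the database clean-up at the end of the hybrid period might look something like this sketch. It uses an in-memory SQLite database with the fake rows from the table earlier; the users table name and the retire_legacy_hashes function are assumptions for the example.

```python
import sqlite3


def retire_legacy_hashes(conn: sqlite3.Connection) -> list:
    """Blank every remaining password_hash and return the usernames
    that never migrated, who will need the "forgot password" link."""
    rows = conn.execute(
        "SELECT username FROM users "
        "WHERE password_crypt IS NULL OR password_crypt = '' "
        "ORDER BY id"
    ).fetchall()
    conn.execute("UPDATE users SET password_hash = NULL")
    conn.commit()
    return [row[0] for row in rows]


# Fake data matching the example table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, "
    "password_hash TEXT, password_crypt TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "lizet", "d4782b35b7c766aaef812818f4a44a38", None),
        (2, "jonathan", "5af0a0feb2094f43bebb50c518c1ebfe", "rcudth35489f34d"),
        (3, "stephanie", None, "oehd90h.[3tmej["),
    ],
)

needs_reset = retire_legacy_hashes(conn)
```

Once the values are blanked, dropping the password_hash column and removing the legacy login branch from the code finishes the job.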
Backwards compatibility
Often an antithesis of security. The need to maintain compatibility with old systems is a key reason that some security vulnerabilities are still around. Microsoft's products are often very good for backwards compatibility but that did mean the old, easy to crack LM hash was stored with the new NT hash for a long time[4]. Very useful for penetration test engagements!
Conclusion
Handling passwords correctly in the beginning is always advised, but time will eventually mean there's a need to change the mechanism used and thus you accrue technical debt. Don't forget to clean up the old data as well as the code!
Banner image: Example hybrid scheme.
Note: code and database examples are for example only, unless otherwise stated.
[1] I'm assuming you're not using the same password for everything, as that's really dangerous. I recommend a password manager like LastPass or KeePass.
[2] https://en.wikipedia.org/wiki/Technical_debt
[3] The system is due to go through a complete re-write, and is over 8 years old by this point. It's not published to the public Internet and is only operational for about 15 days a year.
[4] An interesting, albeit technical, short read here: https://medium.com/@petergombos/lm-ntlm-net-ntlmv2-oh-my-a9b235c58ed4