All this seems to hint, more than ever, that the time to provide these results directly and exclusively to the email address being queried is approaching.
Why is this API being abused? Because it provides valuable information—which took a significant amount of effort to curate—about an email address.
The list of services which have lost my (hashed or not) password at some point in the past eventually turns into a list of every service I’ve ever subscribed to.
Whether or not it’s possible to scrape that information together, is it really something that should be available to pull over an API for a million emails a month?
Note this is very different information than the password breach count, which gives you an approximate count of how many times a given password has been breached, and works as a proxy for password strength without disclosing any PII.
Sorry, but the cat is out of the bag. HIBP is evening the playing field, making the data less valuable to those who have the skills to collect it.
It's the same thing as responsible/full disclosure; by making this information available to anyone (publish a vulnerability), you greatly reduce the power of those who have the skills to collect it anyway (the person who found the 0day).
So yes, this information needs to be available, or it'll only be some people who have it, not none, and those few people who do have it will be 10x stronger than they are now.
This is the old Antisec debate all over again, let's skip to the part where we end up agreeing generally that disclosure is better, okay? No need to relive 2009 or whatever.
"Disclosure" could mean many things. The idea of providing the info directly via email to the affected user seems to adequately disclose things to the relevant parties.
Are there additional benefits of the public api that on balance benefit the public more than attackers?
Yeah, the availability of the data being common rather than rare, so the skill of collecting that data doesn't create a power structure where only the hackers/skilled users have power.
Imagine it being $500/month to access HIBP, because that's the alternative, not some, "everyone agrees to only use this info for good".
Explain to me how anybody besides myself can use info about my leaked account for something good or useful.
I can’t think of an example.
Therefore, having that info cost more is better. Having it cost a lot more is a lot better. (I’m assuming I can still get access for free by having it provided directly to my email address.)
What? No, you're not understanding. Even if no one but you could use this info legitimately, the fact that it's widely available depowers the people who have the skills to collect it (specifically, people who want to do you harm).
By virtue of the fact that this info is widespread, you have no choice but to take actions to protect yourself from this information. That means the information becomes useless.
You are, in a way, being shamed into acting, through public disclosure. So no, having that info cost more is not better; it's worse.
Furthermore, it is not an option to only let you have this information. That ship sailed when the breaches happened. You don't get access to this information for free, you don't get to control the dissemination of this information, you are powerless. You're acting like HIBP is the only way people can find this info out; it's not. That $500 price tag is just for you. People who are more skilled than you or I at collecting this info get it for free, and that's never going away.
You can’t have it both ways. Either it’s widely available, or it isn’t.
If it’s already widely available then HIBP doesn’t accomplish anything. (It doesn’t anyway, since it doesn’t “shame” anybody except people who are already signed up, who only need and get their own info.) If it isn’t widely available then HIBP is helping people who are bad at collecting and using this information to do so.
We accept that from bug reports only because of the other benefits that come from releasing the info.
You're not getting that the alternative is much worse.
Your data is out there. Period. The end. You don't have control over that. All you're doing is trying to re-establish control over data you already lost.
The question now is, do you want it only in the hands of people who want to harm you, or do you want it in the hands of both people who want to harm you as well as people who want to help you?
You seem to only want bad guys to have your data. That's weird.
Thanks for the explanation. I get your point now. I did not find BFDM’s proposed benefits from white hats having access to be compelling. So what I’m struggling with is simply the idea that anybody could do something good with my data. If only bad can be done, then the fewer people spreading the data around, the better. Your presupposition is that some people will do good with it if they have access that currently only bad people have. Can you give an example of one or more of those good things?
1Password tells you which of your passwords have been part of a breach. Many other companies will suspend the accounts of anyone whose login information for their service leaked as part of another site's breach.
Other websites won't allow you to use a password that's listed as a common password from the aggregated passwords in breaches.
Lots of studies have been done on password frequency, such as the top 100 most common passwords and what security people can do about their repeated use.
Based on your question however, I'm concerned you don't actually get my point. You're being forced into action, exactly how companies are forced into action, by the availability of this information. You have to change your password if it's easily available to anyone who uses this API and who has your email address, you no longer get to pretend it's not a big deal.
> 1Password tells you...
This is software acting as an agent of the affected user. 1Password could be authorized by the email holder to gain access to the API without making the information public.
> Other websites won't allow you to use...
This and the following example in your comment are discussing the breached password API, which is a completely different API that I specifically mentioned up-front as not compromising any PII.
I take zero issue with providing an API to see counts of how many times a password has shown up on breach lists, although I wouldn't use the API myself on any of my own passwords, because it leaks a 1-in-1-million discriminator to the actual password you are querying.
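For reference, here is a minimal sketch of where the "1-in-1-million" figure comes from. It assumes the breached-password lookup works the way the k-anonymity range scheme does: the client hashes the password with SHA-1 and sends only the first 5 hex characters. The function names and the `PREFIX_LEN` constant are hypothetical, and no network request is made.

```python
import hashlib
from typing import Tuple

# Assumption: only the first 5 hex characters of the SHA-1 digest leave the client.
PREFIX_LEN = 5


def split_for_range_query(password: str) -> Tuple[str, str]:
    """Return (prefix sent to the server, suffix kept locally)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:PREFIX_LEN], digest[PREFIX_LEN:]


def bucket_count(prefix_len: int = PREFIX_LEN) -> int:
    """Number of distinct prefixes, i.e. how finely the prefix discriminates."""
    return 16 ** prefix_len


prefix, suffix = split_for_range_query("password")
print(prefix)          # the only part an observer of the query would see
print(bucket_count())  # 16**5 buckets, roughly one million
```

Under that assumption the server learns which of about a million buckets the password's hash falls into, which is the discriminator the comment above is worried about.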
You don't get to take issue with any of this. Your information was already stolen! You have no say, the end.
So your fallback position is that it is perfectly legitimate to traffic in stolen PII. Got it.
Well, I take issue with that.
Yes, in some cases it's perfectly legitimate to "traffic" (terrible word choice) in stolen PII, that is correct.
And my "fallback" position is that it's better this way than the other way, where it's actually being trafficked, rather than your hyperbolic assertion that it is now.
Now apply this same thinking to the Equifax breach and millions of credit reports.
Now apply this same thinking to the OPM breach and security clearances.
Now apply this same thinking to medical records breaches.
I don't want anyone to have my data, but I resign myself to the fact that data security is not, and never will be, perfect. That does not mean I resign myself to the fact that all my personal data should be freely available to the entire world via a well documented REST API.
A service provider could check the API for the signup email and if previously compromised could challenge the signup with additional CAPTCHA steps to detect bot activity. They could check email+PW entered against leaked pairs and prevent you from registering with a known-compromised PW.
Your bank could check emails attached to customer accounts and work with affected customers to ensure their bank account access is secure.
Your employer could check for leaks of accounts using corporate domains. They could check leaked passwords against each account's last 5 known passwords to see if there are active threats.
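The first of these, the signup check, could look roughly like the sketch below. Everything here is hypothetical: `email_in_breach` and `password_breach_count` stand in for whatever breach lookup the provider has access to and are not real API calls, and the password check uses a per-password breach count rather than leaked email+password pairs.

```python
from typing import Callable


def signup_action(
    email: str,
    password: str,
    email_in_breach: Callable[[str], bool],        # hypothetical lookup
    password_breach_count: Callable[[str], int],   # hypothetical lookup
) -> str:
    """Decide how to treat a signup attempt based on breach lookups."""
    # A password already seen in breaches is refused outright.
    if password_breach_count(password) > 0:
        return "reject_password"
    # A previously breached email only triggers extra bot checks.
    if email_in_breach(email):
        return "extra_captcha"
    return "allow"


# Stub data standing in for a real breach lookup (illustrative only).
breached_emails = {"old@example.com"}
breached_passwords = {"hunter2": 17}


def is_breached(email: str) -> bool:
    return email in breached_emails


def breach_count(password: str) -> int:
    return breached_passwords.get(password, 0)


print(signup_action("old@example.com", "a long unique phrase", is_breached, breach_count))
```

Passing the lookups in as callables keeps the decision logic separate from wherever the breach data actually comes from.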
You’ve convinced me. I didn’t know anybody could look up my info. I only want it for myself.
Only thing is, there are a couple of old email addresses I used to use that I don’t have access to anymore. I guess I just need to shrug at that at this point.
The bad guys have access to it either way. That's the whole point: this is data that already leaked.
As discussed elsewhere in this thread, there are real benefits provided to bad guys by allowing them to look up this information about anybody in a central location.
There are plenty of other similar services that you can find that cheaply do the same thing AND provide you with the leaked hashes/emails.
HIBP doing the same thing with less friction for people who are trying to learn about security is probably fine in comparison.
This just feels like another iteration of the Full Disclosure debate.
I’ve never heard Full Disclosure concepts applied to serving stolen PII in an API.
The reason is that the purpose of full disclosure is to shame the vendor into ensuring the patch is made, and to warn the user base that the attack is possible, while disclosing a flaw in a commercial product.
In this case, we are not effectively doing either naming or shaming by publishing actual email addresses, rather than just user counts and the type of hashing that was performed.
And at the same time the information being “bartered” is private user information and not merely identifying a flaw in a commercial product.
I fail to see how an API into the HIBP database can be justified under the concept of full-disclosure. Particularly when the service could have been implemented as an email report to the queried email address.
You left out the biggest part of full disclosure, in my opinion. The reason for full disclosure is that those who are affected by a security flaw in a product they are using have a right to know about the dangers of that piece of software.
But once I put that down in writing I discovered you are right about the difference in this instance.
The people who have the right to know about the flaw in this instance are those whose accounts were compromised. Giving it to the general public further victimizes them, rather than helping them protect themselves.
I don't think that's the most important part. Rather:
Full disclosure can also protect previously unaffected / potential future customers, by warning them of companies that have been so lax with their security that they've been breached.
So to achieve a comparable upside to full disclosure, HIBP needs to also make aggregate data publicly available. Which they do:
https://haveibeenpwned.com/PwnedWebsites
Interesting. Good point. I'll have to think about that.
It feels that way, but there is definitely a different utility value in “searchable by the whole world” and “leaked in obscure formats in small nonpublic forums”.
Troy has absolutely added value here, although 100% of the data is already “public” from having been leaked.
Searching over data that was publicly available some time in the past (but isn’t now) is also a value, sort of like time-shifting of the publicness of the data...
Bulk emailing notifications to all affected addresses would be a deliverability nightmare, and would require manual intervention at most ISPs to prevent these messages from being blocked, which said ISPs may or may not be willing to do.
Just think of the number of clueless users who would mark such a notification as spam, and the number of old, dead addresses, some of which are now spamtraps.
edit: clarify bulk vs. individual notifications
That's a service Have I Been Pwned has been offering for years...?
For single addresses that specifically request it, which is both fine and hugely different from bulk notifications to any/all addresses observed in a breach, which is what I was referring to.
But I realize the wording in the original post is a little ambiguous; I had read "provide ... directly" as implying "push", but that may not be the case, and if so my comment above is not relevant.