Why SOPS fails at scale

SOPS is brilliant. Its first commit was made on August 14th, 2015, by Julien Vehent, who was working at Mozilla at the time. It quickly gained popularity thanks to its simplicity and ease of use, and it’s still going strong 10 years later.

Why it’s great

SOPS builds on top of third-party key management systems such as AWS KMS or Google Cloud KMS. The KMS encrypts and decrypts data keys, which in turn encrypt and decrypt the secrets stored in files on disk (usually secrets.yaml).

SOPS relies on the KMS providers’ standard APIs, and uses the user’s own permissions to interact with those APIs. With AWS KMS, for example, SOPS depends on your AWS profile to access the AWS KMS API. If your AWS profile does not have permission to decrypt with a given AWS KMS key, SOPS will not be able to decrypt a file encrypted with that key.

The real value SOPS brings to the table is reducing the cognitive load of securing secrets: it abstracts away the complexity of the encryption process and handles the burden of interacting with KMS APIs, regardless of provider! To encrypt a file with a GCP KMS key, all you have to do is run sops encrypt --in-place --gcp-kms <gcp-resource-id> secrets.yaml. To edit that file afterwards, just run sops edit secrets.yaml!

On top of that, since the encryption keys live in the cloud, you can easily grant access to other team members through your KMS provider’s existing RBAC system, without distributing the keys themselves. If your team has a shared KeePass database somewhere, SOPS is here to kill it!
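As an illustration of granting access without handing out keys, here is a minimal sketch for Google Cloud KMS. It assumes the gcloud CLI is installed and authenticated as someone allowed to change the key’s IAM policy; the function name and the key, key ring, location and member values are hypothetical placeholders. The `roles/cloudkms.cryptoKeyDecrypter` role lets the member decrypt with that one key, which is enough to read the SOPS files it protects, while the key material itself stays inside Cloud KMS.

```shell
#!/usr/bin/env bash
# Sketch: grant a teammate decrypt rights on the Cloud KMS key protecting a SOPS file.
# grant_sops_decrypt and every argument value are hypothetical placeholders.
set -euo pipefail

grant_sops_decrypt() {
  local member="$1"    # e.g. user:alice@example.com or group:platform@example.com
  local key="$2"       # Cloud KMS key name
  local keyring="$3"   # key ring containing the key
  local location="$4"  # key ring location, e.g. europe-west1

  # Adds an IAM binding on the key only; no key material is ever exported.
  gcloud kms keys add-iam-policy-binding "$key" \
    --keyring="$keyring" \
    --location="$location" \
    --member="$member" \
    --role="roles/cloudkms.cryptoKeyDecrypter"
}
```

Usage: `grant_sops_decrypt user:alice@example.com sops-key sops-ring europe-west1`. Revoking access is the mirror operation, `gcloud kms keys remove-iam-policy-binding` with the same flags.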

The fun doesn’t stop there though! Since the encrypted material is a mere text file, you can commit it to git and store it alongside your code. In fact, that’s what most people use SOPS for. And since your CI jobs probably already have access to your KMS, deploying the secrets to the right environment becomes trivial.

SOPS is an elegant tool that solves a difficult problem. Yet its crude simplicity, which makes it effective & likable, also makes it woefully inadequate for all but the smallest software organisations.

Where it falls short

No boundaries

Boundaries are great, not just with your nosy distant family members, but also when building security robustness into a socio-technical system. When you use SOPS to encrypt secrets that you’ll then commit to GitHub, you’re implicitly accepting that there is no boundary between your code & your secrets.

“Why would that be a problem?” I hear you ask. I’ll answer with another question: would you store bleach right next to your lunch? Would you do that at home? Would you do that in your backpack when you’re out? You wouldn’t? Well, why not? The bleach is in a closed bottle, so you should be okay, right? Obviously, you shouldn’t store bleach next to your food because accidents happen, and bottles leak. The same applies to code.

Code is liquid. It leaks. Without a boundary between code and secrets, the likelihood of large secret leaks increases significantly. Your code is copied onto every engineer’s machine, and unfortunately, machines get infected with malware and engineers get phished. Your code is also copied to GitHub’s infrastructure. It’s available to all the AI agents your CTO has been pushing down your throat for the last 6 months. It’s copied by CodeRabbit, Cursor, Snyk and all the amazing GitHub apps you can think of. Companies get hacked, their data gets stolen, and that data includes your code, and possibly your secrets.

You get my point: it’s just terrible opsec.

Problematic auditability

SOPS itself does not have audit logs; it’s a simple, single executable binary. It offloads traceability to the KMS provider, which in theory has audit logs, and to the resources that the SOPS-encrypted secrets grant access to, which don’t always support audit logs.

Let’s take a specific example. You have a secrets.yaml file containing a dozen secrets, all encrypted with one KMS key. Someone decrypts that file. That generates a KMS audit log entry, which tells you the key was used, but not which of the dozen secrets were looked at. Now the decryptor has all the secrets in cleartext. If they’re a legitimate user, they’ll probably use one or two secrets to check something, which would generate further audit logs (if you have those configured on the resources), and then discard the rest. But if they’re a malicious actor, well, they now have all your secrets, and you can’t know when they’ll decide to use them. So if it is indeed a malicious actor, you’ll have to rotate every secret in that SOPS file to be safe.
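To make the blind spot concrete on AWS, here is a hedged sketch of what the KMS side of the audit trail gives you. It assumes the AWS CLI is installed and allowed to call CloudTrail’s LookupEvents; the function name and the default of 20 events are arbitrary choices for illustration. Decrypting the file shows up as a `Decrypt` event with a time and a caller, and nothing in it says which of the file’s values were read.

```shell
#!/usr/bin/env bash
# Sketch: list recent KMS Decrypt calls from CloudTrail event history.
# recent_kms_decrypts is a hypothetical helper name; 20 is an arbitrary default.
set -euo pipefail

recent_kms_decrypts() {
  local max_items="${1:-20}"
  # LookupEvents accepts a single lookup attribute, so this filters on the
  # event name only: Decrypt calls made by other services will appear too.
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=Decrypt \
    --max-items "$max_items" \
    --query 'Events[].[EventTime,Username]' \
    --output text
}
```

Usage: `recent_kms_decrypts 10`. Each line is a time and a caller, which is exactly the granularity problem described above: one key use, an unknown number of secrets exposed.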

Simply put, SOPS creates a big blind spot in your security observability. That might be fine if you don’t have any security observability, but if you’re reading this article, you probably do.

Unnecessarily complex rotations

Going back to the example from the previous section, let’s imagine that it was indeed a malicious actor who decrypted the file. Now you need to rotate all the secrets. Since SOPS is a single executable binary that runs on your computer, it can’t automate secret rotation for you, but you already knew and accepted that.

SOPS’ way of working also introduces a set of issues that aren’t inherent to the rotation process itself (unlike, say, possible downtime). First off, you can’t just grep for a leaked secret across all your repos; you’ll need to sops decrypt every file, and then run grep. If you have all the right permissions (and that’s a big “if”), you can put together a throwaway script to do that. Once you’ve identified every place where that secret is used, you’ll have to open as many pull requests as there are occurrences, wait for approval, merge, and then wait for your CD pipeline to run successfully. This needlessly compounds the stress. Good luck mate!
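The throwaway script mentioned above could look something like this minimal sketch. It assumes bash, that every encrypted file in the checkout matches `*.sops.yaml`, and that your current credentials can decrypt all of them; `find_leaked_secret` is a hypothetical name. Cleartext is only held in a shell variable and never written to disk, and files you cannot decrypt are reported rather than silently skipped.

```shell
#!/usr/bin/env bash
# Sketch: find which SOPS-encrypted files under a directory contain a leaked value.
# Assumes encrypted files are named *.sops.yaml; adjust the pattern to your layout.
set -euo pipefail

find_leaked_secret() {
  local leaked="$1"
  local root="${2:-.}"
  local file plaintext

  while IFS= read -r -d '' file; do
    # Decrypt to stdout only; a failure usually means missing KMS permissions.
    if ! plaintext="$(sops decrypt "$file" 2>/dev/null)"; then
      echo "cannot decrypt: $file" >&2
      continue
    fi
    # -F matches the value literally, -q suppresses the matching line itself.
    if grep -qF -- "$leaked" <<<"$plaintext"; then
      echo "$file"
    fi
  done < <(find "$root" -type f -name '*.sops.yaml' -print0)
}
```

Usage: `find_leaked_secret 'the-leaked-value' ~/src/my-repo`, once per repository checkout. Passing the value as an argument leaves it in your shell history, which is one more thing to clean up.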

Unscalable RBAC

As previously mentioned, each file, regardless of how many secrets it contains, is encrypted with one key. To be specific:

Each file uses a single data key to encrypt all values of a document, but each value receives a unique initialization vector and has unique authentication data.

— getsops/sops README

And since RBAC is enforced at the KMS level, if you want granular access control over the secrets, you’ll need to create multiple files, each with its own key and its own set of permissions. It’s definitely within the realm of possibility, but is the added complexity economically viable? No, not at all.
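For reference, the multi-file, multi-key layout might be produced with something like the following sketch. It assumes bash 4 or later (for associative arrays) and AWS KMS; the file names and key ARNs are hypothetical placeholders, and each key would still need its own key policy or IAM policy deciding who may decrypt it.

```shell
#!/usr/bin/env bash
# Sketch: one plaintext file per access boundary, each encrypted with its own KMS key.
# File names and key ARNs are hypothetical placeholders.
set -euo pipefail

declare -A KMS_KEY_FOR_FILE=(
  [database.yaml]="arn:aws:kms:eu-west-1:111122223333:key/database-key-id"
  [payments.yaml]="arn:aws:kms:eu-west-1:111122223333:key/payments-key-id"
)

encrypt_per_boundary() {
  local plain
  for plain in "${!KMS_KEY_FOR_FILE[@]}"; do
    # sops writes the encrypted document to stdout unless --in-place is given.
    sops encrypt --kms "${KMS_KEY_FOR_FILE[$plain]}" "$plain" > "${plain%.yaml}.sops.yaml"
  done
}
```

Every new access boundary means another key, another policy and another file to keep in sync across every environment, which is where the economics stop working.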

The CI/CD issue

The operational cost of maintaining multiple SOPS files per repo is so significant that you’ll likely end up with one SOPS file per repo, or perhaps one file per environment, containing all of that repo’s secrets for that environment. And you’ll give that repo’s CI/CD pipeline access to decrypt the SOPS file, so it can deploy the application.

And when you do that, anyone who can open a pull request on that repo will be able to run this little snippet:

```yaml
on: pull_request

jobs:
  exfiltrate-sops-secrets:
    name: Exfiltrate secrets
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Do bad things
        run: |
          # Load the CI/CD's KMS credentials
          # ...

          # Output secrets to stdout
          sops decrypt secrets.sops.yaml
```

You could mitigate this issue in many ways, but each of them increases the complexity of your CI/CD pipeline, which in turn decreases its reliability and, as a result, slows down your time to market.

Other minor inconveniences

Let’s be honest: managing secrets with SOPS is a bit like trying to herd cats. There’s no shiny “single pane of glass” where you can just see & organize all your secrets in one place. Instead, you’re hopping between files, repositories, and editors like you’re competing in some kind of GitOps triathlon.

And then there’s the fun little quirk where people without access to the secrets themselves can’t even rename them. Yep, if someone wants to do a purely cosmetic update — say, standardizing keys from DB_PASSWORD to DATABASE_PASSWORD — they’ll still need full-blown access. It’s like needing the keys to the vault just to change the label on the box.

What to do about it?

For all the reasons above, if you’re starting to scale beyond 30 engineers and processing data people care about, you’ll need to consider more advanced solutions, or brace for the inevitable secret leaks.

There are still use cases where SOPS excels, though. Some projects may never require advanced security features, either because they don’t process any valuable data or because of their small scale. And that’s perfectly fine! “Not every problem needs Kubernetes”, and the same goes for the big secret managers out there.