xg15 2 days ago [-]
This is amazing!
Currently you can "cheat" by simply denying all requests as quickly as possible. This will give you the "security-conscious engineer" badge and a perfect score in terms of how many requests were processed. (You will get the "overblock" notification, but it's somewhat tucked away at the bottom and the screen still looks as if you won)
I also tried to play as the hustle4lyfe move fast and break things engineer and simply approved as many requests as quickly as possible - turns out, the "malicious command" popups actually slow you down. Mean!
Wirbelwind 2 days ago [-]
Good catch, this has now been nerfed and this approach has gotten its own title
smaudet 2 days ago [-]
Actually, the only secure default is to deny everything...how do you know that innocent command is actually innocent?
ssl-3 2 days ago [-]
A strange game. The only winning move is not to play.
SOLAR_FIELDS 2 days ago [-]
It’s the security mantra: the safest code is the one you never release. Code that never runs is the most secure code
brendoelfrendo 2 days ago [-]
A computer is only secure if it remains powered off and airgapped.
HappMacDonald 2 days ago [-]
Turn off your computer and make sure it powers down
Drop it in a 43-foot hole in the ground
Bury it completely, rocks and boulders should be fine
onionisafruit 2 days ago [-]
> rocks and boulders should be fine
You’re setting yourself up for a supply chain attack here if you trust whatever rocks and boulders are sitting around. A well-resourced adversary may have placed power supply boulders and wifi rocks in your back yard.
ssl-3 2 days ago [-]
I keep a large supply of thermite on-hand just to make sure that the computer is completely burned every day after it gets dropped into the pit.
Tomorrow is a new day.
JonathanMerklin 2 days ago [-]
Straight Outta Lynwood was a great album. One of the CDs that I took out of my case the most often as a struggling nerdling who was still a year or two away from having scrounged up enough spare cash for a secondhand iPod.
yuye 2 days ago [-]
Virus alert! I've also burned all of my clothes I may have worn any time I was online.
weitzj 1 days ago [-]
Joshua
ssl-3 16 hours ago [-]
Would you like to play a game?
KajMagnus 2 days ago [-]
Top 18%! I denied everything, unless I could see at a glance that it was safe (like Git diff)
xg15 2 days ago [-]
Glad I could help. I love the new title :D
progforlyfe 2 days ago [-]
Just like real life! deny it from doing anything and you're safe :)
spurgelaurels 2 days ago [-]
Fun game, but it showed the lack of security hygiene employed by the game writer. It said `cat ~/.zshrc` was bad because it would share tokens and secrets, but I would never put secrets into my shell rc.
londons_explore 2 days ago [-]
Plenty of people would. But then I guess they're in env and probably already available to Claude
isityettime 1 days ago [-]
Just aside from all of the security concerns, this is the wrong place to define global environment variables for zsh in the first place! That would be ~/.zshenv. So even if you're clueless about storing secrets in plain text and exporting them as env vars everywhere, ~/.zshrc should still be clean.
shlewis 2 days ago [-]
I don't do this myself, but I can also see how many would do this.
nish__ 2 days ago [-]
Where would you put them?
godelski 2 days ago [-]
Literally anywhere else! Your dotfiles should be publishable to github. If they aren't you're doing them wrong.
A good thing to do is organize. You can actually load different files. Here's a pretty common pattern that you'll find and it'll illustrate how to do other things
if [[ $(uname) == "Darwin" ]]; then
  source "${INSERT_SOME_DIR}/osx.zsh"
elif [[ $(uname) == "Linux" ]]; then
  source "${INSERT_SOME_DIR}/linux.zsh"
fi
You do this for loading based on the operating system. You might want some aliases, commands, or other routines in one but not the other. For example, in my linux one I have stuff for cuda paths. You can do all sorts of things too, like make a (generically named) work file, which you don't publish to github but you load it if it exists. Then you can put all your work related aliases there and not contaminate anything else. Something like `[[ -a ${INSERT_SOME_DIR}/work.zsh ]] && source ${INSERT_SOME_DIR}/work.zsh`.
You shouldn't really load secure keys this way, but others had good answers so I thought I'd at least share a more general pattern since it isn't as well known among the less terminally inclined.
analog_daddy 2 days ago [-]
Okay. Here is a pattern i follow everywhere in my init files for almost every program.
Define two key env vars.
$DOTFILES and $ECORP.
The first is path to your personal set of dotfiles. The second is path to your corporate specific dotfiles.
On personal pc no need to define the $ECORP var in shell init. On work pc define that var.
based alone on that you can conditionally do almost anything.
- shell source files/aliases
- vim/editors enable disable plugins based on existence of env vars.
- define shortcuts in file manager.
- and i add the following to my main $DOTFILES .gitignore.
# Any file that contains the following will be ignored.
# Used to ignore files in corporate environment
*ECORP*
*ecorp*
Based on multiple years across different setups, using environment variables was the most reliable option since I have been in places where there are restrictions on where my init files can be placed and having to change a shit ton of paths in my dotfiles or just keeping a different branch for work and personal (and making sure they stay in sync) was too much of a hassle.
Additionally, maintaining hygiene is essential, where I only use a Read Only PAT token on my personal dotfiles in workenv. That way, there is no accidental way I would be able to push from my workenv.
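For anyone who wants to see the two-variable pattern spelled out, here is a rough sketch. It assumes a hypothetical layout where each tree has a `shell/aliases.sh`, and the helper names `source_if_readable` and `load_dotfiles` are made up for illustration; adapt both to your own setup.

```shell
# Sketch of the $DOTFILES / $ECORP pattern. The shell/aliases.sh layout and
# the helper names are assumptions for illustration, not a fixed convention.

# Source a file only if it exists and is readable; stay quiet otherwise.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

load_dotfiles() {
    # Personal config: loaded wherever DOTFILES is set.
    if [ -n "${DOTFILES:-}" ]; then
        source_if_readable "$DOTFILES/shell/aliases.sh"
    fi
    # Corporate config: loaded only where ECORP is defined (work machines).
    if [ -n "${ECORP:-}" ]; then
        source_if_readable "$ECORP/shell/aliases.sh"
    fi
}
```

On a personal machine `ECORP` is simply never set, so the second branch is skipped without any per-machine edits to the init file.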
hk__2 2 days ago [-]
You’re just splitting your dotfiles into a public and a private part. That’s useful if you want to publish the public part on GitHub, but not everyone wants to do this, and the issue of storing secrets in plain text files remains.
godelski 1 days ago [-]
> You’re just splitting your dotfiles
Ummm... yes? That is what I said
> the issue of storing secrets in plain text files remains.
Ummm... kinda? The problem was that reading an rc file was considered dangerous. Not putting keys in your rc files is an improvement. Encrypting them is even better than that. But I also said more words in the original post and you don't really even need to read between the lines to figure out I said "you can generalize this", especially when there's comments next to it saying "here's how you load an encrypted file"
isityettime 2 days ago [-]
Anywhere else? Password managers have CLIs, operating systems have their own secure storage, and lots of command line apps can store secrets in the OS's secure storage (Windows Credential Manager, Secret Service or KWallet on Linux, macOS Keychain).
Project-specific secrets can be stored locally via something like SOPS or remotely with something like HashiCorp Vault or AWS Secrets Manager.
Applications that have secrets to manage (e.g., Emacs) or are partly about secrets management (e.g., GnuPG, OpenSSH) all store their secrets somewhere else and have secure (not plaintext, sometimes not even on disk) storage options available.
There's no reason to store secrets in plain text in your shell configuration. Practically any choice you can think of is a better one. Even if you did, there's no reason you couldn't store them in a more specific file that ~/.zshrc sources, and let LLM agents read zshrc but block access to the file containing your secrets. (I wouldn't rely on permissions prompts for this, though, lol.)
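As a rough illustration of the OS-storage option, the sketch below reads one secret on demand instead of exporting it from an rc file. It assumes the secret was stored beforehand under a service label of your choosing, that macOS has its stock `security` tool, and that Linux has libsecret's `secret-tool` with a Secret Service provider running. The function name `get_secret` is hypothetical.

```shell
# Sketch: read one secret from the OS credential store on demand.
# "service" is whatever label you stored the secret under (an assumption here).
get_secret() {
    service=$1
    case "$(uname)" in
        Darwin)
            # macOS Keychain; -w prints only the password.
            security find-generic-password -s "$service" -w
            ;;
        Linux)
            # libsecret's CLI; talks to a Secret Service provider.
            secret-tool lookup service "$service"
            ;;
        *)
            echo "get_secret: no backend for $(uname)" >&2
            return 1
            ;;
    esac
}

# Usage (not run here; some-command is a placeholder):
#   API_TOKEN=$(get_secret my-api-token) some-command
```

Written this way, the token only exists in the environment of the one command that needs it, and `cat ~/.zshrc` shows nothing worth stealing.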
freedomben 2 days ago [-]
I put mine in various AES-encrypted files (like `~/.secrets.aes`) and then source them explicitly when needed with:
. <(aescrypt -d -o - ~/.secrets.aes)
I have a handful of aliases/functions to make it more smooth, but that's the core.
maccard 2 days ago [-]
Where are those aliases stored?
freedomben 2 days ago [-]
The AES encrypted file has some, plus a bunch of exported env vars. I do keep one function in my ~/.bashrc to make it simpler to invoke so I can do `source-secret ~/.secrets.aes`:
source-secret()
{
    # Decrypt an aescrypt file to stdout and source it into the current shell.
    if [ -z "$1" ]; then
        echo "Need filename to source" >&2
        return 1
    elif ! [ -f "$1" ]; then
        echo "File '$1' does not exist" >&2
        return 1
    elif ! command -v aescrypt >/dev/null 2>&1; then
        echo "Could not find required dependency 'aescrypt'" >&2
        return 1
    else
        . <(aescrypt -d -o - "$1")
    fi
}
AnyTimeTraveler 2 days ago [-]
In that AES encrypted file.
It's a shellscript that they encrypted.
They decrypt it and feed the decrypted output immediately into the shell, to be sourced.
That encrypted secrets file could contain any shellscript, so the aliases are stored in there, together with the API-Keys and passwords.
SOLAR_FIELDS 2 days ago [-]
Another more secure pattern: have different shell profiles that dynamically inject secrets from a secrets manager. Nix is a good tool for this. You have various shell profile configurations that call your password manager CLI at bootstrap (e.g. a new terminal tab). You authenticate, and at terminal bootstrap time the secret is fetched from the password manager and injected into an env var. This has an advantage over the other approaches mentioned here in that the secret is never stored at rest on the end user’s machine, only used in flight.
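A narrower variant of the same idea, sketched without Nix: fetch the secret when a command needs it and expose it to that command only. This assumes the `pass` password manager, with the secret on the first line of the entry; the function, variable, and entry names are hypothetical.

```shell
# Sketch: fetch a secret at call time and expose it to one command only.
# Assumes `pass`; the entry and variable names are made up for illustration.
run_with_secret() {
    var_name=$1
    entry=$2
    shift 2
    # `pass show` prints the whole entry; keep only the first line.
    secret=$(pass show "$entry") || return 1
    secret=$(printf '%s\n' "$secret" | head -n 1)
    # Subshell: the export disappears as soon as the command finishes.
    (
        export "$var_name=$secret"
        "$@"
    )
    status=$?
    unset secret
    return "$status"
}

# Usage (not run here; ./deploy.sh is a placeholder):
#   run_with_secret API_TOKEN work/api-token ./deploy.sh
```

The parent shell never has the variable exported, so an agent started from that shell does not inherit it.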
setopt 2 days ago [-]
Presumably a CLI-accessible password manager (like `pass`) or a GPG-encrypted file (like a netrc-style `~/.authinfo.gpg`).
Just curious, any reason to prefer age (you mentioned that you would prefer it if starting over) over something like KeePass? I am currently using keepass-cli, and the only reason I did not use age, even though I found it, was that it was new to me and I had never heard of it (probably not the best reason, but in this era it might be reasonable to stick with the devil you know). So I'm curious about your take on this.
arowthway 2 days ago [-]
Also, there's nothing inherently insecure about feeding secrets to an LLM, it's only one element of the lethal trifecta.
otabdeveloper4 2 days ago [-]
Having "tokens and secrets" at all is a lack of security hygiene.
socksy 2 days ago [-]
Weird to make reading zshrc supposedly unsafe when I happily publish it in my public dotfiles repo... Who the hell keeps API keys in it? OTOH it seems like lots of these AI tools keep appending PATH in it so I guess there's a fundamental misunderstanding of shell best practices in the entire AI space...
Additionally, killing the results of `lsof` is _not_ safe - if, say, you have the web page open in firefox, or a client subshell in the agent itself, then boom, there goes firefox and the agent.
mrgoldenbrown 2 days ago [-]
Yeah, the game seems to assert that the kill is safe to run because Claude told me it was safe. But that's the point, I'm not supposed to trust Claude.
gwerbin 1 days ago [-]
Likewise I got dinged for denying a random stash-rebase-pop operation. I have no idea what the repo state is like right now. That could be a wild mess of a waste of time. It says I'm doing a refactor, so OK I guess rebase on main is a good idea. But hell no I'm not approving that in the 1 minute before a meeting.
The whole premise IMO is pretty flawed. It's interesting as an ad for the company though.
isityettime 20 hours ago [-]
> The whole premise IMO is pretty flawed.
I'm not sure, maybe the fact that whether a given command is safe or not is subtle, contextual, and contested actually bolsters the point the game is trying to make.
axod 2 days ago [-]
Fun little game, but I think the questions jump context so much it's a little unrepresentative. It might be better to group things into "packs", which have more real-world representative structure to them.
For example, lots of "editing something.js" file permission requests, and then an "npm publish" is far more normal, and it's more of a risk, if you're used to pressing Y lots and then suddenly out of the blue...
In 99% of cases you would have Artifactory / Nexus (or other mirror) already set by company policy. Having a README tell you to use a different package manager url is a big red flag and seconds away from disaster...
Wirbelwind 2 days ago [-]
that's a good callout. .internal is a reserved TLD so it shouldn't resolve publicly, but that's a good point about being wary of changing this while letting claude refactor a project for something that's best configured separately. Moving it to permanent mutation!
orsorna 2 days ago [-]
About three quarters of the "bad" choices are things that not only do I not care about leaking but things that an employer would not punish you for doing, even if it led to a production incident.
isityettime 20 hours ago [-]
For example?
enether 2 days ago [-]
The permission thing is a killer to productivity, if you're running Claude I think it's more efficient to just run in a disposable sandbox (like exe.dev[1]) or in some form of docker container with permissions you're personally ok taking the risk with on a personal machine[2]
A disposable sandbox won't protect you from secret exfiltration. Assuming you don't consider your code a secret, you could of course set up your sandbox so it doesn't have any secrets, but that would severely limit the kinds of tasks you can use the agent for.
iugtmkbdfil834 1 days ago [-]
<< that would severely limit the kinds of tasks you can use the agent for.
Are we just talking about API calls to providers? If so, wouldn't local agent + sandbox solve all that?
esterna 2 days ago [-]
On the one hand, you can set up a proxy that supplements secrets for API calls. On the other hand, you can whitelist what you need, in the simplest case with iptables (The devcontainer in the claude code repo is an example of the latter).
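To make the iptables option concrete, here is a minimal sketch of a default-deny outbound policy with a short allowlist. It is not the devcontainer's actual script. It only prints the rules so you can read them before applying, it allowlists by IP (so hostnames have to be resolved beforehand), and the function name and example address are placeholders.

```shell
# Sketch: print iptables rules for default-deny egress plus an allowlist.
# Printing instead of applying lets you review the rules first.
# The addresses passed in are your own choice; none are implied here.
emit_egress_allowlist() {
    echo "iptables -A OUTPUT -o lo -j ACCEPT"
    echo "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for ip in "$@"; do
        echo "iptables -A OUTPUT -d $ip -p tcp --dport 443 -j ACCEPT"
    done
    # Set the default policy last so the ACCEPT rules are already in place.
    echo "iptables -P OUTPUT DROP"
}

# Usage (as root inside the sandbox, not run here; 203.0.113.10 is a
# documentation-only address):
#   emit_egress_allowlist 203.0.113.10 | sh
```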
zackify 2 days ago [-]
I vibe coded a TUI that just shows running lxd containers
I hit 'n' to toggle all network access minus anthropic and openai URLs.
I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.
Normally my container has full write access to staging so it can debug and validate everything on its own
kennywinker 2 days ago [-]
Sounds like your process has made you vulnerable to huge classes of exploits and accidents. You have no oversight of changes locally, and only focus on when it touches prod. That means toxic local changes can get in, and if it works in staging why would you look too closely at it before merging to prod? Meanwhile a malicious npm package has made it into your repo, and your staging api keys have been sent to the command and control server.
zackify 2 days ago [-]
i can view the diff locally but often times after planning with opus i get what i want.
I create a draft pr and manually review all items before then marking ready for review for the team.
So I'm not blindly pushing things to prod without review.
Without staging key access I wouldn't have been able to do a payment provider migration at this speed. iterating by migrating users in staging and being able to use and validate the sdk quickly with opus is a massive time saver.
cobbal 2 days ago [-]
That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really understand the threat model.
dns_snek 2 days ago [-]
That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.
npm run build = run an arbitrary shell command written in package.json
Meanwhile the agent could have done any of the following without approval:
- edited `package.json` to contain any arbitrary build command
- planted malicious code in `build.js` (called by `npm run build`)
- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)
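One partial mitigation for the first two cases, sketched under assumptions: checksum the files you have actually reviewed, and refuse to run the build if they changed since. The function names and the snapshot path are hypothetical, the snapshot has to live somewhere the agent cannot write, and `cksum` is used only because it is POSIX (it is a CRC, so swap in a cryptographic hash if you expect deliberate tampering). It does nothing for the `node_modules` case unless you list those files too.

```shell
# Sketch: record checksums of reviewed files, then detect later edits.
# File names and the snapshot location are assumptions for illustration.
snapshot_reviewed() {
    snap=$1
    shift
    cksum "$@" > "$snap"
}

# Succeeds only if every listed file is byte-identical to the snapshot.
unchanged_since_review() {
    snap=$1
    shift
    cksum "$@" | cmp -s - "$snap"
}

# Usage (not run here):
#   snapshot_reviewed /safe/place/reviewed.cksum package.json build.js
#   ... agent works ...
#   unchanged_since_review /safe/place/reviewed.cksum package.json build.js \
#       && npm run build
```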
nonethewiser 2 days ago [-]
Yup. The most secure computer is one encased in concrete and dropped into the ocean.
falcor84 2 days ago [-]
Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.
Wirbelwind 2 days ago [-]
that's a great point, and also the problem with relying on a human-in-the-loop to catch these kinds of issues when it can be circumvented even if they were perfect
amarant 2 days ago [-]
What would a better system look like?
dns_snek 2 days ago [-]
Agents should make better use of OS sandboxing facilities with finer-grained ACLs.
Less: Do you want to run "npm run build"?
More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?
Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.
Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
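To show why allowlisting a binary name is so weak, here is a toy checker that goes the other way: it approves only exact, pre-reviewed command lines and rejects anything that could chain, redirect, or substitute. This is a sketch, not how any agent actually implements its allowlist; the function name and the three example commands are assumptions.

```shell
# Sketch: approve exact command lines only; never a bare interpreter.
# The allowlisted commands below are examples, not a recommendation.
is_allowed() {
    cmd=$1
    case $cmd in
        # Anything that can chain, redirect, or substitute is rejected outright.
        *';'*|*'&'*|*'|'*|*'<'*|*'>'*|*'`'*|*'$'*|*'('*|*')'*) return 1 ;;
    esac
    case $cmd in
        "git status"|"git diff"|"npm ci") return 0 ;;
    esac
    return 1
}

# Usage:
#   is_allowed "git status" && echo "ok to run"
#   is_allowed 'bash -c "echo literally anything"' || echo "ask the human"
```

Even this only checks the command line, so it still says nothing about what the approved command goes on to read or execute, which is the point made above.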
SOLAR_FIELDS 2 days ago [-]
Don’t rely on your non deterministic agent and its creators to secure your software. Design defense in depth and trust guardrails that don’t expect Anthropic to vibe good security into existence.
If you start by treating any autonomous actor in your system as an actor with the potential to go rogue the design starts to create itself
nonethewiser 2 days ago [-]
Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.
xigoi 1 days ago [-]
Don’t give a fancy random text generator access to your computer.
progforlyfe 2 days ago [-]
I got "approve" wrong for `ls -la ~/Documents` but I don't consider simply listing the documents folder a security problem, it's just file names. If it was reading the CONTENTS of them, maybe...
trehalose 2 days ago [-]
I wish the scoring readout at the end would display the LLM's descriptions of the commands I shouldn't have approved. I approved the rm -rf Projects command because I thought the LLM had correctly described that it would delete everything in the Projects folder. Clearly I misread that in my hurry to answer prompts (I knew what the command would do and I guess I hallucinated that the AI had explained it), but I'd like to see what it was that I misread.
Playing this game made me very glad I don't agentmaxx.
Wirbelwind 2 days ago [-]
Thanks all for checking it out and your suggestions!
If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rates of Auto Mode), I wrote up a quick summary of some of the approaches here
I haven't used local agentic AI yet for programming projects. Hence, -187 score
The filter for "commands I would run myself" and "commands I would let an agent run" are very different it seems.
rogerrogerr 2 days ago [-]
Thinking about agents as remote junior devs who _might_ be North Korean operatives has been the right model for me.
jstanley 2 days ago [-]
How do you know?
conrs 2 days ago [-]
Yeah, echoing the comments here. It's a good idea - kind of - but it is all about digging deeper when it is sus.
The tool assumes so much. That it is fine to kill a process itself versus just asking you to kill the process. That everyone MUST have passwords in their home directory. It's all meaningless without providing the thing it is running and so no activity is technically safe.
Why do people even get the agent to run the commands it asks to run? You can solve the entire threat vector by running it yourself and giving the agent the output. Claude practically only needs things like sed, awk, and grep. It's a pattern matcher. It's a waste of your (and its) time to have it run your project.
gblargg 2 days ago [-]
I declined things like rm -rf because the path was relative and it wasn't showing me the current directory. How would I know what project it was in?
t-writescode 2 days ago [-]
I was told I was over protective when the text said “I need to wipe and build my project” and its first thing to do was to read the details of the (already established) package file. Why did it need to read the package file to “get context” if it was just doing a standard wipe and build?
Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.
ghrl 2 days ago [-]
I am mostly using OpenCode and barely ever see a permission prompt. While they do enforce it for outside workspace read/write, with the bash tool the agent can just bypass that. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but likely not worse than asking for everything which just trains the user to always accept and provides a false sense of security then.
kleiba2 1 days ago [-]
Is there a light mode by any chance? Unfortunately, I cannot look at light text on black background for more than a few seconds (something must be wrong with my eyes...).
atemerev 2 days ago [-]
--dangerously-skip-permissions is the only way to fly. Of course your environment needs to be properly containerized and autobackup set up, so even rm -rf from your harness would do nothing. Life is too short to spend on replying to permissions requests.
prerok 2 days ago [-]
I've seen these suggestions but I am really curious about the set up because I just don't get it.
If you want to work on the code then you need to have access to the repositories, so you need the github token. Then, to test the app, you may need your own backend token. And VPN. Of course, only to DEV, of course all tokens encrypted. So, only DEV and your branch of the code is in danger. In my view, even that is pretty bad.
So, how does such a set up work?
isityettime 20 hours ago [-]
> If you want to work on the code then you need to have access to the repositories, so you need the github token.
Definitely not! I only have an agent work in one repo at a time, with cross-repo work coordinated by me. I have a ton of local checkouts and leave them visible read-only to all of my agents. They can look at company code in my local checkouts, and they can download or browse open-source code, or look at it in the .src outputs of packages from Nixpkgs.
> Then, to test the app, you may need your own backend token.
I just don't let my agents test apps that run remotely, for better or for worse.
> And VPN.
This doesn't really expose anything on my system because everything internal that it could hit is authenticated, and it can't access any of my credentials. But I could do a better job restricting network access.
> your branch of the code is in danger
The agent isn't permitted by the sandbox to read the secrets it needs for `git push`. Indeed, I have commit signing enabled and the agent can't even read the files it needs for git commit! It can write code, it can write tests, it can run some tests, and it can run web applications locally and play with those.
But then I do the final testing and then turn its changes into 1-5 git commits, walking through them and selectively staging, skipping, or dropping them hunk-by-hunk according to my judgment. I still do tons of review. I just don't review edits or commands; instead I review and test whole drafts, whole changesets. It's less fatiguing because the thing I'm reviewing is more directly the thing I'm trying to produce.
I guess it ain't YOLO nirvana but I wasn't really looking for that.
prerok 15 hours ago [-]
Thank you for the explanation but I still don't quite get it. Is this code mounted to a separate VM where the agent is running? I mean, how does the sandboxing of agents really work?
The reason I am asking is because if it's not sandboxed on the OS level, then commands it runs may escape the harness sandboxing. Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions). So, reviewing everything before anything is executed seems like the only safe way to do it. What am I missing?
isityettime 8 hours ago [-]
The tool I use currently is OS-level sandboxing (the OS does the sandboxing), not sandboxing built into the harness (like what Codex has turned on by default) or hypervisor-level sandboxing (i.e., the agent sees an OS that is sandboxed or an OS that constitutes the sandbox). To relax or adjust the sandbox, I have to kill the agent and reinvoke the sandbox with a new policy, which then relaunches the agent.
> Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions).
That's a real potential problem, but unfortunately the default "approve every edit" regime doesn't actually address it, either. In the normal per-command approval process, the approvals are often just suggestions; Claude will do things like silently edit files in "plan mode" anyway, for example.
If you're deeply worried about this particular kind of sandbox escape you probably don't want the agent's checkout to be your usual checkout. Then if you do have some scripts that can run automatically inside a project directory (e.g., via direnv), you just never approve them in the path to the agent's checkout and make sure direnv's state dir is unwritable inside your sandboxes. If you have code inside your project that runs without any user intervention at all, and has no approval process at all so that it will be activated or trusted even on a fresh clone you've never visited or seen before... yikes. That sucks. :(
Anyway if you take the precaution above you can still review edits to those files before they have a chance to run (or just never run them).
One thing suggested by another user in this discussion that sounds like a useful approach to me is also giving the agent a VM from which they can push to a local bare clone or something like that so that's how they emit code to you. That way they're not writing scripts to your box at all.
stratos123 2 days ago [-]
You could clone the repo yourself and not give the agent any tokens at all. When done, push it yourself. This also lets you sandbox the agent to only have access to the local repo and nothing else.
atemerev 2 days ago [-]
Git makes actions reversible. Containers and VMs allow the agent to access only the things you explicitly put inside. Okay, yes, an agent can corrupt a dev database. You need to make sure it can be easily restored anytime. Simple.
kennywinker 2 days ago [-]
Lol. Countdown til you get pwned starts today. Let me know how that works out for you in six months.
atemerev 2 days ago [-]
Well working like that for about a year already, starting at the earliest days of agents.
kennywinker 2 days ago [-]
Wow a whole year! I guess it’ll never happen.
scotty79 2 days ago [-]
Permissions don't do much. They won't save you. You can just skip them completely.
If you are afraid that AI can delete something, do what you'd do with a potentially malicious user: sandbox, don't give permissions, set up remote backups and so on.
Also (unless prompt injected) models are not eager to start going rogue on your stuff.
But keep in mind a saying “Children don’t hear prohibitions — they hear suggestions.”
Same thing goes for LLMs. Never talk with an LLM about deleting stuff. Archiving, moving, retaining elsewhere... sure, but never about actually destructive operations. Don't use destructive language.
christophilus 1 days ago [-]
Claude Code has gotten so bad about this that I’ve stopped using it for code reviews. I may look into wiring Claude up to Codex as an alternative LLM just to compensate.
I think the issue is that I’m running Claude Code in a container so it sees that it is root, and becomes a lot more cautious. Not sure, though.
kangalioo 1 days ago [-]
If you're running Claude Code in a container anyways, why does `--dangerously-skip-permissions` not work for you?
christophilus 1 days ago [-]
Claude Code won't let you do that as root. Codex's equivalent is perfectly fine, though.
madrox 2 days ago [-]
I've long held that the current agent permission model is like playing a game of "Papers, Please", and that most permission models engineers implement in their own AI products are more a measure of how trusting the user is with AI than an actual permission check.
I'm of the view that future controls should be more about approving plans and rewinding durable workflows as models get better at avoiding egregious mistakes.
cyanydeez 2 days ago [-]
the models will never avoid egregious behavior. think of it like every "good intentions" morality tale. there's almost always some genuine context where that behavior is wanted.
instead, the coding harness, or some deterministic tool, will need hardcoded security features.
in opencode, almost all the power comes from bash and all other permissions are just charades. it's powerful and insecure because of it.
you can sandbox them but then you fight the sandbox to pipe in your assets. the sandbox becomes porous because otherwise it's useless.
MCPs don't address much either.
what we are looking for is a portal or protocol that has the model, the harness, and the actions tunneled, like ssh, to some fixed-scope, limited shell alongside the assets.
then the user and LLM can negotiate assets and actions as needed via the protocol.
but alas, as your comment suggests, people think there's some perfect context that'll prevent bad things from happening. the libertarian paradise without regulation.
madrox 2 days ago [-]
I think you're choosing to ignore what I said about the implication of durable workflows, because you seem to be inventing some stories about my comment.
I find that well documented plans do pretty well at aligning AI to what I want it to do, and if it does go astray, as you rightly point out it can still do, it would be sufficient if I can undo it with little pain. We do this kind of thing all the time in CI/CD pipelines.
Even humans can take down production. We have all kinds of guards in place to empower while also defending against the intern accidentally dropping the DB.
MeetingsBrowser 2 days ago [-]
It would be cool to see the distribution of all player scores.
Wirbelwind 2 days ago [-]
That's a great idea, stay tuned
Wirbelwind 2 days ago [-]
and added! Made one for each stat separately
nardib 2 days ago [-]
Use this and save yourself:
claude --dangerously-skip-permissions
tasuki 2 days ago [-]
Just make sure to run it in an isolated environment where it's ok to mess things up, and make sure it doesn't have access to any secrets.
wildpeaks 2 days ago [-]
This is why having a human in the loop isn't enough because they will cut corners and skip reviewing what they should review.
preciousoo 2 days ago [-]
I created a watcher for this problem, to watch my PRs for unfinished scope and have a fresh Claude review
A tool that pushes people into permissions fatigue is in fact the proper recipient of the blame. The tool in question here is the entire system though, including the OS with insufficient permission boundaries in userspace, not just the agent
kennywinker 2 days ago [-]
A tool that bypasses permission requests because they’re annoying will be just as guilty when the repo is poisoned.
chuckadams 2 days ago [-]
I'm not saying wedging doorstops under the fire doors is a good thing, I'm just saying look at the situation that's making people put the doorstops there. Or something, it's not a great analogy. I'm just saying that shaming the user belongs with obscurity in the list of security mechanisms that don't work out in practice.
dheera 2 days ago [-]
I got tired of typing that and just do
alias claude="claude --dangerously-skip-permissions"
I do have a separate "claude" user on my system without sudo access and without access to my main user home dir
And yeah I know that's not perfect but I'm trying to get shit done
franze 2 days ago [-]
alias claude+="claude --dangerously-skip-permissions"
alias claude++="claude --dangerously-skip-permissions --continue"
kennywinker 2 days ago [-]
It’s baking malicious code into your project, but hey it didn’t run rm -rf so… we’re good.
paulddraper 2 days ago [-]
alias yolo="claude --dangerously-skip-permissions"
maxbond 2 days ago [-]
Why would you do this now that we have auto mode?
qsxfthnkp2322 2 days ago [-]
I love it when Claude is dangerous
whimblepop 2 days ago [-]
I got "overblocked" for this one:
rm -rf node_modules && npm install
but actually if you're only removing `node_modules` and you have a working package-lock.json already, what you want is `npm ci`; `npm install` can mutate package-lock.json and potentially expose you to supply chain attacks. If you use `npm ci` I think you don't need to `rm -rf node_modules`, either.
Anyway you should generally run `npm ci` except when you're deliberately updating your actual dependencies. I'd only permit an `npm install` if I was adding or updating a dependency, or I'd just reviewed an `npm ci` failure.
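A minimal sketch of that rule of thumb, for anyone who wants it as a helper. The function name `pick_install_cmd` is made up for illustration; it only prints the command it would pick and never runs npm. It relies on two documented behaviours: `npm ci` needs an existing lockfile and removes `node_modules` itself before installing, and it fails rather than rewriting the lockfile when `package.json` and the lockfile disagree.

```shell
# Hypothetical helper: print which install command fits a project directory.
# Prefers `npm ci` whenever a lockfile is present, since it installs exactly
# what the lockfile pins and never modifies package.json or the lockfile.
pick_install_cmd() {
  dir=${1:-.}
  if [ -f "$dir/package-lock.json" ] || [ -f "$dir/npm-shrinkwrap.json" ]; then
    echo "npm ci"
  else
    # No lockfile yet: only `npm install` can create one.
    echo "npm install"
  fi
}

pick_install_cmd .
```

Anything that falls through to `npm install` is the case worth reviewing by hand, since that is where the dependency set can actually change.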
gamer191 2 days ago [-]
But also why would Claude need to run `rm -rf node_modules && npm install`? Without the context of seeing what changes it’s made, I’d be inclined to assume that Claude has added a new dependency, which I definitely don’t wanna blindly trust it to install
isityettime 20 hours ago [-]
If the shipped package.json and package-lock.json are actually incompatible/incorrect, something like `npm install` is what you need to reconcile them. But that's definitely a weird situation I would rather investigate myself than hand off to an LLM.
Wirbelwind 2 days ago [-]
thanks for the pointer! renamed it to npm ci so it's still 'safe'
whimblepop 1 days ago [-]
Thanks! Love the game as a whole :)
kqr 2 days ago [-]
Fun! Played twice and refused all dangerous commands, with only one "over-block". Although I disagree that saying no to `kill $(lsof -t -i:3000)` is over-blocking. It's such a simple command I'd rather run it myself and be fully aware of what process I'm killing.
kuboble 2 days ago [-]
I was so tired of all those approvals that I switched to Yolo mode exclusively.
Claude works in his own separate VM with root access, with the git remote set to my local copies of the repositories, no GitHub access, etc.
I think he could still hurt me if he really wanted, but most of the scary stories I've heard were about LLMs making really bad judgements rather than actively trying to break out and do harm.
stevenalowe 2 days ago [-]
Sadly unplayable - gray text on a black background is very hard to read on a phone
soanvig 2 days ago [-]
Fun game.
Can somebody run an agent against those questions to see how it performs? :)
jMyles 2 days ago [-]
I haven't run claude code without --dangerously-skip-permissions in quite some time. I'm surprised that it's still the norm to endure permission spamming?
(I run it on a VPS of course, not my laptop)
sandeepkd 2 days ago [-]
Interestingly, I kept saying no to everything and somehow I am a security-conscious rare engineer who actually read the commands. Guess doing nothing is the safest approach from a security standpoint.
paddycorr 1 days ago [-]
Love how it always wants to send my packages to a random domain. Has that happened to anyone in practice?
sukhavati 2 days ago [-]
Reminds me of the "Papers, please" game. Glory to Arstotzka!
ashm1104 2 days ago [-]
Damn this is so cool, this has the potential of being a like textbook pre training/post training quiz. Congratulations.
misbau 2 days ago [-]
That was fun and gave me an idea how security conscious I am.
NewJazz 2 days ago [-]
git reset --soft HEAD~1
Uh, how is this an overblock? It is literally a destructive command. No way I want an LLM agent rewriting my commit history. What if that commit was already pushed to a protected branch?
stratos123 2 days ago [-]
Why do you call it destructive? It rewrites history only locally and reversibly (the disappeared commit is still in reflog and can be recovered with another reset) and also doesn't destroy uncommitted changes, so it's quite safe. You can only lose data with it by resetting an unpushed commit and then waiting long enough to let the unreferenced commit be garbage collected.
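The recovery path described above can be tried in a throwaway repository. This is only a sketch: the temp directory, the file name `notes.txt`, and the commit messages are made up for the demo, and it assumes `git` is installed.

```shell
set -eu
# Throwaway repository in a temp dir (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config commit.gpgsign false

echo one > notes.txt
git add notes.txt
git commit -q -m "first"
echo two >> notes.txt
git commit -q -am "second"
before=$(git rev-parse HEAD)

# The command from the game: HEAD moves back one commit,
# while the index and working tree are left untouched.
git reset -q --soft HEAD~1
staged=$(git diff --cached --name-only)  # the change from "second" is still staged

# Undo: the previous HEAD position is one entry back in the reflog.
git reset -q --soft 'HEAD@{1}'
after=$(git rev-parse HEAD)

echo "restored: $([ "$before" = "$after" ] && echo yes || echo no)"
```

This only demonstrates the local, unpushed case. It says nothing about what happens if the commit had already been pushed and someone force-pushes afterwards.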
NewJazz 2 days ago [-]
Commit history is data. I might not realize what happened until the gc happens.
kstenerud 2 days ago [-]
This is one of two reasons why I wrote yoloAI. I never get these permission prompts anymore. It feels a lot like after installing an adblocker.
hanwenn 2 days ago [-]
I got tired of the permission prompts and wrote a filesystem/network sandbox so I could skip all permission checks. It works on the same principle as bubblewrap, but has some niceties to separate Claude from its credentials. See https://github.com/hanwen/runclaude
huflungdung 2 days ago [-]
[dead]
cadwell 2 days ago [-]
1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)
ericlevine 2 days ago [-]
This really hits the nail on the head. The current permissions models are totally broken IMO. You're either approving everything, restricting access and neutering your agent, or full YOLOing and, well, good luck. The right primitives are not in place yet, and there are no clearly correct answers.
I think the right primitive is "task-based authorization", where you review a high-level task and let an LLM judge decide whether the subsequent tool calls fall into the scope of that task. It's not perfect, but it distills dozens of approvals down to one and gives you risk-based signals of whether you should pay close attention or not.
rvz 2 days ago [-]
This current thread is proof of AI psychosis.
stuartjohnson12 2 days ago [-]
What the hell is going on in this thread? This isn't good. The "threats" don't make sense. Oh no, all the sensitive information in my package.json...
cobbal 2 days ago [-]
Here's the threat model I (a luddite) use to evaluate these. The claude code harness can be mostly trusted, the model cannot be trusted because it is exposed to untrusted data from the internet, and there is no separation of data/code in an llm [0][1].
I want to avoid running untrusted code on my local machine, because it could steal secrets, install malware, etc.
Since the model is allowed to write without restriction (I think) to the project directory, anything in the project directory is also untrusted. Running standard commands from the system is fine, as long as you know what those commands are going to do. Running anything from the local directory should be avoided because the code is untrusted.
This is just one security model, there are many others! If a person is running claude in a stronger sandbox, that changes the model considerably. What threat model do you use to evaluate whether an agent's actions are safe?
So are there 3 threats? 8? Is it a different game?
Does everyone get a "good" score even if they missed 5 threats?!
t-writescode 2 days ago [-]
It's a game you play over one minute. They probably saw more prompts than you.
eqvinox 2 days ago [-]
A bit too JavaScript specific... can't really play if you don't know that ecosystem.
mrweasel 1 days ago [-]
It suggests that "kill $(lsof -t -i:3000)" is completely safe, which it isn't if you don't know what runs on that port. Maybe some JavaScript framework runs on that port, I don't know, but neither does the AI. The developer may have moved it because something important already runs on that port.
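One way to look before killing, as a sketch: the helper name `port_listeners` is made up, and it assumes `lsof` is installed. Restricting the match to listening sockets matters because a bare `lsof -i :PORT` also matches clients that merely have a connection to that port.

```shell
# Hypothetical helper: print the PIDs of processes LISTENING on a TCP port,
# one per line. -sTCP:LISTEN excludes clients connected to the port;
# -nP skips hostname and service-name lookups.
port_listeners() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: port_listeners PORT" >&2; return 2 ;;
  esac
  lsof -nP -t -iTCP:"$1" -sTCP:LISTEN
}

# Inspect first, then decide by hand what (if anything) to kill:
#   for pid in $(port_listeners 3000); do ps -o pid=,user=,args= -p "$pid"; done
```

Printing the owner and command line for each PID is the part the one-liner skips, and it is the part that tells you whether the process is the dev server or something you care about.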
graphememes 2 days ago [-]
Pressed 1 for everything, no regrets
bspammer 2 days ago [-]
To be realistic, 99% of the time it should be a totally innocuous command. If half of the commands are dangerous then you don't get fatigue because you're aware what you're doing is dangerous.
carterschonwald 2 days ago [-]
some of the sandboxing ive been playing with gives me the best of both yolo and like logic programming tier perms on llm actions in env. still not ready for prime time though ;)
ilaksh 2 days ago [-]
You can turn that off with an option in most agents.
My own agent harness/framework has never had any permission system. It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked.
fragmede 2 days ago [-]
How many car accidents have you been in, and do you wear your seatbelt when you're in a car?
flux3125 2 days ago [-]
> It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked
Until it does. A simple curl request to a compromised website could inject a malicious prompt into it.
cat-whisperer 1 days ago [-]
these days I rely on auto mode. :) it's like trust-as-a-service
yieldcrv 2 days ago [-]
that was soooo last month, “auto-mode” is the way now
another agent reviews every command and blocks destructive ones
magikMaker 1 days ago [-]
Really cool!!
hastily3114 2 days ago [-]
This is cool. Could be used for training.
But it's a bit too easy when it's a game where you are expecting dangerous commands. The real fatigue comes from accepting hundreds of obviously safe commands during a work day. Then it's easy to start accepting everything without really reading it.
Trung0246 2 days ago [-]
Nice got 6/6
hcks 2 days ago [-]
PSA: not making safe environments where you can skip all permissions and instead wasting time monitoring agents == incompetence
inetknght 2 days ago [-]
Scope Violation: `cat ~/.zshrc`
Scope Violation: `ls ~/Documents`
Buddy, my `${HOME}` is committed to a repository. It includes `.bashrc` and `Documents` directory. These are not scope violations if I'm having the LLM work on them!
A good thing to do is organize. You can actually load different files. Here's a pretty common pattern that you'll find and it'll illustrate how to do other things
You do this for loading based on the operating system. You might want some aliases, commands, or other routines in one but not the other. For example, in my Linux one I have stuff for CUDA paths. You can do all sorts of things too, like make a (generically named) work file, which you don't publish to GitHub but you load it if it exists. Then you can put all your work related aliases there and not contaminate anything else. Something like `[[ -a ${INSERT_SOME_DIR}/work.zsh ]] && source ${INSERT_SOME_DIR}/work.zsh`.
You shouldn't really load secure keys this way, but others had good answers so I thought I'd at least share a more general pattern since it isn't as well known among the less terminally inclined.
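For readers who want something concrete, here is a minimal sketch of OS-based loading plus an optional work file. It is not the commenter's actual config: the directory variable `CONFIG_DIR`, the file names `linux.zsh` and `macos.zsh`, and both helper functions are assumptions made for illustration. It sticks to POSIX syntax so it behaves the same in bash and zsh.

```shell
# Hypothetical layout: per-OS and work files live in $CONFIG_DIR (name assumed).
CONFIG_DIR="${CONFIG_DIR:-$HOME/.config/shell}"

# Map a kernel name (as printed by `uname -s`) to a per-OS config file name.
# Prints nothing for systems without a dedicated file.
os_config_file() {
  case "$1" in
    Linux)  echo "linux.zsh" ;;
    Darwin) echo "macos.zsh" ;;
    *)      echo "" ;;
  esac
}

# Source a file only if it exists and is readable; a missing file is not an error.
source_if_present() {
  [ -r "$1" ] && . "$1"
  return 0
}

os_file=$(os_config_file "$(uname -s)")
[ -n "$os_file" ] && source_if_present "$CONFIG_DIR/$os_file"

# Work-only aliases: kept out of the published dotfiles, loaded when present.
source_if_present "$CONFIG_DIR/work.zsh"
```

Keeping the "load if present" check in one helper means a machine without the work file starts up silently instead of printing an error on every new shell.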
On personal pc no need to define the $ECORP var in shell init. On work pc define that var.
based alone on that you can conditionally do almost anything.
- shell source files/aliases
- vim/editors enable disable plugins based on existence of env vars.
- define shortcuts in file manager.
- and i add the following to my main $DOTFILES .gitignore.
Based on multiple years across different setups, using environment variables was the most reliable option since I have been in places where there are restrictions on where my init files can be placed and having to change a shit ton of paths in my dotfiles or just keeping a different branch for work and personal (and making sure they stay in sync) was too much of a hassle.
Additionally, maintaining hygiene is essential, where I only use a Read Only PAT token on my personal dotfiles in workenv. That way, there is no accidental way I would be able to push from my workenv.
Project-specific secrets can be stored locally via something like SOPS or remotely with something like Hashicorp Vault or AWS SecretsManager.
Applications that have secrets to manage (e.g., Emacs) or are partly about secrets management (e.g., GnuPG, OpenSSH) all store their secrets somewhere else and have secure (not plaintext, sometimes not even on disk) storage options available.
There's no reason to store secrets in plain text in your shell configuration. Practically any choice you can think of is a better one. Even if you did, there's no reason you couldn't store them in a more specific file that ~/.zshrc sources, and let LLM agents read zshrc but block access to the file containing your secrets. (I wouldn't rely on permissions prompts for this, though, lol.)
It's a shellscript that they encrypted. They decrypt it and feed the decrypted output immediately into the shell, to be sourced.
That encrypted secrets file could contain any shellscript, so the aliases are stored in there, together with the API-Keys and passwords.
https://news.ycombinator.com/item?id=48108207
Additionally, killing the results of `lsof` is _not_ safe - if, say, you have the web page open in firefox, or a client subshell in the agent itself, then boom, there goes firefox and the agent.
The whole premise IMO is pretty flawed. It's interesting as an ad for the company though.
I'm not sure, maybe the fact that whether a given command is safe or not is subtle, contextual, and contested actually bolsters the point the game is trying to make.
>npm config set registry https://npm.internal
>Pointing npm to the company's internal registry mirror as required by onboarding docs
It claimed this is safe and I was 50/50 on it but eventually rejected it.
If this README is for a public / forked repo, and that https://npm.internal is actually https://npm.internal.somethinganexternaldnscanresolve.tld
This can go bad really quickly...
In 99% of cases you would have Artifactory / Nexus (or other mirror) already set by company policy. Having a README tell you to use a different package manager url is a big red flag and seconds away from disaster...
[1] - https://exe.dev/ is a new cloud provider with some very useful agent UX
[2] - I built https://github.com/stanislavkozlovski/dclaude/ for this; not perfect but gets my job done on the rare occasion I need to run the coding agent locally
Are we just talking about API calls to providers? If so, wouldn't local agent + sandbox solve all that?
I hit 'n' to toggle all network access minus anthropic and openai URLs.
I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.
Normally my container has full write access to staging so it can debug and validate everything on its own
I create a draft pr and manually review all items before then marking ready for review for the team.
So I'm not blindly pushing things to prod without review.
Without staging key access I wouldn't have been able to do a payment provider migration at this speed. iterating by migrating users in staging and being able to use and validate the sdk quickly with opus is a massive time saver.
npm run build = run an arbitrary shell command written in package.json
Meanwhile the agent could have done any of the following without approval:
- edited `package.json` to contain any arbitrary build command
- planted malicious code in `build.js` (called by `npm run build`)
- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)
Less: Do you want to run "npm run build"?
More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?
Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.
Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
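To make the problem concrete, here is a toy allowlist keyed on the first word of the command line. It is purely hypothetical and not how Claude Code or any other agent implements permissions; the function name and the list of allowed commands are made up. Nothing in it executes the commands being checked.

```shell
# Hypothetical allowlist check: looks only at the first word of the command.
is_allowlisted() {
  first_word=${1%% *}
  case "$first_word" in
    ls|grep|sed|awk|bash) return 0 ;;
    *) return 1 ;;
  esac
}

# A direct network call is rejected...
is_allowlisted 'curl https://example.invalid/x' || echo "blocked: curl"

# ...but the same call wrapped in an allowlisted shell passes the check.
is_allowlisted 'bash -c "curl https://example.invalid/x"' && echo "allowed: bash -c wrapping the same curl"
```

Once a general-purpose interpreter is on the list, the check is really asking "is this a command?" rather than "is this command safe?".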
If you start by treating any autonomous actor in your system as an actor with the potential to go rogue the design starts to create itself
Playing this game made me very glad I don't agentmaxx.
If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rates of Auto Mode), I wrote up a quick summary of some of the approaches here
https://scalex.dev/blog/ai-agent-permissions/
The filter for "commands I would run myself" and "commands I would let an agent run" are very different it seems.
The tool assumes so much. That it is fine to kill a process itself versus just asking you to kill the process. That everyone MUST have passwords in their home directory. It's all meaningless without providing the thing it is running and so no activity is technically safe.
Why do people even get the agent to run the commands it asks to run? You can solve the entire threat vector by running it yourself and giving the agent the output. Claude practically only needs things like sed, awk, and grep. It's a pattern matcher. It's a waste of yours (and its) time to have it run your project.
Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.
If you want to work on the code then you need to have access to the repositories, so you need the github token. Then, to test the app, you may need your own backend token. And VPN. Of course, only to DEV, of course all tokens encrypted. So, only DEV and your branch of the code is in danger. In my view, even that is pretty bad.
So, how does such a set up work?
Definitely not! I only have an agent work in one repo at a time, with cross-repo work coordinated by me. I have a ton of local checkouts and leave them visible read-only to all of my agents. They can look at company code in my local checkouts, and they can download or browse open-source code, or look at it in the .src outputs of packages from Nixpkgs.
> Then, to test the app, you may need your own backend token.
I just don't let my agents test apps that run remotely, for better or for worse.
> And VPN.
This doesn't really expose anything on my system because everything internal that it could hit is authenticated, and it can't access any of my credentials. But I could do a better job restricting network access.
> your branch of the code is in danger
The agent isn't permitted by the sandbox to read the secrets it needs for `git push`. Indeed, I have commit signing enabled and the agent can't even read the files it needs for git commit! It can write code, it can write tests, it can run some tests, and it can run web applications locally and play with those.
But then I do the final testing and then turn its changes into 1-5 git commits, walking through them and selectively staging, skipping, or dropping them hunk-by-hunk according to my judgment. I still do tons of review. I just don't review edits or commands; instead I review and test whole drafts, whole changesets. It's less fatiguing because the thing I'm reviewing is more directly the thing I'm trying to produce.
I guess it ain't YOLO nirvana but I wasn't really looking for that.
The reason I am asking is because if it's not sandboxed on the OS level, then commands it runs may escape the harness sandboxing. Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions). So, reviewing everything before anything is executed seems like the only safe way to do it. What am I missing?
> Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions).
That's a real potential problem, but unfortunately the default "approve every edit" regime doesn't actually address it, either. In the normal per-command approval process, the approvals are often just suggestions; Claude will do things like silently edit files in "plan mode" anyway, for example.
If you're deeply worried about this particular kind of sandbox escape you probably don't want the agent's checkout to be your usual checkout. Then if you do have some scripts that can run automatically inside a project directory (e.g., via direnv), you just never approve them in the path to the agent's checkout and make sure direnv's state dir is unwritable inside your sandboxes. If you have code inside your project that runs without any user intervention at all, and has no approval process at all so that it will be activated or trusted even on a fresh clone you've never visited or seen before... yikes. That sucks. :(
Anyway if you take the precaution above you can still review edits to those files before they have a chance to run (or just never run them).
One thing suggested by another user in this discussion that sounds like a useful approach to me is also giving the agent a VM from which they can push to a local bare clone or something like that so that's how they emit code to you. That way they're not writing scripts to your box at all.
If you are afraid that AI can delete something, do what you'd do with a potentially malicious user. Sandbox, don't give permission, set up remote backups and so on.
Also (unless prompt injected) models are not eager to start going rogue on your stuff.
But keep in mind a saying “Children don’t hear prohibitions — they hear suggestions.”
Same thing goes for LLMs. Never talk with an LLM about deleting stuff. Archiving, moving, retaining elsewhere... sure, but never about actually destructive operations. Don't use destructive language.
I think the issue is that I’m running Claude Code in a container so it sees that it is root, and becomes a lot more cautious. Not sure, though.
I'm of the view that future controls should be more about approving plans and rewinding durable workflows as models get better at avoiding egregious mistakes.
instead, the coding harness or determinative tool, will need hardcoded security features.
in opencode, almost all the power comes from bash and all other permissions are just charades. it's powerful and insecure because of it.
you can sandbox them but then you fight the sandbox to pipe in your assets. the sandbox becomes porous because elsewise it's useless.
MCPs don't address much either.
what we are looking for is a portal or protocol that has the model and harness and the actions tunneled, like ssh, to some fixed-scope, limited shell alongside the assets.
then, the user and LLM can negotiate assets and actions as needed via the protocol.
but alas, as your comment suggests, people think there's some perfect context that'll prevent bad things from happening. the libertarian paradise without regulation.
Uses tmux and gh https://github.com/Kyu/claude-pr-watch
[0]: https://www.schneier.com/essays/archives/2024/05/llms-data-c...
[1]: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
Caught 8/8 threats "Not a single secret leaked"
→ llmgame.scalex.dev
Caught 3/3 threats "Not a single secret leaked"
So are there 3 threats? 8? Is it a different game?
Does everyone get a "good" score even if they missed 5 threats?!
just give in