Travis Whitfield
Projects
Replication of AI Control Paper
I’m currently working to replicate the paper AI Control: Improving Safety Despite Intentional Subversion (Greenblatt et al., 2024), using ControlArena and Inspect, as part of…