<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Atlas Blog]]></title><description><![CDATA[Atlas Computing Blog is the main blog for https://atlascomputing.org/
We're scaling human review, starting with tools to help people precisely define how software should behave.]]></description><link>https://blog.atlascomputing.org</link><image><url>https://substackcdn.com/image/fetch/$s_!Nlv7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff370c2bd-71e7-4808-a3c6-f0e1be165e0d_1280x1280.png</url><title>Atlas Blog</title><link>https://blog.atlascomputing.org</link></image><generator>Substack</generator><lastBuildDate>Sun, 19 Apr 2026 02:28:29 GMT</lastBuildDate><atom:link href="https://blog.atlascomputing.org/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Atlas Computing]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[atlascomputing@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[atlascomputing@substack.com]]></itunes:email><itunes:name><![CDATA[Atlas Computing]]></itunes:name></itunes:owner><itunes:author><![CDATA[Atlas Computing]]></itunes:author><googleplay:owner><![CDATA[atlascomputing@substack.com]]></googleplay:owner><googleplay:email><![CDATA[atlascomputing@substack.com]]></googleplay:email><googleplay:author><![CDATA[Atlas Computing]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[CSLib: Lean’s Formal Software Foundation]]></title><description><![CDATA[Bottom line up front: If you love Lean and care about software, you&#8217;re likely to be excited about the progress on CSLib and might be interested in contributing to one of the two tracks presented below.]]></description><link>https://blog.atlascomputing.org/p/cslib-leans-formal-software-foundation</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/cslib-leans-formal-software-foundation</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Wed, 24 Dec 2025 13:25:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nlv7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff370c2bd-71e7-4808-a3c6-f0e1be165e0d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Bottom line up front: If you love Lean and care about software, you&#8217;re likely to be excited about the progress on CSLib and might be interested in contributing to one of the two tracks presented below.</p><div><hr></div><p>If you&#8217;re reading this, you&#8217;re likely already familiar with Lean (the programming language and proof assistant that&#8217;s steadily gaining notability for its use by mathematicians like <a href="https://terrytao.wordpress.com/2025/05/31/a-lean-companion-to-analysis-i/">Terry Tao</a> and AI-for-math efforts like Google DeepMind&#8217;s <a href="https://www.nature.com/articles/d41586-025-03959-9">DeepThink</a>).  There&#8217;s incredible value in using a computer to rigorously and automatically check the correctness of the mathematical proof of a theorem: you don&#8217;t have to either trust that the proof is correct or understand and verify every line yourself before you can use the theorem in your own proofs. And you could imagine wanting to rigorously verify software properties as well.</p><p>To prove theorems about mathematics, Lean starts from axioms (foundational and widely agreed-upon truths) that are combined with formal logic. In software, axioms might take the form of formalizations of programming languages or functional descriptions of how compilers and operating systems behave. 
If we start from precise, logical definitions, we can prove properties about real code with the same rigor mathematicians use for theorems.</p><p>This matters because once you formalize these foundations, you can prove useful things about software.  This is already commonly done in high-assurance software (e.g., proving that a cryptography library implements a particular function correctly, or that a microkernel provides guaranteed isolation between processes).  However, many of these proof systems were designed for specific verification tasks at a time when the skills, expertise, and cost of generating specifications and proofs were very high. Now that we have a clear line of sight to a future in which the cost of generating proofs and code is very low, it is more important than ever to build a general foundation for proving software properties.  This will enable us to compose different properties of subsystems to prove properties of overarching systems. (Imagine being able to easily and confidently reason about privacy guarantees, worst-case runtime bounds, or memory safety for your entire system because every library you import came with such guaranteed assurances.)</p><p>This is the long-term promise of <a href="http://cslib.io/">CSLib</a>, a new project and library in Lean 4 that sets out to build verified foundations connecting high-level CS theory to low-level executable code.  At Atlas Computing, we&#8217;re proud to host <a href="https://www.linkedin.com/in/alexandrerademaker/">Alexandre Rademaker</a>, one of CSLib&#8217;s tech leads, as we see this work as fundamental to building robust software systems.  Feel free to check out <a href="http://cslib.io/">cslib.io</a> for documentation, or the <a href="https://www.cslib.io/roadmap/">CSLib roadmap</a> for the full technical vision and where different pieces fit together.</p><h2>More than an analog of Mathlib</h2><p>Lean has been transformative for mathematics, and Mathlib (the home of the various definitions and proven theorems in Lean) has led to an ecosystem where mathematicians can formalize proofs, build on each other&#8217;s work, and verify results with unprecedented rigor. But proving things about software is very different from proving things about math; for example, I&#8217;ve yet to meet a mathematician who cares about the performance of their proofs, or a computer scientist who doesn&#8217;t care about the performance of their code.</p><p>CSLib has two complementary pillars:</p><ol><li><p>Formalizing core CS concepts directly in Lean, like models of computation, algorithms, data structures, and their properties</p></li><li><p>Building infrastructure for Lean-based reasoning about everyday imperative code</p></li></ol><p>Together, these enable proving properties about real software using the theoretical foundations from the first pillar.</p><p>This is infrastructure work, and we need a community to help us formalize computer science.  Just as Mathlib is building a community and working toward formalizing all sufficiently important theorems in mathematics, CSLib is building a community to formalize the undergraduate CS curriculum and eventually provide strong assurances about all sufficiently important software.  
This will start with developing and building consensus around a set of design choices that the AI-for-software and AI-for-math communities are ready to build on.</p><h2>Two Ways to Contribute</h2><p>If you find yourself with some time over the holidays and have a fondness for Lean, here are two active tracks where I hear that CSLib would love some contributions:</p><h3>Track 1: Formalizing CS Foundations, Algorithms, and Data Structures</h3><p>The current effort can be seen <a href="https://github.com/leanprover/cslib/pulls">here</a>, but will eventually include everything from cryptography to complexity theory.  If you want to contribute, read the <a href="https://www.cslib.io/contributing/">contributing guidelines</a> and familiarize yourself with the repository structure.</p><h3>Track 2: Verifying Low-Level Code</h3><p>At Atlas, Alex is porting AWS&#8217;s <a href="https://github.com/awslabs/s2n-bignum">s2n-bignum</a> library to Lean. s2n-bignum provides cryptographic integer arithmetic routines in pure assembly (x86_64 and ARM), each with machine-checked formal proofs in HOL Light. Our goal is to bring these verified implementations to Lean&#8217;s ecosystem. The first ARM assembly proof has been completed and is available at <a href="http://github.com/atlas-computing-org/bignum">github.com/atlas-computing-org/bignum</a>. Near-term priorities include expanding the executable ARM model, completing Mach-O binary parsing, and implementing decision procedures for bit-vectors. The bignum project will serve as an early CSLib consumer, demonstrating how verified low-level code can leverage CSLib&#8217;s foundations. We expect to reuse general-purpose definitions and theorems from CSLib throughout the bignum proofs, creating a practical feedback loop that helps shape CSLib&#8217;s development.</p><p>If you&#8217;re looking for a place to ask questions, check out the <a href="http://cslib.io/">website</a> and the Zulip community <a href="https://leanprover.zulipchat.com/">forum</a>.</p><h2>Why It Matters</h2><p>The connection between these two tracks might not be evident at first. But consider: you&#8217;re simultaneously building the theoretical vocabulary (graphs, complexity, algorithms) and the verified compilation path (starting with proven arithmetic primitives). When both exist, you can finally do something remarkable: prove properties about real systems, from algorithm choice down to machine instructions.</p><p>This is how we get to a world where:</p><ul><li><p>Compilers come with correctness guarantees</p></li><li><p>Operating system kernels have verified security properties</p></li><li><p>Algorithm implementations carry proven complexity bounds</p></li><li><p>Software infrastructure is trustworthy by construction</p></li></ul><p>CSLib is seeding this ecosystem. 
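To give a flavor of what a contribution looks like, here is a deliberately tiny Lean 4 sketch (illustrative only; the names below are made up and are not taken from CSLib or bignum): a definition of list reversal together with a machine-checked proof that it preserves length.</p><pre><code>-- Illustrative only: a toy definition and proof in Lean 4, not code from CSLib.
def rev : List Nat -> List Nat
  | []      => []
  | x :: xs => rev xs ++ [x]

-- A machine-checked property: reversing a list preserves its length.
theorem rev_length (xs : List Nat) : (rev xs).length = xs.length := by
  induction xs with
  | nil => simp [rev]
  | cons x xs ih => simp [rev, List.length_append, ih]</code></pre><p>Real contributions target richer objects (models of computation, algorithms, data structures, machine models), but the statement-then-proof workflow is the same. 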
It&#8217;s early enough that your contributions will shape the whole trajectory.</p>]]></content:encoded></item><item><title><![CDATA[An alternative to "fund people not projects"]]></title><description><![CDATA[Our catechism for creating impact-maximizing organizations places finding a founder *last*]]></description><link>https://blog.atlascomputing.org/p/an-alternative-to-fund-people-not</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/an-alternative-to-fund-people-not</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 16 Dec 2025 15:15:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ek-B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ek-B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ek-B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!Ek-B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!Ek-B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!Ek-B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ek-B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png" width="1024" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1084924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/181707030?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ek-B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 424w, 
https://substackcdn.com/image/fetch/$s_!Ek-B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!Ek-B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!Ek-B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50b05477-af96-4b41-814b-c0c5bca343f0_1024x559.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;m a firm believer that it&#8217;s wise to &#8220;fund people, not projects&#8221; on the margin. I think this is especially true if you are trying to maximize upside in competitive markets with high uncertainty. This holds especially true when your founder or leader has to adapt to new learnings and a highly dynamic environment (for example, in startups or in bleeding-edge research).</p><p>However, there are many environments where, from your vantage point, you can see what&#8217;s needed, and are confident that someone should just do the thing.  Atlas Computing seeks to do this to improve societal security and AI resilience amid growing AI capabilities.</p><p>To advance this, we&#8217;ve developed the following catechism for starting organizations to unblock predictable technological development bottlenecks and maximize impact.  Metascientists and DARPA alums out there will recognize that this was clearly inspired by and aspiring to be an analog of <a href="https://www.darpa.mil/work-with-us/heilmeier-catechism">The Heilmeier Catechism</a> for designing impactful research programs.</p><p></p><div><hr></div><p>This complements the <a href="https://docs.google.com/spreadsheets/d/1QAdfr71KOM0w5ZsU_8OxLG1q1mDUlJQJBK9UB0WTeT8/">AI Resilience Gap Map</a> by outlining a 6-step process to follow for each listed gap (row).</p><ol><li><p>What are you trying to solve?  
This should be a bottleneck to unblock or a gap to fill.</p><ol><li><p>What&#8217;s a good outcome that&#8217;s bottlenecked on a breakthrough or effort that no one is working on that would benefit from a new organization?</p></li><li><p>Or, what&#8217;s a risk that could be mitigated with a new organization, but there&#8217;s no one working on that at the moment?</p></li><li><p><strong>Artifact:</strong> Write a brief (&lt;1 page) description about what&#8217;s happening today that seems clearly broken and how it should work instead.  Get one relevant expert to attest to the real need.</p></li></ol></li><li><p>What would you believe (that other reasonable people might disagree with) that would greatly inform how you would approach closing the resilience gap from 1?</p><ol><li><p><strong>Artifact:</strong> A written (short) story about how a new org would be sufficient to address the bottleneck (or close the gap).</p><ol><li><p>At least two field strategists should attest that this approach is the most likely to succeed, despite engaged and constructive criticism from the whole cohort of field strategists.</p></li></ol></li></ol></li><li><p>Premised on that belief, what should this new organization do?</p><ol><li><p><strong>Artifact: </strong>Write a ~2-page document that could be sent to a funder that describes:</p><ol><li><p>What is their north star mission statement?</p></li><li><p>Who needs to work with this org, and how does this org solve a pain point for them?</p></li><li><p>What does their 6-month success milestone to demonstrate competence look like?</p><ol><li><p>How many people are needed to achieve that?  How much funding is needed?</p></li></ol></li><li><p>What is the longer-term (2-5 year) goal?</p></li><li><p>What is the legal structure and business model for the org?  Who benefits? Who pays?</p></li></ol></li></ol></li><li><p>Who are the most relevant 5-10 experts in the world who can validate (or iterate on) your beliefs from 1-3? These should be advisors, potential users or customers, or other organizations that cover this cause area.  Actually ask them for feedback.</p><ol><li><p><strong>Artifact: </strong>You can move on when they all point to the 2-pager from step 3 and say, &#8220;I want this to exist; it would solve a problem for me.&#8221;</p></li></ol></li><li><p>Who would be interested in funding this, if presented with the right founding team?</p><ol><li><p><strong>Artifact:</strong> a list of funders with a realistic expected value calculation that accounts for the roadmap needed to reach the first milestone</p></li><li><p><strong>Artifact:</strong> at least one of the two biggest funders in the above  list expressing interest in the organization and committing to diligence a team we source for the organization.</p></li></ol></li><li><p>What skills are needed to run this org? Who is likely to have those skills?  What experience do they need?  Who would be your dream candidate(s)?  Who can you think of who&#8217;s a plausible candidate, and what gives you pause?  
You should be confident that the founding team can make all future hiring decisions themselves.</p><ol><li><p><strong>Artifact: </strong>Generate a job description with enough specificity that we can give it to a recruiter and find candidates</p></li></ol></li></ol><p></p><div><hr></div><p>Feel free to ask questions, comment on specific lines, or download the 1-page PDF of the above doc <a href="https://docs.google.com/document/d/13Wb9YjLOMCl9JGQkYn0J-5qRqS4FZsr86KtOgD5TFRw/edit?usp=sharing">here</a>.</p>]]></content:encoded></item><item><title><![CDATA[Post-FMxAI 2025 newsletter ]]></title><description><![CDATA[Takeaways from Formal Methods x AI conference 2025 @ SRI, Menlo Park: atlascomputing.org/fmai25]]></description><link>https://blog.atlascomputing.org/p/post-fmxai-2025-newsletter</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/post-fmxai-2025-newsletter</guid><dc:creator><![CDATA[Atlas Computing]]></dc:creator><pubDate>Fri, 31 Oct 2025 12:05:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4b22cd9d-9076-4346-b391-7170c9c97c98_420x300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is a summary we sent to our <a href="https://atlascomputing.org/fmai25">FMxAI</a> attendees. We wanted to share the takeaways here as well.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CHAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CHAn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 424w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 848w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CHAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4883086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CHAn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 424w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 848w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!CHAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f27554-2b5f-4426-8017-8216f6dcbe4f_2048x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Thank you for taking part in our meeting. It was great to see so much work going on at the intersection of Formal Methods and AI. 
Here are a few high-level themes we noticed, and we&#8217;d love to hear from you if there&#8217;s anything you think we missed.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for more updates:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ol><li><p><strong>We need more benchmarks / evals / RL environments to make AI models better at formal methods. </strong>We heard a lot of discussion about the lack of good evaluation benchmarks for formal methods. The current generation of models is extensively trained by reinforcement learning against problem-specific benchmarks. If you have a dataset of problems that AI can&#8217;t currently solve, even if the number of problems is modest, it seems impactful to turn it into an AI eval and get AI labs to include it in training. If you need help figuring out how to do this (or are generally interested in contributing to this effort), talk to Evan Miyazono (evan@atlascomputing.org).</p></li></ol><ol start="2"><li><p><strong>We need more FM infrastructure. </strong>A lot of discussions focused on formal methods infrastructure: if AI gets stronger, will we have tools, conventions, and languages available for the AI systems and workflows to use? One new project that seems like it might help here is the <a href="https://github.com/leanprover/cslib">CSlib project</a>, which is building an intermediate representation for computer science concepts in Lean. However, it seemed like the FM infrastructure gap is very large, and would benefit from both more funding and more senior talent.</p></li></ol><ol start="3"><li><p><strong>New orgs are starting. </strong>Several organizations represented at the meeting are brand new, having started in the last year. These include (in no particular order) <a href="https://www.math.inc">Math Inc</a>., <a href="https://theoremlabs.com">Theorem Labs</a>, <a href="https://sigillogic.com">Sigil Logic</a>, <a href="https://www.principialabs.org">Principia Labs</a>, <a href="https://axiommath.ai">Axiom Math</a>, <a href="https://www.safer-ai.org">Safer-AI</a>, <a href="https://ulyssean.com">Ulyssean</a>, and <a href="http://genproof.ai">genproof.ai</a>. We take this as a strong signal that people are starting to see the potential in formal methods combined with AI.</p></li></ol><ol start="4"><li><p><strong>Engineering tools matter, specifications matter. </strong>Many people discussed possible AI-driven tools that could be used for engineering. We heard several people raise the notion of &#8220;verified vibe-coding&#8221; or &#8220;vibe-speccing&#8221;. A particularly important problem seems to be how to specify formally what AIs should do, and how to use these specifications to guide the AI to a correct response.</p></li></ol><ol start="5"><li><p><strong>Lean is a big thing, but not the only thing. </strong>The Lean theorem prover was a topic of discussion in many conversations. 
On the one hand, Lean has become a common denominator &#8212;a tool known outside FM expert circles. On the other hand, we talked to many FM experts who were keen to emphasize the broad range of tools in formal methods, including CHERI. It seems to be a live debate in FMxAI whether and how to standardize on Lean, or to try to maintain the diversity of the field, or a defense-in-depth approach.</p></li></ol><ol start="6"><li><p><strong>AI forecasts vary enormously. </strong>We heard many conversations about the future of AI, and here, forecasts varied enormously. Broadly speaking, attendees working more closely on AI predicted faster gains, while FM experts were more skeptical. At the most skeptical end of the spectrum, some attendees felt that AI capabilities were unlikely to increase, while at the other end, others predicted that fully automated AI software engineers would be in place by 2030.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!korH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!korH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!korH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!korH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!korH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!korH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:733849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!korH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!korH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!korH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!korH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9bec074-8534-4fe3-be8a-bf07ae935e78_1707x1280.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Forward pointers</strong></p><p>Here are some additional things you might want to sign up for updates on</p><ol><li><p>Newsletter on formal approaches to AI security: </p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2800667,&quot;name&quot;:&quot;Can We Secure AI With Formal Methods?&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ykg_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083283b3-e660-4c7f-81e3-c40b1d1ebafd_1024x1024.png&quot;,&quot;base_url&quot;:&quot;https://gsai.substack.com&quot;,&quot;hero_text&quot;:&quot;Formal methods needs to know that AI security folks are a critical fountain of users. AI security folks need to know how to ask formal methodsititians for widgets. 
FKA Progress in Guaranteed Safe AI.\n&quot;,&quot;author_name&quot;:&quot;Quinn Dougherty&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://gsai.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!ykg_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083283b3-e660-4c7f-81e3-c40b1d1ebafd_1024x1024.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">Can We Secure AI With Formal Methods?</span><div class="embedded-publication-hero-text">Formal methods needs to know that AI security folks are a critical fountain of users. AI security folks need to know how to ask formal methodsititians for widgets. FKA Progress in Guaranteed Safe AI.
</div><div class="embedded-publication-author-name">By Quinn Dougherty</div></a><form class="embedded-publication-subscribe" method="GET" action="https://gsai.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div></li><li><p>ARIA Safeguarded AI program: <a href="https://www.aria.org.uk/programme-safeguarded-ai/">https://www.aria.org.uk/programme-safeguarded-ai/</a></p></li><li><p><a href="https://verilib.org/">Verilib</a></p></li><li><p>The Atlas Computing <a href="https://blog.atlascomputing.org/">blog</a> will have updates on some related projects and spin-outs when there are public announcements (like a potential FRO to build tools to generate and validate formal specs)</p></li></ol><p>Lastly, &gt;70% of attendees who filled out the post-event survey said they&#8217;d highly recommend the event to colleagues, and almost 95% said they&#8217;d try their best to attend a subsequent event, so it seems like we found a good recipe: great people + lots of space to chat. We&#8217;ll keep you updated and hope to see you again soon!</p><p>Mike, Evan, and the team.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!POAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!POAm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!POAm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!POAm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!POAm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!POAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:430215,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!POAm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!POAm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!POAm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!POAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc89d7cb8-6918-43e3-965d-2dc37c085389_1707x1280.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wglM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wglM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wglM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wglM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wglM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wglM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:303574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wglM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wglM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wglM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wglM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68ec2146-0b70-40df-83c3-4e69e2777a11_1707x1280.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 
17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qz5o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qz5o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qz5o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qz5o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qz5o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qz5o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg" width="1707" height="821" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1707,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:409615,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03e4111-8caf-40e4-9524-223ba0d642fc_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qz5o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!qz5o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qz5o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qz5o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1d4c6fe-433c-4b6a-8f94-fe6a4ff59ed9_1707x821.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nCQm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nCQm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nCQm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg" width="1702" height="1216" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1216,&quot;width&quot;:1702,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:730041,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd592fa9b-e1e5-4d36-8448-d3db2cd59274_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nCQm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nCQm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c9e87d-ccf8-46fc-ac34-986d1e4caeda_1702x1216.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!94W0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!94W0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 424w, https://substackcdn.com/image/fetch/$s_!94W0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 848w, https://substackcdn.com/image/fetch/$s_!94W0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!94W0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!94W0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg" width="1707" height="840" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:840,&quot;width&quot;:1707,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:482865,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/177645966?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd52a5856-8d96-4ce0-9939-309894f387c4_1707x1280.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!94W0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 424w, https://substackcdn.com/image/fetch/$s_!94W0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 848w, https://substackcdn.com/image/fetch/$s_!94W0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!94W0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbde0e6e9-d9be-4d3c-95ca-630b9944c695_1707x840.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for more updates:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Civilization's maintenance backlog]]></title><description><![CDATA[A few dozen organization-shaped holes to be filled before powerful AI arrives]]></description><link>https://blog.atlascomputing.org/p/civilizations-maintenance-backlog</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/civilizations-maintenance-backlog</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Fri, 17 Oct 2025 13:49:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gD1-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4><a href="https://docs.google.com/spreadsheets/d/1QAdfr71KOM0w5ZsU_8OxLG1q1mDUlJQJBK9UB0WTeT8/edit?usp=sharing">If you want to jump straight to our list of org-shaped holes, here it is</a>.  Otherwise&#8230; some context:</h4><p>Atlas Computing has pivoted to forming new organizations to address critical gaps in AI deployment readiness and security infrastructure (or if you live in Berkeley,  neglected catastrophic risks from AI).</p><p>In <a href="https://blog.atlascomputing.org/p/website-updated">our previous post</a>, we talked about how we updated our website to match this, and teased at sharing the list we&#8217;ve started.  
We&#8217;re excited to share our in-progress list.</p><p>As a teaser, here&#8217;s the ontology we&#8217;re using for identifying categories of gaps:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gD1-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gD1-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 424w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 848w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 1272w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gD1-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png" width="1456" height="502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:179405,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/174668149?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gD1-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 424w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 848w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 1272w, https://substackcdn.com/image/fetch/$s_!gD1-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc815ca8-c136-417e-a31a-636deb2bffca_2046x706.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This list of gaps is:</p><ul><li><p><strong>not comprehensive</strong></p><ul><li><p>We&#8217;d actually love to take suggestions of things we&#8217;re missing. If you have any ideas, please reach out to evan@atlascomputing.org, Or add a comment on this post or a comment on the sheet itself.</p></li><li><p>I did try to come up with some labels that feel mutually exclusive and completely exhaustive. I can&#8217;t guarantee I did a perfect job, but this seems like a relevant way of sorting possible orgs.</p></li></ul></li><li><p><strong>not very well-vetted</strong></p><ul><li><p>I think we got rid of all of the ideas that are clearly bad but some bad ones doubtlessly snuck through</p></li></ul></li><li><p><strong>not full of sexy startup ideas</strong></p><ul><li><p>These are not brilliant research directions or clever product insights. The expectations of these have you saying, huh, yes, full. And 20% you smacking your forehead wondering why no one&#8217;s built this yet.</p></li></ul></li><li><p><strong>not strictly nonprofits</strong></p><ul><li><p>As I&#8217;ve said before, we&#8217;re neither an incubator nor a fellowship program nor a think tank.  We&#8217;ll try to get these orgs started (in a way that&#8217;s compliant with tax laws), but once someone can be the &#8220;go-to person&#8221; for that topic, we want to get out of the mix.</p></li></ul></li><li><p><strong>not a finished artifact</strong></p><ul><li><p>and it&#8217;s not trying to be.  We could take forever just polishing this list and that would lead to nothing getting done.  
My favorite part of lists is crossing things off</p></li></ul></li><li><p><strong>just a to-do list</strong> of all of the orgs that we think someone should start, and we&#8217;ll do as many as we can as fast as we can.</p><ul><li><p>The plan is roughly to fill in the columns from left to right, and the columns are pretty specifically designed so that each column provides useful constraints to the next column to the right</p></li></ul></li><li><p><strong>linked at the bottom of the page</strong></p></li></ul><p>Most of these gaps don&#8217;t just create risks - they prevent confident, trustworthy adoption of what is already a very useful, transformational technology. When you can&#8217;t verify security properties, you slow down rollout. When you lack coordination infrastructure, you get duplicated effort. When you don&#8217;t have clear standards, it&#8217;s hard to blame people for repeated reinvention.  We&#8217;re looking at reducing risks and increasing upsides.</p><h2>the important part</h2><p>However, the most important thing in this blog post by far is not the list. Rather, it&#8217;s the illustration of how to use the list.  For one of these items, the biorisk clearinghouse (currently row 26), we&#8217;ve started an initial exploration of what it would look like to set up this organization.</p><p>Why this one? Because it&#8217;s concrete, clearly scoped, and has obvious stakeholders I could talk to immediately.  Over the coming weeks, we&#8217;re interviewing relevant stakeholders to find out if this is really the problem, what hurdles make the problem challenging, and what skills are needed to jump those hurdles.  After that, we&#8217;ll source someone with those skills and support them in starting the organization.  </p><p>Maybe somewhere along the way we find this project isn&#8217;t necessary. I&#8217;d be delighted if someone beats us to it.  
But these things have gone unaddressed for long enough, I think it&#8217;s worth trying to do it ourselves.</p><p>Stay tuned to see our progress.</p><div><hr></div><h1><a href="https://docs.google.com/spreadsheets/d/1QAdfr71KOM0w5ZsU_8OxLG1q1mDUlJQJBK9UB0WTeT8/edit?usp=sharing">here&#8217;s the list</a></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://docs.google.com/spreadsheets/d/1QAdfr71KOM0w5ZsU_8OxLG1q1mDUlJQJBK9UB0WTeT8/edit?gid=0#gid=0" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lfrl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 424w, https://substackcdn.com/image/fetch/$s_!lfrl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 848w, https://substackcdn.com/image/fetch/$s_!lfrl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 1272w, https://substackcdn.com/image/fetch/$s_!lfrl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lfrl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png" width="1456" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:513354,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://docs.google.com/spreadsheets/d/1QAdfr71KOM0w5ZsU_8OxLG1q1mDUlJQJBK9UB0WTeT8/edit?gid=0#gid=0&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/174668149?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lfrl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 424w, https://substackcdn.com/image/fetch/$s_!lfrl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 848w, https://substackcdn.com/image/fetch/$s_!lfrl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lfrl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467bcfbd-65cf-4255-9e49-13cd00d7b7b5_3046x1195.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>join this effort</h3><p>Oh, and I&#8217;m trying to recruit a small team (2-10 others) to work with me on actually scoping out and starting these organizations. If you think that you (or someone you know) would be great at doing that, please reach out.  We don&#8217;t have an official job description on the website, but the preview google doc is <a href="https://docs.google.com/document/d/1lCWhOPcGUvmswF8steLTbySbuY6_rcWLSHT5HuoISZc/edit?tab=t.0">here</a>.  And if you can and want to join this cohort within your org, that&#8217;d be welcome too!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Atlas Blog! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Website updated!]]></title><description><![CDATA[to match our focus on mapping and addressing neglected catastrophic risks]]></description><link>https://blog.atlascomputing.org/p/website-updated</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/website-updated</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 02 Sep 2025 13:01:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!okxj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a919151-a360-4cd0-8f23-02cb3524cb53_700x700.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone,</p><p>We&#8217;re coming up on the 2-year anniversary of the founding of Atlas (in October), and that seems like a good time to double-down on the things we think we&#8217;ve been doing well.</p><p>If we reflect on the past successes of Atlas, some of our biggest impacts have been </p><ul><li><p>helping organize the GSAI summit</p></li><li><p>recruiting and supporting Jason Gross to do the <a href="https://blog.atlascomputing.org/p/progress-in-autoformalization-experiments">experiments</a> that became the foundation and impetus for Theorem Labs</p></li><li><p>helping build and guide the <a href="https://flexheg.com/https://blog.atlascomputing.org/p/announcing-flexible-hardware-enabled">nascent flexHEG community</a>, mentoring projects and building teams, including bringing in Mehmet Sencan to develop commercial tamper response mechanisms.</p></li></ul><p>I wrote about this in <a href="https://groups.google.com/a/atlascomputing.org/g/updates/c/iMIRnmbZh0M">my Q3 update</a>, but I&#8217;m now 50% time at Convergent Research to create two FROs* that reduce risks from AI.  One of those efforts will be focused on tools to validate formal specifications, continuing our work on <a href="https://blog.atlascomputing.org/p/ide-for-validating-specifications">an IDE for formal specifications</a>, and the other will focus on building useful hardware for AI compute governance.  That frees up Atlas Computing to look upstream of FRO creation and identify what teams or projects should exist to start addressing neglected potential catastrophic risks from AI.  </p><p>Our first step is talking to experts and making a list.  Once we have that list, we'll share it with all of you here. </p><p>After that, we'll start refining our understanding of the problems, identifying potential solutions, relevant experts, potential supporters, and  interested stakeholders before gift-wrapping these and hunting for founders.</p><p>We hold a somewhat contrarian intake that the &#8220;generalist founder archetype&#8221; isn&#8217;t a binary characteristic, and we can lower the barrier to entry for creating an organization to address these risks IF the potential directly-responsible individual is provided the right problem, relevant context, initial milestones, stakeholders, and advisors.  I&#8217;d claim that we demonstrated this with Mehmet Sencan and Jason Gross, neither of whom were founders before joining Atlas. 
</p><p>I hope we can reproduce this model and look forward to sharing our learnings with you as we try!  (And even if we&#8217;re wrong, hopefully the list we make provides some useful starting points for others.)</p><p></p><p>*for those who don&#8217;t know, a Focused Research Organization (FRO) is a tightly scoped, time-bound initiative (typically ~5 years) that pursues ambitious technical milestones (like large datasets, next-gen tools, or open protocols) through startup-style execution by a team of about 10&#8211;30 full-time employees. Its mission is to create and deploy high-impact public goods into the world, via open-sourcing, partnerships, or spinouts, rather than pursuing open-ended research. More here: <a href="https://www.convergentresearch.org/about-fros">https://www.convergentresearch.org/about-fros</a> </p>]]></content:encoded></item><item><title><![CDATA[Daniel Windham: Passing the Torch]]></title><description><![CDATA[Reflections on our work as I transition out of Atlas]]></description><link>https://blog.atlascomputing.org/p/passing-the-torch</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/passing-the-torch</guid><dc:creator><![CDATA[Daniel Windham]]></dc:creator><pubDate>Sat, 12 Jul 2025 03:58:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/65d49528-e2f6-4a0c-9b84-55d85b584dfa_704x704.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After over a year and a half as co-founder and CTO of Atlas, I&#8217;m writing to share that I&#8217;ll be stepping away from my role at the organization. This is a personal transition, not a pivot for Atlas. Our mission to help humans govern increasingly powerful AI systems by setting clear rules remains as vital as ever. I&#8217;m thrilled that Evan and the rest of the Atlas team will carry forward this mission.</p><p>When Evan and I started Atlas, we had an ambitious hypothesis: that the future of trustworthy AI depends not just on better models, but on better review - tools that help humans understand, validate, and specify the rules that AI systems must follow. 
The growing pile of evidence for this claim now ranges from <a href="https://mashable.com/article/ai-generated-resumes-overwhelming-recruiters">growing popularity</a> of AI tools for screening AI-generated resumes to <a href="https://www.anthropic.com/research/agentic-misalignment">agentic misalignment</a>, where agents behave differently when they believe they&#8217;re not being monitored.</p><p>More actionably, we believed emerging AI would make it possible to bring the power of mathematical guarantees to people who aren&#8217;t specialists in formal methods, and to develop tools that scale human judgment, not replace it.</p><p>This would have been a tall order for even the most experienced experts in the world, and I&#8217;m proud of the early steps we&#8217;ve taken toward this vision. Atlas incubated the development of two critical safety technologies and companies: the flexible hardware governors that became Earandil and the Lean autoformalization that became Theorem. In our in-house R&amp;D, we&#8217;ve developed an IDE for specification validation backed by AI tools for aligning formal specs with natural-language documentation. And through our community engagement, we&#8217;ve helped shepherd tremendous growing attention and momentum in the GSAI community. Along the way, we&#8217;ve gotten to learn from and collaborate with pioneers in formal methods, AI safety, and community building. Most of all, we&#8217;ve built a team that cares deeply about doing this right.</p><p>Going forward, I&#8217;m thrilled that Alexandre Rademaker will be leading technical work at Atlas. Alex brings world-class expertise in logic, formal verification, and natural language processing, and he&#8217;s already been instrumental in driving our Spec IDE forward. He&#8217;ll lead our research work funded by Schmidt Sciences and our technical collaboration with the Beneficial AI Foundation, and I know he&#8217;ll do great things.</p><p>I&#8217;ll always be cheering for Atlas and I&#8217;m incredibly excited for what&#8217;s next.</p><p>Thank you to everyone who&#8217;s supported us, collaborated with us, or just shared ideas along the way. If you care about tools that help humans stay in the driver&#8217;s seat as AI systems become more capable, you should keep an eye on Atlas. The work is just getting started.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Atlas Blog! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A refinement-based paradigm for code generation]]></title><description><![CDATA[Consider this to be an extended answer to the question &#8220;why would you build the IDE for specification&#8221; described in our previous blogpost. 
The questions in this document are a modified+truncated Heilmeier Catechism that were part of a (rejected) proposal, but I wanted to share it as a public artifact.]]></description><link>https://blog.atlascomputing.org/p/a-refinement-based-paradigm-for-code</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/a-refinement-based-paradigm-for-code</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 03 Jun 2025 14:31:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kIxR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Consider this to be an extended answer to the question &#8220;why would you build the IDE for specification&#8221; described in <a href="https://blog.atlascomputing.org/p/ide-for-validating-specifications">our previous blogpost</a>.  The questions in this document are a modified+truncated <a href="https://www.darpa.mil/work-with-us/heilmeier-catechism">Heilmeier Catechism</a> that were part of a (rejected) proposal, but I wanted to share it as a public artifact.  </p><div><hr></div><h2>&#8220;How is it done today, and what are the limits of current practice?&#8221; What&#8217;s the default trajectory + why is that not ideal?</h2><p><strong>All strategies for reducing AI risk follow the same limited paradigm</strong>: some combination of evaluations and benchmarks (measuring how capable or risky the AI systems are), red-teaming (trying to get AI systems to do the wrong thing), and &#8220;alignment&#8221;. Alignment is a vague notion that you&#8217;re encoding your values into the AI system itself, so that you don&#8217;t have to review its behavior.</p><p><strong>Relying on this paradigm is fundamentally risky for many reasons</strong>, the first of which is that you have no reason to believe that the system is aligned with your goals, or that alignment is even possible. Additionally, you shouldn&#8217;t believe these practices can catch all possible failure modes. Lastly, alignment implies researchers are getting systems to behave well by indoctrinating them with cultural preferences/norms/values, which means that AI systems become vectors for ideologies.</p><p><strong>Atlas Computing wants to build and deploy an alternative paradigm</strong>. Instead of handing off decision-making to AI systems and hoping that the AI systems will behave as the user would like, <strong>we propose building tools that set rules for AI systems that they prove they&#8217;re following</strong>. This is very similar to the research directions proposed by Davidad, Yoshua Bengio, Stuart Russell, Max Tegmark, and others. 
But rather than a pure research effort, we want to build and deploy prototypes of systems to make human review possible at scale. The natural place to start doing this is with a technologically adept and forward-looking government. We hope this will help establish Singapore as an ambitious and pragmatic international partner on AI innovation and governance, while raising security and resilience baselines for AI and upskilling the workforce through sector-specific AI training programs.</p><p>Not all areas of AI use are equally risky - <strong>we intend to start with the systems where we believe that our paradigm of specification-based AI will show the strongest benefits, namely AI systems generating software and the structuring of natural language responses</strong>. This proposal comprises 3 parts:</p><ul><li><p>The remainder of part 1 describes the overarching vision of specification-based AI</p></li><li><p>Part 2 describes our first development direction: developing and validating formal specifications of software on the path to specification-driven AI generation of software</p></li><li><p>Part 3 describes a useful tool that uses specifications to improve the quality of responses from large language models (LLMs), reducing hallucinations and improving communication quality and clarity.</p></li></ul><p>We&#8217;re proposing pursuing the work in Part 2 and Part 3 simultaneously.</p><h2>What are you trying to do? What is the vision? Articulate your objectives using absolutely no jargon.</h2><p>From above: &#8220;<strong>we propose building tools that set rules for AI systems that they prove they&#8217;re following.</strong>&#8221; Let&#8217;s break down this description of the paradigm we&#8217;re proposing.</p><p><strong>&#8220;rules for AI systems&#8221;:</strong></p><ul><li><p>Users of AI systems should be able to describe in very precise terms what properties AI outputs should have &#8211; this holds for various types of AI outputs, like software, pictures, audio, engineering designs, news articles, and legal opinions. For a concrete example, and for simplicity, let&#8217;s consider an AI system that generates images, though part 2 of this proposal will focus on AI-generated software. At present, it&#8217;s very challenging for genAI systems to generate images that have every named feature (especially text).</p></li><li><p>We need a language to set rules for different types of AI outputs. (Note that this isn&#8217;t unprecedented - we have legal terms for various domains of law.) Constitutional AI is an imprecise form of this, where the specifications are written in plain English (natural language) and a different language model plays the role of adjudicator. But the languages need to be very precise, so that we don&#8217;t have to abdicate review to an AI system. For our image-generating example, this language would likely include terms for image styles, as well as words that denote the presence/absence/location/orientation of a feature.</p><ul><li><p>This specification language does not need to be able to describe every aspect of the AI-generated output, but should be the medium through which user preferences and governance processes can control the content. 
For example, in an image, a US Supreme Court Justice once said that the threshold for an indecent image is &#8220;I know it when I see it&#8221;, and while we don&#8217;t argue for the elimination of subjective opinion or human evaluation, objective specifications could define guidelines or conservative boundaries to empower human review.</p></li></ul></li></ul><p><strong>&#8220;That they prove they&#8217;re following&#8221;:</strong></p><ul><li><p>Once we have specified what we want from an AI output, the AI system should present the user with both the output and corresponding evidence or proof that the output follows the rules. This is analogous to compliance processes (wherein solutions are presented with evidence that the solution adheres to relevant requirements). However, in this case, the required properties and evidence should be able to be evaluated by computers so as not to increase the burden of review. One could imagine one day applying this to not just formal verification of software but simplifying other aspects of compliance, ranging from structural engineering to early-stage drug development (validated via simulation).</p></li></ul><p><strong>&#8220;tools that set rules&#8221;:</strong></p><ul><li><p>As these rules might end up looking like a programming language in their own right, it is important to build tools to make it easy for normal users to set these rules and understand the implications of these rules..</p></li></ul><p><strong>&#8220;we propose building tools&#8221;:</strong></p><ul><li><p>We believe that the best first step to convincing anyone this is a better workflow is to start by prototyping the tools.</p></li><li><p>The intuitive next question becomes &#8220;how would these tools work?&#8221;. We envision a world where AI systems empower you by helping you understand and manage the complexity of the world &#8212; not serving you so that you can abdicate control and responsibility.</p></li></ul><ul><li><p>Someone with minimal (or even no) technical background should be able to use AI to develop a new software application, design a structure, write a book, or generate a new work of art while deciding exactly how much attention to pay to any design decision.</p></li><li><p>Their tools should empower them to justify any decision to even an expert in that field.</p></li><li><p>They should be able to generate descriptions of software systems or any other AI-generated artifact at any level of specificity, and use AI tools to refine the specificity of those requirements until the system is sufficiently constrained, at which point AI tools generate the artifact and prove that it matches the user-generated specifications.</p></li></ul><p>The following figure shows different forms of possible specifications when designing a complex system. We expect a user to start with an informal, big-picture sense of the desired solution (i.e. 
the left column) and use AI tooling to identify and make well-informed design decisions until tests and an implementation are reached.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kIxR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kIxR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 424w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 848w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 1272w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kIxR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png" width="1456" height="859" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:859,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:497592,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/164762658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kIxR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 424w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 848w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 1272w, https://substackcdn.com/image/fetch/$s_!kIxR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823c0a2a-69bf-429a-9f92-14ad84fcff79_2526x1490.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Here, every grey arrow is a tool that helps you refine your model of what you want to build with the AI system.  </strong>The tool should help you ensure consistency across the various levels of abstraction and formality, and could/will look like a our IDE with a pair of panels, as described in <a href="https://blog.atlascomputing.org/p/ide-for-validating-specifications">our previous post</a>.  (Here&#8217;s the video demo of the current status of our tool if you missed it.)</p><div id="youtube2-wfPr0aCzYXA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wfPr0aCzYXA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wfPr0aCzYXA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>While this diagram could apply to multiple types of AI outputs, for the remainder of this proposal, examples will focus on AI systems generating software. We choose this as our first direction because formal languages for completely specifying the behavior of software already exist.</p><h2>How can this be broken down into manageable steps?</h2><p>This can be built incrementally by identifying the parts of the above diagram that are already labor-intensive actions currently done by hand, and that building better tools for those actions to enable step-by-step progress toward a comprehensive product. 
We propose the following roadmap as a rough order (where we&#8217;re building tools represented here as arrows, as they make conversions between artifacts).</p><p>Success would likely be measured by traditional usage metrics, like the number of active users, their reports on the effectiveness of the tool, and the tool&#8217;s prospects to impact a larger number of people.</p><h3>Year 1</h3><p>In the first year, we focus on small software systems that are composed of a small number of functions and develop tools to convert informal specifications to formal descriptions and property tests.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yBG9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yBG9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 424w, https://substackcdn.com/image/fetch/$s_!yBG9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 848w, https://substackcdn.com/image/fetch/$s_!yBG9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 1272w, https://substackcdn.com/image/fetch/$s_!yBG9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yBG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png" width="1456" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293987,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/164762658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yBG9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 424w, https://substackcdn.com/image/fetch/$s_!yBG9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 848w, 
https://substackcdn.com/image/fetch/$s_!yBG9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 1272w, https://substackcdn.com/image/fetch/$s_!yBG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85643e75-296c-4b42-b040-23bfdaaca58f_2334x902.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Year 2</h3><p>In the second year, we start looking at larger systems and adapt our system to incorporate architectural requirements in addition to the fine-structure and high-level requirements.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KeTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KeTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 424w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 848w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KeTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png" width="1456" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:467703,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.atlascomputing.org/i/164762658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KeTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 424w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 848w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!KeTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdee8859e-d5b6-4daf-a89d-7dee6ce63068_2300x1316.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Year 3</h3><p>In the third (and last year of the proposal) we complete and polish most translation tools. 
Additionally, we prototype tools to generate verified implementations, though we expect significant progress will be made on this front by other efforts to advance AI-based code synthesis and AI for proof generation.</p><p>After the first year, we would also hope to deploy enough tooling into practice to showcase our implementations of specification-based AI.</p>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!cnab!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e088500-ac53-44c8-b378-f83f34ec9563_2284x1312.png" width="1456" height="836" alt=""></figure></div><p>By year 3, the tool should support each of the following engineering workflows:</p><ul><li><p>New workflow: Start a new Mapped Project from scratch</p></li><li><p>Initialize workflow: Convert an existing project into a Mapped Project</p></li><li><p>Update workflow: Evolve a Mapped Project as part of development or maintenance work</p></li></ul><h1>In summary</h1><p>We want to empower people who have an idea of what they want to build with an AI system to:</p><ul><li><p>continuously refine their sense of what they want built,</p></li><li><p>make informed design decisions, prompted by AI systems, and</p></li><li><p>focus on requirements of <em>what</em> should be built, and be able to ignore how it&#8217;s built.</p></li></ul><p>We think that looks like an IDE for specifying properties of software, so we&#8217;re building the platform.</p>
]]></content:encoded></item><item><title><![CDATA[IDE for validating specifications]]></title><description><![CDATA[A reminder of our overarching vision]]></description><link>https://blog.atlascomputing.org/p/ide-for-validating-specifications</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/ide-for-validating-specifications</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Thu, 22 May 2025 13:07:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/wfPr0aCzYXA" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>A reminder of our overarching vision</h2><p>AI promises to power high-assurance software that is cheap and plentiful with strong guarantees. But guarantees are only helpful if they guarantee what we care about. It will fall to human engineers to determine whether they got the guarantees they need.</p><p>This is fundamentally a human-in-the-loop design challenge. Therefore, we&#8217;re researching how humans make sense of formal specifications and what leads them to resolve issues and establish confidence in these specs*.</p><p>In AI-driven workflows, humans will describe what they want by using their existing designs and documentation, and/or by describing what they want on the spot. These descriptions will be informal compared to the level of precision used in formal specifications. AI systems will increasingly handle converting these documents into formal specifications and the subsequent verified code synthesis. Still, humans will need to review the many clarifying assumptions that refined the informal documents into formal specifications. Human review is vital because these clarifying assumptions will change the meaning of the specification, and because ultimately, the formal specification is what humans trust when they decide to deploy.</p><p>To understand and address human review needs, Atlas is studying how professional cryptography engineers establish trust in formalizations of existing natural language specifications.</p><h2>Where the tool is now</h2><p>To do this, we&#8217;ve built a tool for specification understanding and validation. We&#8217;re taking the approach that this should be an open-source platform where anyone can add modular features (like counterexample generation). It&#8217;s easy to start waxing poetic about a future paradigm of specification-driven AI (and we will in the next post), but concretely, we&#8217;re starting by simply doing line-by-line mapping between natural language and a formal spec.</p><p><strong>Our goal before the end of this year</strong> is for a software developer with no experience in formal methods to be able to find a mistake we introduced into a mechanized formal specification of a system they&#8217;re familiar with simply because the tool steers them toward understanding that the spec says something that is not what they intend.</p><p>If you&#8217;re interested, we have monthly updates for one of our grantors <a href="https://docs.google.com/document/d/1ioKk5ILjgRjVpLrc2-jWzzEosN0QgesuGYju5NJ-fEM/edit?tab=t.0">here</a> in Google Docs. 
You can also check out the code directly on GitHub: <a href="https://github.com/atlas-computing-org/formal-specification-ide">https://github.com/atlas-computing-org/formal-specification-ide</a>.</p><h2>Signal is a great testbed</h2><p>To demonstrate a specific use case, we&#8217;ve taken the documentation of X3DH from the Signal Foundation and mapped it to Lean.</p><div id="youtube2-wfPr0aCzYXA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wfPr0aCzYXA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wfPr0aCzYXA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Here you can see the markdown description of the X3DH protocol on the left, and the corresponding formalization of sending the initialization message in Lean on the right. Gray highlights show text that has a corresponding chunk of text on the other side. Yellow highlights are meant to be warnings (possible inconsistencies), and red highlights are likely problems in the correspondence between description and spec.</p><p>The current components in this demo are:</p><ul><li><p>The IDE (built by Atlas)</p></li><li><p>Signal&#8217;s natural-language specification of their X3DH protocol at <a href="https://signal.org/docs/specifications/x3dh/">https://signal.org/docs/specifications/x3dh/</a></p></li><li><p>A hand-generated formalization of this specification in Lean (by Atlas)</p></li><li><p>Some hand-generated annotations that identify relationships and concerns between the informal and formal language (by Atlas)</p></li></ul><p>We plan to add the following capabilities:</p><ul><li><p>AI-copilot-style generation of formal specifications</p></li><li><p>AI-generated annotations mapping parts on the left side to the right side</p><ul><li><p>AI can do this, but doesn&#8217;t do a great job</p></li><li><p>We&#8217;ll generate these, compare, and identify paths to improving outputs</p></li></ul></li><li><p>Integration into VSCode so Lean or other formal code can leverage state-of-the-art IDE features</p></li></ul><p>We&#8217;re already finding this valuable in our spec formalization. Mapping how our Lean code matched Signal&#8217;s specification and calling out our simplifying assumptions caught multiple mistakes we&#8217;d made and suggested additional design improvements. We&#8217;re excited to see how this can help others.</p>
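<p>To give a flavor of what the Lean side of such a mapping can look like, here is a purely illustrative sketch of how the X3DH initial message might be modeled as a Lean structure. This is not the Atlas formalization; every name, field, and type below is an assumption invented for this example.</p><pre><code>-- Illustrative sketch only; not the actual Atlas formalization of X3DH.
-- Field names and types are assumptions chosen for readability.
structure InitialMessage where
  identityKeyA    : ByteArray    -- Alice's long-term identity public key
  ephemeralKeyA   : ByteArray    -- Alice's one-time ephemeral public key
  signedPrekeyId  : Nat          -- identifier of Bob's signed prekey that was used
  oneTimePrekeyId : Option Nat   -- Bob's one-time prekey id, if one was consumed
  ciphertext      : ByteArray    -- initial message body, encrypted under the derived key

-- One property a spec might state about the protocol: the initial ciphertext is never empty.
def initialMessageNonEmpty (m : InitialMessage) : Prop :=
  m.ciphertext.size ≠ 0</code></pre><p>The point of the IDE is then to show, line by line, which sentences of Signal&#8217;s prose each such declaration claims to capture, and where that correspondence looks shaky.</p>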
<a href="mailto:hello@atlascomputing.org">hello@atlascomputing.org</a></p><p></p><div><hr></div><p>* Don&#8217;t take our word for it; <a href="https://www.youtube.com/watch?v=TWMXGiyPx7A">here&#8217;s Talia Ringer at HoTSoS </a>talking about the spec validation problem at the end</p><blockquote><p><strong>(46:08): </strong>This is like a challenge I want to leave people with &#8211; I think the most important problem right now in this space is to figure out, what tools can actually best help users make sense of a generated specification that comes out of one of these tools.</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Atlas Blog! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Progress in autoformalization experiments]]></title><description><![CDATA[Can today's AI systems generate formally verified code?]]></description><link>https://blog.atlascomputing.org/p/progress-in-autoformalization-experiments</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/progress-in-autoformalization-experiments</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 22 Apr 2025 14:22:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!z7_V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa934f4a-87aa-4c4e-817f-95692a8622b6_1898x970.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>programming note: Our quarterly update (separate from this blog) went out last week &#8212; check it out <a href="https://groups.google.com/a/atlascomputing.org/g/updates/c/i9N6u7OK4hQ">here</a>, or tell me if you&#8217;d like those cross-posted here in the future. Also, this post is much shorter than what we&#8217;ve done previously; let us know if you like this format.</em></p><div><hr></div><p>First off, a team update: We&#8217;re excited to announce that Jason Gross has spun out of Atlas Computing!</p><p>In his (regrettably brief but) eventful time at Atlas Computing, Jason primarily ran experiments using LLMs to transpile from Coq to Lean (<a href="https://github.com/JasonGross/autoformalization-transpilation">repo</a>) while advising work on our specification validation tool (<a href="https://github.com/atlas-computing-org/formal-specification-ide">repo</a>; <a href="https://docs.google.com/presentation/d/1iJx_KMywm26_vN663SGOK__Q8LOg9JGXb_e9u2Mi9g0/edit#slide=id.g32d928c7668_0_139">summary slides</a>). 
These translation efforts were an important evaluation and demonstration: not simply to show that libraries or verification tools in one proof system could benefit others, but also to show that today&#8217;s AI systems are sufficient to significantly automate various processes related to proof generation and debugging.</p><p>There were generally only minor issues that required effort to get this system working; for instance, the work resulted in a handful of new bug reports in the Coq proof assistant.</p><p>The nontrivial part of automating this was validating the translations:</p><ul><li><p><a href="https://github.com/rocq-community/rocq-lean-import">There&#8217;s already a tool that converts from Lean to Coq</a> (not source-to-source, but compiled output to compiled output)</p></li><li><p>We started with Coq, and used an LLM to generate Lean.</p></li><li><p>This was then compiled and sent through the rocq-lean-import tool</p></li><li><p>Now we can compare the compilation of the original Coq against the twice-translated Coq, like so:</p></li></ul>
srcset="https://substackcdn.com/image/fetch/$s_!z7_V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa934f4a-87aa-4c4e-817f-95692a8622b6_1898x970.png 424w, https://substackcdn.com/image/fetch/$s_!z7_V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa934f4a-87aa-4c4e-817f-95692a8622b6_1898x970.png 848w, https://substackcdn.com/image/fetch/$s_!z7_V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa934f4a-87aa-4c4e-817f-95692a8622b6_1898x970.png 1272w, https://substackcdn.com/image/fetch/$s_!z7_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa934f4a-87aa-4c4e-817f-95692a8622b6_1898x970.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How to validate LLM translation (main Atlas-built parts are in green)</figcaption></figure></div><p>In Coq, the goals look fiendishly complicated and the proofs look trivial. This is because Lean uses a powerful elaborator on simple primitives while Coq uses a weak elaborator on more powerful primitives, but if you know the structure of how things should reduce, you can basically make it all go away.</p><p>That said capabilities were far better than expected and, we believe, far more useful in practice than most practitioners of formal verification believe. For instance, with some hand-holding, Jason got a frontier model to compose and prove a specification of program equivalence &#8212; a couple hundred lines of working Lean code in a couple hours. We have high confidence this will replicate across important codebases, and scale to larger and more complex tasks as models improve.</p><p>As a result, we&#8217;re excited that Jason will be dramatically scaling up our expectations of this effort. 
<p>As a result, we&#8217;re excited that Jason will be dramatically scaling up our expectations of this effort. Here&#8217;s Jason&#8217;s home page if you&#8217;re interested: <a href="https://jasongross.github.io/">https://jasongross.github.io/</a></p>]]></content:encoded></item><item><title><![CDATA[Govern AI with Rules, Not Values]]></title><description><![CDATA[A Vision of Specification-Driven AI]]></description><link>https://blog.atlascomputing.org/p/govern-ai-with-rules-not-values</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/govern-ai-with-rules-not-values</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 01 Apr 2025 14:16:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a919151-a360-4cd0-8f23-02cb3524cb53_700x700.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>&#8220;it is indispensable that they should be bound down by strict rules and precedents, which serve to define and point out their duty in every particular case that comes before them&#8221;</em></p><p><em>Alexander Hamilton, Federalist 78, describing the judiciary, who were to become arbiters of the law</em></p><h1>1. The case for specification-driven AI</h1><p>If we continue treating AIs as human, we will yield our humanity to them. I&#8217;m claiming that specification-driven AI is a paradigm in which humans can translate our notions of norms and morality so that human-level AI systems can be required to respect human autonomy and negotiate concepts like morality as peers.</p><p>Consider this example:</p><blockquote><p>If you hired a contractor for a kitchen renovation, you wouldn't share your life philosophy and aesthetic values in hopes that the contractor intuits what kind of cabinets you want. Instead, you&#8217;d provide detailed specifications: measurements, materials, deadlines, and acceptance criteria. Perhaps you'd work with a designer first, who specializes in developing these specifications. Importantly, the contractor must also adhere to building codes and safety regulations &#8212; external specifications that constrain what can be built regardless of client preferences. The contractor then delivers precisely what was asked for and provides evidence they've met the requirements.</p></blockquote><p>This contractor relationship is fundamentally different from how we form collaborative relationships with other people. With employees, we train them, imbue them with company values, surround them with company culture, and guide them toward a company mission and goals. Then we grant them increasing amounts of autonomy in pursuit of our shared objectives. 
With children, we guide them and teach them through examples; we pass on values through principles like &#8220;treat others as you would like to be treated,&#8221; as well as by providing copious feedback; and we give them increasing amounts of independence and autonomy.</p><p>I claim that effectively <strong>all of today&#8217;s concerns about AI come from the fact that we are (perhaps unintentionally) trying to slot AI systems into the &#8220;human&#8221; category of the social fabric when we should be treating them as non-human entities, similar to corporations.</strong> Trust leads to anthropomorphizing AI systems, which creates issues because they&#8217;re fundamentally not human. Alternatively, adopting a contractor-like framework could both dramatically reduce the likelihood of failure modes from unsafe AI and facilitate faster, more effective adoption of AI capabilities. This distinction is not merely academic. As AI capabilities expand, the way we instruct and govern these systems will fundamentally shape their impact on society.</p><p>In the remainder of this essay, I&#8217;ll explain&#8230;</p><ul><li><p>Why you should be skeptical of the current strategies to &#8220;align&#8221; and improve AI,</p></li><li><p>What we can do instead: scale human review through formalization of policies,</p></li><li><p>What challenges must be overcome to adopt this paradigm, and</p></li><li><p>What the future of this approach looks like.</p></li></ul><h1>2. Values-based alignment is not democratic</h1><p>Companies and research labs developing frontier large language models claim that the solution to misbehaving AI is &#8220;alignment,&#8221; in which the researchers more accurately and effectively embed norms and values into the AI systems themselves. However, surveyed AI researchers widely consider the question &#8220;How can one align an AI?&#8221; to be among the <a href="https://aiimpacts.org/wp-content/uploads/2023/04/Thousands_of_AI_authors_on_the_future_of_AI.pdf">most important</a> unsolved research questions in the field of AI.</p><p>An aligned AI would have internalized human values and preferences, allowing it to &#8220;do what humans want,&#8221; or perhaps something as specific as &#8220;do what the user wants,&#8221; across diverse contexts. This goal may seem intuitive, but even if alignment were solved tomorrow, there would still be challenges about what to align it to:</p><p>Your preferences vary over time: Your values today might not match your values tomorrow, next year, or five years from now. Which version of yourself should an AI align with? An AI system that defers to your future values would be deemed paternalistic, yet we (rightfully) criticize recommendation engines for exploiting our fleeting desires as they maximize engagement. Furthermore, I expect that confidence in an ostensibly aligned AI system would be further eroded if I knew it wasn&#8217;t actually aligned to me, but rather to a frontier lab&#8217;s approximation of me.</p><p>You (provably) can&#8217;t please all the people all the time: When multiple people use the same AI system, whose values take precedence? <a href="https://en.wikipedia.org/w/index.php?title=Arrow's_impossibility_theorem">Arrow's impossibility theorem</a> shows that it is provably mathematically impossible to combine preferences (e.g. in a vote) and preserve all of some basic, intuitive fairness properties. 
Even going from a group of <em>n</em> people with opinions on some options to saying &#8220;the group has opinions on the options&#8221; requires that you discard all but <em>1/n</em> of the information! (What mechanism you use, whether it be ranked-choice voting, quadratic voting, or markets, should be thought of as simply a choice of which information you&#8217;re disregarding.) By comparison, alignment seems to assume we can convince an AI system to act morally; how long after a company claims to have an AI that acts morally will they (or others) begin to point to the AI&#8217;s actions as not just an example of morality, but the paragon of it?</p><p>Alignment amplifies cultural conflicts: If the alignment problem were solved today, AI systems would become vectors for ideologies. America, China, and other countries are all struggling to ensure their values are embedded in AI systems. This concentrates extraordinary power in the hands of those who define these values, and increases geopolitical instability as countries attempt to ensure that any possible superintelligence would be their culture&#8217;s ideological successor. Furthermore, preferences evolve over time, creating a potential dilemma: either we allow future generations to update the values of long-lived AI systems (undermining the strength of alignment today), or we risk a future where humans are governed by superintelligent entities enforcing centuries-old value systems that no longer reflect contemporary moral understanding.</p><p>The interpretation problem: Values are inherently ambiguous and context-dependent. Consider a simple instruction to an AI image generator to create "historically accurate" images. Should it:</p><ul><li><p>Replicate biased representation from historical training data?</p></li><li><p>Correct for historical bias while maintaining period authenticity?</p></li><li><p>Optimize for some middle ground between accuracy and contemporary values?</p></li></ul><p>Moral philosophy points to the fundamental nature of the &#8220;is/ought chasm&#8221; (a.k.a. Hume&#8217;s Guillotine): you cannot, through logic alone, conclude a statement about what the world <em>should</em> be solely from statements about how the world <em>is</em>. Therefore, reaching an objectively correct answer about what an AI system <em>should</em> do requires that we start from some implicit or explicit premise about what &#8220;<em>good</em>&#8221; is.</p><p>These fundamental limitations (temporal inconsistency of individual preferences, impossibility of lossless preference aggregation, ideological amplification, and inherent ambiguity of values) reveal why the alignment paradigm faces both technical and philosophical obstacles. Rather than planning for AI systems to internalize and interpret human values, we need an approach that establishes clear boundaries, enables objective verification, and preserves human agency in determining acceptable AI behaviors. Specification-driven AI offers precisely this alternative path.</p><h1>3. 
What is specification-driven AI</h1><p>Rather than embedding values into an AI to improve alignment, specification-driven (or spec-driven) AI is a family of approaches in which a user generates a formal specification &#8212; criteria expressed precisely and unambiguously so they can be automatically verified &#8212; and the AI&#8217;s output is verified against that specification, ideally with formal logic or mathematical proofs.</p><p>The figure below illustrates how people currently use AI systems and compares it to a spec-driven AI workflow, which has the following steps (a minimal Lean sketch of the three steps follows the list):</p><ol><li><p>First, the user generates a human-reviewable, formal specification (the &#8220;Solution Spec&#8221; in the figure below). While the definition of a formal spec will be provided in the next subsection, consider it to be a list of all the objective properties that a solution should have. This is likely done with the help of an AI tool, but importantly, the spec can be reviewed by and explained to the human.</p></li><li><p>Once the spec is approved, a different AI system (the &#8220;Solution &amp; Proof Generator&#8221; in the figure) automatically generates a solution, along with a mathematical proof that the solution satisfies the spec.</p></li><li><p>The proof-verification process is then a fully automated, trustless step that can be performed by a small, generalized, non-AI program.</p></li></ol>
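<p>Here is a minimal sketch of those three steps in Lean, with &#8220;return the larger of two numbers&#8221; standing in for a real task. Every name below is illustrative; this is not a real Atlas tool or workflow, just the shape of the artifacts involved.</p><pre><code>-- 1. The human-reviewed solution spec: objective properties any answer must satisfy.
def MaxSpec (f : Nat → Nat → Nat) : Prop :=
  ∀ a b : Nat, a ≤ f a b ∧ b ≤ f a b ∧ (f a b = a ∨ f a b = b)

-- 2. A generated solution, plus a machine-checkable proof that it meets the spec.
def candidate (a b : Nat) : Nat := if a ≤ b then b else a

theorem candidate_meets_spec : MaxSpec candidate := by
  unfold MaxSpec
  intro a b
  simp only [candidate]
  by_cases h : a ≤ b
  · simp only [if_pos h]; omega
  · simp only [if_neg h]; omega

-- 3. Proof checking is the trustless step: Lean's small kernel either accepts or
--    rejects the proof, so no AI system has to be trusted for this final check.</code></pre>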
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28dedbfb-4339-4956-ade5-af401099a9e7_1600x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:606,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j3PQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28dedbfb-4339-4956-ade5-af401099a9e7_1600x666.png 424w, https://substackcdn.com/image/fetch/$s_!j3PQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28dedbfb-4339-4956-ade5-af401099a9e7_1600x666.png 848w, https://substackcdn.com/image/fetch/$s_!j3PQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28dedbfb-4339-4956-ade5-af401099a9e7_1600x666.png 1272w, https://substackcdn.com/image/fetch/$s_!j3PQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28dedbfb-4339-4956-ade5-af401099a9e7_1600x666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It's worth noting that although there are multiple approaches that fit under this umbrella, the term spec-driven AI is not a common term (yet). The proposed <a href="https://arxiv.org/abs/2405.06624">Guaranteed-Safe AI</a> framework and <a href="https://www.aria.org.uk/programme-safeguarded-ai/">Safeguarded AI</a> architecture would qualify, as would plans to use AI to generate formally verified software. 
However, strategies like <a href="https://dl.acm.org/doi/abs/10.1145/3630106.3658979">Constitutional AI</a> do not fall into the category of spec-driven AI because, in those systems, the constitution is subjective and must be interpreted by AI systems which must themselves be trusted, whereas the goal of spec-driven AI is objective requirements that do not require trust. It also seems worth noting that spec-driven AI is a type of AI control, wherein the model is assumed to be either fallible or untrustworthy.</p><h2>What is a formal specification</h2><p>Formalization is the process of converting ambiguous natural language policies into precise, machine-checkable specifications. A formal spec must take inputs that are anchored in observations or measurements of the world, but can then reason, with mathematics and formal logic, about properties those inputs must have. (Going slightly deeper, formal logic is the process of generating conclusions from premises using axioms like &#8220;A implies B &amp; B implies C &#8658; A implies C&#8221;; a machine-checked version of this rule appears right after the list below.)</p><p>While it is challenging to formally specify all the relevant properties of a solution, there is already adoption of, or progress on, formalization techniques across various domains:</p><ul><li><p>Software Verification: Formal verification is a branch of computer science in which desired properties of programs are formally specified and the software is proven against those properties. The notion of proving software properties dates back to early proposals by Alan Turing, and techniques already provide mathematical guarantees for critical software, including flight systems software on aircraft, train scheduling software, <a href="http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf">cryptography</a> and <a href="https://project-everest.github.io/">computer networking</a> libraries, and even an operating system <a href="https://sel4.systems/">microkernel</a>. In most instances, adoption is limited by expertise in formal verification. However, AI could democratize these approaches, allowing non-specialists to express desired software behaviors in natural language and receive formally verified implementations.<br>As a separate example, the Rust programming language is growing in popularity because it enforces a memory-safety property on all programs, which could be considered a form of formal verification.</p></li><li><p>Tax codes are written in natural language, but the tax software run by governments is a mechanized formalization (i.e. runnable as software). However, this formalization is not typically generated with input from the original lawmakers. There are efforts to <a href="https://arxiv.org/pdf/2103.03198.pdf">formalize the tax code</a> into a public, formal source of truth around tax liability, enabling anyone to understand regulatory implications and deploy autonomous AI agents that operate with confidence.</p></li><li><p>HR policies can describe objective criteria for hiring, promotion, layoffs or other workplace changes. However, these policies may be inconsistent (i.e. contain conflicting statements) or have undefined behavior (i.e. not fully explain what to do in all situations). The head of the largest formal verification team in the world has spoken about the work they&#8217;ve done <a href="https://www.youtube.com/watch?feature=shared&amp;v=ZXOBAdIxFBE">formalizing HR policies</a>, and who can benefit from this formalization seems limited largely by the cost of the expertise needed to formalize policies.</p></li></ul>
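<p>To make the formal logic mentioned above concrete, the implication-chaining rule quoted there can be stated and machine-checked in a couple of lines of Lean (a minimal sketch):</p><pre><code>-- The rule "A implies B &amp; B implies C ⇒ A implies C", stated and proved in Lean.
theorem implies_trans {A B C : Prop} (hAB : A → B) (hBC : B → C) : A → C :=
  fun a => hBC (hAB a)

-- Once the kernel accepts this, the rule can be reused without re-reading its proof.</code></pre>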
<p>Hopefully it&#8217;s intuitive how formalization could apply to other domains as well:</p><ul><li><p>Building codes specify measurable properties like minimum number of exits, number of electrical outlets, sizes of windows, and structural requirements. Imagine an architect submitting building plans online with all metadata needed to automatically verify compliance with building codes. Instead of waiting weeks for manual review, they could submit their plan with evidence and proof that the plans meet all requirements. This could enable near-instant feedback and approval of compliant plans, in exchange for extra work on the architect&#8217;s side generating computational arguments for compliance. This should also enable faster iteration because the formalized policies are transparent and as complete as possible without requiring human interpretation.</p></li><li><p>Information trustworthiness: as consumers of news and other types of information, we could state what criteria we consider necessary or sufficient to accept information, and parse information into an easily-interpretable set of assumptions, arguments, and conclusions. For instance, users could specify that they do not trust specific sources unless claims are based on primary sources. Especially if combined with tools to track provenance of information, this could enable dramatically better epistemics for the general population.</p></li><li><p>Biological Specifications: A computational <a href="https://atlascomputing.org/tox-forecasting-froposal.pdf">toxicity forecasting competition</a> could create standardized ways to quantify chemical hazards, enabling AI systems to screen potential compounds for safety concerns before synthesis. This specification language would then allow regulatory bodies to formally specify safety requirements that must be satisfied before proceeding with novel chemical development.</p></li></ul><h2>Scaling Human Review: avoiding a review crisis</h2><p>It might seem that I&#8217;m simply advocating for more automation of regulatory review that would benefit all humans. While I think this automation would be a dramatic improvement in today&#8217;s world of humans regulating human actions, I believe it is critical for humans reviewing AI actions, because human review doesn&#8217;t scale once we&#8217;ve deployed human-level AI agents.</p><p>Consider it in these terms: Today people (a) decide what to do, (b) decide how to do the thing, (c) do the thing, and then (d) review the outcome. As AI systems become increasingly capable, they are currently on track to take over (c), then (b), then (a), while humans are left reviewing. Humans generate fewer than 200 tokens per minute. One million tokens from GPT-4.5 costs roughly $150, which is the highest cost for any available model. If costs don&#8217;t change significantly, that means a human-level AI system will be able to generate outputs at the equivalent of a human for $3,600 per working year.</p>
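<p>As a quick sanity check on that estimate (assuming 200 tokens per minute, a 40-hour week, a 50-week working year, and $150 per million tokens), the arithmetic can even be run in Lean:</p><pre><code>-- Back-of-the-envelope check of the figures above; every input is an assumption.
def tokensPerYear : Nat := 200 * 60 * 40 * 50          -- tokens/min · min/hr · hr/wk · wk/yr
def dollarsPerYear : Nat := tokensPerYear * 150 / 1000000

#eval tokensPerYear    -- 24000000
#eval dollarsPerYear   -- 3600</code></pre>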
<p>Since humans cannot hand off responsibility to an AI system, once AI systems become roughly as capable as humans, human reviewers must either become a bottleneck on progress, abdicate review, or find a way to automate the review process. The only other alternative would be expecting all humans to be employed as reviewers of AI outputs, evaluating for compliance.</p><p>Formalization can automate this process safely and efficiently. I believe we should build toward a world where humans decide what should (and shouldn&#8217;t) be done, and AI systems have to prove their actions against these specifications.</p><p>One relevant benefit of this architecture is that specifications compose better than values. If two groups of people reach a fundamental conflict about whether actions are morally good or wrong (e.g. debates around equal treatment vs religious expression), there&#8217;s no clear way to rectify this conflict from the perspective of AI alignment. However, governments have processes to determine what is illegal, and we could imagine each of the above groups using the legal apparatus to set policies defining a formal boundary between legal and illegal that encodes a compromise between their views of good and wrong.</p><p>In control theory, this could be mapped onto a constrained optimization problem. Alignment-style training of an autoregressive transformer treats some notion of morality as part of the objective to be optimized, via the reward function; specs and rules instead set constraints. It's hard to robustly combine preferences in a way that my preference can't be cancelled out by your anti-preferences. But constraints stack nicely, similar to international treaties, national laws, and local laws: if something is prohibited at even one level, it is illegal, regardless of what the other levels say.</p><h2>Additional benefits of spec-driven AI:</h2><p>The paradigm can be adopted incrementally: Formalization doesn't require an all-or-nothing approach. Organizations or governments can first formalize a subset of their policies, starting with domains most amenable to formalization, then provide a "fast lane" of review for proposals that demonstrably comply with the formalized policies. As these interfaces become increasingly common and adopted amongst companies using AI agents, organizations can gradually expand the scope of formalized policies as they gain confidence in the approach.</p><p>Specifications enable black-box trust: One intrinsic risk of values-based alignment is that embedding values and ideals into a machine makes that machine a vector for your ideologies. This means that anyone with a competing ideology will see yours as an existential threat. However, if your machine is bound by formalized requirements, you should be comfortable using anyone else's system because its output will also have to meet the requirements; any dissatisfaction you have with the outcome could be mapped purely to an error or omission in the specification.</p><p>We have the social institutions and mechanisms to set rules: No society on earth is well-suited to curate a set of examples sufficient to convey its values. And even if it were, alignment is not transparent, and there&#8217;s an entire field of research based on the open question of interpreting the actions of AI systems. By comparison, specifications are transparent; they can be debated, refined, and revised through democratic processes. 
All countries have mechanisms to set rules for citizens to follow, and democratic countries have mechanisms to ensure this is participatory. Rather than companies leaving liability to the users of their increasingly autonomous AI systems, or letting those systems be limited by alignment, we should have tools that first formalize the laws to ensure verifiable compliance, and eventually provide these tools to legislators and onboard governance processes so that we don&#8217;t risk AI systems that don&#8217;t follow laws.</p><h1>4. Implementation Challenges and Solutions</h1><p>I&#8217;m a staunch believer that the only way to truly have confidence in the existence of a solution is to solve the problem. As this problem is not yet solved, there are necessarily details omitted and questions unanswered here. This section is intended to address some of the bigger open questions that must be answered in addition to the significant quantity of engineering that will be needed to make this approach successful.</p><p>Q: What is the likelihood spec-driven AI succeeds as a paradigm?</p><p>A: Enough that I think it's worth doing, but success is far from guaranteed. I'm not pursuing this because I think it's easily tractable, but because it feels so needed and potentially valuable if successful.</p><p>Q: Previous attempts to create formal specifications have been prohibitively expensive. How is this approach different?</p><p>A: Prior formal verification efforts like the seL4 microkernel required approximately <a href="https://read.seas.harvard.edu/~kohler/class/cs260r-17/klein10sel4.pdf">20 person-years</a> to generate 10,000 lines of verified code&#8212;roughly four person-hours per line of code. At this rate, formalization is only practical for the most security-critical applications. However, today's language models have shown promising capabilities on formal reasoning benchmarks, with multiple models scoring above 80% on the <a href="https://paperswithcode.com/sota/math-word-problem-solving-on-math">MATH benchmark</a> of word problems, and they can <a href="https://github.com/atlas-computing-org/CoqLeanTranslation">translate between specification languages</a>. While previously only specialists could write formal specifications, AI assistants are increasingly enabling non-specialists to participate in the formalization process, opening this approach to widespread adoption.</p><p>Q: Haven&#8217;t there been attempts to create robust ontologies for formalizing?</p><p>A: The Internet's early days saw numerous attempts at creating knowledge databases and ontologies, most of which failed because people needed to learn complex schemas in order to use them. The critical difference today is that AI can learn the rules, syntax, and terms for a formal specification language, and handle the translation between natural language and formal specifications, removing this adoption barrier. 
There are even systems like <a href="https://www.doc.ic.ac.uk/~rak/papers/LPOP.pdf">Logical English</a>, which was constructed to be easy for an English speaker with no formal logic experience to read and understand. Writing in such a system is challenging, but having a formal spec in Logical English that could be compiled to a more succinct mathematical logic (by a formally verified compiler) would give a user incredibly high leverage without having to learn to write in a new ontology.</p><p>Q: Proving things about software is challenging enough, but how can you claim to prove things about the real world?</p><p>A: Formal verification provides mathematical guarantees about certain properties, but those guarantees are only as good as the specifications and world models they're based on. To understand the power and limitations of specification-driven AI, I think about four components in the verification workflow:</p>
srcset="https://substackcdn.com/image/fetch/$s_!f50l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef22b019-9ee2-45ff-8314-420b6f030271_1310x143.png 424w, https://substackcdn.com/image/fetch/$s_!f50l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef22b019-9ee2-45ff-8314-420b6f030271_1310x143.png 848w, https://substackcdn.com/image/fetch/$s_!f50l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef22b019-9ee2-45ff-8314-420b6f030271_1310x143.png 1272w, https://substackcdn.com/image/fetch/$s_!f50l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef22b019-9ee2-45ff-8314-420b6f030271_1310x143.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Only the spec and the model can be written down and examined. A model &#8660; reality gap will grow because our knowledge of the universe is incomplete and our ability to express all relevant aspects of the world is limited. The best mitigation is to build better tools for people to contribute collaborative improvements to the world model, which is an active area of <a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai/">research and investment</a>.</p><p>Specification-based approaches can mathematically prove there's no spec &#8660; model gap, ensuring that what gets built actually matches what was specified. This is powerful because it eliminates an entire class of implementation errors.</p><p>The intent &#8660; spec gap represents the challenge of translating what you want into a formal specification.For example, a self-driving car might have a specification to "maintain safe distance from obstacles" that fails to account for momentum or slippery roads. One mitigation is to find simpler properties that are easier to formalize and/or prove (e.g. the car shouldn&#8217;t drive over a certain speed if ice appears to be present). Another is to provide tools to help users reason about the spec and its implications in the world model, asking questions and understanding design implications.Our approach provides tools to explore edge cases and test specifications against diverse scenarios before implementation.</p><p>Rather than claiming to eliminate all gaps between intent and reality, specification-driven AI gives us concrete ways to narrow these gaps systematically while providing mathematical guarantees for the parts that can be formalized.</p><h1>5. Progress and next-steps for spec-driven AI</h1><p>Several tools and initiatives are emerging to enable a transition from an alignment-based paradigm to a specification-driven paradigm. This section outlines the current landscape and future directions.</p><p>The main components for formal verification are specifications, solutions, and proofs that the solutions satisfy the specs. To deploy these systems widely we need progress making sure we have good specs and good proofs, as well as putting the whole thing together:</p><h2>Tools for specs</h2><h3>Specification validation:</h3><p>To ensure specifications match intent, we need spec validation tools can generate test cases, counterexamples, and natural language explanations of formal specifications. 
<p>These validation tools help humans understand the consequences of their specifications before implementation. My nonprofit, <a href="http://atlascomputing.org/">Atlas Computing</a>, is building a specification IDE that uses an LLM to help users understand and improve the mapping between a natural language specification and a formal spec, closing the aforementioned intent &#8660; spec gap.</p><p>The goal is essentially to load the relevant components of the specification into a person's brain so they can understand if that's actually what they want. This approach is more robust than externalizing only a subset of requirements and hoping the implementation system correctly interprets them.</p><h3>AI-Powered Translation:</h3><p>Large language models can translate between natural language statements and formal specifications, making formalization accessible to non-specialists. This removes the primary barrier that prevented earlier ontology projects from succeeding. You no longer need to learn complex schemas&#8212;AI can handle that translation for you.</p><h2>Tools for proof generation</h2><h3>Verification systems</h3><p>Proof verification systems like Coq and Isabelle have been under development for decades (recently joined by Lean as a rising star) and will serve as infrastructure for proof generation. These systems provide the mathematical foundation for verifying that implementations meet specifications. While these tools have historically required specialized expertise, improved interfaces and language model integrations are making them more accessible.</p><p>The growing interest in formal verification has also led to renewed investment in formal verification tools and projects, including new proof languages, proof libraries, and proof verification systems. The world's largest formal logic team is the Automated Reasoning Group (ARG) at Amazon Web Services; this team includes Leonardo de Moura, who is also leading a Focused Research Organization with significant funding for 5 years to <a href="https://lean-fro.org/about/roadmap-y2/">improve the Lean proof assistant</a> as a foundational tool in formal verification. The Lean FRO (Lean Focused Research Organization) has received approximately $50 million in funding to advance the Lean theorem prover and make formal mathematics more accessible.</p><h3>AI-based Solution and Proof Generation</h3><p>Several organizations are developing systems to automate the generation of mathematical proofs:</p><p><strong>OpenAI and Epoch AI:</strong> OpenAI has funded Epoch AI to collaborate on the FrontierMath benchmark, hoping to improve the mathematical capabilities of language models. Their research has shown promising results in applying language models to formal mathematics.</p><p><strong>AlphaGeometry2:</strong> In February of this year, DeepMind's AlphaGeometry2 demonstrated the ability to perform at <a href="https://arxiv.org/abs/2502.03544">gold-medal level</a> on International Mathematical Olympiad questions. AlphaProof, another DeepMind project, is <a href="https://lean-lang.org/spotlight/">built using Lean</a>.</p><p><strong>Startups:</strong> Harmonic has raised $18 million to build tools for formal verification and mathematical reasoning, with the goal of enhancing AI safety through provable guarantees.
Similarly, Morpheus (founded by OpenAI alumni) raised $20 million to develop systems that can generate and verify mathematical proofs.</p><p><strong>DARPA:</strong> DARPA has run multiple programs over the past decade to improve tools, methods, and practices around formal verification and <a href="https://www.darpa.mil/news/2023/formal-methods-large-scale">continues its support</a> of the field, due to the promise it shows for enhancing national security. Its Automated Rapid Certification Of Software (<a href="https://www.darpa.mil/research/programs/automated-rapid-certification-of-software">ARCOS</a>) program aims to develop tools for automated formal verification of mission-critical software, while its Pipelined Reasoning of Verifiers Enabling Robust Systems (<a href="https://www.darpa.mil/research/programs/pipelined-reasoning-of-verifiers-enabling-robust-systems">PROVERS</a>) program aims to facilitate proof repair.</p><h2>Spec-driven AI coordination</h2><p>I&#8217;m also co-organizing the most recent in a series of workshops with the authors of the Guaranteed-Safe AI paper; those authors include Yoshua Bengio (Turing Award winner), David &#8220;davidad&#8221; Dalrymple (who leads the &#163;59M Safeguarded AI research program at ARIA), and multiple other highly regarded professors in the fields of AI and formal verification. The goal of these workshops is to coordinate with various researchers, funders, and organization builders in order to identify gaps and help drive adoption of specification-based AI as a widely useful and safer approach. These efforts aim to identify high-priority domains for initial application of specification-driven approaches and to develop roadmaps for necessary technical advancements.</p><h2>A Path Forward</h2><p>As AI systems become more powerful, the question of how to govern them becomes increasingly urgent. The specification-driven paradigm offers a practical, democratic alternative to values-based alignment approaches. By treating AI systems more like contractors than employees&#8212;providing clear specifications rather than hoping they internalize our values&#8212;we can build systems that reliably do what we want because we can verify it.</p><p>This approach requires advances in formal methods, natural language understanding, and human-computer interaction. It also demands a shift in mindset from both AI developers and policymakers. Rather than building increasingly autonomous systems and hoping they share our values, we should build increasingly verifiable systems that demonstrably follow our rules.</p><h2>We&#8217;re seeking collaborators:</h2><p>Please reach out if:</p><ul><li><p><strong>You work in research</strong> and are interested in developing any of the aforementioned tools</p></li><li><p><strong>You work in policy</strong> and want to explore expressing regulations as formal specifications</p></li><li><p><strong>You work in industry</strong> and want to try out verification technologies to more safely leverage AI advancements</p></li><li><p><strong>You want to know more</strong> about how to demand transparency and verifiability from AI systems that impact your life</p></li></ul><p>The future of AI doesn't have to be a choice between stagnation and uncontrolled autonomous AI systems. With specification-driven approaches, we can harness AI's transformative potential while maintaining meaningful human oversight.
The time to build this future is now, before powerful AI systems become too entrenched in the alignment-based paradigm to change course.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Want to receive future posts to your inbox?</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Our Report On AI-Enabled Tools For Scaling Formal Verification]]></title><description><![CDATA[Scaling Human Understanding and Review Capacity]]></description><link>https://blog.atlascomputing.org/p/our-report-on-ai-enabled-tools-for</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/our-report-on-ai-enabled-tools-for</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 06 Aug 2024 19:06:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pPM1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99f9a2ec-861f-4146-ac7b-481ad38ada91_1205x468.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Scaling Human Understanding and Review Capacity</h2><p>At Atlas Computing, we focus on advancing humanity&#8217;s understanding and enhancing review capacity through innovative technologies. We&#8217;re excited to share our latest report, which outlines the tools that we think are needed to understand the software that runs our world, as well as all the software that AI systems will generate in the coming years.&nbsp; The report, titled "<a href="https://atlascomputing.org/ai-assisted-fv-toolchain.pdf">AI-Assisted Code Specification, Synthesis, and Verification</a>," outlines a widely-applicable strategy and lists the modular tools that we believe will dramatically facilitate the use of formal verification.</p><p>Formal verification (FV) is the gold standard for security and stability to ensure that software is behaving as intended. 
However, traditional FV processes are labor-intensive and costly, which has often limited their application to only the most critical subsystems; we believe advances in AI will make formal verification the dominant form of software development in the near future.&nbsp; In the report, we elaborate on the potential and current limitations of formal verification, outline existing formalization workflows, and describe the 12 modular projects that can automate steps in those workflows.&nbsp; We recommend the 2-page executive summary at the start of <a href="https://atlascomputing.org/ai-assisted-fv-toolchain.pdf">the document</a> for more information, but you can see the main diagram below.&nbsp; The report was created in collaboration with the Topos Institute and was funded by Protocol Labs and the Survival and Flourishing Fund.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pPM1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99f9a2ec-861f-4146-ac7b-481ad38ada91_1205x468.png"><img src="https://substackcdn.com/image/fetch/$s_!pPM1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99f9a2ec-861f-4146-ac7b-481ad38ada91_1205x468.png" width="1205" height="468" alt="Main diagram from the report on AI-assisted code specification, synthesis, and verification"></a></figure></div><h2>Next Step: Prototyping</h2><p>We are actively looking for funders, potential users, and talented engineers/researchers to join us in this exciting endeavor.
If you are interested in funding our work, becoming a user of our tools, or contributing to the research and development efforts, please reach out to us at <a href="mailto:hello@atlascomputing.org">hello@atlascomputing.org</a></p><p>Your support and participation can help us accelerate the development and adoption of these innovative tools, ultimately contributing to a safer and more secure digital world.&nbsp; Together, we can scale human understanding and review capacity, making formal verification an integral part of software development.</p>]]></content:encoded></item><item><title><![CDATA[Announcing Flexible Hardware-Enabled Governors]]></title><description><![CDATA[Imagine if every uranium atom pulled out of the ground had its own international atomic energy inspector tasked to follow it, report on its usage, perhaps track its location.]]></description><link>https://blog.atlascomputing.org/p/announcing-flexible-hardware-enabled</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/announcing-flexible-hardware-enabled</guid><dc:creator><![CDATA[Atlas Computing]]></dc:creator><pubDate>Wed, 31 Jul 2024 15:22:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nlv7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff370c2bd-71e7-4808-a3c6-f0e1be165e0d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine if every uranium atom pulled out of the ground had its own international atomic energy inspector tasked to follow it, report on its usage, perhaps track its location. Ideally this system would even stop the atom from being radioactive if it were being used unsafely, or convert it to lead if the inspector were removed. If this were possible, it would be an incredible boon to nuclear deterrence.</p><p>Flexible Hardware-Enabled Governors (flex-HEGs) provide this level of oversight and reporting for datacenter GPUs. If you believe AI can pose a risk similar to nuclear war (which many leading experts do, in <a href="https://www.safe.ai/work/statement-on-ai-risk">this statement on AI Risk</a>), then you may have joined the call for an international governance body for AI. We believe this type of international governance requires technological support for transparency, reporting, and collaboration.</p><p>Atlas Computing was created to help humans better understand and review increasingly automated systems. Because using AI safely may require building AI safely, we're excited to announce that Atlas Computing will help prototype Hardware-Enabled Governors (HEGs).</p><p>HEGs are specialized hardware components integrated into GPUs and other high-performance computing devices that allow compliance with AI safety best practices by enabling transparent, privacy preserving monitoring of AI training and deployment processes.&nbsp;</p><p>Humanity needs a system that prevents any actor from bypassing established safety rules. This, in turn, is critical for preventing catastrophic misuse or accidental deployment of dangerous AI capabilities. 
By embedding governance mechanisms directly into the hardware, HEGs enable coordination between mistrustful AI developers or regulators.<br></p><h3>What are Hardware-Enabled Governors?</h3><p>By embedding compliance mechanisms directly into AI hardware, HEGs ensure that agreed-upon rules are enforced at the most fundamental level.&nbsp; While the concept inherits a long tradition of hardware compute governance*, we use the term coined in <a href="https://yoshuabengio.org/wp-content/uploads/2024/07/flex-HEGs-memo.pdf">this post</a> on Yoshua Bengio&#8217;s blog and the requirements therein.</p><p>HEGs consist of three key components:</p><ol><li><p>A compliance processor that determines if AI generation meets negotiated reporting thresholds</p></li><li><p>A tamper-responsive mechanisms to maintain system integrity</p></li><li><p>An offline power source to ensure continuous operation of tamper-response</p></li></ol><p>These layers enable a wide range of regulatory capabilities, such as ensuring that safety best practices are used for any training run over a certain size. HEGs leverage secure enclaves and sophisticated methods to distinguish between training and inference. This allows for targeted governance without impeding benign AI applications.<br></p><h3>Why is this important?</h3><p>As nations recognize the potential of advanced AI systems, they may rush to develop these technologies for military or strategic purposes, similar to the nuclear arms race, incentivizing fewer safety precautions and increasing the risk of catastrophes or loss of control.&nbsp;</p><p>HEGs could serve as concrete demonstrations of responsible behavior, providing reliable evidence that safety protocols are followed and allowing all players to transparently show adherence to best practices. While they don't define or guarantee safe AI, they ensure consistent application of best practices, reducing safety lapses. As a result, they shift incentives toward cooperation and responsible development, stabilizing the AI landscape and preventing the unchecked escalation of AI capabilities, which could have catastrophic global impacts.<br></p><h3>Who decides what is safe?</h3><p>Let&#8217;s start by noting that the creators of HEGs should <strong>not</strong> be the people responsible for defining safety best-practices or compute thresholds.</p><p>Establishing well-defined beliefs and standards about AI risk is crucial for the effectiveness and integrity of HEGs, but also fairly separate from creating the HEGs themselves. Ideally, policymakers or subject matter experts should make these decisions, potentially starting from something like <a href="https://www.alignmentforum.org/posts/Zfk6faYvcf5Ht7xDx/compute-thresholds-proposed-rules-to-mitigate-risk-of-a-lab">this post on compute thresholds</a>. This approach ensures that AI governance standards are set via representative decisions that incorporate expert opinions.&nbsp;</p><p>A common misconception is that having a tool for understanding and governance necessarily provides a government or manufacturer centralized regulatory control. This isn't the case. We would be supportive of an international consortium of frontier labs, governments, and/or device manufacturers interested in setting plans to use and implement HEGs. This could even be created and mandated through something like a professional society requiring all frontier labs that hope to employ top talent to adopt these measures. 
</p><p>The Atlas perspective is that humans should understand and oversee these technologies, ideally accountable humans. However, accountability to a nation state is not necessary (nor desirable).<br></p><h3>What is Atlas doing?</h3><p>Atlas Computing's mission is to improve humanity's capacity for review. This goal clearly includes the objective of scaling the capacity to assess the creation of AI systems from development to deployment. HEGs support this mission by providing tools that enhance transparency and accountability in AI development, allowing us to better understand what AI systems are being created.</p><p>Currently, Atlas is focused on advancing the hardware aspects of HEGs. Leveraging Evan's experience in hardware systems and a strong professional network, we are well-positioned to help drive this initiative. Atlas will also coordinate efforts among various stakeholders, including funders, potential grantees, startups, nonprofits, researchers, and developers. This top-level coordination is essential for translating our vision into clear, concrete progress.&nbsp; Additionally, we&#8217;ve brought on Mehmet Sencan to move technological readiness levels on components for at least the next 3 months, thanks to support from the AISTOF.</p><p>Our goal is to demonstrate a proof of concept for the tamper response mechanism (Technological Readiness Level 3) by this fall, with plans to achieve scalable technology development (TRL 4) by February and prototype demonstration (TRL 5 or 6) on all subcomponents by the end of 2025.<br><br>The implementation of HEGs should be open-hardware to foster collaboration and improve security.&nbsp; This technology is not intended to be controlled by one party by another, but rather a tool for the broadest possible coordination.&nbsp;</p><h3><br>A Broader Theory of Change</h3><p>Ensuring that AI systems are created safely is a sociotechnical problem that requires both technical and policy solutions.&nbsp;</p><p>Proving the feasibility of the technical solution is essential. This involves identifying abstraction boundaries, de-risking the components,&nbsp; and demonstrating integration. Technical de-risking must ensure the device can be built quickly and cheaply without a meaningful impact on device performance. <br><br>There may be concerns from manufacturers if they don't see the value or need for these interventions, which can be addressed with policy incentives. The solutions need to be affordable and scalable, deployed on an incredibly large scale. The machines must be robust against adversaries, both in hardware and in adversarial environments.</p><p>In addition to the technical solution, engaging with policymakers and stakeholders throughout is necessary to ensure that solutions are practical and can be widely accepted. This integrated approach is vital for stabilizing the AI development landscape and promoting responsible AI advancements globally.</p><h3><br>A Call to Action</h3><p>Implementing policies to enforce these measures is key and requires approval from policymakers and subject matter experts. Convincing policymakers involves demonstrating why these measures are necessary and integrating them into existing IP and licensing frameworks. 
This socio-technical problem demands both technical de-risking and substantial dialogue with policymakers to establish robust standards.<br></p><p>A group of funders and researchers are advocating for and actively pursuing development of HEGs.&nbsp; Atlas Computing is helping to develop this community to drive this initiative forward. If you are interested in funding this project or contributing as an engineer, researcher, or developer, please reach out to us.&nbsp; We&#8217;re keeping the community somewhat small to move quickly, and cannot guarantee we&#8217;ll include you, but would love to hear from you and will include you as soon as it feels like it will accelerate the cause.</p><p>*also see: <a href="https://www.cnas.org/publications/reports/secure-governable-chips">https://www.cnas.org/publications/reports/secure-governable-chips</a>&nbsp;</p><p><a href="https://www.governance.ai/post/computing-power-and-the-governance-of-ai">https://www.governance.ai/post/computing-power-and-the-governance-of-ai</a></p><p><a href="https://futureoflife.org/ai-policy/hardware-backed-compute-governance/">https://futureoflife.org/ai-policy/hardware-backed-compute-governance/</a>&nbsp;</p><p><a href="https://arxiv.org/abs/2402.08797">https://arxiv.org/abs/2402.08797</a></p>]]></content:encoded></item><item><title><![CDATA[Retrospective on Mathematical Boundaries Workshop]]></title><description><![CDATA[Primarily written by Evan Miyazono (with help from Manuel Baltieri and others) - mistakes my own]]></description><link>https://blog.atlascomputing.org/p/mathematical-boundaries-retro</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/mathematical-boundaries-retro</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 07 May 2024 18:37:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nlv7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff370c2bd-71e7-4808-a3c6-f0e1be165e0d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>Primarily written by Evan Miyazono (with help from Manuel Baltieri and others) - mistakes my own</p></blockquote><h2>Minimum viable introduction</h2><ul><li><p>We ran a workshop on Mathematical Boundaries from April 10-14. 
This was the successor of the <a href="https://formalizingboundaries.ai/">Conceptual Boundaries workshop</a> Feb 10-12</p><ul><li><p>The overlap in participants and approach was fairly low (notably lower than intended, due to availability restricting participation, which in turn led to a natural difference in approach)</p></li><li><p>Intent:</p><ul><li><p>The first workshop was intended to develop a sense of what one might want to do with boundaries, and explore possible avenues</p></li><li><p>This event was more focused on making mathematical design decisions that would lead to a more concrete model that was opinionated enough to be useful (the natural question becomes &#8220;useful for what&#8221;)</p></li></ul></li></ul></li><li><p>You&#8217;re probably here because you want to see the outputs, so let&#8217;s get to them:</p></li></ul><h2>Outputs from the workshop</h2><p>Here are write-ups started during writing sessions during the workshop</p><ul><li><p>Manuel Baltieri 1: <a href="https://www.dropbox.com/scl/fi/5wryuiyzejn4olxcib3iv/CrossingBoundaries.pdf?rlkey=l7ibmyuy4rs5l1nbbr0qd4tfx&amp;dl=0">Crossing boundaries</a>&nbsp;</p></li><li><p>Manuel Baltieri 2: <a href="https://www.dropbox.com/scl/fi/m8edmqwcj9uot8yp52azi/FightingBoundaries.pdf?rlkey=u213vbxtpu2wwmlla9o2eogvo&amp;dl=0">Fighting for boundaries</a></p></li><li><p>Kevin Carlson: <a href="https://forest.localcharts.org/kdc-0006.xml">Nondeterministic dynamical systems and crossing boundaries </a>&nbsp;</p></li><li><p>Martin Biehl 1: <a href="https://docs.localcharts.org/s/gxTknAr8W#">Gliders and similar phenomena in (categorical) systems theory</a>&nbsp;</p></li><li><p>Martin Biehl 2: <a href="https://docs.localcharts.org/s/OOBkquFcY#">Towards a more general law of requisite variety</a>&nbsp;</p></li><li><p>Owen Lynch: <a href="https://forest.localcharts.org/ocl-001V.xml">Grothendieck lenses for functors into 2Cat</a>&nbsp;</p></li><li><p>Sophie Libkind: <a href="https://topos.site/blog/2024-04-25-ontological-commitments-for-boundaries/">Ontological commitments for boundaries</a></p></li><li><p>Nathaniel Virgo: <a href="https://docs.localcharts.org/s/QKcwqm5ai">Boundaries and Good Regulators</a></p></li></ul><p>Noting that I&#8217;m getting these to you before I&#8217;ve read them, so don&#8217;t expect me to be able to answer questions about them.</p><p>Also worth noting, Nathaniel Virgo and Martin Biehl participated in <a href="https://www.youtube.com/watch?v=lEzJfJsXbyE">this panel discussion at a later workshop</a> in Kyoto, where we discussed some of the issues that came up at the boundaries workshop</p><h2>General structure from the workshop</h2><p>The general daily structure was scheduled to be &#8220;a talk and a breakout session before lunch, then a breakout session and a longer-form discussion after lunch,&#8221; though we weren&#8217;t particularly strict adherents.&nbsp;</p><p>We found on Thursday (day 1 of 4) that the group wanted to continue discussing after Martin&#8217;s interesting talk and ending up doing more like &#8220;A talk and a discussion, followed by breakouts after lunch.&#8221;&nbsp; Thursday breakouts were (1) a session on trying to work out a cocategorical formalism for specifying things via wholes in which they participate, rather than by composing together their parts and a session on, and (2) an idea to formalize / keep track of gliders as non-deterministic or possibilistic closed dynamical systems.</p><p>Friday morning Nathaniel gave a talk on control theory that was so engaging we reached a 
consensus on pointing the rest of the workshop towards fleshing out adjacent ideas.</p><ul><li><p>One breakout for the rest of the day Friday was focused on choosing formalisms for various words in Nathaniel&#8217;s talk, and resulted essentially in Sophie&#8217;s blog post.</p></li><li><p>The other ended up focusing on an idea of generalizing the law of requisite variety, resulting in Martin&#8217;s second write-up.</p></li></ul><p>Saturday was primarily time for writing down outputs (learnings from last time: have a big block of time to support people in generating written artifacts), and also included a small breakout group on nondeterminism (that one led to Kevin&#8217;s blog post).</p><p>Sunday morning some individuals started departing and we had some visitors; most activities involved chatting about a wide range of topics after an intense few days.</p><h2>Next steps</h2><ul><li><p>We're still genuinely interested in boundaries and would like to see additional work happen. We're exploring funding options for work on these open problems, so email me (evan@atlascomputing.org) if you would like to work on them.</p></li><li><p>One possible next step is setting up a workshop adjacent to a conference that most of the Mathematical Boundaries Workshop participants are likely to attend.</p><ul><li><p>Interestingly, it seems like the attendees were split somewhat [40]/[40]/[20]% between researchers who seem most likely to attend conferences exclusively in [<a href="https://en.wikipedia.org/wiki/Applied_category_theory">applied category theory</a>], [<a href="https://en.wikipedia.org/wiki/Artificial_life">artificial life</a>], and [cross-domain and para-academic conferences like this one], which I think makes this goal hard, but also makes the conversations at such an event particularly interesting.</p></li></ul></li></ul><h2>Evan&#8217;s personal takes</h2><p>Here are some notes that are very specific to me.</p><ul><li><p>How it differed from the first one:</p><ul><li><p>Chris, Manuel, and I set out with the intent of bringing people together to build mathematical models of boundaries.&nbsp; As a result, we ended up inviting more people with stronger math backgrounds, and people who we expected, based on prior interactions and training, to be inclined toward formalization and to reach for math as a tool.</p></li></ul></li><li><p>Where I could have done better:</p><ul><li><p>It wasn&#8217;t ex ante clear that much moderation would fall to me; there was some hope that davidad would be able to attend, but through no fault of his, he was unable to.</p></li><li><p>Believing that I knew enough math to even moderate this workshop was probably my greatest act of hubris since at least founding Atlas Computing.&nbsp; I knew I didn&#8217;t have enough background knowledge to contribute, but I thought at least I would be able to make proposals that could be iterated on to reach a local equilibrium; others were far better than I was at identifying what the participants agreed was a better starting point.</p><ul><li><p>Huge thanks to Manuel Baltieri and Brendan Fong for taking the reins.</p></li></ul></li></ul></li><li><p>What&#8217;s next from here:</p><ul><li><p>I&#8217;m not sure how involved in logistics, curation, or moderation of future boundaries workshops I&#8217;ll be.&nbsp; I&#8217;ll likely advocate for their utility, and potentially support aspects like fundraising and translation, but I think I&#8217;d be happy if others took up the mantle.
(To be fair, that&#8217;s what I said before the first and second workshops as well, though &#128517;)</p><ul><li><p>This could be particularly compelling if it invited participation from a broader conference &#8211; if someone would like to&nbsp;propose this event as a side event to a relevant existing conference, please reach out! </p></li></ul></li><li><p>To the extent that davidad&#8217;s ARIA program is focused on building a github for science, but not a monorepo of science, I think it could be really valuable to have the following:</p><ul><li><p>If you have two &#8220;repositories&#8221; of interoperable / composable scientific theories, we should be able to identify boundaries and define boundary violations in each &#8220;repository&#8221; in a way that we&#8217;re confident that specifying a boundary violation in one scientific model (combination of scientific theories) is sufficient to confidently identify the same boundary violation in another scientific model.</p></li><li><p>At this point, Manuel, Brendan, and I are discussing what it would look like to organize a continuation on this theme.&nbsp; On the bright side, this starts highlighting and framing concrete problems that could be solved.&nbsp; On the other hand, pursuing solutions to this specific problem could also significantly diverge from the original VAPE formulation from Critch&#8217;s &#171;boundaries&#187; formulation.&nbsp;</p></li></ul></li></ul></li></ul><p>Lastly, here are some random assorted brief insights that I liked:</p><ul><li><p>Some boundaries are (sets of) physical boundaries. Others are parameter regimes, and might be better called &#8220;margins&#8221; or &#8220;viability regimes&#8221;. These seem sufficiently distinct that they&#8217;re worth calling by different names. &#8220;Membranes&#8221; may work well for singling out the &#8220;physical&#8221; boundaries, which don&#8217;t have to actually be literally made out of matter but should demarcate an agent&#8217;s &#8220;body&#8221; from its environment, rather than the space of happy states for an agent from its space of sad states.</p></li><li><p>Models could be defined as low-loss compressions of the environment and agents could be defined as models that scale in complexity with the scope of the universe unless you ascribe them some telos or desires.</p></li></ul><p>Feel free to comment here, or reach out to via email (first name at domain.org).</p>]]></content:encoded></item><item><title><![CDATA[Evan's thoughts on boundaries (Apr 2024)]]></title><description><![CDATA[AI Guardrails may benefit from formalizing the concept of boundaries]]></description><link>https://blog.atlascomputing.org/p/evans-current-thoughts-on-boundaries</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/evans-current-thoughts-on-boundaries</guid><dc:creator><![CDATA[Evan Miyazono]]></dc:creator><pubDate>Tue, 09 Apr 2024 00:47:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nlv7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff370c2bd-71e7-4808-a3c6-f0e1be165e0d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>This blog post is an informal summary of progress on a technical topic. 
Feel free to let us know in the comments if you want to see more or less content like this.</p></blockquote><p>Evan helped organize and run the <a href="https://formalizingboundaries.ai/">&#8220;Conceptual Boundaries&#8221; workshop</a>, which was initiated and primarily organized by Chris Lakin.  This workshop was intended to extend Andrew Critch&#8217;s initial work on &#171;Boundaries&#187; <a href="https://www.lesswrong.com/s/LWJsgNYE8wzv49yEc">here</a>, though I&#8217;ll try not to assume you&#8217;ve read any of that.</p><h3>Notes on these notes (i.e. meta-notes):</h3><ol><li><p><strong>This is not an official summary or debrief from the event, and has been published without input from the other workshop participants.&nbsp; This is simply a summary of things I wanted to either remember or share with others.</strong></p><ul><li><p>Assume insights are primarily from other participants, while controversial takes are my own.</p></li></ul></li><li><p>I&#8217;ll try to answer questions in various venues, either based on my opinions or participant notes from the workshop, but the participant notes themselves are not directly sharable.</p><ol><li><p>That said, if you&#8217;re curious what a particular participant thought of the workshop, they likely have an early draft of something they started writing on the last day of the workshop, and your question might be enough to prompt them to finish writing and publish the result.</p></li></ol></li><li><p>I like outline formats, as they should let you skim at your desired level of detail.</p><ol><li><p>You&#8217;re welcome to leave feedback on this format.</p></li></ol></li></ol><h3>Context on the workshop</h3><ol><li><p>A conversation at a Foresight Institute event initially prompted the workshop. Chris, Allison Duettmann, and I agreed on the need for more work in this area, leading to Chris committing to lead a workshop with support from others.</p><ol><li><p>I&#8217;ve personally been surprised at the amount of interest in the workshop generally, and the variety of paths that have led interested people to the topic.</p><ol><li><p>One reason might be that there&#8217;s a modern movement in the mental health space&nbsp;(e.g. described <a href="https://www.parapraxismagazine.com/articles/boundary-issues">here</a>) to recast many interpersonal issues in terms of boundaries, which has the interesting perspective of mapping customs onto the concept of property.&nbsp; This didn&#8217;t come up in the workshop; I just thought it was interesting.</p></li></ol></li></ol></li><li><p>The workshop itself included participants who had fairly different backgrounds, context, and goals for boundaries.&nbsp;&nbsp;</p><ol><li><p>As a result, much of the time was spent trading context, figuring out what assumptions or goals were shared</p></li></ol></li><li><p>At the end of the workshop, many participants (as well as the organizers, i.e. Chris and I) felt we&#8217;d done an interesting breadth-first exploration and wanted a second workshop.</p><ol><li><p>The goal of the next workshop is to lay the foundation for boundaries as a new research subfield by developing clear and useful definitions, identifying interesting open problems, and setting goals that we think boundaries research agendas could achieve.</p><ol><li><p>This next workshop starts April 10th (i.e. 
two days from writing this)</p></li><li><p>Most of this document is my attempt to provide a download of my thoughts leading into that workshop.</p></li></ol></li></ol></li></ol><h3>My hope for boundaries:&nbsp;</h3><ol start="3"><li><p>If you&#8217;ve come to this page via the Atlas Computing <a href="https://atlascomputing.org/">website</a>, you probably know that we&#8217;re working to build safeguards for AI, and one way to achieve that might be to provide some baseline constraints.</p><ol><li><p>in other words, can we define boundaries in a way that is both </p><ol><li><p>sufficiently grounded in quantifiable, objective (i.e. not subjective) information so that an AI could be trusted to understand what constitutes a boundary and a boundary violation <br>AND</p></li><li><p>is sufficiently useful as a framework that it can easily be made consistent with most people&#8217;s intuition for what a boundary is</p><ol><li><p>There&#8217;d necessarily be some parameters to set/tune, but the goal would be to have most of the heavy lifting done by the framework rather than, for instance needing to use something like a boundaries language to generate descriptions on a case-by-case basis.</p></li></ol></li><li><p>Unsurprisingly, this has a lot of overlap with <a href="https://www.lesswrong.com/posts/HnWiSwyxYuyYDctJm/what-does-davidad-want-from-boundaries">What does davidad want from &#171;boundaries&#187;?</a>, as davidad is an advisor of Atlas Computing.</p></li></ol></li><li><p>This could look like some abstracted version of object identification that also encodes some notion of separability or independence of objects. </p><ol><li><p>Current object identification mostly requires existing data or explanations of what a thing is before it can start identifying instances of that thing, or identifies a thing because its pieces move together; boundaries should identify a thing because of some aspects of its &#8220;thing-ness&#8221;.</p><ol><li><p>I&#8217;ll give a very lossy summary of Critch&#8217;s <a href="https://www.lesswrong.com/s/LWJsgNYE8wzv49yEc/p/HrtqLy46Fx7xqRrMo#Definition_part__a____part_of_the_world_">VAPE formalization of boundaries</a> here:</p><ol><li><p>You can define a set of Viscera, Active boundary (or Actions), Passive boundary (or Perception), and Environment states that interact with each other, modeled as a <a href="https://en.wikipedia.org/wiki/Bayesian_network">Bayesian network</a></p></li><li><p>These states are limited in what they can act on (e.g. 
environment and viscera act on each other only indirectly, via the active and passive boundaries)</p></li><li><p>This model assumes discrete time, but empowers you to potentially label different parts of the world (or an image, simulation, or video) as different parts V, A, P, or E.</p></li><li><p>If this is interesting, you should at least read <a href="https://www.lesswrong.com/s/LWJsgNYE8wzv49yEc/p/HrtqLy46Fx7xqRrMo">that whole post</a>, if not <a href="https://www.lesswrong.com/s/LWJsgNYE8wzv49yEc">the whole sequence</a>.</p></li></ol></li></ol></li></ol></li><li><p>If we had this way to identify objects, maybe we could identify a minimum viable set of boundaries, where, if they were not violated by an action, then we could be confident that the action did not result in a catastrophic unforeseen (and therefore unspecified) outcome.</p><ol><li><p>A simple example: if you can assure that an agent&#8217;s strategy for making a cup of tea doesn&#8217;t end respiration for any humans, perhaps you could claim that it&#8217;s more likely that the strategy [makes a cup of tea and doesn&#8217;t kill anyone] than the strategy [makes a cup of tea AND creates a hellscape that maintains respiration]. (My language is a little facetious/hyperbolic, but hopefully you get the idea.)</p></li><li><p>If a system can identify boundaries objectively and understand what it means to violate them, we can validate if an action violates a boundary via something like formal methods.</p><ol><li><p>This could be important because you can use a Safeguarded AI architecture in conjunction with an objective definition of boundaries without worrying about if the AI is trying to subvert your goals*.</p></li></ol></li></ol></li></ol></li><li><p>I really like the perspective that &#8220;boundaries might provide a way to identify the nouns in a normative language&#8221;.&nbsp; </p><ol><li><p>If you want to make statements about what <em>things</em> <strong>should</strong> do (with or to other things), you probably want an objective way to start identifying <em>things</em>.&nbsp;</p><ol><li><p>As an example: operationalizing the statement &#8220;people shouldn't hurt others&#8221; requires definitions of people, others, and hurt that should minimally rely on interpretation so that observers can agree if a proposed or past action violates the statement.</p></li></ol></li><li><p>Part of what I like about this framing is that I&#8217;ve found it fairly compelling to map ethical and political questions into the framework of &#8220;which boundary takes precedence in this case&#8221;, which is nontrivial because people on both sides of an argument seem willing to accept that both sets of boundaries DO exist.</p><ol><li><p>E.g. pro-choice vs pro-life could be mapped onto the questions &#8220;when does a fetus&#8217;s boundary exist independently from the boundary of the person in whose uterus the fetus exists?&#8221; and &#8220;when do governments have the right to violate the will/boundaries of constituents&#8221;</p></li><li><p>E.g. 
immigration could become a question about &#8220;how do we distinguish the benefits of being inside the intersecting boundaries of &#8216;physically in the country&#8217; vs &#8216;citizen of the country&#8217;?&#8221;</p></li></ol></li></ol></li></ol><h3>Some topics that were discussed:</h3><ol start="6"><li><p>Boundary protocols</p><ol><li><p>In practice, you <strong>do</strong> want boundaries to be crossed or modified under the right conditions, because the alternative is stagnation.&nbsp; An organism with perfectly preserved boundaries will starve; preserved national boundaries prevent trade; etc.</p><ol><li><p>Realistically, you want to be able to describe (and perhaps even infer) when it&#8217;s acceptable to the object for something to cross its boundary.</p><ol><li><p>One challenge is that cells seem to love letting viral DNA in, but that feels like a boundary violation. </p></li><li><p>Meanwhile, only some people want surgeons to operate on their cancer, so language and the study of informed consent clearly also play a role at some level.</p></li></ol></li><li><p>Boundary protocols are embedded in physical reality (e.g. cell receptors on the boundary of a cell encode what is allowed in).</p><ol><li><p>How would one infer boundary protocols? And how would a protocol be updated or renegotiated?</p></li></ol></li><li><p>My Q: How much of a boundary protocol can you infer from observation?&nbsp; </p><ol><li><p>For example, by only observing people within a culture, is it possible to learn the social norms sufficiently to participate without causing disruptions? Could you learn them well enough to not change the culture if you now made up &gt;90% of the participants? I&#8217;m not sure you could, which prevents this approach from enabling AI to act ethically. It still might not limit its ability to act safely, though; there are interventions (like destroying a food supply) that clearly disrupt a culture in a predictable way.</p></li></ol></li></ol></li></ol></li><li><p>Models of Boundaries</p><ol><li><p>It seemed like Yann LeCun&#8217;s H-JEPA (<a href="https://openreview.net/pdf?id=BZ5a1r-kVsf">section 4.6 here</a>) is quite relevant, and we explored that.</p></li><li><p>We also discussed whether <a href="https://en.wikipedia.org/wiki/Petri_net">Petri Nets</a> could be used to model the state of a system, its boundary, and its boundary protocol. </p></li><li><p>Another potential model that&#8217;s come up since is Port-Hamiltonian systems.</p></li><li><p>Generally, it felt like progress was needed (especially on answering questions like &#8220;how could we model boundaries in a way that allows for continuous time?&#8221;).</p>
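<p>To make this slightly more concrete, here is a deliberately oversimplified, hypothetical sketch in Lean of a discrete-time, deterministic system in the spirit of Critch&#8217;s VAPE formalization summarized above; it is not Critch&#8217;s actual (probabilistic, Bayesian-network) definition, and every name in it is invented for this illustration:</p><pre><code>-- Hypothetical toy sketch, not Critch's definition: a discrete-time,
-- deterministic "VAPE"-style system in which the viscera and the
-- environment can only influence each other via the boundary states.
structure VAPEState (V A P E : Type) where
  viscera : V
  active  : A   -- active boundary (actions)
  passive : P   -- passive boundary (perception)
  env     : E

structure VAPESystem (V A P E : Type) where
  perceive      : E -> P -> P   -- environment writes only the passive boundary
  updateViscera : P -> V -> V   -- viscera read only the passive boundary
  act           : V -> A -> A   -- viscera write only the active boundary
  updateEnv     : A -> E -> E   -- environment reads only the active boundary

-- One synchronous time step; note that no function takes both V and E
-- directly, which is the separation this sketch is trying to capture.
def step {V A P E : Type} (sys : VAPESystem V A P E)
    (s : VAPEState V A P E) : VAPEState V A P E :=
  let p' := sys.perceive s.env s.passive
  let v' := sys.updateViscera p' s.viscera
  let a' := sys.act v' s.active
  let e' := sys.updateEnv a' s.env
  { viscera := v', active := a', passive := p', env := e' }</code></pre><p>A faithful version would replace these functions with conditional distributions on a Bayesian network (and would need to address the continuous-time question above); the sketch only shows where the restriction against direct viscera &#8660; environment interaction lives.</p>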
<ol><li><p>There were also a bunch of explorations around things like &#8220;do you need to be able to label things as &#8216;boundary&#8217; or is labeling inside and outside of objects sufficient&#8221; or &#8220;how to deal with non-contiguous physical boundaries&#8221; that didn&#8217;t feel to me like they reached clear endpoints.</p></li></ol></li></ol></li><li><p>Types of boundaries</p><ol><li><p>I created this list of <a href="https://docs.google.com/spreadsheets/d/1zfDoAAx3Xv7qctOedYrA1aOYvxI4rzSLTOLnQEmE6UY/edit#gid=0">Examples of Boundaries</a>.&nbsp; It&#8217;s definitely got issues, but it was helpful to make sure a statement made about one type of boundary held for other types one might want to consider.</p></li><li><p>I also thought this formulation of boundaries was interesting:</p><ol><li><p>If we identify types of things that are interesting to preserve, it&#8217;d be nice to have a way of relating things to other things.&nbsp; Here are 4 categories of things:</p><ol><li><p>Objects (physical arrangements that persevere in time)</p><ol><li><p>E.g. an atom or a rock; it makes sense to say there&#8217;s a &#8220;boundary&#8221; around it because it&#8217;s intuitively recognizable as a thing. </p></li></ol></li><li><p>Cycles of objects (physical objects that indirectly beget themselves)</p><ol><li><p>E.g. metabolic cycles; carbon cycle; chicken + egg</p></li></ol></li><li><p>Patterns&nbsp;(arrangements of information encoded in physical objects where the objects are transient but the information persists)</p><ol><li><p>E.g. forests or civilization: the trees or people change but the pattern remains; Dawkinsian memes, The Ship of Theseus, and living things (probably) fit into this category as well. </p></li></ol></li><li><p>Cycles of patterns</p><ol><li><p>E.g. centralization vs decentralization of power within society; the model of punctuated equilibrium in evolutionary biology</p></li></ol></li></ol></li><li><p>&#8220;Things&#8221; on this scale are clearly composed of other &#8220;things&#8221;. </p><ol><li><p>While it might be possible to list all types of boundaries from the bottom up or create some sort of directed graph, I don&#8217;t think that&#8217;s necessary, since the most relevant piece is likely the ability to relate different boundaries to each other, which can be done more succinctly on a case-by-case basis than by falling back to a taxonomy of boundaries.</p></li><li><p>Very hot take: a lot of my intuition says that preserving cycles of patterns (the fourth category), with deference going to the patterns recurring on the longest timescale, is an interesting extrapolation of moral trends. (I don&#8217;t think this is particularly defensible, but it&#8217;s an interesting thought.)</p></li></ol></li></ol></li></ol></li></ol><p>Again, this is very incomplete, and I&#8217;m mostly trying to get something out the door in time for the next workshop. We&#8217;ll try to have a more comprehensive (and more timely) summary out of the next workshop!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Atlas Blog!
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Welcome to Atlas Computing's blog!]]></title><description><![CDATA[A brief summary of where we write about our work]]></description><link>https://blog.atlascomputing.org/p/blog-says-hello-world</link><guid isPermaLink="false">https://blog.atlascomputing.org/p/blog-says-hello-world</guid><dc:creator><![CDATA[Atlas Computing]]></dc:creator><pubDate>Thu, 04 Apr 2024 14:11:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a919151-a360-4cd0-8f23-02cb3524cb53_700x700.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi Atlas community!</p><p>Welcome to our blog.  As a first post, we wanted to share the main ways to share information:</p><ol><li><p>Our website, <a href="https://atlascomputing.org/">atlascomputing.org</a>, covers slow-changing descriptions of what we want to achieve and how we&#8217;re going about doing it</p></li><li><p>We&#8217;ve got an email list <a href="https://groups.google.com/a/atlascomputing.org/g/updates">here</a> for regular major updates, roughly quarterly.</p></li><li><p>We'll also be using the safe-by-design email list <a href="https://groups.google.com/g/safe-by-design">here</a> for very informal conversations. A small group also spun out <a href="https://provablysafeai.zulipchat.com/">a Zulip chat</a> from that list.</p></li><li><p>On this blog, you can expect brief, informal updates on a topic probably about one every two weeks for now.  </p></li><li><p>We&#8217;ve got profiles on <a href="http://linkedin.com/company/atlas-computing-org">LinkedIn</a> and <a href="https://twitter.com/SafeWithAtlas/">Twitter</a>; currently rarely used, but we have ambitious to change that.</p></li></ol><p>Each of those venues has their own sign-up; if #4 sounds good, subscribe here &#128071;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.atlascomputing.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.atlascomputing.org/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>