Post-FMxAI 2025 newsletter
Takeaways from Formal Methods x AI conference 2025 @ SRI, Menlo Park: atlascomputing.org/fmai25
This is a summary we sent to our FMxAI attendees. We wanted to share the takeaways here as well.
Thank you for taking part in our meeting. It was great to see so much work going on at the intersection of Formal Methods and AI. Here are a few high-level themes we noticed, and we’d love to hear from you if there’s anything you think we missed.
We need more benchmarks / evals / RL environments to make AI models better at formal methods. We heard a lot of discussion about the lack of good evaluation benchmarks for formal methods. The current generation of models is extensively trained by reinforcement learning against problem-specific benchmarks. If you have a dataset of problems that AI can’t currently solve, even if the number of problems is modest, it seems impactful to turn it into an AI eval and get AI labs to include it in training. If you need help figuring out how to do this (or are generally interested in contributing to this effort), talk to Evan Miyazono (evan@atlascomputing.org).
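As a concrete illustration of what "turning a dataset into an eval" can look like, here is a minimal sketch in Python. The schema (fields like `id`, `prompt`, `check`) and the checker names are hypothetical, not a standard format; real eval frameworks define their own schemas, but most reduce to machine-readable problems plus an automatic grading rule.

```python
import json

# Hypothetical example: packaging a small set of FM problems as a
# machine-readable eval. The field names and checker names below are
# illustrative assumptions, not a standard schema.
problems = [
    {"id": "fm-001",
     "prompt": "Prove that reversing a list twice yields the original list.",
     "check": "lean_compiles"},
    {"id": "fm-002",
     "prompt": "State a loop invariant for binary search.",
     "check": "human_review"},
]

def to_jsonl(items):
    """Serialize problems as JSONL, one record per line."""
    return "\n".join(json.dumps(p, sort_keys=True) for p in items)

def grade(record, model_output, checkers):
    """Dispatch a model's answer to the checker named in the record."""
    return checkers[record["check"]](model_output)

# A toy automatic checker: reject Lean proofs containing `sorry`.
checkers = {"lean_compiles": lambda out: "sorry" not in out}

print(len(to_jsonl(problems).splitlines()))        # one line per problem
print(grade(problems[0], "theorem rev_rev ...", checkers))
```

The key design point is the automatic `check` field: a benchmark is only usable for RL training if a machine can grade attempts, which is exactly what proof checkers make possible.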
We need more FM infrastructure. A lot of discussions focused on formal methods infrastructure: if AI gets stronger, will we have the tools, conventions, and languages ready for AI systems and workflows to use? One new project that seems like it might help here is the CSlib project, which is building an intermediate representation for computer science concepts in Lean. Even so, the FM infrastructure gap appears to be very large, and closing it would benefit from both more funding and more senior talent.
New orgs are starting. Several organizations represented at the meeting are brand new, having started in the last year. These include (in no particular order) Math Inc., Theorem Labs, Sigil Logic, Principia Labs, Axiom Math, Safer-AI, Ulyssean, and genproof.ai. We take this as a strong signal that people are starting to see the potential in formal methods combined with AI.
Engineering tools matter, specifications matter. Many people discussed possible AI-driven tools that could be used for engineering. We heard several people raise the notion of “verified vibe-coding” or “vibe-speccing”. A particularly important problem seems to be how to specify formally what AIs should do, and how to use these specifications to guide the AI to a correct response.
Lean is a big thing, but not the only thing. The Lean theorem prover came up in many conversations. On the one hand, Lean has become a common denominator: a tool known outside FM expert circles. On the other hand, we talked to many FM experts who were keen to emphasize the broad range of tools in formal methods, including CHERI. It seems to be a live debate in FMxAI whether to standardize on Lean, to maintain the diversity of the field, or to pursue a defense-in-depth approach.
AI forecasts vary enormously. We heard many conversations about the future of AI, and here, forecasts varied enormously. Broadly speaking, attendees working more closely on AI predicted faster gains, while FM experts were more skeptical. At the most skeptical end of the spectrum, some attendees felt that AI capabilities were unlikely to increase, while at the other end, others predicted that fully automated AI software engineers would be in place by 2030.
Forward pointers
Here are some additional things you might want to sign up for updates on:
Newsletter on formal approaches to AI security:
ARIA Safeguarded AI program: https://www.aria.org.uk/programme-safeguarded-ai/
The Atlas Computing blog will have updates on some related projects and spin-outs when there are public announcements (like a potential FRO, a focused research organization, to build tools to generate and validate formal specs)
Lastly, >70% of attendees who filled out the post-event survey said they’d highly recommend the event to colleagues, and almost 95% said they’d try their best to attend a subsequent event, so it seems like we found a good recipe: great people + lots of space to chat. We’ll keep you updated and hope to see you again soon!
Mike, Evan, and the team.