A system was given a research direction and left mostly on its own in the Tokyo offices of Sakana AI, a relatively new AI startup. There was not a single graduate student bent over a keyboard, no lecturer writing notes in a draft’s margins. The system formed a hypothesis, planned and ran experiments, created visualizations, composed the text, formatted citations, and ultimately submitted a full scientific paper to one of the most reputable peer-reviewed machine learning venues. The reviewers scored it. They accepted it. It outperformed 55% of the papers authored by human researchers. No one flagged it as unusual, at least not during that review process.
The journey from concept to acceptance of that paper, produced by a system called The AI Scientist-v2, tells a story the academic community is only now beginning to take seriously. Working with researchers from the University of British Columbia, the Vector Institute, and the University of Oxford, Sakana submitted three AI-generated manuscripts to a workshop at ICLR, a long-running and prestigious conference in artificial intelligence. Of those three papers, one was accepted.
Sakana AI

| Field | Detail |
| --- | --- |
| Founded | 2023 |
| Co-founders | David Ha, Llion Jones (ex-Google Brain) |
| Flagship System | The AI Scientist (v1 & v2) |
| Research Partners | Univ. of British Columbia, Univ. of Oxford, Vector Institute |
| Historic Milestone | First fully AI-generated paper to pass human peer review |
| Conference | ICLR 2025 ICBINB Workshop (accepted, then withdrawn) |
| Paper Score | Avg. 6.33, higher than 55% of human-authored submissions |
| Nature Publication | March 26, 2026 |
| Official Website | sakana.ai |
Although it acknowledged remaining empirical difficulties, the accepted paper presented what the system evidently judged to be a promising approach to training neural networks. Experienced reviewers tend to reward that kind of nuanced framing, one that highlights both potential and limitations. Apparently, the machine had learned that lesson well.
The workflow itself is worth pausing on, because the mechanics are almost disorienting to read through. The AI Scientist is more than a writer. Given a topic prompt, it generates new research ideas, searches the literature, plans experiments, writes and executes code autonomously using what the team calls a parallelized agentic tree search, and then drafts the entire paper, figures and LaTeX formatting included, with a separate vision model checking the quality of its illustrations. That is not a spell checker helping a researcher polish prose. That is the whole research process, automated end to end, producing an artifact that human reviewers deemed sound enough to accept.
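To make the "agentic tree search" idea concrete, here is a deliberately toy sketch of a best-first search over experiment variants. Everything in it is a stand-in: the `evaluate` heuristic, the `expand` function, and all names are hypothetical illustrations, not Sakana's actual implementation, which involves LLM-driven code generation, sandboxed execution, and vision-model feedback on figures.

```python
# Toy sketch of a best-first tree search over research-idea variants.
# All functions here are hypothetical stand-ins, NOT the AI Scientist's code.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                      # heapq is a min-heap; store -score
    idea: str = field(compare=False)
    depth: int = field(compare=False, default=0)

def evaluate(idea: str) -> float:
    """Stand-in for running an experiment and scoring its outcome."""
    return len(set(idea)) / 10.0          # toy heuristic: character diversity

def expand(idea: str) -> list[str]:
    """Stand-in for an LLM proposing refinements of an idea."""
    return [idea + suffix for suffix in (" +ablation", " +baseline")]

def tree_search(root_idea: str, budget: int = 8, max_depth: int = 2) -> str:
    """Repeatedly expand the most promising node until the budget runs out."""
    frontier = [Node(-evaluate(root_idea), root_idea)]
    best = frontier[0]
    while frontier and budget > 0:
        node = heapq.heappop(frontier)    # most promising idea so far
        budget -= 1
        if node.neg_score < best.neg_score:
            best = node
        if node.depth < max_depth:
            for child in expand(node.idea):
                heapq.heappush(frontier,
                               Node(-evaluate(child), child, node.depth + 1))
    return best.idea

print(tree_search("train networks with adaptive regularization"))
```

A real system would parallelize the expansion step and score nodes by actual experimental results rather than a heuristic, but the control flow, propose, run, score, keep the best branch, is the same shape.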
The accepted paper received individual reviewer scores of 6, 7, and 6, an average of 6.33. In academic peer review, where reviewers are conservative and scoring scales are often strict, that is a genuine pass, not a marginal squeak over the bar. To their credit, Sakana had decided in advance that, in the interest of openness and adherence to ICLR conventions, they would withdraw the paper before publication, and they did. The withdrawal, however, does not undo the acceptance. The scores stand.
Caveats exist, and they matter. Sakana acknowledges in its own documentation that the system occasionally made embarrassing citation errors, such as attributing a method to a 2016 paper when it actually originated in 1997. The workshop track also applies less scrutiny than main-conference submissions. And the workshop organizers and ICLR leadership agreed in advance to facilitate the double-blind review, so the entire experiment was carried out with their knowledge and cooperation. This was not a covert infiltration; it was a controlled test, and that is worth remembering.
Whether any of this qualifies as science, in the sense most working researchers use the term, remains up for debate. Science is not simply the production of a credible-looking document; it is a process built on intellectual ownership, accountability, and the gradual accrual of trust among researchers who stake their reputations on the papers they sign. An AI system has no reputation to defend, no career to risk, no late-night doubt about whether the methodology is sound. It produces and submits, and there is a meaningful difference between that and how science actually works.
And yet. All of this was detailed in a Nature paper published in March 2026, which marked the end of an eighteen-month process that the team characterizes as having distinct phases shaped by advancements in foundation models. Even though no one feels comfortable stating it outright, the fact that Nature, which is neither an open-access preprint server nor a fringe journal, decided to publish a thorough description of the AI Scientist’s capabilities says something about where the scientific community believes this is going.
Observed from the outside, the academic publishing system appears to be approaching the same reckoning that every other information-based industry eventually faces: the realization that its gates were built for a different era. Peer review was designed to catch flawed human science. No one thought to design it to catch excellent machine science. That distinction is narrow today; it will widen considerably in the coming years.
