{"id":133498,"date":"2025-09-18T12:20:46","date_gmt":"2025-09-18T20:20:46","guid":{"rendered":"https:\/\/xira.com\/p\/2025\/09\/18\/ai-tools-match-or-exceed-human-lawyers-in-contract-drafting-benchmark-study\/"},"modified":"2025-09-18T12:20:46","modified_gmt":"2025-09-18T20:20:46","slug":"ai-tools-match-or-exceed-human-lawyers-in-contract-drafting-benchmark-study","status":"publish","type":"post","link":"https:\/\/xira.com\/p\/2025\/09\/18\/ai-tools-match-or-exceed-human-lawyers-in-contract-drafting-benchmark-study\/","title":{"rendered":"AI Tools Match Or Exceed Human Lawyers in Contract Drafting Benchmark Study"},"content":{"rendered":"<p>Artificial intelligence tools matched or exceeded human lawyers in producing reliable contract drafts in the first comprehensive benchmarking study comparing AI against legal professionals, according to research published this week.<\/p>\n<p>The study, <a href=\"https:\/\/www.legalbenchmarks.ai\/research\/phase-2-research\" rel=\"nofollow noopener\" target=\"_blank\">Benchmarking Humans &amp; AI in Contract Drafting<\/a>, conducted by <a href=\"https:\/\/www.legalbenchmarks.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">LegalBenchmarks.ai<\/a>, found that human lawyers produced reliable first drafts 56.7% of the time, while several AI products met or exceeded their performance.<\/p>\n<p>The top-performing AI tool, Gemini 2.5 Pro, achieved a 73.3% reliability rate, marginally outperforming the best human lawyer at 70%.<\/p>\n<p>The research evaluated 13 AI tools against human lawyers using 30 real-world 
contract drafting tasks. The study assessed 450 task outputs and surveyed 72 legal professionals to measure three dimensions of performance: output reliability, usefulness, and workflow integration.<\/p>\n<p>Of the 13 tools evaluated, seven were designed specifically for the legal market. Six were identified by name: <a href=\"https:\/\/www.august.law\/\" rel=\"nofollow noopener\" target=\"_blank\">August<\/a>, <a href=\"https:\/\/brackets.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">Brackets<\/a>, <a href=\"https:\/\/gc.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">GC AI<\/a>, <a href=\"https:\/\/www.instaspace.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">InstaSpace<\/a>, <a href=\"https:\/\/simpledocs.com\/simpleai\" rel=\"nofollow noopener\" target=\"_blank\">SimpleDocs<\/a> and <a href=\"https:\/\/www.wordsmith.ai\/\" rel=\"nofollow noopener\" target=\"_blank\">Wordsmith<\/a>; the seventh was not identified by name, but was described as a \u201clong-standing enterprise legal-ai platform.\u201d<\/p>\n<p>The other six were general commercial tools: ChatGPT (GPT-4.1 and GPT-5), Claude (Opus-4.1), Copilot (free version), Gemini (2.5 Pro), Le Chat (Mistral), and Qwen (qwen3-235b-a22b).<\/p>\n<h3>AI Identifies Legal Risks Lawyers Missed<\/h3>\n<p>In scenarios involving high legal risks, specialized legal AI tools outperformed general-purpose tools, raising explicit risk warnings in 83% of outputs, compared to 55% for general-purpose tools. However, in the same scenarios, human lawyers raised no such warnings, according to the study.<\/p>\n<p>\u201cLegal AI tools surfaced material risks that lawyers missed entirely,\u201d the researchers wrote. 
In one example involving a potentially unenforceable penalty clause under New York law, AI tools flagged enforceability concerns while human lawyers provided no risk assessment.<\/p>\n<p>The findings challenge the assumption that AI cannot exercise legal judgment, the researchers said, showing that some AI tools can identify compliance and enforceability issues overlooked by experienced practitioners.<\/p>\n<h3>Tools Vary In Performance<\/h3>\n<p>The study revealed substantial variation in AI performance: reliability rates ranged from 44% to 73.3% across tools. Google\u2019s Gemini 2.5 Pro achieved the highest reliability score, followed by OpenAI\u2019s GPT-5 at approximately 73%.<\/p>\n<p>Legal AI platforms, including GC AI, Brackets, August, and SimpleDocs, also scored above the overall AI average of 57%. General-purpose AI tools slightly outperformed specialized legal AI platforms on reliability metrics, contrary to what many in the industry might expect.<\/p>\n<p>In usefulness ratings, August led with an average score of 8.13 out of 9 points, while human lawyers averaged 7.53 points. The metric measured clarity, helpfulness, and appropriate length of draft outputs.<\/p>\n<p>\u201cSpecialized legal AI tools did not meaningfully outperform general-purpose AI tools in both output reliability and usefulness,\u201d the researchers said. \u201cGeneral-purpose AI solutions had a slight edge in output reliability, while legal AI solutions scored marginally higher on output usefulness.\u201d<\/p>\n<h3>Workflow Integration As Differentiator<\/h3>\n<p>While general-purpose AI tools competed effectively on output quality, specialized legal AI platforms differentiated themselves through workflow integration. 
Two-thirds of tested legal AI products integrate with Microsoft Word, where most contract drafting occurs.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/www.lawnext.com\/wp-content\/uploads\/2025\/09\/platform-workflow-support-ranking.png?ssl=1\" rel=\"nofollow noopener\" target=\"_blank\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-51179\" src=\"https:\/\/i0.wp.com\/www.lawnext.com\/wp-content\/uploads\/2025\/09\/platform-workflow-support-ranking.png?resize=500%2C419&#038;ssl=1\" alt=\"\" width=\"500\" height=\"419\" title=\"\"><\/a><\/p>\n<p>Brackets, GC AI, and SimpleDocs scored highest on platform workflow support, offering features like template libraries, clause storage, and quality assurance tools designed specifically for legal work.<\/p>\n<p>\u201cPlatform Workflow Support is the key differentiator for specialized tools, not output performance,\u201d the researchers concluded.<\/p>\n<p>Human lawyers demonstrated clear advantages in tasks requiring commercial judgment and context management. They excelled at interpreting client intent, avoiding unnecessary concessions to counterparties, and integrating multiple information sources.<\/p>\n<p>In one multi-source drafting task requiring integration of templates, term sheets and email communications, only the human lawyer successfully extracted complete party information from a screenshot. All AI outputs contained incomplete or inaccurate company details.<\/p>\n<p>However, AI tools proved more consistent in routine drafting. In a task requiring a 10% penalty clause, all AI tools correctly reproduced the figure while one human lawyer mistakenly wrote \u201c9%.\u201d<\/p>\n<p>The starkest differentiator between humans and AI was in the length of time it took to complete a task, the report said. 
Humans took nearly 13 minutes per task while AI tools produced outputs in seconds.<\/p>\n<h3>Changing Professional Priorities<\/h3>\n<p>Among the 72 lawyers surveyed for the study who use AI for legal work, 86% employ multiple tools rather than relying on a single product. Only 6% require 100% accuracy before using AI tools, while 55% expressed comfort with accuracy below 90%.<\/p>\n<p>When asked about factors that would increase AI usage, 35% cited easier output verification as most important, 23% pointed to improved context management, and 21% ranked accuracy gains as the top priority.<\/p>\n<p>\u201cAccuracy is only one of several factors we consider,\u201d one Fortune 500 general counsel told researchers. \u201cEvery lawyer is responsible for reviewing the work product they produce, with or without AI.\u201d<\/p>\n<h3>Methodology and Limitations<\/h3>\n<p>The study evaluated AI tools and human lawyers using identical tasks contributed by practicing attorneys across various industries. Tasks ranged from basic clause drafting to complex commercial arrangements.<\/p>\n<p>The research assessed three performance dimensions: output reliability (factual accuracy and legal adequacy), output usefulness (clarity and helpfulness), and platform workflow support (integration and verification features).<\/p>\n<p>The human baseline consisted of in-house commercial lawyers with an average of 10 years\u2019 experience. 
Outputs were evaluated through a combination of automated scoring against predefined criteria and expert reviewer assessment.<\/p>\n<p>The researchers acknowledged several limitations, including the snapshot nature of rapidly evolving AI capabilities, the subjective elements in scoring usefulness, and the focus on junior- to mid-level drafting complexity.<\/p>\n<h3>Bottom Line<\/h3>\n<p>The bottom line seems to be that tasks involving routine, low-risk contract drafting may be the best candidates for using AI, while complex commercial negotiations continue to require human expertise.<\/p>\n<p>\u201cThe future of drafting will not be decided by one side or one tool,\u201d the researchers wrote. \u201cIt will be shaped by orchestration: combining the speed and consistency of general AI, the workflow fit of legal AI, and the judgment of lawyers. The real advantage will belong to teams that learn to design and manage this collaboration.\u201d<\/p>\n<p>The research was conducted by\u00a0<a class=\"underline text-blue-500\" href=\"https:\/\/www.linkedin.com\/in\/anna-guo-255ba7b0\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Anna Guo<\/a>,\u00a0<a class=\"underline text-blue-500\" href=\"https:\/\/www.linkedin.com\/in\/arthrod\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Arthur Souza Rodrigues<\/a>,\u00a0<a class=\"underline text-blue-500\" href=\"https:\/\/www.linkedin.com\/in\/mhmalmamari\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Mohamed Al Mamari<\/a>,\u00a0<a class=\"underline text-blue-500\" href=\"https:\/\/www.linkedin.com\/in\/sakshiudeshi\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Sakshi Udeshi<\/a> and <a class=\"underline text-blue-500\" href=\"https:\/\/www.linkedin.com\/in\/marc-astbury\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">Marc Astbury<\/a>, with advisory support from legal technology experts and platform evaluation by 
HumanSignal.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence tools matched or exceeded human lawyers in producing reliable contract drafts in the first comprehensive benchmarking study comparing AI against legal professionals, according to research published this week. The study, Benchmarking Humans &amp; AI in Contract Drafting, conducted by LegalBenchmarks.ai, found that human lawyers produced reliable first drafts 56.7% of the time, while [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":133499,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[24],"tags":[],"class_list":["post-133498","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-lawsite"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/xira.com\/p\/wp-content\/uploads\/2025\/09\/Benchmarking-Study-Overall-Performance-Matrix-1024x576-KEpejc.png?fit=1024%2C576&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/posts\/133498","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/comments?post=133498"}],"version-history":[{"count":0,"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/posts\/133498\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/media\/133499"}],"wp:attachment":[{"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/media?parent=133498"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/xira.com\/p\/wp-json\/
wp\/v2\/categories?post=133498"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/xira.com\/p\/wp-json\/wp\/v2\/tags?post=133498"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}