Malmö University Publications
BlueprintSymVL: A discriminative benchmark for VLM symbol recognition in engineering blueprints
McDermott, Engineering, Pr. Beatrixlaan 35, The Hague, 2595 AK, the Netherlands; Eindhoven University of Technology, Mathematics and Computer Science, Groene Loper 3, Eindhoven, 5612 AE, the Netherlands. ORCID iD: 0009-0006-2515-6951
McDermott, Engineering, Pr. Beatrixlaan 35, The Hague, 2595 AK, the Netherlands; Eindhoven University of Technology, Mathematics and Computer Science, Groene Loper 3, Eindhoven, 5612 AE, the Netherlands. ORCID iD: 0000-0003-3411-4084
Eindhoven University of Technology, Mathematics and Computer Science, Groene Loper 3, Eindhoven, 5612 AE, the Netherlands; Chalmers University of Technology, Computer Science and Engineering, Chalmersgatan 4, Gothenburg, 412 96, Sweden. ORCID iD: 0000-0003-2854-722X
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT). ORCID iD: 0000-0002-7700-1816
2025 (English). In: Results in Engineering (RINENG), ISSN 2590-1230, Vol. 28, article id 108171. Article in journal (Refereed), Published
Abstract [en]

The application of Vision Language Models (VLMs) to industrial automation, specifically engineering blueprint analysis, is severely hampered by the absence of domain-specific evaluation tools. Existing benchmarks fail to replicate the critical visual challenges of this domain, such as high symbol density, occlusion, and visual similarity. Furthermore, they assume reliable pre-trained knowledge or standardized symbology, which rarely hold in real-world industrial settings. To address these critical gaps, we introduce BlueprintSymVL, the first benchmark explicitly designed to evaluate VLM symbol recognition in engineering blueprints. BlueprintSymVL is engineered as a strong discriminator, with test cases that systematically introduce challenges to differentiate model capabilities. A key innovation is our robust evaluation method, centered on a one-shot visual in-context querying strategy. At query time, the model is provided with a visual exemplar of a symbol. This approach eliminates reliance on unreliable pre-existing knowledge and is paired with a strict evaluation criterion demanding correctness on both symbol counts and their labels, setting a rigorous standard for quality assurance in high-stakes applications. We conducted a comprehensive benchmark of four leading VLMs (GPT-4o, Gemini 2.5 Pro, InternVL 2.5 78B, and Qwen 2.5 VL 72B). Our analysis provides the first baseline on their readiness, revealing that BlueprintSymVL is highly discriminative. We pinpoint specific failure modes, including a notable degradation in cluttered environments, confusion when faced with visually similar distractors, and a concerning propensity to hallucinate symbols. These insights demonstrate that current VLMs are not yet suitable for autonomous deployment in blueprint analysis and are best integrated into human-in-the-loop workflows.
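
The strict criterion described in the abstract (a test case counts as correct only when both the symbol labels and their counts are right) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not specify the scoring code, so the function names (`score_case`, `strict_accuracy`), the dictionary representation of a prediction, and the symbol labels are all assumptions made for illustration.

```python
# Hypothetical sketch of an all-or-nothing scoring rule. A prediction and a
# ground truth are each assumed to be a dict mapping symbol label -> count.

def normalize(counts):
    """Drop labels with a zero count so {'pump': 0} equals {}."""
    return {label: n for label, n in counts.items() if n > 0}


def score_case(predicted, expected):
    """Score one test case and report which kind of error occurred."""
    pred, gold = normalize(predicted), normalize(expected)
    shared = set(pred) & set(gold)
    return {
        # Correct only if every label and every count matches exactly.
        "correct": pred == gold,
        # Labels the model reported that are absent from the ground truth.
        "hallucinated": sorted(set(pred) - set(gold)),
        # Labels present in the ground truth that the model did not report.
        "missed": sorted(set(gold) - set(pred)),
        # Labels found by the model but with the wrong count.
        "miscounted": sorted(l for l in shared if pred[l] != gold[l]),
    }


def strict_accuracy(cases):
    """Fraction of (predicted, expected) pairs that are fully correct."""
    if not cases:
        return 0.0
    return sum(score_case(p, g)["correct"] for p, g in cases) / len(cases)
```

Under this rule a single miscounted or invented symbol makes the whole case incorrect, which is one way to read the abstract's emphasis on quality assurance in high-stakes settings; separating hallucinated, missed, and miscounted labels mirrors the failure modes the abstract lists.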

Place, publisher, year, edition, pages
Elsevier, 2025. Vol. 28, article id 108171
Keywords [en]
Benchmark, Engineering blueprints, Symbol recognition, Vision Language Models (VLMs), Visual in-context learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-80836
DOI: 10.1016/j.rineng.2025.108171
ISI: 001621300000001
Scopus ID: 2-s2.0-105021855684
OAI: oai:DiVA.org:mau-80836
DiVA, id: diva2:2016391
Available from: 2025-11-25 Created: 2025-11-25 Last updated: 2025-12-08 Bibliographically approved

Open Access in DiVA

fulltext (1708 kB), 44 downloads
File information
File name: FULLTEXT01.pdf
File size: 1708 kB
Checksum (SHA-512): d80c5c29515ca4c076fbb234bd7f5b13797e8fa208e214a5dc4a26dc70730f6de556a68f2b8c831b3eff68c2efd6fb9b60babd4f204b58aee429cbec00717981
Type: fulltext
Mimetype: application/pdf

Authority records

Olsson, Helena Holmström

Search in DiVA

By author/editor
Shteriyanov, Vasil; Dzhusupova, Rimma; Bosch, Jan; Olsson, Helena Holmström