Training Large Language Models on Respiratory Physiology: Why Domain-Specific AI Outperforms General-Purpose Tools for Clinical Report Writing

General-purpose AI tools are impressive, but when it comes to writing clinical respiratory reports, they consistently fall short. Large language model training on broad internet data does not equip an AI to interpret a flow-volume loop, apply ATS guidelines, or flag a borderline obstructive pattern with clinical precision. Domain-specific AI, trained or fine-tuned on respiratory physiology data and embedded within purpose-built pulmonary function test software, produces more accurate, safer, and more clinically useful outputs. The gap between general and specialist AI is not a minor nuance; it is a patient safety issue.

TL;DR

  • General-purpose LLMs lack the domain depth needed for reliable respiratory clinical report writing.

  • Research confirms LLMs are not yet ready for autonomous clinical decision-making without proper constraints and domain grounding.

  • Domain-specific AI, fine-tuned on respiratory physiology, produces more accurate and guideline-aligned outputs.

  • Purpose-built platforms like Rezibase embed AI report writing within a structured clinical workflow, reducing risk.

  • Choosing the right AI tool means looking beyond capability and asking: trained on what, constrained by what, and accountable to whom?

What Is Domain-Specific Large Language Model Training and Why Does It Matter in Respiratory Medicine?

Domain-specific large language model training refers to the process of taking a pre-trained LLM and fine-tuning it on a curated, specialist dataset relevant to a particular field. In respiratory medicine, this means training or adapting the model on pulmonary physiology terminology, ATS/ERS interpretation guidelines, spirometry patterns, diffusion capacity norms, and clinical reporting conventions.

A general-purpose LLM has seen vast amounts of text, but respiratory physiology represents a tiny fraction of that data. Fine-tuning corrects this imbalance. According to GoML's best practices guide on training and fine-tuning large language models, the fine-tuning process involves adjusting a model's parameters to align its outputs with a specific task or domain, significantly improving relevance and reliability for that use case.
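The intuition behind fine-tuning can be shown with a deliberately tiny, hypothetical sketch: a one-parameter model stands in for an LLM. It is first fitted to a "broad" mixture of data, then briefly re-trained on a small "domain" set, and its error on the domain data drops. This is a conceptual illustration only; the numbers are invented and this is not how clinical LLMs are actually trained.

```python
# Conceptual sketch of pre-training then fine-tuning, using a
# one-parameter linear model y = w * x in place of an LLM.
# All data and values are illustrative, not clinical.

def train(w, data, lr=0.01, epochs=200):
    """Gradient descent on mean squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# "Broad" data mixes two underlying slopes (2.0 and 6.0).
broad = [(x, 2.0 * x) for x in range(1, 6)] + [(x, 6.0 * x) for x in range(1, 6)]
# "Domain" data follows a single specialist slope (6.0).
domain = [(x, 6.0 * x) for x in range(1, 6)]

w_pretrained = train(0.0, broad)           # settles near the average slope, ~4.0
w_finetuned = train(w_pretrained, domain)  # adapts toward the domain slope, ~6.0

assert mse(w_finetuned, domain) < mse(w_pretrained, domain)
```

The pre-trained model is a compromise across everything it has seen; fine-tuning shifts it toward the specialist distribution, which is exactly the imbalance described above.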

Key differences between general and domain-specific LLMs in clinical respiratory contexts:

Factor               | General-Purpose LLM   | Domain-Specific LLM
---------------------|-----------------------|----------------------------------
Training data        | Broad internet corpus | Respiratory/clinical datasets
Guideline alignment  | Inconsistent          | Configurable to ATS/ERS standards
Terminology accuracy | Variable              | High
Clinical risk        | Higher                | Lower with proper constraints
Report structure     | Generic               | Workflow-specific

What Does the Research Say About LLMs in Clinical Settings?

The research picture is nuanced and worth understanding clearly before deploying any AI in a clinical environment.

A landmark 2024 study published in Nature Medicine by Hager et al., cited over 660 times, concluded that LLMs are currently not ready for autonomous clinical decision-making. The study highlighted specific failure modes including flawed reasoning chains, overconfidence in uncertain scenarios, and difficulty integrating multimodal clinical data. This is a critical finding for any lab considering AI-assisted reporting.

A 2025 study published in eLife by Sim et al. surveyed reasoning behaviour in medical LLMs and found emerging trends but also significant open challenges, particularly around how these models handle complex multi-step clinical reasoning.

Separately, research published in JMIR in 2025 by Li et al. explored how prompt engineering and LLMs could enhance pulmonary disease prediction, finding that structured prompting strategies improved model interpretability and prediction performance. The finding underscores that how an AI is instructed matters as much as what it was trained on.

The takeaway from the research is not that AI is dangerous and should be avoided. It is that AI performs best when it is domain-grounded, properly constrained, and embedded within a structured clinical workflow rather than operating as a freestanding tool.

Why Do General-Purpose AI Tools Struggle With Pulmonary Function Test Reporting?

Pulmonary function test software is not just a data repository. It is a clinical interpretation engine. Reporting a spirometry result requires understanding:

  • Reference equation selection based on patient demographics

  • Post-bronchodilator response thresholds

  • Pattern recognition across multiple test parameters simultaneously

  • Guideline-specific severity grading (e.g., GOLD, ATS)

  • Clinical context, including comorbidities and referral indication

A general-purpose LLM asked to write a respiratory report is essentially being asked to perform specialist clinical reasoning without the specialist training. It may produce fluent, confident-sounding text that contains subtle but consequential errors, such as misapplying a severity classification or using an outdated reference range.
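For contrast, the deterministic, guideline-anchored logic a purpose-built system can enforce looks roughly like the sketch below. It applies the widely cited GOLD fixed-ratio obstruction check (post-bronchodilator FEV1/FVC < 0.70) and severity bands by FEV1 % predicted, plus the classic pre-2022 ATS/ERS bronchodilator response criterion (≥12% and ≥200 mL improvement in FEV1). Thresholds are simplified for illustration; real software must apply current ATS/ERS criteria, lower limits of normal, and local lab policy, and nothing here is medical advice.

```python
# Illustrative, simplified spirometry logic. Thresholds follow the
# commonly cited GOLD fixed-ratio approach and the pre-2022 ATS/ERS
# bronchodilator criterion; a clinical system would use current
# guidelines and require scientist review.

def grade_obstruction(fev1_fvc_ratio: float, fev1_pct_predicted: float) -> str:
    """Classify airflow obstruction from post-bronchodilator values."""
    if fev1_fvc_ratio >= 0.70:
        return "no obstruction (fixed-ratio criterion)"
    if fev1_pct_predicted >= 80:
        return "mild obstruction (GOLD 1)"
    if fev1_pct_predicted >= 50:
        return "moderate obstruction (GOLD 2)"
    if fev1_pct_predicted >= 30:
        return "severe obstruction (GOLD 3)"
    return "very severe obstruction (GOLD 4)"

def significant_bd_response(fev1_pre_ml: float, fev1_post_ml: float) -> bool:
    """Pre-2022 ATS/ERS criterion: FEV1 improves by >=12% AND >=200 mL."""
    delta = fev1_post_ml - fev1_pre_ml
    return delta >= 200 and delta >= 0.12 * fev1_pre_ml

print(grade_obstruction(0.65, 62))  # moderate obstruction (GOLD 2)
```

Unlike a free-text chatbot answer, this logic cannot "drift": the same inputs always yield the same guideline-consistent classification, and the thresholds can be updated in one place when standards change.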

According to the Public Health AI Handbook, LLMs offer significant potential for health applications when used responsibly, but responsible use requires understanding the model's limitations and ensuring appropriate human oversight remains in place.

What Makes Domain-Specific AI Report Writing Safer and More Useful?

The key is not just the AI itself, but the system it operates within. Effective AI-assisted report writing in respiratory medicine requires:

  • Structured data inputs: AI should work from discrete, validated test values, not free-text summaries.

  • Guideline anchoring: The AI's outputs should be constrained by current ATS/ERS interpretation frameworks.

  • Scientist oversight: AI generates a draft; a qualified respiratory scientist reviews and approves it.

  • Audit trails: Every AI-assisted report should be traceable for quality and accreditation purposes.

This is where purpose-built pulmonary function test software creates a meaningful advantage over asking a general chatbot to write a report. The AI is embedded in context, not floating above it.
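A minimal sketch of what "structured data inputs" plus human oversight can mean in practice is shown below. The field names, validation rules, and draft format are hypothetical illustrations, not Rezibase's actual schema or API: discrete values are validated before any AI sees them, and every draft carries a review flag and timestamp so a scientist must sign off.

```python
# Hypothetical sketch of a structured reporting workflow: discrete,
# validated inputs in, an auditable draft out. All names and checks
# are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SpirometryResult:
    fev1_l: float
    fvc_l: float
    fev1_pct_predicted: float

    def __post_init__(self):
        # Reject physiologically impossible values before drafting begins.
        if not (0 < self.fev1_l <= self.fvc_l):
            raise ValueError("FEV1 must be positive and not exceed FVC")
        if not (0 < self.fev1_pct_predicted <= 200):
            raise ValueError("FEV1 %predicted out of plausible range")

def draft_report(result: SpirometryResult) -> dict:
    """Build a draft payload; a scientist must review before release."""
    return {
        "inputs": vars(result),  # discrete values, not free text
        "draft_text": (
            f"FEV1 {result.fev1_l:.2f} L ({result.fev1_pct_predicted:.0f}% predicted), "
            f"FVC {result.fvc_l:.2f} L. Draft pending review."
        ),
        "status": "PENDING_SCIENTIST_REVIEW",                     # oversight gate
        "generated_at": datetime.now(timezone.utc).isoformat(),   # audit trail
    }

draft = draft_report(SpirometryResult(fev1_l=2.10, fvc_l=3.40, fev1_pct_predicted=62))
```

The design choice is the point: the AI never receives unvalidated free text, and its output cannot leave the system without passing the review status, which mirrors the structured-inputs, oversight, and audit-trail requirements listed above.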

Rezibase takes this approach with its AI-powered report writing module. Built specifically for respiratory and sleep labs, it generates structured report drafts aligned to ATS guidelines, drawing from discrete imported test data rather than unstructured text. Respiratory scientists remain in the loop, reviewing outputs before any report is finalised. The AI assists; it does not replace clinical judgement.

How Should Labs Evaluate AI Tools for Clinical Reporting?

Before adopting any AI tool for respiratory report writing, labs should ask:

  1. What was the model trained or fine-tuned on? General training is insufficient for clinical reporting.

  2. Is it integrated with validated test data? AI working from structured data is safer than AI interpreting free text.

  3. Does it support ATS/ERS guideline alignment? Outputs should reflect current standards.

  4. Where does human oversight sit in the workflow? AI should assist, not autonomously decide.

  5. Does the platform support accreditation requirements? Reporting tools used in accredited labs must meet documentation and quality standards.

Frequently Asked Questions

Can a general-purpose LLM write respiratory reports accurately?
Not reliably. General LLMs lack the domain-specific training needed to consistently apply respiratory physiology guidelines, select appropriate reference equations, or flag clinically significant patterns correctly.

What is fine-tuning in the context of medical AI?
Fine-tuning is the process of adapting a pre-trained LLM to a specific domain by further training it on relevant specialist data, improving its accuracy and reliability for that use case.

Is AI-assisted reporting safe for clinical use?
Research suggests AI can support clinical workflows safely when it operates within structured systems, is properly constrained by guidelines, and includes mandatory human review before outputs are acted upon.

What guidelines should respiratory AI reporting tools follow?
ATS (American Thoracic Society) and ERS (European Respiratory Society) guidelines are the primary international standards for spirometry and pulmonary function test interpretation.

Does Rezibase use AI for report writing?
Yes. Rezibase includes AI-powered report writing that generates structured drafts aligned to ATS guidelines, integrated directly within its cloud-based respiratory reporting workflow.

What is the risk of using AI tools not designed for respiratory medicine?
The primary risks include misclassification of results, outdated reference values, non-guideline-aligned severity grading, and the production of fluent but clinically incorrect report language.

How does domain-specific AI improve efficiency in respiratory labs?
By generating accurate first-draft reports from structured test data, domain-specific AI reduces the time scientists spend on documentation while maintaining clinical accuracy and supporting accreditation compliance.

About Rezibase

Rezibase is Australia's most advanced cloud-based respiratory and sleep reporting platform, built by respiratory scientists for respiratory scientists. Trusted by over 35 sites including NHS and NSW Health, Rezibase combines AI-powered report writing, ATS-aligned interpretation tools, and a vendor-neutral data import system to reduce clinical risk and streamline lab workflows. Learn more at rezibase.com.

Ready to see what purpose-built AI looks like in a respiratory lab? Explore Rezibase at rezibase.com or start your 30-day free trial today.

References