Automation of clinical coding from digital clinical text

Greater Manchester


Researchers involved: Prof Goran Nenadic, Prof Will Dixon, Dr George Demetriou
Partners involved: The University of Manchester, IMO, Salford Royal Foundation Trust
Start and end dates: Oct 2016 – Jan 2018

Project Overview

Clinical coding across the NHS is not currently regulated, and in the UK its primary benefit has traditionally been for research. This project aims to demonstrate the value of accurate clinical coding for patients, clinicians and the health system as a whole.

The National Health Service (NHS) keeps electronic health records from birth to death, with near-complete records in primary care that can be linked via the NHS Number to other records such as hospital admission and discharge diagnoses and lab results. UK hospitals, however, lag behind primary care in their use of clinical codes at the point of care, although they are starting to roll out more advanced Electronic Health Record (EHR) systems.

Salford Royal Foundation Trust (SRFT) has an extensive database of clinical data and is largely paperless, with over 15 years of digital records linked to primary care. Read Codes (a coded thesaurus of clinical terms) from Salford residents' primary care records are routinely linked to the coding systems used for hospital discharge diagnoses. Although SRFT is recognised as the UK's most digitally mature hospital, coding in routine clinical practice is confined to a small group of enthusiasts. Consequently, the benefits of reusing EHR data are limited by inconsistent and patchy coding in the hospital.

In partnership with Intelligent Medical Objects (IMO), a US market leader in technologies for clinical coding, we aim to code historical EHRs by mapping individual diagnosis descriptions to a standardised vocabulary, increasing the clinical and scientific utility of EHR data.

What data are you using?

We will use historical outpatient letters from Salford Royal Foundation Trust.


What methods are you using to conduct this work? (How are you using the data?)

We will evaluate and compare two text-mining algorithms that code the clinical intent about diagnoses within semi-structured outpatient documents. In this project, these documents are clinical letters from hospital specialists to general practitioners. The methodology will focus on comparing how clinical intent is captured from the source documents when mapped to SNOMED CT*, and how the reference codes assigned to that clinical intent differ between the two methods.

*SNOMED CT is a structured clinical vocabulary for use in an electronic health record.
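To make the comparison of reference codes more concrete, here is a minimal sketch of how the codes produced by two methods might be compared letter by letter. It assumes each method outputs a set of concept identifiers per letter; the letter IDs, the placeholder codes ("C1", "C2", ...), the function names and the use of Jaccard overlap are all hypothetical illustrations, not the project's actual evaluation procedure.

```python
from dataclasses import dataclass


@dataclass
class CodeComparison:
    """Per-letter comparison of the codes assigned by two (hypothetical) methods."""
    shared: set[str]
    only_a: set[str]
    only_b: set[str]

    @property
    def jaccard(self) -> float:
        # Jaccard overlap: size of the intersection divided by size of the union.
        # Defined as 1.0 when neither method assigned any code (trivial agreement).
        union = len(self.shared) + len(self.only_a) + len(self.only_b)
        return 1.0 if union == 0 else len(self.shared) / union


def compare_codes(codes_a: set[str], codes_b: set[str]) -> CodeComparison:
    """Split the codes from method A and method B into shared and method-specific sets."""
    return CodeComparison(
        shared=codes_a & codes_b,
        only_a=codes_a - codes_b,
        only_b=codes_b - codes_a,
    )


def compare_corpus(
    method_a: dict[str, set[str]], method_b: dict[str, set[str]]
) -> dict[str, CodeComparison]:
    """Compare both methods per letter; a letter absent from one method counts as uncoded."""
    letter_ids = method_a.keys() | method_b.keys()
    return {
        letter_id: compare_codes(
            method_a.get(letter_id, set()), method_b.get(letter_id, set())
        )
        for letter_id in sorted(letter_ids)
    }


if __name__ == "__main__":
    # Hypothetical letter IDs and placeholder concept identifiers, not real data or real codes.
    method_a = {"letter_001": {"C1", "C2"}, "letter_002": {"C3"}}
    method_b = {"letter_001": {"C1", "C4"}, "letter_003": {"C5"}}

    for letter_id, result in compare_corpus(method_a, method_b).items():
        print(
            letter_id,
            sorted(result.shared),
            sorted(result.only_a),
            sorted(result.only_b),
            round(result.jaccard, 2),
        )
```

Keeping the method-specific sets alongside the overlap score shows exactly where the two methods disagree. This sketch treats codes as equal only when identical; a fuller comparison would likely also need to account for codes that are related in the SNOMED CT hierarchy, which is ignored here.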


Who will/could benefit? (What will we know that we don’t already?)

  • The clinical meaning of records will be more transparent, interactive and consistent. Communication between care professionals will improve, particularly at the interface between primary and secondary care, creating better opportunities for decision support.
  • Clinicians will gain easier access to historical information in records, and coding will improve the efficiency of recording information in clinical practice.
  • The project will create potentially actionable information for quality improvement and will help support a cultural shift towards clinicians coding at the point of care.


What will be the intended outcome of your research project?

  • Increased coding of health issues within the clinical record, enabling us to demonstrate to clinicians the advantages of a well-coded patient record (i.e. the clinical benefits rather than the research benefits);
  • Deeper digital phenotyping for clinical epidemiology;
  • A population-based model that enables health systems to learn from structured data in order to adapt their clinical workflows.


Are there any early findings or indications you can report? Are there any publications?

Guidelines for the annotation of clinical texts have been designed, and the annotators have received preliminary training. An initial set of clinical texts has been annotated by two pairs of coders: one pair using SNOMED CT and the other using IMO Anywhere as the coding system.