Discriminant histological features in the diagnosis of chronic idiopathic inflammatory bowel disease: analysis of a large dataset by a novel data visualisation technique
- 1Academic Unit of Pathology, Section of Oncology and Pathology, Division of Genomic Medicine, University of Sheffield, Sheffield S10 2UL, South Yorkshire, UK
- 2Department of Automatic Control and Systems Engineering, University of Sheffield
- Correspondence to: Dr S S Cross, Academic Unit of Pathology, Section of Oncology and Pathology, Division of Genomic Medicine, University of Sheffield Medical School, Beech Hill Road, Sheffield S10 2UL, South Yorkshire, UK
- Accepted 21 May 2001
Background/Aims: The histopathological assessment of endoscopic colorectal biopsies is important in the distinction between normality and chronic idiopathic inflammatory bowel disease, and between ulcerative colitis and Crohn's disease, in subjects with symptoms of bowel dysfunction. This study aims to use carefully defined histopathological observations on a large study population to produce systems that improve classification into these diagnostic categories.
Methods: Eight hundred and nine endoscopic colorectal biopsies with verified outcomes (165 normal, 473 ulcerative colitis, 171 Crohn's disease) were examined by a single experienced histopathologist and 20 defined features were recorded for each case using a novel graphical interface with reference images of each feature. These features, together with age and sex, were used to produce and test statistical classifiers using logistic regression and a novel growing cell structure technique.
Results: The distinction between chronic idiopathic inflammatory bowel disease and normality was made with a good level of performance by both statistical classifiers (with areas under the receiver operating characteristic curves above 0.80). The growing cell structure system selected features as discriminant that agreed with the published literature. Logistic regression produced a more variable selection of discriminant features because of the high correlation between many features. The distinction between ulcerative colitis and Crohn's disease was performed less accurately, with areas under the receiver operating characteristic curves of about 0.70. Again the features selected as discriminant broadly agreed with those in the published literature.
Conclusions: Histopathological examination of endoscopic colorectal biopsies is an effective method of distinguishing between subjects with chronic idiopathic inflammatory bowel disease and normality, but less good at distinguishing between ulcerative colitis and Crohn's disease. The features selected as discriminant in this large statistical analysis broadly agree with those published in the literature from more qualitative studies.
Keywords
- chronic idiopathic inflammatory bowel disease
- ulcerative colitis
- Crohn's disease
- histopathological features
Abbreviations
- BSG, British Society of Gastroenterology
- CIIBD, chronic idiopathic inflammatory bowel disease
- GCS, growing cell structure
- LP, lamina propria
- ROC, receiver operating characteristic
The distinction between chronic idiopathic inflammatory bowel disease (CIIBD) and normality, and between ulcerative colitis and Crohn's disease in known cases of CIIBD, is very important to patient management. Histopathological assessment of endoscopic colorectal biopsies often plays a large part in these diagnostic processes. Although the accuracy and reliability of histopathological diagnosis are reasonable in this context,1–3 they are not perfect, and some studies have sought to improve them by statistical analysis of defined observations.1–7 However, many of these studies have had relatively small numbers of cases, usually 40 to 50,4,5 with the largest being 128,7 and have used different selections of features, many of which were poorly defined. In our study, we use a large study population (809 cases), make carefully defined observations entered through a novel graphical interface, and apply two classification techniques, one novel and unique to our laboratory, to investigate the most discriminant histopathological features for these diagnostic processes.
MATERIALS AND METHODS
In total, we studied 809 large bowel endoscopic biopsies reported in the department of histopathology, Royal Hallamshire Hospital, Sheffield between 1990 and 1995 (inclusive). Biopsies originating in diverted bowel, rectal stumps, or pouches were excluded, as were those with a diagnosis of neoplasia. The diagnosis was confirmed by the finding of typical endoscopic appearances seen on video photographs in the clinical notes, subsequent bowel resection, pattern of disease on radiological investigation, or microbiological culture results. In cases without confirmation by subsequent resection specimens, the final diagnostic outcome was determined by review of the patient's case notes.3 The biopsies were a mixed population of single distal biopsies and colonoscopic series from initial presentation and follow up of disease. By verified outcome, the study population contained 165 biopsies from normal subjects, 473 from patients with ulcerative colitis, and 171 from patients with Crohn's disease.
The observed features
The biopsies were examined (blind to all clinical details) by a single experienced observer (SSC) using a computer interface that implements the British Society of Gastroenterology (BSG) guidelines for the initial biopsy diagnosis of suspected CIIBD,1 with digitised images representing examples of each histopathological feature.8 Some of the features are dichotomous variables, for example the presence or absence of mucosal granulomas, whereas others are ordinal categories, such as mucin depletion classified into none, mild, moderate, or severe. Table 1 lists the observed features and their coding. The observer recorded each feature by choosing the digitised image that was most similar to the image seen down the light microscope and clicking on it with a mouse, which recorded the observation in the computerised database. The observations were spread over a period of nine months, with no more than 30 biopsies observed in a single day. The histopathological diagnosis at the initial reporting of each specimen was recorded, and after making all the observations on a case the observing histopathologist (SSC) recorded a diagnosis. Sensitivities and specificities were calculated for both sets of diagnoses.
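The calculation of sensitivity and specificity for a set of diagnoses against the verified outcomes can be sketched as follows. This is a minimal illustration, not the software used in the study; the function name and the toy values in the usage note are hypothetical.

```python
def sensitivity_specificity(truth, diagnosed):
    """Return (sensitivity, specificity) for two equal-length lists of booleans.

    truth[i] is True when case i has the disease by verified outcome;
    diagnosed[i] is True when the diagnosis under test called it diseased.
    """
    if len(truth) != len(diagnosed):
        raise ValueError("truth and diagnosed must have the same length")
    pairs = list(zip(truth, diagnosed))
    tp = sum(1 for t, d in pairs if t and d)          # diseased, called diseased
    fn = sum(1 for t, d in pairs if t and not d)      # diseased, missed
    tn = sum(1 for t, d in pairs if not t and not d)  # not diseased, called normal
    fp = sum(1 for t, d in pairs if not t and d)      # not diseased, called diseased
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with eight invented cases, `sensitivity_specificity([True] * 4 + [False] * 4, [True, True, True, False, False, False, False, True])` gives three of four diseased cases detected and three of four non-diseased cases correctly excluded.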
Partitioning of the dataset
The dataset was analysed in two partitions. In the first, the whole dataset was used with a dichotomous outcome of normal or CIIBD (with Crohn's disease and ulcerative colitis cases combined as a single category). The order was randomised and the first 540 cases were used as the training set and the other 269 cases as the test set. In the second, 370 cases were selected from the whole dataset, with a dichotomous outcome of Crohn's disease or ulcerative colitis. These cases were defined by their outcome and the presence of active inflammation (as indicated by polymorphs in the lamina propria (LP)). The order of these was randomised and the first 247 cases were used as the training set and the other 123 cases as the test set. The input data in both partitions were normalised to the interval [0, 1]. This corresponds only to a shift in origin for the binary features and can be justified for the categorical features on the grounds that they are ordinal, that increasing value corresponds to an increasing effect,9 and that the distance between each ordinal category has been designed to be equal in the BSG guidelines.1
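The partitioning and normalisation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, and the example category codes are hypothetical, and the study's own randomisation procedure is not described in the text.

```python
import random


def split_cases(cases, n_train, seed=0):
    """Randomise the order of the cases, then take the first n_train as the
    training set and the remainder as the test set. The seed is an assumption
    added so that the sketch is repeatable."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def scale_to_unit_interval(codes, lowest, highest):
    """Map feature codes linearly onto the range 0 to 1, keeping the spacing
    between ordinal categories equal."""
    if highest <= lowest:
        raise ValueError("highest code must exceed lowest code")
    return [(code - lowest) / (highest - lowest) for code in codes]
```

For the first partition, `split_cases(range(809), 540)` returns 540 training and 269 test cases. For a four-category ordinal feature coded 0 to 3 (an invented coding), `scale_to_unit_interval([0, 1, 2, 3], 0, 3)` spaces the categories one third apart. For a two-valued feature whose codes are one unit apart, the scaling reduces to subtracting the lower code, which is the shift in origin mentioned above.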
Logistic regression analysis
Logistic regression was performed using a main effects only model implemented within the Statistical Package for Social Sciences (http://www.spss.com). For both partitions of the dataset four models were created—all input variables entered together, forward conditional stepwise analysis, backward conditional stepwise analysis, and use of the features selected as significant by the BSG guidelines in their authors' meta-analysis of the literature.1 For each model, both the area under the receiver operating characteristic (ROC) curve for the test set (using the method of Hanley and McNeil),10 and the odds ratios for each input variable were calculated.11 The sensitivities and specificities were calculated for the points on the ROC curves that gave the greatest overall accuracy.
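The area under the ROC curve and its standard error can be sketched as follows. This assumes that the cited method of Hanley and McNeil is their widely used approximation in which the area equals the probability that a randomly chosen diseased case scores higher than a randomly chosen non-diseased case; the function names are hypothetical and this is not the code used in the study.

```python
import math


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the fraction of
    (diseased, non-diseased) pairs in which the diseased case has the
    higher score. Tied pairs count as one half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of the area (assumed Hanley-McNeil form)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for test sets of a few hundred cases such as those used here.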
Growing cell structure system analysis
Our novel variant of the growing cell structure (GCS) system performs as a conventional statistical classifier, but also produces visualisation of each input variable in relation to the known outcome. A detailed explanation of our GCS system has been given elsewhere with a clear analogy of its function.12 The process is simple and divided into two steps, namely: sorting of cases into different groups and calculating predictive probabilities for each group. The cases are sorted into different groups using just the input data, without reference to the known outcome, as an unsupervised process. Each node in the network is a collection of cases which are grouped together because they have features that are more similar to each other than to the rest of the cases—for example, they all have pronounced cryptitis and severe mucin depletion. The network starts with three nodes and the cases are distributed between these three nodes so that the cases at each node are as similar to each other as possible; that is, the differences between cases in each node (error) are minimised. Once the cases have been distributed between the initial three nodes a fourth node is added, adjacent to the two nodes that have most error in them, and the cases are then redistributed between the four nodes. The process of adding new nodes is continued until adding a node does not reduce the overall error across the whole network. It should be emphasised that the whole process of sorting the cases to the nodal network is made without reference to the outcome and without any decisions by the user as to which features are more likely to be significant than others. The distribution of each input variable can be visualised on the final network using a colour contour display of the mean value for each node. This produces a series of colour contour maps for all the input variables, which can be compared with each other. 
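The growth loop described above can be illustrated with a deliberately simplified sketch. It is not the authors' GCS implementation: it keeps no network topology, uses plain nearest-node assignment with centroid updates (a k-means style stand-in), and places each new node midway between the two nodes with the most error, which is only one reading of "adjacent to". All function names and the `max_nodes` and `sweeps` parameters are hypothetical.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(cases, nodes):
    """Index of the nearest node for each case (first node wins ties)."""
    return [min(range(len(nodes)), key=lambda k: _sq_dist(c, nodes[k]))
            for c in cases]


def _fit(cases, nodes, sweeps=20):
    """Alternate case assignment and node (centroid) updates.
    Returns the nodes, the node index of each case, and the error per node."""
    nodes = [list(n) for n in nodes]
    for _ in range(sweeps):
        labels = _assign(cases, nodes)
        for k in range(len(nodes)):
            members = [c for c, lab in zip(cases, labels) if lab == k]
            if members:  # an empty node keeps its position
                nodes[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = _assign(cases, nodes)
    errors = [0.0] * len(nodes)
    for c, lab in zip(cases, labels):
        errors[lab] += _sq_dist(c, nodes[lab])
    return nodes, labels, errors


def grow_nodes(cases, max_nodes=20):
    """Start with three nodes and add one at a time for as long as the total
    error falls. The known outcomes are never used (unsupervised)."""
    if len(cases) < 3:
        raise ValueError("at least three cases are needed")
    nodes, labels, errors = _fit(cases, cases[:3])
    while len(nodes) < max_nodes:
        # The two nodes carrying the most error.
        worst = sorted(range(len(nodes)), key=lambda k: errors[k],
                       reverse=True)[:2]
        midpoint = [(a + b) / 2
                    for a, b in zip(nodes[worst[0]], nodes[worst[1]])]
        trial_nodes, trial_labels, trial_errors = _fit(cases, nodes + [midpoint])
        if sum(trial_errors) >= sum(errors):
            break  # adding a node no longer reduces the overall error
        nodes, labels, errors = trial_nodes, trial_labels, trial_errors
    return nodes, labels
```

Because the sketch never sees the outcome, the per-node mean of any input feature can afterwards be computed from `labels`, which is the quantity shown in the colour contour displays.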
Once the network of nodes has been constructed, and the cases distributed among them, the known outcomes are used to calculate the probabilities of each outcome for each node. These probabilities are calculated using Bayes's theorem implemented by a Parzen window method.13 In our study, the outcomes were CIIBD or normal in the first partition and Crohn's disease or ulcerative colitis in the second. For example, the calculated probability could be the probability of a patient having Crohn's disease, given that the input variables for the biopsy features placed that case on a particular node in the network. Again, these probabilities are visualised as a colour contour display on the network. Visual comparison of the input feature overlays with the probability overlays indicates which input features have an association with a particular outcome.12 The GCS system was used on both partitions of the dataset, once with all input variables included and once with only the BSG selected variables. The sensitivities and specificities were calculated for the points on the ROC curves that gave the greatest overall accuracy; that is, the points where the highest percentages were correctly classified regardless of prevalence (equal weighting to false positives and false negatives).
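The choice of the ROC operating point with the greatest overall accuracy can be sketched as follows. This is a minimal illustration, assuming each case has a predicted probability and is called positive when that probability is at or above a threshold; the function name is hypothetical, and where two thresholds give the same accuracy the lower one is kept, which is an arbitrary choice of this sketch.

```python
import math


def best_accuracy_point(scores, truth):
    """Return (threshold, sensitivity, specificity, accuracy) for the
    threshold that classifies the highest fraction of cases correctly,
    weighting false positives and false negatives equally."""
    if not any(truth) or all(truth):
        raise ValueError("both outcome classes must be present")
    best = None
    # Every observed score is a candidate; infinity calls every case negative.
    for threshold in sorted(set(scores)) + [math.inf]:
        tp = fp = tn = fn = 0
        for score, positive in zip(scores, truth):
            called_positive = score >= threshold
            if called_positive and positive:
                tp += 1
            elif called_positive:
                fp += 1
            elif positive:
                fn += 1
            else:
                tn += 1
        accuracy = (tp + tn) / len(scores)
        if best is None or accuracy > best[3]:
            best = (threshold, tp / (tp + fn), tn / (tn + fp), accuracy)
    return best
```

With six invented cases, `best_accuracy_point([0.9, 0.8, 0.6, 0.4, 0.3, 0.1], [True, True, False, True, False, False])` selects the threshold 0.4, at which five of the six cases are classified correctly.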
After the observer had made and recorded all the observations on a case he recorded his own diagnosis, within the categories recommended in the BSG guidelines,1 still without any knowledge of the clinical details of the case.
RESULTS
Tables 2–7 and figs 1 and 2 summarise the results.
DISCUSSION
Our study examines the histological features that are discriminant in two decisions, namely: (1) Does an endoscopic colorectal biopsy show features of CIIBD or not? (2) If so, does it show features of Crohn's disease or ulcerative colitis? Two complementary methods are used to investigate these classifications.
Both methods give a good performance on the CIIBD/normality classification with areas under the ROC curves above 0.80, sensitivities of 76–80%, and specificities of 86–98% (table 6). This level of performance suggests that the input features that were included in our study do contain important discriminant value. By looking at the colour map of the posterior probability of CIIBD from the GCS system (fig 1), and comparing this with the colour maps for the input features, discriminant features can be selected by the fact that they have similarly distributed colour maps to the probability map. From this analysis, the following features appear to be discriminant for CIIBD: increased age, abnormal mucosal surface, abnormal crypt architecture, reduced crypt profiles, increased LP cellularity, patchy increase in LP cellularity, transmucosal increase in LP cellularity, extent of cryptitis and number of polymorphs, extent of crypt abscesses and number of polymorphs, LP polymorphs, epithelial degeneration and ulceration, mucin depletion, increased intraepithelial lymphocytes, LP granulomas, and submucosal granulomas. All these features agree with the canonical information in the published literature.1
The odds ratios for the input variables in the logistic regression equation, which produces a similar level of classification performance, give a rather different picture. With all input features entered together, only one input feature has an odds ratio whose lower 95% confidence limit is above one, namely increased LP polymorphs (table 2). Thirteen other features have odds ratios above one but with wide 95% confidence intervals that extend below one, so they are not significant. This is probably because of the high degree of correlation between many of the input features, so that one feature, such as increased LP polymorphs, may have a very high odds ratio, but all the other input variables that are highly correlated with it will have lower odds ratios. An advantage of the GCS system is that the initial clustering of cases in the network is unsupervised, so that all discriminant input variables can be discerned, even if they are highly correlated. However, the input features that have odds ratios above one (but without a lower 95% confidence limit above one) do correspond closely to features that are cited in the literature as being discriminant.1 The logistic regression methods using selected variables (table 3) produce slightly different selections of variables with forward and backward conditional entry, but LP polymorphs, abnormal crypt architecture, and male sex feature in both selections.
The distinction between ulcerative colitis and Crohn's disease in cases known to be CIIBD appears to be a more difficult classification task, with a lower area under the ROC curves for both logistic regression and the GCS system (in the range 0.72–0.78, table 7). Looking at the colour map overlays from the GCS system (fig 2), the following input features have maps that correspond closely with the area of a high probability of Crohn's disease: patchy increase in LP cellularity, LP granulomas, and submucosal granulomas. The following input features correlate with the area of high probability of ulcerative colitis: abnormal mucosal surface, abnormal crypt architecture, reduced crypt profiles, transmucosal increase in cellularity, extent and number of polymorphs in cryptitis, extent and number of polymorphs in crypt abscesses, increased LP polymorphs, degenerate or ulcerated surface epithelial cells, and pronounced mucin depletion. The following features appear to be non-discriminant because they cover areas of high probability for both ulcerative colitis and Crohn's disease: age, sex, increased lymphoid aggregates, intraepithelial lymphocytes, and basal histiocytic cells. The selection of these features by the GCS system in our study is again in broad agreement with the published literature.1
In the logistic regression method, the outcome of Crohn's disease was coded as zero and that of ulcerative colitis as one. Thus, in tables 4 and 5 any odds ratio above one indicates an input feature that favours ulcerative colitis. In table 4 no features have odds ratios with a lower 95% confidence limit above one (which would indicate significant discriminant value for a diagnosis of ulcerative colitis). The following features in table 4 have odds ratios with an upper 95% confidence limit below one (indicating significant discriminant value, at a 5% level, for a diagnosis of Crohn's disease): female sex, patchy increased LP cellularity, extent of crypt abscesses, increased LP polymorphs, and LP granulomas. With forward conditional entry of variables, an abnormal glandular architecture favours ulcerative colitis, whereas a superficial increase in LP cellularity, patchy increase in LP cellularity, and LP granulomas favour Crohn's disease. Backward conditional entry adds the extent of cryptitis as a feature favouring ulcerative colitis and female sex and increased LP polymorphs as features favouring Crohn's disease. As for the CIIBD/normality decision, the logistic regression does not identify all the features that might be discriminant if they are highly correlated with other significant input features.
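The reading of an odds ratio and its confidence limits under this coding can be sketched as follows. This is a minimal illustration that assumes Wald-type 95% limits calculated from the regression coefficient and its standard error, which is how such limits are commonly obtained; the text does not state how the limits in the tables were computed, and the function names are hypothetical.

```python
import math


def odds_ratio_with_limits(coefficient, standard_error, z=1.96):
    """Odds ratio and approximate 95% confidence limits (assumed Wald form)
    from a logistic regression coefficient and its standard error."""
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * standard_error),
        math.exp(coefficient + z * standard_error),
    )


def favoured_diagnosis(lower_limit, upper_limit):
    """Interpretation with Crohn's disease coded 0 and ulcerative colitis
    coded 1, as in the text."""
    if lower_limit > 1.0:
        return "ulcerative colitis"
    if upper_limit < 1.0:
        return "Crohn's disease"
    return "not significant"
```

A feature whose interval spans one is reported as not significant even if its point estimate is far from one, which is the situation described for features that are highly correlated with a stronger predictor.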
Both techniques in our study identify histological features that are discriminant for the distinction between CIIBD and normality, and between Crohn's disease and ulcerative colitis in cases of CIIBD. The specific features that they identify show some variation between the techniques, with the GCS system identifying more features than logistic regression, because logistic regression may select only one feature from a group of highly correlated variables. The features that these techniques identify are in agreement with the published literature1 and no additional discriminant features are revealed. Thus, our study provides confirmation of existing knowledge, and it does so in the context of a large dataset with many cases and carefully defined features.
Tables 6 and 7 show some interesting differences between the trained classification systems and human performance. For the distinction between normality and CIIBD, both the observing histopathologist and the initial reporting histopathologist had slightly higher sensitivities and much higher specificities than any of the classification techniques. This suggests either that the histopathologists are using some diagnostic features that are not encoded by the defined observed features in our study or that they are integrating them in a more subtle way than the classification techniques. Visual images contain immense amounts of data and encoding these into discrete variables, even as many as were used in our study, leads to a considerable loss of information.14 It may be that even more categories need to be introduced for each feature to provide sufficiently rich information for the classification systems to produce optimal predictive power.15 For the distinction between ulcerative colitis and Crohn's disease (table 7) the human performance differs between the initial reporting and the observing histopathologists.
Whereas the sensitivities for ulcerative colitis, and the specificity of the observing pathologist, are similar to those of the classification techniques, the specificity for Crohn's disease of the initial reporting histopathologists is much higher (98%) than that of any other method. The initial reporting pathologists had all the clinical information that was on the histopathology request form available to them when making the diagnosis. This may have contained information about small bowel radiological studies or other investigations that may have facilitated the diagnosis. Because the observing pathologist's performance was not better than that of any of the classification techniques, it appears that there are no histopathological features additional to those included in our study that are of discriminatory value. However, the relatively modest performance of every system in distinguishing Crohn's disease from ulcerative colitis suggests that histopathology will never provide perfect classification of these diseases, and that other investigations are required to add information.
Take home messages
- Histopathological examination of endoscopic colorectal biopsies is useful to distinguish between normal subjects and those with chronic idiopathic inflammatory bowel disease
- It is less good at distinguishing between ulcerative colitis and Crohn's disease