Deep Distance Transform for Tubular Structure Segmentation in CT Scans

Yan Wang, Xu Wei, Fengze Liu, Jieneng Chen, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille, Wei Shen
Johns Hopkins University; University of California San Diego; Tongji University; The Johns Hopkins University School of Medicine

Abstract

Tubular structure segmentation in medical images, e.g., segmenting vessels in CT scans, serves as a vital step in the use of computers to aid in screening early stages of related diseases. But automatic tubular structure segmentation in CT scans is a challenging problem, due to issues such as poor contrast, noise and complicated background. A tubular structure usually has a cylinder-like shape which can be well represented by its skeleton and cross-sectional radii (scales). Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. DDT first learns a multi-task network to predict a segmentation mask for a tubular structure and a distance map. Each value in the map represents the distance from each tubular structure voxel to the tubular structure surface. Then the segmentation mask is refined by leveraging the shape prior reconstructed from the distance map. We apply our DDT on six medical image datasets. Results show that (1) DDT can boost tubular structure segmentation performance significantly (e.g., over 13% DSC improvement for pancreatic duct segmentation), and (2) DDT additionally provides a geometrical measurement for a tubular structure, which is important for clinical diagnosis (e.g., the cross-sectional scale of a pancreatic duct can be an indicator for pancreatic cancer).

Figure 1. A tubular shape is presented as the envelope of a family of spheres with continuously changing center points and radii [9].

1. Introduction

Tubular structures are ubiquitous throughout the human body, with notable examples including blood vessels, pancreatic duct and urinary tract. They occur in specific environments at the boundary of liquids, solids or air and surrounding tissues, and play a prominent role in sustaining physiological functions of the human body.

In this paper, we investigate automatic tubular organ/tissue segmentation from CT scans, which is important for the characterization of various diseases [18]. For example, pancreatic duct dilatation or abrupt pancreatic duct caliber change signifies high risk for pancreatic ductal adenocarcinoma (PDAC), which is the third most common cause of cancer death in the US [11]. Another example is that obstructed vessels lead to coronary heart disease, which is the leading cause of death in the US [27].

Segmenting tubular organs/tissues from CT scans is a popular but challenging problem. Existing methods addressing this problem can be roughly categorized into two groups: (1) Geometry-based methods, which build deformable shape models to fit tubular structures by exploiting their geometrical properties [43, 45, 3, 25], e.g., a tubular structure can be well represented by its skeleton, aka symmetry axis or medial axis, and it has a cylindrical surface. But, due to the lack of powerful learning models, these methods cannot deal with poor contrast, noise and complicated background. (2) Learning-based methods, which learn a per-pixel classification model to detect tubular structures. The performance of this type of methods is largely boosted by deep learning, especially fully convolutional networks (FCN) [23, 49, 48]. FCN and its variants have become out-of-the-box models for tubular organ/tissue segmentation and achieve state-of-the-art results [24, 47]. But these networks simply try to learn a class label per voxel, which inevitably ignores the geometric arrangement of the voxels in a tubular structure, and consequently cannot guarantee that the obtained segmentation has the right shape.

Since a tubular structure can be well represented by its skeleton and the cross-sectional radius of each skeleton point, as shown in Fig. 1, these intrinsic geometric characteristics should be taken into account to serve as a valuable prior. To this end, a straightforward strategy is to first train a model, e.g., a deep network, to directly predict whether each voxel is on the skeleton of the tubular structure or not, as well as the cross-sectional radius of each skeleton point, and then reconstruct the segmentation of the tubular structure from its skeleton and radii [34]. However, such a strategy has severe limitations: (1) The ground-truth skeletons used for training are not easily obtained. Although they can be approximately computed from the ground-truth segmentation mask by 3D skeletonization methods, skeleton extraction from 3D mesh representation itself is a hard and unsolved problem [5]. Without reliable skeleton ground-truths, the performance of tubular structure segmentation cannot be guaranteed. (2) It is hard for the classifier to distinguish voxels on the skeleton itself from those immediately next to it, as they have similar features but different labels.

To tackle the obstacles mentioned above, we propose to perform tubular structure segmentation by training a multi-task deep network to predict not only a segmentation mask, but also a distance map, consisting of the distance transform value from each tubular structure voxel to the tubular structure surface, rather than a single skeleton/non-skeleton label. Distance transform [28] is a classical image processing operator that produces a distance map of the same size as the input image, each value in which is the distance from each foreground pixel/voxel to the foreground boundary. Distance transform is also known as the basis of one type of skeletonization algorithms [17], i.e., the ridge of the distance map is the skeleton. Thus, the predicted distance map encodes the geometric characteristics of the tubular structure. This motivated us to design a geometry-aware approach that refines the segmentation mask by leveraging the shape prior reconstructed from the distance map. Essentially, our approach performs tubular structure segmentation by an implicit skeletonization-reconstruction procedure with no requirements for skeleton ground-truths. We stress that the distance transform brings two benefits for our approach: (1) Distance transform values are defined on each voxel inside a tubular structure, which eliminates the problem of the discontinuity between the skeleton and its surrounding voxels; (2) distance transform values on the skeleton (the ridge of the distance map) are exactly the cross-sectional radii (scales) of the tubular structure, which is an important geometrical measurement. To make the distance transform value prediction more precise, we additionally propose a distance loss term used for network training, which penalizes predictions whose distance transform value is far away from its ground-truth.

We term our method Deep Distance Transform (DDT), as it naturally combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. We emphasize that DDT has two advantages over vanilla segmentation networks: (1) It guides tubular structure segmentation by taking the geometric property of tubular structures into account. This reduces the difficulty of separating tubular structures from surrounding structures and ensures that the segmentation results have the right shape; (2) it predicts the cross-sectional scales of a tubular structure as by-products, which are important for the further study of the tubular structure, such as clinical diagnosis and virtual endoscopy [7].

We verify DDT on six datasets, including five datasets for the segmentation task and one dataset for clinical diagnosis. For the segmentation task, the performance of our DDT exceeds all backbone networks by a large margin, with even over 13% improvement in terms of Dice-Sørensen coefficient for pancreatic duct segmentation on the famous 3D-Unet [12]. The ablation study further shows the effectiveness of each proposed module in DDT. The experiment for clinical diagnosis leverages dilated pancreatic duct as a cue for finding PDAC tumors missed by the original deep networks, which verifies the potential of our DDT for early diagnosis of pancreatic cancer.

2. Related Work

2.1. Tubular Structure Segmentation

2.1.1 Geometry-based Methods

Various methods have been proposed to improve the performance of tubular structure segmentation by considering the geometric characteristics, and a non-exhaustive overview is given here. (1) Contour-based methods extracted the segmentation mask of a tubular structure by means of approximating its shape in the cross-sectional domain [1, 10]. (2) Minimal path approaches conducted tubular structure tracking and were usually interactive. They captured the global minimum curve (energy weighted by the image potential) between two points given by the user [9]. (3) Model-based tracking methods were required to refine a tubular structure model, which most of the time adopted a 3D cylinder; they calculated the new model position by seeking the optimal model match among all possible new model positions [8]. (4) Centerline-based methods found the centerline and estimated the radius of linear structures. For example, the multiscale centerline detection method proposed in [34] adopted the idea of distance transform, and reformulated centerline detection and radius estimation in terms of a regression problem in 2D. Our work fully leverages the geometric information of a tubular structure, proposing a distance transform algorithm to implicitly learn the skeleton and cross-sectional radius, and the final segmentation mask is reconstructed by adopting the shape prior of the tubular structure.

Figure 2. The training and testing stage of DDT, illustrated on veins segmentation. Our DDT has two head branches: the first one targets the ground-truth label map and performs per-voxel veins/non-veins classification, and the second head branch targets the scale class map and performs scale prediction. The shape prior obtained from the scale class map and the pseudo skeleton map is leveraged to refine the segmentation mask.

2.1.2 Learning-based Methods

Learning-based methods for tubular structure segmentation infer a rule from labeled training pairs, one for each pixel. Traditional methods such as 2-D Gabor wavelet and classifier combination [35], ridge-based segmentation [36] and the random decision forest based method [2] achieved considerable progress. In the past years, various 2D and 3D deep segmentation networks have become very popular. Some multi-organ segmentation methods [41, 29] were proposed to segment multiple organs simultaneously, including tubular organs. DeepVessel [16] put a four-stage HED-like CNN and conditional random field into an integrated deep network to segment retinal vessels. Kid-Net [37], inspired by 3D-Unet [12], was a two-phase 3D network for kidney vessel segmentation. ResDSN [50, 51] and 3D-Unet [12] were used in the Hyper-pairing network [47] to segment tissues in the pancreas, including the duct, by combining information from dual-phase imaging. Besides, 3D-HED and its variant were applied for vascular boundary detection [24]. Other scenarios include using synthetic data to improve endotracheal tube segmentation [15]. A cross-modality domain adaptation framework with adversarial learning, which dealt with the domain shift in segmenting biomedical images including the ascending aorta, was also proposed [13].

2.2. Learning-based Skeleton Extraction

Learning-based skeleton extraction from natural images has been widely studied in recent decades [38, 31, 34, 22, 21] and achieved promising progress with the help of deep learning [32, 20, 46, 40]. Shen et al. [32] showed that multi-task learning, i.e., jointly learning skeleton pixel classification and skeleton scale regression, was important to obtain accurate predicted scales, and it was useful for skeleton-based object segmentation. One recent work for bronchus segmentation calculated the distance from each voxel to its nearest skeleton point [39].

However, these methods cannot be directly applied to our tasks, since they require the skeleton ground-truth, which is not easy to obtain from a tiny and highly distorted 3D mask due to the annotation errors that commonly exist in medical images [42].

3. Methodology

We first define a 3D volume $X$ of size $L \times W \times H$ as a function on the coordinate set $V = \{v \mid v \in N_L \times N_W \times N_H\}$, i.e., $X : V \to \mathcal{R} \subset \mathbb{R}$, where the value on position $v$ is defined as $x_v = X(v)$. $N_L$, $N_W$, $N_H$ represent the integer sets ranging from 1 to $L$, $W$, $H$, respectively, so that the Cartesian product of them can form the coordinate set. Given a 3D CT scan $X$, the goal of tubular structure segmentation is to predict the label $\hat{Y}$ of all voxels in the CT scan, where $\hat{y}_v \in \{0, 1\}$ denotes the predicted label for each voxel at position $v$, i.e., if the voxel at $v$ is predicted as a tubular structure voxel, then $\hat{y}_v = 1$, otherwise $\hat{y}_v = 0$. We also use $v$ to denote the voxel at position $v$ in the remainder of the paper for convenience. Fig. 2 illustrates our tubular structure segmentation network, i.e., DDT.

3.1. Distance Transform for Tubular Structure

In this section, we discuss how to perform distance transform for tubular structure voxels. Given the ground-truth label map $Y$ of the CT scan $X$ in the training phase, let $C_V$ be the set of voxels on the tubular structure surface, which can be defined by

$$C_V = \{v \mid y_v = 1,\ \exists\, u \in \mathcal{N}(v),\ y_u = 0\}, \tag{1}$$

where $\mathcal{N}(v)$ denotes the 6-neighbour voxels of $v$. Then, by performing distance transform on the CT scan $X$, the distance map $D$ is computed by

$$d_v = \begin{cases} \min_{u \in C_V} \lVert v - u \rVert_2, & \text{if } y_v = 1, \\ 0, & \text{if } y_v = 0. \end{cases} \tag{2}$$

Note that for each tubular structure voxel $v$, the distance transform assigns it a distance transform value which is the nearest distance from $v$ to the tubular structure surface $C_V$. Here we use Euclidean distance, as skeletons from Euclidean distance maps are robust to rotations.
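To make the surface set and the distance map concrete, the following is a minimal NumPy sketch under stated assumptions. Surface voxels are taken to be tubular voxels with at least one background 6-neighbour, voxels on the volume border are treated as touching background, and the distance is found by brute force, which only suits small volumes. The function names `surface_voxels` and `distance_map` are illustrative and do not come from the paper.

```python
import numpy as np

def surface_voxels(y):
    """Tubular voxels (y == 1) that touch at least one background 6-neighbour."""
    y = np.asarray(y).astype(bool)
    # Pad with background so that voxels on the volume border count as surface
    # (an assumption of this sketch).
    padded = np.pad(y, 1, mode="constant", constant_values=False)
    touches_background = np.zeros(y.shape, dtype=bool)
    for axis in range(3):
        for shift in (-1, 1):
            neighbour = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            touches_background |= ~neighbour
    return y & touches_background

def distance_map(y):
    """Euclidean distance from every tubular voxel to its nearest surface voxel."""
    y = np.asarray(y).astype(bool)
    d = np.zeros(y.shape, dtype=np.float64)
    if not y.any():
        return d
    fg = np.argwhere(y).astype(np.float64)
    surf = np.argwhere(surface_voxels(y)).astype(np.float64)
    # Brute-force pairwise distances between tubular and surface voxels.
    pairwise = np.linalg.norm(fg[:, None, :] - surf[None, :, :], axis=-1)
    d[y] = pairwise.min(axis=1)  # background voxels keep the value 0
    return d

# Toy example: a 5x5x5 cube of foreground inside a 7x7x7 volume.
y = np.zeros((7, 7, 7), dtype=np.uint8)
y[1:6, 1:6, 1:6] = 1
d = distance_map(y)
```

For real CT volumes a dedicated Euclidean distance transform routine would be a more practical choice than the pairwise computation used here.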

We further quantize each $d_v$ into one of $K$ bins by rounding $d_v$ to the nearest integer, which converts the continuous distance map $D$ to a discrete quantized distance map $Z$, where $z_v \in \{0, \ldots, K\}$. We do this quantization because training a deep network directly for regression is relatively unstable, since outliers, i.e., the annotation errors that commonly exist in medical images [42], cause a large error term, which makes it difficult for the network to converge and leads to unstable predictions [30]. Based on quantization, we rephrase the distance prediction problem as a classification problem, i.e., to determine the corresponding bin for each quantized distance. We term the $K$ bins of the quantized distances $K$ scale classes. We use the term scale since the distance transform values at the skeleton voxels of a tubular structure are its cross-sectional scales.
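The quantization step can be sketched in a few lines of NumPy. This is an illustration only: the function name `quantize_distance_map` is hypothetical, ties are rounded upward, and distances larger than the number of bins are clipped to the last bin, which is an assumption because the text does not say how such values are handled.

```python
import numpy as np

def quantize_distance_map(d, num_bins):
    """Round each distance to the nearest integer and clip it to [0, num_bins]."""
    z = np.floor(np.asarray(d, dtype=np.float64) + 0.5).astype(np.int64)
    # Clipping to the last bin is an assumption of this sketch.
    return np.clip(z, 0, num_bins)

d_example = np.array([0.0, 0.4, 1.5, 2.6, 9.0])
z_example = quantize_distance_map(d_example, num_bins=3)
```

Each entry of the result is a scale class index that can serve as a classification target for the second head branch.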

3.2. Network Training for Deep Distance Transform

Given a 3D CT scan $X$ and its ground-truth label map $Y$, we can compute its scale class map (quantized distance map) $Z$ according to the method given in Sec. 3.1. In this section, we describe how to train a deep network for tubular structure segmentation by targeting both $Y$ and $Z$.

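As shown in Fig. 2, our DDT model has two head branches. The first one targets the ground-truth label map $Y$ and performs per-voxel classification for semantic segmentation with a weighted cross-entropy loss function $\mathcal{L}_{cls}$:

$$\mathcal{L}_{cls} = -\sum_{v \in V} \Big( \beta_p\, y_v \log p_v(W, w_{cls}) + \beta_n\, (1 - y_v) \log\big(1 - p_v(W, w_{cls})\big) \Big), \tag{3}$$

where $W$ is the parameters of the network backbone, $w_{cls}$ is the parameters of this head branch, $p_v(W, w_{cls})$ is the probability that $v$ is a tubular structure voxel as predicted by this head branch, and $\beta_p$ and $\beta_n$ are the loss weights for the tubular structure and background classes, respectively.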
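A weighted cross-entropy loss of this kind can be sketched as follows. This is a plain NumPy illustration rather than the authors' PyTorch code; the argument names `beta_p` and `beta_n` (weights for the tubular and background classes) and the clipping constant `eps` are assumptions made for the example.

```python
import numpy as np

def weighted_cross_entropy(p, y, beta_p, beta_n, eps=1e-7):
    """Sum of class-weighted binary cross-entropy over all voxels.

    p: predicted probability of the tubular class, any shape.
    y: ground-truth labels in {0, 1}, same shape as p.
    """
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=np.float64)
    per_voxel = -(beta_p * y * np.log(p) + beta_n * (1.0 - y) * np.log(1.0 - p))
    return per_voxel.sum()

loss_example = weighted_cross_entropy(
    p=np.array([0.5, 0.5]), y=np.array([1, 0]), beta_p=1.0, beta_n=1.0
)
```

Choosing a larger weight for the tubular class is one common way to offset the strong class imbalance between thin structures and background; the specific weight values used in the paper are not stated in this excerpt.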

The second head branch targets the scale class map $Z$ and performs scale prediction for tubular structure voxels (i.e., $z_v > 0$). We introduce a new distance loss function $\mathcal{L}_{dis}$ to learn this head branch:

where $W$ is the parameters of the network backbone, $w_{dis}$ is the parameters of the second head branch, $\mathbb{1}(\cdot)$ is an indication function, $\lambda$ is a trade-off parameter which balances the two loss terms (we simply set $\lambda = 1$ in our implementation), $g_v^k(W, w_{dis})$ is the probability that the scale of $v$ belongs to the $k$-th scale class, and $\omega_v$ is a normalized weight. Note that the first term of Eq. 4 is the standard softmax loss, which penalizes the classification error for each scale class equally. The second term of Eq. 4 is termed the distance loss term, which penalizes the difference between each predicted scale class (i.e., $\arg\max_k g_v^k(W, w_{dis})$) and its ground-truth scale class $z_v$, where the penalty is controlled by $\omega_v$. Finally, the loss function for our segmentation network is $\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{dis}$.

[...] $s_v = 1$; otherwise $s_v = 0$, and $T^p$ is the threshold.
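The two-term structure of the distance loss can be illustrated with a short NumPy sketch. The exact formula is not recoverable from this text, so the form below is an assumption: a softmax cross-entropy term plus a penalty scaled by the trade-off parameter, where the per-voxel weight is taken to be the absolute gap between the predicted and ground-truth scale classes divided by the number of classes, and the penalty itself is assumed to be a negative log of one minus the top predicted probability. The names `distance_loss`, `lam` and `omega` are hypothetical.

```python
import numpy as np

def distance_loss(g, z, lam=1.0, eps=1e-7):
    """Assumed two-term distance loss over N tubular voxels.

    g: (N, K) predicted scale-class probabilities (rows sum to 1).
    z: (N,) ground-truth scale-class indices in 0..K-1.
    """
    g = np.clip(np.asarray(g, dtype=np.float64), eps, 1.0 - eps)
    z = np.asarray(z, dtype=np.int64)
    n, k = g.shape
    # First term: standard softmax (cross-entropy) loss on the true class.
    ce = -np.log(g[np.arange(n), z])
    # Second term: penalise confident predictions that are far from the truth.
    pred = g.argmax(axis=1)
    omega = np.abs(pred - z) / k  # assumed normalised weight
    penalty = -omega * np.log(1.0 - g.max(axis=1))  # assumed penalty form
    return (ce + lam * penalty).sum()

loss_correct = distance_loss(np.array([[0.7, 0.2, 0.1]]), np.array([0]))
loss_near = distance_loss(np.array([[0.2, 0.7, 0.1]]), np.array([2]))
loss_far = distance_loss(np.array([[0.7, 0.2, 0.1]]), np.array([2]))
```

With this choice a correct prediction incurs only the cross-entropy term, and among wrong predictions with the same probabilities the one whose predicted class lies farther from the ground truth is penalized more, which matches the behaviour described for the distance loss term.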

Our implementation is based on PyTorch. For data pre-processing, following [47], we truncate the raw intensity values within the range of [−100, 240] HU and normalize each CT scan into zero mean and unit variance. Data augmentation (i.e., translation, rotation and flipping) is conducted in all the methods, leading to an augmentation factor of 24. During training, we randomly sample patches of a specified size (i.e., 64) due to memory constraints. We use exponential learning rate decay with a decay rate of 0.99. During testing, we employ the sliding window strategy to obtain the final predictions. The ground-truth distance map for each tubular structure is computed by finding the Euclidean distance of each foreground voxel to its nearest boundary voxels. The segmentation accuracy is measured by the well-known Dice-Sørensen coefficient (DSC) in the rest of the paper, unless otherwise specified.
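The intensity pre-processing and the evaluation metric can be sketched as below. This is a NumPy illustration, not the authors' pipeline: the function names `preprocess_ct` and `dice_coefficient` are hypothetical, the truncation window is passed in by the caller, and returning 1.0 when both masks are empty is a convention assumed here.

```python
import numpy as np

def preprocess_ct(volume_hu, lo, hi):
    """Truncate intensities to [lo, hi] HU, then scale to zero mean and unit variance."""
    v = np.clip(np.asarray(volume_hu, dtype=np.float64), lo, hi)
    std = v.std()
    return (v - v.mean()) / (std if std > 0 else 1.0)

def dice_coefficient(pred, target):
    """Dice-Sorensen coefficient between two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: assumed convention
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Toy data; the window bounds here are supplied by the caller.
scan = np.array([-500.0, 0.0, 100.0, 1000.0]).reshape(1, 2, 2)
normalized = preprocess_ct(scan, lo=-100.0, hi=240.0)
dsc = dice_coefficient(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]))
```

The exponential learning rate decay mentioned above simply multiplies the learning rate by the decay rate after each step of the schedule.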

b. Shape reconstruction. For each voxel $v$, its predicted scale $\hat{z}_v$ is given by $\hat{z}_v = \arg\max_k g_v^k$. We fit a Gaussian kernel to soften each ball and obtain a soft reconstructed shape $\tilde{Y}^s$:

$$\tilde{y}^s_v = \sum_{u \in \{v' \mid s_{v'} = 1\}} c_u\, \Phi(v; u, \Sigma_u), \tag{5}$$

where $\Phi(\cdot)$ denotes the density function of a multivariate normal distribution, $u$ is the mean and $\Sigma_u$ is the covariance matrix. According to the 3-sigma rule, we set $\Sigma_u = (\hat{z}_u / 3)^2 I$, where $I$ is an identity matrix. We notice that the peak of $\Phi(\cdot; u, \Sigma_u)$ becomes smaller if $\hat{z}_u$ is larger. To normalize the peak of each normal distribution, we introduce a normalization factor $c_u = \sqrt{(2\pi)^3 \det(\Sigma_u)}$.
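The shape reconstruction step can be sketched in NumPy as follows. Each pseudo skeleton voxel contributes an isotropic Gaussian whose standard deviation is its predicted scale divided by 3 (the 3-sigma rule) and whose peak is rescaled to 1, so the normalized kernel reduces to a plain exponential. Summing the contributions over skeleton voxels and skipping voxels with non-positive scale are assumptions of this sketch, and the function name `soft_reconstructed_shape` is hypothetical.

```python
import numpy as np

def soft_reconstructed_shape(skeleton, scale):
    """Sum of peak-normalised Gaussians centred on pseudo skeleton voxels.

    skeleton: boolean volume, True on pseudo skeleton voxels.
    scale:    predicted scale (radius in voxels) for every voxel, same shape.
    """
    skeleton = np.asarray(skeleton).astype(bool)
    # Coordinates of every voxel, shape (L, W, H, 3).
    grid = np.moveaxis(np.indices(skeleton.shape), 0, -1).astype(np.float64)
    out = np.zeros(skeleton.shape, dtype=np.float64)
    for u in np.argwhere(skeleton):
        radius = float(scale[tuple(u)])
        if radius <= 0:
            continue  # no ball to draw for a zero scale (assumption)
        sigma = radius / 3.0  # 3-sigma rule
        squared_dist = ((grid - u) ** 2).sum(axis=-1)
        # c_u * Phi(v; u, sigma^2 I) simplifies to exp(-||v - u||^2 / (2 sigma^2)).
        out += np.exp(-0.5 * squared_dist / sigma ** 2)
    return out

skeleton = np.zeros((7, 7, 7), dtype=bool)
skeleton[3, 3, 3] = True
scale = np.full((7, 7, 7), 3.0)
soft_shape = soft_reconstructed_shape(skeleton, scale)
```

In this toy case the single ball has radius 3, so the response is 1 at the skeleton voxel and falls to about exp(-4.5) at the ball boundary.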

4.1.2 The PDAC Segmentation Dataset [47]

We first study the PDAC segmentation dataset [47], which has 239 patients with pathologically proven PDAC. All CT scans are contrast-enhanced images, and our experiments are conducted on only the portal venous phase. We follow the same setting and the same cross-validation as reported in [47]. DSCs for three structures were reported in [47]: abnormal pancreas, PDAC mass and pancreatic duct. We only show the average and standard deviation over all cases for the pancreatic duct, which is a tubular structure.

c. Segmentation refinement. We use the soft reconstructed shape $\tilde{Y}^s$ to refine the segmentation probability $P$, which results in a refined segmentation map $\tilde{Y}^r$:

(6)

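The final segmentation mask $\hat{Y}$ is obtained by thresholding $\tilde{Y}^r$, i.e., if $\tilde{y}^r_v > T^r$, $\hat{y}_v = 1$, otherwise $\hat{y}_v = 0$, where $\tilde{y}^r_v$ and $\hat{y}_v$ are the values of the voxel at position $v$ of $\tilde{Y}^r$ and $\hat{Y}$, respectively.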

Results and Discussions. To evaluate the performance of the proposed DDT framework, we compare it with a per-voxel classification method [47], termed SegBaseline in Table 1. It can be seen that our approach outperforms the baseline reported in [47] by a large margin. It is also worth mentioning that although our DDT is only tested on the venous phase, the performance is comparable with the hyper-

As mentioned in Sec. 1, the predicted scale is a geometrical measurement for a tubular structure, which is essential for clinical diagnosis. We will show one clinical application in Sec. 4.2.
