
C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

Yutong Bai¹ Qihang Yu¹ Yixiao Zhang¹ Dong Yang² Alan L. Yuille¹ Holger Roth² Daguang Xu²
¹ The Johns Hopkins University  ² NVIDIA

Abstract

3D convolution neural networks (CNN) have been proved very successful in parsing organs or tumours in 3D medical images, but it remains sophisticated and time-consuming to choose or design proper 3D networks given different task contexts. Recently, Neural Architecture Search (NAS) is proposed to solve this problem by searching for the best network architecture automatically. However, the inconsistency between search stage and deployment stage often exists in NAS algorithms due to memory constraints and large search space, which could become more serious when applying NAS to some memory and time-consuming tasks, such as 3D medical image segmentation. In this paper, we propose a coarse-to-fine neural architecture search (C2FNAS) to automatically search a 3D segmentation network from scratch without inconsistency on network size or input size. Specifically, we divide the search procedure into two stages: 1) the coarse stage, where we search the macro-level topology of the network, i.e. how each convolution module is connected to other modules; 2) the fine stage, where we search at micro-level for operations in each cell based on the previously searched macro-level topology. The coarse-to-fine manner divides the search procedure into two consecutive stages and meanwhile resolves the inconsistency. We evaluate our method on 10 public datasets from the Medical Segmentation Decathlon (MSD) challenge and achieve state-of-the-art performance with the network searched using one dataset, which demonstrates the effectiveness and generalization of our searched models.

Figure 1. Image and mask examples from MSD tasks (from left to right and top to bottom): brain tumours, lung tumours, hippocampus, hepatic vessels and tumours, pancreas tumours, and liver tumours, respectively. The abnormalities, texture variance, and anisotropic properties make it very challenging to achieve satisfying segmentation performance. Red, green, and blue correspond to labels 1, 2, and 3, respectively, of each dataset.

to get satisfying segmentation for some challenging structures, which could be extremely small with respect to the whole volume, or vary a lot in terms of location, shape, and appearance. Besides, abnormalities, which result in a huge change in texture, and the anisotropic property (different voxel spacing) make the segmentation tasks even harder. Some examples are shown in Fig. 1.

Meanwhile, manually designing a high-performance 3D segmentation network requires adequate expertise. Most methods are based on U-Net [8] and V-Net [19] structures with their own modifications. In some cases, an individual network is designed and only works well for a certain task. To alleviate this problem, the Neural Architecture Search (NAS) technique is proposed in [42], which aims at automatically discovering better neural network architectures than human-designed ones in terms of performance, parameters amount, or computation cost. Starting from NASNet [43], many novel search spaces

1. Introduction

Medical image segmentation is an important pre-requisite of computer-aided diagnosis (CAD), which has been applied in a wide range of real applications. With the emerging of deep learning, great achievements have been made in this area. However, it remains very difficult

Figure 2. An illustration of the proposed C2FNAS. Each path from the left-most node to the right-most node is a candidate architecture. Each color represents one category of operations, e.g. depthwise conv, dilated conv, or 2D/3D/P3D conv, which are more common in the medical image area. The dotted line indicates skip-connections from encoder to decoder. The macro-level topology is determined by the coarse stage search, while the micro-level operations are further selected in the fine stage search.

and search methods have been proposed [2, 9, 15, 16, 24]. However, only a few works apply NAS to medical image segmentation [13, 33, 40], and they only achieve a comparable performance versus those manually designed networks.

Inspired by the successful handcrafted architectures such as ResNet [11] and MobileNet [28], many NAS works focus on searching for network building blocks. However, such works usually search in a shallow network while deploying with a deeper one. An inconsistency exists in network size between the search stage and the deployment stage [6]. [3] and [10] avoided this problem through activating only one path at each iteration, and [6] proposed to progressively reduce the search space and enlarge the network in order to reduce the gap.

Nevertheless, when the network topology is involved in the search space, things become more complex because no inconsistency is allowed in network size. [15] incorporated the network topology into the search space and relieved the memory intensity instead with a sacrifice on batch size and crop size. However, on memory-costly tasks such as 3D medical image segmentation, the memory scarcity cannot be solved by lowering the batch size or cropping size, since they are already very small compared to those of 2D tasks. Reducing them to a smaller number would lead to a much worse performance and unstable training.

To avoid the inconsistency on network size or input size between the search stage and the deployment stage, we propose a coarse-to-fine neural architecture search scheme for 3D medical image segmentation (see Fig. 2). In detail, we divide the search procedure into a coarse stage and a fine stage. In the coarse stage, the search is in a small search space with limited network topologies, therefore searching in a train-from-scratch manner is affordable for each network. Moreover, to reduce the search space and make the search procedure more efficient, we constrain the search space under inspirations from successful medical segmentation network designs: (1) a U-shape encoder-decoder structure; (2) skip-connections between the down-sampling paths and the up-sampling paths. The search space is largely reduced with these two priors. Afterwards, we apply a topology-similarity based evolutionary algorithm considering the search space properties, which makes the searching procedure focus on the promising architecture topologies.

In the fine stage, the aim is to find the best operations inside each cell. Motivated by [40], we let the network itself choose the operation among 2D, 3D, and pseudo-3D (P3D) convolutions, so that it can capture features from different viewpoints. Since the topology is already determined by the coarse stage, there is no memory pressure when searching in a single-path one-shot NAS manner [10].

For validation, we apply the proposed method on ten segmentation tasks from the MSD challenge [30] and achieve state-of-the-art performance. The network is searched using the pancreas dataset, which is one of the largest datasets among the 10 tasks. Our result on this proxy dataset surpasses the previous state-of-the-art by a large margin of 1%. Besides, we directly apply the same model and training/testing hyper-parameters across the other tasks, demonstrating the robustness and transferability of the searched network.

Our contributions can be summarized into 3 folds: (1) we search a 3D segmentation network from scratch in a coarse-to-fine manner, without inconsistency on network size or input size; (2) we design a specific search space and search method for each stage based on medical image segmentation priors; (3) our model achieves state-of-the-art performance on 10 datasets from the MSD challenge and shows great robustness and transferability.

2. Related Work

2.1. Medical Image Segmentation

Deep learning based methods have achieved great success in natural image recognition [11], detection [25], and segmentation [5], and they also have been dominating medical image segmentation tasks in recent years. Since U-Net was first introduced in biomedical image segmentation [26], several modifications have been proposed. [8] extended the 2D U-Net to a 3D version. Later, V-Net [19] was proposed to incorporate residual blocks and soft dice loss. [21] introduced an attention module to reinforce the U-Net model. Researchers also tried to investigate other possible architectures besides U-Net. For example, [27, 38, 39] cut 3D volumes into 2D slices and handle them with a 2D segmentation network. [17] designed a hybrid network by using ResNet50 as a 2D encoder and appending 3D decoders afterwards. In [35], 2D predictions are fused by a 3D network to obtain a better prediction with contextual information.

However, until now, U-Net based architectures are still the most powerful models in this area. Recently, [12] introduced nnU-Net and won the first place in the Medical Segmentation Decathlon (MSD) Challenge [30]. They ensemble 2D U-Net, 3D U-Net, and cascaded 3D U-Net. The network is able to dynamically adapt itself to any given segmentation task by analysing the data attributes and adjusting hyper-parameters accordingly. The optimal results are achieved with different combinations of the aforementioned networks given various tasks.

Building a deeper network at deployment than the one used at search results in an inconsistency problem. [3] proposed to activate a single path of the super-network at each iteration to reduce the memory cost, and [6] proposed to progressively increase the network size with a reduced approximate search space. However, these methods also face problems when the network topology is included in the search. For instance, the progressive manner cannot deal with the network topology. As for single-path methods, since there exist illegal paths in the network topology search space, some paths are trained more times compared to others, which results in a serious fairness problem [7].

2.2. Neural Architecture Search

A straightforward way to solve the issue is to train each candidate from scratch respectively, yet the search cost is too expensive considering the magnitude of the search space, which may contain millions of candidates or more. Auto-DeepLab [15] reduces the topology search space and sacrifices the input size instead of the network size at training stage, where it uses a much smaller batch size and crop size. However, it introduces a new inconsistency at input size to solve the old one at network size. Besides, for memory-costly tasks such as 3D medical image segmentation, sacrificing input size is infeasible. The already small input size needs to be reduced to an unreasonably smaller one to fit the model in memory, which usually leads to an unstable training problem in terms of convergence, and the method only yields a random architecture finally.

Neural Architecture Search (NAS) aims at automatically discovering better neural network architectures than human-designed ones. At the beginning stage, most NAS algorithms are based on either reinforcement learning (RL) [1, 42, 43] or evolutionary algorithms (EA) [24, 36]. In RL based methods, a controller is responsible for generating new architectures to train and evaluate, and the controller itself is trained with the architecture accuracy on the validation set as reward. In EA based methods, architectures are mutated to produce better off-springs, which are also evaluated by accuracy on the validation set. Since the parameter sharing scheme was proposed in [23], more search methods were proposed, such as differentiable NAS approaches [16] and one-shot NAS approaches [2], which reduced the search cost to several GPU days or even several GPU hours.

Besides the successes NAS has achieved in natural image recognition, researchers also tried to extend it to other areas, such as segmentation [15], detection [9], and attention mechanism [14]. Moreover, there are also some works applying NAS to the medical image segmentation area. [40] designed a search space consisting of 2D, 3D, and pseudo-3D (P3D) operations, and let the network itself choose between these operations at each layer. [20, 37] use the policy gradient algorithm to search for hyper-parameters and data augmentations. In [13, 33], the cell structure is explored with a pre-defined 3D U-Net topology.

3. Method

3.1. Inconsistency Problem

Early works of NAS [1, 24, 36, 42, 43] typically use a controller based on EA or RL to select network candidates from the search space; then the selected architecture is trained and evaluated. Such methods need to train numerous models from scratch and thus lead to an expensive search cost. Recent works [2, 16] propose a differentiable search method that reduces the search cost significantly, where each network is treated as a sub-network of a super-network. However, a critical problem is that the super-network cannot fit into the memory. For these methods, a trade-off is made by sacrificing the network size at search stage and building a deeper one at deployment stage.

3.2. Coarse-to-fine Neural Architecture Search

In order to resolve the inconsistency in network size and input size, and combine NAS with medical image segmentation, we develop a coarse-to-fine neural architecture search method for automatically designing 3D segmentation networks. Without loss of generality, the architecture search space A consists of a topology search space S, which is represented by a directed acyclic graph (DAG), and a cell operation space C, which is represented by the color of each node in the DAG. Each network candidate is a sub-graph s ∈ S with color scheme c ∈ C and weights w, denoted as N(s, c, w).

Therefore, the search space A is divided into two parts: a small search space of topology S and a huge search space of operation C:

A = S × C.    (1)

The topology search space S is usually small, and it is affordable to handle the inconsistency by training each candidate from scratch. For instance, the topology search space S only has up to 2.9 × 10⁴ candidates for a network with 12 cells [15]. The operation search space C can have millions of candidates, but since the topology s is given, techniques in NAS for recognition, e.g. activating only one path at each iteration, are incorporated naturally to solve the memory limitation. Therefore, by regarding neural architecture search

Figure 3. An example of how the introduced priors help reduce the search space. The grey nodes are eliminated entirely from the graph. Besides, many illegal paths have been pruned off as well. An example of an illegal path and a legal path is shown as the orange line path and the green line path, respectively.

from scratch as a process of constructing a colored DAG, we divide the search procedure into two stages: (1) Coarse stage: search at macro-level for the network topology, and (2) Fine stage: search for the best way to color each node, i.e. finding the most suitable operation configuration.

Figure 4. Proportion of clusters sampled during searching at the coarse stage. This figure illustrates the effectiveness of the proposed evolutionary searching algorithm. Different clusters are in different colors. The x-axis label "Evaluated Network Number" means the total number of networks trained and evaluated, while the y-axis label "Cluster Proportion" is the proportion of the number of networks belonging to a specific cluster to the total number of evaluated networks. It is shown that the algorithm gradually focuses on the most promising cluster 1, making the search procedure more efficient.

We start with defining macro-level and micro-level. Each network consists of multiple cells, which are composed of several convolutional layers. On macro-level, by defining how every cell is connected to each other, the network topology is uniquely determined. Once the topology is determined, we need to define which operation each node represents. On micro-level, we assign an operation to each node, which represents the operation inside the cell, such as standard convolution or dilated convolution.

where L_train is the loss function used at the training stage, and Acc_val is the accuracy on the validation set.

It is extremely time-consuming, especially considering that 3D networks have heavier computation requirements compared with 2D models. Thus, it is necessary to reduce the search space to make the search procedure more focused and efficient.

With this two-stage procedure, we first construct a DAG representing the network topology, then assign operations to each cell by coloring the corresponding node in the graph. Therefore, a network is constructed from scratch in a coarse-to-fine manner. By separating the macro-level and micro-level, we relieve the memory pressure and thus resolve the inconsistency problem between the search stage and the deployment stage.

We revisit the successful medical image segmentation networks, and we find they all share something in common: (1) a U-shape encoder-decoder topology, and (2) skip-connections between the down-sampling paths and the up-sampling paths. We incorporate these priors into our method and prune the search space accordingly. An illustration of how the priors help prune the search space is shown in Fig. 3. Therefore, the search space S is pruned to S′, and the topology optimization is updated accordingly in Eq. (4).

3.3. Coarse Stage: Macro-level Search

In this stage, we mainly focus on searching the topology of the network. A default operation is assigned to each cell (standard 3D convolution in this paper), and the cell is used as the basic unit to construct the network.

With the pruned search space S′, the topology optimization becomes:

s* = argmax_{s ∈ S′} Acc_val(N(s, c̄, w*(s))).    (4)
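To make the effect of the two priors concrete, the sketch below enumerates topology candidates on a simplified trellis (these are illustrative assumptions, not the paper's exact encoding: 12 cells, 4 resolution levels, and each cell either keeps, halves, or doubles the resolution) and keeps only the paths consistent with a U-shape, i.e. starting and ending at the highest resolution while reaching the deepest level in between:

```python
from itertools import product

NUM_CELLS, NUM_LEVELS = 12, 4  # assumed sizes, matching the 12-cell network

def enumerate_paths():
    """Yield resolution-level sequences: each cell keeps, halves, or doubles resolution."""
    for moves in product((-1, 0, +1), repeat=NUM_CELLS - 1):
        levels, ok = [0], True
        for m in moves:
            nxt = levels[-1] + m
            if not 0 <= nxt < NUM_LEVELS:  # illegal: the path leaves the trellis
                ok = False
                break
            levels.append(nxt)
        if ok:
            yield levels

def is_u_shape(levels):
    """Assumed form of the U-shape prior: start/end shallow, visit the deepest level."""
    return levels[0] == 0 and levels[-1] == 0 and max(levels) == NUM_LEVELS - 1

all_paths = list(enumerate_paths())
legal = [p for p in all_paths if is_u_shape(p)]
print(len(all_paths), len(legal))  # the priors prune away the vast majority
```

Even under these toy constraints, the U-shape prior removes most candidate topologies, which is the mechanism behind the reduction reported in Section 4.1.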

Due to the memory constraint and the fairness problem, training a super-network and evaluating candidates with a weight-sharing method is infeasible, which means each network needs to be trained from scratch. The search on macro-level is formulated into a bi-level optimization with weight optimization and topology optimization:

w*(s) = argmin_w L_train(N(s, c̄, w)),    (2)

s* = argmax_{s ∈ S} Acc_val(N(s, c̄, w*(s))),    (3)

where s represents the current topology and c̄ denotes a default coloring scheme, e.g. standard 3D convolution everywhere.

To further improve the search efficiency, we propose an assumption of continuous relaxation of the topology search space to make use of macro-level properties. The idea is that two similar networks should also share a similar performance. Specifically, we represent each network topology with a code, and we define the network similarity as the euclidean distance between two codes:

d(s_i, s_j) = ||code(s_i) − code(s_j)||₂,    (5)

where the smaller the distance is, the more similar two networks are. Based on the distance, we divide the topologies into several clusters with the K-means algorithm [18] based on their

Algorithm 1 Topology Similarity based Evolution
1: population ← all topologies
2: P = {p1, p2, ..., pk} ← Cluster(population)
3: history H ← ∅
4: sets of trained models M = {m1, m2, ..., mk} ← {∅}
5: for i = 1 to k do
6:   model.topology ← RandomSample(pi)
7:   model.accuracy ← TrainEval(model.topology)
8:   add model to H and mi
9: while |H| ≤ N do
10:   while HasIdleGPU() do
11:     models for comparison D ← ∅
12:     for i = 1 to k do
13:       add RandomSample(mi) to D
14:     rank P based on corresponding accuracy in D
15:     model.topology ← SampleUntrained(p_rank1)
16:     model.accuracy ← TrainEval(model.topology)
17:     add model to H and m_rank1
18: return highest-accuracy model in H
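A runnable sketch of Algorithm 1 in Python (simplified and synchronous: the idle-GPU loop is dropped, `train_eval` is a stub standing in for real training, and the clusters are assumed to be given as lists of topology identifiers):

```python
import random

random.seed(0)

def train_eval(topology):
    """Stub for TrainEval: returns a fake validation accuracy for a topology."""
    return random.random()

def evolve(clusters, budget):
    """Topology-similarity based evolution over pre-computed clusters (cf. Algorithm 1)."""
    history = []                                      # H: all (topology, accuracy) pairs
    trained = {i: [] for i in range(len(clusters))}   # m_i: trained models per cluster

    # Warm-up: train one random model per cluster.
    for i, cluster in enumerate(clusters):
        t = random.choice(cluster)
        acc = train_eval(t)
        history.append((t, acc)); trained[i].append((t, acc))

    while len(history) < budget:
        # Sample one trained model per cluster and rank clusters by its accuracy.
        sampled = {i: random.choice(models) for i, models in trained.items() if models}
        best_cluster = max(sampled, key=lambda i: sampled[i][1])
        # Train a so-far-untrained topology from the most promising cluster.
        seen = {t for t, _ in history}
        untrained = [t for t in clusters[best_cluster] if t not in seen]
        if not untrained:
            break
        t = random.choice(untrained)
        acc = train_eval(t)
        history.append((t, acc)); trained[best_cluster].append((t, acc))

    return max(history, key=lambda x: x[1])  # highest-accuracy model in H

# Hypothetical clusters of topology identifiers (labelled strings for illustration).
clusters = [[f"c{i}t{j}" for j in range(10)] for i in range(3)]
best_topology, best_acc = evolve(clusters, budget=15)
```

The key design choice mirrored here is that exploration is steered per cluster rather than per individual, so the budget gradually concentrates on the most promising region of the topology space.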

with uniform sampling [10] as our search method. In detail, we construct a super-network where each candidate is a sub-network of it, and then at each iteration of the training procedure, a candidate is uniformly sampled from the super-network, trained, and updated. After the training procedure ends, we do random search for the final operation configuration. That is to say, at searching stage, we randomly sample K candidates, and each candidate is initialized with the weights from the trained super-network. All these candidates are ranked by validation performance, and the one with the highest accuracy is finally picked.
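The fine-stage procedure above — uniform path sampling while training the super-network, then random search over candidates that inherit the super-network weights — can be sketched as follows (`train_step` and `validate` are stubs; the three operation choices per cell follow the paper, everything else is illustrative):

```python
import random

random.seed(0)

OPS = ("conv3d", "p3d", "conv2d")  # candidate operation per cell (3D, P3D, 2D)
NUM_CELLS = 12                     # matches the 12-cell network

def train_step(weights, path):
    """Stub for one training iteration: only the sampled single path is updated."""
    for cell, op in enumerate(path):
        weights[(cell, op)] = weights.get((cell, op), 0) + 1

def validate(weights, path):
    """Stub for the validation accuracy of a candidate inheriting super-network weights."""
    return random.random()

# 1) Super-network training: uniformly sample one path per iteration.
weights = {}
for _ in range(1000):
    path = tuple(random.choice(OPS) for _ in range(NUM_CELLS))
    train_step(weights, path)

# 2) Random search: sample K candidates and pick the best by validation accuracy.
K = 50
candidates = [tuple(random.choice(OPS) for _ in range(NUM_CELLS)) for _ in range(K)]
best = max(candidates, key=lambda p: validate(weights, p))
```

Because only one path is active per iteration, the memory footprint during training is that of a single candidate, not of the whole super-network — which is why this scheme avoids the inconsistency problem.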

Therefore, the optimization of the fine stage is in a single-path one-shot NAS manner with uniform sampling, which is formulated as:

w* = argmin_w E_{c ∼ U(C)} [L_train(N(s*, c, w))],    (6)

c* = argmax_{c ∈ C} Acc_val(N(s*, c, w*)),    (7)

where C is the search space of the fine stage, i.e. all possible combinations of operations, and U(C) denotes uniform sampling over C.

encoded codes. The evolution procedure is promoted in the unit of clusters. In detail, when producing the next generation, we randomly sample some networks from each cluster and rank the clusters by comparing the performance of these networks. The higher rank a cluster is, the higher proportion of the next generation will come from this cluster. As shown in Fig. 4, the topology proposed by our algorithm gradually falls into the most promising cluster, demonstrating its effectiveness. To better make use of computation resources, we further implement this EA algorithm in an asynchronous manner, as shown in Algorithm 1.
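The topology codes, the euclidean similarity, and the K-means clustering described above can be sketched in pure Python (with made-up codes — we assume each topology is encoded as its sequence of 12 cell resolution levels, an illustrative choice rather than the paper's exact encoding):

```python
import math
import random

random.seed(0)

# Assumed encoding: each topology's code is its sequence of 12 cell resolution levels.
codes = [[random.randint(0, 3) for _ in range(12)] for _ in range(100)]

def distance(a, b):
    """Network similarity: euclidean distance between two topology codes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, standing in for the K-means clustering of [18]."""
    centers = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each code.
        labels = [min(range(k), key=lambda j: distance(p, centers[j])) for p in points]
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers

labels, centers = kmeans(codes, k=5)
```

The resulting clusters are exactly the units over which Algorithm 1 allocates its training budget.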

After the coarse stage is finished, the topology s* is obtained, and the operation configuration c* comes from the fine stage. Therefore, the final network architecture N(s*, c*, w*) is constructed.

4. Experiments

In this section, we firstly introduce the implementation details of C2FNAS, and then report our found architecture (searched on the MSD Pancreas dataset) with semantic segmentation results on all 10 MSD datasets [30], which is a public and comprehensive benchmark for general-purpose algorithmic validation and testing, covering a large span of challenges, such as small data, unbalanced labels, large-ranging object scales, multi-class labels, and multi-modal imaging. It contains 10 segmentation datasets, i.e. Brain Tumours, Cardiac, Liver Tumours, Hippocampus, Prostate, Lung Tumours, Pancreas Tumours, Hepatic Vessels, Spleen, and Colon Cancer.

3.4. Fine Stage: Micro-level Search

After the topology of the network is determined, we further search the model at a fine-grained level by replacing the operations inside each cell. Each cell is a small fully convolutional module, which takes 1 or 2 input tensors and outputs 1 tensor. Since the topology is pre-determined in the coarse stage, cell i is simply represented by its operations O_i, which is a subset of the possible operation set O. Our cell structure is much simpler compared with [15]; this is because there is a trade-off between the cell complexity and the cell number. Given the tense memory requirement of 3D models, we prefer more cells instead of a more complex cell structure.

4.1. Implementation Details

Coarse Stage Search. At coarse stage search, the network has 12 cells in total, where 3 of them are down-sampling cells and 3 are up-sampling cells, so that the model size is moderate. With the priors introduced in Section 3, the search space is largely reduced from 2.9 × 10⁴ to 9.24 × 10².

The set of possible operations O consists of the following 3 choices: (1) 3 × 3 × 3 3D convolution; (2) P3D convolution, i.e. a 3 × 3 × 1 convolution followed by a 1 × 1 × 3 convolution; (3) 3 × 3 × 1 2D convolution.
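These three choices trade accuracy for memory and parameters differently; the arithmetic below compares per-layer weight counts, assuming C input and C output channels and no bias terms (C = 32 is an arbitrary illustrative width):

```python
def conv_params(kernel, c_in, c_out):
    """Weight count of one convolution: prod(kernel) * c_in * c_out (bias ignored)."""
    n = 1
    for k in kernel:
        n *= k
    return n * c_in * c_out

C = 32  # hypothetical channel width

c3d = conv_params((3, 3, 3), C, C)                                 # 3D conv: 27*C^2
p3d = conv_params((3, 3, 1), C, C) + conv_params((1, 1, 3), C, C)  # P3D: (9+3)*C^2
c2d = conv_params((3, 3, 1), C, C)                                 # 2D conv: 9*C^2

print(c3d, p3d, c2d)  # 27648 12288 9216
```

So a P3D cell costs less than half the weights of a full 3D cell while still mixing information across all three axes, which is why letting the search choose per cell is attractive.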

For the network architecture, we define one stem module at the beginning of the network and another one at the end. The beginning module consists of two 3D 3 × 3 × 3 convolution layers with strides 1 and 2, respectively. The end module consists of two 3D 3 × 3 × 3 convolution layers and a trilinear up-sampling layer between the two layers. Each

Considering the magnitude of the fine stage search space, training each candidate from scratch is infeasible. Therefore, to address the problem of memory limitation while making the search efficient, we adopt single-path one-shot NAS
