
DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images

Meng Ye¹  Mikael Kanski²  Dong Yang³  Qi Chang¹  Zhennan Yan⁴  Qiaoying Huang¹  Leon Axel²  Dimitris Metaxas¹

¹Rutgers University  ²New York University School of Medicine  ³NVIDIA  ⁴SenseBrain and Shanghai AI Laboratory and Centre for Perceptual and Interactive Intelligence
{my389, qc58, qh55, dnm}@cs.

Abstract

Cardiac tagging magnetic resonance imaging (t-MRI) is the gold standard for regional myocardium deformation and cardiac strain estimation. However, this technique has not been widely used in clinical diagnosis, as a result of the difficulty of motion tracking encountered with t-MRI images. In this paper, we propose a novel deep learning-based fully unsupervised method for in vivo motion tracking on t-MRI images. We first estimate the interframe (INF) motion field between any two consecutive t-MRI frames by a bi-directional generative diffeomorphic registration neural network. Using this result, we then estimate the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer. By utilizing temporal information to perform reasonable estimations on spatio-temporal motion fields, this novel method provides a useful solution for motion tracking and image registration in dynamic medical imaging. Our method has been validated on a representative clinical t-MRI dataset; the experimental results show that our method is superior to conventional motion tracking methods in terms of landmark tracking accuracy and inference efficiency. Project page is at: tagging_motion_estimation.

Figure 1. Standard scan views (2-, 3-, 4-chamber views and short-axis views) of cardiac MRI. (a) Tagging images. The number under each figure is the percentage of one cardiac cycle. (b) End-diastole (ED) phase of cine images. Red and green contours depict the epi- and endo-cardial borders of the left ventricle (LV) myocardium (MYO) wall. The blue contour depicts the right ventricle (RV). LA: left atrium. RA: right atrium.

1. Introduction

Cardiac magnetic resonance imaging (MRI) provides a non-invasive way to evaluate the morphology and function of the heart from the imaging data. Specifically, dynamic cine imaging, which generates a 2D image sequence to cover a full cardiac cycle, can provide direct information of heart motion. Due to the long imaging time and breath-holding requirements, clinical cardiac MRI protocols are still 2D sequences. To recover the 3D motion field of the whole heart wall, typically we need to scan several slices in long-axis (2-, 3-, 4-chamber) views and short-axis (SAX) views, as shown in Fig. 1. There are two kinds of dynamic imaging: conventional (untagged) cine MR imaging and tagging imaging (t-MRI) [1]. For untagged cine images (on which most recent work has focused), feature tracking can be used to estimate myocardial motion [22, 35, 40, 57, 55, 54]. However, as shown in Fig. 1 (b), due to the relatively uniform signal in the myocardial wall and the lack of reliably identifiable landmarks, the estimated motion cannot be used as a reliable indicator for clinical diagnosis. In contrast, t-MRI provides the gold standard imaging method for regional myocardial motion quantification and strain estimation. The t-MRI data is produced by a specially designed magnetic preparation module

It takes one t-MRI image sequence (usually a 2D video) as input and outputs a 2D motion field over time. The motion field is a 2D dense field depicting the non-rigid deformation of the LV MYO wall. The image sequence covers a full cardiac cycle: it starts from the end-diastole (ED) phase, at which the ventricle begins to contract, proceeds to the maximum contraction at the end-systole (ES) phase, and returns to relaxation at the ED phase, as shown in Fig. 1. Typically, we set the reference frame at the ED phase and track the motion on any other later frame relative to the reference one. For t-MRI motion tracking, previous work was mainly based on phase, optical flow, and conventional non-rigid image registration.

called spatial modulation of magnetization (SPAMM) [5]. It introduces intrinsic tissue markers, which are stripe-like darker tag patterns embedded in the relatively brighter myocardium, as shown in Fig. 1 (a). By tracking the deformation of the tags, we can retrieve a 2D displacement field in the imaging plane.

Although it has been widely accepted as the gold standard imaging modality for regional myocardium motion quantification, t-MRI has largely remained only a research tool and has not been widely used in clinical practice. The principal challenge (detailed analysis in the Supplementary Material) can be attributed to the following: (1) Image appearance changes greatly over a cardiac cycle and the tag signal fades on the later frames, as shown in Fig. 1 (a). (2) Motion artifacts can degrade images. (3) Other artifacts and noise can reduce image quality. To tackle these problems, in this work we propose a novel deep learning-based unsupervised method to estimate tag deformations on t-MRI images. The method has no annotation requirement during training, so as more training data are collected, our method can learn to predict more accurate cardiac deformation motion fields with minimal increased effort. In our method, we first track the motion field between two consecutive frames using a bi-directional generative diffeomorphic registration network. Based on these initial motion field estimations, we then track the Lagrangian motion field between the reference frame and any other frame by a composition layer. The composition layer is differentiable, so it can update the learning parameters of the registration network with a global Lagrangian motion constraint, thus achieving a reasonable computation of motion fields.

2.1. Phase-based Method

The harmonic phase (HARP) based method is the most representative one for t-MRI image motion tracking [37, 38, 28, 27, 17]. Periodic tags in the image domain correspond to spectral peaks in the Fourier domain of the image. Isolating the first harmonic peak region by a bandpass filter and performing an inverse Fourier transform of the selected region yields a complex harmonic image. The phase map of the complex image is the HARP image, which can be used for motion tracking, since the harmonic phase of a material point is a time-invariant physical property under simple translation. Thus, by tracking the harmonic phase vector of each pixel through time, one can track the position, and by extension the displacement, of each pixel along time. However, due to cardiac motion, local variations of tag spacing and orientation at different frames may lead to erroneous phase estimation when using HARP, such as bifurcations in the reconstructed phase map, which also happen at boundaries and in large-deformation regions of the myocardium [28]. Extending HARP, Gabor filters have been used to refine the phase map estimation by changing the filter parameters according to the local tag spacing and orientation, to automatically match different tag patterns in the image domain [13, 50, 39].

Our contributions can be summarized briefly as follows: (1) We propose a novel unsupervised method for t-MRI motion tracking, which achieves a high tracking accuracy at a fast inference speed. (2) We propose a bi-directional diffeomorphic image registration network, which can guarantee topology preservation and invertibility of the transformation, in which the likelihood of the warped image is modeled as a Boltzmann distribution and a normalized cross-correlation metric is incorporated for its robust performance on image-intensity time-variant registration problems. (3) We propose a scheme to decompose the Lagrangian motion between the reference frame and any other frame into sums of consecutive-frame motions, and then to improve the estimation of these motions by composing them back into the Lagrangian motion and imposing a global motion constraint.

2.2. Optical Flow Approach

While HARP exploits the specificity of quasiperiodic t-MRI, the optical flow (OF) based method is generic and can be applied to track objects in arbitrary video sequences [18, 8, 7, 32, 52]. OF can estimate a dense motion field based on the basic assumption of brightness constancy of local time-varying image regions with motion, at least for a very short time interval. The under-determined OF constraint equation is solved by variational principles, in which additional regularization constraints, such as image gradient and smoothness terms, are added. Although many efforts have been made to seek more accurate regularization terms, OF approaches lack accuracy, especially for t-MRI motion tracking, due to the tag fading and large deformation problems [11, 49]. More recently, convolutional neural networks (CNN) have been trained to predict OF [16, 19, 20, 24, 26, 41, 31,

2. Background

Regional myocardium motion quantification mainly focuses on the left ventricle (LV) myocardium (MYO) wall.

47, 53, 51, 48]. However, most of these works were supervised methods requiring a ground truth OF for training, which is nearly impossible to obtain for medical images.

2.3. Image Registration-based Method

Conventional non-rigid image registration methods have been used to estimate the deformation of the myocardium for a long time [46, 43, 30, 12, 34, 25]. Non-rigid registration schemes are formulated as an optimization procedure that maximizes a similarity criterion between the fixed image and the transformed moving image to find the optimal transformation. Transformation models can be parametric, including B-spline free-form deformation [46, 34, 12], or non-parametric, including the variational method. Similarity criteria are generally chosen such as mutual information and generalized information measures. These models are iteratively optimized, which is time-consuming.

Figure 2. Interframe (INF) motion φ and Lagrangian motion Φ.

Figure 3. An overview of our scheme for regional myocardium motion tracking on t-MRI image sequences. φ: interframe (INF) motion field between consecutive image pairs. Φ: Lagrangian motion field between the first frame and any other later frame.

Recently, deep learning-based methods have been applied to medical image registration and motion tracking. They are fast and have achieved at least comparable accuracy with conventional registration methods. Among those approaches, supervised methods [42] require ground truth deformation fields, which are usually synthetic; registration accuracy thus will be limited by the quality of the synthetic ground truth. Unsupervised methods [9, 10, 23, 22, 56, 15, 6, 14, 36, 44, 45, 33] learn the deformation field by a loss function of the similarity between the fixed image and the warped moving image. Unsupervised methods have been extended to cover deformable and diffeomorphic models. Deformable models [6, 9, 10] aim to learn a single-directional deformation field from the fixed image to the moving image. Diffeomorphic models [14, 22, 33, 45] learn a stationary velocity field (SVF) and integrate the SVF by a scaling and squaring layer to get the diffeomorphic deformation field [14]. A diffeomorphic deformation field is differentiable and invertible, which ensures one-to-one mapping and preserves topology. Inspired by these works, we propose to use a bi-directional diffeomorphic registration network to track motion on t-MRI images.

In an N-frame sequence, we only record the finite positions X_n (n = 0, 1, ..., N−1) of m. In the time interval t ∈ [n−1, n], the displacement is φ_{(n−1)→n}, which we call the interframe (INF) motion. A set of INF motions {φ_{τ→(τ+1)}} (τ = 0, 1, ..., n−2) will compose the motion vector Φ_{0→(n−1)}, which we call the Lagrangian motion. While the INF motion between consecutive frames is small if the time interval Δt is small, the Lagrangian motion Φ_{0→(n−1)}, however, could be very large on some frames of the sequence. For motion tracking, as we set the first frame as the reference frame, our task is to derive the Lagrangian motion Φ_{0→(n−1)} on any other later frame t = n−1. It is possible to directly track it based on the associated frame pairs, but for large motion the tracking result could drift a lot. In a cardiac cycle, for a given frame t = n−1, since the amplitude ‖φ_{(n−2)→(n−1)}‖ ≤ ‖Φ_{0→(n−1)}‖, decomposing Φ_{0→(n−1)} into the INF motions φ_{τ→(τ+1)} (τ = 0, 1, ..., n−2), tracking each φ_{τ→(τ+1)} first, and then composing them back into Φ_{0→(n−1)} makes sense. In this work, we follow this idea to obtain accurate motion tracking results on t-MRI images.
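As a toy illustration of this decomposition (a NumPy sketch with hypothetical positions, tracking a single material point in 1D rather than a dense 2D field), composing the INF displacements along the trajectory recovers the Lagrangian displacement from the reference frame:

```python
import numpy as np

# Hypothetical 1D trajectory of one material point m over N = 5 frames:
# positions[n] is its recorded position X_n (frame 0 is the ED reference).
positions = np.array([10.0, 10.8, 12.1, 13.9, 15.0])

# INF motions phi_{tau -> tau+1}: displacements between consecutive frames.
inf_motions = np.diff(positions)

# Lagrangian motions Phi_{0 -> n}: for a single tracked point, composition
# reduces to accumulating the INF displacements from the reference frame.
lagrangian = np.concatenate([[0.0], np.cumsum(inf_motions)])

# Tracking: X_n = X_0 + Phi_{0 -> n}(X_0).
tracked = positions[0] + lagrangian
```

For dense 2D motion fields, the composition additionally requires sampling each INF field at the (generally sub-pixel) tracked position, as described in Sec. 3.4.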

3. Method

We propose an unsupervised learning method based on deep learning to track dense motion fields of objects that change over time. Although our method can easily be extended to other motion tracking tasks without loss of generality, the design focus of the proposed method is t-MRI motion tracking.

3.2. Motion Tracking on A Time Sequence

Fig. 3 shows our scheme for myocardium motion tracking through time on a t-MRI image sequence. We first estimate the INF motion field φ between two consecutive frames by a bi-directional diffeomorphic registration network, as shown in Fig. 4. Once all the INF motion

3.1. Motion Decomposition and Recomposition

As shown in Fig. 2, for a material point m which moves from position X₀ at time t₀, we have its trajectory X(t).

Figure 4. An overview of our proposed bi-directional forward-backward generative diffeomorphic registration network.

fields are obtained over the full time sequence, we compose them into the Lagrangian motion field Φ, as shown in Fig. 5. Motion tracking is achieved by predicting the position X_{n−1} on an arbitrary frame, moved from the position X₀ on the first frame, with the estimated Lagrangian motion field: X_{n−1} = Φ_{0→(n−1)}(X₀). In our method, motion composition is implemented by a differentiable composition layer C, as depicted in Fig. 6. When training the registration network, such a differentiable layer can backpropagate the similarity loss between the reference frame image warped by the Lagrangian motion field and any other later frame image as a global constraint, and then update the parameters of the registration net, which in turn guarantees a reasonable estimation of the INF motion fields φ.

3.3. Bi-Directional Forward-Backward Generative Diffeomorphic Registration Network

As shown in Fig. 4, we use a bi-directional forward-backward diffeomorphic registration network to estimate the INF motion field φ. Our network is modeled as a generative stochastic variational autoencoder (VAE) [21]. Let x and y be a 2D image pair, and let z be a latent variable that parameterizes the INF motion field φ: R² → R². Following the methodology of a VAE, we assume that the prior p(z) is a multivariate Gaussian distribution with zero mean and covariance Σ_z:

p(z) = N(z; 0, Σ_z).   (1)

The latent variable z could be applied to a wide range of representations for image registration. In our work, in order to obtain a diffeomorphism, we let z be an SVF, which generates the path of the diffeomorphic deformation field φ^(t), parametrized by t ∈ [0, 1], as follows:

∂φ^(t)/∂t = v(φ^(t)) = v ∘ φ^(t),   (2)

where ∘ is a composition operator, v is the velocity field (v = z), and φ^(0) = Id is the identity transformation. We follow [2, 3, 14, 33] to integrate the SVF over time t ∈ [0, 1] by a scaling and squaring (SS) layer to obtain the final motion field φ^(1) at time t = 1. Specifically, starting from φ^(1/2^T)(p) = p + v(p)/2^T, where p is a spatial location, we use the recurrence φ^(1/2^(t−1)) = φ^(1/2^t) ∘ φ^(1/2^t) to compute φ^(1); in our experiments, T = 7, which is chosen so that v(p)/2^T is small enough. With the latent variable z, we can thus compute the motion field φ by the SS layer. We then use a spatial transform layer to warp image x with φ, and we obtain a noisy observation of the warped image, x ∘ φ, which can be modeled as a Gaussian distribution:

p(y|z; x) = N(y; x ∘ φ, σ² I),   (3)

where y denotes the observation of the warped image x ∘ φ, and σ² describes the variance of the additive image noise. We call the process of warping image x towards y the forward registration.
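The scaling-and-squaring integration described above can be sketched in NumPy as follows. This is a toy, non-differentiable sketch with our own helper names; a real implementation would run inside a deep learning framework with a differentiable sampler:

```python
import numpy as np

def bilinear_sample(disp, x, y):
    """Sample a (2, H, W) displacement field at continuous locations (x, y)
    with bilinear interpolation and edge clamping."""
    _, H, W = disp.shape
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    dx = np.clip(x - x0, 0.0, 1.0)
    dy = np.clip(y - y0, 0.0, 1.0)
    v00 = disp[:, y0, x0]
    v01 = disp[:, y0, x0 + 1]
    v10 = disp[:, y0 + 1, x0]
    v11 = disp[:, y0 + 1, x0 + 1]
    return (v00 * (1 - dx) * (1 - dy) + v01 * dx * (1 - dy)
            + v10 * (1 - dx) * dy + v11 * dx * dy)

def scaling_and_squaring(svf, T=7):
    """Integrate a stationary velocity field svf (2, H, W) into a
    displacement field by T scaling-and-squaring steps:
    start from svf / 2^T, then compose the field with itself T times."""
    _, H, W = svf.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    phi = svf / (2 ** T)          # phi^(1/2^T), stored as a displacement
    for _ in range(T):
        # phi^(1/2^(t-1)) = phi^(1/2^t) o phi^(1/2^t):
        # u_new(p) = u(p) + u(p + u(p))
        phi = phi + bilinear_sample(phi, xx + phi[0], yy + phi[1])
    return phi
```

For a spatially constant velocity field, each squaring step exactly doubles the displacement, so the integrated field equals the input velocity, which makes a convenient sanity check.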

Our goal is to estimate the posterior probabilistic distribution p(z|y; x) for registration, so that we can obtain the most likely motion field φ for a new image pair (x, y) via maximum a posteriori estimation. However, directly computing this posterior is intractable. Alternatively, we use a variational approach and introduce an approximate multivariate normal posterior probabilistic distribution q_ψ(z|y; x), parametrized by a fully convolutional neural network (FCN) module, as:

q_ψ(z|y; x) = N(z; μ_{z|x,y}, Σ_{z|x,y}),   (4)

where we let the FCN learn the mean μ_{z|x,y} and the diagonal covariance Σ_{z|x,y} of the posterior probabilistic distribution q_ψ(z|y; x). When training the network, we implement a layer that samples a new latent variable z_k using the reparameterization trick: z_k = μ_{z|x,y} + ε σ_{z|x,y}, where ε ~ N(0, I) and σ²_{z|x,y} is the diagonal of Σ_{z|x,y}.
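The reparameterization trick above can be sketched as follows; the FCN outputs are faked with random arrays and all names and shapes are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the FCN outputs on a toy 4x4 grid with 2 velocity channels
# (in the real network these would be predicted from the image pair x, y):
mu = rng.normal(size=(2, 4, 4))             # mean mu_{z|x,y}
log_var = 0.1 * rng.normal(size=(2, 4, 4))  # log of the diagonal covariance

# Reparameterization trick: z_k = mu + sigma * eps with eps ~ N(0, I).
# The sample is a deterministic, differentiable function of (mu, log_var),
# so gradients can flow through the sampled SVF during training.
eps = rng.standard_normal(mu.shape)
z_k = mu + np.exp(0.5 * log_var) * eps
```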

To learn the parameters ψ, we minimize the KL divergence between q_ψ(z|y; x) and p(z|y; x), which leads to maximizing the evidence lower bound (ELBO) [21] of the log marginalized likelihood log p(y|x), as follows (detailed derivation in the Supplementary Material):

ψ̂ = argmin_ψ KL[ q_ψ(z|y; x) ‖ p(z) ] − E_q[ log p(y|z; x) ].   (5)

In Eq. (5), the second term E_q[log p(y|z; x)] is the reconstruction loss term of a VAE model. While we could model the distribution p(y|z; x) as a Gaussian, as in Eq. (3), which is equivalent to using a sum-of-squared-differences (SSD) metric to measure the similarity between the warped image x ∘ φ and the observed y, in this work we instead use a normalized local cross-correlation (NCC) metric, due to its robustness properties and superior results, especially on image-intensity time-variant registration problems [4, 29]. The NCC of an image pair I and J is defined as:

NCC(I, J) = Σ_{p∈Ω} [ Σ_{p_i} (I(p_i) − Ī(p)) (J(p_i) − J̄(p)) ]² / [ Σ_{p_i} (I(p_i) − Ī(p))² Σ_{p_i} (J(p_i) − J̄(p))² ],   (6)

where Ī(p) and J̄(p) are the local means of I and J, respectively, calculated in a w² window centered at position p, and p_i iterates over that window. In our experiments, we set w = 9. A higher NCC indicates a better alignment, so the similarity loss between I and J can be defined as L_sim(I, J) = −NCC(I, J). Thus, we adopt the following Boltzmann distribution to model p(y|z; x):

p(y|z; x) = (1/Z) exp( γ L_sim(y, x ∘ φ) ),   (7)

where γ is a negative scalar hyperparameter and Z is a normalizing constant. Finally, we formulate the forward loss function as:

L_f(ψ; x, y) = (1/2) [ tr(λ D Σ_{z|x,y}) − log |Σ_{z|x,y}| + μᵀ_{z|x,y} Λ_z μ_{z|x,y} ] − (γ/K) Σ_{k=1}^{K} L_sim(y, x ∘ φ_{z_k}) + const,   (8)

where D is the graph degree matrix defined on the 2D image pixel grid, and K is the number of samples used to approximate the expectation, with K = 1 in our experiments. We let L = D − A be the Laplacian of a neighborhood graph defined on the pixel grid, where A is a pixel-neighborhood adjacency matrix. To encourage the spatial smoothness of the SVF, we set Λ_z = Σ_z^{-1} = λL [14], where λ is a parameter controlling the scale of the SVF z.

With the SVF representation, we can also compute the inverse motion field φ^{-1} by inputting the negated SVF −z into the SS layer. Thus, we can warp image y towards image x (the backward registration) and get the observation distribution of the warped image y, p(x|z; y). We minimize the KL divergence between q_ψ(z|x; y) and p(z|x; y), which leads to maximizing the ELBO of the log marginalized likelihood log p(x|y) (see the Supplementary Material for the detailed derivation). In this way, we can add the backward KL loss term to the forward KL loss term and get:

L(ψ; x, y) = tr(λ D Σ_{z|x,y}) − log |Σ_{z|x,y}| + μᵀ_{z|x,y} Λ_z μ_{z|x,y} − (γ/K) Σ_{k=1}^{K} [ L_sim(y, x ∘ φ_{z_k}) + L_sim(x, y ∘ φ^{-1}_{z_k}) ] + const.   (9)
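A minimal NumPy sketch of the local (windowed) NCC similarity described above, computing the squared correlation per window and averaging over window centers. The function name and the epsilon guard against division by zero are ours, and `sliding_window_view` requires NumPy ≥ 1.20:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_ncc(I, J, w=9):
    """Local normalized cross-correlation of two 2D images over w x w
    windows; higher values indicate better alignment."""
    eps = 1e-5
    Iw = sliding_window_view(I, (w, w))  # (H-w+1, W-w+1, w, w)
    Jw = sliding_window_view(J, (w, w))
    # Subtract the local means I(p), J(p) within each window.
    Ic = Iw - Iw.mean(axis=(-2, -1), keepdims=True)
    Jc = Jw - Jw.mean(axis=(-2, -1), keepdims=True)
    cross = (Ic * Jc).sum(axis=(-2, -1))
    var_I = (Ic * Ic).sum(axis=(-2, -1))
    var_J = (Jc * Jc).sum(axis=(-2, -1))
    # Squared correlation per window, averaged over all window centers.
    return (cross * cross / (var_I * var_J + eps)).mean()
```

A training loss would then be the negated value, so that gradient descent pushes the warped image toward the fixed image.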

Figure 5. The composition layer C that transforms the INF motion fields φ into the Lagrangian motion field Φ. "W" means "warp".

Figure 6. (a) The differentiable composition layer C. (b) INF motion field φ interpolation at the new tracked position p'.

Note that the term μᵀ_{z|x,y} Λ_z μ_{z|x,y} in the loss expands to (λ/2) Σ_i Σ_{j∈N(i)} (μ[i] − μ[j])², where N(i) are the neighbors of pixel i. While this is an implicit smoothness constraint on the motion field, we also enforce the explicit smoothness of the motion field φ by penalizing its gradients: L_smth(φ) = Σ_p ‖∇φ(p)‖².
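The explicit gradient penalty above can be sketched as follows, using forward finite differences as the gradient approximation; the function name and the mean (rather than sum) reduction are our choices:

```python
import numpy as np

def smoothness_loss(phi):
    """Explicit smoothness penalty on a (2, H, W) motion field:
    mean squared forward finite difference along each spatial axis."""
    dy = phi[:, 1:, :] - phi[:, :-1, :]   # difference along rows
    dx = phi[:, :, 1:] - phi[:, :, :-1]   # difference along columns
    return (dy ** 2).mean() + (dx ** 2).mean()
```

A constant field has zero penalty, and a field that ramps by one pixel per column has a penalty of exactly 1, which makes the scale of the regularization weight easy to reason about.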

Such a bi-directional registration architecture not only enforces the invertibility of the estimated motion field, but also provides a path for the inverse consistency of the predicted motion field. Since the tags fade on later frames in a cardiac cycle and there exists a through-plane motion problem, we need this forward-backward constraint to obtain a more reasonable motion tracking result.

3.4. Global Lagrangian Motion Constraints

After we get all the INF motion fields in a t-MRI image sequence, we use the differentiable composition layer C, shown in Fig. 5, to recompose them into the Lagrangian motion field Φ. From Fig. 2, we can get Φ_{0→(n−1)} = Φ_{0→(n−2)} ∘ φ_{(n−2)→(n−1)} (n ≥ 2). However, the tracked position X' = X₀ + Φ_{0→(n−2)}(X₀) could be at a sub-pixel location, as Fig. 6 (b) shows. Since the INF motion field values are only defined at integer locations, at the new position p' we linearly interpolate the field values from the four neighboring integer positions.
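The bilinear interpolation at a sub-pixel tracked position p' can be sketched on a toy 2×2 neighborhood; the grid values here are hypothetical:

```python
import numpy as np

# One component of an INF motion field phi on the integer grid
# (rows index y, columns index x); values are hypothetical.
phi = np.array([[0.0, 1.0],
                [2.0, 3.0]])

# Sub-pixel tracked position p' = X_0 + Phi(X_0).
x, y = 0.25, 0.5

# Bilinear interpolation from the four neighboring integer positions.
x0, y0 = int(np.floor(x)), int(np.floor(y))
dx, dy = x - x0, y - y0
value = (phi[y0, x0]         * (1 - dx) * (1 - dy)
         + phi[y0, x0 + 1]   * dx       * (1 - dy)
         + phi[y0 + 1, x0]   * (1 - dx) * dy
         + phi[y0 + 1, x0 + 1] * dx     * dy)
```

Because the four weights are linear in (dx, dy), this sampling is differentiable with respect to the tracked position, which is what allows the composition layer to backpropagate the global Lagrangian constraint.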
