W-net: Bridged U-net for 2D Medical Image Segmentation + Toward Convolutional Blind Denoising of Real Photographs + Sem-GAN: Semantically-Consistent Image-to-Image Translation

W-net: Bridged U-net for 2D Medical Image Segmentation

Wanli Chen, Yue Zhang, Junjun He, Yu Qiao, Yifan Chen, Hongjian Shi, Xiaoying Tang

In this paper, we focus on three problems in deep learning based medical image segmentation. First, U-net, a popular model for medical image segmentation, becomes difficult to train as convolutional layers are added, even though a deeper network usually generalizes better thanks to its larger number of learnable parameters. Second, the exponential linear unit (ELU), an alternative to ReLU, differs little from ReLU once the network of interest gets deep. Third, the Dice loss, one of the pervasive loss functions for medical image segmentation, becomes ineffective when the prediction is close to the ground truth and causes oscillation during training. To address these three problems, we propose and validate a deeper network that can fit medical image datasets, which are usually small in sample size. We also propose a new loss function to accelerate the learning process and a combination of different activation functions to improve network performance. Our experimental results suggest that our network is comparable or superior to state-of-the-art methods. [1807.04459v1]
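
The abstract does not spell out the proposed replacement loss, but the Dice-loss behavior it criticizes is visible in the standard soft formulation. Below is a minimal sketch of the conventional soft Dice loss in PyTorch (not the paper's new loss; the tensor shapes and smoothing constant are assumptions):

```python
import torch

def soft_dice_loss(pred, target, eps=1e-6):
    """Conventional soft Dice loss: 1 - 2|P intersect G| / (|P| + |G|).

    pred:   (N, H, W) predicted foreground probabilities
    target: (N, H, W) binary ground-truth masks
    """
    inter = (pred * target).sum(dim=(1, 2))
    union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2.0 * inter + eps) / (union + eps)
    return 1.0 - dice.mean()
```

As the prediction approaches the ground truth, the Dice score saturates near 1 and its gradients become small and noisy, which is consistent with the near-convergence oscillation the paper attributes to this loss.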

 

Visual Reinforcement Learning with Imagined Goals

Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised “practice” phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques. [1807.04742v1]
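
To make the retroactive goal relabeling concrete, here is a hedged sketch: a stored transition's imagined goal is swapped for the latent encoding of a state actually reached later in the same trajectory, and the reward is recomputed as a negative distance in latent space. The relabeling probability, the Euclidean metric, and the `vae_encode` helper are illustrative assumptions:

```python
import random
import numpy as np

def relabel(transition, trajectory, vae_encode):
    """Retroactively relabel a goal-conditioned transition.

    transition: (state, action, next_state, latent_goal, t)
    trajectory: list of raw observations for the episode
    vae_encode: hypothetical encoder mapping an observation to a latent vector
    """
    s, a, s_next, z_goal, t = transition
    if random.random() < 0.5:                    # relabel a fraction of samples
        future = random.choice(trajectory[t:])   # a state reached after step t
        z_goal = vae_encode(future)              # treat it as the new goal
    reward = -np.linalg.norm(vae_encode(s_next) - z_goal)  # goal-reaching reward
    return s, a, s_next, z_goal, reward
```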

 

Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval

Hanwei Wu, Markus Flierl

The Vector Quantized-Variational Autoencoder (VQ-VAE) provides an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. The VQ-VAE avoids the issue of “posterior collapse”, so its learned discrete representations are meaningful. In this paper, we incorporate product quantization into the bottleneck stage of the VQ-VAE and propose an end-to-end unsupervised learning model for image retrieval tasks. Compared to classic vector quantization, product quantization has the advantage of generating a large codebook, and fast retrieval can be achieved using lookup tables that store the distance between every pair of sub-codewords. In our proposed model, the product codebook is learned jointly with the encoders and decoders of the autoencoder. The encodings of query and database images are generated by feeding the images into the trained encoder and the learned product codebook. Experiments show that our proposed model outperforms other state-of-the-art hashing and quantization methods for image retrieval. [1807.04629v1]
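
The fast-retrieval mechanism described, lookup tables holding the distance between every pair of sub-codewords, can be sketched as follows (NumPy; the shapes and the choice of squared Euclidean distance are assumptions):

```python
import numpy as np

def build_tables(codebooks):
    """codebooks: (M, K, d) array of M sub-codebooks with K sub-codewords each.
    Returns (M, K, K) tables of squared distances between sub-codewords."""
    M, K, _ = codebooks.shape
    tables = np.empty((M, K, K))
    for m in range(M):
        diff = codebooks[m][:, None, :] - codebooks[m][None, :, :]
        tables[m] = (diff ** 2).sum(axis=-1)
    return tables

def symmetric_distance(code_a, code_b, tables):
    """Approximate squared distance between two product-quantized vectors:
    sum the precomputed sub-codeword distances over the M sub-spaces."""
    return sum(tables[m, code_a[m], code_b[m]] for m in range(len(code_a)))
```

Each database image is stored as only M small integers, and ranking a query then costs M table lookups per database item rather than any floating-point vector arithmetic.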

 

Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation

Miguel Campo, Cheng-Kang Hsieh, Matt Nickens, JJ Espinoza, Abhinav Taliyan, Julie Rieger, Jean Ho, Bettina Sherick

Audience discovery is an important activity at major movie studios. Deep models that use convolutional networks to extract frame-by-frame features of a movie trailer and represent it in a form suitable for prediction are now possible thanks to the availability of pre-built feature extractors trained on large image datasets. Using these pre-built feature extractors, we process hundreds of publicly available movie trailers, extract frame-by-frame low-level features (e.g., a face, an object, etc.) and create video-level representations. We use the video-level representations to train a hybrid collaborative filtering model that combines video features with historical movie attendance records. The trained model not only makes accurate attendance and audience predictions for existing movies, but also successfully profiles new movies six to eight months prior to their release. [1807.04465v1]
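
A hedged sketch of the video-level representation pipeline the abstract describes, using a pre-built extractor; the ResNet-18 backbone and mean pooling are illustrative assumptions, not the authors' stated choices:

```python
import torch
import torchvision.models as models

# Pre-built feature extractor trained on a large image dataset.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # keep the 512-d penultimate features
backbone.eval()

@torch.no_grad()
def trailer_embedding(frames):
    """frames: (T, 3, 224, 224) tensor of normalized trailer frames."""
    per_frame = backbone(frames)          # (T, 512) frame-by-frame features
    return per_frame.mean(dim=0)          # (512,) video-level representation
```

The resulting vector is what a hybrid collaborative filtering model would consume alongside historical attendance records.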

 

Adding Attentiveness to the Neurons in Recurrent Neural Networks

Pengfei Zhang, Jianru Xue, Cuiling Lan, Wenjun Zeng, Zhanning Gao, Nanning Zheng

Recurrent neural networks (RNNs) are capable of modeling the temporal dynamics of complex sequential information. However, the structures of existing RNN neurons mainly focus on controlling the contributions of current and historical information, and do not explore the different importance levels of the elements within the input vector of a time slot. We propose adding a simple yet effective Element-wise Attention Gate (EleAttG) to an RNN block (e.g., all RNN neurons in a network layer), empowering the RNN neurons with an attentiveness capability. For an RNN block, an EleAttG is added to adaptively modulate the input by assigning different levels of importance, i.e., attention, to each element/dimension of the input. We refer to an RNN block equipped with an EleAttG as an EleAtt-RNN block. Specifically, the modulation of the input is content adaptive and is performed at fine granularity, being element-wise rather than input-wise. The proposed EleAttG, as an additional fundamental unit, is general and can be applied to any RNN structure, e.g., the standard RNN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). We demonstrate the effectiveness of the proposed EleAtt-RNN by applying it to action recognition tasks on both 3D human skeleton data and RGB videos. Experiments show that adding attentiveness through EleAttGs to RNN blocks significantly boosts the power of RNNs. [1807.04445v1]
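
A minimal PyTorch sketch of the element-wise attention idea; in the paper the gate may also condition on the recurrent hidden state, whereas this simplified version scores the input alone, and the sigmoid activation is an assumption:

```python
import torch
import torch.nn as nn

class EleAttGRU(nn.Module):
    """GRU block preceded by an Element-wise Attention Gate (EleAttG):
    the gate assigns an importance in [0, 1] to each input dimension and
    rescales the input before the recurrent update."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gate = nn.Linear(input_size, input_size)   # one response per element
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):                  # x: (batch, time, input_size)
        a = torch.sigmoid(self.gate(x))    # element-wise attention weights
        out, _ = self.gru(a * x)           # modulated, fine-grained input
        return out
```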

 

Toward Convolutional Blind Denoising of Real Photographs

Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, Lei Zhang

Despite their success in Gaussian denoising, deep convolutional neural networks (CNNs) remain quite limited on real noisy photographs, and may even perform worse than representative traditional methods such as BM3D and K-SVD. To improve the robustness and practicability of deep denoising models, this paper presents a convolutional blind denoising network (CBDNet) that combines network architecture, noise modeling, and asymmetric learning. CBDNet comprises a noise estimation subnetwork and a denoising subnetwork, and is trained with a more realistic noise model that accounts for both signal-dependent noise and the in-camera processing pipeline. Motivated by the asymmetric sensitivity of non-blind denoisers (e.g., BM3D) to noise estimation errors, asymmetric learning is applied to the noise estimation subnetwork to penalize under-estimation of the noise level more heavily. To make the learned model applicable to real photographs, both synthetic images based on the realistic noise model and real noisy photographs paired with nearly noise-free images are used to train CBDNet. Results on three datasets of real noisy photographs clearly demonstrate the superiority of CBDNet over state-of-the-art denoisers in terms of quantitative metrics and visual quality. The code and models will be publicly available at https://github.com/GuoShi28/CBDNet. [1807.04686v1]
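
The asymmetric learning idea, penalizing noise-level under-estimation more than over-estimation, can be sketched as a weighted quadratic penalty. The exact weighting form and the value of `alpha` below are assumptions consistent with the abstract, not the paper's verbatim loss:

```python
import torch

def asymmetric_noise_loss(sigma_hat, sigma, alpha=0.3):
    """Penalty on the estimated noise-level map sigma_hat vs. ground truth sigma.

    With alpha < 0.5, under-estimates (sigma_hat < sigma) are weighted by
    1 - alpha and over-estimates by alpha, so under-estimation costs more.
    """
    under = (sigma_hat < sigma).float()
    weight = torch.abs(alpha - under)          # 1 - alpha where under-estimated
    return (weight * (sigma_hat - sigma) ** 2).mean()
```

This mirrors the observation that non-blind denoisers such as BM3D degrade gracefully when the noise level is over-estimated but fail visibly when it is under-estimated.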

 

LandmarkBoost: Efficient Visual Context Classifiers for Robust Localization

Marcin Dymczyk, Igor Gilitschenski, Juan Nieto, Simon Lynen, Bernhard Zeisl, Roland Siegwart

The growing popularity of autonomous systems creates a need for reliable and efficient metric pose retrieval algorithms. Currently used approaches tend to rely on nearest neighbor search of binary descriptors to perform the 2D-3D matching and guarantee real-time capabilities on mobile platforms. These methods struggle, however, with the growing size of the map, changes in viewpoint or appearance, and visual aliasing present in the environment. The rigidly defined descriptor patterns only capture a limited neighborhood of the keypoint and completely ignore the overall visual context. We propose LandmarkBoost, an approach that, in contrast to conventional 2D-3D matching methods, casts the search problem as a landmark classification task. We use a boosted classifier to classify landmark observations and directly obtain correspondences as classifier scores. We also introduce a formulation of visual context that is flexible, efficient to compute, and can capture relationships in the entire image plane. The original binary descriptors are augmented with contextual information, and informative features are selected by the boosting framework. Through detailed experiments, we evaluate the retrieval quality and performance of LandmarkBoost, demonstrating that it outperforms common state-of-the-art descriptor matching methods. [1807.04702v1]

 

Video Saliency Detection by 3D Convolutional Neural Networks

Guanqun Ding, Yuming Fang

Unlike salient object detection methods for still images, a key challenge for video saliency detection is how to extract and combine spatial and temporal features. In this paper, we present a novel and effective approach to salient object detection in video sequences based on 3D convolutional neural networks. First, we design a 3D convolutional network (Conv3DNet) that takes three video frames as input to learn the spatiotemporal features of video sequences. Then, we design a 3D deconvolutional network (Deconv3DNet) that combines the spatiotemporal features to predict the final saliency map for video sequences. Experimental results show that the proposed saliency detection model outperforms state-of-the-art video saliency detection methods in video saliency prediction. [1807.04514v1]
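
The core mechanism, 3D convolutions mixing space and time over a three-frame input, can be sketched in a few lines of PyTorch (the channel widths and kernel sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Three stacked frames enter as the temporal depth of a 5D tensor, so each
# 3x3x3 kernel aggregates spatial and temporal information simultaneously.
conv3d_stem = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

clip = torch.randn(1, 3, 3, 112, 112)   # (batch, RGB, frames, H, W)
features = conv3d_stem(clip)            # (1, 64, 3, 112, 112) spatiotemporal features
```

A deconvolutional counterpart (transposed 3D convolutions) would then upsample such features back to a full-resolution saliency map.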

 

Learning to Segment Medical Images with Scribble-Supervision Alone

Yigit B. Can, Krishna Chaitanya, Basil Mustafa, Lisa M. Koch, Ender Konukoglu, Christian F. Baumgartner

Semantic segmentation of medical images is a crucial step for the quantification of healthy anatomy and diseases alike. The majority of the current state-of-the-art segmentation algorithms are based on deep neural networks and rely on large datasets with full pixel-wise annotations. Producing such annotations can often only be done by medical professionals and requires large amounts of valuable time. Training a medical image segmentation network with weak annotations remains a relatively unexplored topic. In this work we investigate training strategies to learn the parameters of a pixel-wise segmentation network from scribble annotations alone. We evaluate the techniques on public cardiac (ACDC) and prostate (NCI-ISBI) segmentation datasets. We find that the networks trained on scribbles suffer from a remarkably small degradation in Dice of only 2.9% (cardiac) and 4.5% (prostate) with respect to a network trained on full annotations. [1807.04668v1]

 

Deep semi-supervised segmentation with weight-averaged consistency targets

Christian S. Perone, Julien Cohen-Adad

Recently proposed techniques for semi-supervised learning such as Temporal Ensembling and Mean Teacher have achieved state-of-the-art results in many important classification benchmarks. In this work, we expand the Mean Teacher approach to segmentation tasks and show that it can bring important improvements in a realistic small data regime using a publicly available multi-center dataset from the Magnetic Resonance Imaging (MRI) domain. We also devise a method to solve the problems that arise when using traditional data augmentation strategies for segmentation tasks on our new training scheme. [1807.04657v1]
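
The two ingredients of the Mean Teacher scheme, an exponential-moving-average teacher and a consistency loss between student and teacher predictions, can be sketched as below. The decay value is an assumption; MSE on softmax outputs follows the original Mean Teacher paper, applied per pixel as the straightforward extension to segmentation:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(teacher, student, decay=0.99):
    """Teacher weights are an exponential moving average of student weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def consistency_loss(student_logits, teacher_logits):
    """Per-pixel agreement between student and teacher on unlabeled images.
    logits: (N, C, H, W); no gradient flows through the teacher branch."""
    return F.mse_loss(student_logits.softmax(dim=1),
                      teacher_logits.softmax(dim=1).detach())
```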

 

Robustness Analysis of Pedestrian Detectors for Surveillance

Yuming Fang, Guanqun Ding, Yuan Yuan, Weisi Lin, Haiwen Liu

To obtain effective pedestrian detection results in surveillance video, many methods have been proposed to handle problems such as severe occlusion, pose variation, and cluttered backgrounds. Besides detection accuracy, a robust surveillance video system should remain stable under video quality degradation caused by network transmission, environment variation, etc. In this study, we investigate the robustness of pedestrian detection algorithms to video quality degradation. The main contributions of this work are threefold. First, a large-scale Distorted Surveillance Video Data Set (DSurVD) is constructed from high-quality video sequences and their corresponding distorted versions. Second, we design a method to evaluate detection stability and a robustness measure called the Robustness Quadrangle, which can be adopted to visualize both the detection accuracy of pedestrian detection algorithms on high-quality video sequences and their stability under video quality degradation. Third, the robustness of seven existing pedestrian detection algorithms is evaluated on the constructed DSurVD. Experimental results show that the robustness of existing pedestrian detection algorithms can be further improved. Additionally, we provide an in-depth discussion of how different distortion types influence the performance of pedestrian detection algorithms, which is important for designing effective pedestrian detection algorithms for surveillance. [1807.04562v1]

 

Deep Learning for Imbalance Data Classification using Class Expert Generative Adversarial Network

Fanny, Tjeng Wawan Cenggoro

Without techniques tailored to imbalanced data classification, artificial intelligence algorithms cannot easily recognize data from minority classes. In general, the only way to handle imbalanced data has been to modify an existing algorithm under the assumption that the training data is imbalanced. However, when handling normal data, this approach mostly produces deficient results. In this research, we propose a class expert generative adversarial network (CE-GAN) as a solution for imbalanced data classification. CE-GAN is a modification of the deep learning architecture that does not assume the training data is imbalanced. Moreover, CE-GAN is designed to identify the character of each class in more detail before the classification step. Our experiments show that CE-GAN delivers good performance on imbalanced data classification. [1807.04585v1]

 

Subsampled Turbulence Removal Network

Wai Ho Chak, Chun Pong Lau, Lok Ming Lui

We present a deep-learning approach to restoring a sequence of turbulence-distorted video frames affected by turbulent deformations and space-time varying blurs. Instead of requiring a massive training sample size for the deep network, we propose a training strategy based on a new data augmentation method to model turbulence from a relatively small dataset. We then introduce a subsampling method to enhance the restoration performance of the presented GAN model. The contributions of this paper are threefold: first, we introduce a simple but effective data augmentation algorithm to model real-life turbulence for training the deep network; second, we are the first to propose a Wasserstein GAN combined with an $\ell_1$ cost for successful restoration of turbulence-corrupted video sequences; third, we combine a subsampling algorithm that filters out strongly corrupted frames to generate a video sequence of better quality. [1807.04418v1]
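
The restoration objective described, a Wasserstein GAN generator term plus an $\ell_1$ fidelity cost to the clean frame, can be sketched as follows (the weighting factor `lam` is an assumption):

```python
import torch

def generator_loss(critic, restored, clean, lam=100.0):
    """WGAN + l1 restoration objective for a batch of frames."""
    adversarial = -critic(restored).mean()            # Wasserstein generator term
    fidelity = torch.abs(restored - clean).mean()     # l1 cost to the clean frame
    return adversarial + lam * fidelity
```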

 

Sem-GAN: Semantically-Consistent Image-to-Image Translation

Anoop Cherian, Alan Sullivan

Unpaired image-to-image translation is the problem of mapping an image in the source domain to one in the target domain without requiring corresponding image pairs. To ensure the translated images are realistically plausible, recent works such as Cycle-GAN demand that this mapping be invertible. While this requirement yields promising results when the domains are unimodal, its performance is unpredictable in a multi-modal scenario such as an image segmentation task. This is because invertibility does not necessarily enforce semantic correctness. To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain, as produced by a semantic segmentation algorithm. Our proposed framework includes consistency constraints on the translation task that, together with the GAN loss and the cycle-constraints, enforce that translated images inherit the appearance of the target domain while (approximately) maintaining their identities from the source domain. We present experiments on several image-to-image translation tasks and demonstrate that Sem-GAN improves the quality of the translated images significantly, sometimes by more than 20% on the FCN score. Further, we show that semantic segmentation models trained with synthetic images translated via Sem-GAN lead to significantly better segmentation results than other variants. [1807.04409v1]
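
A hedged sketch of the semantic consistency constraint: a segmentation network applied to the translated image should reproduce the class identities of the source image's segments. The per-pixel cross-entropy instantiation below is an assumption, and this term is added on top of the usual GAN and cycle losses:

```python
import torch.nn.functional as F

def semantic_consistency(seg_model, translated, source_labels):
    """translated:    (N, 3, H, W) generator output in the target domain
    source_labels: (N, H, W)    integer class ids of the source segments
    seg_model:     segmentation network producing (N, C, H, W) logits
    """
    logits = seg_model(translated)
    return F.cross_entropy(logits, source_labels)
```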

 

A Reflectance Based Method For Shadow Detection and Removal

Sri Kalyan Yarlagadda, Fengqing Zhu

Shadows are a common aspect of images and, when left undetected, can hinder scene understanding and visual processing. We propose a simple yet effective approach based on reflectance to detect shadows in a single image. An image is first segmented, and based on reflectance, illumination, and texture characteristics, segment pairs are identified as shadow and non-shadow pairs. The proposed method is tested on two publicly available and widely used datasets. Our method achieves higher accuracy in detecting shadows than previously reported methods, despite requiring fewer parameters. We also show results of shadow-free images obtained by relighting the pixels in the detected shadow regions. [1807.04352v1]

 

Deepwound: Automated Postoperative Wound Assessment and Surgical Site Surveillance through Convolutional Neural Networks

Varun Shenoy, Elizabeth Foster, Lauren Aalami, Bakar Majeed, Oliver Aalami

Postoperative wound complications are a significant cause of expense for hospitals, doctors, and patients. Hence, an effective method to diagnose the onset of wound complications is strongly desired. Algorithmically classifying wound images is a difficult task due to the variability in the appearance of wound sites. Convolutional neural networks (CNNs), a subgroup of artificial neural networks that have shown great promise in analyzing visual imagery, can be leveraged to categorize surgical wounds. We present a multi-label CNN ensemble, Deepwound, trained to classify wound images using only image pixels and corresponding labels as inputs. Our final computational model can accurately identify the presence of nine labels: drainage, fibrinous exudate, granulation tissue, surgical site infection, open wound, staples, steri strips, and sutures. Our model achieves receiver operating characteristic (ROC) area under the curve (AUC) scores, sensitivity, specificity, and F1 scores superior to prior work in this area. Due to their increasing ubiquity, smartphones provide a means to deliver accessible wound care. Paired with deep neural networks, they offer the capability to provide clinical insight to assist surgeons during postoperative care. We also present a mobile application frontend to Deepwound that assists patients in tracking their wound and surgical recovery from the comfort of their home. [1807.04355v1]
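
Multi-label classification, unlike a single softmax, scores each wound attribute independently with its own sigmoid. A minimal sketch using the labels named in the abstract (the 512-d feature size and 0.5 threshold are assumptions):

```python
import torch
import torch.nn as nn

LABELS = ["drainage", "fibrinous exudate", "granulation tissue",
          "surgical site infection", "open wound", "staples",
          "steri strips", "sutures"]

head = nn.Linear(512, len(LABELS))       # sits on top of a CNN feature vector

def predict(features, threshold=0.5):
    """features: (1, 512) image embedding from the CNN backbone."""
    probs = torch.sigmoid(head(features)).squeeze(0)   # independent label probabilities
    return {label: bool(p > threshold) for label, p in zip(LABELS, probs)}
```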

 

A Generic Approach to Lung Field Segmentation from Chest Radiographs using Deep Space and Shape Learning

Awais Mansoor, Juan J. Cerrolaza, Geovanny Perez, Elijah Biggs, Kazunori Okada, Gustavo Nino, Marius George Linguraru

Computer-aided diagnosis (CAD) techniques for lung field segmentation from chest radiographs (CXR) have been proposed for adult cohorts, but rarely for pediatric subjects. Statistical shape models (SSMs), the workhorse of most state-of-the-art CXR-based lung field segmentation methods, do not efficiently accommodate the shape variation of the lung field across the pediatric developmental stages. The main contributions of our work are: (1) a generic lung field segmentation framework from CXR accommodating large shape variation for adult and pediatric cohorts; (2) a deep representation learning detection mechanism, ensemble space learning, for robust object localization; and (3) marginal shape deep learning for shape deformation parameter estimation. Unlike the iterative approach of conventional SSMs, the proposed shape learning mechanism transforms the parameter space into marginal subspaces that are solvable efficiently using a recursive representation learning mechanism. Furthermore, our method is the first to include the challenging retro-cardiac region in CXR-based lung segmentation for accurate lung capacity estimation. The framework is evaluated on 668 CXRs of patients from 3 months to 89 years of age. We obtain a mean Dice similarity coefficient of $0.96\pm0.03$ (including the retro-cardiac region). For a given accuracy, the proposed approach is also found to be faster than conventional SSM-based iterative segmentation methods. The computational simplicity of the proposed generic framework could be similarly applied to the fast segmentation of other deformable objects. [1807.04339v1]

 

A Trilateral Weighted Sparse Coding Scheme for Real-World Image Denoising

Jun Xu, Lei Zhang, David Zhang

Most existing image denoising methods assume the corrupting noise to be additive white Gaussian noise (AWGN). However, the realistic noise in real-world noisy images is much more complex than AWGN and is hard to model with simple analytical distributions. As a result, many state-of-the-art denoising methods in the literature become much less effective when applied to real-world noisy images captured by CCD or CMOS cameras. In this paper, we develop a trilateral weighted sparse coding (TWSC) scheme for robust real-world image denoising. Specifically, we introduce three weight matrices into the data and regularisation terms of the sparse coding framework to characterise the statistics of realistic noise and image priors. TWSC can be reformulated as a linear equality-constrained problem and solved by the alternating direction method of multipliers. The existence and uniqueness of the solution and the convergence of the proposed algorithm are analysed. Extensive experiments demonstrate that the proposed TWSC scheme outperforms state-of-the-art denoising methods at removing realistic noise. [1807.04364v1]
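
A plausible form of the trilateral weighted objective reconstructed from the abstract; the exact placement of the three weight matrices is an assumption, with $\mathbf{W}_1$ and $\mathbf{W}_2$ weighting the rows and columns of the data-fidelity residual (noise statistics) and $\mathbf{W}_3$ re-weighting the sparsity penalty (image priors):

```latex
\min_{\mathbf{C}}\;
  \left\| \mathbf{W}_1 \left( \mathbf{Y} - \mathbf{D}\mathbf{C} \right) \mathbf{W}_2 \right\|_F^2
  \;+\; \left\| \mathbf{W}_3 \mathbf{C} \right\|_1
```

Rewriting this with auxiliary variables as a linear equality-constrained problem is what allows the ADMM solver mentioned in the abstract to be applied.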
