W-net: Bridged U-net for 2D Medical Image Segmentation + Toward Convolutional Blind Denoising of Real Photographs + Sem-GAN: Semantically-Consistent Image-to-Image Translation

W-net: Bridged U-net for 2D Medical Image Segmentation

Wanli Chen, Yue Zhang, Junjun He, Yu Qiao, Yifan Chen, Hongjian Shi, Xiaoying Tang

In this paper, we focus on three problems in deep learning based medical image segmentation. First, U-net, a popular model for medical image segmentation, is difficult to train as the number of convolutional layers increases, even though a deeper network usually generalizes better thanks to its larger number of learnable parameters. Second, the exponential linear unit (ELU), an alternative to ReLU, behaves little differently from ReLU once the network of interest gets deep. Third, the Dice loss, one of the pervasive loss functions for medical image segmentation, becomes ineffective when the prediction is close to the ground truth and causes oscillation during training. To address these three problems, we propose and validate a deeper network that can fit medical image datasets, which are usually small in sample size. We also propose a new loss function to accelerate the learning process and a combination of different activation functions to improve network performance. Our experimental results suggest that our network is comparable or superior to state-of-the-art methods. [1807.04459v1]
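The Dice loss behavior the abstract criticizes is easier to see with the standard soft Dice formulation. The sketch below is a generic numpy version of that loss, not the paper's exact variant or its proposed replacement:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss commonly used in medical image segmentation.

    pred:   predicted foreground probabilities in [0, 1]
    target: binary ground-truth mask
    """
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice

target = np.array([1.0, 1.0, 0.0, 0.0])
near_perfect = np.array([0.95, 0.9, 0.05, 0.1])
poor = np.array([0.3, 0.4, 0.6, 0.7])

loss_near = soft_dice_loss(near_perfect, target)
loss_poor = soft_dice_loss(poor, target)
```

Near a perfect prediction the loss surface of this ratio-based objective changes very slowly, which is consistent with the training instability the authors describe.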


Visual Reinforcement Learning with Imagined Goals

Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised “practice” phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques. [1807.04742v1]
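The retroactive goal relabeling mentioned above can be sketched in a hindsight-style form. The helper below is illustrative: the function name, the sparse reward, and relabeling with raw future states are assumptions, whereas the paper's RIG method samples goals from a learned latent representation:

```python
import random

def relabel_with_achieved_goals(trajectory, reward_fn):
    """Hindsight-style retroactive goal relabeling (generic sketch).

    trajectory: list of dicts with keys 'state', 'action', 'next_state', 'goal'.
    Each transition gets its goal replaced by a state actually reached later in
    the same trajectory, and its reward recomputed for the new goal.
    """
    relabeled = []
    for i, t in enumerate(trajectory):
        future = trajectory[random.randint(i, len(trajectory) - 1)]
        new_goal = future['next_state']
        relabeled.append({
            'state': t['state'], 'action': t['action'],
            'next_state': t['next_state'], 'goal': new_goal,
            'reward': reward_fn(t['next_state'], new_goal),
        })
    return relabeled

# Sparse goal-reaching reward: 0 when the goal is reached, -1 otherwise.
reward = lambda s, g: 0.0 if s == g else -1.0
traj = [{'state': 0, 'action': 1, 'next_state': 1, 'goal': 9},
        {'state': 1, 'action': 1, 'next_state': 2, 'goal': 9}]
out = relabel_with_achieved_goals(traj, reward)
```

Relabeling turns otherwise failed episodes into successful examples for some goal, which is where the sample-efficiency gain comes from.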


Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval

Hanwei Wu, Markus Flierl

The Vector Quantized-Variational Autoencoder (VQ-VAE) provides an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. The VQ-VAE avoids the issue of “posterior collapse”, so its learned discrete representation is meaningful. In this paper, we incorporate product quantization into the bottleneck stage of the VQ-VAE and propose an end-to-end unsupervised learning model for image retrieval tasks. Compared to classic vector quantization, product quantization has the advantage of generating a large codebook, and fast retrieval can be achieved using lookup tables that store the distance between every pair of sub-codewords. In our proposed model, the product codebook is jointly learned with the encoders and decoders of the autoencoders. The encodings of query and database images are generated by feeding the images into the trained encoder and the learned product codebook. Experiments show that our proposed model outperforms other state-of-the-art hashing and quantization methods for image retrieval. [1807.04629v1]
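The product-quantization bottleneck can be illustrated with a toy numpy sketch. The codebooks here are random stand-ins for the jointly learned ones, and `pq_encode`/`pq_decode` are hypothetical names:

```python
import numpy as np

def pq_encode(x, sub_codebooks):
    """Encode vector x with product quantization (illustrative sketch).

    sub_codebooks: list of M arrays, each (K, D/M) -- one codebook per subspace.
    Returns M sub-codeword indices; the implied full codebook has K**M entries,
    which is why PQ yields a large effective codebook cheaply.
    """
    M = len(sub_codebooks)
    subvectors = np.split(x, M)
    codes = []
    for sub, codebook in zip(subvectors, sub_codebooks):
        dists = np.sum((codebook - sub) ** 2, axis=1)  # distance to each codeword
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes, sub_codebooks):
    """Reconstruct the quantized vector from its sub-codeword indices."""
    return np.concatenate([cb[c] for c, cb in zip(codes, sub_codebooks)])

rng = np.random.default_rng(0)
cbs = [rng.normal(size=(4, 2)) for _ in range(3)]  # M=3 subspaces, K=4 codewords
x = pq_decode([1, 2, 3], cbs)                      # a vector on the PQ grid
```

Retrieval then only needs per-subspace distance tables of size K x K, rather than distances over all K**M full codewords.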


Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation

Miguel Campo, Cheng-Kang Hsieh, Matt Nickens, JJ Espinoza, Abhinav Taliyan, Julie Rieger, Jean Ho, Bettina Sherick

Audience discovery is an important activity at major movie studios. Deep models that use convolutional networks to extract frame-by-frame features of a movie trailer and represent it in a form that is suitable for prediction are now possible thanks to the availability of pre-built feature extractors trained on large image datasets. Using these pre-built feature extractors, we are able to process hundreds of publicly available movie trailers, extract frame-by-frame low level features (e.g., a face, an object, etc) and create video-level representations. We use the video-level representations to train a hybrid Collaborative Filtering model that combines video features with historical movie attendance records. The trained model not only makes accurate attendance and audience prediction for existing movies, but also successfully profiles new movies six to eight months prior to their release. [1807.04465v1]


Adding Attentiveness to the Neurons in Recurrent Neural Networks

Pengfei Zhang, Jianru Xue, Cuiling Lan, Wenjun Zeng, Zhanning Gao, Nanning Zheng

Recurrent neural networks (RNNs) are capable of modeling the temporal dynamics of complex sequential information. However, the structures of existing RNN neurons mainly focus on controlling the contributions of current and historical information but do not explore the different importance levels of different elements in the input vector of a time slot. We propose adding a simple yet effective Element-wise Attention Gate (EleAttG) to an RNN block (e.g., all RNN neurons in a network layer), which empowers the RNN neurons with the attentiveness capability. For an RNN block, an EleAttG is added to adaptively modulate the input by assigning different levels of importance, i.e., attention, to each element/dimension of the input. We refer to an RNN block equipped with an EleAttG as an EleAtt-RNN block. Specifically, the modulation of the input is content adaptive and is performed at fine granularity, being element-wise rather than input-wise. The proposed EleAttG, as an additional fundamental unit, is general and can be applied to any RNN structure, e.g., standard RNN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). We demonstrate the effectiveness of the proposed EleAtt-RNN by applying it to action recognition tasks on both 3D human skeleton data and RGB videos. Experiments show that adding attentiveness through EleAttGs to RNN blocks significantly boosts the power of RNNs. [1807.04445v1]
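A minimal sketch of the element-wise gating described above, assuming a simple additive parameterization of the attention vector from the current input and previous hidden state (the weight names are illustrative, not the paper's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eleattg_modulate(x, h_prev, W_xa, W_ha, b_a):
    """Element-wise Attention Gate (EleAttG) modulation, sketched.

    An attention vector a, with the same dimensionality as the input x, is
    computed from the current input and the previous hidden state, then
    modulates x element-wise before it enters the RNN block: x_tilde = a * x.
    """
    a = sigmoid(W_xa @ x + W_ha @ h_prev + b_a)  # one importance weight per element
    return a * x

rng = np.random.default_rng(1)
d, h = 4, 3
x_t = rng.normal(size=d)
h_prev = rng.normal(size=h)
x_mod = eleattg_modulate(x_t, h_prev, rng.normal(size=(d, d)),
                         rng.normal(size=(d, h)), np.zeros(d))
```

Because the gate lies in (0, 1) per element, each input dimension is attenuated independently, which is the "element-wise rather than input-wise" property the abstract stresses.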


Toward Convolutional Blind Denoising of Real Photographs

Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, Lei Zhang

Despite their success in Gaussian denoising, deep convolutional neural networks (CNNs) are still very limited on real noisy photographs, and may even perform worse than representative traditional methods such as BM3D and K-SVD. To improve the robustness and practicability of deep denoising models, this paper presents a convolutional blind denoising network (CBDNet) that combines network architecture, noise modeling, and asymmetric learning. Our CBDNet is comprised of a noise estimation subnetwork and a denoising subnetwork, and is trained with a more realistic noise model that considers both signal-dependent noise and the in-camera processing pipeline. Motivated by the asymmetric sensitivity of non-blind denoisers (e.g., BM3D) to noise estimation error, asymmetric learning is applied to the noise estimation subnetwork to penalize under-estimation of the noise level more heavily. To make the learned model applicable to real photographs, both synthetic images based on the realistic noise model and real noisy photographs paired with nearly noise-free images are incorporated to train our CBDNet. The results on three datasets of real noisy photographs clearly demonstrate the superiority of our CBDNet over state-of-the-art denoisers in terms of quantitative metrics and visual quality. The code and models will be publicly available at https://github.com/GuoShi28/CBDNet. [1807.04686v1]
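The asymmetric learning idea can be written as a weighted quadratic penalty on the estimated noise level. The sketch below assumes a weight of |alpha - 1(error < 0)| with alpha < 0.5, so under-estimates carry the larger weight; the value 0.3 is illustrative:

```python
import numpy as np

def asymmetric_noise_loss(sigma_hat, sigma, alpha=0.3):
    """Asymmetric penalty on estimated noise level (sketch of the CBDNet idea).

    Under-estimation (sigma_hat < sigma) hurts non-blind denoisers far more than
    over-estimation, so with alpha < 0.5 the weight |alpha - 1(sigma_hat < sigma)|
    is larger on the under-estimation side.
    """
    under = (sigma_hat - sigma) < 0
    weight = np.abs(alpha - under.astype(float))
    return float(np.mean(weight * (sigma_hat - sigma) ** 2))

sigma = np.full(4, 0.10)                                     # true noise level
loss_under = asymmetric_noise_loss(np.full(4, 0.05), sigma)  # under-estimate
loss_over = asymmetric_noise_loss(np.full(4, 0.15), sigma)   # equal-size over-estimate
```

For errors of equal magnitude, the under-estimate is penalized more, nudging the estimator toward the over-estimation side that non-blind denoisers tolerate better.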


LandmarkBoost: Efficient Visual Context Classifiers for Robust Localization

Marcin Dymczyk, Igor Gilitschenski, Juan Nieto, Simon Lynen, Bernhard Zeisl, Roland Siegwart

The growing popularity of autonomous systems creates a need for reliable and efficient metric pose retrieval algorithms. Currently used approaches tend to rely on nearest neighbor search of binary descriptors to perform the 2D-3D matching and guarantee realtime capabilities on mobile platforms. These methods struggle, however, with the growing size of the map, changes in viewpoint or appearance, and visual aliasing present in the environment. The rigidly defined descriptor patterns only capture a limited neighborhood of the keypoint and completely ignore the overall visual context. We propose LandmarkBoost – an approach that, in contrast to the conventional 2D-3D matching methods, casts the search problem as a landmark classification task. We use a boosted classifier to classify landmark observations and directly obtain correspondences as classifier scores. We also introduce a formulation of visual context that is flexible, efficient to compute, and can capture relationships in the entire image plane. The original binary descriptors are augmented with contextual information and informative features are selected by the boosting framework. Through detailed experiments, we evaluate the retrieval quality and performance of LandmarkBoost, demonstrating that it outperforms common state-of-the-art descriptor matching methods. [1807.04702v1]


Video Saliency Detection by 3D Convolutional Neural Networks

Guanqun Ding, Yuming Fang

Different from salient object detection methods for still images, a key challenge for video saliency detection is how to extract and combine spatial and temporal features. In this paper, we present a novel and effective approach to salient object detection in video sequences based on 3D convolutional neural networks. First, we design a 3D convolutional network (Conv3DNet), which takes three video frames as input, to learn the spatiotemporal features of video sequences. Then, we design a 3D deconvolutional network (Deconv3DNet) to combine the spatiotemporal features and predict the final saliency map for video sequences. Experimental results show that the proposed saliency detection model outperforms state-of-the-art video saliency detection methods in video saliency prediction. [1807.04514v1]


Learning to Segment Medical Images with Scribble-Supervision Alone

Yigit B. Can, Krishna Chaitanya, Basil Mustafa, Lisa M. Koch, Ender Konukoglu, Christian F. Baumgartner

Semantic segmentation of medical images is a crucial step for the quantification of healthy anatomy and diseases alike. The majority of the current state-of-the-art segmentation algorithms are based on deep neural networks and rely on large datasets with full pixel-wise annotations. Producing such annotations can often only be done by medical professionals and requires large amounts of valuable time. Training a medical image segmentation network with weak annotations remains a relatively unexplored topic. In this work we investigate training strategies to learn the parameters of a pixel-wise segmentation network from scribble annotations alone. We evaluate the techniques on public cardiac (ACDC) and prostate (NCI-ISBI) segmentation datasets. We find that the networks trained on scribbles suffer from a remarkably small degradation in Dice of only 2.9% (cardiac) and 4.5% (prostate) with respect to a network trained on full annotations. [1807.04668v1]
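Training from scribbles alone typically reduces to evaluating the segmentation loss only at annotated pixels. A minimal sketch, assuming -1 marks unannotated pixels; this is an illustration of the general principle, not necessarily the authors' exact training objective:

```python
import numpy as np

def scribble_cross_entropy(probs, labels, eps=1e-9):
    """Cross-entropy evaluated only on scribble-annotated pixels (generic sketch).

    probs:  (H, W, C) softmax outputs of a segmentation network
    labels: (H, W) integer class per pixel, with -1 marking unannotated pixels
    """
    mask = labels >= 0                      # pixels touched by a scribble
    if not np.any(mask):
        return 0.0
    picked = probs[mask, labels[mask]]      # probability of the annotated class
    return float(-np.mean(np.log(picked + eps)))

probs = np.full((2, 2, 3), 1.0 / 3.0)       # uniform predictions over 3 classes
labels = np.array([[0, -1], [-1, 2]])       # only two pixels carry scribbles
loss = scribble_cross_entropy(probs, labels)
```

Unannotated pixels contribute no gradient, so the network must propagate label information from the sparse scribbles to the rest of the image on its own.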


Deep semi-supervised segmentation with weight-averaged consistency targets

Christian S. Perone, Julien Cohen-Adad

Recently proposed techniques for semi-supervised learning such as Temporal Ensembling and Mean Teacher have achieved state-of-the-art results in many important classification benchmarks. In this work, we expand the Mean Teacher approach to segmentation tasks and show that it can bring important improvements in a realistic small data regime using a publicly available multi-center dataset from the Magnetic Resonance Imaging (MRI) domain. We also devise a method to solve the problems that arise when using traditional data augmentation strategies for segmentation tasks on our new training scheme. [1807.04657v1]
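The Mean Teacher scheme the authors extend keeps a teacher model as an exponential moving average (EMA) of the student's weights and adds a consistency penalty between the two predictions. A toy numpy sketch, with flat weight arrays standing in for full networks:

```python
import numpy as np

def ema_update(teacher_w, student_w, decay=0.99):
    """Exponential moving average of student weights -> teacher weights."""
    return decay * teacher_w + (1.0 - decay) * student_w

def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions."""
    return float(np.mean((student_out - teacher_out) ** 2))

# Toy loop: the teacher slowly tracks a (here, fixed) student.
teacher = np.zeros(3)
student = np.ones(3)
for _ in range(5):
    teacher = ema_update(teacher, student)
```

On unlabeled data only the consistency term is active, which is how the method extracts a training signal from images that have no segmentation masks.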


Robustness Analysis of Pedestrian Detectors for Surveillance

Yuming Fang, Guanqun Ding, Yuan Yuan, Weisi Lin, Haiwen Liu

To obtain effective pedestrian detection results in surveillance video, many methods have been proposed to handle severe occlusion, pose variation, cluttered backgrounds, etc. Besides detection accuracy, a robust surveillance video system should remain stable under video quality degradation caused by network transmission, environment variation, etc. In this study, we investigate the robustness of pedestrian detection algorithms to video quality degradation. The main contributions of this work are threefold. First, a large-scale Distorted Surveillance Video Data Set (DSurVD) is constructed from high-quality video sequences and their corresponding distorted versions. Second, we design a method to evaluate detection stability and a robustness measure called the Robustness Quadrangle, which can be adopted to visualize both the detection accuracy of pedestrian detection algorithms on high-quality video sequences and their stability under video quality degradation. Third, the robustness of seven existing pedestrian detection algorithms is evaluated on the built DSurVD. Experimental results show that the robustness of existing pedestrian detection algorithms can be further improved. Additionally, we provide an in-depth discussion of how different distortion types influence the performance of pedestrian detection algorithms, which is important for designing effective pedestrian detection algorithms for surveillance. [1807.04562v1]


Deep Learning for Imbalance Data Classification using Class Expert Generative Adversarial Network

Fanny, Tjeng Wawan Cenggoro

Without techniques specifically designed for imbalanced data classification, artificial intelligence algorithms cannot easily recognize data from minority classes. In general, the usual way to handle imbalanced data is to modify an existing algorithm under the assumption that the training data are imbalanced; however, on balanced data this approach mostly produces deficient results. In this research, we propose a Class Expert Generative Adversarial Network (CE-GAN) as a solution for imbalanced data classification. CE-GAN is a modified deep learning architecture that does not assume the training data are imbalanced. Moreover, CE-GAN is designed to learn the character of each class in more detail before the classification step. Our experiments show that CE-GAN gives good performance on imbalanced data classification. [1807.04585v1]


Subsampled Turbulence Removal Network

Wai Ho Chak, Chun Pong Lau, Lok Ming Lui

We present a deep-learning approach to restoring a sequence of turbulence-distorted video frames from turbulent deformations and space-time varying blurs. Instead of requiring a massive training sample size for the deep network, we propose a training strategy based on a new data augmentation method that models turbulence from a relatively small dataset. We then introduce a subsampling method to enhance the restoration performance of the presented GAN model. The contributions of the paper are threefold: first, we introduce a simple but effective data augmentation algorithm that models real-life turbulence for training the deep network; second, we are the first to propose combining the Wasserstein GAN with an $\ell_1$ cost for successful restoration of turbulence-corrupted video sequences; third, we incorporate a subsampling algorithm to filter out strongly corrupted frames and generate a video sequence of better quality. [1807.04418v1]
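The Wasserstein-GAN-plus-l1 objective for the restoration generator can be sketched as below. The trade-off weight `lam` is an illustrative value, not one from the paper, and the critic scores are stand-ins for a trained critic network:

```python
import numpy as np

def generator_loss(critic_scores_fake, restored, ground_truth, lam=100.0):
    """WGAN generator loss with an added l1 fidelity term (sketch of the idea).

    The adversarial term pushes restored frames toward the real-frame
    distribution; the l1 term keeps them pixel-wise close to the ground truth.
    """
    adv = -np.mean(critic_scores_fake)                   # WGAN generator term
    fidelity = np.mean(np.abs(restored - ground_truth))  # l1 cost
    return float(adv + lam * fidelity)

gt = np.zeros((2, 4))                                    # toy "clean" frames
loss_good = generator_loss(np.array([1.0]), gt, gt)      # perfect restoration
loss_bad = generator_loss(np.array([1.0]), gt + 0.5, gt)
```

The l1 term is what anchors the GAN output to the specific corrupted sequence being restored, rather than to any plausible clean video.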


Sem-GAN: Semantically-Consistent Image-to-Image Translation

Anoop Cherian, Alan Sullivan

Unpaired image-to-image translation is the problem of mapping an image in the source domain to one in the target domain without requiring corresponding image pairs. To ensure the translated images are realistically plausible, recent works, such as Cycle-GAN, demand that this mapping be invertible. While this requirement produces promising results when the domains are unimodal, its performance is unpredictable in multi-modal scenarios such as image segmentation tasks. This is because invertibility does not necessarily enforce semantic correctness. To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain, as produced by a semantic segmentation algorithm. Our proposed framework includes consistency constraints on the translation task that, together with the GAN loss and the cycle constraints, enforce that translated images inherit the appearance of the target domain while (approximately) maintaining their identities from the source domain. We present experiments on several image-to-image translation tasks and demonstrate that Sem-GAN improves the quality of the translated images significantly, sometimes by more than 20% on the FCN score. Further, we show that semantic segmentation models trained with synthetic images translated via Sem-GAN lead to significantly better segmentation results than other variants. [1807.04409v1]
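The semantic consistency idea can be sketched as a cross-entropy between the segmentation of the translated image and the source-domain class identities. This is an illustrative reading of the constraint, not the paper's exact loss:

```python
import numpy as np

def semantic_consistency_loss(seg_probs_translated, source_labels, eps=1e-9):
    """Semantic consistency term in the spirit of Sem-GAN (generic sketch).

    seg_probs_translated: (H, W, C) softmax output of a segmentation network
                          applied to the translated image
    source_labels:        (H, W) class identities of the source-image segments
    Penalizes translations that change the class identity of image regions.
    """
    h, w = source_labels.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)
    picked = seg_probs_translated[rows, cols, source_labels]
    return float(-np.mean(np.log(picked + eps)))

labels = np.array([[0, 1], [1, 0]])
faithful = np.full((2, 2, 2), 0.1)          # translation that keeps identities
for i in range(2):
    for j in range(2):
        faithful[i, j, labels[i, j]] = 0.9
uniform = np.full((2, 2, 2), 0.5)           # translation that blurs identities
```

A low value means the segmenter still sees the source classes in the translated image, which is exactly the property cycle-consistency alone does not guarantee.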


A Reflectance Based Method For Shadow Detection and Removal

Sri Kalyan Yarlagadda, Fengqing Zhu

Shadows are a common aspect of images and, when left undetected, can hinder scene understanding and visual processing. We propose a simple yet effective reflectance-based approach to detect shadows in a single image. An image is first segmented, and, based on reflectance, illumination, and texture characteristics, segment pairs are identified as shadow and non-shadow pairs. The proposed method is tested on two publicly available and widely used datasets. Our method achieves higher accuracy in detecting shadows than previously reported methods despite requiring fewer parameters. We also show shadow-free results obtained by relighting the pixels in the detected shadow regions. [1807.04352v1]


Deepwound: Automated Postoperative Wound Assessment and Surgical Site Surveillance through Convolutional Neural Networks

Varun Shenoy, Elizabeth Foster, Lauren Aalami, Bakar Majeed, Oliver Aalami

Postoperative wound complications are a significant cause of expense for hospitals, doctors, and patients. Hence, an effective method to diagnose the onset of wound complications is strongly desired. Algorithmically classifying wound images is a difficult task due to the variability in the appearance of wound sites. Convolutional neural networks (CNNs), a subgroup of artificial neural networks that have shown great promise in analyzing visual imagery, can be leveraged to categorize surgical wounds. We present a multi-label CNN ensemble, Deepwound, trained to classify wound images using only image pixels and corresponding labels as inputs. Our final computational model can accurately identify the presence of nine labels: drainage, fibrinous exudate, granulation tissue, surgical site infection, open wound, staples, steri strips, and sutures. Our model achieves receiver operating characteristic (ROC) area under the curve (AUC) scores, sensitivity, specificity, and F1 scores superior to prior work in this area. Smartphones, given their increasing ubiquity, provide a means to deliver accessible wound care. Paired with deep neural networks, they offer the capability to provide clinical insight that assists surgeons during postoperative care. We also present a mobile application frontend to Deepwound that assists patients in tracking their wound and surgical recovery from the comfort of their home. [1807.04355v1]


A Generic Approach to Lung Field Segmentation from Chest Radiographs using Deep Space and Shape Learning

Awais Mansoor, Juan J. Cerrolaza, Geovanny Perez, Elijah Biggs, Kazunori Okada, Gustavo Nino, Marius George Linguraru

Computer-aided diagnosis (CAD) techniques for lung field segmentation from chest radiographs (CXR) have been proposed for adult cohorts, but rarely for pediatric subjects. Statistical shape models (SSMs), the workhorse of most state-of-the-art CXR-based lung field segmentation methods, do not efficiently accommodate the shape variation of the lung field during the pediatric developmental stages. The main contributions of our work are: (1) a generic lung field segmentation framework from CXR accommodating large shape variation for adult and pediatric cohorts; (2) a deep representation learning detection mechanism, ensemble space learning, for robust object localization; and (3) marginal shape deep learning for shape deformation parameter estimation. Unlike the iterative approach of conventional SSMs, the proposed shape learning mechanism transforms the parameter space into marginal subspaces that are solvable efficiently using the recursive representation learning mechanism. Furthermore, our method is the first to include the challenging retro-cardiac region in CXR-based lung segmentation for accurate lung capacity estimation. The framework is evaluated on 668 CXRs of patients between 3 months and 89 years of age. We obtain a mean Dice similarity coefficient of 0.96±0.03 (including the retro-cardiac region). For a given accuracy, the proposed approach is also found to be faster than conventional SSM-based iterative segmentation methods. The computational simplicity of the proposed generic framework could be similarly applied to the fast segmentation of other deformable objects. [1807.04339v1]


A Trilateral Weighted Sparse Coding Scheme for Real-World Image Denoising

Jun Xu, Lei Zhang, David Zhang

Most existing image denoising methods assume the corrupting noise to be additive white Gaussian noise (AWGN). However, realistic noise in real-world noisy images is much more complex than AWGN and is hard to model with simple analytical distributions. As a result, many state-of-the-art denoising methods in the literature become much less effective when applied to real-world noisy images captured by CCD or CMOS cameras. In this paper, we develop a trilateral weighted sparse coding (TWSC) scheme for robust real-world image denoising. Specifically, we introduce three weight matrices into the data and regularisation terms of the sparse coding framework to characterise the statistics of realistic noise and image priors. TWSC can be reformulated as a linear equality-constrained problem and solved by the alternating direction method of multipliers. The existence and uniqueness of the solution and the convergence of the proposed algorithm are analysed. Extensive experiments demonstrate that the proposed TWSC scheme outperforms state-of-the-art denoising methods in removing realistic noise. [1807.04364v1]
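A flavor of weighted sparse coding can be given with a plain ISTA solver for a per-coefficient-weighted l1 penalty. Note that TWSC itself uses three weight matrices and ADMM, which this single-weight sketch does not reproduce:

```python
import numpy as np

def weighted_ista(y, D, w, n_iter=500, step=None):
    """Weighted sparse coding via ISTA (a generic stand-in for TWSC's solver).

    Solves min_x 0.5 * ||y - D x||^2 + sum_i w_i * |x_i| by alternating a
    gradient step on the data term with a weighted soft-threshold.
    """
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * w, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(2)
D = rng.normal(size=(8, 4))                      # toy dictionary
x_true = np.array([1.5, 0.0, 0.0, -2.0])         # sparse ground-truth code
y = D @ x_true                                   # noiseless observation
x_hat = weighted_ista(y, D, w=np.full(4, 0.01))
```

Making the weights depend on noise statistics and image priors, as TWSC does, changes how strongly each coefficient is shrunk without changing this basic solve-threshold structure.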

