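# Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes + PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence Estimation + Auto-Context R-CNN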

Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion

Christopher Syben, Bernhard Stimpel, Jonathan Lommen, Tobias Würfl, Arnd Dörfler, Andreas Maier

In this paper, we derive a neural network architecture based on an analytical formulation of the parallel-to-fan beam conversion problem following the concept of precision learning. The network allows the unknown operators in this conversion to be learned in a data-driven manner, avoiding interpolation and potential loss of resolution. Integration of known operators results in a small number of trainable parameters that can be estimated from synthetic data only. The concept is evaluated in the context of hybrid MRI/X-ray imaging, where transformation of the parallel-beam MRI projections to fan-beam X-ray projections is required. The proposed method is compared to a traditional rebinning method. The results demonstrate that the proposed method is superior to ray-by-ray interpolation and is able to deliver sharper images using the same number of parallel-beam input projections, which is crucial for interventional applications. We believe that this approach forms a basis for further work uniting deep learning, signal processing, physics, and traditional pattern recognition. [1807.03057v1]

Vulnerability Analysis of Chest X-Ray Image Classification Against Adversarial Attacks

Saeid Asgari Taghanaki, Arkadeep Das, Ghassan Hamarneh

Recently, there have been several successful deep learning approaches for automatically classifying chest X-ray images into different disease categories. However, there is not yet a comprehensive vulnerability analysis of these models against so-called adversarial perturbations/attacks, even though such an analysis would make deep models more trustworthy in clinical practice. In this paper, we extensively analyze the performance of two state-of-the-art deep classification networks on chest X-ray images. These two networks were attacked with nine adversarial methods from three categories (both white- and black-box), namely gradient-based, score-based, and decision-based attacks. Furthermore, we modified the pooling operations in the two classification networks to measure their sensitivity to different attacks on the specific task of chest X-ray classification. [1807.02905v1]

Convolutional Recurrent Neural Networks for Blood Glucose Prediction

Kezhi Li, John Daniels, Pau Herrero-viñas, Chengyuan Liu, Pantelis Georgiou

The main purpose of the artificial pancreas (AP) or any diabetes therapy for subjects with type 1 diabetes (T1D) is to maintain the subjects’ plasma glucose level within the euglycemic range, which means below the threshold of hyperglycemia and above the threshold of hypoglycemia. The development of modern continuous glucose monitors (CGM) makes this feasible. Continuous monitoring enables people to take action before hypo/hyperglycemia episodes occur. For this reason, accurate blood glucose (BG) prediction is essential: it should raise alarms ahead of real hypo/hyperglycemia events and stay silent otherwise. Data-driven approaches, in particular deep neural network techniques, have recently been widely used for statistical modelling in healthcare and medical research. In this paper, glucose prediction is treated as a probabilistic generative problem, and a hybrid deep neural network is proposed to combine the advantages of convolutional neural networks (CNN) and recurrent neural networks (RNN). Specifically, a multi-layer CNN is implemented as a feature extraction component, followed by a multi-layer modified long short-term memory (LSTM) model to capture the probabilistic correlations between the future BG and the historical BG level, meal information and insulin. The model adapts to the individual subject with T1D, and dropout layers are used to avoid overfitting. The model is easily implemented using TensorFlow and can be embedded in portable devices with limited computational resources. Experiments verify and evaluate the effectiveness of the proposed method using simulated and clinical data. [1807.03043v1]

Approximate k-space models and Deep Learning for fast photoacoustic reconstruction

Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge

We present a framework for accelerated iterative reconstructions using a fast and approximate forward model that is based on k-space methods for photoacoustic tomography. The approximate model introduces aliasing artefacts in the gradient information for the iterative reconstruction, but these artefacts are highly structured and we can train a CNN that can use the approximate information to perform an iterative reconstruction. We show feasibility of the method for human in-vivo measurements in a limited-view geometry. The proposed method is able to produce superior results to total variation reconstructions with a speed-up of 32 times. [1807.03191v1]

Image Restoration Using Conditional Random Fields and Scale Mixtures of Gaussians

This paper proposes a general framework for internal patch-based image restoration based on Conditional Random Fields (CRF). Unlike related models based on Markov Random Fields (MRF), our approach explicitly formulates the posterior distribution for the entire image. The potential functions are taken as proportional to the product of a likelihood and prior for each patch. By assuming identical parameters for similar patches, our approach can be classified as a model-based non-local method. For the prior term in the potential function of the CRF model, multivariate Gaussians and multivariate scale-mixture of Gaussians are considered, with the latter being a novel prior for image patches. Our results show that the proposed approach outperforms methods based on Gaussian mixture models for image denoising and state-of-the-art methods for image interpolation/inpainting. [1807.03027v1]

Pioneer Networks: Progressively Growing Generative Autoencoder

Ari Heljakka, Arno Solin, Juho Kannala

We introduce a novel generative autoencoder network model that learns to encode and reconstruct images with high quality and resolution, and supports smooth random sampling from the latent space of the encoder. Generative adversarial networks (GANs) are known for their ability to simulate random high-quality images, but they cannot reconstruct existing images. Previous works have attempted to extend GANs to support such inference but, so far, have not delivered satisfactory high-quality results. Instead, we propose the Progressively Growing Generative Autoencoder (PIONEER) network which achieves high-quality reconstruction with $128{\times}128$ images without requiring a GAN discriminator. We merge recent techniques for progressively building up the parts of the network with the recently introduced adversarial encoder-generator network. The ability to reconstruct input images is crucial in many real-world applications, and allows for precise intelligent manipulation of existing images. We show promising results in image synthesis and inference, with state-of-the-art results in CelebA inference tasks. [1807.03026v1]

External Patch-Based Image Restoration Using Importance Sampling

This paper introduces a new approach to patch-based image restoration based on external datasets and importance sampling. The Minimum Mean Squared Error (MMSE) estimate of the image patches, the computation of which requires solving a multidimensional (typically intractable) integral, is approximated using samples from an external dataset. The new method, which can be interpreted as a generalization of the external non-local means (NLM), uses self-normalized importance sampling to efficiently approximate the MMSE estimates. The use of self-normalized importance sampling endows the proposed method with great flexibility, namely regarding the statistical properties of the measurement noise. The effectiveness of the proposed method is shown in a series of experiments using both generic large-scale and class-specific external datasets. [1807.03018v1]
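As a rough illustration of the self-normalized importance sampling idea, the sketch below assumes additive white Gaussian noise with a known standard deviation and treats the external patches as samples from the prior, so that the importance weights reduce to the Gaussian likelihood (which recovers an external non-local-means style average). The function name `snis_mmse_patch` and the toy data are hypothetical and are not taken from the paper.

```python
import math

def snis_mmse_patch(noisy, external_patches, sigma):
    """Approximate E[x | y] by a likelihood-weighted average of external patches.

    Assumption: Gaussian noise with standard deviation `sigma`.
    """
    # Log-likelihood (up to a constant) of the noisy patch given each candidate
    log_w = [
        -sum((y - x) ** 2 for y, x in zip(noisy, patch)) / (2.0 * sigma ** 2)
        for patch in external_patches
    ]
    # Subtract the maximum for numerical stability; normalization cancels it
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    # Self-normalized weighted mean, computed per pixel
    return [
        sum(wi * patch[k] for wi, patch in zip(w, external_patches)) / total
        for k in range(len(noisy))
    ]

# Toy external dataset of 2-pixel patches (hypothetical values)
external = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
estimate = snis_mmse_patch([0.9, 1.1], external, sigma=0.5)
```

Other noise models would only change how `log_w` is computed, which is one way to read the flexibility claimed in the abstract.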

Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources

Hosnieh Sattar, Gerard Pons-Moll, Mario Fritz

To study the correlation between clothing garments and body shape, we collected a new dataset (Fashion Takes Shape), which includes images of users with clothing category annotations. We employ our multi-photo approach to estimate body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated and show that our multi-photo approach leads to a better predictive model for clothing categories compared to models based on single-view shape estimates or manually annotated body types. We see our method as the first step towards the large-scale understanding of clothing preferences from body shape. [1807.03235v1]

Polarimetric Convolutional Network for PolSAR Image Classification

Xu Liu, Licheng Jiao, Xu Tang, Qigong Sun, Dan Zhang

Approaches for analyzing the polarimetric scattering matrix of polarimetric synthetic aperture radar (PolSAR) data have always been the focus of PolSAR image classification. Generally, the polarization coherency matrix and the covariance matrix obtained from the polarimetric scattering matrix carry only a limited amount of polarimetric information. To solve this problem, we propose a sparse scattering coding scheme that processes the polarimetric scattering matrix and yields a nearly complete feature. This encoding also fully preserves the polarimetric information of the scattering matrix. For this encoding, we design a corresponding classification algorithm based on a convolutional network. Based on sparse scattering coding and the convolutional neural network, the polarimetric convolutional network is proposed to classify PolSAR images by making full use of polarimetric information. We perform experiments on PolSAR images acquired by AIRSAR and RADARSAT-2 to verify the proposed method. The experimental results demonstrate that the proposed method achieves better results and has considerable potential for PolSAR data classification. Source code for sparse scattering coding is available at https://github.com/liuxuvip/Polarimetric-Scattering-Coding. [1807.02975v1]

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

Fangneng Zhan, Shijian Lu, Chuhui Xue

The requirement of large amounts of annotated images has become one grand challenge while training deep neural network models for various visual detection and recognition tasks. This paper presents a novel image synthesis technique that aims to generate a large amount of annotated scene text images for training accurate and robust scene text detection and recognition models. The proposed technique consists of three innovative designs. First, it realizes “semantic coherent” synthesis by embedding texts at semantically sensible regions within the background image, where the semantic coherence is achieved by leveraging the semantic annotations of objects and image regions that have been created in the prior semantic segmentation research. Second, it exploits visual saliency to determine the embedding locations within each semantic sensible region, which coincides with the fact that texts are often placed around homogeneous regions for better visibility in scenes. Third, it designs an adaptive text appearance model that determines the color and brightness of embedded texts by learning from the feature of real scene text images adaptively. The proposed technique has been evaluated over five public datasets and the experiments show its superior performance in training accurate and robust scene text detection and recognition models. [1807.03021v1]

Flow Network Tracking for Spatiotemporal and Periodic Point Matching: Applied to Cardiac Motion Analysis

Nripesh Parajuli, Allen Lu, Kevinminh Ta, John C. Stendahl, Nabil Boutagy, Imran Alkhalil, Melissa Eberle, Geng-Shi Jeng, Maria Zontak, Matthew ODonnell, Albert J. Sinusas, James S. Duncan

The accurate quantification of left ventricular (LV) deformation/strain shows significant promise for quantitatively assessing cardiac function for use in diagnosis and therapy planning (Jasaityte et al., 2013). However, accurate estimation of the displacement of myocardial tissue and hence LV strain has been challenging due to a variety of issues, including those related to deriving tracking tokens from images and following tissue locations over the entire cardiac cycle. In this work, we propose a point matching scheme where correspondences are modeled as flow through a graphical network. Myocardial surface points are set up as nodes in the network and edges define neighborhood relationships temporally. The novelty lies in the constraints that are imposed on the matching scheme, which render the correspondences one-to-one through the entire cardiac cycle, and not just two consecutive frames. The constraints also encourage motion to be cyclic, which is an important characteristic of LV motion. We validate our method by applying it to the estimation of quantitative LV displacement and strain estimation using 8 synthetic and 8 open-chested canine 4D echocardiographic image sequences, the latter with sonomicrometric crystals implanted on the LV wall. We were able to achieve excellent tracking accuracy on the synthetic dataset and observed a good correlation with crystal-based strains on the in-vivo data. [1807.02951v1]

Pooling Pyramid Network for Object Detection

Pengchong Jin, Vivek Rathod, Xiangxin Zhu

We’d like to share a simple tweak of Single Shot Multibox Detector (SSD) family of detectors, which is effective in reducing model size while maintaining the same quality. We share box predictors across all scales, and replace convolution between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees the training data over all scales. Since we reduce the number of predictors to one, and trim all convolutions between them, model size is significantly smaller. We empirically show that these changes do not hurt model quality compared to vanilla SSD. [1807.03284v1]
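The two changes described above (max pooling between scales and a single predictor shared by all scales) can be sketched on a toy single-channel feature map. This is only an illustration under simplifying assumptions: real SSD-style predictors are convolutional heads over multi-channel feature maps, and `shared_predictor` here is a hypothetical one-weight stand-in.

```python
def max_pool_2x2(fmap):
    """Downsample a 2D feature map with 2x2 max pooling, stride 2."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]

def build_pooling_pyramid(fmap, num_levels):
    """Build the pyramid with pooling only; no learned layers between scales."""
    pyramid = [fmap]
    for _ in range(num_levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))
    return pyramid

def shared_predictor(fmap, weight, bias):
    """Hypothetical 1x1 predictor; the same parameters are reused at every scale."""
    return [[weight * v + bias for v in row] for row in fmap]

base = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pyramid = build_pooling_pyramid(base, num_levels=3)  # 4x4, 2x2, 1x1
scores = [shared_predictor(level, weight=0.5, bias=-1.0) for level in pyramid]
```

Because every level is scored by the same `weight` and `bias`, the outputs at different scales are on a common scale by construction, which is the intuition behind the miscalibration argument.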

Human Activity Recognition in RGB-D Videos by Dynamic Images

Snehasis Mukherjee, Leburu Anvitha, T. Mohana Lahari

Human activity recognition in RGB-D videos has been an active research topic during the last decade. However, we found no prior work in the literature on recognizing human activity in RGB-D videos where several performers act simultaneously. In this paper, we introduce such a challenging dataset, with several performers carrying out the activities. We present a novel method for recognizing human activities in such videos. The proposed method aims to capture the motion information of the whole video by producing a dynamic image corresponding to the input video. We use two parallel ResNext-101 networks to produce the dynamic images for the RGB video and the depth video separately. The dynamic images contain only the motion information, and hence the unnecessary background information is eliminated. We pass the two dynamic images extracted from the RGB and depth videos, respectively, through a fully connected neural network layer. The proposed dynamic image reduces the complexity of the recognition process by extracting a sparse matrix from a video, while retaining the motion information required for recognizing the activity. The proposed method has been tested on the MSR Action 3D dataset and has shown performance comparable to the state of the art. We also apply the proposed method to our own dataset, where it outperforms the state-of-the-art approaches. [1807.02947v1]

Video Summarisation by Classification with Deep Reinforcement Learning

Kaiyang Zhou, Tao Xiang, Andrea Cavallaro

Most existing video summarisation methods are based on either supervised or unsupervised learning. In this paper, we propose a reinforcement learning-based weakly supervised method that exploits easy-to-obtain, video-level category labels and encourages summaries to contain category-related information and maintain category recognisability. Specifically, we formulate video summarisation as a sequential decision-making process and train a summarisation network with deep Q-learning (DQSN). A companion classification network is also trained to provide rewards for training the DQSN. With the classification network, we develop a global recognisability reward based on the classification result. Critically, a novel dense ranking-based reward is also proposed in order to cope with the temporally delayed and sparse reward problems for long sequence reinforcement learning. Extensive experiments on two benchmark datasets show that the proposed approach achieves state-of-the-art performance. [1807.03089v1]

Multi-Scale Coarse-to-Fine Segmentation for Screening Pancreatic Ductal Adenocarcinoma

Zhuotun Zhu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

This paper proposes an intuitive approach to finding pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, by checking abdominal CT scans. Our idea is named segmentation-for-classification (S4C), which classifies a volume by checking if at least a sufficient number of voxels is segmented as the tumor. In order to deal with tumors with different scales, we train volumetric segmentation networks with multi-scale inputs, and test them in a coarse-to-fine flowchart. A post-processing module is used to filter out outliers and reduce false alarms. We perform a case study on our dataset containing 439 CT scans, in which 136 cases were diagnosed with PDAC and 303 cases are normal. Our approach reports a sensitivity of 94.1% at a specificity of 98.5%, with an average tumor segmentation accuracy of 56.46% over all PDAC cases. [1807.02941v1]
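The segmentation-for-classification rule can be illustrated with a minimal sketch: a volume is flagged as positive when the number of voxels segmented as tumor reaches a threshold. The threshold value, the function names, and the tiny binary masks below are hypothetical; the paper's multi-scale networks and post-processing module are not modelled here.

```python
def count_tumor_voxels(mask):
    """Count voxels labelled 1 in a binary 3D mask (nested lists: depth x height x width)."""
    return sum(v for plane in mask for row in plane for v in row)

def classify_volume(mask, min_tumor_voxels):
    """Flag the volume as positive if enough voxels were segmented as tumor."""
    return count_tumor_voxels(mask) >= min_tumor_voxels

def sensitivity_specificity(predictions, labels):
    """Compute (sensitivity, specificity); assumes both classes occur in `labels`."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    tn = sum(1 for p, l in zip(predictions, labels) if not p and not l)
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives

# Two toy 2x2x2 segmentation masks (hypothetical data)
volume_a = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]  # 3 tumor voxels
volume_b = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]  # no tumor voxels
predictions = [classify_volume(v, min_tumor_voxels=2) for v in (volume_a, volume_b)]
sens, spec = sensitivity_specificity(predictions, labels=[True, False])
```

Raising `min_tumor_voxels` trades sensitivity for specificity, since small spurious segmentations no longer trigger a positive call.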

An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski

Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either perfect translation invariance or varying degrees of translation dependence, as required by the task. CoordConv solves the coordinate transform problem with perfect generalization and 150 times faster with 10–100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IOU when using CoordConv, and in the RL domain agents playing Atari games benefit significantly from the use of CoordConv layers. [1807.03247v1]
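The core of the idea, giving a convolution access to its own input coordinates through extra channels, can be sketched without any deep learning framework. The sketch below only builds the augmented input; a regular convolution would then be applied to it. Scaling the coordinates to [-1, 1] is an assumption made here for illustration, and `add_coord_channels` is a hypothetical helper name.

```python
def add_coord_channels(image):
    """Append a row-coordinate and a column-coordinate channel to an image.

    `image` is a list of channels, each an H x W nested list.
    Coordinates are scaled linearly to [-1, 1] (an assumption of this sketch).
    """
    h, w = len(image[0]), len(image[0][0])

    def scale(index, size):
        # Map 0..size-1 onto -1..1; a single-pixel axis gets 0
        return 2.0 * index / (size - 1) - 1.0 if size > 1 else 0.0

    row_channel = [[scale(i, h) for _ in range(w)] for i in range(h)]
    col_channel = [[scale(j, w) for j in range(w)] for _ in range(h)]
    return image + [row_channel, col_channel]

# One-channel 2x3 toy image (hypothetical values)
image = [[[0.2, 0.5, 0.1],
          [0.9, 0.3, 0.7]]]
augmented = add_coord_channels(image)
```

A filter applied to `augmented` can learn to ignore the two extra channels, which leaves ordinary translation-invariant convolution, or to use them when the task depends on position.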

Deep Co-Clustering for Unsupervised Audiovisual Learning

Di Hu, Feiping Nie, Xuelong Li

Birds that we see twitter, running cars are accompanied by noise, people talk face to face, and so on. These natural audiovisual correspondences make it possible to explore and understand the outside world. However, the mixture of multiple objects and sounds makes efficient matching intractable in unconstrained environments. To address this problem, we propose to thoroughly extract the audio and visual components and perform elaborate correspondence learning among them. Concretely, we propose a novel unsupervised audiovisual learning model, named Deep Co-Clustering (DCC), that synchronously performs sets of clustering with multimodal vectors of convolutional maps in different shared spaces to capture multiple audiovisual correspondences. Such an integrated multimodal clustering network can be effectively trained end-to-end with a max-margin loss. Extensive experiments in feature evaluation and audiovisual tasks are performed. The results demonstrate that DCC can learn effective unimodal representations, with which the classifier can even outperform humans. Further, DCC shows noticeable performance in the tasks of sound localization, multisource detection, and audiovisual understanding. [1807.03094v1]

Attention to Refine through Multi-Scales for Semantic Segmentation

Shiqi Yang, Gang Peng

This paper proposes a novel attention model for semantic segmentation, which aggregates multi-scale and context features to refine prediction. Specifically, the backbone convolutional neural network takes inputs at multiple scales, so that the CNN can obtain representations at different scales. The proposed attention model handles the features from the different scale streams separately and then integrates them. The location attention branch of the model then learns to softly weight the multi-scale features at each pixel location. Moreover, we add a recalibrating branch, parallel to the location attention output, to recalibrate the score map per class. We achieve competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related works. [1807.02917v1]

Generating objects going well with the surroundings

Jeesoo Kim, Jaeyoung Yoo, Jangho Kim, Nojun Kwak

Since the generative adversarial network made a breakthrough in the image generation problem, many of its applications have been studied, such as image restoration, style transfer and image completion. However, few studies have addressed generating objects in uncontrolled real-world environments. In this paper, we propose a novel approach for image generation in real-world scenes. The overall architecture consists of two networks: one completes the shape of the object being generated, and the other paints the context onto it. Using a subnetwork proposed in a previous work on image completion, our model produces the shape of an object. Unlike the approaches used in the image completion problem, details of trained objects are encoded into a latent variable by an additional subnetwork, resulting in better quality of the generated objects. We evaluated our method using the KITTI and Cityscapes datasets, which are widely used for object detection and image segmentation problems. The adequacy of the images generated by the proposed method has also been evaluated using a widely used object detection algorithm. [1807.02925v1]

Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector

Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li

Weakly supervised temporal action detection is a Herculean task in understanding untrimmed videos, since no supervisory signal except the video-level category label is available on training data. Under the supervision of category labels, weakly supervised detectors are usually built upon classifiers. However, there is an inherent contradiction between classifier and detector; i.e., a classifier in pursuit of high classification performance prefers top-level discriminative video clips that are extremely fragmentary, whereas a detector is obliged to discover the whole action instance without missing any relevant snippet. To reconcile this contradiction, we train a detector by driving a series of classifiers to find new actionness clips progressively, via step-by-step erasion from a complete video. During the test phase, all we need to do is to collect detection results from the one-by-one trained classifiers at various erasing steps. To assist in the collection process, a fully connected conditional random field is established to refine the temporal localization outputs. We evaluate our approach on two prevailing datasets, THUMOS’14 and ActivityNet. The experiments show that our detector advances state-of-the-art weakly supervised temporal action detection results, and even compares with quite a few strongly supervised methods. [1807.02929v1]

PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence Estimation

Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates pixel-varying affine transformation fields across images. To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where dense affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within deep networks. PARN estimates residual affine transformations at each level and composes them to estimate final affine transformations. Furthermore, to overcome the limitations of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging the correspondence consistency across images. Our method is fully learnable in an end-to-end manner and does not require quantizing infinite continuous affine transformation fields. To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks. Experimental results demonstrate that PARN outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks. [1807.02939v1]
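The step in which residual affine transformations are composed across pyramid levels can be illustrated for a single pixel. In this sketch an affine transformation is a 2x3 matrix `[A | t]`, and the residual from the finer level is applied after the coarser estimate; that ordering, the helper names, and the numeric values are assumptions made for illustration, whereas PARN estimates such transformations densely per pixel.

```python
def compose_affine(residual, coarse):
    """Return the 2x3 affine equal to applying `coarse` first, then `residual`."""
    (a, b, tx), (c, d, ty) = residual
    (e, f, ux), (g, h, uy) = coarse
    return [
        [a * e + b * g, a * f + b * h, a * ux + b * uy + tx],
        [c * e + d * g, c * f + d * h, c * ux + d * uy + ty],
    ]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a 2D point (x, y)."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

# Coarse level: scale by 2 and shift x by 1 (hypothetical values)
coarse = [[2.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
# Finer level: small residual translation (hypothetical values)
residual = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

final = compose_affine(residual, coarse)
mapped = apply_affine(final, (1.0, 1.0))
```

Because each level only has to predict a small correction to the level above it, the composed field changes gradually from coarse to fine.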

Partial Policy-based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images

Walid Abdullah Al, Il Dong Yun

Deploying the idea of long-term cumulative return, reinforcement learning has shown remarkable performance in various fields. We propose a formulation of the landmark localization in 3D medical images as a reinforcement learning problem. Whereas value-based methods have been widely used to solve similar problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. Successful behavior learning is challenging in large state and/or action spaces, requiring many trials. We introduce a partial policy-based reinforcement learning to enable solving the large problem of localization by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing their own independent critic. The proposed policy reconstruction from the partial policies ensures a robust and efficient localization utilizing the sub-agents solving simple binary decision problems in their corresponding partial action spaces. The proposed reinforcement learning requires a small number of trials to learn the optimal behavior compared with the original behavior learning scheme. [1807.02908v1]

Automatic multi-objective based feature selection for classification

Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang

Accurately classifying malignancy of lesions detected in a screening scan plays a critical role in reducing false positives. By extracting and analyzing a large number of quantitative image features, radiomics holds great potential to differentiate malignant tumors from benign ones. Since not all radiomic features contribute to an effective classification model, selecting an optimal feature subset is critical. This work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers both sensitivity and specificity simultaneously as the objective functions during feature selection. In MO-FS, we developed a modified entropy based termination criterion (METC) to stop the algorithm automatically rather than relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning using the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, an adaptive mutation operation was developed to generate the mutation probability in MO-FS automatically. MO-FS was evaluated for classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Compared with other commonly used feature selection methods, the experimental results for both lung nodule and breast lesion malignancy classification demonstrated that the feature set selected by MO-FS achieved better classification performance. [1807.03236v1]

ChestNet: A Deep Neural Network for Classification of Thoracic Diseases on Chest Radiography

Hongyu Wang, Yong Xia

Computer-aided techniques may lead to more accurate and more accessible diagnosis of thorax diseases on chest radiography. Despite the success of deep learning-based solutions, this task remains a major challenge in smart healthcare, since it is intrinsically a weakly supervised learning problem. In this paper, we incorporate the attention mechanism into a deep convolutional neural network, and thus propose the ChestNet model to address effective diagnosis of thorax diseases on chest radiography. This model consists of two branches: a classification branch serves as a uniform feature extraction-classification network to free users from troublesome handcrafted feature extraction, and an attention branch exploits the correlation between class labels and the locations of pathological abnormalities and allows the model to concentrate adaptively on the pathologically abnormal regions. We evaluated our model against three state-of-the-art deep learning models on the Chest X-ray 14 dataset using the official patient-wise split. The results indicate that our model outperforms other methods, which use no extra training data, in diagnosing 14 thorax diseases on chest radiography. [1807.03058v1]

Exploring Brain-wide Development of Inhibition through Deep Learning

Asim Iqbal, Asfandyar Sheikh, Theofanis Karayannis

We introduce here a fully automated convolutional neural network-based method for brain image processing to Detect Neurons in different brain Regions during Development (DeNeRD). Our method takes a developing mouse brain as input and i) registers the brain sections against a developing mouse reference atlas, ii) detects various types of neurons, and iii) quantifies the neural density in many unique brain regions at different postnatal (P) time points. Our method is invariant to the shape, size and expression of neurons and by using DeNeRD, we compare the brain-wide neural density of all GABAergic neurons in developing brains of ages P4, P14 and P56. We discover and report 6 different clusters of regions in the mouse brain in which GABAergic neurons develop in a differential manner from early age (P4) to adulthood (P56). These clusters reveal key steps of GABAergic cell development that seem to track with the functional development of diverse brain regions as the mouse transitions from a passive receiver of sensory information (<P14) to an active seeker (>P14). [1807.03238v1]

Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence Images

Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess

Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even the finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects. In this work, we investigate two approaches for automatic detection of such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on hand-crafted features that are classified by a Support Vector Machine (SVM). To obtain strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep Convolutional Neural Network (CNN) that runs on a Graphics Processing Unit (GPU). Both approaches are trained on 1,968 cells extracted from high-resolution EL intensity images of mono- and polycrystalline PV modules. The CNN is more accurate, and reaches an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible. [1807.02894v1]

Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images

Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng

Automatic lesion segmentation in dermoscopy images is an essential step for computer-aided diagnosis of melanoma. Dermoscopy images exhibit rotational and reflectional symmetry; however, this geometric property has not been encoded in state-of-the-art convolutional neural network based skin lesion segmentation methods. In this paper, we present a deeply supervised rotation equivariant network for skin lesion segmentation by extending the recent group rotation equivariant network~\cite{cohen2016group}. Specifically, we propose the G-upsampling and G-projection operations to adapt the rotation equivariant classification network to our skin lesion segmentation problem. To further increase performance, we integrate the deep supervision scheme into our proposed rotation equivariant segmentation architecture. The whole framework is equivariant to input transformations, including rotation and reflection, which improves the network efficiency and thus contributes to the segmentation performance. We extensively evaluate our method on the ISIC 2017 skin lesion challenge dataset. The experimental results show that our rotation equivariant networks consistently outperform their regular counterparts with the same model complexity under different experimental settings. Our best model achieves 77.23% (JA) on the test dataset, outperforming the state-of-the-art challenge methods and further demonstrating the effectiveness of our proposed deeply supervised rotation equivariant segmentation network. [1807.02804v1]

Distillation Techniques for Pseudo-rehearsal Based Incremental Learning

Haseeb Shah, Khurram Javed, Faisal Shafait

The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget the knowledge about the old tasks, a phenomenon called catastrophic forgetting, when trained on incrementally arriving data. We discuss the biases in current Generative Adversarial Networks (GAN) based approaches that learn the classifier by knowledge distillation from previously trained classifiers. These biases cause the trained classifier to perform poorly. We propose an approach to remove these biases by distilling knowledge from the classifier of AC-GAN. Experiments on MNIST and CIFAR10 show that this method is comparable to current state-of-the-art rehearsal-based approaches. The code for this paper is available at https://github.com/haseebs/Pseudo-rehearsal-Incremental-Learning. [1807.02799v1]

Auto-Context R-CNN

Bo Li, Tianfu Wu, Lun Zhang, Rufeng Chu

Region-based convolutional neural networks (R-CNN)~\cite{fast_rcnn,faster_rcnn,mask_rcnn} have largely dominated object detection. Operators defined on RoIs (Regions of Interest), such as RoIPooling~\cite{fast_rcnn} and RoIAlign~\cite{mask_rcnn}, play an important role in R-CNNs. They all use only information inside RoIs for RoI prediction, even with their recent deformable extensions~\cite{deformable_cnn}. Although surrounding context is well known for its importance in object detection, it has yet to be integrated in R-CNNs in a flexible and effective way. Inspired by the auto-context work~\cite{auto_context} and the multi-class object layout work~\cite{nms_context}, this paper presents a generic context-mining RoI operator (i.e., RoICtxMining) seamlessly integrated in R-CNNs; the resulting object detection system, termed Auto-Context R-CNN, is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a 3×3 layout to mine contextual information adaptively in the 8 surrounding context regions on the fly. Within each of the 8 context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling / RoIAlign features are concatenated with the object-RoI for final prediction. The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promising robustness against adversarial attacks without being adversarially trained. In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on Pascal VOC, Microsoft COCO, and KITTI datasets (including 6.9% mAP improvements over the R-FCN~\cite{rfcn} method on the COCO test-dev dataset and first place on both KITTI pedestrian and cyclist detection as of this submission). [1807.02842v1]
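
To make the 3×3 layout concrete, the following is a minimal sketch of how the 8 context regions around an object-RoI could be enumerated. It assumes that every context cell has the same width and height as the object-RoI and that cells are clipped to the image; the abstract does not state these details, and the function name `context_regions` is hypothetical. The mining of a context-RoI inside each cell is not shown.

```python
def context_regions(roi, image_w, image_h):
    """Return the surrounding cells of a 3x3 grid centred on `roi`.

    roi is (x1, y1, x2, y2). Assumption (not from the paper): each cell
    has the same size as the RoI and is clipped to the image bounds.
    Cells that fall entirely outside the image are dropped.
    """
    x1, y1, x2, y2 = roi
    w, h = x2 - x1, y2 - y1
    regions = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # the centre cell is the object-RoI itself
            cx1 = max(0, x1 + dx * w)
            cy1 = max(0, y1 + dy * h)
            cx2 = min(image_w, x2 + dx * w)
            cy2 = min(image_h, y2 + dy * h)
            if cx2 > cx1 and cy2 > cy1:
                regions.append((cx1, cy1, cx2, cy2))
    return regions


if __name__ == "__main__":
    for region in context_regions((10, 10, 20, 20), 30, 30):
        print(region)
```

An RoI touching the image border yields fewer than 8 regions under this clipping rule, which is one of several reasonable ways to handle boundaries.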

Learning The Sequential Temporal Information with Recurrent Neural Networks

Pushparaja Murugan

Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time series. Unlike a traditional feed-forward network, a recurrent network has an inherent feedback loop that allows it to store temporal context and pass state information along the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training a fully connected RNN and managing the gradient flow are complicated processes, and many studies have been carried out to address this limitation. This article intends to provide brief details about recurrent neurons, their variants, and tips and tricks for training a fully recurrent neural network. This review was carried out as part of our IPO studio software module ‘Multiple Object Tracking’. [1807.02857v1]

Semi-parametric Image Inpainting

Karim Iskakov

This paper introduces a semi-parametric approach to image inpainting for irregular holes. The nonparametric part consists of an external image database. At test time, the database is used to retrieve a supplementary image similar to the masked input picture, which then serves as auxiliary information for the deep neural network. Further, we propose a novel method of generating masks with irregular holes and present a public dataset with such masks. Experiments on the CelebA-HQ dataset show that our semi-parametric method yields more realistic results than previous approaches, which is confirmed by a user study. [1807.02855v1]

Detecting Synapse Location and Connectivity by Signed Proximity Estimation and Pruning with Deep Nets

Toufiq Parag, Daniel Berger, Lee Kamentsky, Benedikt Staffler, Donlai Wei, Moritz Helmstaedter, Jeff W. Lichtman, Hanspeter Pfister

Synaptic connectivity detection is a critical task for neural reconstruction from Electron Microscopy (EM) data. Most of the existing algorithms for synapse detection do not identify the cleft location and direction of connectivity simultaneously. The few methods that compute direction along with contact location have only been demonstrated to work on either dyadic (most common in vertebrate brain) or polyadic (found in fruit fly brain) synapses, but not on both types. In this paper, we present an algorithm to automatically predict the location as well as the direction of both dyadic and polyadic synapses. The proposed algorithm first generates candidate synaptic connections from voxelwise predictions of signed proximity generated by a 3D U-net. A second 3D CNN then prunes the set of candidates to produce the final detection of cleft and connectivity orientation. Experimental results demonstrate that the proposed method outperforms the existing methods for determining synapses in both rodent and fruit fly brain. [1807.02739v1]

Hierarchical Stochastic Graphlet Embedding for Graph-based Pattern Recognition

Anjan Dutta, Pau Riba, Josep Lladós, Alicia Fornés

Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable with many machine learning tools. This is because most of the required mathematical operations are not defined in the graph domain. Graph embedding has been proposed as a way to tackle these difficulties: it maps graphs to a vector space and makes standard machine learning techniques applicable to them. However, it is well known that graph embedding techniques usually suffer from a loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure of the graph is constructed, we consider various configurations of its parts and use stochastic graphlet embedding (SGE) to map them into a vector space. Broadly speaking, SGE produces a distribution of uniformly sampled low to high order graphlets as a way to embed graphs into the vector space. The coarse-to-fine structure of a graph hierarchy and the statistics obtained from the distribution of low to high order stochastic graphlets complement each other and capture important structural information with varied contexts. Together, these two techniques substantially reduce the usual information loss involved in graph embedding, yielding a more robust vector space embedding of graphs. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods. [1807.02839v1]

Data-driven Upsampling of Point Clouds

Wentai Zhang, Haoliang Jiang, Zhangsihao Yang, Soji Yamakawa, Kenji Shimada, Levent Burak Kara

High quality upsampling of sparse 3D point clouds is critically useful for a wide range of geometric operations such as reconstruction, rendering, meshing, and analysis. In this paper, we propose a data-driven algorithm that enables an upsampling of 3D point clouds without the need for hard-coded rules. Our approach uses a deep network with Chamfer distance as the loss function, capable of learning the latent features in point clouds belonging to different object categories. We evaluate our algorithm across different amplification factors, with upsampling learned and performed on objects belonging to the same category as well as different categories. We also explore the desirable characteristics of input point clouds as a function of the distribution of the point samples. Finally, we demonstrate the performance of our algorithm in single-category training versus multi-category training scenarios. The final proposed model is compared against a baseline, optimization-based upsampling method. Results indicate that our algorithm is capable of generating more uniform and accurate upsamplings. [1807.02740v1]
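
As an illustration of the loss named above, here is a minimal sketch of a symmetric Chamfer distance between two point sets. It assumes one common variant (the mean of squared nearest-neighbour distances, summed over both directions); the abstract does not say which variant the authors use, and the brute-force search here is for clarity rather than speed.

```python
from math import dist


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumption: mean of squared nearest-neighbour distances in each
    direction, added together. Other variants (unsquared, summed
    instead of averaged) are also in use.
    """
    def one_way(src, dst):
        # For every point in src, squared distance to its nearest point in dst.
        return sum(min(dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    sparse = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    dense = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(chamfer_distance(sparse, dense))
```

Because the distance does not require the two sets to have the same number of points, it is a natural fit for comparing an upsampled cloud with a denser reference.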

Revisiting Distillation and Incremental Classifier Learning

Khurram Javed, Faisal Shafait

One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously. Any attempts at learning new tasks incrementally cause them to completely forget about previous tasks. This lack of ability to learn incrementally, called Catastrophic Forgetting, is considered a major hurdle in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art method for incremental learning (iCaRL) and demonstrate that the good performance of the system is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation and recognize a key limitation of knowledge distillation, i.e., it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that is able to successfully remove this bias. We demonstrate the effectiveness of our algorithm on CIFAR100 and MNIST datasets showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning. [1807.02802v1]
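
For readers unfamiliar with knowledge distillation, the sketch below shows the generic form of the loss: the cross-entropy between temperature-softened teacher and student outputs. This is the textbook formulation, not the authors' implementation; the temperature value and function names are assumptions, and the dynamic threshold moving step is not shown because the abstract does not describe it.

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Assumption: generic distillation loss; temperature=2.0 is an
    illustrative choice, not a value taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * log(s) for t, s in zip(p_teacher, p_student))


if __name__ == "__main__":
    teacher = [2.0, 0.0, -1.0]
    print(distillation_loss([2.0, 0.0, -1.0], teacher))  # student matches teacher
    print(distillation_loss([0.0, 0.0, 0.0], teacher))   # student is uninformed
```

The loss is smallest when the student reproduces the teacher's soft outputs, which is how knowledge about old classes is retained while new classes are learned.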

Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

Pascal Mettes, Cees G. M. Snoek

The goal of this paper is spatio-temporal localization of human actions from their class labels only. The state-of-the-art casts the problem as Multiple Instance Learning, where the instances are a priori computed action proposals. Rather than disconnecting the localization from the learning, we propose a variant of Multiple Instance Learning that integrates the spatio-temporal localization during the learning. We make three contributions. First, we define model assumptions tailored to actions and propose a latent instance learning objective allowing for optimization at the box-level. Second, we propose a spatio-temporal box linking algorithm, exploiting box proposals from off-the-shelf person detectors, suitable for weakly-supervised learning. Third, we introduce tube- and video-level refinements at inference time to integrate long-term spatio-temporal action characteristics. Our experiments on three video datasets show the benefits of our contributions as well as competitive results compared to state-of-the-art alternatives that localize actions from their class label only. Finally, our algorithm enables incorporating point and box supervision, making it possible to benchmark, mix, and balance action localization performance versus annotation time. [1807.02800v1]

Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu

Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form a very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods. [1807.02758v1]
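
The channel attention idea can be sketched as follows. This is a simplified illustration that assumes a squeeze-and-excitation style gate (global average pooling, a small bottleneck, and a sigmoid); the abstract does not give the exact layer design, biases are omitted, and the weight matrices `w_down` and `w_up` are hypothetical stand-ins for learned parameters.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def channel_attention(features, w_down, w_up):
    """Rescale each channel by a gate computed from all channels.

    features: list of C channels, each a flat list of values.
    w_down:   R x C weights (channel reduction), hypothetical.
    w_up:     C x R weights (channel expansion), hypothetical.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Excite: bottleneck with ReLU, then one sigmoid gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_up]
    # Rescale every value in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


if __name__ == "__main__":
    feats = [[1.0, 3.0], [2.0, 2.0]]
    print(channel_attention(feats, w_down=[[1.0, -1.0]], w_up=[[1.0], [1.0]]))
```

Because each gate depends on the pooled statistics of every channel, the rescaling reflects interdependencies among channels rather than treating them equally.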

Real-time stereo vision-based lane detection system

Rui Fan, Naim Dahnoun

The detection of multiple curved lane markings on a non-flat road surface is still a challenging task for automotive applications. Depth information can be used to greatly enhance the robustness of lane detection systems. The system proposed in this paper is developed from our previous work, where the dense vanishing point Vp is estimated globally to assist the detection of multiple curved lane markings. However, the outliers in the optimal solution may severely affect the accuracy of the least squares fitting when estimating Vp. Therefore, in this paper we use Random Sample Consensus to update the inliers and outliers iteratively until the fraction of the number of inliers versus the total number exceeds our pre-set threshold. This significantly helps the system to cope with suddenly changing conditions. Furthermore, we propose a novel lane position validation approach which provides a piecewise weight based on Vp and the gradient to reduce the gradient magnitude of the non-lane candidates. Then, we compute the energy of each possible solution and select all satisfying lane positions for visualisation. The proposed system is implemented on a heterogeneous system which consists of an Intel Core i7-4720HQ CPU and an NVIDIA GTX 970M GPU. A processing speed of 143 fps has been achieved, which is over 38 times faster than our previous work. Also, in order to evaluate the detection precision, we tested 2495 frames with 5361 lanes from the KITTI database (1637 lanes more than our previous experiment). It is shown that the overall successful detection rate is improved from 98.7% to 99.5%. [1807.02752v1]
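
The Random Sample Consensus loop with an inlier-ratio stopping rule can be sketched on a simpler stand-in problem. The example below fits a 2D line y = m*x + c rather than estimating the vanishing point Vp, which the abstract does not specify in enough detail to reproduce; the tolerance, threshold, and iteration cap are illustrative assumptions.

```python
import random


def ransac_line(points, tol=0.1, min_inlier_ratio=0.8, max_iters=200, seed=0):
    """Fit y = m*x + c with RANSAC; stop once enough points are inliers.

    Assumption: a generic line fit used as a stand-in for the paper's
    vanishing point estimation. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(max_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # Points whose vertical residual is within the tolerance are inliers.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
        # Stop when the inlier fraction reaches the pre-set threshold.
        if len(best_inliers) / len(points) >= min_inlier_ratio:
            break
    return best_model, best_inliers


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
    data += [(3.0, 20.0), (5.0, -10.0)]  # two gross outliers
    model, inliers = ransac_line(data)
    print(model, len(inliers))
```

In practice the final model is usually refined by a least squares fit on the inlier set, which is where removing outliers pays off.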

Tournament Based Ranking CNN for the Cataract grading

Dohyeun Kim, Tae Joon Jun, Daeyoung Kim, Youngsub Eom

In classification problems, an imbalance in the number of samples among classes often degrades performance. In particular, when some classes dominate the others through their large number of samples, the trained model performs poorly in identifying the dominated classes. This is common in medical datasets: because severe cases are unusual, there is an imbalance between the number of severe and normal cases of a disease. It is also difficult to identify the grade of medical data precisely because the boundaries between grades are vague. To solve these problems, we propose a new convolutional neural network architecture named Tournament based Ranking CNN, which shows a remarkable performance gain in identifying dominated classes at the cost of a very small accuracy loss in dominating classes. Our approach addresses problems that occur when Ranking CNN, a method that aggregates the outputs of multiple binary neural network models, is applied to medical data. By using a tournament structure in the aggregation method and very deep pretrained binary models, our proposed model reached 68.36% exact match accuracy, while Ranking CNN reached 53.40%, a pretrained ResNet 56.12%, and a CNN with linear regression 57.48%. As a result, our proposed method applies efficiently to cataract grading, which has ordinal labels with an imbalanced number of samples among classes, and can be applied further to medical problems with features and dataset configurations similar to those of cataract. [1807.02657v1]

One-shot Texture Segmentation

Ivan Ustyuzhaninov, Claudio Michaelis, Wieland Brendel, Matthias Bethge

We introduce one-shot texture segmentation: the task of segmenting an input image containing multiple textures given a patch of a reference texture. This task is designed to turn the problem of texture-based perceptual grouping into an objective benchmark. We show that it is straight-forward to generate large synthetic data sets for this task from a relatively small number of natural textures. In particular, this task can be cast as a self-supervised problem thereby alleviating the need for massive amounts of manually annotated data necessary for traditional segmentation tasks. In this paper we introduce and study two concrete data sets: a dense collage of textures (CollTex) and a cluttered texturized Omniglot data set. We show that a baseline model trained on these synthesized data is able to generalize to natural images and videos without further fine-tuning, suggesting that the learned image representations are useful for higher-level vision tasks. [1807.02654v1]

Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery

Seyed Majid Azimi, Eleonora Vig, Reza Bahmanyar, Marco Körner, Peter Reinartz

Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging new DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery. [1807.02700v1]

DeepSource: Point Source Detection using Deep Learning

A. Vafaei Sadr, Etienne. E. Vos, Bruce A. Bassett, Zafiirah Hosenie, N. Oozeer, Michelle Lochner

Point source detection at low signal-to-noise is challenging for astronomical surveys, particularly in radio interferometry images where the noise is correlated. Machine learning is a promising solution, allowing the development of algorithms tailored to specific telescope arrays and science cases. We present DeepSource – a deep learning solution – that uses convolutional neural networks to achieve these goals. DeepSource enhances the Signal-to-Noise Ratio (SNR) of the original map and then uses dynamic blob detection to detect sources. Trained and tested on two sets of 500 simulated 1 deg x 1 deg MeerKAT images with a total of 300,000 sources, DeepSource is essentially perfect in both purity and completeness down to SNR = 4 and outperforms PyBDSF in all metrics. For uniformly-weighted images it achieves a Purity x Completeness (PC) score at SNR = 3 of 0.73, compared to 0.31 for the best PyBDSF model. For natural-weighting we find a smaller improvement of ~40% in the PC score at SNR = 3. If instead we ask where either of the purity or completeness first drop to 90%, we find that DeepSource reaches this value at SNR = 3.6 compared to the 4.3 of PyBDSF (natural-weighting). A key advantage of DeepSource is that it can learn to optimally trade off purity and completeness for any science case under consideration. Our results show that deep learning is a promising approach to point source detection in astronomical images. [1807.02701v1]
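
The Purity x Completeness (PC) score can be computed from detection counts as sketched below. The definitions used here are the standard ones and are assumed rather than taken from the abstract: purity is the fraction of detections that are real sources, and completeness is the fraction of real sources that are detected. The counts in the example are made up for illustration.

```python
def purity(true_pos, false_pos):
    """Fraction of detections that correspond to real sources."""
    return true_pos / (true_pos + false_pos)


def completeness(true_pos, false_neg):
    """Fraction of real sources that were detected."""
    return true_pos / (true_pos + false_neg)


def pc_score(true_pos, false_pos, false_neg):
    """Product of purity and completeness (assumed standard definitions)."""
    return purity(true_pos, false_pos) * completeness(true_pos, false_neg)


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(pc_score(true_pos=80, false_pos=20, false_neg=20))
```

Taking the product penalizes a detector that achieves high purity only by missing many sources, or high completeness only by admitting many false detections.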

Synthetic Sampling for Multi-Class Malignancy Prediction

Matthew Yung, Eli T. Brown, Alexander Rasin, Jacob D. Furst, Daniela S. Raicu

We explore several oversampling techniques for an imbalanced multi-label classification problem, a setting often encountered when developing models for Computer-Aided Diagnosis (CADx) systems. While most CADx systems aim to optimize classifiers for overall accuracy without considering the relative distribution of each class, we look into using synthetic sampling to increase per-class performance when predicting the degree of malignancy. Using low-level image features and a random forest classifier, we show that using synthetic oversampling techniques increases the sensitivity of the minority classes by an average of 7.22 percentage points, with as much as a 19.88 percentage point increase in sensitivity for a particular minority class. Furthermore, the analysis of low-level image feature distributions for the synthetic nodules reveals that these nodules can provide insights on how to preprocess image data for better classification performance or how to supplement the original datasets when more data acquisition is feasible. [1807.02608v1]
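
One widely used form of synthetic oversampling interpolates between a minority-class sample and one of its nearest minority neighbours, in the style of SMOTE. The sketch below illustrates that idea only; the abstract does not name the specific techniques the authors compared, and the function name `smote_like` and its parameters are hypothetical.

```python
import random
from math import dist


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority feature tuples.

    Assumption: SMOTE-style interpolation, shown as one example of
    synthetic oversampling. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples to the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_like(minority, n_new=3):
        print(point)
```

Every synthetic point lies on a segment between two real minority samples, so the new data stay within the region already occupied by that class.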

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi

Reconstruction of the shape and motion of humans from RGB-D is a challenging problem that has received much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes, to complete parts that are invisible due to occlusion. Such a statistical model may still be fit to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower dimensional embeddings of texture and deformation, referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Provided a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are then embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how our method works on real data. [1807.02632v1]

Video Prediction with Appearance and Motion Conditions

Yunseok Jang, Gunhee Kim, Yale Song

Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: How should a model behave when there are multiple correct, equally probable futures? We propose an Appearance-Motion Conditional GAN to address this challenge. We provide appearance and motion information as conditions that specify what the future may look like, reducing the level of uncertainty. Our model consists of a generator, two discriminators taking charge of appearance and motion pathways, and a perceptual ranking module that encourages videos of similar conditions to look similar. To train our model, we develop a novel conditioning scheme that consists of different combinations of appearance and motion conditions. We evaluate our model using facial expression and human action datasets and report favorable results compared to existing methods. [1807.02635v1]

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

Nan Yang, Rui Wang, Jörg Stückler, Daniel Cremers

Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our deep predictions outperform state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep-learning based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while only relying on a single camera. [1807.02570v1]

Generative Probabilistic Novelty Detection with Adversarial Autoencoders

Stanislav Pidhorskyi, Ranya Almohsen, Donald A Adjeroh, Gianfranco Doretto

Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. We assume that training data is available to describe only the inlier distribution. Recent approaches primarily leverage deep encoder-decoder network architectures to compute a reconstruction error that is used to either compute a novelty score or to train a one-class classifier. While we too leverage a novel network of that kind, we take a probabilistic approach and effectively compute how likely it is that a sample was generated by the inlier distribution. We achieve this with two main contributions. First, we make the computation of the novelty probability feasible because we linearize the parameterized manifold capturing the underlying structure of the inlier distribution, and show how the probability factorizes and can be computed with respect to local coordinates of the manifold tangent space. Second, we improve the training of the autoencoder network. An extensive set of results shows that the approach achieves state-of-the-art results on several benchmark datasets. [1807.02588v1]

Parallel Convolutional Networks for Image Recognition via a Discriminator

Shiqi Yang, Gang Peng

In this paper, we introduce a simple but quite effective recognition framework dubbed D-PCN, which aims to enhance the feature extraction ability of CNNs. The framework consists of two parallel CNNs, a discriminator, and an extra classifier that takes integrated features from the parallel networks and gives the final prediction. The discriminator is the core component: it drives the parallel networks to focus on different regions and learn different representations. A corresponding training strategy is introduced to ensure that the discriminator is utilized. We validate D-PCN with several CNN models on the benchmark datasets CIFAR-100 and ImageNet, and D-PCN enhances all models. In particular, it yields state-of-the-art performance on CIFAR-100 compared with related works. We also conduct a visualization experiment on the fine-grained Stanford Dogs dataset to verify our motivation. Additionally, we apply D-PCN to segmentation on PASCAL VOC 2012 and also find an improvement. [1807.02265v2]

Fast and Accurate Point Cloud Registration using Trees of Gaussian Mixtures

Ben Eckart, Kihwan Kim, Jan Kautz

Point cloud registration sits at the core of many important and challenging 3D perception problems including autonomous navigation, SLAM, object/scene recognition, and augmented reality. In this paper, we present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of a hierarchical Gaussian Mixture Model (GMM) representation. Our method constructs a top-down multi-scale representation of point cloud data by recursively running many small-scale data likelihood segmentations in parallel on a GPU. We leverage the resulting representation using a novel PCA-based optimization criterion that adaptively finds the best scale at which to perform data association between spatial subsets of point cloud data. Compared to previous Iterative Closest Point and GMM-based techniques, our tree-based point association algorithm performs data association in logarithmic time while dynamically adjusting the level of detail to best match the complexity and spatial distribution characteristics of local scene geometry. In addition, unlike other GMM methods that restrict covariances to be isotropic, our new PCA-based optimization criterion closely approximates the true MLE solution even when fully anisotropic Gaussian covariances are used. Efficient data association, multi-scale adaptability, and a robust MLE approximation produce an algorithm that is up to an order of magnitude both faster and more accurate than the current state of the art on a wide variety of 3D datasets captured from LiDAR to structured light. [1807.02587v1]

Automated and Interpretable Patient ECG Profiles for Disease Detection, Tracking, and Discovery

Geoffrey H. Tison, Jeffrey Zhang, Francesca N. Delling, Rahul C. Deo

The electrocardiogram (ECG) has been in use for over 100 years and remains the most widely performed diagnostic test to characterize cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, innovations in machine learning algorithms, and availability of large-scale digitized ECG data would enable extending the utility of the ECG beyond its current limitations, while at the same time preserving interpretability, which is fundamental to medical decision-making. We identified 36,186 ECGs from the UCSF database that were 1) in normal sinus rhythm and 2) would enable training of specific models for estimation of cardiac structure or function or detection of disease. We derived a novel model for ECG segmentation using convolutional neural networks (CNN) and Hidden Markov Models (HMM) and evaluated its output by comparing electrical interval estimates to 141,864 measurements from the clinical workflow. We built a 725-element patient-level ECG profile using downsampled segmentation data and trained machine learning models to estimate left ventricular mass, left atrial volume, and mitral annulus e’ velocity, and to detect and track four diseases: pulmonary arterial hypertension (PAH), hypertrophic cardiomyopathy (HCM), cardiac amyloid (CA), and mitral valve prolapse (MVP). CNN-HMM derived ECG segmentation agreed with clinical estimates, with median absolute deviations (MAD) as a fraction of observed value of 0.6% for heart rate and 4% for QT interval. Patient-level ECG profiles enabled quantitative estimates of left ventricular mass and mitral annulus e’ velocity with good discrimination in binary classification models of left ventricular hypertrophy and diastolic function. Models for disease detection ranged from AUROC of 0.94 to 0.77 for MVP. Top-ranked variables for all models included known ECG characteristics along with novel predictors of these traits/diseases. [1807.02569v1]
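The agreement metric quoted above, median absolute deviation as a fraction of the observed value, can be sketched as follows. The exact definition used by the authors is not given in the abstract, so this assumes the median of |estimate - reference| / reference over paired measurements. The function name `relative_mad` and the sample values are hypothetical and not taken from the paper.

```python
from statistics import median


def relative_mad(estimates, references):
    """Median of |estimate - reference| / reference over paired measurements."""
    if len(estimates) != len(references):
        raise ValueError("paired inputs must have equal length")
    return median(abs(e - r) / r for e, r in zip(estimates, references))


# Hypothetical heart-rate values in beats per minute (illustrative only).
model_hr = [60.0, 75.0, 90.0]
clinic_hr = [60.0, 76.0, 88.0]
print(f"{relative_mad(model_hr, clinic_hr):.2%}")
```

Dividing by the reference value makes the deviation unitless, which is what allows heart rate and QT interval to be reported on the same percentage scale.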

Guided Proceduralization: Optimizing Geometry Processing and Grammar Extraction for Architectural Models

Ilke Demir, Daniel G. Aliaga

We describe a guided proceduralization framework that optimizes geometry processing on architectural input models to extract target grammars. We aim to provide efficient artistic workflows by creating procedural representations from existing 3D models, where the procedural expressiveness is controlled by the user. Architectural reconstruction and modeling tasks have been handled either as time-consuming manual processes or as procedural generation that is difficult to control and to influence artistically. We bridge the gap between creation and generation by converting existing manually modeled architecture to procedurally editable parametrized models, and by carrying the guidance to the procedural domain by letting the user define the target procedural representation. Additionally, we propose various applications of such procedural representations, including guided completion of point cloud models, controllable 3D city modeling, and other benefits of procedural modeling. [1807.02578v1]

Christopher Syben, Bernhard Stimpel, Jonathan Lommen, Tobias Würfl, Arnd Dörfler, Andreas Maier

Kezhi Li, John Daniels, Pau Herrero-viñas, Chengyuan Liu, Pantelis Georgiou

Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge

Ari Heljakka, Arno Solin, Juho Kannala

Hosnieh Sattar, Gerard Pons-Moll, Mario Fritz

Xu Liu, Licheng Jiao, Xu Tang, Qigong Sun, Dan Zhang

Fangneng Zhan, Shijian Lu, Chuhui Xue

Nripesh Parajuli, Allen Lu, Kevinminh Ta, John C. Stendahl, Nabil Boutagy, Imran Alkhalil, Melissa Eberle, Geng-Shi Jeng, Maria Zontak, Matthew O'Donnell, Albert J. Sinusas, James S. Duncan

Pengchong Jin, Vivek Rathod, Xiangxin Zhu

Kaiyang Zhou, Tao Xiang, Andrea Cavallaro

Zhuotun Zhu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski, Eric Frank, Alex Sergeev, Jason Yosinski

Di Hu, Feiping Nie, Xuelong Li

Shiqi Yang, Gang Peng

Jeesoo Kim, Jaeyoung Yoo, Jangho Kim, Nojun Kwak

Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li

PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence Estimation

Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

Walid Abdullah Al, Il Dong Yun

Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang

Hongyu Wang, Yong Xia

Asim Iqbal, Asfandyar Sheikh, Theofanis Karayannis

Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess

Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng

Haseeb Shah, Khurram Javed, Faisal Shafait

Bo Li, Tianfu Wu, Lun Zhang, Rufeng Chu

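DeepSource: Point Source Detection Using Deep Learning

A. Vafaei Sadr, Etienne E. Vos, Bruce A. Bassett, Zafiirah Hosenie, N. Oozeer, Michelle Lochner

Matthew Yung, Eli T. Brown, Alexander Rasin, Jacob D. Furst, Daniela S. Raicu

RGB-D reconstruction of human shape and motion is a challenging problem that has received much attention in recent years. Recent full-body reconstruction approaches use a statistical shape model, built upon accurate full-body scans of people in skin-tight clothes, to complete the parts that are invisible due to occlusion. Such a statistical model can still be fitted to RGB-D measurements of a person in loose clothes, but it cannot describe their deformations, such as clothing wrinkles. Observed surfaces can be reconstructed precisely from actual measurements, whereas no cues are available for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use low-dimensional embeddings of texture and deformation, referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Given a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how it works on real data. [1807.02632v1]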

Yunseok Jang, Gunhee Kim, Yale Song

Nan Yang, Rui Wang, Jörg Stückler, Daniel Cremers

Stanislav Pidhorskyi, Ranya Almohsen, Donald A Adjeroh, Gianfranco Doretto

Shiqi Yang, Gang Peng

Ben Eckart, Kihwan Kim, Jan Kautz

Geoffrey H. Tison, Jeffrey Zhang, Francesca N. Delling, Rahul C. Deo

Ilke Demir, Daniel G. Aliaga