Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion
Christopher Syben, Bernhard Stimpel, Jonathan Lommen, Tobias Würfl, Arnd Dörfler, Andreas Maier
In this paper, we derive a neural network architecture based on an analytical formulation of the parallel-to-fan beam conversion problem, following the concept of precision learning. The network allows the unknown operators in this conversion to be learned in a data-driven manner, avoiding interpolation and potential loss of resolution. Integration of known operators results in a small number of trainable parameters that can be estimated from synthetic data only. The concept is evaluated in the context of hybrid MRI/X-ray imaging, where transformation of the parallel-beam MRI projections to fan-beam X-ray projections is required. The proposed method is compared to a traditional rebinning method. The results demonstrate that the proposed method is superior to ray-by-ray interpolation and is able to deliver sharper images using the same number of parallel-beam input projections, which is crucial for interventional applications. We believe that this approach forms a basis for further work uniting deep learning, signal processing, physics, and traditional pattern recognition. [1807.03057v1]
Vulnerability Analysis of Chest X-Ray Image Classification Against Adversarial Attacks
Saeid Asgari Taghanaki, Arkadeep Das, Ghassan Hamarneh
Recently, there have been several successful deep learning approaches for automatically classifying chest X-ray images into different disease categories. However, there is not yet a comprehensive vulnerability analysis of these models against so-called adversarial perturbations/attacks, an analysis that would make deep models more trustworthy in clinical practice. In this paper, we extensively analyze the performance of two state-of-the-art deep classification networks on chest X-ray images. These two networks were attacked by three different categories (nine methods in total) of adversarial methods (both white- and black-box), namely gradient-based, score-based, and decision-based attacks. Furthermore, we modified the pooling operations in the two classification networks to measure their sensitivity to different attacks, on the specific task of chest X-ray classification. [1807.02905v1]
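As a concrete instance of the gradient-based category, consider the fast gradient sign method (FGSM), a standard white-box attack (whether it is among the nine methods evaluated here is not stated in the abstract): given an input $x$ with label $y$ and network loss $J(\theta, x, y)$, the adversarial example is

$$x_{\text{adv}} = x + \epsilon \cdot \mathrm{sign}\!\left(\nabla_x J(\theta, x, y)\right),$$

where $\epsilon$ bounds the size of the perturbation, kept small enough to be imperceptible in the radiograph.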
Convolutional Recurrent Neural Networks for Blood Glucose Prediction
Kezhi Li, John Daniels, Pau Herrero-viñas, Chengyuan Liu, Pantelis Georgiou
The main purpose of the artificial pancreas (AP), or of any diabetes therapy for subjects with type 1 diabetes (T1D), is to maintain the subject’s plasma glucose level within the euglycemic range, i.e., below the threshold of hyperglycemia and above the threshold of hypoglycemia. The development of modern continuous glucose monitors (CGMs) makes this feasible: continuous monitoring enables people to take action before hypo/hyperglycemia episodes occur. For this reason, accurate blood glucose (BG) prediction is essential; it should raise alarms before real hypo/hyperglycemia scenarios and stay silent during non-hypo/non-hyperglycemia episodes. Data-driven approaches, in particular deep neural network techniques, have recently been widely used in statistical modelling in healthcare and medical research. In this paper, glucose prediction is treated as a probabilistic generative problem, and a hybrid deep neural network is proposed to combine the advantages of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Specifically, a multi-layer CNN is implemented as a feature extraction component, followed by a multi-layer modified long short-term memory (LSTM) model to capture the probabilistic correlations between the future BG level and the historical BG levels, meal information, and insulin. The model is adapted to the individual subject with T1D, and dropout layers are leveraged to avoid overfitting. The model is easily implemented in TensorFlow and can be embedded in portable devices with limited computational resources. Experiments verify and evaluate the effectiveness of the proposed method using simulated and clinical data. [1807.03043v1]
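A minimal PyTorch sketch of the CNN-plus-LSTM idea described above; the layer sizes, the 30-step history window, and the three input channels (glucose history, meal information, insulin) are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of a CNN feature extractor followed by an LSTM for BG prediction.
# All hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn

class GlucoseCNNLSTM(nn.Module):
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        # Multi-layer 1D CNN extracts local temporal features from the history.
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        # LSTM captures longer-range correlations; dropout fights overfitting.
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, 1)  # predicted future BG level

    def forward(self, x):             # x: (batch, channels, time)
        f = self.features(x)          # (batch, 64, time)
        f = f.transpose(1, 2)         # (batch, time, 64) for the LSTM
        out, _ = self.lstm(f)
        return self.head(out[:, -1])  # predict from the last time step

pred = GlucoseCNNLSTM()(torch.randn(8, 3, 30))  # 8 subjects, 30-step history
```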
Approximate k-space models and Deep Learning for fast photoacoustic reconstruction
Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge
We present a framework for accelerated iterative reconstructions using a fast and approximate forward model based on k-space methods for photoacoustic tomography. The approximate model introduces aliasing artefacts in the gradient information for the iterative reconstruction, but these artefacts are highly structured, so we can train a CNN that uses the approximate information to perform an iterative reconstruction. We show the feasibility of the method for human in-vivo measurements in a limited-view geometry. The proposed method produces results superior to total variation reconstructions with a speed-up of 32 times. [1807.03191v1]
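One plausible form of such a learned iterative scheme, written in our own notation as a sketch (the paper's exact parameterization may differ): with the fast approximate forward operator $\tilde{A}$ and measured data $y$, iterate

$$x_{k+1} = G_{\theta_k}\!\left(x_k,\; \tilde{A}^{*}(\tilde{A}x_k - y)\right),$$

where each $G_{\theta_k}$ is a CNN trained to exploit, and correct for, the structured aliasing artefacts that $\tilde{A}$ introduces into the gradient term.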
Image Restoration Using Conditional Random Fields and Scale Mixtures of Gaussians
Milad Niknejad, Jose M. Bioucas-Dias, Mario A. T. Figueiredo
This paper proposes a general framework for internal patch-based image restoration based on Conditional Random Fields (CRFs). Unlike related models based on Markov Random Fields (MRFs), our approach explicitly formulates the posterior distribution for the entire image. The potential functions are taken as proportional to the product of a likelihood and a prior for each patch. By assuming identical parameters for similar patches, our approach can be classified as a model-based non-local method. For the prior term in the potential function of the CRF model, multivariate Gaussians and multivariate scale mixtures of Gaussians are considered, with the latter being a novel prior for image patches. Our results show that the proposed approach outperforms methods based on Gaussian mixture models for image denoising and state-of-the-art methods for image interpolation/inpainting. [1807.03027v1]
Pioneer Networks: Progressively Growing Generative Autoencoder
Ari Heljakka, Arno Solin, Juho Kannala
We introduce a novel generative autoencoder network model that learns to encode and reconstruct images with high quality and resolution, and supports smooth random sampling from the latent space of the encoder. Generative adversarial networks (GANs) are known for their ability to simulate random high-quality images, but they cannot reconstruct existing images. Previous works have attempted to extend GANs to support such inference but, so far, have not delivered satisfactory high-quality results. Instead, we propose the Progressively Growing Generative Autoencoder (PIONEER) network which achieves high-quality reconstruction with $128{\times}128$ images without requiring a GAN discriminator. We merge recent techniques for progressively building up the parts of the network with the recently introduced adversarial encoder-generator network. The ability to reconstruct input images is crucial in many real-world applications, and allows for precise intelligent manipulation of existing images. We show promising results in image synthesis and inference, with state-of-the-art results in CelebA inference tasks. [1807.03026v1]
External Patch-Based Image Restoration Using Importance Sampling
Milad Niknejad, Jose M. Bioucas-Dias, Mario A. T. Figueiredo
This paper introduces a new approach to patch-based image restoration based on external datasets and importance sampling. The Minimum Mean Squared Error (MMSE) estimate of the image patches, the computation of which requires solving a multidimensional (typically intractable) integral, is approximated using samples from an external dataset. The new method, which can be interpreted as a generalization of the external non-local means (NLM), uses self-normalized importance sampling to efficiently approximate the MMSE estimates. The use of self-normalized importance sampling endows the proposed method with great flexibility, namely regarding the statistical properties of the measurement noise. The effectiveness of the proposed method is shown in a series of experiments using both generic large-scale and class-specific external datasets. [1807.03018v1]
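In our notation, the self-normalized importance-sampling approximation at the heart of the method reads as follows (a sketch; the paper's exact weighting may differ): given patches $x_1,\dots,x_N$ from the external dataset, treated as samples from a proposal $q$, and a noisy patch $y$,

$$\hat{x}_{\text{MMSE}} = \mathbb{E}[x \mid y] \;\approx\; \frac{\sum_{i=1}^{N} w_i\, x_i}{\sum_{i=1}^{N} w_i}, \qquad w_i \propto \frac{p(y \mid x_i)\, p(x_i)}{q(x_i)}.$$

Because the likelihood $p(y \mid x_i)$ enters only through the weights, the noise model can be swapped freely, which is the source of the flexibility noted above.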
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
Hosnieh Sattar, Gerard Pons-Moll, Mario Fritz
To study the correlation between clothing garments and body shape, we collected a new dataset (Fashion Takes Shape), which includes images of users with clothing category annotations. We employ our multi-photo approach to estimate the body shape of each user and build a conditional model of clothing categories given body shape. We demonstrate that, in real-world data, clothing categories and body shapes are correlated, and show that our multi-photo approach leads to a better predictive model for clothing categories than models based on single-view shape estimates or manually annotated body types. We see our method as a first step towards the large-scale understanding of clothing preferences from body shape. [1807.03235v1]
Polarimetric Convolutional Network for PolSAR Image Classification
Xu Liu, Licheng Jiao, Xu Tang, Qigong Sun, Dan Zhang
Approaches for analyzing the polarimetric scattering matrix of polarimetric synthetic aperture radar (PolSAR) data have always been at the core of PolSAR image classification. Generally, the polarization coherency matrix and the covariance matrix obtained from the polarimetric scattering matrix capture only a limited portion of the polarimetric information. To solve this problem, we propose a sparse scattering coding scheme that operates on the polarimetric scattering matrix and yields a nearly complete feature representation, fully preserving the polarimetric information of the scattering matrix. In view of this encoding, we also design a corresponding convolutional-network-based classification algorithm that exploits this feature. Building on sparse scattering coding and convolutional neural networks, the polarimetric convolutional network is proposed to classify PolSAR images by making full use of the polarimetric information. We perform experiments on PolSAR images acquired by AIRSAR and RADARSAT-2 to verify the proposed method. The experimental results demonstrate that the proposed method achieves better results and has great potential for PolSAR data classification. Source code for sparse scattering coding is available at https://github.com/liuxuvip/Polarimetric-Scattering-Coding. [1807.02975v1]
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes
Fangneng Zhan, Shijian Lu, Chuhui Xue
The requirement of large amounts of annotated images has become a grand challenge when training deep neural network models for various visual detection and recognition tasks. This paper presents a novel image synthesis technique that aims to generate a large number of annotated scene text images for training accurate and robust scene text detection and recognition models. The proposed technique consists of three innovative designs. First, it realizes “semantically coherent” synthesis by embedding texts at semantically sensible regions within the background image, where the semantic coherence is achieved by leveraging the semantic annotations of objects and image regions created in prior semantic segmentation research. Second, it exploits visual saliency to determine the embedding locations within each semantically sensible region, which coincides with the fact that texts are often placed around homogeneous regions in scenes for better visibility. Third, it designs an adaptive text appearance model that determines the color and brightness of embedded texts by adaptively learning from the features of real scene text images. The proposed technique has been evaluated over five public datasets, and the experiments show its superior performance in training accurate and robust scene text detection and recognition models. [1807.03021v1]
Nripesh Parajuli, Allen Lu, Kevinminh Ta, John C. Stendahl, Nabil Boutagy, Imran Alkhalil, Melissa Eberle, Geng-Shi Jeng, Maria Zontak, Matthew ODonnell, Albert J. Sinusas, James S. Duncan
The accurate quantification of left ventricular (LV) deformation/strain shows significant promise for quantitatively assessing cardiac function for use in diagnosis and therapy planning (Jasaityte et al., 2013). However, accurate estimation of the displacement of myocardial tissue, and hence of LV strain, has been challenging due to a variety of issues, including those related to deriving tracking tokens from images and following tissue locations over the entire cardiac cycle. In this work, we propose a point matching scheme where correspondences are modeled as flow through a graphical network. Myocardial surface points are set up as nodes in the network, and edges define neighborhood relationships temporally. The novelty lies in the constraints imposed on the matching scheme, which render the correspondences one-to-one through the entire cardiac cycle, not just between two consecutive frames. The constraints also encourage motion to be cyclic, which is an important characteristic of LV motion. We validate our method by applying it to the estimation of quantitative LV displacement and strain using 8 synthetic and 8 open-chested canine 4D echocardiographic image sequences, the latter with sonomicrometric crystals implanted on the LV wall. We achieve excellent tracking accuracy on the synthetic dataset and observe a good correlation with crystal-based strains on the in-vivo data. [1807.02951v1]
Pooling Pyramid Network for Object Detection
Pengchong Jin, Vivek Rathod, Xiangxin Zhu
We’d like to share a simple tweak of the Single Shot Multibox Detector (SSD) family of detectors that is effective in reducing model size while maintaining the same quality. We share box predictors across all scales, and replace convolution between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees the training data over all scales. Since we reduce the number of predictors to one and trim all convolutions between them, model size is significantly smaller. We empirically show that these changes do not hurt model quality compared to vanilla SSD. [1807.03284v1]
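A minimal PyTorch sketch of the two changes described above: a single box predictor shared across scales, and max pooling in place of convolutions between scales. Channel counts, the number of scales, and the predictor output shape are illustrative assumptions:

```python
# Sketch of a pooling-based feature pyramid with one shared predictor.
import torch
import torch.nn as nn

class PoolingPyramidHead(nn.Module):
    def __init__(self, channels=256, num_scales=6, preds_per_cell=24):
        super().__init__()
        self.num_scales = num_scales
        # Single predictor shared across all scales -> consistent score
        # calibration and far fewer parameters than per-scale predictors.
        self.predictor = nn.Conv2d(channels, preds_per_cell, kernel_size=3, padding=1)

    def forward(self, base_feature):           # base_feature: finest-scale map
        outputs, f = [], base_feature
        for _ in range(self.num_scales):
            outputs.append(self.predictor(f))  # same weights at every scale
            f = nn.functional.max_pool2d(f, 2) # parameter-free downscaling
        return outputs

outs = PoolingPyramidHead()(torch.randn(1, 256, 64, 64))
```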
Human Activity Recognition in RGB-D Videos by Dynamic Images
Snehasis Mukherjee, Leburu Anvitha, T. Mohana Lahari
Human activity recognition in RGB-D videos has been an active research topic during the last decade. However, no efforts have been found in the literature on recognizing human activity in RGB-D videos in which several performers act simultaneously. In this paper we introduce such a challenging dataset with several performers performing activities. We also present a novel method for recognizing human activities in such videos. The proposed method captures the motion information of the whole video by producing a dynamic image corresponding to the input video. We use two parallel ResNext-101 networks to produce the dynamic images for the RGB video and the depth video separately. The dynamic images contain only the motion information, and hence the unnecessary background information is eliminated. We send the two dynamic images extracted from the RGB and depth videos through a fully connected neural network layer. The proposed dynamic image reduces the complexity of the recognition process by extracting a sparse matrix from a video, while maintaining the motion information required for recognizing the activity. The proposed method has been tested on the MSR Action 3D dataset and shows performance comparable to the state-of-the-art. We also apply the proposed method to our own dataset, where it outperforms the state-of-the-art approaches. [1807.02947v1]
Video Summarisation by Classification with Deep Reinforcement Learning
Kaiyang Zhou, Tao Xiang, Andrea Cavallaro
Most existing video summarisation methods are based on either supervised or unsupervised learning. In this paper, we propose a reinforcement learning-based weakly supervised method that exploits easy-to-obtain, video-level category labels and encourages summaries to contain category-related information and maintain category recognisability. Specifically, we formulate video summarisation as a sequential decision-making process and train a summarisation network with deep Q-learning (DQSN). A companion classification network is also trained to provide rewards for training the DQSN. With the classification network, we develop a global recognisability reward based on the classification result. Critically, a novel dense ranking-based reward is also proposed to cope with the temporally delayed and sparse reward problems in long-sequence reinforcement learning. Extensive experiments on two benchmark datasets show that the proposed approach achieves state-of-the-art performance. [1807.03089v1]
Multi-Scale Coarse-to-Fine Segmentation for Screening Pancreatic Ductal Adenocarcinoma
Zhuotun Zhu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille
This paper proposes an intuitive approach to finding pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, by checking abdominal CT scans. Our idea is named segmentation-for-classification (S4C): it classifies a volume by checking whether at least a sufficient number of voxels are segmented as tumor. To deal with tumors at different scales, we train volumetric segmentation networks with multi-scale inputs and test them in a coarse-to-fine flowchart. A post-processing module is used to filter out outliers and reduce false alarms. We perform a case study on our dataset containing 439 CT scans, in which 136 cases were diagnosed with PDAC and 303 are normal. Our approach reports a sensitivity of 94.1% at a specificity of 98.5%, with an average tumor segmentation accuracy of 56.46% over all PDAC cases. [1807.02941v1]
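The volume-level decision rule itself is simple enough to state as a sketch; the probability and voxel-count thresholds below are placeholders to be tuned on validation data, not the paper's operating point:

```python
# Sketch of the segmentation-for-classification (S4C) decision rule:
# flag a scan as PDAC-positive when the segmentation network marks at
# least a minimum number of voxels as tumor.
import numpy as np

def s4c_classify(tumor_probability_volume, prob_thresh=0.5, min_voxels=50):
    tumor_mask = tumor_probability_volume > prob_thresh  # voxelwise decision
    return tumor_mask.sum() >= min_voxels                # volume-level label

is_pdac = s4c_classify(np.random.rand(64, 64, 64))
```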
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either perfect translation invariance or varying degrees of translation dependence, as required by the task. CoordConv solves the coordinate transform problem with perfect generalization, while being 150 times faster and using 10–100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse, as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IoU when using CoordConv, and in the RL domain, agents playing Atari games benefit significantly from the use of CoordConv layers. [1807.03247v1]
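A minimal PyTorch sketch of the CoordConv mechanism as described: two extra channels carrying normalized pixel coordinates are concatenated to the input before an ordinary convolution (the normalization to $[-1, 1]$ is our choice):

```python
# Sketch of a CoordConv layer: ordinary convolution over the input plus
# two appended coordinate channels.
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        # Two extra input channels carry the coordinates.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(2, 3, 32, 32))
```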
Deep Co-Clustering for Unsupervised Audiovisual Learning
Di Hu, Feiping Nie, Xuelong Li
Birds twitter as we see them, running cars are accompanied by noise, people talk face to face, and so on. These natural audiovisual correspondences provide opportunities to explore and understand the outside world. However, mixtures of multiple objects and sounds make efficient matching intractable in unconstrained environments. To address this problem, we propose to thoroughly mine audio and visual components and perform elaborate correspondence learning between them. Concretely, we propose a novel unsupervised audiovisual learning model, named Deep Co-Clustering (DCC), that synchronously performs sets of clusterings over multimodal vectors of convolutional maps in different shared spaces to capture multiple audiovisual correspondences. This integrated multimodal clustering network can be trained effectively with a max-margin loss in an end-to-end fashion. Extensive experiments on feature evaluation and audiovisual tasks are performed. The results demonstrate that DCC learns effective unimodal representations, with which the classifier can even outperform humans. Further, DCC shows noticeable performance in the tasks of sound localization, multisource detection, and audiovisual understanding. [1807.03094v1]
Attention to Refine through Multi-Scales for Semantic Segmentation
Shiqi Yang, Gang Peng
This paper proposes a novel attention model for semantic segmentation that aggregates multi-scale and context features to refine predictions. Specifically, the skeleton convolutional neural network framework takes inputs at multiple different scales, so the CNN obtains representations at different scales. The proposed attention model handles the features from the different scale streams and integrates them: a location attention branch learns to softly weight the multi-scale features at each pixel location, and a parallel recalibrating branch recalibrates the score map per class. We achieve quite competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related works. [1807.02917v1]
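A minimal PyTorch sketch of per-pixel soft weighting of multi-scale features; the single-convolution attention subnetwork here is an illustrative stand-in for the paper's attention branch, not its exact architecture:

```python
# Sketch of pixel-wise softmax attention over per-scale score maps.
import torch
import torch.nn as nn

class ScaleAttentionFusion(nn.Module):
    def __init__(self, channels, num_scales):
        super().__init__()
        # Predicts one attention map per scale from the concatenated streams.
        self.attn = nn.Conv2d(channels * num_scales, num_scales, kernel_size=3, padding=1)

    def forward(self, score_maps):                # list of (B, C, H, W), one per scale
        stacked = torch.stack(score_maps, dim=1)  # (B, S, C, H, W)
        weights = torch.softmax(self.attn(torch.cat(score_maps, dim=1)), dim=1)
        return (stacked * weights.unsqueeze(2)).sum(dim=1)  # (B, C, H, W)

fuse = ScaleAttentionFusion(channels=21, num_scales=3)
fused = fuse([torch.randn(1, 21, 64, 64) for _ in range(3)])
```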
Generating objects going well with the surroundings
Jeesoo Kim, Jaeyoung Yoo, Jangho Kim, Nojun Kwak
Since generative adversarial networks made a breakthrough in the image generation problem, many of their applications have been studied, such as image restoration, style transfer, and image completion. However, there has been little research on generating objects in uncontrolled real-world environments. In this paper, we propose a novel approach to image generation in real-world scenes. The overall architecture consists of two different networks, one of which completes the shape of the generated object while the other paints the context onto it. Using a subnetwork proposed in prior work on image completion, our model generates the shape of an object. Unlike the approaches used in the image completion problem, details of the trained objects are encoded into a latent variable by an additional subnetwork, resulting in better quality of the generated objects. We evaluated our method using the KITTI and Cityscapes datasets, which are widely used for object detection and image segmentation problems. The adequacy of the images generated by the proposed method has also been evaluated using a widely utilized object detection algorithm. [1807.02925v1]
Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector
Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li
Weakly supervised temporal action detection is a Herculean task in understanding untrimmed videos, since no supervisory signal except the video-level category label is available on training data. Under the supervision of category labels, weakly supervised detectors are usually built upon classifiers. However, there is an inherent contradiction between classifier and detector; i.e., a classifier in pursuit of high classification performance prefers top-level discriminative video clips that are extremely fragmentary, whereas a detector is obliged to discover the whole action instance without missing any relevant snippet. To reconcile this contradiction, we train a detector by driving a series of classifiers to progressively find new actionness clips, via step-by-step erasion from a complete video. During the test phase, all we need to do is collect detection results from the one-by-one trained classifiers at various erasing steps. To assist the collection process, a fully connected conditional random field is established to refine the temporal localization outputs. We evaluate our approach on two prevailing datasets, THUMOS’14 and ActivityNet. The experiments show that our detector advances the state-of-the-art in weakly supervised temporal action detection and is even comparable with quite a few strongly supervised methods. [1807.02929v1]
PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence Estimation
Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn
This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates pixel-varying affine transformation fields across images. To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where dense affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within deep networks. PARN estimates residual affine transformations at each level and composes them to estimate final affine transformations. Furthermore, to overcome the limitations of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging the correspondence consistency across images. Our method is fully learnable in an end-to-end manner and does not require quantizing infinite continuous affine transformation fields. To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks. Experimental results demonstrate that PARN outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks. [1807.02939v1]
Walid Abdullah Al, Il Dong Yun
Exploiting the idea of long-term cumulative return, reinforcement learning has shown remarkable performance in various fields. We propose a formulation of landmark localization in 3D medical images as a reinforcement learning problem. Whereas value-based methods have been widely used to solve similar problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. Successful behavior learning is challenging in large state and/or action spaces, requiring many trials. We introduce partial policy-based reinforcement learning, which enables solving the large localization problem by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing its own independent critic. The proposed policy reconstruction from the partial policies ensures robust and efficient localization, with the sub-agents solving simple binary decision problems in their corresponding partial action spaces. The proposed reinforcement learning requires a small number of trials to learn the optimal behavior compared with the original behavior learning scheme. [1807.02908v1]
Automatic multi-objective based feature selection for classification
Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang
Accurately classifying the malignancy of lesions detected in a screening scan plays a critical role in reducing false positives. By extracting and analyzing a large number of quantitative image features, radiomics holds great potential to differentiate malignant tumors from benign ones. Since not all radiomic features contribute to an effective classification model, selecting an optimal feature subset is critical. This work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers sensitivity and specificity simultaneously as the objective functions during feature selection. In MO-FS, we developed a modified entropy-based termination criterion (METC) to stop the algorithm automatically rather than relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning using the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, an adaptive mutation operation was developed to generate the mutation probability in MO-FS automatically. MO-FS was evaluated for classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Compared with other commonly used feature selection methods, the experimental results for both lung nodule and breast lesion malignancy classification demonstrate that the feature set selected by MO-FS achieves better classification performance. [1807.03236v1]
ChestNet: A Deep Neural Network for Classification of Thoracic Diseases on Chest Radiography
Hongyu Wang, Yong Xia
Computer-aided techniques may lead to more accurate and more accessible diagnosis of thorax diseases on chest radiography. Despite the success of deep learning-based solutions, this task remains a major challenge in smart healthcare, since it is intrinsically a weakly supervised learning problem. In this paper, we incorporate the attention mechanism into a deep convolutional neural network, and thus propose the ChestNet model for effective diagnosis of thorax diseases on chest radiography. This model consists of two branches: a classification branch serves as a uniform feature extraction-classification network to free users from troublesome handcrafted feature extraction, and an attention branch exploits the correlation between class labels and the locations of pathological abnormalities, allowing the model to concentrate adaptively on pathologically abnormal regions. We evaluated our model against three state-of-the-art deep learning models on the ChestX-ray14 dataset using the official patient-wise split. The results indicate that our model outperforms other methods, which use no extra training data, in diagnosing 14 thorax diseases on chest radiography. [1807.03058v1]
Exploring Brain-wide Development of Inhibition through Deep Learning
Asim Iqbal, Asfandyar Sheikh, Theofanis Karayannis
We introduce here a fully automated convolutional neural network-based method for brain image processing to Detect Neurons in different brain Regions during Development (DeNeRD). Our method takes a developing mouse brain as input and i) registers the brain sections against a developing mouse reference atlas, ii) detects various types of neurons, and iii) quantifies the neural density in many unique brain regions at different postnatal (P) time points. Our method is invariant to the shape, size and expression of neurons and by using DeNeRD, we compare the brain-wide neural density of all GABAergic neurons in developing brains of ages P4, P14 and P56. We discover and report 6 different clusters of regions in the mouse brain in which GABAergic neurons develop in a differential manner from early age (P4) to adulthood (P56). These clusters reveal key steps of GABAergic cell development that seem to track with the functional development of diverse brain regions as the mouse transitions from a passive receiver of sensory information (<P14) to an active seeker (>P14). [1807.03238v1]
Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence Images
Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess
Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even the finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects. In this work, we investigate two approaches for automatic detection of such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on hand-crafted features that are classified with a Support Vector Machine (SVM). To obtain strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep Convolutional Neural Network (CNN) that runs on a Graphics Processing Unit (GPU). Both approaches are trained on 1,968 cells extracted from high-resolution EL intensity images of mono- and polycrystalline PV modules. The CNN is more accurate, reaching an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible. [1807.02894v1]
Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images
Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng
Automatic lesion segmentation in dermoscopy images is an essential step for computer-aided diagnosis of melanoma. Dermoscopy images exhibit rotational and reflectional symmetry; however, this geometric property has not been encoded in state-of-the-art convolutional neural network based skin lesion segmentation methods. In this paper, we present a deeply supervised rotation equivariant network for skin lesion segmentation by extending the recent group rotation equivariant network~\cite{cohen2016group}. Specifically, we propose the G-upsampling and G-projection operations to adapt the rotation equivariant classification network to our skin lesion segmentation problem. To further increase the performance, we integrate the deep supervision scheme into our proposed rotation equivariant segmentation architecture. The whole framework is equivariant to input transformations, including rotation and reflection, which improves the network efficiency and thus contributes to the segmentation performance. We extensively evaluate our method on the ISIC 2017 skin lesion challenge dataset. The experimental results show that our rotation equivariant networks consistently outperform their regular counterparts of the same model complexity under different experimental settings. Our best model achieves 77.23\% (JA) on the test dataset, outperforming the state-of-the-art challenge methods and further demonstrating the effectiveness of our proposed deeply supervised rotation equivariant segmentation network. [1807.02804v1]
Distillation Techniques for Pseudo-rehearsal Based Incremental Learning
Haseeb Shah, Khurram Javed, Faisal Shafait
The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget knowledge about old tasks when trained on incrementally arriving data, a phenomenon called catastrophic forgetting. We discuss the biases in current Generative Adversarial Network (GAN) based approaches that learn the classifier by knowledge distillation from previously trained classifiers. These biases cause the trained classifier to perform poorly. We propose an approach to remove these biases by distilling knowledge from the classifier of an AC-GAN. Experiments on MNIST and CIFAR10 show that this method is comparable to current state-of-the-art rehearsal-based approaches. The code for this paper is available at this $\href{https://github.com/haseebs/Pseudo-rehearsal-Incremental-Learning}{link}$. [1807.02799v1]
Auto-Context R-CNN
Bo Li, Tianfu Wu, Lun Zhang, Rufeng Chu
Region-based convolutional neural networks (R-CNN)~\cite{fast_rcnn,faster_rcnn,mask_rcnn} have largely dominated object detection. Operators defined on RoIs (Regions of Interest), such as RoIPooling~\cite{fast_rcnn} and RoIAlign~\cite{mask_rcnn}, play an important role in R-CNNs. They all utilize only information inside RoIs for RoI prediction, even with their recent deformable extensions~\cite{deformable_cnn}. Although surrounding context is well known for its importance in object detection, it has not yet been integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work~\cite{auto_context} and the multi-class object layout work~\cite{nms_context}, this paper presents a generic context-mining RoI operator (i.e., \textit{RoICtxMining}) seamlessly integrated into R-CNNs; the resulting object detection system is termed \textbf{Auto-Context R-CNN} and is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a $3\times 3$ layout to mine contextual information adaptively in the $8$ surrounding context regions on-the-fly. Within each of the $8$ context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling / RoIAlign features are concatenated with those of the object-RoI for final prediction. \textit{The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promising robustness against adversarial attacks without being adversarially trained.} In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including $6.9\%$ mAP improvement over the R-FCN~\cite{rfcn} method on the COCO \textit{test-dev} dataset and first place on both KITTI pedestrian and cyclist detection as of this submission). [1807.02842v1]
Learning The Sequential Temporal Information with Recurrent Neural Networks
Pushparaja Murugan
Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time series. Unlike a traditional feed-forward network, a recurrent network has an inherent feedback loop that allows it to store temporal context information and pass state information along the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training a fully connected RNN and managing the gradient flow are complicated processes. Many studies have been carried out to address these limitations. This article intends to provide brief details about recurrent neurons, their variants, and tips & tricks for training fully recurrent neural networks. This review was carried out as part of the ‘Multiple Object Tracking’ module of our IPO studio software. [1807.02857v1]
Semi-parametric Image Inpainting
Karim Iskakov
This paper introduces a semi-parametric approach to image inpainting for irregular holes. The nonparametric part consists of an external image database. At test time, the database is used to retrieve a supplementary image similar to the masked input picture, which serves as auxiliary information for the deep neural network. Furthermore, we propose a novel method of generating masks with irregular holes and present a public dataset with such masks. Experiments on the CelebA-HQ dataset show that our semi-parametric method yields more realistic results than previous approaches, which is confirmed by a user study. [1807.02855v1]
Toufiq Parag, Daniel Berger, Lee Kamentsky, Benedikt Staffler, Donlai Wei, Moritz Helmstaedter, Jeff W. Lichtman, Hanspeter Pfister
Synaptic connectivity detection is a critical task for neural reconstruction from Electron Microscopy (EM) data. Most of the existing algorithms for synapse detection do not identify the cleft location and the direction of connectivity simultaneously. The few methods that compute the direction along with the contact location have only been demonstrated to work on either dyadic synapses (most common in the vertebrate brain) or polyadic synapses (found in the fruit fly brain), but not on both types. In this paper, we present an algorithm to automatically predict the location as well as the direction of both dyadic and polyadic synapses. The proposed algorithm first generates candidate synaptic connections from voxelwise predictions of signed proximity generated by a 3D U-Net. A second 3D CNN then prunes the set of candidates to produce the final detection of cleft and connectivity orientation. Experimental results demonstrate that the proposed method outperforms the existing methods for determining synapses in both rodent and fruit fly brains. [1807.02739v1]
Hierarchical Stochastic Graphlet Embedding for Graph-based Pattern Recognition
Anjan Dutta, Pau Riba, Josep Lladós, Alicia Fornés
Despite being very successful within the pattern recognition and machine learning communities, graph-based methods are often unusable with many machine learning tools, owing to the incompatibility of most mathematical operations with the graph domain. Graph embedding has been proposed as a way to tackle these difficulties: it maps graphs to a vector space and makes standard machine learning techniques applicable to them. However, it is well known that graph embedding techniques usually suffer from a loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node at the next level of the hierarchy. Once this hierarchical structure is constructed, we consider the various configurations of its parts and use stochastic graphlet embedding (SGE) to map them into a vector space. Broadly speaking, SGE produces a distribution of uniformly sampled low- to high-order graphlets as a way to embed graphs into a vector space. The coarse-to-fine structure of the graph hierarchy and the statistics obtained from the distribution of low- to high-order stochastic graphlets complement each other, encoding important structural information in varied contexts. Together, these two techniques substantially mitigate the information loss usually involved in graph embedding, and it is no surprise that we obtain a more robust vector space embedding of graphs. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods. [1807.02839v1]
Data-driven Upsampling of Point Clouds
Wentai Zhang, Haoliang Jiang, Zhangsihao Yang, Soji Yamakawa, Kenji Shimada, Levent Burak Kara
High quality upsampling of sparse 3D point clouds is critically useful for a wide range of geometric operations such as reconstruction, rendering, meshing, and analysis. In this paper, we propose a data-driven algorithm that enables an upsampling of 3D point clouds without the need for hard-coded rules. Our approach uses a deep network with Chamfer distance as the loss function, capable of learning the latent features in point clouds belonging to different object categories. We evaluate our algorithm across different amplification factors, with upsampling learned and performed on objects belonging to the same category as well as different categories. We also explore the desirable characteristics of input point clouds as a function of the distribution of the point samples. Finally, we demonstrate the performance of our algorithm in single-category training versus multi-category training scenarios. The final proposed model is compared against a baseline, optimization-based upsampling method. Results indicate that our algorithm is capable of generating more uniform and accurate upsamplings. [1807.02740v1]
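For reference, a minimal numpy sketch of the (squared) Chamfer distance used as the loss; whether the two terms are averaged or summed is our assumption, and practical implementations run batched on GPU:

```python
# Sketch of the symmetric Chamfer distance between two point clouds:
# each point is matched to its nearest neighbor in the other cloud.
import numpy as np

def chamfer_distance(p, q):            # p: (N, 3), q: (M, 3)
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

loss = chamfer_distance(np.random.rand(128, 3), np.random.rand(512, 3))
```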
Revisiting Distillation and Incremental Classifier Learning
Khurram Javed, Faisal Shafait
One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously. Any attempt at learning new tasks incrementally causes them to completely forget previous tasks. This lack of ability to learn incrementally, called catastrophic forgetting, is considered a major hurdle in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method for incremental learning and demonstrate that the good performance of the system is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we recognize a key limitation of knowledge distillation: it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that is able to successfully remove this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning. [1807.02802v1]
Spatio-Temporal Instance Learning: Action Tubes from Class Supervision
Pascal Mettes, Cees G. M. Snoek
The goal of this paper is spatio-temporal localization of human actions from their class labels only. The state-of-the-art casts the problem as Multiple Instance Learning, where the instances are a priori computed action proposals. Rather than disconnecting the localization from the learning, we propose a variant of Multiple Instance Learning that integrates the spatio-temporal localization during the learning. We make three contributions. First, we define model assumptions tailored to actions and propose a latent instance learning objective allowing for optimization at the box level. Second, we propose a spatio-temporal box linking algorithm, exploiting box proposals from off-the-shelf person detectors, suitable for weakly supervised learning. Third, we introduce tube- and video-level refinements at inference time to integrate long-term spatio-temporal action characteristics. Our experiments on three video datasets show the benefits of our contributions as well as competitive results compared to state-of-the-art alternatives that localize actions from their class labels only. Finally, our algorithm enables incorporating point and box supervision, allowing us to benchmark, mix, and balance action localization performance versus annotation time. [1807.02800v1]
Image Super-Resolution Using Very Deep Residual Channel Attention Networks
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form a very deep network, which consists of several residual groups with long skip connections. Each residual group contains residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual quality than state-of-the-art methods. [1807.02758v1]
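A minimal PyTorch sketch of the channel attention mechanism described above, which squeezes global channel statistics and rescales each channel accordingly; the reduction ratio of 16 is a common choice, assumed here for illustration:

```python
# Sketch of channel attention: global average pooling, a small bottleneck,
# and per-channel rescaling via a sigmoid gate.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # (B, C, 1, 1) statistics
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # adaptively rescale channel-wise features

x = torch.randn(1, 64, 48, 48)
y = x + ChannelAttention(64)(x)  # used inside a residual block with a short skip
```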
Real-time stereo vision-based lane detection system
Rui Fan, Naim Dahnoun
The detection of multiple curved lane markings on a non-flat road surface is still a challenging task for automotive applications. To improve on this, depth information can be used to greatly enhance the robustness of lane detection systems. The system proposed in this paper builds on our previous work, where the dense vanishing point Vp is estimated globally to assist the detection of multiple curved lane markings. However, outliers in the optimal solution may severely affect the accuracy of the least squares fitting when estimating Vp. Therefore, in this paper we use Random Sample Consensus to update the inliers and outliers iteratively until the fraction of inliers among all candidates exceeds a preset threshold. This significantly helps the system to cope with suddenly changing conditions. Furthermore, we propose a novel lane position validation approach which provides a piecewise weight based on Vp and the gradient to reduce the gradient magnitude of non-lane candidates. Then, we compute the energy of each possible solution and select all satisfying lane positions for visualisation. The proposed system is implemented on a heterogeneous system consisting of an Intel Core i7-4720HQ CPU and an NVIDIA GTX 970M GPU. A processing speed of 143 fps has been achieved, which is over 38 times faster than in our previous work. Also, to evaluate the detection precision, we tested 2495 frames with 5361 lanes from the KITTI database (1637 lanes more than in our previous experiment). The overall successful detection rate is improved from 98.7% to 99.5%. [1807.02752v1]
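A minimal numpy sketch of the RANSAC-style inlier update described above, using a simple 2D line model as a stand-in for the paper's vanishing-point estimation; the tolerance and thresholds are placeholders:

```python
# Sketch of RANSAC: fit on random minimal samples, re-label inliers, and
# stop once the inlier fraction exceeds a preset threshold; then do a final
# least-squares fit on the inliers only, excluding the outliers.
import numpy as np

def ransac_fit(points, tol=1.0, inlier_frac=0.9, max_iters=1000,
               rng=np.random.default_rng(0)):
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(max_iters):
        sample = points[rng.choice(len(points), size=2, replace=False)]
        # Fit a line y = a*x + b through the two sampled points.
        a = (sample[1, 1] - sample[0, 1]) / (sample[1, 0] - sample[0, 0] + 1e-9)
        b = sample[0, 1] - a * sample[0, 0]
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
        if best_inliers.mean() >= inlier_frac:
            break
    return np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)

model = ransac_fit(np.column_stack([np.arange(100.), 2 * np.arange(100.) + 1]))
```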
Tournament Based Ranking CNN for the Cataract grading
Dohyeun Kim, Tae Joon Jun, Daeyoung Kim, Youngsub Eom
In classification problems, an unbalanced number of samples among the classes often causes performance degradation. Especially when some classes dominate the others with large numbers of samples, the trained model performs poorly at identifying the dominated classes. This is a common situation in medical datasets: because severe cases are not very common, there is an imbalance between the number of severe and normal cases of a disease. It is also difficult to precisely identify the grade of medical data because of the vagueness between grades. To solve these problems, we propose a new convolutional neural network architecture, named Tournament-based Ranking CNN, which shows a remarkable performance gain in identifying the dominated classes while trading off a very small accuracy loss in the dominating classes. Our approach remedies problems that occur when the Ranking CNN method, which aggregates the outputs of multiple binary neural network models, is applied to medical data. By using a tournament structure in the aggregation and very deep pretrained binary models, our proposed model achieved 68.36% exact-match accuracy, whereas Ranking CNN achieved 53.40%, a pretrained ResNet 56.12%, and a CNN with linear regression 57.48%. As a result, our proposed method applies efficiently to cataract grading, which has ordinal labels with an imbalanced number of samples among classes, and can be further applied to medical problems with features and dataset configurations similar to those of cataracts. [1807.02657v1]
One-shot Texture Segmentation
Ivan Ustyuzhaninov, Claudio Michaelis, Wieland Brendel, Matthias Bethge
We introduce one-shot texture segmentation: the task of segmenting an input image containing multiple textures given a patch of a reference texture. This task is designed to turn the problem of texture-based perceptual grouping into an objective benchmark. We show that it is straightforward to generate large synthetic datasets for this task from a relatively small number of natural textures. In particular, the task can be cast as a self-supervised problem, thereby alleviating the need for the massive amounts of manually annotated data required by traditional segmentation tasks. In this paper we introduce and study two concrete datasets: a dense collage of textures (CollTex) and a cluttered, texturized Omniglot dataset. We show that a baseline model trained on these synthesized data generalizes to natural images and videos without further fine-tuning, suggesting that the learned image representations are useful for higher-level vision tasks. [1807.02654v1]
Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
Seyed Majid Azimi, Eleonora Vig, Reza Bahmanyar, Marco Körner, Peter Reinartz
Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications, including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors, pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16\% mAP on horizontal and 72.45\% mAP on oriented bounding box detection tasks on the challenging new DOTA dataset, outperforming all published methods by a large margin ($+6$\% and $+12$\% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery. [1807.02700v1]
DeepSource: Point Source Detection using Deep Learning
A. Vafaei Sadr, Etienne. E. Vos, Bruce A. Bassett, Zafiirah Hosenie, N. Oozeer, Michelle Lochner
Point source detection at low signal-to-noise is challenging for astronomical surveys, particularly in radio interferometry images where the noise is correlated. Machine learning is a promising solution, allowing the development of algorithms tailored to specific telescope arrays and science cases. We present DeepSource – a deep learning solution – that uses convolutional neural networks to achieve these goals. DeepSource enhances the Signal-to-Noise Ratio (SNR) of the original map and then uses dynamic blob detection to detect sources. Trained and tested on two sets of 500 simulated 1 deg x 1 deg MeerKAT images with a total of 300,000 sources, DeepSource is essentially perfect in both purity and completeness down to SNR = 4 and outperforms PyBDSF in all metrics. For uniformly-weighted images it achieves a Purity x Completeness (PC) score at SNR = 3 of 0.73, compared to 0.31 for the best PyBDSF model. For natural-weighting we find a smaller improvement of ~40% in the PC score at SNR = 3. If instead we ask where either of the purity or completeness first drop to 90%, we find that DeepSource reaches this value at SNR = 3.6 compared to the 4.3 of PyBDSF (natural-weighting). A key advantage of DeepSource is that it can learn to optimally trade off purity and completeness for any science case under consideration. Our results show that deep learning is a promising approach to point source detection in astronomical images. [1807.02701v1]
Synthetic Sampling for Multi-Class Malignancy Prediction
Matthew Yung, Eli T. Brown, Alexander Rasin, Jacob D. Furst, Daniela S. Raicu
We explore several oversampling techniques for an imbalanced multi-label classification problem, a setting often encountered when developing models for Computer-Aided Diagnosis (CADx) systems. While most CADx systems aim to optimize classifiers for overall accuracy without considering the relative distribution of each class, we look into using synthetic sampling to increase per-class performance when predicting the degree of malignancy. Using low-level image features and a random forest classifier, we show that using synthetic oversampling techniques increases the sensitivity of the minority classes by an average of 7.22 percentage points, with as much as a 19.88 percentage point increase in sensitivity for a particular minority class. Furthermore, the analysis of low-level image feature distributions for the synthetic nodules reveals that these nodules can provide insights on how to preprocess image data for better classification performance or how to supplement the original datasets when more data acquisition is feasible. [1807.02608v1]
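The abstract does not name the specific oversampling technique; as an illustration, here is a SMOTE-style sketch in Python/NumPy that synthesizes minority samples by interpolating between nearest neighbours (the function name and parameter defaults are ours):

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest minority-class neighbours
    (SMOTE-style; one of several possible variants)."""
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_minority[:, None] - X_minority[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]        # k nearest per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = neighbours[i, rng.integers(k)]
        lam = rng.random()                           # interpolation factor
        synthetic.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.vstack(synthetic)
```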
Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation
Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi
Reconstruction of the shape and motion of humans from RGB-D is a challenging problem that has received much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes, to complete invisible parts due to occlusion. Such a statistical model may still be fit to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower dimensional embeddings of texture and deformation, referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Given a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how our method works on real data. [1807.02632v1]
Video Prediction with Appearance and Motion Conditions
Yunseok Jang, Gunhee Kim, Yale Song
Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: how should a model behave when there are multiple correct, equally probable futures? We propose an Appearance-Motion Conditional GAN to address this challenge. We provide appearance and motion information as conditions that specify what the future may look like, reducing the level of uncertainty. Our model consists of a generator, two discriminators taking charge of appearance and motion pathways, and a perceptual ranking module that encourages videos of similar conditions to look similar. To train our model, we develop a novel conditioning scheme that consists of different combinations of appearance and motion conditions. We evaluate our model using facial expression and human action datasets and report favorable results compared to existing methods. [1807.02635v1]
Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry
Nan Yang, Rui Wang, Jörg Stückler, Daniel Cremers
Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our depth predictions outperform state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep-learning based methods in accuracy. It even achieves comparable performance to state-of-the-art stereo methods, while relying only on a single camera. [1807.02570v1]
Generative Probabilistic Novelty Detection with Adversarial Autoencoders
Stanislav Pidhorskyi, Ranya Almohsen, Donald A Adjeroh, Gianfranco Doretto
Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. We assume that training data is available to describe only the inlier distribution. Recent approaches primarily leverage deep encoder-decoder network architectures to compute a reconstruction error that is used to either compute a novelty score or to train a one-class classifier. While we too leverage a novel network of that kind, we take a probabilistic approach and effectively compute how likely it is that a sample was generated by the inlier distribution. We achieve this with two main contributions. First, we make the computation of the novelty probability feasible by linearizing the parameterized manifold capturing the underlying structure of the inlier distribution, and show how the probability factorizes and can be computed with respect to local coordinates of the manifold tangent space. Second, we improve the training of the autoencoder network. An extensive set of results shows that the approach achieves state-of-the-art results on several benchmark datasets. [1807.02588v1]
Parallel Convolutional Networks for Image Recognition via a Discriminator
Shiqi Yang, Gang Peng
In this paper, we introduce a simple but quite effective recognition framework dubbed D-PCN, aimed at enhancing the feature extraction ability of CNNs. The framework consists of two parallel CNNs, a discriminator, and an extra classifier that takes integrated features from the parallel networks and gives the final prediction. The discriminator is the core: it drives the parallel networks to focus on different regions and learn different representations. A corresponding training strategy is introduced to ensure the discriminator is utilized. We validate D-PCN with several CNN models on the benchmark datasets CIFAR-100 and ImageNet, where D-PCN enhances all models. In particular, it yields state-of-the-art performance on CIFAR-100 compared with related works. We also conduct a visualization experiment on the fine-grained Stanford Dogs dataset to verify our motivation. Additionally, we apply D-PCN to segmentation on PASCAL VOC 2012 and also observe improvements. [1807.02265v2]
Fast and Accurate Point Cloud Registration using Trees of Gaussian Mixtures
Ben Eckart, Kihwan Kim, Jan Kautz
Point cloud registration sits at the core of many important and challenging 3D perception problems including autonomous navigation, SLAM, object/scene recognition, and augmented reality. In this paper, we present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of a hierarchical Gaussian Mixture Model (GMM) representation. Our method constructs a top-down multi-scale representation of point cloud data by recursively running many small-scale data likelihood segmentations in parallel on a GPU. We leverage the resulting representation using a novel PCA-based optimization criterion that adaptively finds the best scale to perform data association between spatial subsets of point cloud data. Compared to previous Iterative Closest Point and GMM-based techniques, our tree-based point association algorithm performs data association in logarithmic-time while dynamically adjusting the level of detail to best match the complexity and spatial distribution characteristics of local scene geometry. In addition, unlike other GMM methods that restrict covariances to be isotropic, our new PCA-based optimization criterion well-approximates the true MLE solution even when fully anisotropic Gaussian covariances are used. Efficient data association, multi-scale adaptability, and a robust MLE approximation produce an algorithm that is up to an order of magnitude both faster and more accurate than current state-of-the-art on a wide variety of 3D datasets captured from LiDAR to structured light. [1807.02587v1]
Automated and Interpretable Patient ECG Profiles for Disease Detection, Tracking, and Discovery
Geoffrey H. Tison, Jeffrey Zhang, Francesca N. Delling, Rahul C. Deo
The electrocardiogram or ECG has been in use for over 100 years and remains the most widely performed diagnostic test to characterize cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, innovations in machine learning algorithms, and availability of large-scale digitized ECG data would enable extending the utility of the ECG beyond its current limitations, while at the same time preserving interpretability, which is fundamental to medical decision-making. We identified 36,186 ECGs from the UCSF database that were 1) in normal sinus rhythm and 2) would enable training of specific models for estimation of cardiac structure or function or detection of disease. We derived a novel model for ECG segmentation using convolutional neural networks (CNN) and Hidden Markov Models (HMM) and evaluated its output by comparing electrical interval estimates to 141,864 measurements from the clinical workflow. We built a 725-element patient-level ECG profile using downsampled segmentation data and trained machine learning models to estimate left ventricular mass, left atrial volume, mitral annulus e’ and to detect and track four diseases: pulmonary arterial hypertension (PAH), hypertrophic cardiomyopathy (HCM), cardiac amyloid (CA), and mitral valve prolapse (MVP). CNN-HMM derived ECG segmentation agreed with clinical estimates, with median absolute deviations (MAD) as a fraction of observed value of 0.6% for heart rate and 4% for QT interval. Patient-level ECG profiles enabled quantitative estimates of left ventricular and mitral annulus e’ velocity with good discrimination in binary classification models of left ventricular hypertrophy and diastolic function. Disease detection models achieved AUROCs ranging from 0.94 down to 0.77 (for MVP). Top-ranked variables for all models included known ECG characteristics along with novel predictors of these traits/diseases. [1807.02569v1]
Ilke Demir, Daniel G. Aliaga
We describe a guided proceduralization framework that optimizes geometry processing on architectural input models to extract target grammars. We aim to provide efficient artistic workflows by creating procedural representations from existing 3D models, where the procedural expressiveness is controlled by the user. Architectural reconstruction and modeling tasks have been handled either as time-consuming manual processes or as procedural generation that offers limited control and artistic influence. We bridge the gap between creation and generation by converting existing manually modeled architecture to procedurally editable parametrized models, and by carrying the guidance into the procedural domain by letting the user define the target procedural representation. Additionally, we propose various applications of such procedural representations, including guided completion of point cloud models, controllable 3D city modeling, and other benefits of procedural modeling. [1807.02578v1]
Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge
We present a framework for accelerated iterative reconstruction using a fast approximate forward model based on a k-space method for photoacoustic tomography. The approximate model introduces aliasing artifacts into the gradient information used for iterative reconstruction, but these artifacts are highly structured, and we can train a CNN that performs iterative reconstruction using the approximate information. We show the feasibility of the method for in vivo measurements of a human in a limited-view geometry. The proposed method produces results superior to total variation reconstruction with a 32-fold acceleration. [1807.03191v1]
Milad Niknejad, Jose M. Bioucas-Dias, Mario A. T. Figueiredo
This paper proposes a general framework for internal patch-based image restoration based on conditional random fields (CRFs). Unlike related models based on Markov random fields (MRFs), our approach explicitly formulates the posterior distribution of the entire image. The potential functions are taken to be proportional to the product of a likelihood and a prior for each patch. By assuming identical parameters for similar patches, our approach can be classified as a model-based non-local method. For the prior term in the potential function of the CRF model, multivariate Gaussians and multivariate Gaussian scale mixtures are considered, the latter being a novel prior for image patches. Our results show that the proposed method outperforms image denoising methods based on Gaussian mixture models and state-of-the-art methods for image interpolation/inpainting. [1807.03027v1]
Ari Heljakka, Arno Solin, Juho Kannala
We introduce a novel generative autoencoder network model that learns to encode and reconstruct images with high quality and resolution, and supports smooth random sampling from the latent space of the encoder. Generative adversarial networks (GANs) are known for their ability to simulate random high-quality images, but they cannot reconstruct existing images. Previous works have attempted to extend GANs to support such inference, but so far have not delivered satisfactory high-quality results. Instead, we propose the Progressively Growing Generative Autoencoder (PIONEER) network, which achieves high-quality reconstruction with $128{\times}128$ images without requiring a GAN discriminator. We merge recent techniques for progressively building up the parts of the network with the recently introduced adversarial encoder-generator network. The ability to reconstruct input images is crucial in many real-world applications, and it allows precise, intelligent manipulation of existing images. We show promising results in image synthesis and inference, and achieve state-of-the-art results on the CelebA inference task. [1807.03026v1]
Milad Niknejad, Jose M. Bioucas-Dias, Mario A. T. Figueiredo
This paper introduces a new approach to patch-based image restoration based on external datasets and importance sampling. The minimum mean squared error (MMSE) estimate of an image patch is approximated using samples from an external dataset; its computation requires solving a multidimensional (typically intractable) integral. The new method can be interpreted as a generalization of external non-local means (NLM), using self-normalized importance sampling to efficiently approximate the MMSE estimate. The use of self-normalized importance sampling endows the proposed method with great flexibility, namely regarding the statistical properties of the measurement noise. The effectiveness of the proposed method is shown in a series of experiments using generic large-scale and class-specific external datasets. [1807.
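The core estimator is easy to state: external patches act as samples from the prior, and the noise likelihood supplies self-normalized importance weights. A minimal Python/NumPy sketch, assuming i.i.d. Gaussian noise of known standard deviation (the method's flexibility regarding other noise models is not shown):

```python
import numpy as np

def mmse_patch_estimate(noisy_patch, external_patches, sigma):
    """Self-normalized importance sampling approximation of the MMSE
    denoised patch: external patches are prior samples, and the Gaussian
    noise likelihood supplies the importance weights."""
    diffs = external_patches - noisy_patch            # (N, patch_dim)
    log_w = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    log_w -= log_w.max()                              # numerical stability
    w = np.exp(log_w)
    w /= w.sum()                                      # self-normalization
    return w @ external_patches                       # weighted average
```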
Hosnieh Sattar, Gerard Pons-Moll, Mario Fritz
To study the correlation between clothing and body shape, we collected a new dataset (Fashion Takes Shape) that includes images of users with clothing-category annotations. We employ a multi-photo approach to estimate the body shape of each user and build a conditional model of clothing categories given body shape. We demonstrate that in real-world data, clothing categories and body shapes are correlated, and show that our multi-photo approach leads to a better clothing-category prediction model compared with models based on single-view shape estimates or manually annotated body types. We see our method as a first step towards large-scale understanding of clothing preferences from body shape. [1807.03235v1]
Xu Liu, Licheng Jiao, Xu Tang, Qigong Sun, Dan Zhang
How to analyze the polarimetric scattering matrix of polarimetric synthetic aperture radar (PolSAR) data has long been the focus of PolSAR image classification. Generally, the polarization coherence matrix and the covariance matrix obtained from the polarimetric scattering matrix present only a limited amount of polarimetric information. To address this, we propose a sparse scattering coding method that processes the polarimetric scattering matrix and yields a complete feature; this coding scheme also fully preserves the polarimetric information of the scattering matrix. For this coding scheme, we design a corresponding convolutional-network-based classification algorithm to exploit the feature. Building on sparse scattering coding and convolutional neural networks, a polarimetric convolutional network is proposed that classifies PolSAR images by making full use of the polarimetric information. We conduct experiments on PolSAR images acquired by AIRSAR and RADARSAT-2 to validate the proposed method. The experimental results show that the method achieves better performance and has great potential for PolSAR data classification. The source code for sparse scattering coding is available at https://github.com/liuxuvip/Polarimetric-Scattering-Coding. [1807.02975v1]
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes
Fangneng Zhan, Shijian Lu, Chuhui Xue
The requirement for large amounts of annotated images has become a grand challenge when training deep neural network models for various visual detection and recognition tasks. This paper presents a novel image synthesis technique that aims to generate a large number of annotated scene text images for training accurate and robust scene text detection and recognition models. The proposed technique consists of three innovative designs. First, it achieves "semantically coherent" synthesis by embedding texts at semantically sensible regions within the background image, where semantic coherence is achieved by leveraging the semantic annotations of objects and image regions created in prior semantic segmentation research. Second, it exploits visual saliency to determine the embedding locations within each semantically sensible region, which coincides with the fact that texts are often placed around homogeneous regions for better visibility in scenes. Third, it designs an adaptive text appearance model that determines the color and brightness of embedded texts by adaptively learning from the features of real scene text images. The proposed technique has been evaluated on five public datasets, and the experiments show its superior performance in training accurate and robust scene text detection and recognition models. [1807.03021v1]
Nripesh Parajuli, Allen Lu, Kevinminh Ta, John C. Stendahl, Nabil Boutagy, Imran Alkhalil, Melissa Eberle, Geng-Shi Jeng, Maria Zontak, Matthew O'Donnell, Albert J. Sinusas, James S. Duncan
Accurate quantification of left ventricular (LV) deformation/strain shows significant promise for quantitatively assessing cardiac function for use in diagnosis and therapy planning (Jasaityte et al., 2013). However, accurately estimating the displacement of myocardial tissue, and hence LV strain, has been challenging due to a variety of issues, including those related to deriving tracking tokens from images and following tissue locations over the entire cardiac cycle. In this work, we propose a point matching scheme in which correspondences are modeled as flow through a graphical network. Myocardial surface points are set as nodes in the network, and edges define neighborhood relationships temporally. The novelty lies in the constraints imposed on the matching scheme, which render the correspondences one-to-one through the entire cardiac cycle, rather than only between two consecutive frames. The constraints also encourage the motion to be cyclic, an important characteristic of LV motion. We validate our method by applying it to quantitative LV displacement and strain estimation using 8 synthetic and 8 open-chest 4D echocardiographic image sequences, the latter with acoustic crystals implanted on the LV wall. We achieve excellent tracking accuracy on the synthetic dataset and observe good correlation with crystal-based strains on the in vivo data. [1807.02951v1]
Pengchong Jin, Vivek Rathod, Xiangxin Zhu
We would like to share a simple tweak of the Single Shot Multibox Detector (SSD) family of detectors that is effective in reducing model size while maintaining the same quality. We share box predictors across all scales and replace the convolutions between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees training data over all scales. Since we reduce the number of predictors to one and trim all convolutions between them, the model size is significantly smaller. We empirically show that these changes do not hurt model quality compared to vanilla SSD. [1807.03284v1]
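The described tweak amounts to a pooling-based feature pyramid with one shared predictor. A hypothetical PyTorch sketch (channel counts, number of scales, and boxes per cell are illustrative choices, not the paper's configuration):

```python
import torch
import torch.nn as nn

class PoolingPyramidHead(nn.Module):
    """Sketch of the tweak: successive scales are produced by max pooling
    (no extra convolutions between scales), and a single box predictor is
    shared across all scales."""
    def __init__(self, channels=256, num_scales=5, boxes_per_cell=6, num_classes=21):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        out = boxes_per_cell * (num_classes + 4)      # class scores + box offsets
        self.shared_predictor = nn.Conv2d(channels, out, kernel_size=3, padding=1)
        self.num_scales = num_scales

    def forward(self, base_feature):
        preds, feat = [], base_feature
        for _ in range(self.num_scales):
            preds.append(self.shared_predictor(feat)) # same weights at every scale
            feat = self.pool(feat)                    # next (coarser) scale
        return preds
```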
Snehasis Mukherjee, Leburu Anvitha, T. Mohana Lahari
Human activity recognition in RGB-D videos has been an active research topic during the last decade. However, no efforts are found in the literature for recognizing human activities in RGB-D videos where several performers act simultaneously. In this paper, we introduce such a challenging dataset in which several performers perform activities at the same time, and we propose a novel method for recognizing human activities in such videos. The proposed method aims to capture the motion information of the whole video by producing a dynamic image corresponding to the input video. We use two parallel ResNext-101 networks to produce the dynamic images for the RGB video and the depth video separately. The dynamic images contain only the motion information, and hence the unnecessary background information is eliminated. We send the two dynamic images, extracted from the RGB and depth videos respectively, through fully connected neural network layers. The proposed dynamic image reduces the complexity of the recognition process by extracting a sparse matrix from the video, while the system still retains the motion information required to recognize the activity. The proposed method has been tested on the MSR Action 3D dataset and has shown performance comparable to the state of the art. We also applied the proposed method on our own dataset, where it outperforms the state-of-the-art approaches. [1807.02947v1]
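A dynamic image is a weighted temporal collapse of the video. The abstract does not spell out its pooling variant; the sketch below uses the common linear approximate-rank-pooling coefficients alpha_t = 2t - T - 1 as one plausible instantiation:

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video (T, H, W, C) into a single 'dynamic image' via a
    weighted sum over time, using the linear approximate-rank-pooling
    coefficients alpha_t = 2t - T - 1 (one common approximation)."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0                         # early frames weighted down
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # Rescale to [0, 255] so it can be fed to an ordinary image CNN.
    di = (di - di.min()) / (di.ptp() + 1e-9) * 255.0
    return di.astype(np.uint8)
```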
Kaiyang Zhou, Tao Xiang, Andrea Cavallaro
Most existing video summarization approaches are based on supervised or unsupervised learning. In this paper, we propose a reinforcement-learning-based weakly supervised method that exploits easy-to-obtain video-level category labels and encourages summaries to contain category-related information and maintain category recognizability. Specifically, we formulate video summarization as a sequential decision-making process and train a deep Q-learning summarization network (DQSN). A companion classification network is also trained to provide rewards for training the DQSN. With the classification network, we develop a global recognizability reward based on the classification results. Critically, to cope with the problem of delayed and sparse rewards in long-sequence reinforcement learning, a novel dense ranking-based reward is also proposed. Extensive experiments on two benchmark datasets show that the proposed approach achieves state-of-the-art performance. [1807.03089v1]
Zhuotun Zhu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille
This paper presents an intuitive approach to finding pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, by checking abdominal CT scans. Our idea is named segmentation-for-classification (S4C), which classifies a volume by checking whether at least a sufficient number of voxels are segmented as tumors. To deal with tumors at different scales, we train volumetric segmentation networks with multi-scale inputs and test them in a coarse-to-fine flowchart. A post-processing module is used to filter out outliers and reduce false alarms. We perform a case study on a dataset containing 439 CT scans, in which 136 cases were diagnosed with PDAC and 303 were normal. Our approach reports a sensitivity of 94.1% at a specificity of 98.5%, with an average tumor segmentation accuracy of 56.46% over all PDAC cases. [1807.02941v1]
Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper, we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x, y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail. We first demonstrate and carefully analyze the failure on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either perfect translation invariance or varying degrees of translation dependence, as required by the task. CoordConv solves the coordinate transform problem with perfect generalization, 150 times faster and with 10-100 times fewer parameters than convolution. This stark contrast raises the question: to what extent are failures of convolution lurking inside other tasks, subtly hampering performance from within? A complete answer to this question requires further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse, as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IoU when using CoordConv, and in the RL domain, agents playing Atari games benefit significantly from the use of CoordConv layers. [1807.03247v1]
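CoordConv itself is a few lines: concatenate normalized coordinate channels to the input, then convolve as usual. A PyTorch sketch following that description:

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """CoordConv as described: append two channels holding the normalized
    y and x coordinates of each pixel, then apply a regular convolution
    over the augmented input."""
    def __init__(self, in_channels, out_channels, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **conv_kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))

# Usage: layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
```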
Di Hu, Feiping Nie, Xuelong Li
Birds are seen twittering, running cars come with noise, people talk face to face, and so on. These natural audio-visual correspondences provide possibilities for exploring and understanding the outside world. However, mixtures of multiple objects and sounds make it intractable to perform effective matching in unconstrained environments. To address this, we propose to fully excavate the audio and visual components and perform elaborate correspondence learning between them. Concretely, a novel unsupervised audio-visual learning model is proposed, named deep co-clustering (DCC), which synchronously performs ensembles of clustering with multimodal vectors of convolutional maps in different shared spaces to capture multiple audio-visual correspondences. This integrated multimodal clustering network can be effectively trained with a max-margin loss in an end-to-end fashion. Extensive experiments on feature evaluation and audio-visual tasks are performed. The results demonstrate that DCC can learn effective unimodal representations, with which the classifier can even outperform humans. Furthermore, DCC shows noticeable performance in the tasks of sound localization, multi-source detection, and audio-visual understanding. [1807.03094v1]
Shiqi Yang, Gang Peng
In this paper, we propose a novel attention model for semantic segmentation that aggregates multi-scale and context features to refine predictions. Specifically, the backbone convolutional neural network takes multiple inputs at different scales, so that the CNN obtains representations at different scales. The proposed attention model handles the features from the different scale streams separately and then integrates them: a location attention branch learns to softly weight the multi-scale features at each pixel location. Moreover, we add a recalibration branch, parallel to the location attention branch, to recalibrate the score map for each class. We achieve quite competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related works. [1807.02917v1]
Jeesoo Kim, Jaeyoung Yoo, Jangho Kim, Nojun Kwak
Since generative adversarial networks made a breakthrough in the image generation problem, many studies on their applications have followed, such as image restoration, style transfer, and image completion. However, there has been little research on generating objects in uncontrolled real-world environments. In this paper, we propose a novel approach for generating images in real scenes. The overall architecture consists of two different networks: one completes the shape of the generated object, and the other paints the context onto it. Using a subnetwork proposed in a previous image completion work, our model forms the shape of an object. Unlike the approach used in the image completion problem, the details of the training objects are encoded into a latent variable by an additional subnetwork, improving the quality of the generated objects. We evaluated our method using the KITTI and Cityscapes datasets, which are widely used for object detection and image segmentation problems. The adequacy of the images generated by the proposed method has also been evaluated using a widely used object detection algorithm. [1807.02925v1]
Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li
Weakly supervised temporal action detection is a Herculean task in understanding untrimmed videos, since no supervisory signal except the video-level category label is available on training data. Under the supervision of category labels, weakly supervised detectors are usually built upon classifiers. However, there is an inherent contradiction between classifier and detector: a classifier in pursuit of high classification performance prefers top-level discriminative video clips that are extremely fragmentary, whereas a detector must discover the whole action instance without missing any relevant snippet. To reconcile this contradiction, we train the detector by driving a series of classifiers to progressively find new actionness clips through step-by-step erasion from the complete video. At the test stage, all we need to do is to collect detection results from the one-by-one trained classifiers at the various erasing steps. To assist the collection process, a fully connected conditional random field is established to refine the temporal localization outputs. We evaluate our approach on two mainstream datasets, THUMOS'14 and ActivityNet. The experiments show that our detector advances state-of-the-art results for weakly supervised temporal action detection, and is even comparable with quite a few strongly supervised methods. [1807.02929v1]
Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn
This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. To deal with the intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model in which dense affine transformation fields are progressively estimated in a coarse-to-fine manner, so that the smoothness constraint is naturally imposed within deep networks. PARN estimates residual affine transformations at each level and composes them to estimate the final affine transformations. Furthermore, to overcome the limitation of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging the correspondence consistency across images. Our method is fully learnable in an end-to-end manner and does not require quantizing the infinite continuous affine transformation fields. To the best of our knowledge, this is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks. Experimental results demonstrate that PARN outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks. [1807.02939v1]
Walid Abdullah Al, Il Dong Yun
Deploying the idea of long-term cumulative return, reinforcement learning has shown remarkable performance in various fields. We propose a formulation of landmark localization in 3D medical images as a reinforcement learning problem. Whereas value-based methods have been widely used to solve similar problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. Successful behavior learning is challenging in large state and/or action spaces, requiring many trials. We introduce partial-policy-based reinforcement learning, which enables solving the large localization problem by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing its own independent critic. The proposed policy reconstruction from the partial policies ensures robust and efficient localization, utilizing sub-agents that solve simple binary decision problems in their corresponding partial action spaces. Compared with the original behavior learning scheme, the proposed reinforcement learning requires fewer trials to learn the optimal behavior. [1807.02908v1]
Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang
Accurately classifying the malignancy of lesions detected in a screening scan plays a critical role in reducing false positives. By extracting and analyzing a large number of quantitative image features, radiomics holds great potential to differentiate malignant tumors from benign ones. Since not all radiomic features contribute to an effective classification model, selecting an optimal feature subset is critical. This work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers both sensitivity and specificity simultaneously as objective functions during feature selection. In MO-FS, we developed a modified entropy-based termination criterion (METC) that stops the algorithm automatically instead of relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning using the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, an adaptive mutation operation was developed to automatically generate the mutation probability in MO-FS. MO-FS was evaluated on classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Compared with other commonly used feature selection methods, the experimental results for both lung nodule and breast lesion malignancy classification demonstrate that the feature set selected by MO-FS achieves better classification performance. [1807.03236v1]
Hongyu Wang, Yong Xia
Computer-aided techniques may lead to more accurate and more accessible diagnosis of thorax diseases on chest radiography. Despite the success of deep learning-based solutions, this task remains a major challenge in smart healthcare, since it is intrinsically a weakly supervised learning problem. In this paper, we incorporate the attention mechanism into a deep convolutional neural network and propose the ChestNet model for effective diagnosis of thorax diseases on chest radiography. The model consists of two branches: a classification branch serves as a unified feature extraction and classification network, freeing users from troublesome handcrafted feature extraction; an attention branch exploits the correlation between class labels and the locations of pathological abnormalities, allowing the model to concentrate adaptively on pathologically abnormal regions. We evaluated our model against three state-of-the-art deep learning models on the Chest X-ray 14 dataset using the official patient-wise split. The results show that our model outperforms the other methods, without using extra training data, in diagnosing 14 thorax diseases on chest radiography. [1807.03058v1]
Asim Iqbal, Asfandyar Sheikh, Theofanis Karayannis
Here we introduce a fully automated convolutional neural network-based brain image processing method for detecting neurons in different brain regions during development (DeNeRD). Our method takes a developing mouse brain as input and i) registers the brain sections against a developing mouse reference atlas, ii) detects various types of neurons, and iii) quantifies the neural density in many unique brain regions at different postnatal (P) time points. Our method is invariant to the shape, size and expression of neurons, and using DeNeRD we compare the brain-wide neural density of all GABAergic neurons in developing brains at ages P4, P14 and P56. We discover and report 6 different clusters of regions in the mouse brain in which GABAergic neurons develop in a differential manner from early age (P4) to adulthood (P56). These clusters reveal key steps of GABAergic cell development and seem to track the functional development of diverse brain regions as the mouse transitions from a passive receiver of sensory information (<P14) to an active seeker (>P14). [1807.03238v1]
Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess
Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even the finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects. In this work, we investigate two approaches for automatically detecting such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on handcrafted features that are classified with a support vector machine (SVM). To obtain strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep convolutional neural network (CNN) running on a graphics processing unit (GPU). Both approaches are trained on 1,968 cells extracted from high-resolution EL intensity images of monocrystalline and polycrystalline PV modules. The CNN is more accurate, reaching an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches enable continuous, highly accurate monitoring of PV cells. [1807.02894v1]
Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng
Automatic lesion segmentation in dermoscopic images is an essential step for computer-aided diagnosis of melanoma. Dermoscopic images exhibit rotational and reflectional symmetry; however, this geometric property has not been encoded in state-of-the-art convolutional neural network based skin lesion segmentation methods. In this paper, we propose a deeply supervised rotation-equivariant network for skin lesion segmentation by extending the recent group-equivariant network (Cohen & Welling, 2016). Specifically, we propose G-upsampling and G-projection operations to adapt the rotation-equivariant classification network to our skin lesion segmentation problem. To further increase performance, we integrate a deep supervision scheme into the proposed rotation-equivariant segmentation architecture. The whole framework is equivariant to input transformations, including rotation and reflection, which improves network efficiency and thus contributes to segmentation performance. We extensively evaluate our method on the ISIC 2017 skin lesion challenge dataset. The experimental results show that our rotation-equivariant networks consistently outperform their regular counterparts with the same model complexity under different experimental settings. Our best model achieves 77.23% (JA) on the test dataset, outperforming the state-of-the-art challenge methods and further demonstrating the effectiveness of the proposed deeply supervised rotation-equivariant segmentation network. [1807.02804v1]
Haseeb Shah, Khurram Javed, Faisal Shafait
The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget knowledge about old tasks when trained on incrementally arriving data, a phenomenon known as catastrophic forgetting. We discuss the biases in current generative adversarial network (GAN) based approaches that learn the classifier by knowledge distillation from previously trained classifiers. These biases cause the trained classifier to perform poorly. We propose an approach to remove these biases by distilling knowledge from the classifier of an AC-GAN. Experiments on MNIST and CIFAR10 show that the method is comparable to state-of-the-art rehearsal-based approaches. The code for this paper is available at https://github.com/haseebs/Pseudo-rehearsal-Incremental-Learning. [1807.02799v1]
Bo Li, Tianfu Wu, Lun Zhang, Rufeng Chu
Region-based convolutional neural networks (R-CNNs, including Fast R-CNN, Faster R-CNN, and Mask R-CNN) have largely dominated object detection. Operators defined on RoIs (regions of interest), such as RoIPooling and RoIAlign, play an important role in R-CNNs. They all utilize only the information inside RoIs for RoI prediction, even in their recent deformable extensions. Although surrounding context is well known for its importance in object detection, it has yet to be integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work and the multi-class object layout work, this paper presents a generic context-mining RoI operator (RoICtxMining) seamlessly integrated into R-CNNs; the resulting object detection system, termed Auto-Context R-CNN, is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a 3x3 layout to mine contextual information adaptively and on-the-fly in the 8 surrounding context regions. Within each of the 8 context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling/RoIAlign features are concatenated with those of the object-RoI for final prediction. The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promising vulnerability for adversarial attacks without being adversarially trained. In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including a 6.9% mAP improvement over the R-FCN method on the COCO test-dev dataset, and first place on KITTI pedestrian and cyclist detection as of this submission). [1807.02842v1]
Pushparaja Murugan
Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time-series data. Unlike traditional feed-forward networks, recurrent networks have inherent feedback loops, which allow them to store temporal context information and pass the state of information through the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training fully connected RNNs and managing gradient flow are complicated processes. Many studies have been conducted to address these limitations. This article aims to provide brief details on recurrent neurons, their variants, and tips and tricks for training fully recurrent neural networks. This review work was carried out as part of the "Multiple Object Tracking" module of our IPO studio software. [1807.02857v1]
Karim Iskakov
This paper presents a semi-parametric approach to image inpainting for irregular holes. The non-parametric part consists of an external image database. During test time, the database is used to retrieve a supplementary image similar to the input masked picture, which is used as auxiliary information for a deep neural network. Moreover, we propose a novel method for generating masks with irregular holes and provide a public dataset with such masks. Experiments on the CelebA-HQ dataset show that our semi-parametric method yields more realistic results than previous approaches, which is confirmed by a user study. [1807.02855v1]
Toufiq Parag, Daniel Berger, Lee Kamentsky, Benedict Staffler, Donglai Wei, Moritz Helmstaedter, Jeff W. Lichtman, Hanspeter Pfister
Synaptic connectivity detection is a critical task for neural reconstruction from electron microscopy (EM) data. Most existing algorithms for synapse detection do not identify the cleft location and the direction of connectivity simultaneously. The few methods that compute direction along with contact location have only been demonstrated to work on either dyadic synapses (most common in vertebrate brains) or polyadic synapses (found in fruit fly brains), but not on both types. In this paper, we present an algorithm to automatically predict the location and direction of both dyadic and polyadic synapses. The proposed algorithm first generates candidate synaptic connections from voxelwise predictions of signed proximity generated by a 3D U-net. A second 3D CNN then prunes the candidate set to produce the final detection of clefts and connectivity orientation. Experimental results demonstrate that the proposed method outperforms existing methods for determining synapses in both rodent and fruit fly brains. [1807.02739v1]
Anjan Dutta, Pau Riba, Josep Lladós, Alicia Fornés
Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable with many machine learning tools, because of the incompatibility of most mathematical operations in the graph domain. Graph embedding has been proposed as a way to tackle these difficulties: it maps graphs to a vector space and makes standard machine learning techniques applicable to them. However, it is well known that graph embedding techniques usually suffer from a loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider various configurations of its parts and map them into a vector space using stochastic graphlet embedding (SGE). Broadly speaking, SGE produces a distribution of uniformly sampled low-to-high-order graphlets as a way to embed graphs into the vector space. The coarse-to-fine structure of the graph hierarchy and the statistics obtained through the distribution of low-to-high-order stochastic graphlets complement each other and capture important structural information at varied contexts. Altogether, these two techniques substantially cope with the usual information loss involved in graph embedding techniques, and it is no surprise that we obtain more robust vector space embeddings of graphs. This is corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods. [1807.02839v1]
Wentai Zhang, Haoliang Jiang, Zhangsihao Yang, Soji Yamakawa, Kenji Shimada, Levent Burak Kara
High-quality upsampling of sparse 3D point clouds is critically useful for a wide range of geometric operations such as reconstruction, rendering, meshing, and analysis. In this paper, we propose a data-driven algorithm that upsamples 3D point clouds without the need for hard-coded rules. Our approach uses a deep network with Chamfer distance as the loss function, capable of learning the latent features of point clouds belonging to different object categories. We evaluate our algorithm across different amplification factors, with upsampling learned and performed on objects belonging to the same category as well as different categories. We also explore the desirable characteristics of input point clouds as a function of the distribution of the point samples. Finally, we demonstrate the performance of our algorithm in single-category and multi-category training scenarios. The final proposed model is compared against a baseline optimization-based upsampling method. The results indicate that our algorithm produces more uniform and accurate upsampling. [1807.02740v1]
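Chamfer distance, the loss named above, measures how well two point sets cover each other. A minimal (non-differentiable) NumPy reference; in training one would use an autodiff implementation of the same formula:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    mean squared distance from each point to its nearest neighbour in the
    other set, summed over both directions."""
    d2 = np.sum((p[:, None, :] - q[None, :, :])**2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```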
Khurram Javed, Faisal Shafait
One of the key differences between the learning mechanism of humans and artificial neural networks (ANNs) is the human ability to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously: any attempt to learn new tasks incrementally causes them to completely forget previous tasks. This lack of incremental learning ability, called catastrophic forgetting, is considered one of the major hurdles in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method and demonstrate that its good performance is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we recognize a key limitation of knowledge distillation: it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that is able to successfully remove this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning. [1807.02802v1]
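The abstract does not give the dynamic threshold moving algorithm itself; the sketch below shows only the generic bias-correction step such methods build on, rescaling class probabilities by per-class factors (here, estimated class priors) before the argmax:

```python
import numpy as np

def threshold_moving(probs, class_priors):
    """Generic threshold moving: rescale predicted class probabilities by
    per-class bias factors (estimated class priors here) and renormalize
    before the argmax. The paper's dynamic variant adapts these factors
    during incremental training, which this sketch omits."""
    adjusted = probs / np.asarray(class_priors)       # down-weight favored classes
    adjusted /= adjusted.sum(axis=1, keepdims=True)
    return adjusted.argmax(axis=1)
```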
Pascal Mettes, Cees G. M. Snoek
The goal of this paper is spatio-temporal localization of human actions from their class labels only. The state of the art casts the problem as multiple instance learning, where the instances are a priori computed action proposals. Rather than disconnecting localization from learning, we propose a variant of multiple instance learning that integrates spatio-temporal localization during the learning process. We make three contributions. First, we define model assumptions tailored to actions and propose a latent instance learning objective that allows optimization at the box level. Second, we propose a spatio-temporal box linking algorithm on top of box proposals from an off-the-shelf person detector, suitable for weakly supervised learning. Third, we introduce tube- and video-level refinements at inference time to integrate long-term spatio-temporal action characteristics. Our experiments on three video datasets show the benefits of our contributions, as well as competitive results compared to state-of-the-art alternatives that localize actions from their class labels only. Finally, our algorithm can incorporate point and box supervision, allowing benchmarking, mixing, and balancing of action localization performance against annotation time. [1807.02800v1]
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep networks, consisting of several residual groups with long skip connections. Each residual group contains several residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, letting the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism that adaptively rescales channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods. [1807.02758v1]
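The channel attention mechanism can be sketched compactly. Below is an RCAN-style block in PyTorch: global average pooling summarizes each channel, a two-layer bottleneck predicts per-channel gates, and the features are rescaled (the reduction ratio is an illustrative choice):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """RCAN-style channel attention: per-channel statistics from global
    average pooling are mapped through a bottleneck to sigmoid gates that
    rescale the input feature map channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # (B, C, 1, 1) channel stats
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)                       # channel-wise rescaling
```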
Rui Fan, Naim Dahnoun
Detecting multiple curved lane markings on non-flat road surfaces remains a challenging task for automotive applications. To make improvements, depth information can be used to greatly enhance the robustness of lane detection systems. The system proposed in this paper is developed from our previous work, in which the dense vanishing point Vp is estimated globally to assist in detecting multiple curved lane markings. However, outliers in the optimal solution may severely affect the accuracy of the least squares fitting when estimating Vp. Therefore, in this paper we use random sample consensus to iteratively update the inliers and outliers until the fraction of inliers exceeds a preset threshold. This significantly helps the system to overcome some suddenly changing conditions. Furthermore, we propose a novel lane position validation approach that provides a piecewise weight based on Vp and the gradient to reduce the gradient magnitude of non-lane candidates. Then, we compute the energy of each possible solution and select all satisfying lane positions for visualization. The proposed system is implemented on a heterogeneous system consisting of an Intel Core i7-4720HQ CPU and an NVIDIA GTX 970M GPU. A processing speed of 143 fps has been achieved, which is 38 times faster than our previous work. Moreover, to evaluate the detection precision, we tested 2495 frames with 5361 lanes from the KITTI database (1637 more lanes than in our previous experiment). The results illustrate that the overall successful detection rate improves from 98.7% to 99.5%. [1807.02752v1]
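The inlier/outlier update loop is ordinary RANSAC with a preset inlier-ratio stopping rule. A generic Python/NumPy sketch, with the model-fitting and residual functions left as caller-supplied hooks (the paper's actual model is the dense vanishing-point fit, which is not reproduced here):

```python
import numpy as np

def ransac_fit(points, fit_fn, residual_fn, n_sample, inlier_thresh,
               min_inlier_ratio=0.8, max_iters=100, rng=None):
    """Generic RANSAC: repeatedly fit on a random minimal sample,
    re-classify inliers/outliers, and stop once the inlier fraction
    exceeds the preset threshold."""
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(max_iters):
        sample = rng.choice(len(points), size=n_sample, replace=False)
        model = fit_fn(points[sample])
        inliers = residual_fn(model, points) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_model = fit_fn(points[inliers])      # refit on all inliers
            best_inliers = inliers
        if best_inliers.mean() >= min_inlier_ratio:   # preset ratio reached
            break
    return best_model, best_inliers
```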
Dohyeun Kim, Tae Joon Jun, Daeyoung Kim, Youngsub Eom
When solving a classification problem, an imbalance in the amount of data between classes usually causes performance degradation. In particular, when some classes dominate the others through their large amounts of data, the trained model shows low performance in identifying the dominated classes. This is a common situation in medical datasets: since cases of high severity are rare, the amount of data is imbalanced between severe cases and normal cases. Moreover, it is difficult to precisely identify the grade of medical data because of the ambiguity between classes. To address these problems, we propose a new convolutional neural network architecture based on ranking CNNs, which shows a significant performance gain in identifying the dominated classes while losing very little accuracy on the dominating classes. Our approach remedies the problems that arise when applying the ranking CNN method, which aggregates the outputs of multiple binary neural network models, to medical data. With a tournament structure in the aggregation and the use of very deep pre-trained binary models, our proposed model recorded 68.36% exact-match accuracy, whereas a ranking CNN recorded 53.40%, a pre-trained ResNet recorded 56.12%, and a CNN with linear regression recorded 57.48%. The proposed method can therefore be effectively applied to cataract grading, which has ordinal labels with imbalanced data between classes, and can further be applied to medical problems with characteristics and dataset configurations similar to those of cataracts. [1807.02657v1]
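The ranking-CNN aggregation that the tournament structure builds on reduces ordinal grading to K-1 binary questions. A minimal sketch of that baseline aggregation (the tournament reorganization itself is not reproduced):

```python
import numpy as np

def aggregate_ordinal(binary_probs, threshold=0.5):
    """Baseline ranking-CNN aggregation: binary classifier k answers
    'is the grade greater than k?'; the predicted ordinal grade is the
    count of positive answers. `binary_probs` has shape (n_samples, K-1)."""
    return (np.asarray(binary_probs) > threshold).sum(axis=1)
```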