
WSMN: An optimized multipurpose blind watermarking in Shearlet domain using MLP and NSGA-II. (arXiv:2005.03382v1 [cs.CR])

Digital watermarking is an important topic in information security, used to prevent the misuse of images in multimedia networks. Although cryptography can prevent access by unauthorized persons, it cannot simultaneously provide copyright protection or content authentication while preserving image integrity. Hence, this paper presents an optimized multipurpose blind watermarking scheme in the Shearlet domain that uses two learning and optimization algorithms, MLP and NSGA-II. In this method, four copies of the robust copyright logo are embedded in the approximation coefficients of the Shearlet transform using a quantization technique. Furthermore, a random sequence embedded as a semi-fragile authentication mark is extracted from the detail coefficients by the neural network. Because an optimization algorithm selects the optimum embedding thresholds and the texture of each block is taken into account, imperceptibility and robustness are preserved. The experimental results show that the scheme outperforms other state-of-the-art schemes in the quality of watermarked images and in robustness against hybrid attacks. The average PSNR and SSIM of the dual watermarked images are 38 dB and 0.95, respectively. In addition, the scheme effectively extracts the copyright logo and locates forged regions under severe attacks with satisfactory accuracy.
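The abstract does not spell out its quantization technique. As a rough illustration of how a bit can be embedded by quantization and later extracted blindly (without the original image), here is a minimal sketch of generic quantization index modulation. The function names, the step size, and the coefficient values are assumptions made for the example, not details from the paper.

```python
def embed_bit(coeff: float, bit: int, step: float) -> float:
    """Move coeff to the nearest point of the lattice that encodes bit."""
    # Bit 0 uses multiples of step; bit 1 uses the same lattice shifted by step/2.
    offset = bit * step / 2.0
    return round((coeff - offset) / step) * step + offset


def extract_bit(coeff: float, step: float) -> int:
    """Blind extraction: pick the lattice whose nearest point is closest."""
    d0 = abs(coeff - embed_bit(coeff, 0, step))
    d1 = abs(coeff - embed_bit(coeff, 1, step))
    return 0 if d0 <= d1 else 1


# Hypothetical transform coefficients and watermark bits.
coeffs = [12.3, -4.7, 30.1, 8.8]
bits = [1, 0, 1, 1]
step = 4.0

marked = [embed_bit(c, b, step) for c, b in zip(coeffs, bits)]
noisy = [m + 0.6 for m in marked]  # distortion smaller than step / 4
recovered = [extract_bit(c, step) for c in noisy]
```

A larger step tolerates stronger attacks but changes each coefficient more, which is the imperceptibility versus robustness trade-off that the optimized thresholds are meant to balance.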





2kenize: Tying Subword Sequences for Chinese Script Conversion. (arXiv:2005.03375v1 [cs.CL])

Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches perform poorly because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, and a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert the pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities.





Playing Minecraft with Behavioural Cloning. (arXiv:2005.03374v1 [cs.AI])

The MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, using a dataset of human gameplay and a limited number of environment steps. We approached this task with behavioural cloning by predicting what actions human players would take, and reached fifth place in the final ranking. Although the algorithm is simple, we observed that its performance can vary significantly depending on when training is stopped. In this paper, we detail our submission to the competition, run further experiments to study how performance varied over training, and study how different engineering decisions affected these results.





Accessibility in 360-degree video players. (arXiv:2005.03373v1 [cs.MM])

Any media experience must be fully inclusive and accessible to all users regardless of their ability. With the current trend towards immersive experiences, such as Virtual Reality (VR) and 360-degree video, it becomes key that these environments are adapted to be fully accessible. However, until recently the focus has been mostly on adapting existing techniques to fit immersive displays, rather than on new approaches to accessibility designed specifically for these increasingly relevant media experiences. This paper surveys a wide range of 360-degree video players and examines the features they include for dealing with accessibility, such as Subtitles, Audio Description, Sign Language, User Interfaces, and other interaction features, like voice control and support for multi-screen scenarios. These features have been chosen based on guidelines from standardization contributions, such as those of the World Wide Web Consortium (W3C) and the International Telecommunication Union (ITU), and from research contributions on making 360-degree video consumption experiences accessible. The in-depth analysis has been part of a research effort towards the development of a fully inclusive and accessible 360-degree video player. The paper concludes by discussing how the newly developed player goes beyond existing solutions and guidelines by providing accessibility features that meet the expectations for a widely used immersive medium like 360-degree video.





Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video. (arXiv:2005.03372v1 [cs.GR])

Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world.

It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion.

We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera.

Specifically, we present a new curve-based approach to estimate accurate camera poses by establishing correspondences between featureless thin objects in the foreground in consecutive video frames, without requiring visual texture in the background scene to lock on.

Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures on geometry, topology as well as self-occlusion handling for reconstructing 3D thin structures.

Extensive validations on a variety of thin structures show that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods.





Energy-efficient topology to enhance the wireless sensor network lifetime using connectivity control. (arXiv:2005.03370v1 [cs.NI])

Wireless sensor networks have attracted much attention because of their many applications in industry, the military, medicine, agriculture, and education. In addition, a large body of research has been carried out to expand their applications and improve their efficiency. However, many challenges remain in increasing efficiency in different parts of such networks. One of the most important is improving the network lifetime. Since the sensor nodes are generally powered by batteries, the most important issue in these networks is to reduce the power consumption of the nodes so that the network lifetime increases to an acceptable level. The contribution of this paper is the use of topology control, a threshold on the remaining energy in nodes, and two metaheuristic algorithms, SA (simulated annealing) and VNS (variable neighbourhood search), to increase the energy remaining in the sensors. Moreover, using a low-cost spanning tree, appropriate connectivity control among nodes is established in order to increase the network lifetime. Simulation results show that the proposed method improves the sensor lifetime and reduces the energy consumed.
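The abstract does not say how its low-cost spanning tree is built or how link costs are defined. As one possible illustration, the sketch below uses Kruskal's algorithm on hypothetical link costs (for example, transmission energy between node pairs). The function name and the sample values are assumptions.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm; edges is a list of (cost, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two separate components
            parent[root_u] = root_v
            tree.append((cost, u, v))
    return tree


# Hypothetical link costs between four sensor nodes.
links = [(4.0, 0, 1), (1.0, 1, 2), (3.0, 0, 2), (2.0, 2, 3), (5.0, 1, 3)]
tree = minimum_spanning_tree(4, links)
total_cost = sum(cost for cost, _, _ in tree)
```

The tree keeps every node connected while using only the three cheapest non-redundant links, which is the kind of connectivity control the abstract describes.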





Soft Interference Cancellation for Random Coding in Massive Gaussian Multiple-Access. (arXiv:2005.03364v1 [cs.IT])

We utilize recent results on the exact block error probability of Gaussian random codes in additive white Gaussian noise to analyze Gaussian random coding for massive multiple-access at finite message length. Soft iterative interference cancellation is found to closely approach the performance bounds recently found in [1]. The existence of two fundamentally different regimes in the trade-off between power and bandwidth efficiency reported in [2] is related to much older results in [3] on power optimization by linear programming. Furthermore, we tighten the achievability bounds of [1] in the low power regime and show that orthogonal constellations are very close to the theoretical limits for message lengths around 100 and above.





Probabilistic Hyperproperties of Markov Decision Processes. (arXiv:2005.03362v1 [cs.LO])

We study the specification and verification of hyperproperties for probabilistic systems represented as Markov decision processes (MDPs). Hyperproperties are system properties that describe the correctness of a system as a relation between multiple executions. Hyperproperties generalize trace properties and include information-flow security requirements, like noninterference, as well as requirements like symmetry, partial observation, robustness, and fault tolerance. We introduce the temporal logic PHL, which extends classic probabilistic logics with quantification over schedulers and traces. PHL can express a wide range of hyperproperties for probabilistic systems, including both classical applications, such as differential privacy, and novel applications in areas such as robotics and planning. While the model checking problem for PHL is in general undecidable, we provide methods both for proving and for refuting a class of probabilistic hyperproperties for MDPs.





JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. (arXiv:2005.03361v1 [cs.CL])

Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora. However, they do not account for linguistic information obtained using syntactic analyzers which is known to be invaluable for several Natural Language Processing (NLP) tasks. To this end, we propose JASS, Japanese-specific Sequence to Sequence, as a novel pre-training alternative to MASS for NMT involving Japanese as the source or target language. JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training which focuses on Japanese linguistic units called bunsetsus. In our experiments on ASPEC Japanese--English and News Commentary Japanese--Russian translation we show that JASS can give results that are competitive with if not better than those given by MASS. Furthermore, we show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods indicating their complementary nature. We will release our code, pre-trained models and bunsetsu annotated data as resources for researchers to use in their own NLP tasks.





Self-Supervised Human Depth Estimation from Monocular Videos. (arXiv:2005.03358v1 [cs.CV])

Previous methods on estimating detailed human depth often require supervised training with `ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning on estimating the shape details. Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.





Estimating Blood Pressure from Photoplethysmogram Signal and Demographic Features using Machine Learning Techniques. (arXiv:2005.03357v1 [eess.SP])

Hypertension is a potentially dangerous condition that can be identified directly from blood pressure (BP). Hypertension always leads to other health complications. Continuous monitoring of BP is very important; however, cuff-based BP measurements are discrete and uncomfortable for the user. To address this need, a cuff-less, continuous, and non-invasive BP measurement system is proposed that applies machine learning (ML) algorithms to photoplethysmogram (PPG) signals and demographic features. PPG signals were acquired from 219 subjects and then underwent pre-processing and feature extraction. Time, frequency, and time-frequency domain features were extracted from the PPG signals and their derivatives. Feature selection techniques were used to reduce the computational complexity and to decrease the chance of over-fitting the ML algorithms. The features were then used to train and evaluate ML algorithms. The best regression models were selected for Systolic BP (SBP) and Diastolic BP (DBP) estimation individually. Gaussian Process Regression (GPR) combined with the ReliefF feature selection algorithm outperforms the other algorithms in estimating SBP and DBP, with a root-mean-square error (RMSE) of 6.74 and 3.59, respectively. This ML model can be implemented in hardware systems to continuously monitor BP and avoid critical health conditions due to sudden changes.
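For readers unfamiliar with the reported metric, the sketch below shows how a root-mean-square error is computed from paired predictions and reference measurements. The sample readings are invented for illustration and are not taken from the study.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Invented systolic readings, for illustration only.
reference_sbp = [120.0, 135.0, 110.0, 142.0]
predicted_sbp = [123.0, 131.0, 110.0, 147.0]
error = rmse(predicted_sbp, reference_sbp)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.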





DramaQA: Character-Centered Video Story Understanding with Hierarchical QA. (arXiv:2005.03356v1 [cs.CL])

Despite recent progress in computer vision and natural language processing, video understanding intelligence is still hard to achieve because of the intrinsic difficulty of stories in video. Moreover, there is no theoretical metric for evaluating the degree of video understanding. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of main characters, and coreference-resolved scripts. Additionally, we provide analyses of the dataset as well as a Dual Matching Multistream model that effectively learns character-centered representations of video to answer questions about the video. We are planning to release our dataset and model publicly for research purposes and expect that our work will provide a new perspective on video story understanding research.





Quantum correlation alignment for unsupervised domain adaptation. (arXiv:2005.03355v1 [quant-ph])

Correlation alignment (CORAL), a representative domain adaptation (DA) algorithm, decorrelates and aligns a labelled source domain dataset to an unlabelled target domain dataset to minimize the domain shift, so that a classifier can be applied to predict the target domain labels. In this paper, we implement CORAL on quantum devices by two different methods. One method utilizes quantum basic linear algebra subroutines (QBLAS) to implement CORAL with exponential speedup in the number and dimension of the given data samples. The other method is achieved through a variational hybrid quantum-classical procedure. In addition, numerical experiments with three different types of data sets, namely synthetic data, synthetic-Iris data, and handwritten digit data, are presented to evaluate the performance of our work. The simulation results prove that the variational quantum correlation alignment algorithm (VQCORAL) can achieve competitive performance compared with classical CORAL.





DMCP: Differentiable Markov Channel Pruning for Neural Networks. (arXiv:2005.03354v1 [cs.CV])

Recent work suggests that channel pruning can be regarded as searching for the optimal sub-structure of an unpruned network.

However, existing works based on this observation require training and evaluating a large number of structures, which limits their application.

In this paper, we propose a novel differentiable method for channel pruning, named Differentiable Markov Channel Pruning (DMCP), to efficiently search for the optimal sub-structure.

Our method is differentiable and can be directly optimized by gradient descent with respect to standard task loss and budget regularization (e.g. FLOPs constraint).

In DMCP, we model channel pruning as a Markov process, in which each state represents retaining the corresponding channel during pruning, and transitions between states denote the pruning process.

In the end, our method is able to implicitly select the proper number of channels in each layer by the Markov process with optimized transitions. To validate the effectiveness of our method, we perform extensive experiments on Imagenet with ResNet and MobilenetV2.

Results show that our method achieves consistent improvement over state-of-the-art pruning methods in various FLOPs settings. The code is available at https://github.com/zx55/dmcp
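One way to read the Markov model described above: if channel k can only be retained when channel k-1 is retained, then the probability of keeping channel k is the product of the transition probabilities up to k, and the expected layer width is the sum of those products. The sketch below works through that arithmetic. It is an interpretation of the abstract, not the released implementation, and the names and values are assumptions.

```python
def retention_probabilities(transitions):
    """transitions[k] is the assumed P(keep channel k | channel k-1 kept)."""
    marginals = []
    running = 1.0
    for p in transitions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
        running *= p  # channel k survives only if every earlier channel did
        marginals.append(running)
    return marginals


def expected_channels(transitions):
    """Expected number of retained channels in one layer."""
    return sum(retention_probabilities(transitions))


# Hypothetical transition probabilities for a four-channel layer.
transitions = [1.0, 0.5, 0.5, 0.5]
marginals = retention_probabilities(transitions)
expected = expected_channels(transitions)
```

Since the expected width is a smooth function of the transition probabilities, a budget term such as a FLOPs constraint built from it can be optimized by gradient descent, which matches the abstract's description of the method as differentiable.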





Pricing under a multinomial logit model with non linear network effects. (arXiv:2005.03352v1 [cs.GT])

We study the problem of pricing under a multinomial logit model in which we incorporate network effects on the consumers' decisions. We analyse both cases, in which sellers compete or collaborate. In particular, we pay special attention to the overall expected revenue and to how the behaviour of the no-purchase option is affected by variations of a network effect parameter. For example, we prove that the market share of the no-purchase option is decreasing in the value of the network effect, meaning that stronger communication among customers increases the expected amount of sales. We also analyse how the customer's utility is altered when network effects are incorporated into the market, comparing the cases where competitive and monopolistic prices are displayed. We use tools from stochastic approximation algorithms to prove that the probability of purchasing the available products converges to a unique stationary distribution. We model the sellers as using this stationary distribution to establish their strategies. We find that, under those settings, a pure Nash equilibrium represents the pricing strategies in the case of competition, and an optimal fixed price (one that maximises the total revenue) characterises the case of collaboration.





Error estimates for the Cahn--Hilliard equation with dynamic boundary conditions. (arXiv:2005.03349v1 [math.NA])

A proof of convergence is given for bulk--surface finite element semi-discretisation of the Cahn--Hilliard equation with Cahn--Hilliard-type dynamic boundary conditions in a smooth domain. The semi-discretisation is studied in the weak formulation as a second order system. Optimal-order uniform-in-time error estimates are shown in the $L^2$ and $H^1$ norms. The error estimates are based on a consistency and stability analysis. The proof of stability is performed in an abstract framework, based on energy estimates exploiting the anti-symmetric structure of the second order system. Numerical experiments illustrate the theoretical results.





Regression Forest-Based Atlas Localization and Direction Specific Atlas Generation for Pancreas Segmentation. (arXiv:2005.03345v1 [cs.CV])

This paper proposes a fully automated atlas-based method for pancreas segmentation from CT volumes, utilizing atlas localization by regression forest and atlas generation using blood vessel information. Previous probabilistic atlas-based pancreas segmentation methods cannot deal well with the spatial variations commonly found in the pancreas. Also, shape variations are not represented by an averaged atlas. We propose a fully automated pancreas segmentation method that deals with both types of variation. The position and size of the pancreas are estimated using a regression forest technique. After localization, a patient-specific probabilistic atlas is generated based on a new image similarity that reflects the blood vessel position and direction information around the pancreas. We segment the pancreas using the EM algorithm with the atlas as prior, followed by graph cut. In evaluation results using 147 CT volumes, the Jaccard index and the Dice overlap of the proposed method were 62.1% and 75.1%, respectively. Although we automated all of the segmentation processes, segmentation results were superior to the other state-of-the-art methods in the Dice overlap.
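The two overlap measures reported above can be computed from a predicted and a reference segmentation as follows. This is a minimal sketch that represents each segmentation as a set of voxel coordinates; the tiny example masks are invented.

```python
def jaccard_index(predicted: set, reference: set) -> float:
    """Intersection over union of two voxel sets."""
    union = predicted | reference
    return len(predicted & reference) / len(union) if union else 1.0


def dice_overlap(predicted: set, reference: set) -> float:
    """Twice the intersection divided by the sum of the two set sizes."""
    total = len(predicted) + len(reference)
    return 2 * len(predicted & reference) / total if total else 1.0


# Invented 2D masks given as (row, column) coordinates.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (1, 2)}

jaccard = jaccard_index(predicted, reference)  # 2 shared of 5 total
dice = dice_overlap(predicted, reference)      # 2 * 2 / (4 + 3)
```

For a single pair of masks the two are linked by Dice = 2J / (1 + J), but that identity does not carry over to averages across many volumes, so averaged figures need not satisfy it.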





Arranging Test Tubes in Racks Using Combined Task and Motion Planning. (arXiv:2005.03342v1 [cs.RO])

The paper develops a robotic manipulation system to address the pressing need to handle a large number of test tubes in clinical examination and to replace or reduce human labor. It presents the technical details of the system, which separates and arranges test tubes in racks with the help of 3D vision and artificial intelligence (AI) reasoning/planning. The developed system only requires a person to put a rack with mixed and non-arranged tubes in front of a robot. The robot autonomously performs recognition, reasoning, planning, manipulation, etc., and returns a rack with separated and arranged tubes. The system is simple to use and requires no expert knowledge in robotics. We expect such a system to play an important role in helping manage public health and hope similar systems could be extended to other clinical manipulation, like handling mixers and pipettes, in the future.





Scene Text Image Super-Resolution in the Wild. (arXiv:2005.03341v1 [cs.CV])

Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An intuitive solution is to introduce super-resolution (SR) techniques as pre-processing. However, previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images (e.g., bicubic down-sampling), which are simple and not suitable for real low-resolution text recognition. To this end, we propose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild. It is more authentic and challenging than synthetic data, as shown in Fig. 1. We argue that improving the recognition accuracy is the ultimate goal for scene text SR. For this purpose, a new Text Super-Resolution Network, termed TSRN, with three novel modules is developed. (1) A sequential residual block is proposed to extract the sequential information of the text images. (2) A boundary-aware loss is designed to sharpen the character boundaries. (3) A central alignment module is proposed to relieve the misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate that our TSRN improves the recognition accuracy by over 13% for CRNN and by nearly 9.0% for ASTER and MORAN compared to synthetic SR data. Furthermore, our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the recognition accuracy of LR images in TextZoom. For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN. Our results suggest that low-resolution text recognition in the wild is far from being solved, and thus more research effort is needed.





Wavelet Integrated CNNs for Noise-Robust Image Classification. (arXiv:2005.03337v1 [cs.CV])

Convolutional Neural Networks (CNNs) are generally prone to noise interruptions, i.e., small image noise can cause drastic changes in the output. To suppress the effect of noise on the final prediction, we enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets such as Haar, Daubechies, and Cohen, and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into low-frequency and high-frequency components during down-sampling. The low-frequency component stores the main information, including the basic object structures, and is transmitted to the subsequent layers to extract robust high-level features. The high-frequency components, containing most of the data noise, are dropped during inference to improve the noise-robustness of the WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
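To make the down-sampling idea concrete, here is a minimal one-dimensional Haar sketch. WaveCNets operate on two-dimensional feature maps inside a network, so this is only an illustration of the decomposition, and the function names and sample row are assumptions.

```python
import math


def haar_dwt_1d(signal):
    """Split an even-length signal into low- and high-frequency halves."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [(a + b) * scale for a, b in pairs]   # scaled pairwise sums
    high = [(a - b) * scale for a, b in pairs]  # scaled pairwise differences
    return low, high


def haar_idwt_1d(low, high):
    """Inverse transform: rebuild the signal from both bands."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for l, h in zip(low, high):
        signal.extend([(l + h) * scale, (l - h) * scale])
    return signal


# Invented feature row: two smooth regions with small fluctuations.
row = [4.0, 4.2, 8.0, 7.8]
low, high = haar_dwt_1d(row)
downsampled = low  # keep only the low-frequency band, halving the length
reconstructed = haar_idwt_1d(low, high)
```

In this example the small fluctuations end up in `high`, while `low` keeps the overall structure at half the resolution. Keeping only `low` plays the role that pooling or strided convolution would otherwise play.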





Causal Paths in Temporal Networks of Face-to-Face Human Interactions. (arXiv:2005.03333v1 [cs.SI])

In a temporal network, causal paths are characterized by the fact that links from a source to a target must respect the chronological order. In this article we study the structure of causal paths in temporal networks of human face-to-face interactions in different social contexts. In a static network, paths are transitive, i.e., the existence of a link from $a$ to $b$ and from $b$ to $c$ implies the existence of a path from $a$ to $c$ via $b$. In a temporal network, the chronological constraint introduces time correlations that affect transitivity. A probabilistic model based on higher-order Markov chains shows that correlations that can invalidate transitivity are present only when the time gap between consecutive events is larger than the average value, and are negligible below that value. The comparison between the densities of the temporal and static accessibility matrices shows that the static representation can be used as a good approximation. Moreover, we quantify the extent of the causally connected region of the networks over time.
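The chronological constraint can be illustrated with a short reachability routine over timestamped contacts. This is a minimal sketch: the event format, the function name, and the rule that a node cannot forward in the same time step in which it is first reached are all assumptions made for the example.

```python
def causally_reachable(events, source):
    """Return the earliest arrival time at each node reachable from source.

    events is a list of (time, a, b) undirected contacts. A path is causal
    only if its contacts occur at strictly increasing times.
    """
    arrival = {source: float("-inf")}
    for time, a, b in sorted(events):
        for u, v in ((a, b), (b, a)):
            reached_before = u in arrival and arrival[u] < time
            if reached_before and time < arrival.get(v, float("inf")):
                arrival[v] = time
    return arrival


# Same static links a-b and b-c, with opposite time orderings.
forward = causally_reachable([(1, "a", "b"), (2, "b", "c")], "a")
backward = causally_reachable([(2, "a", "b"), (1, "b", "c")], "a")
```

In `forward`, node `c` is reached at time 2. In `backward`, the b-c contact happens before `a` ever meets `b`, so `c` is not reachable even though the static graph contains the path a-b-c.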





Global Distribution of Google Scholar Citations: A Size-independent Institution-based Analysis. (arXiv:2005.03324v1 [cs.DL])

Most currently available schemes for performance-based ranking of universities or research organizations, such as Quacquarelli Symonds (QS), Times Higher Education (THE), and the Shanghai-based Academic Ranking of World Universities (ARWU), use a variety of criteria that include productivity, citations, awards, reputation, etc., while Leiden and Scimago use only bibliometric indicators. The research performance evaluation in the aforesaid cases is based on bibliometric data from Web of Science or Scopus, which are commercial, paid databases. The coverage includes peer-reviewed journals and conference proceedings. Google Scholar (GS), on the other hand, provides a free and open alternative for obtaining citations of papers available on the net (though it is not clear exactly which journals are covered). Citations are collected automatically from the net and also added to self-created individual author profiles under Google Scholar Citations (GSC). This data was used by Webometrics Lab, Spain to create a ranked list of 4000+ institutions in 2016, based on citations from only the top 10 individual GSC profiles in each organization. (GSC excludes the top paper for reasons explained in the text; the simple selection procedure makes the ranked list size-independent as claimed by the Cybermetrics Lab). Using this data (Transparent Ranking TR, 2016), we find the regional and country-wise distribution of GS-TR citations. The size-independent ranked list is subdivided into deciles of 400 institutions each, and the number of institutions and citations of each country is obtained for each decile. We test for correlation between institutional ranks in GS-TR and the other ranking schemes for the top 20 institutions.





Specification and Automated Analysis of Inter-Parameter Dependencies in Web APIs. (arXiv:2005.03320v1 [cs.SE])

Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and interact with services without human intervention. In this article, we present an approach for the specification and automated analysis of inter-parameter dependencies in web APIs. We first present a domain-specific language, called Inter-parameter Dependency Language (IDL), for the specification of dependencies among input parameters in web services. Then, we propose a mapping to translate an IDL document into a constraint satisfaction problem (CSP), enabling the automated analysis of IDL specifications using standard CSP-based reasoning operations. Specifically, we present a catalogue of nine analysis operations on IDL documents that make it possible to compute, for example, whether a given request satisfies all the dependencies of the service. Finally, we present a tool suite including an editor, a parser, an OAS extension, a constraint programming-aided library, and a test suite supporting IDL specifications and their analyses. Together, these contributions pave the way for a new range of specification-driven applications in areas such as code generation and testing.
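To illustrate the kind of check mentioned above (whether a given request satisfies all the dependencies of a service), here is a minimal sketch that evaluates dependencies directly as Python predicates. It does not use IDL syntax or a CSP solver, so it stands in for only one of the analysis operations. The helper names and the example parameters are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
Dependency = Tuple[str, Callable[[Request], bool]]


def requires(param: str, needed: str) -> Dependency:
    """If param is present, needed must be present too."""
    return (f"{param} requires {needed}",
            lambda req: param not in req or needed in req)


def only_one(*params: str) -> Dependency:
    """Exactly one of the listed parameters must be present."""
    return (f"only one of {', '.join(params)}",
            lambda req: sum(p in req for p in params) == 1)


def violated(request: Request, dependencies: List[Dependency]) -> List[str]:
    """Names of the dependencies that the request does not satisfy."""
    return [name for name, check in dependencies if not check(request)]


# Hypothetical dependencies of a search endpoint.
dependencies = [requires("radius", "location"), only_one("query", "id")]

bad = violated({"query": "cafe", "radius": 500}, dependencies)
good = violated({"id": 7}, dependencies)
```

Direct evaluation can validate a concrete request, but the CSP mapping described in the abstract is what enables the other operations, such as reasoning over all possible requests.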





Encoding in the Dark Grand Challenge: An Overview. (arXiv:2005.03315v1 [eess.IV])

A big part of the video content we consume from video providers consists of genres featuring low-light aesthetics. Low light sequences have special characteristics, such as spatio-temporal varying acquisition noise and light flickering, that make the encoding process challenging. To deal with the spatio-temporal incoherent noise, higher bitrates are used to achieve high objective quality. Additionally, the quality assessment metrics and methods have not been designed, trained or tested for this type of content. This has inspired us to trigger research in that area and propose a Grand Challenge on encoding low-light video sequences. In this paper, we present an overview of the proposed challenge, and test state-of-the-art methods that will be part of the benchmark methods at the stage of the participants' deliverable assessment. From this exploration, our results show that VVC already achieves a high performance compared to simply denoising the video source prior to encoding. Moreover, the quality of the video streams can be further improved by employing a post-processing image enhancement method.





Boosting Cloud Data Analytics using Multi-Objective Optimization. (arXiv:2005.03314v1 [cs.DB])

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them in a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. When compared to Ottertune, a state-of-the-art performance tuning system, our approach recommends configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different application preferences on multiple objectives.
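A Pareto optimal set contains the configurations that no other configuration beats on every objective at once. The brute-force sketch below shows the definition for two objectives that are both minimized. It is not the Progressive Frontier algorithm from the paper, and the (latency, cost) values are invented.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Keep the points that no other point dominates (quadratic time)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (latency in seconds, cost in dollars) for five job configurations.
configs = [(120.0, 8.0), (90.0, 12.0), (150.0, 6.0), (130.0, 9.0), (90.0, 15.0)]
front = pareto_front(configs)
```

Here `(130.0, 9.0)` is dropped because `(120.0, 8.0)` is both faster and cheaper, and `(90.0, 15.0)` is dropped because `(90.0, 12.0)` matches its latency at lower cost. The three remaining points are the tradeoffs a user would choose among.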





Nakdan: Professional Hebrew Diacritizer. (arXiv:2005.03312v1 [cs.CL])

We present a system for automatic diacritization of Hebrew text. The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually constructed tables and dictionaries. Besides providing state of the art diacritization accuracy, the system also supports an interface for manual editing and correction of the automatic output, and has several features which make it particularly useful for preparation of scientific editions of Hebrew texts. The system supports Modern Hebrew, Rabbinic Hebrew and Poetic Hebrew. The system is freely accessible for all use at this http URL





Interval type-2 fuzzy logic system based similarity evaluation for image steganography. (arXiv:2005.03310v1 [cs.MM])

Similarity measure, also called information measure, is a concept used to distinguish different objects. It has been studied from different contexts by employing mathematical, psychological, and fuzzy approaches. Image steganography is the art of hiding secret data into an image in such a way that it cannot be detected by an intruder. In image steganography, hiding secret data in the plain or non-edge regions of the image is significant due to the high similarity and redundancy of the pixels in their neighborhood. However, the similarity measure of the neighboring pixels, i.e., their proximity in color space, is perceptual rather than mathematical. This paper proposes an interval type 2 fuzzy logic system (IT2 FLS) to determine the similarity between the neighboring pixels by involving an instinctive human perception through a rule-based approach. The pixels of the image having high similarity values, calculated using the proposed IT2 FLS similarity measure, are selected for embedding via the least significant bit (LSB) method. We term the proposed procedure of steganography the IT2 FLS LSB method. Moreover, we have developed two more methods, namely, type 1 fuzzy logic system based least significant bits (T1FLS LSB) and Euclidean distance based similarity measures for least significant bit (SM LSB) steganographic methods. Experimental simulations were conducted for a collection of images, and quality index metrics such as PSNR, UQI, and SSIM are used. All three steganographic methods are applied on datasets and the quality metrics are calculated. The obtained stego images and results are shown and thoroughly compared to determine the efficacy of the IT2 FLS LSB method. Finally, we have done a comparative analysis of the proposed approach with the existing well-known steganographic methods to show the effectiveness of our proposed steganographic method.
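
The LSB embedding step mentioned above can be sketched in a few lines. This is only a minimal illustration of generic LSB substitution on 8-bit pixel values. The list of selected positions is a hypothetical stand-in for whatever the fuzzy similarity measure would choose, and the function names are not from the paper.

```python
from typing import List


def embed_lsb(pixels: List[int], bits: List[int], positions: List[int]) -> List[int]:
    """Write each secret bit into the least significant bit of a selected pixel."""
    if len(bits) > len(positions):
        raise ValueError("not enough selected pixels for the payload")
    stego = list(pixels)
    for bit, pos in zip(bits, positions):
        # Clear the lowest bit, then set it to the secret bit.
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract_lsb(stego: List[int], positions: List[int], n_bits: int) -> List[int]:
    """Read the payload back from the same positions, in the same order."""
    return [stego[pos] & 1 for pos in positions[:n_bits]]


# Toy grayscale values; index 3 is an "edge" pixel that the (hypothetical)
# similarity measure would skip.
cover = [200, 201, 198, 57, 199, 202]
selected = [0, 1, 2, 4, 5]  # assumed output of a similarity-based selection
secret = [1, 0, 1, 1]

stego = embed_lsb(cover, secret, selected)
recovered = extract_lsb(stego, selected, len(secret))
print(stego, recovered)
```

Each embedded pixel changes by at most one intensity level, which is why quality metrics such as PSNR and SSIM stay high for LSB methods.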





Adaptive Dialog Policy Learning with Hindsight and User Modeling. (arXiv:2005.03299v1 [cs.AI])

Reinforcement learning methods have been used to compute dialog policies from language-based interaction experiences. Efficiency is of particular importance in dialog policy learning, because of the considerable cost of interacting with people, and the very poor user experience from low-quality conversations. Aiming at improving the efficiency of dialog policy learning, we develop algorithm LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcements respectively. Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.





Knowledge Enhanced Neural Fashion Trend Forecasting. (arXiv:2005.03297v1 [cs.IR])

Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to tackling this challenging task, they only studied limited fashion elements with highly seasonal or simple patterns, which could hardly reveal the real fashion trends. Towards insightful fashion trend forecasting, this work focuses on investigating fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information. Furthermore, to effectively model the time series data of fashion elements with rather complex patterns, we propose a Knowledge Enhanced Recurrent Network model (KERN) which takes advantage of the capability of deep recurrent neural networks in modeling time-series data. Moreover, it leverages internal and external knowledge in the fashion domain that affects the time-series patterns of fashion element trends. Such incorporation of domain knowledge further enhances the deep learning model in capturing the patterns of specific fashion elements and predicting the future trends. Extensive experiments demonstrate that the proposed KERN model can effectively capture the complicated patterns of objective fashion elements, thereby producing better fashion trend forecasts.





Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data. (arXiv:2005.03295v1 [eess.AS])

We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be trained with conventional TTS datasets. We train a voice conversion system to reconstruct speech with Cotatron features, which is similar to the previous methods based on Phonetic Posteriorgram (PPG). By training and evaluating our system with 108 speakers from the VCTK dataset, we outperform the previous method in terms of both naturalness and speaker similarity. Our system can also convert speech from speakers that are unseen during training, and utilize ASR to automate the transcription with minimal reduction of the performance. Audio samples are available at https://mindslab-ai.github.io/cotatron, and the code with a pre-trained model will be made available soon.





Expressing Accountability Patterns using Structural Causal Models. (arXiv:2005.03294v1 [cs.SE])

While the exact definition and implementation of accountability depend on the specific context, at its core accountability describes a mechanism that will make decisions transparent and often provides means to sanction "bad" decisions. As such, accountability is specifically relevant for Cyber-Physical Systems, such as robots or drones, that embed themselves into a human society, take decisions and might cause lasting harm. Without a notion of accountability, such systems could behave with impunity and would not fit into society. Despite its relevance, there is currently no agreement on its meaning and, more importantly, no way to express accountability properties for these systems. As a solution we propose to express the accountability properties of systems using Structural Causal Models. They can be represented as human-readable graphical models while also offering mathematical tools to analyze and reason over them. Our central contribution is to show how Structural Causal Models can be used to express and analyze the accountability properties of systems and that this approach allows us to identify accountability patterns. These accountability patterns can be catalogued and used to improve systems and their architectures.





Deep Learning based Person Re-identification. (arXiv:2005.03293v1 [cs.CV])

Automated person re-identification in a multi-camera surveillance setup is very important for effective tracking and monitoring of crowd movement. In recent years, a few deep learning based re-identification approaches have been developed which are quite accurate but time-intensive, and hence not very suitable for practical purposes. In this paper, we propose an efficient hierarchical re-identification approach in which color histogram based comparison is first employed to find the closest matches in the gallery set, and next deep feature based comparison is carried out using a Siamese network. Reduction in search space after the first level of matching helps in achieving a fast response time as well as improving the accuracy of prediction by the Siamese network by eliminating vastly dissimilar elements. A silhouette part-based feature extraction scheme is adopted in each level of hierarchy to preserve the relative locations of the different body structures and make the appearance descriptors more discriminating in nature. The proposed approach has been evaluated on five public data sets and also a new data set captured by our team in our laboratory. Results reveal that it outperforms most state-of-the-art approaches in terms of overall accuracy.





YANG2UML: Bijective Transformation and Simplification of YANG to UML. (arXiv:2005.03292v1 [cs.SE])

Software Defined Networking is currently revolutionizing computer networking by decoupling the network control (control plane) from the forwarding functions (data plane), enabling the network control to become directly programmable and the underlying infrastructure to be abstracted for applications and network services. Next to the well-known OpenFlow protocol, the XML-based NETCONF protocol is also an important means for exchanging configuration information from a management platform and is nowadays even part of OpenFlow. In combination with NETCONF, YANG is the corresponding protocol that defines the associated data structures supporting virtually all network configuration protocols. YANG itself is a semantically rich language, which -- in order to facilitate familiarization with the relevant subject -- is often visualized to involve other experts or developers and to support them in their daily work (writing applications which make use of YANG). In order to support this process, this paper presents a novel approach to optimize and simplify YANG data models to assist further discussions with the management and implementations (especially of interfaces) to reduce complexity. Therefore, we have defined a bidirectional mapping of YANG to UML and developed a tool that renders the created UML diagrams. This combines the benefits of using the formal language YANG with automatically maintained UML diagrams to involve other experts or developers, closing the gap between technically improved data models and their human readability.





On the unique solution of the generalized absolute value equation. (arXiv:2005.03287v1 [math.NA])

In this paper, some useful necessary and sufficient conditions for the unique solution of the generalized absolute value equation (GAVE) $Ax-B|x|=b$ with $A, B \in \mathbb{R}^{n \times n}$ from the optimization field are first presented, which cover the fundamental theorem for the unique solution of the linear system $Ax=b$ with $A \in \mathbb{R}^{n \times n}$. Not only that, some new sufficient conditions for the unique solution of the GAVE are obtained, which are weaker than those in previously published works.
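
To make the uniqueness question concrete, here is a worked scalar case ($n = 1$). It is an elementary illustration only, not one of the paper's theorems; the lowercase symbols $a$, $b$, $c$ are introduced here for the scalar equation and are not the paper's notation.

```latex
% Scalar GAVE: a x - b |x| = c, with a, b, c real.
\begin{aligned}
x \ge 0 &: \quad (a - b)\,x = c, \\
x < 0   &: \quad (a + b)\,x = c.
\end{aligned}
% If a - b and a + b have the same sign, exactly one branch yields a
% solution with the correct sign of x for each c. If they have opposite
% signs, some c gives two solutions and some c gives none. If either is
% zero, uniqueness fails for some c. Hence
\text{unique solution for every } c
\iff (a - b)(a + b) > 0
\iff |a| > |b|.
% Setting b = 0 recovers a \neq 0, the usual condition for a x = c.
```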





Multi-view data capture using edge-synchronised mobiles. (arXiv:2005.03286v1 [cs.MM])

Multi-view data capture permits free-viewpoint video (FVV) content creation. To this end, several users must capture video streams, calibrated in both time and pose, framing the same object/scene, from different viewpoints. New-generation network architectures (e.g. 5G) promise lower latency and larger bandwidth connections supported by powerful edge computing, properties that seem ideal for reliable FVV capture. We have explored this possibility, aiming to remove the need for bespoke synchronisation hardware when capturing a scene from multiple viewpoints, making it possible through off-the-shelf mobiles. We propose a novel and scalable data capture architecture that exploits edge resources to synchronise and harvest frame captures. We have designed an edge computing unit that supervises the relaying of timing triggers to and from multiple mobiles, in addition to synchronising frame harvesting. We empirically show the benefits of our edge computing unit by analysing latencies and show the quality of 3D reconstruction outputs against an alternative and popular centralised solution based on Unity3D.





Continuous maximal covering location problems with interconnected facilities. (arXiv:2005.03274v1 [math.OC])

In this paper we analyze a continuous version of the maximal covering location problem, in which the facilities are required to be interconnected by means of a graph structure in which two facilities are allowed to be linked if a given distance is not exceeded. We provide a mathematical programming framework for the problem and different resolution strategies. First, we propose a Mixed Integer Non Linear Programming formulation, and derive properties of the problem that allow us to project the continuous variables out, avoiding the nonlinear constraints and resulting in an equivalent pure integer programming formulation. Since the number of constraints in the integer programming formulation is large and the constraints are, in general, difficult to handle, we propose two branch-&-cut approaches that avoid the complete enumeration of the constraints, resulting in more efficient procedures. We report the results of an extensive battery of computational experiments comparing the performance of the different approaches.





RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. (arXiv:2005.03271v1 [eess.AS])

In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched domains: e.g., end-to-end models trained on short segments perform poorly when evaluated on longer utterances. In this work, we analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models in order to identify model components that negatively affect generalization performance. We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference. On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%. Finally, when trained on Librispeech, we find that dynamic overlapping inference improves WER on YouTube from 99.8% to 33.0%.





Data selection for multi-task learning under dynamic constraints. (arXiv:2005.03270v1 [eess.SY])

Learning-based techniques are increasingly effective at controlling complex systems using data-driven models. However, most work done so far has focused on learning individual tasks or control laws. Hence, it is still a largely unaddressed research question how multiple tasks can be learned efficiently and simultaneously on the same system. In particular, no efficient state space exploration schemes have been designed for multi-task control settings. Using this research gap as our main motivation, we present an algorithm that approximates the smallest data set that needs to be collected in order to achieve high control performance for multiple learning-based control laws. We describe system uncertainty using a probabilistic Gaussian process model, which allows us to quantify the impact of potentially collected data on each learning-based controller. We then determine the optimal measurement locations by solving a stochastic optimization problem approximately. We show that, under reasonable assumptions, the approximate solution converges towards that of the exact problem. Additionally, we provide a numerical illustration of the proposed algorithm.





Online Proximal-ADMM For Time-varying Constrained Convex Optimization. (arXiv:2005.03267v1 [eess.SY])

This paper considers a convex optimization problem with cost and constraints that evolve over time. The function to be minimized is strongly convex and possibly non-differentiable, and variables are coupled through linear constraints. In this setting, the paper proposes an online algorithm based on the alternating direction method of multipliers (ADMM), to track the optimal solution trajectory of the time-varying problem; in particular, the proposed algorithm consists of a primal proximal gradient descent step and an appropriately perturbed dual ascent step. The paper derives tracking results, asymptotic bounds, and linear convergence results. The proposed algorithm is then specialized to a multi-area power grid optimization problem, and our numerical results verify the desired properties.





Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT. (arXiv:2005.03264v1 [eess.IV])

Chest computed tomography (CT) has become an effective tool to assist the diagnosis of coronavirus disease-19 (COVID-19). Due to the outbreak of COVID-19 worldwide, using computer-aided diagnosis techniques for COVID-19 classification based on CT images could largely alleviate the burden on clinicians. In this paper, we propose an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. Specifically, we first extract location-specific features from CT images. Then, in order to capture the high-level representation of these features with the relatively small-scale data, we leverage a deep forest model to learn a high-level representation of the features. Moreover, we propose a feature selection method based on the trained deep forest model to reduce the redundancy of features, where the feature selection could be adaptively incorporated with the COVID-19 classification model. We evaluated our proposed AFS-DF on a COVID-19 dataset with 1495 patients of COVID-19 and 1027 patients of community acquired pneumonia (CAP). The accuracy (ACC), sensitivity (SEN), specificity (SPE) and AUC achieved by our method are 91.79%, 93.05%, 89.95% and 96.35%, respectively. Experimental results on the COVID-19 dataset suggest that the proposed AFS-DF achieves superior performance in COVID-19 vs. CAP classification, compared with 4 widely used machine learning methods.





Quda: Natural Language Queries for Visual Data Analytics. (arXiv:2005.03257v1 [cs.CL])

Visualization-oriented natural language interfaces (V-NLIs) have been explored and developed in recent years. One challenge faced by V-NLIs is in the formation of effective design decisions, which usually requires a deep understanding of user queries. Learning-based approaches have shown potential in V-NLIs and reached state-of-the-art performance in various NLP tasks. However, because of the lack of sufficient training samples that cater to visual data analytics, cutting-edge techniques have rarely been employed to facilitate the development of V-NLIs. We present a new dataset, called Quda, to help V-NLIs understand free-form natural language. Our dataset contains 14,035 diverse user queries annotated with 10 low-level analytic tasks that assist in the deployment of state-of-the-art techniques for parsing complex human language. We achieve this goal by first gathering seed queries with data analysts who are target users of V-NLIs. Then we employ extensive crowd force for paraphrase generation and validation. We demonstrate the usefulness of Quda in building V-NLIs by creating a prototype that makes effective design decisions for free-form user queries. We also show that Quda can be beneficial for a wide range of applications in the visualization community by analyzing the design tasks described in academic publications.





DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting. (arXiv:2005.03244v1 [cs.HC])

Selecting an appropriate model to forecast product demand is critical to the manufacturing industry. However, due to the data complexity, market uncertainty and users' demanding requirements for the model, it is challenging for demand analysts to select a proper model. Although existing model selection methods can reduce the manual burden to some extent, they often fail to present model performance details on individual products and reveal the potential risk of the selected model. This paper presents DFSeer, an interactive visualization system to conduct reliable model selection for demand forecasting based on the products with similar historical demand. It supports model comparison and selection with different levels of details. Besides, it shows the difference in model performance on similar products to reveal the risk of model selection and increase users' confidence in choosing a forecasting model. Two case studies and interviews with domain experts demonstrate the effectiveness and usability of DFSeer.





Enhancing Software Development Process Using Automated Adaptation of Object Ensembles. (arXiv:2005.03241v1 [cs.SE])

Software development has been changing rapidly, and the development process can be influenced by adopting developer-friendly approaches. We can save time and accelerate development if we can automatically guide programmers during software development. Some approaches recommend relevant code snippets and API items to the developer. Some apply general code-searching techniques, and others use online repository mining strategies. However, it is quite difficult to help programmers with particular type conversion problems, specifically when they want to adapt existing interfaces to their expectations. One familiar success in guiding developers in such situations is adapting collections and arrays through automated adaptation of object ensembles. How this helps a novice developer in real-time software development, however, has not been explicitly specified. In this paper, we develop a system that works as a plugin tool integrated with a particular Data Mining Integrated Environment (DMIE) to recommend relevant interfaces when developers face a type conversion situation. We maintain a mined repository of adapter classes and related APIs, which developers can query to obtain results using the relevant transformer classes. The recommendation system is titled automated objective ensembles (AOE plugin). Our investigation so far suggests that our approach performs much better than some of the existing approaches.





Phase retrieval of complex-valued objects via a randomized Kaczmarz method. (arXiv:2005.03238v1 [cs.IT])

This paper investigates the convergence of the randomized Kaczmarz algorithm for the problem of phase retrieval of complex-valued objects. While this algorithm has been studied for the real-valued case, its generalization to the complex-valued case is nontrivial and has been left as a conjecture. This paper establishes the connection between the convergence of the algorithm and the convexity of an objective function. Based on the connection, it demonstrates that when the sensing vectors are sampled uniformly from a unit sphere and the number of sensing vectors $m$ satisfies $m>O(n\log n)$ as $n, m \rightarrow \infty$, then this algorithm with a good initialization achieves linear convergence to the solution with high probability.
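
For readers unfamiliar with the method, the following is a minimal pure-Python sketch of a randomized Kaczmarz iteration for phase retrieval, using the commonly stated update that projects the iterate onto the hyperplane matching one randomly chosen magnitude measurement. The problem size, seed, perturbation scale, and all function names are assumptions made for illustration; the sketch does not reproduce the paper's analysis or its initialization scheme.

```python
import random
from typing import List

Vector = List[complex]


def inner(a: Vector, x: Vector) -> complex:
    """Hermitian inner product a^H x."""
    return sum(ai.conjugate() * xi for ai, xi in zip(a, x))


def kaczmarz_step(x: Vector, a: Vector, b: float) -> Vector:
    """Project x onto {z : a^H z = b * phase(a^H x)} for one measurement b = |a^H x_true|."""
    ax = inner(a, x)
    if ax == 0:
        return list(x)  # phase is undefined; leave the iterate unchanged
    phase = ax / abs(ax)
    norm_sq = sum(abs(ai) ** 2 for ai in a)
    coeff = (ax - b * phase) / norm_sq
    return [xi - coeff * ai for xi, ai in zip(x, a)]


def random_unit_vector(n: int, rng: random.Random) -> Vector:
    """Normalized complex Gaussian vector, i.e. uniform on the unit sphere."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = sum(abs(c) ** 2 for c in v) ** 0.5
    return [c / norm for c in v]


def phase_distance(x: Vector, z: Vector) -> float:
    """Distance between x and z up to a global phase factor."""
    sq = sum(abs(c) ** 2 for c in x) + sum(abs(c) ** 2 for c in z) - 2 * abs(inner(x, z))
    return max(sq, 0.0) ** 0.5


def run(sensing: List[Vector], b: List[float], x0: Vector, iters: int,
        rng: random.Random) -> Vector:
    """Apply the projection step to a uniformly random measurement each iteration."""
    x = list(x0)
    for _ in range(iters):
        r = rng.randrange(len(sensing))
        x = kaczmarz_step(x, sensing[r], b[r])
    return x


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n, m = 4, 40            # hypothetical sizes
truth = random_unit_vector(n, rng)
sensing = [random_unit_vector(n, rng) for _ in range(m)]
measurements = [abs(inner(a, truth)) for a in sensing]

# "Good initialization" is simulated here by perturbing the true signal slightly.
x0 = [c + 0.05 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for c in truth]
estimate = run(sensing, measurements, x0, 2000, rng)
print(phase_distance(x0, truth), phase_distance(estimate, truth))
```

Because magnitude measurements cannot distinguish a signal from a phase-rotated copy, the error is reported up to a global phase rather than as a plain Euclidean distance.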





Mortar-based entropy-stable discontinuous Galerkin methods on non-conforming quadrilateral and hexahedral meshes. (arXiv:2005.03237v1 [math.NA])

High-order entropy-stable discontinuous Galerkin (DG) methods for nonlinear conservation laws reproduce a discrete entropy inequality by combining entropy conservative finite volume fluxes with summation-by-parts (SBP) discretization matrices. In the DG context, on tensor product (quadrilateral and hexahedral) elements, SBP matrices are typically constructed by collocating at Lobatto quadrature points. Recent work has extended the construction of entropy-stable DG schemes to collocation at more accurate Gauss quadrature points.

In this work, we extend entropy-stable Gauss collocation schemes to non-conforming meshes. Entropy-stable DG schemes require computing entropy conservative numerical fluxes between volume and surface quadrature nodes. On conforming tensor product meshes where volume and surface nodes are aligned, flux evaluations are required only between "lines" of nodes. However, on non-conforming meshes, volume and surface nodes are no longer aligned, resulting in a larger number of flux evaluations. We reduce this expense by introducing an entropy-stable mortar-based treatment of non-conforming interfaces via a face-local correction term, and provide necessary conditions for high-order accuracy. Numerical experiments in both two and three dimensions confirm the stability and accuracy of this approach.





Safe Reinforcement Learning through Meta-learned Instincts. (arXiv:2005.03233v1 [cs.LG])

An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions performed by the agent lead to unsafe states in the environment. Here we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn during their lifetime while avoiding potentially hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with no-go zones, in which the agent has to learn to approach new targets during deployment. MLIN outperforms standard meta-trained networks and allows agents to learn to navigate to new targets without colliding with any of the no-go zones. These results suggest that meta-learning augmented with an instinctual network is a promising new approach for safe AI, which may enable progress in this area on a variety of different domains.





Multi-Target Deep Learning for Algal Detection and Classification. (arXiv:2005.03232v1 [cs.CV])

Water quality has a direct impact on industry, agriculture, and public health. Algae species are common indicators of water quality because algal communities are sensitive to changes in their habitats, giving valuable knowledge on variations in water quality. However, water quality analysis requires professional inspection of algal detection and classification under microscopes, which is very time-consuming and tedious. In this paper, we propose a novel multi-target deep learning framework for algal detection and classification. Extensive experiments were carried out on a large-scale colored microscopic algal dataset. Experimental results demonstrate that the proposed method achieves promising performance on algal detection, class identification and genus identification.





Constructing Accurate and Efficient Deep Spiking Neural Networks with Double-threshold and Augmented Schemes. (arXiv:2005.03231v1 [cs.NE])

Spiking neural networks (SNNs) are considered a potential candidate to overcome current challenges such as the high power consumption encountered by artificial neural networks (ANNs). However, there is still a gap between them with respect to recognition accuracy on practical tasks. A conversion strategy was thus introduced recently to bridge this gap by mapping a trained ANN to an SNN. However, it is still unclear to what extent the obtained SNN can benefit from both the accuracy advantage of the ANN and the high efficiency of the spike-based paradigm of computation. In this paper, we propose two new conversion methods, namely TerMapping and AugMapping. The TerMapping is a straightforward extension of a typical threshold-balancing method with a double-threshold scheme, while the AugMapping additionally incorporates a new scheme of augmented spike that employs a spike coefficient to carry the number of typical all-or-nothing spikes occurring at a time step. We examine the performance of our methods based on MNIST, Fashion-MNIST and CIFAR10 datasets. The results show that the proposed double-threshold scheme can effectively improve accuracies of the converted SNNs. More importantly, the proposed AugMapping is more advantageous for constructing accurate, fast and efficient deep SNNs as compared to other state-of-the-art approaches. Our study therefore provides new approaches for further integration of advanced techniques in ANNs to improve the performance of SNNs, which could be of great merit to applied developments with spike-based neuromorphic computing.





Hierarchical Predictive Coding Models in a Deep-Learning Framework. (arXiv:2005.03230v1 [cs.CV])

Bayesian predictive coding is a putative neuromorphic method for acquiring higher-level neural representations to account for sensory input. Although originating in the neuroscience community, there are also efforts in the machine learning community to study these models. This paper reviews some of the more well known models. Our review analyzes module connectivity and patterns of information transfer, seeking to find general principles used across the models. We also survey some recent attempts to cast these models within a deep learning framework. A defining feature of Bayesian predictive coding is that it uses top-down, reconstructive mechanisms to predict incoming sensory inputs or their lower-level representations. Discrepancies between the predicted and the actual inputs, known as prediction errors, then give rise to future learning that refines and improves the predictive accuracy of learned higher-level representations. Predictive coding models intended to describe computations in the neocortex emerged prior to the development of deep learning and used a communication structure between modules that we name the Rao-Ballard protocol. This protocol was derived from a Bayesian generative model with some rather strong statistical assumptions. The RB protocol provides a rubric to assess the fidelity of deep learning models that claim to implement predictive coding.





Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning. (arXiv:2005.03227v1 [eess.IV])

Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world. Due to the large number of affected patients and heavy labor for doctors, computer-aided diagnosis with machine learning algorithms is urgently needed, and could largely reduce the efforts of clinicians and accelerate the diagnosis process. Chest computed tomography (CT) has been recognized as an informative tool for diagnosis of the disease. In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images. To fully explore multiple features describing CT images from different views, a unified latent representation is learned which can completely encode information from different aspects of features and is endowed with promising class structure for separability. Specifically, the completeness is guaranteed with a group of backward neural networks (each for one type of features), while by using class labels the representation is enforced to be compact within COVID-19/community-acquired pneumonia (CAP) and also a large margin is guaranteed between different types of pneumonia. In this way, our model can well avoid overfitting compared to the case of directly projecting high-dimensional features into classes. Extensive experimental results show that the proposed method outperforms all comparison methods, and rather stable performances are observed when varying the numbers of training data.