Quantization of Lax integrable systems and Conformal Field Theory. (arXiv:2005.03053v1 [math-ph])

We present the correspondence between Lax integrable systems with spectral parameter on a Riemann surface, and Conformal Field Theories, in a quite general set-up suggested earlier by the author. This correspondence turns out to give a prequantization of the integrable systems in question.




General Asymptotic Regional Gradient Observer. (arXiv:2005.03009v1 [math.OC])

The main purpose of this paper is to study and characterize the existence of a general asymptotic regional gradient observer, which observes the current gradient state of the original system in connection with gradient strategic sensors. We give an approach based on the Luenberger observer theory of linear distributed parameter systems, which makes it possible to determine asymptotically a regional gradient estimator of the current gradient state of the system. More precisely, we examine under which conditions asymptotic regional gradient observability can be achieved. Furthermore, we show that the measurement structures allow the existence of a general asymptotic regional gradient observer, and we give a sufficient condition for such an observer in the general case. We also show that there exists a dynamical system that is not a general asymptotic gradient observer for the considered system in the usual sense, but that may be a general asymptotic regional gradient observer. To this end, we present various results related to different types of sensor structures, domains and boundary conditions in two-dimensional distributed diffusion systems.




GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU. (arXiv:1908.01407v3 [cs.DC] CROSS LISTED)

High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs, because of three challenges: (1) difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address these challenges, GraphBLAS is an innovative, on-going effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first open-source linear algebra-based graph framework on GPU targeting high-performance computing. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.




Multi-Resolution POMDP Planning for Multi-Object Search in 3D. (arXiv:2005.02878v2 [cs.RO] UPDATED)

Robots operating in household environments must find objects on shelves, under tables, and in cupboards. Previous work often formulates the object search problem as a POMDP (Partially Observable Markov Decision Process), yet constrains the search space to 2D. We propose a new approach that enables the robot to efficiently search for objects in 3D, taking occlusions into account. We model the problem as an object-oriented POMDP, where the robot receives a volumetric observation from a viewing frustum and must produce a policy to efficiently search for objects. To address the challenge of large state and observation spaces, we first propose a per-voxel observation model which drastically reduces the observation size necessary for planning. Then, we present a novel octree-based belief representation which captures beliefs at different resolutions and supports efficient exact belief update. Finally, we design an online multi-resolution planning algorithm that leverages the resolution layers in the octree structure as levels of abstraction of the original POMDP problem. Our evaluation in a simulated 3D domain shows that, as the problem scales, our approach significantly outperforms baselines without resolution hierarchy by 25%-35% in cumulative reward. We demonstrate the practicality of our approach on a torso-actuated mobile robot searching for objects in areas of a cluttered lab environment where objects appear on surfaces at different heights.




Modeling nanoconfinement effects using active learning. (arXiv:2005.02587v2 [physics.app-ph] UPDATED)

Predicting the spatial configuration of gas molecules in nanopores of shale formations is crucial for fluid flow forecasting and hydrocarbon reserves estimation. The key challenge in these tight formations is that the majority of the pore sizes are less than 50 nm. At this scale, the fluid properties are affected by nanoconfinement effects due to the increased fluid-solid interactions. For instance, gas adsorption to the pore walls could account for up to 85% of the total hydrocarbon volume in a tight reservoir. Although there are analytical solutions that describe this phenomenon for simple geometries, they are not suitable for describing realistic pores, where surface roughness and geometric anisotropy play important roles. To describe these, molecular dynamics (MD) simulations are used since they consider fluid-solid and fluid-fluid interactions at the molecular level. However, MD simulations are computationally expensive, and are not able to simulate scales larger than a few connected nanopores. We present a method for building and training physics-based deep learning surrogate models to carry out fast and accurate predictions of molecular configurations of gas inside nanopores. Since training deep learning models requires extensive databases that are computationally expensive to create, we employ active learning (AL). AL reduces the overhead of creating comprehensive sets of high-fidelity data by determining where the model uncertainty is greatest, and running simulations on the fly to minimize it. The proposed workflow enables nanoconfinement effects to be rigorously considered at the mesoscale where complex connected sets of nanopores control key applications such as hydrocarbon recovery and CO2 sequestration.




The Cascade Transformer: an Application for Efficient Answer Sentence Selection. (arXiv:2005.02534v2 [cs.CL] UPDATED)

Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt transformer-based models into a cascade of rankers. Each ranker is used to prune a subset of candidates in a batch, thus dramatically increasing throughput at inference time. Partial encodings from the transformer model are shared among rerankers, providing further speed-up. When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy, as measured on two English Question Answering datasets.
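
The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the Cascade Transformer itself: the function `cascade_rank`, the `keep_fraction` parameter, and the use of plain scoring callables in place of transformer layers are all assumptions made for illustration, and the sharing of partial encodings between rankers is not modeled.

```python
import math
from typing import Callable, List, Sequence


def cascade_rank(
    candidates: Sequence[str],
    rankers: Sequence[Callable[[str], float]],
    keep_fraction: float = 0.5,
) -> List[str]:
    """Run candidates through a sequence of rankers, pruning after each stage.

    Hypothetical sketch: every ranker except the last keeps only the top
    `keep_fraction` of the surviving candidates; the last ranker only orders
    the survivors.
    """
    survivors = list(candidates)
    for stage, score in enumerate(rankers):
        # Order the current survivors by this stage's score, best first.
        survivors.sort(key=score, reverse=True)
        if stage < len(rankers) - 1:
            # Prune so that later (costlier) stages see fewer candidates.
            keep = max(1, math.ceil(len(survivors) * keep_fraction))
            survivors = survivors[:keep]
    return survivors
```

With two stages and `keep_fraction=0.5`, the second ranker scores only half of the original batch, which is where the throughput gain would come from in this simplified picture.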




On the list recoverability of randomly punctured codes. (arXiv:2005.02478v2 [math.CO] UPDATED)

We show that a random puncturing of a code with good distance is list recoverable beyond the Johnson bound. In particular, this implies that there are Reed-Solomon codes that are list recoverable beyond the Johnson bound. It was previously known that there are Reed-Solomon codes that do not have this property. As an immediate corollary to our main theorem, we obtain better degree bounds on unbalanced expanders that come from Reed-Solomon codes.




Temporal Event Segmentation using Attention-based Perceptual Prediction Model for Continual Learning. (arXiv:2005.02463v2 [cs.CV] UPDATED)

Temporal event segmentation of a long video into coherent events requires a high-level understanding of activities' temporal features. The event segmentation problem has been tackled by researchers in an offline training scheme, either by providing full, or weak, supervision through manually annotated labels or by self-supervised epoch-based training. In this work, we present a continual learning perceptual prediction framework (influenced by cognitive psychology) capable of temporal event segmentation through understanding of the underlying representation of objects within individual frames. Our framework also outputs attention maps which effectively localize and track event-causing objects in each frame. The model is tested on a wildlife monitoring dataset in a continual training manner, resulting in an $80\%$ recall rate at a $20\%$ false positive rate for frame-level segmentation. Activity-level testing has yielded an $80\%$ activity recall rate for one false activity detection every 50 minutes.




The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. (arXiv:2005.01348v2 [cs.CL] UPDATED)

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones. Overall, humans are correct more often than out-of-the-box models, and the models are sometimes right for the wrong reasons. Finally, we show that fine-tuning on a large, task-specific dataset can offer a solution to these issues.




Prediction of Event Related Potential Speller Performance Using Resting-State EEG. (arXiv:2005.01325v3 [cs.HC] UPDATED)

An event-related potential (ERP) speller can be utilized in device control and communication for locked-in or severely injured patients. However, problems such as inter-subject performance instability and ERP-illiteracy are still unresolved. Therefore, it is necessary to predict classification performance before performing an ERP speller session in order to use it efficiently. In this study, we investigated the correlations between ERP speller performance and resting-state EEG recorded before the ERP speller session. Specifically, we used spectral power and functional connectivity according to four brain regions and five frequency bands. As a result, the delta power in the frontal region and functional connectivity in the delta, alpha, and gamma bands are significantly correlated with the ERP speller performance. Also, we predicted the ERP speller performance using EEG features in the resting state. These findings may contribute to investigating ERP-illiteracy and considering the appropriate alternatives for each user.




Quantum arithmetic operations based on quantum Fourier transform on signed integers. (arXiv:2005.00443v2 [cs.IT] UPDATED)

The quantum Fourier transform (QFT) brings efficiency in many respects, especially resource usage, for most operations on quantum computers. In this study, the existing QFT-based and non-QFT-based quantum arithmetic operations are examined. The capabilities of QFT-based addition and multiplication are improved with some modifications. The proposed operations are compared with the nearest quantum arithmetic operations. Furthermore, novel QFT-based subtraction and division operations are presented. The proposed arithmetic operations can perform non-modular operations on all signed numbers without any limitation by using fewer resources. In addition, novel quantum circuits of two's complement, absolute value and comparison operations are also presented by using the proposed QFT-based addition and subtraction operations.




On-board Deep-learning-based Unmanned Aerial Vehicle Fault Cause Detection and Identification. (arXiv:2005.00336v2 [eess.SP] UPDATED)

With the increase in use of Unmanned Aerial Vehicles (UAVs)/drones, it is important to detect and identify causes of failure in real time for proper recovery from a potential crash-like scenario or post-incident forensics analysis. The cause of a crash could be either a fault in the sensor/actuator system, physical damage/attack, or a cyber attack on the drone's software. In this paper, we propose novel architectures based on deep Convolutional and Long Short-Term Memory Neural Networks (CNNs and LSTMs) to detect (via Autoencoder) and classify drone mis-operations based on sensor data. The proposed architectures are able to learn high-level features automatically from the raw sensor data and learn the spatial and temporal dynamics in the sensor data. We validate the proposed deep-learning architectures via simulations and experiments on a real drone. Empirical results show that our solution is able to detect with over 90% accuracy and classify various types of drone mis-operations (with about 99% accuracy (simulation data) and up to 88% accuracy (experimental data)).




Generative Adversarial Networks in Digital Pathology: A Survey on Trends and Future Potential. (arXiv:2004.14936v2 [eess.IV] UPDATED)

Image analysis in the field of digital pathology has recently gained increased popularity. The use of high-quality whole slide scanners enables the fast acquisition of large amounts of image data, showing extensive context and microscopic detail at the same time. Simultaneously, novel machine learning algorithms have boosted the performance of image analysis approaches. In this paper, we focus on a particularly powerful class of architectures, called Generative Adversarial Networks (GANs), applied to histological image data. Besides improving performance, GANs also enable application scenarios in this field which were previously intractable. However, GANs could exhibit a potential for introducing bias. Here, we summarize the recent state-of-the-art developments in a generalizing notation, present the main applications of GANs and give an outlook on some chosen promising approaches and their possible future applications. In addition, we identify currently unavailable methods with potential for future applications.




Towards Embodied Scene Description. (arXiv:2004.14638v2 [cs.RO] UPDATED)

Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from the interaction between the agent and the environment. In this work, we propose the Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A learning framework with the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real world robotic platform demonstrating the effectiveness and extendability of the developed method.




Self-Attention with Cross-Lingual Position Representation. (arXiv:2004.13310v2 [cs.CL] UPDATED)

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios, e.g. machine translation, the PEs of source and target sentences are modeled independently. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with \emph{cross-lingual position representations} to model the bilingually aware latent structure for the input sentence. Specifically, we utilize bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English$\Rightarrow$German, WAT'17 Japanese$\Rightarrow$English, and WMT'17 Chinese$\Leftrightarrow$English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.




Jealousy-freeness and other common properties in Fair Division of Mixed Manna. (arXiv:2004.11469v2 [cs.GT] UPDATED)

We consider a fair division setting where indivisible items are allocated to agents. Each agent in the setting has strictly negative, zero or strictly positive utility for each item. We, thus, make a distinction between items that are good for some agents and bad for other agents (i.e. mixed), good for everyone (i.e. goods) or bad for everyone (i.e. bads). For this model, we study axiomatic concepts of allocations such as jealousy-freeness up to one item, envy-freeness up to one item and Pareto-optimality. We obtain many new possibility and impossibility results in regard to combinations of these properties. We also investigate new computational tasks related to such combinations. Thus, we advance the state-of-the-art in fair division of mixed manna.




On the regularity of De Bruijn multigrids. (arXiv:2004.10128v2 [cs.DM] UPDATED)

In this paper we prove that any odd multigrid with non-zero rational offsets is regular, which means that its dual is a rhombic tiling. To prove this result we use a result on trigonometric diophantine equations.




SPECTER: Document-level Representation Learning using Citation-informed Transformers. (arXiv:2004.07180v3 [cs.CL] UPDATED)

Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.




The growth rate over trees of any family of set defined by a monadic second order formula is semi-computable. (arXiv:2004.06508v3 [cs.DM] UPDATED)

Monadic second order logic can be used to express many classical notions of sets of vertices of a graph, for instance: dominating sets, induced matchings, perfect codes, independent sets or irredundant sets. Bounds on the number of sets of any such family of sets are interesting from a combinatorial point of view and have algorithmic applications. Many such bounds on different families of sets over different classes of graphs are already provided in the literature. In particular, Rote recently showed that the number of minimal dominating sets in trees of order $n$ is at most $95^{\frac{n}{13}}$ and that this bound is asymptotically sharp up to a multiplicative constant. We build on his work to show that what he did for minimal dominating sets can be done for any family of sets definable by a monadic second order formula.

We first show that, for any monadic second order formula over graphs that characterizes a given kind of subset of its vertices, the maximal number of such sets in a tree can be expressed as the \textit{growth rate of a bilinear system}. This mostly relies on well known links between monadic second order logic over trees and tree automata and basic tree automata manipulations. Then we show that this "growth rate" of a bilinear system can be approximated from above. We then use our implementation of this result to provide bounds on the number of independent dominating sets, total perfect dominating sets, induced matchings, maximal induced matchings, minimal perfect dominating sets, perfect codes and maximal irredundant sets on trees. We also solve a question from D. Y. Kang et al. regarding $r$-matchings and improve a bound from Górska and Skupień on the number of maximal matchings on trees. Remark that this approach is easily generalizable to graphs of bounded tree width or clique width (or any similar class of graphs where tree automata are meaningful).




Decoding EEG Rhythms During Action Observation, Motor Imagery, and Execution for Standing and Sitting. (arXiv:2004.04107v2 [cs.HC] UPDATED)

Event-related desynchronization and synchronization (ERD/S) and movement-related cortical potential (MRCP) play an important role in brain-computer interfaces (BCI) for lower limb rehabilitation, particularly in standing and sitting. However, little is known about the differences in the cortical activation between standing and sitting, especially how the brain's intention modulates the pre-movement sensorimotor rhythm as they do for switching movements. In this study, we aim to investigate the decoding of continuous EEG rhythms during action observation (AO), motor imagery (MI), and motor execution (ME) for standing and sitting. We developed a behavioral task in which participants were instructed to perform both AO and MI/ME in regard to the actions of sit-to-stand and stand-to-sit. Our results demonstrated that the ERD was prominent during AO, whereas ERS was typical during MI at the alpha band across the sensorimotor area. A combination of the filter bank common spatial pattern (FBCSP) and support vector machine (SVM) for classification was used for both offline and pseudo-online analyses. The offline analysis indicated the classification of AO and MI providing the highest mean accuracy at 82.73$\pm$2.38\% in stand-to-sit transition. By applying the pseudo-online analysis, we demonstrated the higher performance of decoding neural intentions from the MI paradigm in comparison to the ME paradigm. These observations led us to the promising aspect of using our developed tasks based on the integration of both AO and MI to build future exoskeleton-based rehabilitation systems.




PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing. (arXiv:2004.03544v4 [cs.CR] UPDATED)

The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contact tracing system, where any alert to a user could itself give rise to de-anonymizing information.

More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.




Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms. (arXiv:2004.00526v2 [eess.AS] UPDATED)

Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms. For example, RawNet extracts speaker embeddings from raw waveforms, which simplifies the process pipeline and demonstrates competitive performance. In this study, we improve RawNet by scaling feature maps using various methods. The proposed mechanism utilizes a scale vector that adopts a sigmoid non-linear function. It refers to a vector with dimensionality equal to the number of filters in a given feature map. Using a scale vector, we propose to scale the feature map multiplicatively, additively, or both. In addition, we investigate replacing the first convolution layer with the sinc-convolution layer of SincNet. Experiments performed on the VoxCeleb1 evaluation dataset demonstrate the effectiveness of the proposed methods, and the best performing system reduces the equal error rate by half compared to the original RawNet. Expanded evaluation results obtained using the VoxCeleb1-E and VoxCeleb-H protocols marginally outperform existing state-of-the-art systems.
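
The three scaling variants named in this abstract (multiplicative, additive, or both) can be sketched in a few lines. This is an illustration only, under stated assumptions: the abstract does not say how the scale vector is computed, so here it is taken to be the sigmoid of each filter's mean activation with no learned layer, and the "both" variant is taken to be multiply-then-add. The function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one inner list of time steps per filter


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def scale_vector(feature_map: FeatureMap) -> List[float]:
    """One scale value in (0, 1) per filter.

    Assumption: sigmoid of the filter's mean activation; a real system
    would typically insert a learned layer before the sigmoid.
    """
    return [sigmoid(sum(filt) / len(filt)) for filt in feature_map]


def scale_feature_map(feature_map: FeatureMap, mode: str = "both") -> FeatureMap:
    """Scale each filter by its scale value: 'mul', 'add', or 'both'."""
    scaled = []
    for filt, s in zip(feature_map, scale_vector(feature_map)):
        if mode == "mul":
            row = [x * s for x in filt]
        elif mode == "add":
            row = [x + s for x in filt]
        elif mode == "both":
            # Assumed combination: multiplicative scaling followed by additive.
            row = [x * s + s for x in filt]
        else:
            raise ValueError(f"unknown mode: {mode}")
        scaled.append(row)
    return scaled
```

The scale vector has one entry per filter, matching the abstract's statement that its dimensionality equals the number of filters in the feature map.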




Personal Health Knowledge Graphs for Patients. (arXiv:2004.00071v2 [cs.AI] UPDATED)

Existing patient data analytics platforms fail to incorporate information that has context, is personal, and topical to patients. For a recommendation system to give a suitable response to a query or to derive meaningful insights from patient data, it should consider personal information about the patient's health history, including but not limited to their preferences, locations, and life choices that are currently applicable to them. In this review paper, we critique existing literature in this space and also discuss the various research challenges that come with designing, building, and operationalizing a personal health knowledge graph (PHKG) for patients.




Human Motion Transfer with 3D Constraints and Detail Enhancement. (arXiv:2003.13510v2 [cs.GR] UPDATED)

We propose a new method for realistic human motion transfer using a generative adversarial network (GAN), which generates a motion video of a target character imitating actions of a source character, while maintaining high authenticity of the generated results. We tackle the problem by decoupling and recombining the posture information and appearance information of both the source and target characters. The innovation of our approach lies in the use of the projection of a reconstructed 3D human model as the condition of GAN to better maintain the structural integrity of transfer results in different poses. We further introduce a detail enhancement net to enhance the details of transfer results by exploiting the details in real source frames. Extensive experiments show that our approach yields better results both qualitatively and quantitatively than the state-of-the-art methods.




Watching the World Go By: Representation Learning from Unlabeled Videos. (arXiv:2003.07990v2 [cs.CV] UPDATED)

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of the same image and a large batch of unrelated images. Networks learn to ignore the augmentation noise and extract semantically meaningful representations. Prior work uses artificial data augmentation techniques such as cropping, and color jitter which can only affect the image in superficial ways and are not aligned with how objects actually change e.g. occlusion, deformation, viewpoint change. In this paper, we argue that videos offer this natural augmentation for free. Videos can provide entirely new views of objects, show deformation, and even connect semantically similar but visually distinct concepts. We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations. We demonstrate improvements over recent unsupervised single image techniques, as well as over fully supervised ImageNet pretraining, across a variety of temporal and non-temporal tasks. Code and the Random Related Video Views dataset are available at https://www.github.com/danielgordon10/vince




Hierarchical Neural Architecture Search for Single Image Super-Resolution. (arXiv:2003.04619v2 [cs.CV] UPDATED)

Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the requirements of computation resources in real-world applications. To address the above issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures with different requirements of computation cost. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward by considering both the performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.




Testing Scenario Library Generation for Connected and Automated Vehicles: An Adaptive Framework. (arXiv:2003.03712v2 [eess.SY] UPDATED)

How to generate testing scenario libraries for connected and automated vehicles (CAVs) is a major challenge faced by the industry. In previous studies, to evaluate the maneuver challenge of a scenario, surrogate models (SMs) are often used without explicit knowledge of the CAV under test. However, performance dissimilarities between the SM and the CAV under test usually exist, and they can lead to the generation of suboptimal scenario libraries. In this paper, an adaptive testing scenario library generation (ATSLG) method is proposed to solve this problem. A customized testing scenario library for a specific CAV model is generated through an adaptive process. To compensate for the performance dissimilarities and leverage each test of the CAV, Bayesian optimization techniques are applied with classification-based Gaussian Process Regression and a newly designed acquisition function. Compared with a pre-determined library, the customized library allows a CAV to be tested and evaluated more efficiently. To validate the proposed method, a cut-in case study was performed, and the results demonstrate that the proposed method can further accelerate the evaluation process by a few orders of magnitude.




on

Lake Ice Detection from Sentinel-1 SAR with Deep Learning. (arXiv:2002.07040v2 [eess.IV] UPDATED)

Lake ice, as part of the Essential Climate Variable (ECV) lakes, is an important indicator to monitor climate change and global warming. The spatio-temporal extent of lake ice cover, along with the timings of key phenological events such as freeze-up and break-up, provides important cues about the local and global climate. We present a lake ice monitoring system based on the automatic analysis of Sentinel-1 Synthetic Aperture Radar (SAR) data with a deep neural network. In previous studies that used optical satellite imagery for lake ice monitoring, frequent cloud cover was a main limiting factor, which we overcome thanks to the ability of microwave sensors to penetrate clouds and observe the lakes regardless of the weather and illumination conditions. We cast ice detection as a two-class (frozen, non-frozen) semantic segmentation problem and solve it using a state-of-the-art deep convolutional network (CNN). We report results on two winters (2016-17 and 2017-18) and three alpine lakes in Switzerland. The proposed model reaches mean Intersection-over-Union (mIoU) scores >90% on average, and >84% even for the most difficult lake. Additionally, we perform cross-validation tests and show that our algorithm generalises well across unseen lakes and winters.
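As an illustration of the reported metric, the following is a minimal sketch of mean Intersection-over-Union for a two-class (frozen, non-frozen) labelling. The function name `mean_iou` and the toy masks are assumptions made for this example; they are not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
def mean_iou(pred, truth, num_classes=2):
    """Mean Intersection-over-Union over the classes that occur in pred or truth."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks (intersection) or in either mask (union).
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Flattened toy masks (made-up data): 1 = frozen, 0 = non-frozen.
truth = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0]
print(mean_iou(pred, truth))
```

Here the non-frozen class has IoU 3/4 and the frozen class 2/3, so the mean is 17/24.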




on

On Rearrangement of Items Stored in Stacks. (arXiv:2002.04979v2 [cs.RO] UPDATED)

There are $n \ge 2$ stacks, each filled with $d$ items, and one empty stack. Every stack has capacity $d > 0$. A robot arm, in one stack operation (step), may pop one item from the top of a non-empty stack and subsequently push it onto a stack not at capacity. In a {\em labeled} problem, all $nd$ items are distinguishable and are initially randomly scattered in the $n$ stacks. The items must be rearranged using pop-and-pushes so that in the end, the $k^{\rm th}$ stack holds items $(k-1)d +1, \ldots, kd$, in that order, from the top to the bottom for all $1 \le k \le n$. In an {\em unlabeled} problem, the $nd$ items are of $n$ types of $d$ each. The goal is to rearrange items so that items of type $k$ are located in the $k^{\rm th}$ stack for all $1 \le k \le n$. In carrying out the rearrangement, a natural question is to find the least number of required pop-and-pushes.

Our main contributions are: (1) an algorithm for restoring the order of $n^2$ items stored in an $n \times n$ table using only $2n$ column and row permutations, and its generalization, and (2) an algorithm with a guaranteed upper bound of $O(nd)$ steps for solving both versions of the stack rearrangement problem when $d \le \lceil cn \rceil$ for an arbitrary fixed positive number $c$. In terms of the required number of steps, the labeled and unlabeled versions have lower bounds $\Omega(nd + nd{\frac{\log d}{\log n}})$ and $\Omega(nd)$, respectively.
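The stack model described in the abstract can be sketched as follows. This only simulates the pop-and-push operation and checks the labeled goal state; it is not the authors' $O(nd)$ algorithm. The helper names `move` and `is_sorted_labeled` are hypothetical, and each stack is represented as a Python list whose last element is the top.

```python
def move(stacks, src, dst, d):
    """One step: pop the top item of stack src and push it onto stack dst."""
    if not stacks[src]:
        raise ValueError("cannot pop from an empty stack")
    if len(stacks[dst]) >= d:
        raise ValueError("destination stack is at capacity")
    stacks[dst].append(stacks[src].pop())


def is_sorted_labeled(stacks, n, d):
    """Labeled goal: stack k holds (k-1)d+1, ..., kd from top to bottom."""
    for k in range(1, n + 1):
        top_to_bottom = list(range((k - 1) * d + 1, k * d + 1))
        if stacks[k - 1][::-1] != top_to_bottom:
            return False
    return True


# Toy instance with n = 2 stacks of capacity d = 1 plus one empty stack.
n, d = 2, 1
stacks = [[2], [1], []]
steps = 0
for src, dst in [(0, 2), (1, 0), (2, 1)]:
    move(stacks, src, dst, d)
    steps += 1
print(stacks, steps)
```

In this toy instance the two items are swapped through the empty stack in three steps.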




on

Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach. (arXiv:2002.04407v2 [cs.CV] UPDATED)

Human visual attention is a complex phenomenon. A computational modeling of this phenomenon must take into account where people look in order to evaluate which are the salient locations (spatial distribution of the fixations), when they look in those locations to understand the temporal development of the exploration (temporal order of the fixations), and how they move from one location to another with respect to the dynamics of the scene and the mechanics of the eyes (dynamics). State-of-the-art models focus on learning saliency maps from human data, a process that only takes into account the spatial component of the phenomenon and ignores its temporal and dynamical counterparts. In this work we focus on the evaluation methodology of models of human visual attention. We underline the limits of the current metrics for saliency prediction and scanpath similarity, and we introduce a statistical measure for the evaluation of the dynamics of the simulated eye movements. While deep learning models achieve astonishing performance in saliency prediction, our analysis shows their limitations in capturing the dynamics of the process. We find that unsupervised gravitational models, despite their simplicity, outperform all competitors. Finally, exploiting a crowd-sourcing platform, we present a study aimed at evaluating how plausible the scanpaths generated with the unsupervised gravitational models appear to naive and expert human observers.




on

A memory of motion for visual predictive control tasks. (arXiv:2001.11759v3 [cs.RO] UPDATED)

This paper addresses the problem of efficiently achieving visual predictive control tasks. To this end, a memory of motion, containing a set of trajectories built off-line, is used for leveraging precomputation and dealing with difficult visual tasks. Standard regression techniques, such as k-nearest neighbors and Gaussian process regression, are used to query the memory and provide on-line a warm-start and a way point to the control optimization process. The proposed technique allows the control scheme to achieve high performance and, at the same time, keep the computational time limited. Simulation and experimental results, carried out with a 7-axis manipulator, show the effectiveness of the approach.




on

Continuous speech separation: dataset and analysis. (arXiv:2001.11482v3 [cs.SD] UPDATED)

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly \emph{fully} overlapped, and the algorithms are evaluated based on signal-to-distortion ratio or similar performance metrics. However, in natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components. In addition, the signal-based metrics have very weak correlations with automatic speech recognition (ASR) accuracy. We think that not only does this make it hard to assess the practical relevance of the tested algorithms, it also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a \textit{continuous} audio stream that contains multiple utterances that are \emph{partially} overlapped by a varying degree. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established by using a well-trained multi-conditional acoustic model. By using this dataset, several aspects of a recently proposed speaker-independent CSS algorithm are investigated. The dataset and evaluation scripts are available to facilitate the research in this direction.




on

Evolutionary Dynamics of Higher-Order Interactions. (arXiv:2001.10313v2 [physics.soc-ph] UPDATED)

We live and cooperate in networks. However, links in networks only allow for pairwise interactions, thus making the framework suitable for dyadic games, but not for games that are played in groups of more than two players. To remedy this, we introduce higher-order interactions, where a link can connect more than two individuals, and study their evolutionary dynamics. We first consider a public goods game on a uniform hypergraph, showing that it corresponds to the replicator dynamics in the well-mixed limit, and providing an exact theoretical foundation to study cooperation in networked groups. We also extend the analysis to heterogeneous hypergraphs that describe interactions of groups of different sizes and characterize the evolution of cooperation in such cases. Finally, we apply our new formulation to study the nature of group dynamics in real systems, showing how to extract the actual dependence of the synergy factor on the size of a group from real-world collaboration data in science and technology. Our work is a first step towards the implementation of new actions to boost cooperation in social groups.




on

A Real-Time Approach for Chance-Constrained Motion Planning with Dynamic Obstacles. (arXiv:2001.08012v2 [cs.RO] UPDATED)

Uncertain dynamic obstacles, such as pedestrians or vehicles, pose a major challenge for optimal robot navigation with safety guarantees. Previous work on motion planning has followed two main strategies to provide a safe bound on an obstacle's space: a polyhedron, such as a cuboid, or a nonlinear differentiable surface, such as an ellipsoid. The former approach relies on disjunctive programming, which has a relatively high computational cost that grows exponentially with the number of obstacles. The latter approach needs to be linearized locally to find a tractable evaluation of the chance constraints, which dramatically reduces the remaining free space and leads to over-conservative trajectories or even infeasibility. In this work, we present a hybrid approach that eludes the pitfalls of both strategies while maintaining the original safety guarantees. The key idea consists in obtaining a safe differentiable approximation for the disjunctive chance constraints bounding the obstacles. The resulting nonlinear optimization problem is free of chance constraint linearization and disjunctive programming, and therefore, it can be efficiently solved to meet fast real-time requirements with multiple obstacles. We validate our approach through mathematical proof, simulation and real experiments with an aerial robot using nonlinear model predictive control to avoid pedestrians.




on

Provenance for the Description Logic ELHr. (arXiv:2001.07541v2 [cs.LO] UPDATED)

We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering.




on

Hardware Implementation of Neural Self-Interference Cancellation. (arXiv:2001.04543v2 [eess.SP] UPDATED)

In-band full-duplex systems can transmit and receive information simultaneously on the same frequency band. However, due to the strong self-interference caused by the transmitter to its own receiver, the use of non-linear digital self-interference cancellation is essential. In this work, we describe a hardware architecture for a neural network-based non-linear self-interference (SI) canceller and we compare it with our own hardware implementation of a conventional polynomial based SI canceller. In particular, we present implementation results for a shallow and a deep neural network SI canceller as well as for a polynomial SI canceller. Our results show that the deep neural network canceller achieves a hardware efficiency of up to $312.8$ Msamples/s/mm$^2$ and an energy efficiency of up to $0.9$ nJ/sample, which is $2.1\times$ and $2\times$ better than the polynomial SI canceller, respectively. These results show that NN-based methods applied to communications are not only useful from a performance perspective, but can also be a very effective means to reduce the implementation complexity.




on

Maximal Closed Set and Half-Space Separations in Finite Closure Systems. (arXiv:2001.04417v2 [cs.AI] UPDATED)

Several problems of artificial intelligence, such as predictive learning, formal concept analysis or inductive logic programming, can be viewed as a special case of half-space separation in abstract closure systems over finite ground sets. For the typical scenario that the closure system is given via a closure operator, we show that the half-space separation problem is NP-complete. As a first approach to overcome this negative result, we relax the problem to maximal closed set separation, give a greedy algorithm solving this problem with a linear number of closure operator calls, and show that this bound is sharp. For a second direction, we consider Kakutani closure systems and prove that they are algorithmically characterized by the greedy algorithm. As a first special case of the general problem setting, we consider Kakutani closure systems over graphs, generalize a fundamental characterization result based on the Pasch axiom to graph structured partitioning of finite sets, and give a sufficient condition for this kind of closure system in terms of graph minors. For a second case, we then focus on closure systems over finite lattices, give an improved adaptation of the greedy algorithm for this special case, and present two applications concerning formal concept and subsumption lattices. We also report some experimental results to demonstrate the practical usefulness of our algorithm.




on

Intra-Variable Handwriting Inspection Reinforced with Idiosyncrasy Analysis. (arXiv:1912.12168v2 [cs.CV] UPDATED)

In this paper, we work on intra-variable handwriting, where the writing samples of an individual can vary significantly. Such within-writer variation poses a challenge for automatic writer inspection, where the state-of-the-art methods do not perform well. To deal with intra-variability, we analyze the idiosyncrasy in individual handwriting. We identify/verify the writer from highly idiosyncratic text-patches. Such patches are detected using a deep recurrent reinforcement learning-based architecture. An idiosyncratic score is assigned to every patch, which is predicted by employing deep regression analysis. For writer identification, we propose a deep neural architecture, which makes the final decision by the idiosyncratic score-induced weighted average of patch-based decisions. For writer verification, we propose two algorithms for patch-fed deep feature aggregation, which assist in authentication using a triplet network. The experiments were performed on two databases, where we obtained encouraging results.




on

Safe non-smooth black-box optimization with application to policy search. (arXiv:1912.09466v3 [math.OC] UPDATED)

For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for the feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $\min f^0(x)$ subject to $f^i(x) \leq 0,~ i = 1,\ldots, m$, at the same time, guaranteeing constraint satisfaction while learning an optimal solution with high probability. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.
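To illustrate the log-barrier idea only, here is a minimal sketch on a one-dimensional toy problem: minimize $(x-2)^2$ subject to $x - 1 \leq 0$, starting from the safe point $x = 0$. This is not the authors' algorithm: it assumes exact (noise-free) gradients and a smooth objective, and it swaps in a plain backtracking line search. The names `barrier`, `barrier_grad`, `safe_descent` and the barrier weight `eta` are assumptions made for this example.

```python
import math


def barrier(x, eta):
    """Log-barrier objective for: min (x - 2)^2  subject to  x - 1 <= 0."""
    slack = 1.0 - x  # equals -f^1(x); must stay strictly positive
    if slack <= 0.0:
        return math.inf  # infeasible points are always rejected
    return (x - 2.0) ** 2 - eta * math.log(slack)


def barrier_grad(x, eta):
    """Exact derivative of the barrier objective (an assumption; no noise here)."""
    return 2.0 * (x - 2.0) + eta / (1.0 - x)


def safe_descent(x0, eta=0.01, iters=200):
    """Gradient descent on the barrier; every accepted iterate is feasible."""
    x, trajectory = x0, [x0]
    for _ in range(iters):
        g = barrier_grad(x, eta)
        t = 1.0
        # Backtrack until the step is feasible and sufficiently decreases the barrier.
        while barrier(x - t * g, eta) > barrier(x, eta) - 1e-4 * t * g * g:
            t *= 0.5
            if t < 1e-12:
                return x, trajectory  # no further progress possible
        x -= t * g
        trajectory.append(x)
    return x, trajectory


x_final, path = safe_descent(0.0)
print(x_final, all(x < 1.0 for x in path))
```

Because the barrier grows without bound as $x \to 1$, every accepted iterate stays strictly feasible, and the final point lies just inside the constraint boundary, near the barrier minimizer $(3 - \sqrt{1 + 2\eta})/2$.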




on

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images. (arXiv:1912.09121v2 [cs.CV] UPDATED)

High-resolution remote sensing images (HRRSIs) contain substantial ground object information, such as texture, shape, and spatial location. Semantic segmentation, which is an important task for element extraction, has been widely used in processing mass HRRSIs. However, HRRSIs often exhibit large intraclass variance and small interclass variance due to the diversity and complexity of ground objects, thereby bringing great challenges to a semantic segmentation task. In this paper, we propose a new end-to-end semantic segmentation network, which integrates lightweight spatial and channel attention modules that can refine features adaptively. We compare our method with several classic methods on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method can achieve better semantic segmentation results. The source codes are available at https://github.com/lehaifeng/SCAttNet.




on

A predictive path-following controller for multi-steered articulated vehicles. (arXiv:1912.06259v5 [math.OC] UPDATED)

Stabilizing multi-steered articulated vehicles in backward motion is a complex task for any human driver. Unless the vehicle is accurately steered, its structurally unstable joint-angle kinematics during reverse maneuvers can cause the vehicle segments to fold and enter a jack-knife state. In this work, a model predictive path-following controller is proposed enabling automatic low-speed steering control of multi-steered articulated vehicles, comprising a car-like tractor and an arbitrary number of trailers with passive or active steering. The proposed path-following controller is tailored to follow nominal paths that contain full state and control-input information, and is designed to satisfy various physical constraints on the vehicle states as well as saturations and rate limitations on the tractor's curvature and the trailer steering angles. The performance of the proposed model predictive path-following controller is evaluated in a set of simulations for a multi-steered 2-trailer with a car-like tractor where the last trailer has steerable wheels.




on

SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval. (arXiv:1912.05891v2 [cs.IR] UPDATED)

In learning-to-rank for information retrieval, a ranking model is automatically learned from the data and then utilized to rank the sets of retrieved documents. Therefore, an ideal ranking model would be a mapping from a document set to a permutation on the set, and should satisfy two critical requirements: (1) it should have the ability to model cross-document interactions so as to capture local context information in a query; (2) it should be permutation-invariant, which means that any permutation of the input documents would not change the output ranking. Previous studies on learning-to-rank either design uni-variate scoring functions that score each document separately, and thus fail to model the cross-document interactions; or construct multivariate scoring functions that score documents sequentially, which inevitably sacrifice the permutation invariance requirement. In this paper, we propose a neural learning-to-rank model called SetRank which directly learns a permutation-invariant ranking model defined on document sets of any size. SetRank employs a stack of (induced) multi-head self attention blocks as its key component for learning the embeddings for all of the retrieved documents jointly. The self-attention mechanism not only helps SetRank to capture the local context information from cross-document interactions, but also to learn permutation-equivariant representations for the input documents, thereby achieving a permutation-invariant ranking model. Experimental results on three large scale benchmarks show that SetRank significantly outperforms the baselines, including the traditional learning-to-rank models and state-of-the-art Neural IR models.
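The permutation-invariance requirement can be checked on a toy example. The sketch below is not SetRank: it assumes a single unparameterized self-attention layer over one scalar feature per document, with no positional encoding, which is enough to show that the scores are permutation-equivariant and the resulting ranking is therefore permutation-invariant. The names `attention_scores` and `rank` and the feature values are assumptions made for this example.

```python
import math


def attention_scores(features):
    """One scalar self-attention layer without positional encodings."""
    scores = []
    for q in features:
        # Softmax weights over all documents in the set (cross-document interaction).
        weights = [math.exp(q * k) for k in features]
        total = sum(weights)
        scores.append(sum(w * v for w, v in zip(weights, features)) / total)
    return scores


def rank(docs):
    """docs: list of (doc_id, feature) pairs. Returns doc ids, best first."""
    ids = [doc_id for doc_id, _ in docs]
    scores = attention_scores([x for _, x in docs])
    return [doc_id for _, doc_id in sorted(zip(scores, ids), reverse=True)]


docs = [("a", 0.2), ("b", 1.5), ("c", -0.7), ("d", 0.9)]
shuffled = [docs[2], docs[0], docs[3], docs[1]]  # same set, different input order
print(rank(docs), rank(shuffled))
```

Each document's score depends on the whole set but not on the order in which the set is presented, so both calls return the same ranking.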




on

Novel Deep Learning Framework for Wideband Spectrum Characterization at Sub-Nyquist Rate. (arXiv:1912.05255v2 [eess.SP] UPDATED)

The introduction of spectrum-sharing in 5G and subsequent generation networks demands base stations with the capability to characterize the wideband spectrum spanned over licensed, shared and unlicensed non-contiguous frequency bands. Spectrum characterization involves the identification of vacant bands along with center frequency and parameters (energy, modulation, etc.) of occupied bands. Such characterization at Nyquist sampling is area and power-hungry due to the need for high-speed digitization. Though sub-Nyquist sampling (SNS) offers an excellent alternative when the spectrum is sparse, it suffers from poor performance at low signal to noise ratio (SNR) and demands careful design and integration of digital reconstruction, tunable channelizer and characterization algorithms. In this paper, we propose a novel deep-learning framework via a single unified pipeline to accomplish two tasks: 1) reconstruction of the signal directly from sub-Nyquist samples, and 2) wideband spectrum characterization. The proposed approach eliminates the need for complex signal conditioning between reconstruction and characterization and does not need complex tunable channelizers. We extensively compare the performance of our framework for a wide range of modulation schemes, SNR and channel conditions. We show that the proposed framework outperforms existing SNS based approaches and that its characterization performance approaches that of a Nyquist sampling-based framework as the SNR increases. Ease of design and integration, along with a single unified deep learning framework, makes the proposed architecture a good candidate for reconfigurable platforms.




on

IPG-Net: Image Pyramid Guidance Network for Small Object Detection. (arXiv:1912.00632v3 [cs.CV] UPDATED)

For Convolutional Neural Network-based object detection, there is a typical dilemma: the spatial information is well kept in the shallow layers, which unfortunately do not have enough semantic information, while the deep layers have a high semantic concept but have lost a lot of spatial information, resulting in serious information imbalance. To acquire enough semantic information for shallow layers, Feature Pyramid Networks (FPN) is used to build a top-down propagated path. In this paper, in addition to top-down combining of information for shallow layers, we propose a novel network called Image Pyramid Guidance Network (IPG-Net) to make sure both the spatial information and semantic information are abundant for each layer. Our IPG-Net has two main parts: the image pyramid guidance transformation module and the image pyramid guidance fusion module. Our main idea is to introduce the image pyramid guidance into the backbone stream to solve the information imbalance problem, which alleviates the vanishing of the small object features. The IPG transformation module ensures that even in the deepest stage of the backbone, there is enough spatial information for bounding box regression and classification. Furthermore, we designed an effective fusion module to fuse the features from the image pyramid and features from the backbone stream. We applied this novel network to both one-stage and two-stage detection models; state-of-the-art results are obtained on the most popular benchmark data sets, i.e. MS COCO and Pascal VOC.




on

Towards a Proof of the Fourier--Entropy Conjecture?. (arXiv:1911.10579v2 [cs.DM] UPDATED)

The total influence of a function is a central notion in analysis of Boolean functions, and characterizing functions that have small total influence is one of the most fundamental questions associated with it. The KKL theorem and the Friedgut junta theorem give a strong characterization of such functions whenever the bound on the total influence is $o(\log n)$. However, both results become useless when the total influence of the function is $\omega(\log n)$. The only case in which this logarithmic barrier has been broken for an interesting class of functions was proved by Bourgain and Kalai, who focused on functions that are symmetric under large enough subgroups of $S_n$.

In this paper, we build and improve on the techniques of the Bourgain-Kalai paper and establish new concentration results on the Fourier spectrum of Boolean functions with small total influence. Our results include:

1. A quantitative improvement of the Bourgain--Kalai result regarding the total influence of functions that are transitively symmetric.

2. A slightly weaker version of the Fourier--Entropy Conjecture of Friedgut and Kalai. This weaker version implies in particular that the Fourier spectrum of a constant variance, Boolean function $f$ is concentrated on $2^{O(I[f]\log I[f])}$ characters, improving an earlier result of Friedgut. Removing the $\log I[f]$ factor would essentially resolve the Fourier--Entropy Conjecture, as well as settle a conjecture of Mansour regarding the Fourier spectrum of polynomial size DNF formulas.

Our concentration result has new implications in learning theory: it implies that the class of functions whose total influence is at most $K$ is agnostically learnable in time $2^{O(K\log K)}$, using membership queries.
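For readers unfamiliar with the quantity $I[f]$, the following minimal sketch computes the total influence of a Boolean function by brute force, using the standard definition $I[f] = \sum_{i=1}^{n} \Pr_x[f(x) \neq f(x^{\oplus i})]$, where $x^{\oplus i}$ is $x$ with the $i$-th bit flipped. It only illustrates the definition and is unrelated to the paper's proof techniques; the helper names `total_influence`, `majority3` and `parity` are assumptions made for this example.

```python
from itertools import product


def total_influence(f, n):
    """Sum over coordinates i of Pr_x[f(x) != f(x with bit i flipped)]."""
    sensitive_pairs = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1  # flip the i-th bit
            if f(x) != f(tuple(y)):
                sensitive_pairs += 1
    return sensitive_pairs / 2 ** n


def majority3(x):
    """Majority vote of three bits."""
    return int(sum(x) >= 2)


def parity(x):
    """XOR of all bits."""
    return sum(x) % 2


print(total_influence(majority3, 3), total_influence(parity, 3))
```

Majority on three bits has total influence 3/2 (each bit matters exactly when the other two disagree), while parity on $n$ bits attains the maximum value $n$.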




on

Two-Stream FCNs to Balance Content and Style for Style Transfer. (arXiv:1911.08079v2 [cs.CV] UPDATED)

Style transfer renders given image contents in given styles, and it has an important role in both computer vision fundamental research and industrial applications. Following the success of deep learning based approaches, this problem has been re-launched recently, but it still remains a difficult task because of the trade-off between preserving contents and faithful rendering of styles. Indeed, how well-balanced content and style are is crucial in evaluating the quality of stylized images. In this paper, we propose end-to-end two-stream Fully Convolutional Networks (FCNs) aiming at balancing the contributions of the content and the style in rendered images. Our proposed network consists of the encoder and decoder parts. The encoder part utilizes an FCN for content and an FCN for style, where the two FCNs have feature injections and are independently trained to preserve the semantic content and to learn the faithful style representation, respectively. The semantic content feature and the style representation feature are then concatenated adaptively and fed into the decoder to generate style-transferred (stylized) images. In order to train our proposed network, we employ a loss network, the pre-trained VGG-16, to compute content loss and style loss, both of which are efficiently used for the feature injection as well as the feature concatenation. Our intensive experiments show that our proposed model generates more balanced stylized images in content and style than state-of-the-art methods. Moreover, our proposed network achieves efficiency in speed.




on

t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams. (arXiv:1911.06147v2 [cs.CL] UPDATED)

A recently introduced classifier, called SS3, has been shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF's eRisk open tasks. SS3 was created to deal with ERD problems naturally since it supports incremental training and classification over text streams, and it can visually explain its rationale. However, SS3 processes the input using a bag-of-words model, lacking the ability to recognize important word sequences. This aspect could negatively affect the classification performance and also reduce the descriptiveness of visual explanations. In the standard document classification field, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial since the system must learn and recognize which n-grams are important "on the fly". This paper introduces t-SS3, an extension of SS3 that allows it to recognize useful patterns over text streams dynamically. We evaluated our model in the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results suggest that t-SS3 is able to improve both current results and the richness of visual explanations.




on

Unsupervised Domain Adaptation on Reading Comprehension. (arXiv:1911.06137v4 [cs.CL] UPDATED)

Reading comprehension (RC) has been studied in a variety of datasets with the boosted performance brought by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate this issue, we investigate unsupervised domain adaptation on RC, wherein a model is trained on a labeled source domain and applied to a target domain with only unlabeled samples. We first show that even with the powerful BERT contextual representation, the performance is still unsatisfactory when the model trained on one dataset is directly applied to another target dataset. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset along with confidence filtering to generate reliable pseudo-labeled samples in the target domain for self-training. In addition, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves comparable accuracy to supervised models on multiple large-scale benchmark datasets.




on

Revisiting Semantics of Interactions for Trace Validity Analysis. (arXiv:1911.03094v2 [cs.SE] UPDATED)

Interaction languages such as MSC are often associated with formal semantics by means of translations into distinct behavioral formalisms such as automata or Petri nets. In contrast to translational approaches, we propose an operational approach. Its principle is to identify which elementary communication actions can be immediately executed, and then to compute, for every such action, a new interaction representing the possible continuations to its execution. We also define an algorithm for checking the validity of execution traces (i.e. whether or not they belong to an interaction's semantics). Algorithms for semantic computation and trace validity are analyzed by means of experiments.




on

Imitation Learning for Human-robot Cooperation Using Bilateral Control. (arXiv:1909.13018v2 [cs.RO] UPDATED)

Robots are required to operate autonomously in response to changing situations. Previously, imitation learning using 4ch-bilateral control was demonstrated to be suitable for imitation of object manipulation. However, cooperative work between humans and robots has not yet been verified in these studies. In this study, the task was expanded by cooperative work between a human and a robot. 4ch-bilateral control was used to collect training data for training robot motion. We focused on serving salad as a task in the home. The task was executed with a spoon and a fork fixed to robots. Adjustment of force was indispensable in manipulating indefinitely shaped objects such as salad. Results confirmed the effectiveness of the proposed method as demonstrated by the success of the task.