Visualisation and knowledge discovery from interpretable models. (arXiv:2005.03632v1 [cs.LG])

An increasing number of sectors that affect human lives are using Machine Learning (ML) tools. Hence the need to understand their working mechanisms and to evaluate their fairness in decision-making is becoming paramount, ushering in the era of Explainable AI (XAI). In this contribution we introduce a few intrinsically interpretable models, the angle-based variants of Learning Vector Quantization, which can also deal with missing values and extract knowledge from the dataset and about the problem. These models can also visualise the classifier and its decision boundaries. We demonstrate the algorithms on a synthetic dataset and a real-world one (the heart disease dataset from the UCI repository). The newly developed classifiers helped in investigating the complexities of the UCI dataset as a multiclass problem. When the dataset was treated as a binary-class problem, the performance of the developed classifiers was comparable to that reported in the literature for this dataset, with the added value of interpretability.
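
The abstract gives no update rules, so the following is only a minimal sketch of the general idea under stated assumptions: a nearest-prototype classifier (the inference step of LVQ) that compares a sample and a prototype by the angle between them, computed over the observed features only, so a sample with missing entries (here `None`) can still be classified. The helper names, the skip-missing rule and the example prototypes are hypothetical, not the authors' algorithm.

```python
import math
from typing import List, Optional, Sequence, Tuple

def angle_dissimilarity(x: Sequence[Optional[float]], w: Sequence[float]) -> float:
    """1 - cos(angle) between sample x and prototype w, over the features observed in x."""
    pairs = [(xi, wi) for xi, wi in zip(x, w) if xi is not None]  # skip missing entries
    dot = sum(xi * wi for xi, wi in pairs)
    norm_x = math.sqrt(sum(xi * xi for xi, _ in pairs))
    norm_w = math.sqrt(sum(wi * wi for _, wi in pairs))  # prototype restricted to the same features
    if norm_x == 0.0 or norm_w == 0.0:
        return 1.0  # assumption: no usable direction counts as orthogonal
    return 1.0 - dot / (norm_x * norm_w)

def classify(x: Sequence[Optional[float]], prototypes: List[Tuple[List[float], str]]) -> str:
    """Nearest-prototype rule: label of the prototype with the smallest angle dissimilarity."""
    _, label = min(prototypes, key=lambda p: angle_dissimilarity(x, p[0]))
    return label

# Hypothetical prototypes for two classes "A" and "B".
prototypes = [([1.0, 0.0, 0.0], "A"), ([0.0, 1.0, 1.0], "B")]
print(classify([0.9, None, 0.1], prototypes))  # A (second feature missing)
```

Because only angles matter, the prototypes can be read as directions in feature space, which is one reason angle-based prototypes lend themselves to visualisation.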

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG])

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge for current health systems, given the impact of such infections on patient mortality and healthcare costs. This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making aimed at reducing the incidence rate of infections. In this field, reliable classifiers must be built from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed with different resampling methods. The results were analyzed using classic metrics and recent metrics specifically designed for imbalanced data classification, and they revealed that the proposal is more efficient than the other approaches.
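
One common form of clustering-based undersampling (not necessarily the authors' exact variant, which the abstract does not specify) clusters the majority class into as many clusters as there are minority samples and keeps the real majority sample nearest to each centroid. The sketch assumes plain k-means with a fixed seed; `kmeans`, `cluster_undersample` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm with centroids initialised from k random points."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: sq_dist(p, centroids[i]))].append(p)
        # Empty clusters keep their previous centroid.
        updated = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def cluster_undersample(majority: List[Point], n_minority: int, seed: int = 0) -> List[Point]:
    """Reduce the majority class to n_minority real samples, one per k-means cluster."""
    remaining = list(majority)
    kept: List[Point] = []
    for c in kmeans(majority, n_minority, seed=seed):
        nearest = min(remaining, key=lambda p: sq_dist(p, c))  # real sample closest to centroid
        remaining.remove(nearest)  # never pick the same sample twice
        kept.append(nearest)
    return kept

majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(2.0, 5.0), (9.0, 5.0), (15.0, 5.0), (18.0, 5.0)]
balanced_majority = cluster_undersample(majority, len(minority))
print(len(balanced_majority), len(minority))  # 4 4
```

The balanced set (kept majority samples plus all minority samples) would then be passed to the single or ensemble classifiers; keeping real samples rather than centroids avoids synthetic patient records.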

Estimating customer impatience in a service system with balking. (arXiv:2005.03576v1 [math.PR])

This paper studies a service system in which arriving customers are provided with information about the delay they will experience. Based on this information, they decide either to wait for service or to leave the system. The main objective is to estimate the customers' patience-level distribution and the corresponding potential arrival rate, using knowledge of the actual workload process only. We cast the system as a queueing model so as to evaluate the corresponding likelihood function. We estimate the unknown parameters by maximum likelihood, prove strong consistency of the estimator and derive the asymptotic distribution of the estimation error. Several applications and extensions of the method are discussed; in particular, we indicate how our method generalizes to a multi-server setting. The performance of our approach is assessed through a series of numerical experiments. By fitting the parameters of hyperexponential and generalized-hyperexponential distributions, our method provides a robust estimation framework for any continuous patience-level distribution.
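
To make the balking mechanism concrete, assume (for illustration only) that a customer told the offered delay $v$, i.e. the current workload, joins exactly when their patience exceeds $v$. The effective arrival rate at workload $v$ is then $\lambda \bar F(v)$, where $\lambda$ is the potential arrival rate and $\bar F$ the patience survival function. The sketch evaluates this for a hyperexponential patience distribution; the function names and parameter values are hypothetical, and this is not the paper's likelihood or estimator.

```python
import math
from typing import Sequence

def hyperexp_survival(v: float, probs: Sequence[float], rates: Sequence[float]) -> float:
    """P(patience > v) for a hyperexponential mixture: sum_i p_i * exp(-rate_i * v)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    return sum(p * math.exp(-r * v) for p, r in zip(probs, rates))

def effective_arrival_rate(v: float, lam: float,
                           probs: Sequence[float], rates: Sequence[float]) -> float:
    """Rate at which customers actually join when the announced delay is v (assumed joining rule)."""
    return lam * hyperexp_survival(v, probs, rates)

# Hypothetical parameters: potential rate 2 per unit time, two patience phases.
for v in (0.0, 0.5, 1.0, 2.0):
    print(v, round(effective_arrival_rate(v, 2.0, [0.3, 0.7], [1.0, 0.2]), 4))
```

Only joining customers add work, which is why the workload process alone carries information about both $\lambda$ and the patience distribution.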

Sequential Aggregation of Probabilistic Forecasts -- Application to Wind Speed Ensemble Forecasts. (arXiv:2005.03540v1 [stat.AP])

In the field of numerical weather prediction (NWP), the probability distribution of the future state of the atmosphere is sampled with Monte-Carlo-like simulations, called ensembles. These ensembles have deficiencies (such as conditional biases) that can be corrected by statistical post-processing methods. Several ensembles exist and may be corrected with different statistical methods. A further step is to combine these raw or post-processed ensembles. The theory of prediction with expert advice allows us to build combination algorithms with theoretical guarantees on the forecast performance. This article adapts this theory to the case of probabilistic forecasts issued as step-wise cumulative distribution functions (CDFs). The theory is applied to wind speed forecasting by combining several raw or post-processed ensembles, considered as CDFs. The second goal of this study is to explore the use of two forecast performance criteria: the Continuous Ranked Probability Score (CRPS) and the Jolliffe-Primo test. Comparing the results obtained with both criteria leads us to reconsider the usual way of building skillful probabilistic forecasts, which is based on minimizing the CRPS. Minimizing the CRPS does not necessarily produce reliable forecasts according to the Jolliffe-Primo test. The Jolliffe-Primo test generally selects reliable forecasts, but could lead to issuing suboptimal forecasts in terms of CRPS. We therefore propose using both criteria to achieve reliable and skillful probabilistic forecasts.
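
For a forecast given as a step-wise CDF, i.e. ensemble members $x_i$ with weights $w_i$ summing to one, the CRPS has the closed form $\mathrm{CRPS}(F, y) = \sum_i w_i |x_i - y| - \tfrac12 \sum_{i,j} w_i w_j |x_i - x_j|$. The sketch below uses it as the loss of the exponentially weighted average forecaster, one standard algorithm from prediction with expert advice. Combining the experts' CDFs as a weighted mixture, the learning rate `eta` and all helper names are assumptions for illustration; the paper's own algorithms may differ.

```python
import math
from typing import List, Sequence, Tuple

Ensemble = Sequence[float]

def crps_weighted(members: Sequence[float], weights: Sequence[float], y: float) -> float:
    """CRPS of the step CDF that puts mass weights[i] at members[i] (weights sum to 1)."""
    spread_to_obs = sum(w * abs(x - y) for x, w in zip(members, weights))
    spread_within = sum(wi * wj * abs(xi - xj)
                        for xi, wi in zip(members, weights)
                        for xj, wj in zip(members, weights))
    return spread_to_obs - 0.5 * spread_within

def crps_ensemble(ens: Ensemble, y: float) -> float:
    """CRPS of an equally weighted ensemble."""
    return crps_weighted(ens, [1.0 / len(ens)] * len(ens), y)

def mix(experts: Sequence[Ensemble], pi: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weighted mixture of the experts' step CDFs, itself a step CDF."""
    members: List[float] = []
    weights: List[float] = []
    for ens, p in zip(experts, pi):
        members.extend(ens)
        weights.extend([p / len(ens)] * len(ens))
    return members, weights

def ewa_weights(cum_loss: Sequence[float], eta: float) -> List[float]:
    best = min(cum_loss)
    raw = [math.exp(-eta * (loss - best)) for loss in cum_loss]  # shift for numerical stability
    total = sum(raw)
    return [r / total for r in raw]

def ewa_run(rounds, eta: float = 1.0):
    """rounds: list of (experts, observation). Returns per-round CRPS of the mixture and final weights."""
    cum_loss = [0.0] * len(rounds[0][0])
    mixture_crps = []
    for experts, y in rounds:
        members, weights = mix(experts, ewa_weights(cum_loss, eta))
        mixture_crps.append(crps_weighted(members, weights, y))
        for k, ens in enumerate(experts):
            cum_loss[k] += crps_ensemble(ens, y)  # each expert is scored by its own CRPS
    return mixture_crps, ewa_weights(cum_loss, eta)

print(crps_ensemble([0.0, 1.0], 0.0))  # 0.25
```

Note that this forecaster only minimises CRPS; as the abstract argues, reliability would have to be checked separately, for example with the Jolliffe-Primo test.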

Interpreting Deep Models through the Lens of Data. (arXiv:2005.03442v1 [cs.LG])

Identifying the input data points that are relevant for the classifier (i.e., those that serve as support vectors) has recently attracted researchers' interest, both for interpretability and for dataset debugging. This paper presents an in-depth analysis of methods that attempt to identify the influence of these data points on the resulting classifier. To quantify the quality of the influence estimates, we curated a set of experiments in which we debugged and pruned the dataset based on the influence information obtained from different methods. To do so, we provided the classifier with mislabeled examples that hampered its overall performance. Since the classifier is a combination of both the data and the model, it is essential to also analyze these influences for the interpretability of deep learning models. Analysis of the results shows that some interpretability methods can detect mislabels better than a random approach; however, contrary to the claims of these methods, sample selection based on the training loss showed superior performance.
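
The training-loss baseline reported above as strongest is simple to state: rank training examples by their loss under the trained model and inspect or prune the highest-loss ones first, since mislabeled examples tend to be fit poorly. A minimal sketch, assuming per-example class probabilities are already available; `flag_suspected_mislabels`, its `fraction` parameter and the toy probabilities are hypothetical, not the paper's code.

```python
import math
from typing import List, Sequence

def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Negative log-probability the model assigns to the given label."""
    return -math.log(max(probs[label], eps))

def flag_suspected_mislabels(all_probs: Sequence[Sequence[float]], labels: Sequence[int],
                             fraction: float = 0.1) -> List[int]:
    """Indices of the highest-loss examples, most suspicious first."""
    losses = [cross_entropy(p, y) for p, y in zip(all_probs, labels)]
    k = max(1, int(round(fraction * len(losses))))
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]

probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.7, 0.3]]
labels = [0, 1, 1, 0]  # example 2 carries a label the model strongly disagrees with
print(flag_suspected_mislabels(probs, labels, fraction=0.25))  # [2]
```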

Relevance Vector Machine with Weakly Informative Hyperprior and Extended Predictive Information Criterion. (arXiv:2005.03419v1 [stat.ML])

In the variational relevance vector machine, the gamma distribution is the representative hyperprior over the noise precision of the automatic relevance determination prior. Instead of the gamma hyperprior, we propose to use the inverse gamma hyperprior with a shape parameter close to zero and a scale parameter not necessarily close to zero. This hyperprior is associated with the concept of a weakly informative prior. The effect of this hyperprior is investigated through regression on non-homogeneous data. Because it is difficult to capture the structure of such data with a single kernel function, we apply the multiple kernel method, in which multiple kernel functions with different widths are arranged for the input data. We confirm that the degrees of freedom of the model are controlled by adjusting the scale parameter while keeping the shape parameter close to zero. One candidate criterion for selecting the scale parameter is the predictive information criterion. However, the model estimated using this criterion appears to over-fit. This is because the multiple kernel method places the model in a situation where its dimension exceeds the data size. To select an appropriate scale parameter even in such a situation, we also propose an extended predictive information criterion. We confirm that a multiple kernel relevance vector regression model with good predictive accuracy can be obtained by selecting the scale parameter that minimizes the extended predictive information criterion.
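
Two ingredients above can be made concrete. The inverse gamma density with shape $a$ and scale $b$ is $p(\tau) = \frac{b^a}{\Gamma(a)}\,\tau^{-a-1} e^{-b/\tau}$, which becomes weakly informative as $a$ approaches zero. In the multiple kernel method, placing one kernel of each width at every input point yields a design matrix with (number of widths) × (data size) columns, so the model dimension exceeds the data size. The sketch assumes Gaussian kernels and 1-D inputs; the function names and widths are illustrative.

```python
import math
from typing import List, Sequence

def inv_gamma_logpdf(x: float, shape: float, scale: float) -> float:
    """log of scale^shape / Gamma(shape) * x^(-shape-1) * exp(-scale/x), for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return (shape * math.log(scale) - math.lgamma(shape)
            - (shape + 1.0) * math.log(x) - scale / x)

def multi_kernel_design(xs: Sequence[float], widths: Sequence[float]) -> List[List[float]]:
    """Row n holds exp(-(x_n - x_m)^2 / (2 h^2)) for every width h and every centre x_m."""
    return [[math.exp(-((xn - xm) ** 2) / (2.0 * h * h)) for h in widths for xm in xs]
            for xn in xs]

phi = multi_kernel_design([0.0, 1.0, 2.0], widths=[0.5, 1.0, 2.0])
print(len(phi), len(phi[0]))  # 3 9: more basis functions than data points
```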

A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v1 [stat.ML])

Machine learning models with both good predictive performance and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity of simple linear regression limits its predictive power. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts a percentile of a Gaussian distribution over the regression coefficients, enabling rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves predictive performance comparable to or better than other state-of-the-art baselines, but also discovers interesting relationships between input and target variables, such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). LoAIR is therefore a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without sacrificing its interpretability.
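
One possible reading of the percentile mechanism, sketched under loud assumptions: each coefficient has a Gaussian $N(\mu_j, \sigma_j^2)$, and for a given input the metamodel outputs a percentile $p_j$, giving the local coefficient $\beta_j = \mu_j + \sigma_j \Phi^{-1}(p_j)$. Here the percentiles are passed in directly instead of coming from a neural network, and `local_coefficients` and `predict` are hypothetical helpers, not the LoAIR implementation.

```python
from statistics import NormalDist
from typing import List, Sequence

def local_coefficients(mu: Sequence[float], sigma: Sequence[float],
                       percentiles: Sequence[float]) -> List[float]:
    """beta_j = the p_j-th percentile of N(mu_j, sigma_j^2); requires sigma_j > 0, 0 < p_j < 1."""
    return [NormalDist(m, s).inv_cdf(p) for m, s, p in zip(mu, sigma, percentiles)]

def predict(x: Sequence[float], beta: Sequence[float], intercept: float = 0.0) -> float:
    """An ordinary linear prediction, so each local coefficient stays readable."""
    return intercept + sum(b * xi for b, xi in zip(beta, x))

# Percentiles would come from the metamodel for this particular input (assumption).
beta = local_coefficients([1.0, -0.5], [0.2, 0.1], [0.5, 0.9])
print(beta, predict([2.0, 3.0], beta))
```

The design keeps the prediction linear in the inputs for every sample, so interpretability comes from inspecting `beta` per input rather than a single global coefficient vector.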

On a computationally-scalable sparse formulation of the multidimensional and non-stationary maximum entropy principle. (arXiv:2005.03253v1 [stat.CO])

Data-driven modelling and computational predictions based on the maximum entropy principle (MaxEnt-principle) aim at finding as-simple-as-possible (but not simpler than necessary) models that avoid the problem of data overfitting. We derive a multivariate non-parametric and non-stationary formulation of the MaxEnt-principle and show that its solution can be approximated through numerical maximisation of a sparse constrained optimization problem with regularization. Applying the resulting algorithm to popular financial benchmarks reveals memoryless models that allow simple and qualitative descriptions of the data of major stock market indexes. We compare the obtained MaxEnt-models to heteroscedastic models from computational econometrics (GARCH, GARCH-GJR, MS-GARCH, GARCH-PML4) in terms of model fit, complexity and prediction quality. We compare the resulting model log-likelihoods, the values of the Bayesian Information Criterion, posterior model probabilities, the quality of the fits to the data autocorrelation function, as well as the quality of Value-at-Risk predictions. We show that all seven of the major financial benchmark time series considered (DJI, SPX, FTSE, STOXX, SMI, HSI and N225) are better described by conditionally memoryless MaxEnt-models with non-stationary regime switching than by the common econometric models with finite memory. This analysis also reveals a sparse network of statistically significant temporal relations for the positive and negative latent variance changes among different markets. The code is provided for open access.
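
The model comparison above relies on standard quantities. For a model with $k$ free parameters, maximised log-likelihood $\ln \hat L$ and $n$ observations, $\mathrm{BIC} = k \ln n - 2 \ln \hat L$; with equal prior model probabilities, posterior model probabilities are commonly approximated as $p(M_i \mid \text{data}) \propto \exp(-\mathrm{BIC}_i / 2)$. A minimal sketch; the model names, log-likelihoods and parameter counts in the usage lines are placeholders, not results from the paper.

```python
import math
from typing import Dict

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

def posterior_model_probs(bics: Dict[str, float]) -> Dict[str, float]:
    """BIC approximation to posterior model probabilities under equal model priors."""
    best = min(bics.values())
    raw = {m: math.exp(-(b - best) / 2.0) for m, b in bics.items()}  # shift for stability
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

# Placeholder numbers for two hypothetical models fitted to the same 1000 observations.
bics = {"model_a": bic(-1200.0, 3, 1000), "model_b": bic(-1195.0, 6, 1000)}
print(posterior_model_probs(bics))
```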

Fractional ridge regression: a fast, interpretable reparameterization of ridge regression. (arXiv:2005.03220v1 [stat.ME])

Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($\alpha$) that controls the amount of regularization. Cross-validation is typically used to select the best $\alpha$ from a set of candidates. However, efficient and appropriate selection of $\alpha$ can be challenging, particularly where large amounts of data are analyzed. Because the selected $\alpha$ depends on the scale of the data and predictors, it is not straightforwardly interpretable. Here, we propose to reparameterize RR in terms of the ratio $\gamma$ between the L2-norms of the regularized and unregularized coefficients. This approach, called fractional RR (FRR), has several benefits: the solutions obtained for different $\gamma$ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. We provide an algorithm to solve FRR, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems, and delivers results that are straightforward to interpret and compare across models and datasets.
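
The reparameterization is easiest to see in the basis of the singular value decomposition $X = U S V^\top$. With singular values $s_i$ and $u_i = (U^\top y)_i$, the ridge solution has rotated coefficients $s_i u_i / (s_i^2 + \alpha)$ while OLS has $u_i / s_i$; because $V$ is orthogonal, $\gamma(\alpha)$ is the ratio of the norms of these two vectors, and it decreases monotonically from 1 at $\alpha = 0$ towards 0. The sketch finds the $\alpha$ for a requested $\gamma$ by bisection. It is a stand-alone illustration that takes $s$ and $U^\top y$ as inputs (positive singular values and a non-zero $U^\top y$ are assumed); it is not the fracridge implementation, which may search for $\alpha$ differently.

```python
import math
from typing import Sequence

def frac_norm(alpha: float, s: Sequence[float], uy: Sequence[float]) -> float:
    """gamma(alpha) = ||beta_ridge(alpha)|| / ||beta_OLS||, computed in the SVD-rotated basis."""
    ols = [u / si for si, u in zip(s, uy)]
    ridge = [si * u / (si * si + alpha) for si, u in zip(s, uy)]
    return math.sqrt(sum(b * b for b in ridge)) / math.sqrt(sum(b * b for b in ols))

def alpha_for_fraction(gamma: float, s: Sequence[float], uy: Sequence[float],
                       tol: float = 1e-10) -> float:
    """Find alpha with gamma(alpha) == gamma by bisection (gamma(alpha) is decreasing in alpha)."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    if gamma == 1.0:
        return 0.0  # no shrinkage: the OLS solution
    lo, hi = 0.0, 1.0
    while frac_norm(hi, s, uy) > gamma:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if frac_norm(mid, s, uy) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(alpha_for_fraction(0.5, [1.0], [1.0]), 6))  # 1.0
```

A grid of $\gamma$ values such as 0.1, 0.2, ..., 1.0 then maps to distinct $\alpha$ values that span the whole regularization path, which is the property the abstract highlights.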

Convergence and inference for mixed Poisson random sums. (arXiv:2005.03187v1 [math.PR])

In this paper we obtain the limit distribution for partial sums with a random number of terms following a class of mixed Poisson distributions. The resulting weak limit is obtained by mixing a normal distribution with an exponential family; we call these limits normal exponential family (NEF) laws. A new stability concept is introduced and a relationship between $\alpha$-stable distributions and NEF laws is established. We propose estimating the parameters of the NEF models by the method of moments and by maximum likelihood, the latter performed via an Expectation-Maximization algorithm. Monte Carlo simulation studies are conducted to check the performance of the proposed estimators, and an empirical illustration on financial market data is presented.
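
For intuition about the objects studied, consider a random sum $S_N = \sum_{i=1}^{N} X_i$ where $N \mid \Lambda \sim \mathrm{Poisson}(\lambda \Lambda)$ and $\Lambda$ is gamma distributed, one member of the mixed Poisson class. With $\mu$ and $\sigma^2$ the mean and variance of the $X_i$, $E[S_N] = E[N]\mu$ and $\mathrm{Var}(S_N) = E[N]\sigma^2 + \mathrm{Var}(N)\mu^2$; matching such moments is the idea behind a method-of-moments fit. The simulation below checks these identities. The gamma mixing, the normal summands and all parameter values are assumptions, and the code does not reproduce the paper's NEF limit or estimators.

```python
import math
import random
from typing import List, Tuple

def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for moderate means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_random_sums(n_sims: int, lam: float, shape: float, scale: float,
                         mu: float, sigma: float, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sims):
        mixing = rng.gammavariate(shape, scale)  # Lambda ~ Gamma(shape, scale)
        n = poisson(rng, lam * mixing)           # N | Lambda ~ Poisson(lam * Lambda)
        sums.append(sum(rng.gauss(mu, sigma) for _ in range(n)))
    return sums

def theoretical_moments(lam: float, shape: float, scale: float,
                        mu: float, sigma: float) -> Tuple[float, float]:
    mean_n = lam * shape * scale
    var_n = mean_n + lam ** 2 * shape * scale ** 2  # E[Var(N|L)] + Var(E[N|L])
    return mean_n * mu, mean_n * sigma ** 2 + var_n * mu ** 2

print(theoretical_moments(5.0, 2.0, 0.5, 1.0, 1.0))  # (5.0, 22.5)
```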

Adaptive Invariance for Molecule Property Prediction. (arXiv:2005.03004v1 [q-bio.QM])

Effective property prediction methods can help accelerate the search for COVID-19 antivirals, either through accurate in-silico screens or by effectively guiding ongoing at-scale experimental efforts. However, existing prediction tools have limited ability to accommodate the scarce or fragmented training data currently available. In this paper, we introduce a novel approach to learning predictors that can generalize or extrapolate beyond the heterogeneous data. Our method builds on and extends the recently proposed invariant risk minimization, adaptively forcing the predictor to avoid nuisance variation. We achieve this by continually exercising and manipulating latent representations of molecules to highlight undesirable variation to the predictor. To test the method, we use a combination of three data sources: SARS-CoV-2 antiviral screening data, molecular fragments that bind to the SARS-CoV-2 main protease, and large screening data for SARS-CoV-1. Our predictor outperforms state-of-the-art transfer learning methods by a significant margin. We also report the top 20 predictions of our model on the Broad Drug Repurposing Hub.
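
As background on the starting point, the widely used IRMv1 penalty from the invariant risk minimization literature multiplies the predictor's output by a dummy scalar $w$ and penalises, in each training environment $e$, the squared gradient of the environment risk with respect to $w$ at $w = 1$. For squared loss that gradient has the closed form $\frac{2}{n_e}\sum_i (f_i - y_i) f_i$, so the penalty needs no automatic differentiation. This sketches plain IRMv1 under the squared-loss assumption only, not the paper's adaptive extension; the environments and values are placeholders.

```python
from typing import Dict, Sequence, Tuple

Env = Tuple[Sequence[float], Sequence[float]]  # (predictions f, targets y) for one environment

def irmv1_penalty(envs: Dict[str, Env]) -> float:
    """Sum over environments of (d risk_e / d w at w = 1)^2 for squared loss."""
    total = 0.0
    for preds, targets in envs.values():
        n = len(preds)
        grad = 2.0 / n * sum((f - y) * f for f, y in zip(preds, targets))
        total += grad ** 2
    return total

def irm_objective(envs: Dict[str, Env], lam: float) -> float:
    """Sum of per-environment mean squared errors plus lam times the IRMv1 penalty."""
    risk = sum(sum((f - y) ** 2 for f, y in zip(p, t)) / len(p) for p, t in envs.values())
    return risk + lam * irmv1_penalty(envs)

envs = {"env_1": ([1.0, 2.0], [1.1, 1.9]), "env_2": ([0.5, 3.0], [1.0, 2.0])}
print(irm_objective(envs, lam=1.0))
```

A predictor that is optimal for every environment simultaneously has a zero penalty, which is how the penalty pushes the model away from features whose relation to the target varies across environments.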

Call for nominations: NSW Premier’s History Awards 2020

Wednesday 19 February 2020
The State Library announces the opening of nominations for the NSW Premier’s History Awards 2020.

Shortlists announced for 2020 NSW Premier’s Literary Awards

Friday 20 March 2020
Contemporary works by leading and emerging Australian writers have been shortlisted for the 2020 NSW Premier's Literary Awards, the State Library of NSW announced today.

2020 NSW Premier’s Literary Awards announced

Sunday 26 April 2020
A total of $295,000 awarded across 12 prize categories. 

History of Pre-Modern Medicine Seminar Series, Spring 2018

The History of Pre-Modern Medicine seminar series returns this month. The 2017–18 series – organised by a group of historians of medicine based at London universities and hosted by the Wellcome Library – will conclude with four seminars. The series…

Wyllie's treatment of epilepsy : principles and practice

149639769X

Wood microbiology : decay and its prevention

Zabel, R. A. (Robert A.), author
9780128205730 (electronic bk.)

Wine science : principles and applications

Jackson, Ron S., author.
9780128161180

Vertebrate and invertebrate respiratory proteins, lipoproteins and other body fluid proteins

9783030417697 (electronic bk.)

Treatment of skin diseases : a practical guide

Zaidi, Zohra, author.
9783319895819 (electronic bk.)

Tissue engineering : principles, protocols, and practical exercises

9783030396985

Theranostics approaches to gastric and colon cancer

9789811520174 (electronic bk.)

The public policy primer : managing the policy process

Wu, Xun, author.
9781315624754 (electronic bk.)

The evolution of feathers : from their origin to the present

9783030272234 (electronic bk.)

The complexity of bird behaviour : a facet theory approach

Hackett, Paul, 1960- author
9783030121921 (electronic bk.)

Temporomandibular disorders : a translational approach from basic science to clinical applicability

9783319572475 (electronic bk.)

Systems approaches to making change : a practical guide

9781447174721 (electronic bk.)

Sustainable digital communities : 15th International Conference, iConference 2020, Boras, Sweden, March 23–26, 2020, Proceedings

iConference (Conference) (15th : 2020 : Boras, Sweden)
9783030436872

Science and practice of pressure ulcer management

9781447174134 (electronic bk.)

Salt, fat and sugar reduction : sensory approaches for nutritional reformulation of foods and beverages

O'Sullivan, Maurice G., author
9780128226124 (electronic bk.)

Requirements engineering : 26th International Working Conference, REFSQ 2020, Pisa, Italy, March 24-27, 2020, Proceedings

REFSQ (Conference) (26th : 2020 : Pisa, Italy)
9783030444297

Radiomics and radiogenomics in neuro-oncology : First International Workshop, RNO-AI 2019, held in conjunction with MICCAI 2019, Shenzhen, China, October 13, proceedings

Radiomics and Radiogenomics in Neuro-oncology using AI Workshop (1st : 2019 : Shenzhen Shi, China)
9783030401245

Progress in botany.

9783030363277 (electronic bk.)

Priming-mediated stress and cross-stress tolerance in crop plants

9780128178935 (electronic bk.)

Primary care for older adults : models and challenges

9783319613291

Prevention of chronic diseases and age-related disability

9783319965291 (electronic bk.)

Plastic waste and recycling : environmental impact, societal issues, prevention, and solutions

9780128178812 (electronic bk.)

Pediatric pelvic and proximal femoral osteotomies

9783319780337

Passive and active measurement : 21st International Conference, PAM 2020, Eugene, Oregon, USA, March 30-31, 2020, Proceedings

PAM (Conference) (21st : 2020 : Eugene, Oregon)
9783030440817

Oral rehabilitation for compromised and elderly patients

3319761293 (electronic bk.)

Natural materials and products from insects : chemistry and applications

9783030366100 (electronic bk.)

Milk proteins : from expression to food

9780128152522 (electronic bk.)

Microbial endophytes : prospects for sustainable agriculture

0128187255

Microalgae biotechnology for food, health and high value products

9789811501692 (electronic bk.)

Maxillofacial cone beam computed tomography : principles, techniques and clinical applications

9783319620619 (electronic bk.)

LGBTQ cultures : what health care professionals need to know about sexual and gender diversity

Eliason, Michele J., author.
9781496394606 (paperback)

Ketamine : from abused drug to rapid-acting antidepressant

9789811529023

Invertebrate embryology and reproduction

El-Bawab, Fatma, author.
9780128141151 (electronic bk.)

Information retrieval technology : 15th Asia Information Retrieval Societies Conference, AIRS 2019, Hong Kong, China, November 7-9, 2019, proceedings

Asia Information Retrieval Societies Conference (15th : 2019 : Hong Kong, China)
9783030428358