es

A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v1 [stat.ML])

Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.




es

Training and Classification using a Restricted Boltzmann Machine on the D-Wave 2000Q. (arXiv:2005.03247v1 [cs.LG])

Restricted Boltzmann Machine (RBM) is an energy based, undirected graphical model. It is commonly used for unsupervised and supervised machine learning. Typically, RBM is trained using contrastive divergence (CD). However, training with CD is slow and does not estimate exact gradient of log-likelihood cost function. In this work, the model expectation of gradient learning for RBM has been calculated using a quantum annealer (D-Wave 2000Q), which is much faster than Markov chain Monte Carlo (MCMC) used in CD. Training and classification results are compared with CD. The classification accuracy results indicate similar performance of both methods. Image reconstruction as well as log-likelihood calculations are used to compare the performance of quantum and classical algorithms for RBM training. It is shown that the samples obtained from quantum annealer can be used to train a RBM on a 64-bit `bars and stripes' data set with classification performance similar to a RBM trained with CD. Though training based on CD showed improved learning performance, training using a quantum annealer eliminates computationally expensive MCMC steps of CD.




es

Fast multivariate empirical cumulative distribution function with connection to kernel density estimation. (arXiv:2005.03246v1 [cs.DS])

This paper revisits the problem of computing empirical cumulative distribution functions (ECDF) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires $mathcal{O}(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic $mathcal{O}(N^2)$ operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first one is based on fast summation in lexicographical order, with a $mathcal{O}(N{log}N)$ complexity and requires the evaluation points to lie on a regular grid. The second one is based on the divide-and-conquer principle, with a $mathcal{O}(Nlog(N)^{(d-1){vee}1})$ complexity and requires the evaluation points to coincide with the input points. The two fast algorithms are described and detailed in the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. Secondly, the paper establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.




es

Classification of pediatric pneumonia using chest X-rays by functional regression. (arXiv:2005.03243v1 [stat.AP])

An accurate and prompt diagnosis of pediatric pneumonia is imperative for successful treatment intervention. One approach to diagnose pneumonia cases is using radiographic data. In this article, we propose a novel parsimonious scalar-on-image classification model adopting the ideas of functional data analysis. Our main idea is to treat images as functional measurements and exploit underlying covariance structures to select basis functions; these bases are then used in approximating both image profiles and corresponding regression coefficient. We re-express the regression model into a standard generalized linear model where the functional principal component scores are treated as covariates. We apply the method to (1) classify pneumonia against healthy and viral against bacterial pneumonia patients, and (2) test the null effect about the association between images and responses. Extensive simulation studies show excellent numerical performance in terms of classification, hypothesis testing, and efficient computation.




es

Detecting Latent Communities in Network Formation Models. (arXiv:2005.03226v1 [econ.EM])

This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets.




es

Fractional ridge regression: a fast, interpretable reparameterization of ridge regression. (arXiv:2005.03220v1 [stat.ME])

Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($alpha$) that controls the amount of regularization. Cross-validation is typically used to select the best $alpha$ from a set of candidates. However, efficient and appropriate selection of $alpha$ can be challenging, particularly where large amounts of data are analyzed. Because the selected $alpha$ depends on the scale of the data and predictors, it is not straightforwardly interpretable. Here, we propose to reparameterize RR in terms of the ratio $gamma$ between the L2-norms of the regularized and unregularized coefficients. This approach, called fractional RR (FRR), has several benefits: the solutions obtained for different $gamma$ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. We provide an algorithm to solve FRR, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems, and delivers results that are straightforward to interpret and compare across models and datasets.




es

Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network. (arXiv:2005.03213v1 [stat.ML])

Uncertainties in a structure is inevitable, which generally lead to variation in dynamic response predictions. For a complex structure, brute force Monte Carlo simulation for response variation analysis is infeasible since one single run may already be computationally costly. Data driven meta-modeling approaches have thus been explored to facilitate efficient emulation and statistical inference. The performance of a meta-model hinges upon both the quality and quantity of training dataset. In actual practice, however, high-fidelity data acquired from high-dimensional finite element simulation or experiment are generally scarce, which poses significant challenge to meta-model establishment. In this research, we take advantage of the multi-level response prediction opportunity in structural dynamic analysis, i.e., acquiring rapidly a large amount of low-fidelity data from reduced-order modeling, and acquiring accurately a small amount of high-fidelity data from full-scale finite element analysis. Specifically, we formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained. It implicitly identifies the correlation of the low- and high-fidelity datasets, which yields improved accuracy when compared with the state-of-the-art. Comprehensive investigations using frequency response variation characterization as case example are carried out to demonstrate the performance.




es

Model Reduction and Neural Networks for Parametric PDEs. (arXiv:2005.03180v1 [math.NA])

We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. Numerically we demonstrate the effectiveness of the method on a class of parametric elliptic PDE problems, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare our method with existing algorithms from the literature.




es

MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. (arXiv:2005.03161v1 [stat.ML])

Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities.

This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack.

Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the query required for the attack by 2x-24x.




es

On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference. (arXiv:2005.03151v1 [stat.ME])

I study the minimax-optimal design for a two-arm controlled experiment where conditional mean outcomes may vary in a given set. When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of $n-1$. More generally, the optimal design is shown to be the mixed-strategy optimal design (MSOD) of Kallus (2018). Notably, even when the set of conditional mean outcomes has structure (i.e., is not permutation symmetric), being minimax-optimal for variance still requires randomization beyond a single partition. Nonetheless, since this targets precision, it may still not ensure sufficient uniformity in randomization to enable randomization (i.e., design-based) inference by Fisher's exact test to appropriately detect violations of null. I therefore propose the inference-constrained MSOD, which is minimax-optimal among all designs subject to such uniformity constraints. On the way, I discuss Johansson et al. (2020) who recently compared rerandomization of Morgan and Rubin (2012) and the pure-strategy optimal design (PSOD) of Kallus (2018). I point out some errors therein and set straight that randomization is minimax-optimal and that the "no free lunch" theorem and example in Kallus (2018) are correct.




es

Joint Multi-Dimensional Model for Global and Time-Series Annotations. (arXiv:2005.03117v1 [cs.LG])

Crowdsourcing is a popular approach to collect annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive untrained annotators for each data instance which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes however ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly leading to more accurate ground truth estimates. The model we propose is applicable to both global and time series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm and we evaluate its performance using synthetic data and real emotion corpora as well as on an artificial task with human annotations




es

A comparison of group testing architectures for COVID-19 testing. (arXiv:2005.03051v1 [stat.ME])

An important component of every country's COVID-19 response is fast and efficient testing -- to identify and isolate cases, as well as for early detection of local hotspots. For many countries, producing a sufficient number of tests has been a serious limiting factor in their efforts to control COVID-19 infections. Group testing is a well-established mathematical tool, which can provide a serious and rapid improvement to this situation. In this note, we compare several well-established group testing schemes in the context of qPCR testing for COVID-19. We include example calculations, where we indicate which testing architectures yield the greatest efficiency gains in various settings. We find that for identification of individuals with COVID-19, array testing is usually the best choice, while for estimation of COVID-19 prevalence rates in the total population, Gibbs-Gower testing usually provides the most accurate estimates given a fixed and relatively small number of tests. This note is intended as a helpful handbook for labs implementing group testing methods.




es

Entries open for State Library’s $20,000 short film competition

Thursday 21 November 2019

The State Library of NSW is inviting entries for its short film prize Shortstacks, with a total of $20,000 on offer across two categories.




es

Entries now open for the 2020 National Biography Award

Tuesday 10 December 2019

Entries are now open for the 2020 National Biography Award – Australia's richest prize for biography and memoir writing.




es

State Library creates a new space for Aboriginal communities to connect with their cultural heritage

Thursday 20 February 2020
In an Australian first, the State Library of NSW launched a new digital space for Aboriginal communities to connect with their histories and cultures.




es

Entries open for $40,000 award for female scriptwriters

Friday 6 March 2020
Nominations opened for the 2020 Mona Brand Award for Women Stage and Screen Writers.




es

Public libraries report spike in demand for books in language

Tuesday 17 March 2020
NSW residents are reading more and more books in languages other than English than ever before with the State Library of NSW reporting a 20% increase in requests from public libraries for multicultural material just in the last 12 months.




es

mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data

We present the R package mgm for the estimation of k-order mixed graphical models (MGMs) and mixed vector autoregressive (mVAR) models in high-dimensional data. These are a useful extensions of graphical models for only one variable type, since data sets consisting of mixed types of variables (continuous, count, categorical) are ubiquitous. In addition, we allow to relax the stationarity assumption of both models by introducing time-varying versions of MGMs and mVAR models based on a kernel weighting approach. Time-varying models offer a rich description of temporally evolving systems and allow to identify external influences on the model structure such as the impact of interventions. We provide the background of all implemented methods and provide fully reproducible examples that illustrate how to use the package.




es

Bayesian Random-Effects Meta-Analysis Using the bayesmeta R Package

The random-effects or normal-normal hierarchical model is commonly utilized in a wide range of meta-analysis applications. A Bayesian approach to inference is very attractive in this context, especially when a meta-analysis is based only on few studies. The bayesmeta R package provides readily accessible tools to perform Bayesian meta-analyses and generate plots and summaries, without having to worry about computational details. It allows for flexible prior specification and instant access to the resulting posterior distributions, including prediction and shrinkage estimation, and facilitating for example quick sensitivity checks. The present paper introduces the underlying theory and showcases its usage.




es

mvord: An R Package for Fitting Multivariate Ordinal Regression Models

The R package mvord implements composite likelihood estimation in the class of multivariate ordinal regression models with a multivariate probit and a multivariate logit link. A flexible modeling framework for multiple ordinal measurements on the same subject is set up, which takes into consideration the dependence among the multiple observations by employing different error structures. Heterogeneity in the error structure across the subjects can be accounted for by the package, which allows for covariate dependent error structures. In addition, different regression coefficients and threshold parameters for each response are supported. If a reduction of the parameter space is desired, constraints on the threshold as well as on the regression coefficients can be specified by the user. The proposed multivariate framework is illustrated by means of a credit risk application.




es

lmSubsets: Exact Variable-Subset Selection in Linear Regression for R

An R package for computing the all-subsets regression problem is presented. The proposed algorithms are based on computational strategies recently developed. A novel algorithm for the best-subset regression problem selects subset models based on a predetermined criterion. The package user can choose from exact and from approximation algorithms. The core of the package is written in C++ and provides an efficient implementation of all the underlying numerical computations. A case study and benchmark results illustrate the usage and the computational efficiency of the package.




es

History of Pre-Modern Medicine Seminar Series, Spring 2018

The History of Pre-Modern Medicine seminar series returns this month. The 2017–18 series – organised by a group of historians of medicine based at London universities and hosted by the Wellcome Library – will conclude with four seminars. The series… Continue reading




es

Arabo-Persian physiological theories in late Imperial China

The last seminar in the 2017–18 History of Pre-Modern Medicine seminar series takes place on Tuesday 27 February. Speaker: Dr Dror Weil (Max Planck Institute for the History of Science, Berlin) Bodies translated: the circulation of Arabo-Persian physiological theories in late… Continue reading




es

COVID-19 in-language resources




es

The Library wants your self-isolation images

The State Library launched a new collecting drive on Instagram today called #NSWathome to ensure your self-isolation images become part of the historic record.




es

The Diary Files go live

The Library has a long tradition of collecting diaries — from First World War soldiers to famous literary figures like Miles Franklin. Now we want your diary entries to become part of the collection.




es

Law Week goes digital in 2020

Get involved online in Law Week 2020.




es

Wyllie's treatment of epilepsy : principles and practice

149639769X




es

Wine science : principles and applications

Jackson, Ron S., author.
9780128161180




es

Wilderness EMS

9781496349453




es

Vertebrate and invertebrate respiratory proteins, lipoproteins and other body fluid proteins

9783030417697 (electronic bk.)




es

Upper extremity injuries in young athletes

9783319566511 (electronic bk.)




es

Trusted computing and information security : 13th Chinese conference, CTCIS 2019, Shanghai, China, October 24-27, 2019

Chinese Conference on Trusted Computing and Information Security (13th : 2019 : Shanghai, China)
9789811534188 (eBook)




es

Trends in biomedical research

9783030412197 (electronic bk.)




es

Treatment of skin diseases : a practical guide

Zaidi, Zohra, author.
9783319895819 (electronic bk.)




es

Tissue engineering : principles, protocols, and practical exercises

9783030396985




es

Theranostics approaches to gastric and colon cancer

9789811520174 (electronic bk.)




es

The science of grapevines

Keller, Markus, (horticulturist) author
9780128167021 (electronic bk.)




es

The public policy primer : managing the policy process

Wu, Xun, author.
9781315624754 (electronic bk.)




es

The evolution of feathers : from their origin to the present

9783030272234 electronic book




es

The duckweed genomes

9783030110451 (electronic bk.)




es

The Best and Worst Places to be a Woman in Canada 2019 : The Gender Gap in Canada’s 26 Biggest Cities

9781771254434 (print)




es

Terrestrial hermit crab populations in the Maldives : ecology, distribution and anthropogenic impact

Steibl, Sebastian, author
9783658295417 (electronic bk.)




es

Technology and adolescent mental health

9783319696386 (electronic bk.)




es

Systems approaches to making change : a practical guide

9781447174721 (electronic bk.)




es

Sustainable digital communities : 15th International Conference, iConference 2020, Boras, Sweden, March 23–26, 2020, Proceedings

iConference (Conference) (15th : 2020 : Boras, Sweden)
9783030436872




es

Sustainable agriculture : advances in plant metabolome and microbiome

Parray, Javid Ahmad, author
9780128173749 (electronic bk.)




es

Sowing legume seeds, reaping cash : a renaissance within communities in Sub-Saharan Africa

Akpo, Essegbemon, author.
9789811508455 (electronic bk.)




es

Science and practice of pressure ulcer management

9781447174134 (electronic bk.)




es

Salt, fat and sugar reduction : sensory approaches for nutritional reformulation of foods and beverages

O'Sullivan, Maurice G., author
9780128226124 (electronic bk.)