
Artificial Intelligence

New submissions

[ total of 68 entries: 1-68 ]

New submissions for Wed, 1 Dec 21

[1]  arXiv:2111.15108 [pdf]
Title: Interval-valued q-Rung Orthopair Fuzzy Choquet Integral Operators and Its Application in Group Decision Making
Subjects: Artificial Intelligence (cs.AI)

Evaluating with interval-valued q-rung orthopair fuzzy sets (IVq-ROFS) gives decision makers more flexibility and widens the range of applications of fuzzy decision making. Meanwhile, the Choquet integral uses a non-additive set function (fuzzy measure) to describe the interaction between attributes directly. In particular, many practical problems involve correlated attributes. Therefore, this paper proposes correlation operators and a group decision-making method based on the interval-valued q-rung orthopair fuzzy Choquet integral. First, the interval-valued q-rung orthopair fuzzy Choquet integral average operator (IVq-ROFCA) and the interval-valued q-rung orthopair fuzzy Choquet integral geometric operator (IVq-ROFCG) are investigated, and their basic properties are proved. Furthermore, several operators based on IVq-ROFCA and IVq-ROFCG are developed. Then, a group decision-making method based on IVq-ROFCA is developed, which can solve decision-making problems with interaction between attributes. Finally, through the implementation of a warning management system for hypertension, it is shown that the operators and group decision-making method proposed in this paper can handle complex real-world decision-making cases, and the decision result is consistent with the doctor's diagnosis. Moreover, the comparison with the results of other operators shows that the proposed operators and group decision-making method are correct and effective, and that the decision result is not affected by changes in the value of q.
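
As a rough illustration of the non-additive aggregation the abstract builds on, the sketch below computes a plain discrete Choquet integral over crisp scores. It is not the interval-valued q-rung operator from the paper; the function name, the attribute names, and the fuzzy-measure values are all hypothetical.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of crisp scores.

    values:  dict mapping attribute name -> non-negative score
    measure: dict mapping frozenset of attribute names -> fuzzy measure,
             with measure of the full attribute set equal to 1.0
    (Both structures are illustrative assumptions, not the paper's notation.)
    """
    # Sort attributes by ascending score.
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, attr in enumerate(order):
        # Coalition of all attributes whose score is at least the current one.
        coalition = frozenset(order[i:])
        total += (values[attr] - previous) * measure[coalition]
        previous = values[attr]
    return total


# Hypothetical two-attribute example with interacting attributes.
scores = {"a": 0.6, "b": 0.2}
interacting = {
    frozenset(["a"]): 0.3,
    frozenset(["b"]): 0.4,
    frozenset(["a", "b"]): 1.0,
}
result = choquet_integral(scores, interacting)
```

When the measure is additive (for example 0.5 for each single attribute), the same function reduces to an ordinary weighted average, which is one way to see what the interaction terms add.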

[2]  arXiv:2111.15182 [pdf, other]
Title: Easy Semantification of Bioassays
Comments: 12 pages, 5 figures, Accepted for Publication in AIxIA 2021 (this https URL)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG)

Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution juxtaposes the problem of automated semantification as classification versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art classification approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.

[3]  arXiv:2111.15445 [pdf, ps, other]
Title: Asymptotics for Pull on the Complete Graph
Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)

Consider the following model to study adversarial effects on opinion forming. A set of initially selected experts form their binary opinion while being influenced by an adversary, who may convince some of them of the falsehood. All other participants in the network then take the opinion of the majority of their neighbouring experts. Can the adversary influence the experts in such a way that the majority of the network believes the falsehood? Alon et al. [1] conjectured that in this context an iterative dissemination process will always be beneficial to the adversary. This work provides a counterexample to that conjecture.
[1] N. Alon, M. Feldman, O. Lev, and M. Tennenholtz. How Robust Is the Wisdom of the Crowds? In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pages 2055-2061, 2015.

[4]  arXiv:2111.15611 [pdf, other]
Title: The Power of Communication in a Distributed Multi-Agent System
Comments: Cooperative AI Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Single-Agent (SA) Reinforcement Learning systems have shown outstanding results on non-stationary problems. However, Multi-Agent Reinforcement Learning (MARL) can surpass SA systems generally and when scaling. Furthermore, MA systems can be super-powered by collaboration, which can happen through observing others, or a communication system used to share information between collaborators. Here, we developed a distributed MA learning mechanism with the ability to communicate based on decentralised partially observable Markov decision processes (Dec-POMDPs) and Graph Neural Networks (GNNs). Minimising the time and energy consumed by training Machine Learning models while improving performance can be achieved by collaborative MA mechanisms. We demonstrate this in a real-world scenario, an offshore wind farm, including a set of distributed wind turbines, where the objective is to maximise collective efficiency. Compared to a SA system, MA collaboration has shown significantly reduced training time and higher cumulative rewards in unseen and scaled scenarios.

Cross-lists for Wed, 1 Dec 21

[5]  arXiv:2106.00311 (cross-list from stat.ML) [pdf, other]
Title: What's a good imputation to predict with missing values?
Authors: Marine Le Morvan (PARIETAL, IJCLab), Julie Josse (CRISAM), Erwan Scornet (CMAP), Gaël Varoquaux (PARIETAL)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation is not needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss is better than various two-step procedures.

[6]  arXiv:2111.14833 (cross-list from cs.LG) [pdf, other]
Title: Adversarial Attacks in Cooperative AI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Single-agent reinforcement learning algorithms in a multi-agent environment are inadequate for fostering cooperation. If intelligent agents are to interact and work together to solve complex problems, methods that counter non-cooperative behavior are needed to facilitate the training of multiple agents. This is the goal of cooperative AI. Recent work in adversarial machine learning, however, shows that models (e.g., image classifiers) can be easily deceived into making incorrect decisions. In addition, some past research in cooperative AI has relied on new notions of representations, like public beliefs, to accelerate the learning of optimally cooperative behavior. Hence, cooperative AI might introduce new weaknesses not investigated in previous machine learning research. In this paper, our contributions include: (1) arguing that three algorithms inspired by human-like social intelligence introduce new vulnerabilities, unique to cooperative AI, that adversaries can exploit, and (2) an experiment showing that simple, adversarial perturbations on the agents' beliefs can negatively impact performance. This evidence points to the possibility that formal representations of social behavior are vulnerable to adversarial attacks.

[7]  arXiv:2111.14844 (cross-list from cs.LG) [pdf, other]
Title: Evaluation of Machine Learning Techniques for Forecast Uncertainty Quantification
Comments: preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD)

Producing an accurate weather forecast and a reliable quantification of its uncertainty is an open scientific challenge. Ensemble forecasting is, so far, the most successful approach to produce relevant forecasts along with an estimation of their uncertainty. The main limitations of ensemble forecasting are the high computational cost and the difficulty to capture and quantify different sources of uncertainty, particularly those associated with model errors. In this work proof-of-concept model experiments are conducted to examine the performance of ANNs trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input. We compare different training strategies: one based on a direct training using the mean and spread of an ensemble forecast as target, the other ones rely on an indirect training strategy using a deterministic forecast as target in which the uncertainty is implicitly learned from the data. For the last approach two alternative loss functions are proposed and evaluated, one based on the data observation likelihood and the other one based on a local estimation of the error. The performance of the networks is examined at different lead times and in scenarios with and without model errors. Experiments using the Lorenz'96 model show that the ANNs are able to emulate some of the properties of ensemble forecasts like the filtering of the most unpredictable modes and a state-dependent quantification of the forecast uncertainty. Moreover, ANNs provide a reliable estimation of the forecast uncertainty in the presence of model error.
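
For readers unfamiliar with the Lorenz '96 testbed named in the abstract, here is a minimal sketch of its standard tendency equation, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, with a fourth-order Runge-Kutta step. The forcing F = 8, the step size, and the function names are assumptions chosen for illustration; the abstract does not state the configuration the authors used.

```python
def lorenz96_tendency(x, forcing=8.0):
    """Time derivative of the Lorenz '96 state with cyclic boundaries.

    forcing=8.0 is a commonly used chaotic setting (an assumption here).
    """
    n = len(x)
    # Negative indices wrap around in Python, which gives the cyclic boundary.
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, dt=0.01, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(state, slope, scale):
        return [s + scale * k for s, k in zip(state, slope)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_tendency(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_tendency(shifted(x, k3, dt), forcing)
    return [
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


# The uniform state x_i = F is a fixed point; perturbing one variable
# is a common way to start a chaotic trajectory.
state = [8.0] * 8
state[0] += 0.01
for _ in range(100):
    state = rk4_step(state)
```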

[8]  arXiv:2111.14874 (cross-list from astro-ph.GA) [pdf, other]
Title: Weighing the Milky Way and Andromeda with Artificial Intelligence
Comments: 2 figures, 2 tables, 7 pages. Code publicly available at this https URL
Subjects: Astrophysics of Galaxies (astro-ph.GA); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present new constraints on the masses of the halos hosting the Milky Way and Andromeda galaxies derived using graph neural networks. Our models, trained on thousands of state-of-the-art hydrodynamic simulations of the CAMELS project, only make use of the positions, velocities and stellar masses of the galaxies belonging to the halos, and are able to perform likelihood-free inference on halo masses while accounting for both cosmological and astrophysical uncertainties. Our constraints are in agreement with estimates from other traditional methods.

[9]  arXiv:2111.14911 (cross-list from cs.LG) [pdf, other]
Title: Optimizing High-Dimensional Physics Simulations via Composite Bayesian Optimization
Comments: Fourth Workshop on Machine Learning and the Physical Sciences at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Physical simulation-based optimization is a common task in science and engineering. Many such simulations produce image- or tensor-based outputs where the desired objective is a function of those outputs, and optimization is performed over a high-dimensional parameter space. We develop a Bayesian optimization method leveraging tensor-based Gaussian process surrogates and trust region Bayesian optimization to effectively model the image outputs and to efficiently optimize these types of simulations, including a radio-frequency tower configuration problem and an optical design problem.

[10]  arXiv:2111.14932 (cross-list from cs.LG) [pdf, other]
Title: Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard on the label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down the training speed. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix every iteration within a single back-propagation, so that the estimated matrix closely follows the shifting noise distribution induced by label correction. Extensive experiments demonstrate that our approach shows the best performance in training efficiency while having comparable or better accuracy than existing methods.
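
The sketch below shows the generic way a label transition matrix is used in noisy-label training (often called forward correction): the classifier's clean-class probabilities are pushed through the matrix before the loss is computed against the observed, possibly noisy label. It does not reproduce the paper's two-head architecture or its on-the-fly estimation; the function names and the example matrix are hypothetical.

```python
import math


def noisy_class_probabilities(clean_probs, transition):
    """Map clean-class probabilities to noisy-label probabilities.

    transition[i][j] is assumed to be P(observed label = j | true label = i),
    so each row sums to 1.
    """
    num_classes = len(clean_probs)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


def forward_corrected_loss(clean_probs, transition, observed_label):
    """Cross-entropy between the noisy-label distribution and the observed label."""
    noisy_probs = noisy_class_probabilities(clean_probs, transition)
    return -math.log(noisy_probs[observed_label])


# Hypothetical binary example: class 0 is flipped to class 1 20% of the time.
T = [[0.8, 0.2],
     [0.1, 0.9]]
probs = noisy_class_probabilities([0.5, 0.5], T)
loss = forward_corrected_loss([0.5, 0.5], T, observed_label=1)
```

With an identity matrix the correction is a no-op, so the loss falls back to ordinary cross-entropy.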

[11]  arXiv:2111.14934 (cross-list from cs.GR) [pdf, other]
Title: GAN-CNMP: An Interactive Generative Drawing Tool
Comments: 9 pages, 10 figures
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sketches are abstract representations of visual perception and visuospatial construction. In this work, we proposed a new framework, GAN-CNMP, that incorporates a novel adversarial loss on CNMP to increase sketch smoothness and consistency. Through the experiments, we show that our model can be trained with few unlabeled samples, can construct distributions automatically in the latent space, and produces better results than the base model in terms of shape consistency and smoothness.

[12]  arXiv:2111.14938 (cross-list from cs.LG) [pdf, other]
Title: Distribution Shift in Airline Customer Behavior during COVID-19
Comments: 6 pages, 5 figures, NeurIPS 2021 Workshop on Distribution Shifts: connecting methods and applications (DistShift)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Econometrics (econ.EM)

Traditional AI approaches in customized (personalized) contextual pricing applications assume that the data distribution at the time of online pricing is similar to that observed during training. However, this assumption may be violated in practice because of the dynamic nature of customer buying patterns, particularly due to unanticipated system shocks such as COVID-19. We study the changes in customer behavior for a major airline during the COVID-19 pandemic by framing it as a covariate shift and concept drift detection problem. We identify which customers changed their travel and purchase behavior and the attributes affecting that change using (i) Fast Generalized Subset Scanning and (ii) Causal Forests. In our experiments with simulated and real-world data, we present how these two techniques can be used through qualitative analysis.

[13]  arXiv:2111.14973 (cross-list from cs.CV) [pdf, other]
Title: MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe road features, and raw agent state information directly (e.g., position, velocity, acceleration). We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and output aggregation techniques -- common in other ML domains -- and find effective variants for our probabilistic multimodal output representation. We perform an extensive ablation on these design choices, and show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.

[14]  arXiv:2111.15000 (cross-list from cs.CV) [pdf, other]
Title: Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Machine learning has been widely adopted in many domains, including high-stakes applications such as healthcare, finance, and criminal justice. To address concerns of fairness, accountability and transparency, predictions made by machine learning models in these critical domains must be interpretable. One line of work approaches this challenge by integrating the power of deep neural networks and the interpretability of case-based reasoning to produce accurate yet interpretable image classification models. These models generally classify input images by comparing them with prototypes learned during training, yielding explanations in the form of "this looks like that." However, methods from this line of work use spatially rigid prototypes, which cannot explicitly account for pose variations. In this paper, we address this shortcoming by proposing a case-based interpretable neural network that provides spatially flexible prototypes, called a deformable prototypical part network (Deformable ProtoPNet). In a Deformable ProtoPNet, each prototype is made up of several prototypical parts that adaptively change their relative spatial positions depending on the input image. This enables each prototype to detect object features with a higher tolerance to spatial transformations, as the parts within a prototype are allowed to move. Consequently, a Deformable ProtoPNet can explicitly capture pose variations, improving both model accuracy and the richness of explanations provided. Compared to other case-based interpretable models using prototypes, our approach achieves competitive accuracy, gives an explanation with greater context, and is easier to train, thus enabling wider use of interpretable models for computer vision.

[15]  arXiv:2111.15013 (cross-list from cs.NI) [pdf, other]
Title: DeepCQ+: Robust and Scalable Routing with Multi-Agent Deep Reinforcement Learning for Highly Dynamic Networks
Comments: arXiv admin note: substantial text overlap with arXiv:2101.03273
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Highly dynamic mobile ad-hoc networks (MANETs) remain one of the most challenging environments in which to develop and deploy robust, efficient, and scalable routing protocols. In this paper, we present the DeepCQ+ routing protocol, which integrates emerging multi-agent deep reinforcement learning (MADRL) techniques into existing Q-learning-based routing protocols and their variants in a novel manner, and achieves persistently higher performance across a wide range of topology and mobility configurations. While keeping the overall protocol structure of the Q-learning-based routing protocols, DeepCQ+ replaces statically configured parameterized thresholds and hand-written rules with carefully designed MADRL agents such that no configuration of such parameters is required a priori. Extensive simulation shows that DeepCQ+ yields significantly increased end-to-end throughput with lower overhead and no apparent degradation of end-to-end delays (hop counts) compared to its Q-learning-based counterparts. Qualitatively, and perhaps more significantly, DeepCQ+ maintains remarkably similar performance gains under many scenarios that it was not trained for in terms of network sizes, mobility conditions, and traffic dynamics. To the best of our knowledge, this is the first successful application of the MADRL framework to the MANET routing problem that demonstrates a high degree of scalability and robustness even under environments that are outside the trained range of scenarios. This implies that our MARL-based DeepCQ+ design significantly improves on the performance of the Q-learning-based CQ+ baseline approach and increases its practicality and explainability, because the real-world MANET environment will likely vary outside the trained range of MANET scenarios. Additional techniques to further increase the gains in performance and scalability are discussed.

[16]  arXiv:2111.15020 (cross-list from cs.DB) [pdf, other]
Title: US-Rule: Discovering Utility-driven Sequential Rules
Comments: Preprint. 3 figures, 9 tables
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Utility-driven mining is an important task in data science and has many real-life applications. High-utility sequential pattern mining (HUSPM) is one kind of utility-driven mining. HUSPM aims to discover all sequential patterns with high utility. However, the existing HUSPM algorithms cannot provide an accurate probability for prediction or recommendation scenarios. High-utility sequential rule mining (HUSRM) was proposed to discover all sequential rules with high utility and high confidence. Only one algorithm has been proposed for HUSRM, and it is not efficient enough. In this paper, we propose a faster algorithm, called US-Rule, to efficiently mine high-utility sequential rules. It utilizes a rule estimated utility co-occurrence pruning strategy (REUCP) to avoid meaningless computation. To improve efficiency on dense and long sequence datasets, four tighter upper bounds (LEEU, REEU, LERSU, RERSU) and their corresponding pruning strategies (LEEUP, REEUP, LERSUP, RERSUP) are proposed. In addition, US-Rule proposes a rule estimated utility recomputing pruning strategy (REURP) to deal with sparse datasets. Finally, a large number of experiments on different datasets, compared against the state-of-the-art algorithm, demonstrate that US-Rule achieves better performance in terms of execution time, memory consumption and scalability.

[17]  arXiv:2111.15026 (cross-list from cs.DB) [pdf, other]
Title: Anomaly Rule Detection in Sequence Data
Comments: Preprint. 6 figures, 7 tables
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Analyzing sequence data usually leads to the discovery of interesting patterns and then anomaly detection. In recent years, numerous frameworks and methods have been proposed to discover interesting patterns in sequence data as well as detect anomalous behavior. However, existing algorithms mainly focus on frequency-driven analytics, and they are challenging to apply in real-world settings. In this work, we present a new anomaly detection framework called DUOS that enables Discovery of Utility-aware Outlier Sequential rules from a set of sequences. In this pattern-based anomaly detection algorithm, we incorporate both the anomalousness and utility of a group, and then introduce the concept of utility-aware outlier sequential rule (UOSR). We show that this is a more meaningful way of detecting anomalies. In addition, we propose some efficient pruning strategies w.r.t. upper bounds for mining UOSR, as well as for the outlier detection. An extensive experimental study conducted on several real-world datasets shows that the proposed DUOS algorithm is more effective and efficient. Finally, DUOS outperforms the baseline algorithm and scales suitably.

[18]  arXiv:2111.15040 (cross-list from physics.med-ph) [pdf, other]
Title: X-ray Dissectography Enables Stereotography to Improve Diagnostic Performance
Authors: Chuang Niu, Ge Wang
Subjects: Medical Physics (physics.med-ph); Artificial Intelligence (cs.AI)

X-ray imaging is the most popular medical imaging technology. While x-ray radiography is rather cost-effective, tissue structures are superimposed along the x-ray paths. On the other hand, computed tomography (CT) reconstructs internal structures, but CT increases radiation dose and is complicated and expensive. Here we propose "x-ray dissectography" to extract a target organ/tissue digitally from few radiographic projections for stereographic and tomographic analysis in the deep learning framework. As an exemplary embodiment, we propose a general X-ray dissectography network, a dedicated X-ray stereotography network, and the X-ray imaging systems to implement these functionalities. Our experiments show that x-ray stereography can be achieved for an isolated organ, the lungs in this case, suggesting the feasibility of transforming conventional radiographic reading to the stereographic examination of the isolated organ, which potentially allows higher sensitivity and specificity, and even tomographic visualization of the target. With further improvements, x-ray dissectography promises to be a new x-ray imaging modality for CT-grade diagnosis at radiation dose and system cost comparable to that of radiographic or tomosynthetic imaging.

[19]  arXiv:2111.15071 (cross-list from cs.DC) [pdf, ps, other]
Title: Communication-Efficient Federated Learning via Quantized Compressed Sensing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

In this paper, we present a communication-efficient federated learning framework inspired by quantized compressed sensing. The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server (PS). Our strategy for gradient compression is to sequentially perform block sparsification, dimensional reduction, and quantization. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. For accurate aggregation of the local gradients from the compressed signals at the PS, we put forth an approximate minimum mean square error (MMSE) approach for gradient reconstruction using the expectation-maximization generalized-approximate-message-passing (EM-GAMP) algorithm. Assuming Bernoulli Gaussian-mixture prior, this algorithm iteratively updates the posterior mean and variance of local gradients from the compressed signals. We also present a low-complexity approach for the gradient reconstruction. In this approach, we use the Bussgang theorem to aggregate local gradients from the compressed signals, then compute an approximate MMSE estimate of the aggregated gradient using the EM-GAMP algorithm. We also provide a convergence rate analysis of the presented framework. Using the MNIST dataset, we demonstrate that the presented framework achieves almost identical performance with the case that performs no compression, while significantly reducing communication overhead for federated learning.
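
The three compression stages listed in the abstract (block sparsification, dimensionality reduction, quantization) can be sketched roughly as follows. This is only an illustration under assumptions: top-k selection per block, a random Gaussian projection, and sign (one-bit) quantization are stand-ins chosen for simplicity, and every function name and parameter here is hypothetical. The reconstruction side (EM-GAMP, Bussgang-based aggregation) is not shown.

```python
import random


def sparsify_block(block, keep):
    """Zero all but the `keep` largest-magnitude entries of one block."""
    ranked = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
    kept = set(ranked[:keep])
    return [v if i in kept else 0.0 for i, v in enumerate(block)]


def project(block, matrix):
    """Reduce dimension by multiplying with a (measurements x block_size) matrix."""
    return [sum(a * v for a, v in zip(row, block)) for row in matrix]


def quantize_sign(measurements):
    """One-bit quantizer (an assumed stand-in for the paper's quantizer)."""
    return [1 if m >= 0 else -1 for m in measurements]


def compress(gradient, block_size, keep, num_measurements, seed=0):
    """Sparsify, project, and quantize each block of a flat gradient."""
    rng = random.Random(seed)
    # The same random projection is shared by all blocks in this sketch.
    matrix = [
        [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        for _ in range(num_measurements)
    ]
    codes = []
    for start in range(0, len(gradient), block_size):
        block = gradient[start:start + block_size]
        block = block + [0.0] * (block_size - len(block))  # pad the last block
        sparse = sparsify_block(block, keep)
        codes.append(quantize_sign(project(sparse, matrix)))
    return codes


gradient = [0.1, -3.0, 2.0, 0.5, 0.0, 0.7, -0.2, 1.5]
codes = compress(gradient, block_size=4, keep=2, num_measurements=3)
```

In this toy setting each block of four floats is sent as three sign bits, which gives a feel for where the communication savings come from.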

[20]  arXiv:2111.15080 (cross-list from q-bio.GN) [pdf, other]
Title: SurvODE: Extrapolating Gene Expression Distribution for Early Cancer Identification
Authors: Tong Chen, Sheng Wang
Comments: 12 pages, 6 figures
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

With the increasing availability of large-scale cancer genomics datasets, machine learning approaches have played an important role in revealing novel insights into cancer development. Existing methods have shown encouraging performance in identifying genes that are predictive for cancer survival, but are still limited in modeling the distribution over genes. Here, we propose a novel method that can simulate the gene expression distribution at any given time point, including those that are out of the range of the observed time points. In order to model the irregular time series where each patient is one observation, we integrate a neural ordinary differential equation (neural ODE) with Cox regression into our framework. We evaluated our method on eight cancer types on TCGA and observed a substantial improvement over existing approaches. Our visualization results and further analysis indicate how our method can be used to simulate expression at the early cancer stage, offering the possibility for early cancer identification.
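
As background for the Cox regression component mentioned in the abstract, the following sketch evaluates the standard Cox negative log partial likelihood for given risk scores. It assumes no tied event times and treats the risk scores as plain numbers; in the paper they would come from the neural ODE model, which is not reproduced here. The function name is hypothetical.

```python
import math


def cox_negative_log_partial_likelihood(times, events, risk_scores):
    """Negative log partial likelihood of the Cox model (no tied event times).

    times:       observed time for each patient
    events:      True if the event was observed, False if censored
    risk_scores: log-hazard ratio for each patient (model output)
    """
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        at_risk = [
            math.exp(risk_scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        ]
        total -= risk_scores[i] - math.log(sum(at_risk))
    return total


# Two patients with equal risk, both events observed.
nll = cox_negative_log_partial_likelihood([1.0, 2.0], [True, True], [0.0, 0.0])
```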

[21]  arXiv:2111.15106 (cross-list from cs.LG) [pdf, other]
Title: MAPLE: Microprocessor A Priori for Latency Estimation
Comments: 11 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent, requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating prior hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, MAPLE benefits from the tightly coupled I/O between the CPU and GPU and their dependency to predict DNN latency on GPUs while measuring microprocessor performance hardware counters from the CPU feeding the GPU hardware. With this quantitative strategy as the hardware descriptor, MAPLE can generalize to new hardware via a few-shot adaptation strategy: with as few as 3 samples it exhibits a 3% improvement over state-of-the-art methods requiring as many as 10 samples. Experimental results show that increasing the few-shot adaptation samples to 10 significantly improves the accuracy over the state-of-the-art methods by 12%. Furthermore, MAPLE exhibits 8-10% better accuracy, on average, compared to relevant baselines at any number of adaptation samples.

[22]  arXiv:2111.15119 (cross-list from cs.CV) [pdf, other]
Title: Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Land remote sensing analysis is a crucial research area in earth science. In this work, we focus on a challenging task of land analysis, i.e., automatic extraction of traffic roads from remote sensing data, which has widespread applications in urban development and expansion estimation. Nevertheless, conventional methods either utilize only the limited information of aerial images or simply fuse multimodal information (e.g., vehicle trajectories), and thus cannot recognize unconstrained roads well. To address this problem, we introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet), which fully benefits from the complementary data of different modalities (i.e., aerial images and crowdsourced trajectories). Specifically, CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement. In particular, the complementary information of each modality is comprehensively extracted and dynamically propagated to enhance the representation of the other modality. Extensive experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction, which benefits from blending data of different modalities, using either image and trajectory data or image and Lidar data. From the experimental results, we observe that the proposed approach outperforms current state-of-the-art methods by large margins.

[23]  arXiv:2111.15179 (cross-list from cs.LG) [pdf, other]
Title: A Highly Effective Low-Rank Compression of Deep Neural Networks with Modified Beam-Search and Modified Stable Rank
Comments: 8 pages, 8 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Compression has emerged as one of the essential deep learning research topics, especially for edge devices that have limited computation power and storage capacity. Among the main compression techniques, low-rank compression via matrix factorization is known to have two problems. First, extensive tuning is required. Second, the resulting compression performance is typically not impressive. In this work, we propose a low-rank compression method that utilizes a modified beam-search for automatic rank selection and a modified stable rank for compression-friendly training. The resulting BSR (Beam-search and Stable Rank) algorithm requires only a single hyperparameter to be tuned for the desired compression ratio. The performance of BSR, in terms of the trade-off curve between accuracy and compression ratio, turns out to be superior to that of previously known low-rank compression methods. Furthermore, BSR can perform on par with or better than the state-of-the-art structured pruning methods. As with pruning, BSR can be easily combined with quantization for additional compression.
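
As background for the term "stable rank" used in this abstract: the standard (unmodified) stable rank of a matrix A is the squared Frobenius norm divided by the squared spectral norm. The sketch below computes that standard quantity in pure Python with power iteration. It is only an illustration under those assumptions; it does not reproduce the paper's modified stable rank or its beam search, and the function names are hypothetical.

```python
import math
import random


def spectral_norm_sq(a, iters=200, seed=0):
    """Estimate ||A||_2^2 (largest eigenvalue of A^T A) by power iteration."""
    rows, cols = len(a), len(a[0])
    rng = random.Random(seed)
    # Random start vector, normalized to unit length.
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    n0 = math.sqrt(sum(x * x for x in v))
    v = [x / n0 for x in v]
    lam = 0.0
    for _ in range(iters):
        # u = A v
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # Rayleigh quotient v^T A^T A v = ||A v||^2, valid because ||v|| = 1.
        lam = sum(x * x for x in u)
        # w = A^T u, then renormalize to get the next iterate.
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return lam


def stable_rank(a):
    """Standard stable rank: ||A||_F^2 / ||A||_2^2 (0.0 for the zero matrix)."""
    fro_sq = sum(x * x for row in a for x in row)
    if fro_sq == 0.0:
        return 0.0
    return fro_sq / spectral_norm_sq(a)
```

The stable rank is at most the ordinary rank and changes smoothly with the matrix entries, which is one reason it is often used as a continuous proxy for rank.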

[24]  arXiv:2111.15185 (cross-list from cs.CV) [pdf, other]
Title: SamplingAug: On the Importance of Patch Sampling Augmentation for Single Image Super-Resolution
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the development of Deep Neural Networks (DNNs), plenty of DNN-based methods have been proposed for Single Image Super-Resolution (SISR). However, existing methods mostly train the DNNs on uniformly sampled LR-HR patch pairs, which makes them fail to fully exploit informative patches within the image. In this paper, we present a simple yet effective data augmentation method. We first devise a heuristic metric to evaluate the informative importance of each patch pair. To reduce the computational cost over all patch pairs, we further propose to optimize the calculation of our metric using an integral image, achieving about two orders of magnitude speedup. With our method, the training patch pairs are sampled according to their informative importance. Extensive experiments show that our sampling augmentation can consistently improve the convergence and boost the performance of various SISR architectures, including EDSR, RCAN, RDN, SRCNN and ESPCN, across different scaling factors (x2, x3, x4). Code is available at https://github.com/littlepure2333/SamplingAug
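
To illustrate the integral-image idea mentioned in this abstract, the sketch below builds a summed-area table over a per-pixel score map so that the total score of any square patch costs four lookups instead of a full sum. The per-pixel score map, the function names, and the weighted sampling step are assumptions made for illustration; the paper's actual metric and sampling rule are not given in this listing.

```python
import random


def integral_image(score):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(score), len(score[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += score[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def patch_sum(sat, top, left, size):
    """Sum of score[top:top+size][left:left+size] using four table lookups."""
    bottom, right = top + size, left + size
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def patch_scores(score, size):
    """Score of every size x size patch, keyed by its (top, left) corner."""
    sat = integral_image(score)
    h, w = len(score), len(score[0])
    return {
        (top, left): patch_sum(sat, top, left, size)
        for top in range(h - size + 1)
        for left in range(w - size + 1)
    }


def sample_patches(scores, k, seed=0):
    """Hypothetical sampling rule: draw k corners with probability proportional
    to patch score (scores must be non-negative and not all zero)."""
    corners = list(scores)
    weights = [scores[c] for c in corners]
    return random.Random(seed).choices(corners, weights=weights, k=k)
```

For example, `patch_scores([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)` gives the four 2x2 patch sums 12, 16, 24 and 28, and `sample_patches` then favors the higher-scoring corners.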

[25]  arXiv:2111.15205 (cross-list from cs.CR) [pdf, other]
Title: New Datasets for Dynamic Malware Classification
Comments: 5 pages, 2 figures, 6 tables
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Nowadays, malware and malware incidents are increasing daily, even with various antivirus systems and malware detection or classification methodologies. Many static, dynamic, and hybrid techniques have been presented to detect malware and classify it into malware families. Dynamic and hybrid malware classification methods have advantages over static malware classification methods by being highly efficient. Since it is more difficult to mask malware behavior during execution than to mask the underlying code examined in static malware classification, machine learning techniques have been the main focus of security experts for detecting malware and determining its family dynamically. The rapid increase of malware also brings the necessity of recent and updated datasets of malicious software. We introduce two new, updated datasets in this work: one with 9,795 samples obtained and compiled from VirusSamples and one with 14,616 samples from VirusShare. This paper also analyzes the multi-class malware classification performance of the balanced and imbalanced versions of these two datasets by using Histogram-based gradient boosting, Random Forest, Support Vector Machine, and XGBoost models with API call-based dynamic malware classification. Results show that Support Vector Machine achieves the highest score of 94% on the imbalanced VirusSample dataset, whereas the same model has 91% accuracy on the balanced VirusSample dataset. XGBoost, one of the most common gradient boosting-based models, achieves the highest scores of 90% and 80% in both versions of the VirusShare dataset. This paper also presents the baseline results of the VirusShare and VirusSample datasets by using the four most widely known machine learning techniques in the dynamic malware classification literature. We believe that these two datasets and baseline results enable researchers in this field to test and validate their methods and approaches.

[26]  arXiv:2111.15208 (cross-list from cs.CV) [pdf]
Title: HRNET: AI on Edge for mask detection and social distancing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The purpose of the paper is to provide an innovative emerging-technology framework for communities to combat epidemic situations. The paper proposes a unique outbreak response system framework based on artificial intelligence and edge computing for citizen-centric services, to help track and trace people eluding safety policies such as mask wearing and social distancing measures in public or workplace settings. The framework further provides implementation guidelines for industrial settings as well, for governance and contact-tracing tasks. Its adoption will thus support smart city planning and development focused on citizen health systems, contributing to improved quality of life. The conceptual framework is validated through quantitative analysis of secondary data collected from researchers' public websites, GitHub repositories, and renowned journals, and further benchmarking was conducted for experimental results in the Microsoft Azure cloud environment. The study includes selected AI models for benchmark analysis, which were assessed on performance and accuracy in an edge computing environment for large-scale societal setups. Overall, the YOLO model performs best in the object detection task and is fast enough for mask detection, while HRNetV2 performs best in semantic segmentation applied to the social distancing task in an AI-edge inferencing setup. The paper proposes a new Edge-AI algorithm for building technology-oriented solutions for detecting masks and social distance in human movement. The paper enriches the technological advancement of artificial intelligence and edge computing applied to problems in society and healthcare systems. The framework further equips government agencies and system providers to design and construct technology-oriented models in community settings, to increase the quality of life by bringing emerging technologies into smart urban environments.

[27]  arXiv:2111.15210 (cross-list from cs.CV) [pdf, other]
Title: Point Cloud Instance Segmentation with Semi-supervised Bounding-Box Mining
Comments: IEEE Trans on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Point cloud instance segmentation has achieved huge progress with the emergence of deep learning. However, these methods are usually data-hungry, requiring expensive and time-consuming dense point cloud annotations. Unlabeled or weakly labeled data, which could alleviate the annotation cost, remain little explored in this task. In this paper, we introduce the first semi-supervised point cloud instance segmentation framework (SPIB) using both labeled and unlabeled bounding boxes as supervision. To be specific, our SPIB architecture involves a two-stage learning procedure. For stage one, a bounding box proposal generation network is trained under a semi-supervised setting with perturbation consistency regularization (SPCR). The regularization works by enforcing an invariance of the bounding box predictions over different perturbations applied to the input point clouds, to provide self-supervision for network learning. For stage two, the bounding box proposals with SPCR are grouped into subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module. Moreover, we introduce a novel occupancy ratio guided refinement module to refine the instance masks. Extensive experiments on the challenging ScanNet v2 dataset demonstrate that our method can achieve competitive performance compared with recent fully-supervised methods.

[28]  arXiv:2111.15246 (cross-list from cs.CV) [pdf, other]
Title: Hallucinated Neural Radiance Fields in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Neural Radiance Fields (NeRF) has recently gained popularity for its impressive novel view synthesis ability. This paper studies the problem of hallucinated NeRF, i.e., recovering a realistic NeRF at a different time of day from a group of tourism images. Existing solutions adopt NeRF with a controllable appearance embedding to render novel views under various conditions, but cannot render view-consistent images with an unseen appearance. To solve this problem, we present an end-to-end framework for constructing a hallucinated NeRF, dubbed H-NeRF. Specifically, we propose an appearance hallucination module to handle time-varying appearances and transfer them to novel views. Considering the complex occlusions of tourism images, an anti-occlusion module is introduced to accurately decompose the static subjects for visibility. Experimental results on synthetic data and real tourism photo collections demonstrate that our method can not only hallucinate the desired appearances, but also render occlusion-free images from different views. The project and supplementary materials are available at https://rover-xingyu.github.io/H-NeRF/.

[29]  arXiv:2111.15255 (cross-list from eess.SY) [pdf]
Title: Double Fuzzy Probabilistic Interval Linguistic Term Set and a Dynamic Fuzzy Decision Making Model based on Markov Process with its Application in Multiple Criteria Group Decision Making
Authors: Zongmin Liu
Comments: submitted to IEEE Transactions on Fuzzy Systems
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); General Economics (econ.GN); Optimization and Control (math.OC)

The probabilistic linguistic term has been proposed to deal with probability distributions in provided linguistic evaluations. However, because it has some fundamental defects, it is often difficult for decision-makers to obtain reasonable information from linguistic evaluations for group decision making. In addition, weight information plays a significant role in dynamic information fusion and the decision-making process, yet there are few research methods for determining attribute weights that change dynamically with time. In this paper, I propose the concept of the double fuzzy probability interval linguistic term set (DFPILTS). Firstly, fuzzy semantic integration, the DFPILTS definition, its preference relationship, some basic algorithms and aggregation operators are defined. Then, a fuzzy linguistic Markov matrix with its network is developed. Next, a weight determination method based on distance measure and information entropy is developed to reduce the inconsistency of DFPILPR and to obtain a collective priority vector based on group consensus. Finally, an aggregation-based approach is developed, and an optimal investment case involving financial risk is used to illustrate the application of DFPILTS and the decision method in multi-criteria decision making.

[30]  arXiv:2111.15257 (cross-list from cs.CV) [pdf, other]
Title: ARTSeg: Employing Attention for Thermal images Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Research advancements have enabled neural network algorithms to be deployed in autonomous vehicles to perceive their surroundings. The standard exteroceptive sensors utilized for perception of the environment are cameras and Lidar. Therefore, the neural network algorithms developed using these exteroceptive sensors have provided the necessary solution for the autonomous vehicle's perception. One major drawback of these exteroceptive sensors is their limited operability in adverse weather conditions, for instance, low illumination and night conditions. The usability and affordability of thermal cameras in the sensor suite of the autonomous vehicle provide the necessary improvement in the autonomous vehicle's perception in adverse weather conditions. The semantics of the environment benefit robust perception, which can be achieved by segmenting the different objects in the scene. In this work, we employ the thermal camera for semantic segmentation. We design an attention-based Recurrent Convolution Network (RCNN) encoder-decoder architecture named ARTSeg for thermal semantic segmentation. The main contribution of this work is the design of the encoder-decoder architecture, which employs RCNN units in each encoder and decoder block. Furthermore, additive attention is employed in the decoder module to retain high-resolution features and improve the localization of features. The efficacy of the proposed method is evaluated on the available public dataset, showing better performance than other state-of-the-art methods in mean intersection over union (IoU).

[31]  arXiv:2111.15275 (cross-list from q-bio.NC) [pdf, other]
Title: Emotions as abstract evaluation criteria in biological and artificial intelligences
Authors: Claudius Gros
Comments: Frontiers in Computational Neuroscience (in press). arXiv admin note: substantial text overlap with arXiv:1909.11700
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Biological as well as advanced artificial intelligences (AIs) need to decide which goals to pursue. We review nature's solution to the time allocation problem, which is based on a continuously readjusted categorical weighting mechanism we experience introspectively as emotions. One observes phylogenetically that the available number of emotional states increases hand in hand with the cognitive capabilities of animals and that rising levels of intelligence entail ever larger sets of behavioral options. Our ability to experience a multitude of potentially conflicting feelings is in this view not a leftover of a more primitive heritage, but a generic mechanism for attributing values to behavioral options that cannot be specified at birth. In this view, emotions are essential for understanding the mind.
For concreteness, we propose and discuss a framework which mimics emotions on a functional level. Based on time allocation via emotional stationarity (TAES), emotions are implemented as abstract criteria, such as satisfaction, challenge and boredom, which serve to evaluate activities that have been carried out. The resulting timeline of experienced emotions is compared with the `character' of the agent, which is defined in terms of a preferred distribution of emotional states. The long-term goal of the agent, to align experience with character, is achieved by optimizing the frequency for selecting individual tasks. Upon optimization, the statistics of emotion experience becomes stationary.

[32]  arXiv:2111.15323 (cross-list from math.GT) [pdf, other]
Title: The signature and cusp geometry of hyperbolic knots
Comments: 26 pages, 12 figures
Subjects: Geometric Topology (math.GT); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We introduce a new real-valued invariant called the natural slope of a hyperbolic knot in the 3-sphere, which is defined in terms of its cusp geometry. We show that twice the knot signature and the natural slope differ by at most a constant times the hyperbolic volume divided by the cube of the injectivity radius. This inequality was discovered using machine learning to detect relationships between various knot invariants. It has applications to Dehn surgery and to 4-ball genus. We also show a refined version of the inequality where the upper bound is a linear function of the volume, and the slope is corrected by terms corresponding to short geodesics that link the knot an odd number of times.

[33]  arXiv:2111.15361 (cross-list from cs.CV) [pdf, other]
Title: Seeking Salient Facial Regions for Cross-Database Micro-Expression Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

This paper focuses on cross-database micro-expression recognition, in which the training and test micro-expression samples belong to different micro-expression databases. Mismatched feature distributions between the training and test micro-expression features degrade the performance of most well-performing micro-expression methods. To deal with cross-database micro-expression recognition, we propose a novel domain adaptation method called Transfer Group Sparse Regression (TGSR). TGSR learns a sparse regression matrix for selecting salient local facial regions and the corresponding relationship between the training set and the test set. We evaluate our TGSR model on the CASME II and SMIC databases. Experimental results show that the proposed TGSR achieves satisfactory performance and outperforms most state-of-the-art subspace learning-based domain adaptation methods.

[34]  arXiv:2111.15366 (cross-list from cs.LG) [pdf, other]
Title: AI and the Everything in the Whole Wide World Benchmark
Comments: Accepted in NeurIPS 2021 Benchmarks and Datasets track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)

There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore the limits of such benchmarks in order to reveal the construct validity issues in their framing as the functionally "general" broad measures of progress they are set up to be.

[35]  arXiv:2111.15382 (cross-list from cs.LG) [pdf, other]
Title: Continuous Control With Ensemble Deep Deterministic Policy Gradients
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: an average of multiple actors trained from the same data boosts performance; the existing methods are unstable across training runs, epochs of training, and evaluation runs; a commonly used additive action noise is not required for effective training; a strategy based on posterior sampling explores better than the approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace the clipped double Q-Learning; the critics' initialization plays the major role in ensemble-based actor-critic exploration. In conclusion, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, to yield state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. From the practical side, ED2 is conceptually straightforward, easy to code, and does not require knowledge outside of the existing RL toolbox.

[36]  arXiv:2111.15446 (cross-list from cs.CR) [pdf, other]
Title: TEGDetector: A Phishing Detector that Knows Evolving Transaction Behaviors
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Recently, phishing scams have posed a significant threat to blockchains. Phishing detectors direct their efforts toward hunting phishing addresses. Most detectors extract the transaction behavior features of target addresses by random walking or by constructing static subgraphs. Unfortunately, the random walking methods usually miss structural information due to the limited sampling sequence length, while the static subgraph methods tend to ignore the temporal features lying in the evolving transaction behaviors. More importantly, their performance undergoes severe degradation when malicious users intentionally hide phishing behaviors. To address these challenges, we propose TEGDetector, a dynamic graph classifier that learns the evolving behavior features from transaction evolution graphs (TEGs). First, we cast the transaction series into multiple time slices, capturing the target address's transaction behaviors in different periods. Then, we provide a fast non-parametric phishing detector to narrow down the search space of suspicious addresses. Finally, TEGDetector considers both the spatial and temporal evolutions towards a complete characterization of the evolving transaction behaviors. Moreover, TEGDetector utilizes an adaptively learnt time coefficient to pay distinct attention to different periods, which provides several novel insights. Extensive experiments on the large-scale Ethereum transaction dataset demonstrate that the proposed method achieves state-of-the-art detection performance.
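
The first step described in this abstract, casting a transaction series into time slices, can be sketched as below. This is only one plausible reading: the equal-width slicing, the `(timestamp, sender, receiver, value)` tuple layout, and the function name are all assumptions for illustration, not the authors' implementation.

```python
def slice_transactions(txs, num_slices):
    """Split transactions into num_slices equal-width time windows.

    txs is a list of (timestamp, sender, receiver, value) tuples (assumed layout).
    Returns a list of edge lists, one per window, each holding
    (sender, receiver, value) tuples in their original order.
    """
    slices = [[] for _ in range(num_slices)]
    if not txs:
        return slices
    t_min = min(t for t, _, _, _ in txs)
    t_max = max(t for t, _, _, _ in txs)
    span = t_max - t_min
    for t, sender, receiver, value in txs:
        if span == 0:
            idx = 0  # all transactions share one timestamp
        else:
            # Map the timestamp to a window; the latest one goes in the last window.
            idx = min(int((t - t_min) / span * num_slices), num_slices - 1)
        slices[idx].append((sender, receiver, value))
    return slices
```

Each returned edge list can then be treated as one snapshot of a graph around the target address, so that a sequence of snapshots reflects how its transaction behavior evolves over time.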

[37]  arXiv:2111.15527 (cross-list from cs.LG) [pdf, other]
Title: Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape of NNs, i.e., the loss landscape of an NN contains all critical points of all the narrower NNs. This result is obtained by constructing a class of critical embeddings which map any critical point of a narrower NN to a critical point of the target NN with the same output function. By discovering a wide class of general compatible critical embeddings, we provide a gross estimate of the dimension of critical submanifolds embedded from critical points of narrower NNs. We further prove an irreversibility property of any critical embedding: the number of negative/zero/positive eigenvalues of the Hessian matrix of a critical point may increase but never decrease as an NN becomes wider through the embedding. Using a special realization of general compatible critical embedding, we prove a stringent necessary condition for being a "truly-bad" critical point that never becomes a strict-saddle point through any critical embedding. This result implies that strict-saddle points are commonplace in wide NNs, which may be an important reason underlying the easy optimization of wide NNs widely observed in practice.

[38]  arXiv:2111.15626 (cross-list from eess.SP) [pdf, other]
Title: Variational Autoencoders for Studying the Manifold of Precoding Matrices with High Spectral Efficiency
Authors: Evgeny Bobrov (1 and 2), Alexander Markov (3), Dmitry Vetrov (3) ((1) Moscow Research Center, Huawei Technologies, Russia, (2) M. V. Lomonosov Moscow State University, Russia, (3) National Research University Higher School of Economics, Russia)
Comments: 5 pages, 1 figure
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

In multiple-input multiple-output (MIMO) wireless communications systems, neural networks have been employed for channel decoding, detection, channel estimation, and resource management. In this paper, we look at how to use a variational autoencoder to find a precoding matrix with a high Spectral Efficiency (SE). To identify efficient precoding matrices, an optimization approach is used. Our objective is to create a less time-consuming algorithm with minimum quality degradation. To build precoding matrices, we employed two forms of variational autoencoders: conventional variational autoencoders (VAE) and conditional variational autoencoders (CVAE). Both methods may be used to study a wide range of optimal precoding matrices. To the best of our knowledge, the development of precoding matrices for the spectral efficiency objective function (SE) utilising VAE and CVAE methods is being published for the first time.

[39]  arXiv:2111.15636 (cross-list from eess.SP) [pdf]
Title: Generating gapless land surface temperature with a high spatio-temporal resolution by fusing multi-source satellite-observed and model-simulated data
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Applications (stat.AP)

Land surface temperature (LST) is a key parameter when monitoring land surface processes. However, cloud contamination and the tradeoff between the spatial and temporal resolutions greatly impede access to high-quality thermal infrared (TIR) remote sensing data. Despite the massive efforts made to solve these dilemmas, it is still difficult to generate LST estimates with concurrent spatial completeness and a high spatio-temporal resolution. Land surface models (LSMs) can be used to simulate gapless LST with a high temporal resolution, but this usually comes with a low spatial resolution. In this paper, we present an integrated temperature fusion framework for satellite-observed and LSM-simulated LST data to map gapless LST at a 60-m spatial resolution and half-hourly temporal resolution. The global linear model (GloLM) and the diurnal land surface temperature cycle (DTC) model are applied, respectively, as preprocessing steps for sensor and temporal normalization between the different LST data. The Landsat LST, Moderate Resolution Imaging Spectroradiometer (MODIS) LST, and Community Land Model Version 5.0 (CLM 5.0)-simulated LST are then fused using a filter-based spatio-temporal integrated fusion model. Evaluations were implemented in an urban-dominated region (the city of Wuhan in China) and a natural-dominated region (the Heihe River Basin in China), in terms of accuracy, spatial variability, and diurnal temporal dynamics. Results indicate that the fused LST is highly consistent with actual Landsat LST data (in situ LST measurements), in terms of a Pearson correlation coefficient of 0.94 (0.97-0.99), a mean absolute error of 0.71-0.98 K (0.82-3.17 K), and a root-mean-square error of 0.97-1.26 K (1.09-3.97 K).

[40]  arXiv:2111.15664 (cross-list from cs.LG) [pdf, other]
Title: Donut: Document Understanding Transformer without OCR
Comments: 12 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Understanding document images (e.g., invoices) has been an important research topic and has many applications in document processing automation. Through the latest advances in deep learning-based Optical Character Recognition (OCR), current Visual Document Understanding (VDU) systems have come to be designed based on OCR. Although such OCR-based approaches promise reasonable performance, they suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to OCR error propagation. In this paper, we propose a novel VDU model that is end-to-end trainable without an underlying OCR framework. To this end, we propose a new task and a synthetic document image generator to pre-train the model and mitigate the dependency on large-scale real document images. Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets. Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed model, especially with consideration for real-world applications.

Replacements for Wed, 1 Dec 21

[41]  arXiv:2101.03805 (replaced) [pdf, other]
Title: A Conflict-Based Search Framework for Multi-Objective Multi-Agent Path Finding
Comments: 11 pages, preliminary version published in ICRA 2021, journal version submitted
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
[42]  arXiv:2110.04507 (replaced) [pdf, other]
Title: TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations
Subjects: Artificial Intelligence (cs.AI)
[43]  arXiv:2111.10046 (replaced) [pdf, other]
Title: YMIR: A Rapid Data-centric Development Platform for Vision Applications
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[44]  arXiv:2111.12210 (replaced) [pdf, other]
Title: From Kepler to Newton: Explainable AI for Science Discovery
Comments: 14 pages, 8 figures, 6 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC)
[45]  arXiv:1810.02244 (replaced) [pdf, other]
Title: Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Comments: Extended version with proofs, accepted at AAAI 2019, added units of measurement of QM9 dataset into appendix, removed results from Wu et al., 2018 due to different units
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[46]  arXiv:2009.07734 (replaced) [pdf, other]
Title: TreeGAN: Incorporating Class Hierarchy into Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[47]  arXiv:2101.03581 (replaced) [pdf, other]
Title: Curvature-based Feature Selection with Application in Classifying Electronic Health Records
Comments: Accepted by Technological Forecasting and Social Change; Source code available
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[48]  arXiv:2101.11174 (replaced) [pdf, other]
Title: Graph Neural Network for Traffic Forecasting: A Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[49]  arXiv:2104.06744 (replaced) [pdf, other]
Title: Defending Against Adversarial Denial-of-Service Data Poisoning Attacks
Comments: Published at ACSAC DYNAMICS 2020
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[50]  arXiv:2104.07639 (replaced) [pdf, other]
Title: Robust Optimization for Multilingual Translation with Imbalanced Data
Authors: Xian Li, Hongyu Gong
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[51]  arXiv:2105.14557 (replaced) [pdf, other]
Title: Robust Dynamic Network Embedding via Ensembles
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[52]  arXiv:2106.15535 (replaced) [pdf, other]
Title: Subgroup Generalization and Fairness of Graph Neural Networks
Comments: NeurIPS 2021 Spotlight
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[53]  arXiv:2107.02170 (replaced) [pdf, other]
Title: On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
Comments: Accepted to NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[54]  arXiv:2107.02233 (replaced) [pdf, other]
Title: End-to-End Weak Supervision
Comments: Code URL: this https URL
Journal-ref: Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[55]  arXiv:2110.09714 (replaced) [pdf, other]
Title: Black-box Adversarial Attacks on Commercial Speech Platforms with Minimal Information
Comments: A version of this paper appears in the proceedings of the 28th ACM Conference on Computer and Communications Security (CCS 2021). The notes in Tables 1 and 4 have been updated
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[56]  arXiv:2110.15327 (replaced) [pdf, other]
Title: MEGAN: Memory Enhanced Graph Attention Network for Space-Time Video Super-Resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[57]  arXiv:2111.06206 (replaced) [pdf, other]
Title: Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[58]  arXiv:2111.07819 (replaced) [pdf, ps, other]
Title: Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[59]  arXiv:2111.10991 (replaced) [pdf, other]
Title: Backdoor Attack through Frequency Domain
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[60]  arXiv:2111.12126 (replaced) [pdf, other]
Title: Panoptic Segmentation Meets Remote Sensing
Comments: 40 pages, 10 figures, submitted to journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Databases (cs.DB)
[61]  arXiv:2111.12929 (replaced) [pdf, other]
Title: Unbiased Pairwise Learning to Rank in Recommender Systems
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[62]  arXiv:2111.13295 (replaced) [pdf, other]
Title: Medial Spectral Coordinates for 3D Shape Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[63]  arXiv:2111.13475 (replaced) [pdf, other]
Title: QMagFace: Simple and Accurate Quality-Aware Face Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[64]  arXiv:2111.14282 (replaced) [pdf, other]
Title: Customer Sentiment Analysis using Weak Supervision for Customer-Agent Chat
Authors: Navdeep Jain
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[65]  arXiv:2111.14562 (replaced) [pdf, other]
Title: Instance-wise Occlusion and Depth Orders in Natural Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[66]  arXiv:2111.14693 (replaced) [pdf, other]
Title: SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning
Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2022
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[67]  arXiv:2111.14746 (replaced) [pdf, other]
Title: Dynamic Inference
Authors: Aolin Xu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)
[68]  arXiv:2111.14799 (replaced) [pdf, other]
Title: UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
