Cryptography and Security
New submissions
[ showing up to 2000 entries per page: fewer | more ]
New submissions for Tue, 19 Mar 24
- [1] arXiv:2403.10562 [pdf, other]
-
Title: Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial AttacksSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples. Unlike traditional preprocessing defences that rely on sanitizing input samples, our stateless strategy counters the attack process itself. For every query we evaluate a counter-sample instead, where the counter-sample is the original sample optimized against the attacker's objective. By countering every black box query with a targeted white box optimization, our strategy effectively introduces an asymmetry to the game to the defender's advantage. This defence not only effectively misleads the attacker's search for an adversarial example, it also preserves the model's accuracy on legitimate inputs and is generic to multiple types of attacks.
We demonstrate that our approach is remarkably effective against state-of-the-art black box attacks and outperforms existing defences for both the CIFAR-10 and ImageNet datasets. Additionally, we also show that the proposed defence is robust against strong adversaries as well. - [2] arXiv:2403.10570 [pdf, other]
-
Title: Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber WarfareSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
We are currently facing unprecedented cyber warfare with the rapid evolution of tactics, increasing asymmetry of intelligence, and the growing accessibility of hacking tools. In this landscape, cyber deception emerges as a critical component of our defense strategy against increasingly sophisticated attacks. This chapter aims to highlight the pivotal role of game-theoretic models and foundation models (FMs) in analyzing, designing, and implementing cyber deception tactics. Game models (GMs) serve as a foundational framework for modeling diverse adversarial interactions, allowing us to encapsulate both adversarial knowledge and domain-specific insights. Meanwhile, FMs serve as the building blocks for creating tailored machine learning models suited to given applications. By leveraging the synergy between GMs and FMs, we can advance proactive and automated cyber defense mechanisms by not only securing our networks against attacks but also enhancing their resilience against well-planned operations. This chapter discusses the games at the tactical, operational, and strategic levels of warfare, delves into the symbiotic relationship between these methodologies, and explores relevant applications where such a framework can make a substantial impact in cybersecurity. The chapter discusses the promising direction of the multi-agent neurosymbolic conjectural learning (MANSCOL), which allows the defender to predict adversarial behaviors, design adaptive defensive deception tactics, and synthesize knowledge for the operational level synthesis and adaptation. FMs serve as pivotal tools across various functions for MANSCOL, including reinforcement learning, knowledge assimilation, formation of conjectures, and contextual representation. This chapter concludes with a discussion of the challenges associated with FMs and their application in the domain of cybersecurity.
- [3] arXiv:2403.10576 [pdf, other]
-
Title: Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity DomainComments: To appear in NAACL Findings 2024Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that could be unsuitable with the established pretraining methodologies. Previous work in other domains have removed or filtered such text as noise, but the effectiveness of these methods have not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the commonly taken approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.
- [4] arXiv:2403.10583 [pdf, ps, other]
-
Title: Bitcoin MiCA WhitepaperComments: 32 pagesSubjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
This document is written as an academic exercise, with the goal of exploring the feasibility of writing a white paper in accordance with Regulation (EU) 2023/1114 (MiCA). It is meant as a Proof of Concept (PoC) illustrating a concrete application of the requirements of MiCA. Like the MiCA white papers PoC shared by ESMA, this document is solely for the purposes of the PoC, to inform the public as to how a crypto-asset white paper could work, inspire public debate and feedback, and enhance the public conversation around the implementation of EU regulations.
- [5] arXiv:2403.10659 [pdf, other]
-
Title: Towards Practical Fabrication Stage Attacks Using Interrupt-Resilient Hardware TrojansSubjects: Cryptography and Security (cs.CR)
We introduce a new class of hardware trojans called interrupt-resilient trojans (IRTs). Our work is motivated by the observation that hardware trojan attacks on CPUs, even under favorable attack scenarios (e.g., an attacker with local system access), are affected by unpredictability due to non-deterministic context switching events. As we confirm experimentally, these events can lead to race conditions between trigger signals and the CPU events targeted by the trojan payloads (e.g., a CPU memory access), thus affecting the reliability of the attacks. Our work shows that interrupt-resilient trojans can successfully address the problem of non-deterministic triggering in CPUs, thereby providing high reliability guarantees in the implementation of sophisticated hardware trojan attacks. Specifically, we successfully utilize IRTs in different attack scenarios against a Linux-capable CPU design and showcase its resilience against context-switching events. More importantly, we show that our design allows for seamless integration during fabrication stage attacks.We evaluate different strategies for the implementation of our attacks on a tape-out ready high-speed RISC-V microarchitecture in a 28nm commercial technology process and successfully implement them with an average overhead delay of only 20 picoseconds, while leaving the sign-off characteristics of the layout intact. In doing so, we challenge the common wisdom regarding the low flexibility of late supply chain stages (e.g., fabrication) for the insertion of powerful trojans. To promote further research on microprocessor trojans, we open-source our designs and provide the accompanying supporting software logic.
- [6] arXiv:2403.10663 [pdf, other]
-
Title: Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View DataSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With the increasing prevalence of Machine Learning as a Service (MLaaS) platforms, there is a growing focus on deep neural network (DNN) watermarking techniques. These methods are used to facilitate the verification of ownership for a target DNN model to protect intellectual property. One of the most widely employed watermarking techniques involves embedding a trigger set into the source model. Unfortunately, existing methodologies based on trigger sets are still susceptible to functionality-stealing attacks, potentially enabling adversaries to steal the functionality of the source model without a reliable means of verifying ownership. In this paper, we first introduce a novel perspective on trigger set-based watermarking methods from a feature learning perspective. Specifically, we demonstrate that by selecting data exhibiting multiple features, also referred to as $\textit{multi-view data}$, it becomes feasible to effectively defend functionality stealing attacks. Based on this perspective, we introduce a novel watermarking technique based on Multi-view dATa, called MAT, for efficiently embedding watermarks within DNNs. This approach involves constructing a trigger set with multi-view data and incorporating a simple feature-based regularization method for training the source model. We validate our method across various benchmarks and demonstrate its efficacy in defending against model extraction attacks, surpassing relevant baselines by a significant margin.
- [7] arXiv:2403.10789 [pdf, other]
-
Title: Adversarial Knapsack and Secondary Effects of Common Information for Cyber OperationsComments: 26 pagesSubjects: Cryptography and Security (cs.CR)
Variations of the Flip-It game have been applied to model network cyber operations. While Flip-It can accurately express uncertainty and loss of control, it imposes no essential resource constraints for operations. Capture the flag (CTF) style competitive games, such as Flip-It , entail uncertainties and loss of control, but also impose realistic constraints on resource use. As such, they bear a closer resemblance to actual cyber operations. We formalize a dynamical network control game for CTF competitions and detail the static game for each time step. The static game can be reformulated as instances of a novel optimization problem called Adversarial Knapsack (AK) or Dueling Knapsack (DK) when there are only two players. We define the Adversarial Knapsack optimization problems as a system of interacting Weighted Knapsack problems, and illustrate its applications to general scenarios involving multiple agents with conflicting optimization goals, e.g., cyber operations and CTF games in particular. Common awareness of the scenario, rewards, and costs will set the stage for a non-cooperative game. Critically, rational players may second guess that their AK solution -- with a better response and higher reward -- is possible if opponents predictably play their AK optimal solutions. Thus, secondary reasoning which such as belief modeling of opponents play can be anticipated for rational players and will introduce a type of non-stability where players maneuver for slight reward differentials. To analyze this, we provide the best-response algorithms and simulation software to consider how rational agents may heuristically search for maneuvers. We further summarize insights offered by the game model by predicting that metrics such as Common Vulnerability Scoring System (CVSS) may intensify the secondary reasoning in cyber operations.
- [8] arXiv:2403.10828 [pdf, other]
-
Title: Data Availability and Decentralization: New Techniques for zk-Rollups in Layer 2 Blockchain NetworksSubjects: Cryptography and Security (cs.CR)
The scalability limitations of public blockchains have hindered their widespread adoption in real-world applications. While the Ethereum community is pushing forward in zk-rollup (zero-knowledge rollup) solutions, such as introducing the ``blob transaction'' in EIP-4844, Layer 2 networks encounter a data availability problem: storing transactions completely off-chain poses a risk of data loss, particularly when Layer 2 nodes are untrusted. Additionally, building Layer 2 blocks requires significant computational power, compromising the decentralization aspect of Layer 2 networks.
This paper introduces new techniques to address the data availability and decentralization challenges in Layer 2 networks. To ensure data availability, we introduce the concept of ``proof of download'', which ensures that Layer 2 nodes cannot aggregate transactions without downloading historical data. Additionally, we design a ``proof of storage'' scheme that punishes nodes who maliciously delete historical data. For decentralization, we introduce a new role separation for Layer 2, allowing nodes with limited hardware to participate. To further avoid collusion among Layer 2 nodes, we design a ``proof of luck'' scheme, which also provides robust protection against maximal extractable value (MEV) attacks. Experimental results show our techniques not only ensure data availability but also improve overall network efficiency, which implies the practicality and potential of our techniques for real-world implementation. - [9] arXiv:2403.10879 [pdf, other]
-
Title: Characterizing the Solana NFT EcosystemComments: This paper has been accepted by WWW 2024Subjects: Cryptography and Security (cs.CR)
Non-Fungible Tokens (NFTs) are digital assets recorded on the blockchain, providing cryptographic proof of ownership over digital or physical items. Although Solana has only begun to gain popularity in recent years, its NFT market has seen substantial transaction volumes. In this paper, we conduct the first systematic research on the characteristics of Solana NFTs from two perspectives: longitudinal measurement and wash trading security audit. We gathered 132,736 Solana NFT from Solscan and analyzed the sales data within these collections. Investigating users' economic activity and NFT owner information reveals that the top users in Solana NFT are skewed toward a higher distribution of purchases. Subsequently, we employ the Local Outlier Factor algorithm to conduct a wash trading audit on 2,175 popular Solana NFTs. We discovered that 138 NFT pools are involved in wash trading, with 8 of these NFTs having a wash trading rate exceeding 50%. Fortunately, none of these NFTs have been entirely washed out.
- [10] arXiv:2403.10893 [pdf, other]
-
Title: A Watermark-Conditioned Diffusion Model for IP ProtectionSubjects: Cryptography and Security (cs.CR)
The ethical need to protect AI-generated content has been a significant concern in recent years. While existing watermarking strategies have demonstrated success in detecting synthetic content (detection), there has been limited exploration in identifying the users responsible for generating these outputs from a single model (owner identification). In this paper, we focus on both practical scenarios and propose a unified watermarking framework for content copyright protection within the context of diffusion models. Specifically, we consider two parties: the model provider, who grants public access to a diffusion model via an API, and the users, who can solely query the model API and generate images in a black-box manner. Our task is to embed hidden information into the generated contents, which facilitates further detection and owner identification. To tackle this challenge, we propose a Watermark-conditioned Diffusion model called WaDiff, which manipulates the watermark as a conditioned input and incorporates fingerprinting into the generation process. All the generative outputs from our WaDiff carry user-specific information, which can be recovered by an image extractor and further facilitate forensic identification. Extensive experiments are conducted on two popular diffusion models, and we demonstrate that our method is effective and robust in both the detection and owner identification tasks. Meanwhile, our watermarking framework only exerts a negligible impact on the original generation and is more stealthy and efficient in comparison to existing watermarking strategies.
- [11] arXiv:2403.10920 [pdf, other]
-
Title: Batch-oriented Element-wise Approximate Activation for Privacy-Preserving Neural NetworksSubjects: Cryptography and Security (cs.CR)
Privacy-Preserving Neural Networks (PPNN) are advanced to perform inference without breaching user privacy, which can serve as an essential tool for medical diagnosis to simultaneously achieve big data utility and privacy protection. As one of the key techniques to enable PPNN, Fully Homomorphic Encryption (FHE) is facing a great challenge that homomorphic operations cannot be easily adapted for non-linear activation calculations. In this paper, batch-oriented element-wise data packing and approximate activation are proposed, which train linear low-degree polynomials to approximate the non-linear activation function - ReLU. Compared with other approximate activation methods, the proposed fine-grained, trainable approximation scheme can effectively reduce the accuracy loss caused by approximation errors. Meanwhile, due to element-wise data packing, a large batch of images can be packed and inferred concurrently, leading to a much higher utility ratio of ciphertext slots. Therefore, although the total inference time increases sharply, the amortized time for each image actually decreases, especially when the batch size increases. Furthermore, knowledge distillation is adopted in the training process to further enhance the inference accuracy. Experiment results show that when ciphertext inference is performed on 4096 input images, compared with the current most efficient channel-wise method, the inference accuracy is improved by 1.65%, and the amortized inference time is reduced by 99.5%.
- [12] arXiv:2403.10968 [pdf, ps, other]
-
Title: Enhancing IoT Security Against DDoS Attacks through Federated LearningSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The rapid proliferation of the Internet of Things (IoT) has ushered in transformative connectivity between physical devices and the digital realm. Nonetheless, the escalating threat of Distributed Denial of Service (DDoS) attacks jeopardizes the integrity and reliability of IoT networks. Conventional DDoS mitigation approaches are ill-equipped to handle the intricacies of IoT ecosystems, potentially compromising data privacy. This paper introduces an innovative strategy to bolster the security of IoT networks against DDoS attacks by harnessing the power of Federated Learning that allows multiple IoT devices or edge nodes to collaboratively build a global model while preserving data privacy and minimizing communication overhead. The research aims to investigate Federated Learning's effectiveness in detecting and mitigating DDoS attacks in IoT. Our proposed framework leverages IoT devices' collective intelligence for real-time attack detection without compromising sensitive data. This study proposes innovative deep autoencoder approaches for data dimensionality reduction, retraining, and partial selection to enhance the performance and stability of the proposed model. Additionally, two renowned aggregation algorithms, FedAvg and FedAvgM, are employed in this research. Various metrics, including true positive rate, false positive rate, and F1-score, are employed to evaluate the model. The dataset utilized in this research, N-BaIoT, exhibits non-IID data distribution, where data categories are distributed quite differently. The negative impact of these distribution disparities is managed by employing retraining and partial selection techniques, enhancing the final model's stability. Furthermore, evaluation results demonstrate that the FedAvgM aggregation algorithm outperforms FedAvg, indicating that in non-IID datasets, FedAvgM provides better stability and performance.
- [13] arXiv:2403.11088 [pdf, other]
-
Title: Programming Frameworks for Differential PrivacyComments: To appear as a chapter in the book "Differential Privacy for Artificial Intelligence," edited by Ferdinando Fioretto and Pascal van Hentenryck and to be published by now publishersSubjects: Cryptography and Security (cs.CR); Databases (cs.DB); Programming Languages (cs.PL)
Many programming frameworks have been introduced to support the development of differentially private software applications. In this chapter, we survey some of the conceptual ideas underlying these frameworks in a way that we hope will be helpful for both practitioners and researchers. For practitioners, the survey can provide a starting point for understanding what features may be valuable when selecting a programming framework. For researchers, it can help organize existing work in a unified way and provide context for understanding new features in future frameworks.
- [14] arXiv:2403.11166 [pdf, other]
-
Title: Pencil: Private and Extensible Collaborative Learning without the Non-Colluding AssumptionComments: Network and Distributed System Security Symposium (NDSS) 2024Journal-ref: Proceedings 2024 Network and Distributed System Security Symposium (2024)Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem.
In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, and meanwhile ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) The training overhead of Pencil is greatly reduced: Pencil achieves 10 ~ 260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks. - [15] arXiv:2403.11171 [pdf, other]
-
Title: A Tip for IOTA Privacy: IOTA Light Node Deanonymization via Tip SelectionComments: This paper is accepted to the IEEE International Conference on Blockchain and Cryptocurrency(ICBC) 2024Subjects: Cryptography and Security (cs.CR)
IOTA is a distributed ledger technology that uses a Directed Acyclic Graph (DAG) structure called the Tangle. It is known for its efficiency and is widely used in the Internet of Things (IoT) environment. Tangle can be configured by utilizing the tip selection process. Due to performance issues with light nodes, full nodes are being asked to perform the tip selections of light nodes. However, in this paper, we demonstrate that tip selection can be exploited to compromise users' privacy. An adversary full node can associate a transaction with the identity of a light node by comparing the light node's request with its ledger. We show that these types of attacks are not only viable in the current IOTA environment but also in IOTA 2.0 and the privacy improvement being studied. We also provide solutions to mitigate these attacks and propose ways to enhance anonymity in the IOTA network while maintaining efficiency and scalability.
- [16] arXiv:2403.11180 [pdf, other]
-
Title: usfAD Based Effective Unknown Attack Detection Focused IDS FrameworkAuthors: Md. Ashraf Uddin, Sunil Aryal, Mohamed Reda Bouadjenek, Muna Al-Hawawreh, Md. Alamin TalukderComments: Deakin University, Australia | This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-23-1-4003Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The rapid expansion of varied network systems, including the Internet of Things (IoT) and Industrial Internet of Things (IIoT), has led to an increasing range of cyber threats. Ensuring robust protection against these threats necessitates the implementation of an effective Intrusion Detection System (IDS). For more than a decade, researchers have delved into supervised machine learning techniques to develop IDS to classify normal and attack traffic. However, building effective IDS models using supervised learning requires a substantial number of benign and attack samples. To collect a sufficient number of attack samples from real-life scenarios is not possible since cyber attacks occur occasionally. Further, IDS trained and tested on known datasets fails in detecting zero-day or unknown attacks due to the swift evolution of attack patterns. To address this challenge, we put forth two strategies for semi-supervised learning based IDS where training samples of attacks are not required: 1) training a supervised machine learning model using randomly and uniformly dispersed synthetic attack samples; 2) building a One Class Classification (OCC) model that is trained exclusively on benign network traffic. We have implemented both approaches and compared their performances using 10 recent benchmark IDS datasets. Our findings demonstrate that the OCC model based on the state-of-art anomaly detection technique called usfAD significantly outperforms conventional supervised classification and other OCC based techniques when trained and tested considering real-life scenarios, particularly to detect previously unseen attacks.
- [17] arXiv:2403.11303 [pdf, ps, other]
-
Title: A Brief Study of Computer Network Security TechnologiesSubjects: Cryptography and Security (cs.CR)
The rapid development of computer network system brings both a great convenience and new security threats for users. Network security problem generally includes network system security and data security. Specifically, it refers to the reliability of network system, confidentiality, integrity and availability of data information in the system. This paper introduces the significance of network security systems and highlights related technologies, mainly authentication, data encryption, firewall and antivirus technology. Network security problems can be faced by any network user, therefore we must greatly prioritize network security, try to prevent hostile attacks and ensure the overall security of the network system.
- [18] arXiv:2403.11445 [pdf, other]
-
Title: Budget Recycling Differential PrivacySubjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Signal Processing (eess.SP)
Differential Privacy (DP) mechanisms usually {force} reduction in data utility by producing ``out-of-bound'' noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By ``soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
- [19] arXiv:2403.11519 [pdf, other]
-
Title: Efficient and Privacy-Preserving Federated Learning based on Full Homomorphic EncryptionSubjects: Cryptography and Security (cs.CR)
Since the first theoretically feasible full homomorphic encryption (FHE) scheme was proposed in 2009, great progress has been achieved. These improvements have made FHE schemes come off the paper and become quite useful in solving some practical problems. In this paper, we propose a set of novel Federated Learning Schemes by utilizing the latest homomorphic encryption technologies, so as to improve the security, functionality and practicality at the same time. Comparisons have been given in four practical data sets separately from medical, business, biometric and financial fields, covering both horizontal and vertical federated learning scenarios. The experiment results show that our scheme achieves significant improvements in security, efficiency and practicality, compared with classical horizontal and vertical federated learning schemes.
- [20] arXiv:2403.11669 [pdf, other]
-
Title: Semantic Data Representation for Explainable Windows Malware Detection ModelsComments: arXiv admin note: substantial text overlap with arXiv:2301.00153Subjects: Cryptography and Security (cs.CR)
Ontologies are a standard tool for creating semantic schemata in many knowledge intensive domains of human interest. They are becoming increasingly important also in the areas that have been until very recently dominated by subsymbolic knowledge representation and machine-learning (ML) based data processing. One such area is information security, and specifically, malware detection. We thus propose PE Malware Ontology that offers a reusable semantic schema for Portable Executable (PE - the Windows binary format) malware files. This ontology is inspired by the structure of the EMBER dataset, which focuses on the static malware analysis of PE files. With this proposal, we hope to provide a unified semantic representation for the existing and future PE-malware datasets and facilitate the application of symbolic, neuro-symbolic, or otherwise explainable approaches in the PE-malware-detection domain, which may produce interpretable results described by the terms defined in our ontology. In addition, we also publish semantically treated EMBER data, including fractional datasets, to support the reproducibility of experiments on EMBER. We supplement our work with a preliminary case study, conducted using concept learning, to show the general feasibility of our approach. While we were not able to match the precision of the state-of-the-art ML tools, the learned malware discriminators were interesting and highly interpretable.
- [21] arXiv:2403.11741 [pdf, ps, other]
-
Title: Post-Quantum Cryptography: Securing Digital Communication in the Quantum EraSubjects: Cryptography and Security (cs.CR)
The advent of quantum computing poses a profound threat to traditional cryptographic systems, exposing vulnerabilities that compromise the security of digital communication channels reliant on RSA, ECC, and similar classical encryption methods. Quantum algorithms, notably Shor's algorithm, exploit the inherent computational power of quantum computers to efficiently solve mathematical problems underlying these cryptographic schemes. In response, post-quantum cryptography (PQC) emerged as a critical field aimed at developing resilient cryptographic algorithms impervious to quantum attacks. This paper delineates the vulnerabilities of classical cryptographic systems to quantum attacks, elucidates the principles of quantum computing, and introduces various PQC algorithms such as lattice-based cryptography, code-based cryptography, hash-based cryptography, and multivariate polynomial cryptography. Highlighting the importance of PQC in securing digital communication amidst quantum computing advancements, this research underscores its pivotal role in safeguarding data integrity, confidentiality, and authenticity in the face of emerging quantum threats.
- [22] arXiv:2403.11798 [pdf, other]
-
Title: Is It Really You Who Forgot the Password? When Account Recovery Meets Risk-Based AuthenticationSubjects: Cryptography and Security (cs.CR)
Risk-based authentication (RBA) is used in online services to protect user accounts from unauthorized takeover. RBA commonly uses contextual features that indicate a suspicious login attempt when the characteristic attributes of the login context deviate from known and thus expected values. Previous research on RBA and anomaly detection in authentication has mainly focused on the login process. However, recent attacks have revealed vulnerabilities in other parts of the authentication process, specifically in the account recovery function. Consequently, to ensure comprehensive authentication security, the use of anomaly detection in the context of account recovery must also be investigated.
This paper presents the first study to investigate risk-based account recovery (RBAR) in the wild. We analyzed the adoption of RBAR by five prominent online services (that are known to use RBA). Our findings confirm the use of RBAR at Google, LinkedIn, and Amazon. Furthermore, we provide insights into the different RBAR mechanisms of these services and explore the impact of multi-factor authentication on them. Based on our findings, we create a first maturity model for RBAR challenges. The goal of our work is to help developers, administrators, and policy-makers gain an initial understanding of RBAR and to encourage further research in this direction. - [23] arXiv:2403.11830 [pdf, other]
-
Title: Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural NetworksComments: preprint submitted to IEEE TIFS, under reviewSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Machine Learning (ML) algorithms have become increasingly popular for supporting Network Intrusion Detection Systems (NIDS). Nevertheless, extensive research has shown their vulnerability to adversarial attacks, which involve subtle perturbations to the inputs of the models aimed at compromising their performance. Recent proposals have effectively leveraged Graph Neural Networks (GNN) to produce predictions based also on the structural patterns exhibited by intrusions to enhance the detection robustness. However, the adoption of GNN-based NIDS introduces new types of risks. In this paper, we propose the first formalization of adversarial attacks specifically tailored for GNN in network intrusion detection. Moreover, we outline and model the problem space constraints that attackers need to consider to carry out feasible structural attacks in real-world scenarios. As a final contribution, we conduct an extensive experimental campaign in which we launch the proposed attacks against state-of-the-art GNN-based NIDS. Our findings demonstrate the increased robustness of the models against classical feature-based adversarial attacks, while highlighting their susceptibility to structure-based attacks.
- [24] arXiv:2403.11859 [pdf, other]
-
Title: Towards automated formal security analysis of SAML V2.0 Web Browser SSO standard - the POST/Artifact use caseSubjects: Cryptography and Security (cs.CR)
Single Sign-On (SSO) protocols streamline user authentication with a unified login for multiple online services, improving usability and security. One of the most common SSO protocol frameworks - the Security Assertion Markup Language V2.0 (SAML) Web SSO Profile - has been in use for more than two decades, primarily in government, education and enterprise environments. Despite its mission-critical nature, only certain deployments and configurations of the Web SSO Profile have been formally analyzed. This paper attempts to bridge this gap by performing a comprehensive formal security analysis of the SAML V2.0 SP-initiated SSO with POST/Artifact Bindings use case. Rather than focusing on a specific deployment and configuration, we closely follow the specification with the goal of capturing many different deployments allowed by the standard. Modeling and analysis is performed using Tamarin prover - state-of-the-art tool for automated verification of security protocols in the symbolic model of cryptography. Technically, we build a meta-model of the use case that we instantiate to eight different protocol variants. Using the Tamarin prover, we formally verify a number of critical security properties for those protocol variants, while identifying certain drawbacks and potential vulnerabilities.
- [25] arXiv:2403.11981 [pdf, other]
-
Title: Diffusion Denoising as a Certified Defense against Clean-label PoisoningSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
Cross-lists for Tue, 19 Mar 24
- [26] arXiv:2403.10550 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing FlowsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
With the rapid development of the Internet, various types of anomaly traffic are threatening network security. We consider the problem of anomaly network traffic detection and propose a three-stage anomaly detection framework using only normal traffic. Our framework can generate pseudo anomaly samples without prior knowledge of anomalies to achieve the detection of anomaly data. Firstly, we employ a reconstruction method to learn the deep representation of normal samples. Secondly, these representations are normalized to a standard normal distribution using a bidirectional flow module. To simulate anomaly samples, we add noises to the normalized representations which are then passed through the generation direction of the bidirectional flow module. Finally, a simple classifier is trained to differentiate the normal samples and pseudo anomaly samples in the latent space. During inference, our framework requires only two modules to detect anomalous samples, leading to a considerable reduction in model size. According to the experiments, our method achieves the state of-the-art results on the common benchmarking datasets of anomaly network traffic detection. The code is given in the https://github.com/ZxuanDang/ATD-via-Flows.git
- [27] arXiv:2403.10553 (cross-list from cs.LG) [pdf, other]
-
Title: Learning to Watermark LLM-generated Text via Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
We study how to watermark LLM outputs, i.e. embedding algorithmically detectable signals into LLM-generated text to track misuse. Unlike the current mainstream methods that work with a fixed LLM, we expand the watermark design space by including the LLM tuning stage in the watermark pipeline. While prior works focus on token-level watermark that embeds signals into the output, we design a model-level watermark that embeds signals into the LLM weights, and such signals can be detected by a paired detector. We propose a co-training framework based on reinforcement learning that iteratively (1) trains a detector to detect the generated watermarked text and (2) tunes the LLM to generate text easily detectable by the detector while keeping its normal utility. We empirically show that our watermarks are more accurate, robust, and adaptable (to new attacks). It also allows watermarked model open-sourcing. In addition, if used together with alignment, the extra overhead introduced is low - only training an extra reward model (i.e. our detector). We hope our work can bring more effort into studying a broader watermark design that is not limited to working with a fixed LLM. We open-source the code: https://github.com/xiaojunxu/learning-to-watermark-llm .
- [28] arXiv:2403.10558 (cross-list from cs.CV) [pdf, other]
-
Title: Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion AttackSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The utilization of personal sensitive data in training face recognition (FR) models poses significant privacy concerns, as adversaries can employ model inversion attacks (MIA) to infer the original training data. Existing defense methods, such as data augmentation and differential privacy, have been employed to mitigate this issue. However, these methods often fail to strike an optimal balance between privacy and accuracy. To address this limitation, this paper introduces an adaptive hybrid masking algorithm against MIA. Specifically, face images are masked in the frequency domain using an adaptive MixUp strategy. Unlike the traditional MixUp algorithm, which is predominantly used for data augmentation, our modified approach incorporates frequency domain mixing. Previous studies have shown that increasing the number of images mixed in MixUp can enhance privacy preservation but at the expense of reduced face recognition accuracy. To overcome this trade-off, we develop an enhanced adaptive MixUp strategy based on reinforcement learning, which enables us to mix a larger number of images while maintaining satisfactory recognition accuracy. To optimize privacy protection, we propose maximizing the reward function (i.e., the loss function of the FR system) during the training of the strategy network. While the loss function of the FR network is minimized in the phase of training the FR network. The strategy network and the face recognition network can be viewed as antagonistic entities in the training process, ultimately reaching a more balanced trade-off. Experimental results demonstrate that our proposed hybrid masking scheme outperforms existing defense algorithms in terms of privacy preservation and recognition accuracy against MIA.
- [29] arXiv:2403.10573 (cross-list from eess.IV) [pdf, other]
-
Title: Medical Unlearnable Examples: Securing Medical Data from Unauthorized Traning via Sparsity-Aware Local MaskingSubjects: Image and Video Processing (eess.IV); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With the rapid growth of artificial intelligence (AI) in healthcare, there has been a significant increase in the generation and storage of sensitive medical data. This abundance of data, in turn, has propelled the advancement of medical AI technologies. However, concerns about unauthorized data exploitation, such as training commercial AI models, often deter researchers from making their invaluable datasets publicly available. In response to the need to protect this hard-to-collect data while still encouraging medical institutions to share it, one promising solution is to introduce imperceptible noise into the data. This method aims to safeguard the data against unauthorized training by inducing degradation in model generalization. Although existing methods have shown commendable data protection capabilities in general domains, they tend to fall short when applied to biomedical data, mainly due to their failure to account for the sparse nature of medical images. To address this problem, we propose the Sparsity-Aware Local Masking (SALM) method, a novel approach that selectively perturbs significant pixel regions rather than the entire image as previous strategies have done. This simple-yet-effective approach significantly reduces the perturbation search space by concentrating on local regions, thereby improving both the efficiency and effectiveness of data protection for biomedical datasets characterized by sparse features. Besides, we have demonstrated that SALM maintains the essential characteristics of the data, ensuring its clinical utility remains uncompromised. Our extensive experiments across various datasets and model architectures demonstrate that SALM effectively prevents unauthorized training of deep-learning models and outperforms previous state-of-the-art data protection methods.
- [30] arXiv:2403.10646 (cross-list from cs.LG) [pdf, ps, other]
-
Title: A Survey of Source Code Representations for Machine Learning-Based Cybersecurity TasksSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Machine learning techniques for cybersecurity-related software engineering tasks are becoming increasingly popular. The representation of source code is a key portion of the technique that can impact the way the model is able to learn the features of the source code. With an increasing number of these techniques being developed, it is valuable to see the current state of the field to better understand what exists and what's not there yet. This paper presents a study of these existing ML-based approaches and demonstrates what type of representations were used for different cybersecurity tasks and programming languages. Additionally, we study what types of models are used with different representations. We have found that graph-based representations are the most popular category of representation, and Tokenizers and Abstract Syntax Trees (ASTs) are the two most popular representations overall. We also found that the most popular cybersecurity task is vulnerability detection, and the language that is covered by the most techniques is C. Finally, we found that sequence-based models are the most popular category of models, and Support Vector Machines (SVMs) are the most popular model overall.
- [31] arXiv:2403.10676 (cross-list from cs.IT) [pdf, other]
-
Title: Secure Distributed Storage: Optimal Trade-Off Between Storage Rate and Privacy LeakageComments: 11 pages, 3 figures, two-column, accepted to IEEE Transactions on Information Theory, part of the results was presented at the 2020 IEEE International Symposium on Information Theory (ISIT)Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
Consider the problem of storing data in a distributed manner over $T$ servers. Specifically, the data needs to (i) be recoverable from any $\tau$ servers, and (ii) remain private from any $z$ colluding servers, where privacy is quantified in terms of mutual information between the data and all the information available at any $z$ colluding servers. For this model, our main results are (i) the fundamental trade-off between storage size and the level of desired privacy, and (ii) the optimal amount of local randomness necessary at the encoder. As a byproduct, our results provide an optimal lower bound on the individual share size of ramp secret sharing schemes under a more general leakage symmetry condition than the ones previously considered in the literature.
- [32] arXiv:2403.10717 (cross-list from cs.LG) [pdf, other]
-
Title: Backdoor Secrets Unveiled: Identifying Backdoor Data with Optimized Scaled Prediction ConsistencyComments: The Twelfth International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Modern machine learning (ML) systems demand substantial training data, often resorting to external sources. Nevertheless, this practice renders them vulnerable to backdoor poisoning attacks. Prior backdoor defense strategies have primarily focused on the identification of backdoored models or poisoned data characteristics, typically operating under the assumption of access to clean data. In this work, we delve into a relatively underexplored challenge: the automatic identification of backdoor data within a poisoned dataset, all under realistic conditions, i.e., without the need for additional clean data or without manually defining a threshold for backdoor detection. We draw an inspiration from the scaled prediction consistency (SPC) technique, which exploits the prediction invariance of poisoned data to an input scaling factor. Based on this, we pose the backdoor data identification problem as a hierarchical data splitting optimization problem, leveraging a novel SPC-based loss function as the primary optimization objective. Our innovation unfolds in several key aspects. First, we revisit the vanilla SPC method, unveiling its limitations in addressing the proposed backdoor identification problem. Subsequently, we develop a bi-level optimization-based approach to precisely identify backdoor data by minimizing the advanced SPC loss. Finally, we demonstrate the efficacy of our proposal against a spectrum of backdoor attacks, encompassing basic label-corrupted attacks as well as more sophisticated clean-label attacks, evaluated across various benchmark datasets. Experiment results show that our approach often surpasses the performance of current baselines in identifying backdoor data points, resulting in about 4%-36% improvement in average AUROC. Codes are available at https://github.com/OPTML-Group/BackdoorMSPC.
- [33] arXiv:2403.10790 (cross-list from quant-ph) [pdf, other]
-
Title: QuantumLeak: Stealing Quantum Neural Networks from Cloud-based NISQ MachinesJournal-ref: published in IJCNN 2024Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Variational quantum circuits (VQCs) have become a powerful tool for implementing Quantum Neural Networks (QNNs), addressing a wide range of complex problems. Well-trained VQCs serve as valuable intellectual assets hosted on cloud-based Noisy Intermediate Scale Quantum (NISQ) computers, making them susceptible to malicious VQC stealing attacks. However, traditional model extraction techniques designed for classical machine learning models encounter challenges when applied to NISQ computers due to significant noise in current devices. In this paper, we introduce QuantumLeak, an effective and accurate QNN model extraction technique from cloud-based NISQ machines. Compared to existing classical model stealing techniques, QuantumLeak improves local VQC accuracy by 4.99\%$\sim$7.35\% across diverse datasets and VQC architectures.
- [34] arXiv:2403.10856 (cross-list from cs.CL) [pdf, other]
-
Title: Zero-shot Generative Linguistic SteganographyComments: 15 pages, 6 figures. Accepted at NAACL 2024Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Generative linguistic steganography attempts to hide secret messages into covertext. Previous studies have generally focused on the statistical differences between the covertext and stegotext, however, ill-formed stegotext can readily be identified by humans. In this paper, we propose a novel zero-shot approach based on in-context learning for linguistic steganography to achieve better perceptual and statistical imperceptibility. We also design several new metrics and reproducible language evaluations to measure the imperceptibility of the stegotext. Our experimental results indicate that our method produces $1.926\times$ more innocent and intelligible stegotext than any other method.
- [35] arXiv:2403.10883 (cross-list from cs.CV) [pdf, other]
-
Title: Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal InteractionAuthors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Multimedia (cs.MM)
Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models, resulting in a substantial performance gap from white-box attacks. We observe that prior work overlooks the interaction mechanisms between modalities, which plays a crucial role in understanding the intricacies of VLP models. In response, we propose a novel attack, called Collaborative Multimodal Interaction Attack (CMI-Attack), leveraging modality interaction through embedding guidance and interaction enhancement. Specifically, attacking text at the embedding level while preserving semantics, as well as utilizing interaction image gradients to enhance constraints on perturbations of texts and images. Significantly, in the image-text retrieval task on Flickr30K dataset, CMI-Attack raises the transfer success rates from ALBEF to TCL, $\text{CLIP}_{\text{ViT}}$ and $\text{CLIP}_{\text{CNN}}$ by 8.11%-16.75% over state-of-the-art methods. Moreover, CMI-Attack also demonstrates superior performance in cross-task generalization scenarios. Our work addresses the underexplored realm of transfer attacks on VLP models, shedding light on the importance of modality interaction for enhanced adversarial robustness.
- [36] arXiv:2403.10995 (cross-list from cs.LG) [pdf, other]
-
Title: Edge Private Graph Neural Networks with Singular Value PerturbationComments: Accepted at Privacy Enhancing Technologies Symposium (PETS) 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Graph neural networks (GNNs) play a key role in learning representations from graph-structured data and are demonstrated to be useful in many applications. However, the GNN training pipeline has been shown to be vulnerable to node feature leakage and edge extraction attacks. This paper investigates a scenario where an attacker aims to recover private edge information from a trained GNN model. Previous studies have employed differential privacy (DP) to add noise directly to the adjacency matrix or a compact graph representation. The added perturbations cause the graph structure to be substantially morphed, reducing the model utility. We propose a new privacy-preserving GNN training algorithm, Eclipse, that maintains good model utility while providing strong privacy protection on edges. Eclipse is based on two key observations. First, adjacency matrices in graph structures exhibit low-rank behavior. Thus, Eclipse trains GNNs with a low-rank format of the graph via singular values decomposition (SVD), rather than the original graph. Using the low-rank format, Eclipse preserves the primary graph topology and removes the remaining residual edges. Eclipse adds noise to the low-rank singular values instead of the entire graph, thereby preserving the graph privacy while still maintaining enough of the graph structure to maintain model utility. We theoretically show Eclipse provide formal DP guarantee on edges. Experiments on benchmark graph datasets show that Eclipse achieves significantly better privacy-utility tradeoff compared to existing privacy-preserving GNN training methods. In particular, under strong privacy constraints ($\epsilon$ < 4), Eclipse shows significant gains in the model utility by up to 46%. We further demonstrate that Eclipse also has better resilience against common edge attacks (e.g., LPA), lowering the attack AUC by up to 5% compared to other state-of-the-art baselines.
- [37] arXiv:2403.11052 (cross-list from cs.CV) [pdf, other]
-
Title: Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Recent advancements in text-to-image diffusion models have demonstrated their remarkable capability to generate high-quality images from textual prompts. However, increasing research indicates that these models memorize and replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks. In our study, we provide a novel perspective to understand this memorization phenomenon by examining its relationship with cross-attention mechanisms. We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens. The diffusion model is overfitted to these token embeddings, memorizing corresponding training images. To elucidate this phenomenon, we further identify and discuss various intrinsic findings of cross-attention that contribute to memorization. Building on these insights, we introduce an innovative approach to detect and mitigate memorization in diffusion models. The advantage of our proposed method is that it will not compromise the speed of either the training or the inference processes in these models while preserving the quality of generated images. Our code is available at https://github.com/renjie3/MemAttn .
- [38] arXiv:2403.11162 (cross-list from cs.CV) [pdf, other]
-
Title: CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient InversionComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)
Diffusion Models (DMs) have evolved into advanced image generation tools, especially for few-shot generation where a pretrained model is fine-tuned on a small set of images to capture a specific style or object. Despite their success, concerns exist about potential copyright violations stemming from the use of unauthorized data in this process. In response, we present Contrasting Gradient Inversion for Diffusion Models (CGI-DM), a novel method featuring vivid visual representations for digital copyright authentication. Our approach involves removing partial information of an image and recovering missing details by exploiting conceptual differences between the pretrained and fine-tuned models. We formulate the differences as KL divergence between latent variables of the two models when given the same input image, which can be maximized through Monte Carlo sampling and Projected Gradient Descent (PGD). The similarity between original and recovered images serves as a strong indicator of potential infringements. Extensive experiments on the WikiArt and Dreambooth datasets demonstrate the high accuracy of CGI-DM in digital copyright authentication, surpassing alternative validation techniques. Code implementation is available at https://github.com/Nicholas0228/Revelio.
- [39] arXiv:2403.11206 (cross-list from cs.LG) [pdf, other]
-
Title: CBR - Boosting Adaptive Classification By Retrieval of Encrypted Network Traffic with Out-of-distributionSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Encrypted network traffic Classification tackles the problem from different approaches and with different goals. One of the common approaches is using Machine learning or Deep Learning-based solutions on a fixed number of classes, leading to misclassification when an unknown class is given as input. One of the solutions for handling unknown classes is to retrain the model, however, retraining models every time they become obsolete is both resource and time-consuming. Therefore, there is a growing need to allow classification models to detect and adapt to new classes dynamically, without retraining, but instead able to detect new classes using few shots learning [1]. In this paper, we introduce Adaptive Classification By Retrieval CBR, a novel approach for encrypted network traffic classification. Our new approach is based on an ANN-based method, which allows us to effectively identify new and existing classes without retraining the model. The novel approach is simple, yet effective and achieved similar results to RF with up to 5% difference (usually less than that) in the classification tasks while having a slight decrease in the case of new samples (from new classes) without retraining. To summarize, the new method is a real-time classification, which can classify new classes without retraining. Furthermore, our solution can be used as a complementary solution alongside RF or any other machine/deep learning classification method, as an aggregated solution.
- [40] arXiv:2403.11238 (cross-list from cs.DC) [pdf, other]
-
Title: JUMBO: Fully Asynchronous BFT Consensus Made Truly ScalableSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Recent progresses in asynchronous Byzantine fault-tolerant (BFT) consensus, e.g. Dumbo-NG (CCS' 22) and Tusk (EuroSys' 22), show promising performance through decoupling transaction dissemination and block agreement. However, when executed with a larger number $n$ of nodes, like several hundreds, they would suffer from significant degradation in performance. Their dominating scalability bottleneck is the huge authenticator complexity: each node has to multicast $\bigO(n)$ quorum certificates (QCs) and subsequently verify them for each block.
This paper systematically investigates and resolves the above scalability issue. We first propose a signature-free asynchronous BFT consensus FIN-NG that adapts a recent signature-free asynchronous common subset protocol FIN (CCS' 23) into the state-of-the-art framework of concurrent broadcast and agreement. The liveness of FIN-NG relies on our non-trivial redesign of FIN's multi-valued validated Byzantine agreement towards achieving optimal quality. FIN-NG greatly improves the performance of FIN and already outperforms Dumbo-NG in most deployment settings. To further overcome the scalability limit of FIN-NG due to $\bigO(n^3)$ messages, we propose JUMBO, a scalable instantiation of Dumbo-NG, with only $\bigO(n^2)$ complexities for both authenticators and messages. We use various aggregation and dispersal techniques for QCs to significantly reduce the authenticator complexity of original Dumbo-NG implementations by up to $\bigO(n^2)$ orders. We also propose a ``fairness'' patch for JUMBO, thus preventing a flooding adversary from controlling an overwhelming portion of transactions in its output. - [41] arXiv:2403.11297 (cross-list from cs.CL) [pdf, ps, other]
-
Title: A Modified Word Saliency-Based Adversarial Attack on Text Classification ModelsComments: The paper is a preprint of a version submitted in ICCIDA 2024. It consists of 10 pages and contains 7 tablesSubjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This paper introduces a novel adversarial attack method targeting text classification models, termed the Modified Word Saliency-based Adversarial At-tack (MWSAA). The technique builds upon the concept of word saliency to strategically perturb input texts, aiming to mislead classification models while preserving semantic coherence. By refining the traditional adversarial attack approach, MWSAA significantly enhances its efficacy in evading detection by classification systems. The methodology involves first identifying salient words in the input text through a saliency estimation process, which prioritizes words most influential to the model's decision-making process. Subsequently, these salient words are subjected to carefully crafted modifications, guided by semantic similarity metrics to ensure that the altered text remains coherent and retains its original meaning. Empirical evaluations conducted on diverse text classification datasets demonstrate the effectiveness of the proposed method in generating adversarial examples capable of successfully deceiving state-of-the-art classification models. Comparative analyses with existing adversarial attack techniques further indicate the superiority of the proposed approach in terms of both attack success rate and preservation of text coherence.
- [42] arXiv:2403.11343 (cross-list from cs.LG) [pdf, other]
-
Title: Federated Transfer Learning with Differential PrivacyComments: 76 pages, 3 figuresSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Federated learning is gaining increasing popularity, with data heterogeneity and privacy being two prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of \textit{federated differential privacy}, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems, namely univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.
- [43] arXiv:2403.11433 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Measuring Quantum Information Leakage Under Detection ThreatSubjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT); Systems and Control (eess.SY)
Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independence, and unitary invariance. Global depolarizing noise, an important family of physical noise in quantum devices, is shown to reduce gentle quantum leakage (and hence can be used as a mechanism to ensure privacy or security). A lower bound for the gentle quantum leakage based on asymmetric approximate cloning is presented. This lower bound relates information leakage to mutual incompatibility of quantum states. A numerical example, based on the encoding in the celebrated BB84 quantum key distribution algorithm, is used to demonstrate the results.
- [44] arXiv:2403.11759 (cross-list from cs.NI) [pdf, other]
-
Title: Why E.T. Can't Phone Home: A Global View on IP-based Geoblocking at VoWiFiJournal-ref: 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys 2024)Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
In current cellular network generations (4G, 5G) the IMS (IP Multimedia Subsystem) plays an integral role in terminating voice calls and short messages. Many operators use VoWiFi (Voice over Wi-Fi, also Wi-Fi calling) as an alternative network access technology to complement their cellular coverage in areas where no radio signal is available (e.g., rural territories or shielded buildings). In a mobile world where customers regularly traverse national borders, this can be used to avoid expensive international roaming fees while journeying overseas, since VoWiFi calls are usually invoiced at domestic rates. To not lose this revenue stream, some operators block access to the IMS for customers staying abroad.
This work evaluates the current deployment status of VoWiFi among worldwide operators and analyzes existing geoblocking measures on the IP layer. We show that a substantial share (IPv4: 14.6%, IPv6: 65.2%) of operators implement geoblocking at the DNS- or VoWiFi protocol level, and highlight severe drawbacks in terms of emergency calling service availability. - [45] arXiv:2403.11778 (cross-list from cs.SD) [pdf, other]
-
Title: Towards the Development of a Real-Time Deepfake Audio Detection System in Communication PlatformsAuthors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun PankajakshanSubjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. An executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on Resnet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performances compared to ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.
- [46] arXiv:2403.11833 (cross-list from cs.CL) [pdf, other]
-
Title: SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generatorJournal-ref: IEEE Transactions on Dependable and Secure Computing (2024), pp. 1-17Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Machine learning models are vulnerable to maliciously crafted Adversarial Examples (AEs). Training a machine learning model with AEs improves its robustness and stability against adversarial attacks. It is essential to develop models that produce high-quality AEs. Developing such models has been much slower in natural language processing (NLP) than in areas such as computer vision. This paper introduces a practical and efficient adversarial attack model called SSCAE for \textbf{S}emantic, \textbf{S}yntactic, and \textbf{C}ontext-aware natural language \textbf{AE}s generator. SSCAE identifies important words and uses a masked language model to generate an early set of substitutions. Next, two well-known language models are employed to evaluate the initial set in terms of semantic and syntactic characteristics. We introduce (1) a dynamic threshold to capture more efficient perturbations and (2) a local greedy search to generate high-quality AEs. As a black-box method, SSCAE generates humanly imperceptible and context-aware AEs that preserve semantic consistency and the source language's syntactical and grammatical requirements. The effectiveness and superiority of the proposed SSCAE model are illustrated with fifteen comparative experiments and extensive sensitivity analysis for parameter optimization. SSCAE outperforms the existing models in all experiments while maintaining a higher semantic consistency with a lower query number and a comparable perturbation rate.
- [47] arXiv:2403.11941 (cross-list from cs.CC) [pdf, ps, other]
-
Title: Perfect Zero-Knowledge PCPs for #PSubjects: Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
We construct perfect zero-knowledge probabilistically checkable proofs (PZK-PCPs) for every language in #P. This is the first construction of a PZK-PCP for any language outside BPP. Furthermore, unlike previous constructions of (statistical) zero-knowledge PCPs, our construction simultaneously achieves non-adaptivity and zero knowledge against arbitrary (adaptive) polynomial-time malicious verifiers.
Our construction consists of a novel masked sumcheck PCP, which uses the combinatorial nullstellensatz to obtain antisymmetric structure within the hypercube and randomness outside of it. To prove zero knowledge, we introduce the notion of locally simulatable encodings: randomised encodings in which every local view of the encoding can be efficiently sampled given a local view of the message. We show that the code arising from the sumcheck protocol (the Reed-Muller code augmented with subcube sums) admits a locally simulatable encoding. This reduces the algebraic problem of simulating our masked sumcheck to a combinatorial property of antisymmetric functions.
Replacements for Tue, 19 Mar 24
- [48] arXiv:2202.06692 (replaced) [pdf, other]
-
Title: TRIP: Trust-Limited Coercion-Resistant In-Person Voter RegistrationAuthors: Louis-Henri Merino, Simone Colombo, Rene Reyes, Alaleh Azhir, Haoqian Zhang, Jeff Allen, Bernhard Tellenbach, Vero Estrada-Galiñanes, Bryan FordComments: 21 pagesSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
- [49] arXiv:2303.04097 (replaced) [pdf, ps, other]
-
Title: On additive differential probabilities of the composition of bitwise exclusive-or and a bit rotationSubjects: Cryptography and Security (cs.CR)
- [50] arXiv:2304.10268 (replaced) [pdf, other]
-
Title: BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line EvictionsComments: 15 pages, 11 figuresSubjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
- [51] arXiv:2309.04664 (replaced) [pdf, other]
-
Title: Compact: Approximating Complex Activation Functions for Secure ComputationComments: Accepted to Proceedings on Privacy Enhancing Technologies (PoPETs)Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [52] arXiv:2312.00029 (replaced) [pdf, other]
-
Title: Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment FrameworkAuthors: Matthew Pisano, Peter Ly, Abraham Sanders, Bingsheng Yao, Dakuo Wang, Tomek Strzalkowski, Mei SiSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [53] arXiv:2312.04135 (replaced) [pdf, other]
-
Title: A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and SecurityAuthors: Ozlem Ceviz (1), Pinar Sadioglu (1), Sevil Sen (1), Vassilios G. Vassilakis (2) ((1) WISE Lab., Deparment of Computer Engineering, Hacettepe University, Ankara, Turkey (2) Department of Computer Science, University of York, York, United Kingdom)Comments: 15Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [54] arXiv:2403.03267 (replaced) [pdf, other]
-
Title: TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat ReportsComments: Submitted to Journal of Information Security and Applications (JISA)Subjects: Cryptography and Security (cs.CR)
- [55] arXiv:2403.09209 (replaced) [pdf, other]
-
Title: LAN: Learning Adaptive Neighbors for Real-Time Insider Threat DetectionComments: 13 pagesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [56] arXiv:2403.09603 (replaced) [pdf, other]
-
Title: Optimistic Verifiable Training by Controlling Hardware NondeterminismComments: 11 pages, 5 figures, preprintSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [57] arXiv:2403.10296 (replaced) [pdf, other]
-
Title: Formal Security Analysis of the AMD SEV-SNP Software InterfaceComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Cryptography and Security (cs.CR)
- [58] arXiv:2211.14769 (replaced) [pdf, other]
-
Title: Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated LearningSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [59] arXiv:2302.01622 (replaced) [pdf, other]
-
Title: Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imagingAuthors: Soroosh Tayebi Arasteh, Alexander Ziller, Christiane Kuhl, Marcus Makowski, Sven Nebelung, Rickmer Braren, Daniel Rueckert, Daniel Truhn, Georgios KaissisComments: Published in Communications Medicine. Nature PortfolioJournal-ref: Commun Med 4(1), 46 (2024)Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [60] arXiv:2305.13991 (replaced) [pdf, other]
-
Title: Expressive Losses for Verified Robustness via Convex CombinationsAuthors: Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio LomuscioComments: ICLR 2024Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
- [61] arXiv:2305.15759 (replaced) [pdf, other]
-
Title: Differentially Private Latent Diffusion ModelsSubjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [62] arXiv:2306.05524 (replaced) [pdf, other]
-
Title: On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic WritingSubjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [63] arXiv:2308.07553 (replaced) [pdf, other]
-
Title: Enhancing the Antidote: Improved Pointwise Certifications against Poisoning AttacksJournal-ref: Proceedings of the 2023 AAAI Conference on Artificial Intelligence, 37(7), 8861-8869Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- [64] arXiv:2309.13788 (replaced) [pdf, other]
-
Title: Can LLM-Generated Misinformation Be Detected?Comments: Accepted to Proceedings of ICLR 2024. 9 pages for main paper, 38 pages including appendix. The code, results, dataset for this paper and more resources on "LLMs Meet Misinformation" have been released on the project website: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [65] arXiv:2310.17645 (replaced) [pdf, other]
-
Title: PubDef: Defending Against Transfer Attacks From Public ModelsComments: ICLR 2024. Code available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [66] arXiv:2312.05720 (replaced) [pdf, other]
-
Title: Beyond Gradient and Priors in Privacy Attacks: Leveraging Pooler Layer Inputs of Language Models in Federated LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- [67] arXiv:2312.14510 (replaced) [pdf, other]
-
Title: Strategic Bidding Wars in On-chain AuctionsSubjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR)
- [68] arXiv:2401.06637 (replaced) [pdf, other]
-
Title: Adversarial Examples are Misaligned in Diffusion Model ManifoldsComments: accepted at IJCNNSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
- [69] arXiv:2402.14899 (replaced) [pdf, other]
-
Title: Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial ImagesAuthors: Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong GuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [70] arXiv:2403.06838 (replaced) [pdf, other]
-
Title: ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart ContractsComments: This is a technical report from Nanyang Technological UniversitySubjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
[ showing up to 2000 entries per page: fewer | more ]
Disable MathJax (What is MathJax?)
Links to: arXiv, form interface, find, cs, recent, 2403, contact, help (Access key information)