Publications | Priyanshu Gupta

2024

MetaReflection: Learning Instructions for Language Agents using Past Reflections

Priyanshu Gupta^*, Shashank Kirtania^*, Ananya Singha^*, Sumit Gulwani, Arjun Radhakrishna, Gustavo Soares, and Sherry Shi

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

The popularity of Large Language Models (LLMs) has unleashed a new age of Language Agents for solving a diverse range of tasks. While contemporary frontier LLMs are capable enough to power reasonably good Language Agents, the closed-API model makes it hard to improve them in cases where they perform sub-optimally. To address this, recent works have explored techniques to improve their performance using self-reflection and prompt optimization. While techniques like self-reflection work well in an online setup, contemporary prompt optimization techniques are designed to work on simpler tasks. To address this, we introduce METAREFLECTION, a novel offline reinforcement learning technique that enhances the performance of Language Agents by augmenting a semantic memory based on experiential learnings from past trials. We demonstrate the efficacy of METAREFLECTION by evaluating across multiple domains, including complex logical reasoning, biomedical semantic similarity, open world question answering, and vulnerability threat detection in Infrastructure-as-Code, with different agent designs. METAREFLECTION boosts Language Agents’ performance by 4% to 16.82% over the raw GPT-4 baseline and performs on par with existing state-of-the-art prompt optimization techniques while requiring fewer LLM calls.
LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations

Shashank Kirtania, Priyanshu Gupta, and Arjun Radhakrishna

In Proceedings of the 2nd Workshop on Natural Language Reasoning and Structured Explanations (@ACL 2024), Aug 2024

In this paper, we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. While current approaches leverage formal languages as an intermediate representation for these reasoning problems, they still struggle with generating intermediate formal specifications with great correctness and with refining these representations. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM (Pan et al., 2023). It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and LLM-based techniques on natural language reasoning tasks on three datasets: FOLIO, ProofWriter, and AR-LSAT. Logic-LM++ shows an average improvement of 18.5% over standard prompting, 12.3% over chain-of-thought prompting, and 5% over Logic-LM.
STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack

Naman Gupta^*, Shashank Kirtania^*, Priyanshu Gupta, Krishna Kariya, Sumit Gulwani, Arun Iyer, Suresh Parthasarathy, Arjun Radhakrishna, and 2 more authors

Aug 2024

Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Generation (RAG) uses external knowledge bases (KBs), but these can also suffer from inaccuracies. We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that iteratively refines the KB based on expert feedback using a multi-actor, centralized critic reinforcement learning framework. Each document is assigned to an actor, modeled as a ReACT agent, which performs structured edits based on document-specific targeted instructions from a centralized critic. Experimental results show that STACKFEED significantly improves KB quality and RAG system performance, enhancing accuracy by up to 8% over baselines.

2023

Grace: Language Models Meet Code Edits

Priyanshu Gupta^*, Avishree Khare^*, Yasharth Bajpai, Saikat Chakraborty, Sumit Gulwani, Aditya Kanade, Arjun Radhakrishna, Gustavo Soares, and 1 more author

In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, Aug 2023

Developers spend a significant amount of time editing code for a variety of reasons, such as bug fixing or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing the developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) with the knowledge of relevant prior associated edits, which we call the Grace (Generation conditioned on Associated Code Edits) method. The generative capability of the LLMs helps address the diversity in code changes, and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings respectively. In our experiments with two datasets, Grace boosts the performance of the LLMs significantly, enabling them to generate 29% and 54% more correctly edited code in top-1 suggestions relative to the current state-of-the-art symbolic and neural approaches, respectively.
Augmented Embeddings for Custom Retrievals

Anirudh Khatry, Yasharth Bajpai, Priyanshu Gupta, Sumit Gulwani, and Ashish Tiwari

Aug 2023

Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed homogeneous and relaxed, where queries and corpus elements are both natural language (NL) utterances (homogeneous) and the goal is to pick the most relevant elements from the corpus in the Top-K, where K is large, such as 10, 25, 50 or even 100 (relaxed). Recently, retrieval has been used extensively in preparing prompts for large language models (LLMs) to enable LLMs to perform targeted tasks. These new applications of retrieval are often heterogeneous and strict: the queries and the corpus contain different kinds of entities, such as NL and code, and there is a need for improving retrieval at Top-K for small values of K, such as K = 1, 3, or 5. Current dense retrieval techniques based on pretrained embeddings provide a general-purpose and powerful approach for retrieval, but they are oblivious to task-specific notions of similarity of heterogeneous artifacts. We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding. We empirically validate our approach by showing improvements over the state-of-the-art general-purpose embeddings-based baseline.

2022

Overwatch: learning patterns in code edit sequences

Yuhao Zhang^*, Yasharth Bajpai^*, Priyanshu Gupta^*, Ameya Ketkar^*, Miltiadis Allamanis, Titus Barik, Sumit Gulwani, Arjun Radhakrishna, and 3 more authors

Proc. ACM Program. Lang., Oct 2022

Integrated Development Environments (IDEs) provide tool support to automate many source code editing tasks. Traditionally, IDEs use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer’s next edit, and thus IDEs generate many suggestions at a location. Therefore, IDEs generally do not actively offer suggestions and instead, the developer is usually required to click on a specific icon or menu and then select from a large list of potential suggestions. As a consequence, developers often miss the opportunity to use the tool support because they are not aware it exists or forget to use it. To better understand common patterns in developer behavior and produce better edit recommendations, we can additionally use the temporal context, i.e., the edits that a developer was recently performing. To enable edit recommendations based on temporal context, we present Overwatch, a novel technique for learning edit sequence patterns from traces of developers’ edits performed in an IDE. Our experiments show that Overwatch has 78% precision and that Overwatch not only completed edits when developers missed the opportunity to use the IDE tool support but also predicted new edits that have no tool support in the IDE.

2021

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Mohammadreza Qaraei, Erik Schultheis, Priyanshu Gupta, and Rohit Babbar

In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, Oct 2021

Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web-encyclopedia, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges: (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate for missing labels according to Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into various frameworks for extreme classification. These include (i) linear classifiers, such as DiSMEC, on sparse input data representation, (ii) an attention-based deep architecture, AttentionXML, learnt on dense GloVe embeddings, and (iii) an XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models on the propensity-scored metrics for precision and nDCG.
A Probabilistic Framework for Knowledge Graph Data Augmentation

Jatin Chauhan^*, Priyanshu Gupta^*, and Pasquale Minervini

Oct 2021

We present NNMFAug, a probabilistic framework to perform data augmentation for the task of knowledge graph completion to counter the problem of data scarcity, which can enhance the learning process of neural link predictors. Our method can generate potentially diverse triples with the advantage of being efficient and scalable as well as agnostic to the choice of the link prediction model and dataset used. Experiments and analysis done on popular models and benchmarks show that NNMFAug can bring notable improvements over the baselines.
IITK at SemEval-2021 Task 10: Source-Free Unsupervised Domain Adaptation using Class Prototypes

Harshit Kumar^*, Jinang Shah^*, Nidhi Hegde^*, Priyanshu Gupta^*, Vaibhav Jindal^*, and Ashutosh Modi

In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Aug 2021

Recent progress in deep learning has primarily been fueled by the availability of large amounts of annotated data that is obtained from highly expensive manual annotating processes. To tackle this issue of availability of annotated data, a lot of research has been done on unsupervised domain adaptation that tries to generate systems for unlabelled target domain data, given labeled source domain data. However, the availability of an annotated or labelled source domain dataset can’t always be guaranteed because of data-privacy issues. This is especially the case with medical data, as it may contain sensitive information of the patients. Source-free domain adaptation (SFDA) aims to resolve this issue by using models trained on the source data instead of using the original annotated source data. In this work, we try to build SFDA systems for semantic processing by specifically focusing on the negation detection subtask of the SemEval-2021 Task 10. We propose two approaches, ProtoAUG and Adapt-ProtoAUG, that use the idea of self-entropy to choose reliable and high confidence samples, which are then used for data augmentation and subsequent training of the models. Our methods report an improvement of up to 7% in F1 score over the baseline for the Negation Detection subtask.

2020

A Defocus Based Novel Keyboard Design

Priyanshu Gupta, Tushar Goswamy, Himanshu Kumar, and K. S. Venkatesh

In Human-Computer Interaction. Multimodal and Natural Interaction, Aug 2020

Defocus-based depth estimation has been widely applied for constructing 3D setups from 2D image(s), reconstructing 3D scenes, and image refocusing. Using defocus enables us to infer depth information from a single image using visual cues which can be captured by a monocular camera. In this paper, we propose an application of Depth from Defocus to a novel, portable keyboard design. Our estimation technique is based on the concept that the depth of the finger with respect to our camera and its defocus blur value are correlated, and a map can be obtained to detect the finger position accurately. We have utilised the near-focus region for our design, assuming that the closer an object is to our camera, the greater its defocus blur. The proposed keyboard can be integrated with smartphones, tablets and Personal Computers, and only requires printing on plain paper or projection on a flat surface. The detection approach involves tracking the finger’s position as the user types, measuring its defocus value when a key is pressed, and mapping the measured defocus together with a precalibrated relation between the defocus amount and the keyboard pattern. This is utilised to infer the finger’s depth, which, along with the azimuth position of the stroke, identifies the pressed key. Our minimalistic design only requires a monocular camera, and there is no need for any external hardware. This makes the proposed approach a cost-effective and feasible solution for a portable keyboard.