Research

My research focuses on deep learning for audio processing, particularly building efficient models and using language to improve performance on audio processing tasks.

I am always open to collaborations; please feel free to drop me an email!

Google Scholar

Accepted Papers

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S Sakshi*, Utkarsh Tyagi*, Sonal Kumar*, Ashish Seth*, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh*, Dinesh Manocha
Project Website
ICLR 2025 (Spotlight)
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth*, Ramaneswaran S*, S Sakshi, Sonal Kumar, Dinesh Manocha
EMNLP 2024 (Oral)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Sreyan Ghosh*, Sonal Kumar*, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Project Website / Summary Tweet / Coverage 1 / Coverage 2
EMNLP 2024 (Oral)
ASPIRE: Language-Guided Augmentation for Robust Image Classification
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar*, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
ACL 2024 Findings
Do Vision-Language Models Understand Compound Nouns?
Sonal Kumar*, Sreyan Ghosh*, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
NAACL 2024
DALE: Generative Data Augmentation for Low-Resource Legal NLP
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar, Ramaneswaran S, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
EMNLP 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh*, Ashish Seth*, Sonal Kumar*, Utkarsh Tyagi*, Chandra Kiran Reddy Evuru*, Ramaneswaran S, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Project Website / Slides / Poster
ICLR 2024
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Sreyan Ghosh, Samden Lepcha, Sakshi, Rajiv Ratn Shah, S. Umesh
Code / Data
Interspeech 2022