We introduce LoongX, which integrates multimodal neural signals from the L-Mind dataset to guide image editing through a Cross-Scale State Space (CS3) encoder and Dynamic Gated Fusion (DGF) modules. Experimental results show that editing driven by neural signals alone is comparable to text-driven baselines (CLIP-I: 0.6605 vs. 0.6558), and that combining neural signals with speech instructions surpasses text prompts alone (CLIP-T: 0.2588 vs. 0.2549), demonstrating the method's effectiveness and its potential for intuitive, inclusive human-AI interaction.
LoongX leverages multimodal neural signals from advanced brain-computer interfaces (BCIs) to enable hands-free image editing, driven solely by your thoughts!
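To make the fusion idea above concrete, here is a minimal, hypothetical sketch of a gated fusion module that blends neural-signal features with instruction (e.g., speech) features via a learned, input-dependent gate. The class name, feature dimensions, and gating form are illustrative assumptions for exposition only, not the actual DGF implementation, whose details will appear with the paper.

```python
import torch
import torch.nn as nn

class GatedFusionSketch(nn.Module):
    """Hypothetical gated fusion of two feature streams (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate network maps the concatenated streams to per-feature weights in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, neural_feat: torch.Tensor, instr_feat: torch.Tensor) -> torch.Tensor:
        # Input-dependent gate decides how much of each stream to keep per feature.
        g = self.gate(torch.cat([neural_feat, instr_feat], dim=-1))
        return g * neural_feat + (1.0 - g) * instr_feat


# Example usage (assumed 512-dim features from separate neural-signal and speech encoders):
neural = torch.randn(4, 512)
speech = torch.randn(4, 512)
fused = GatedFusionSketch(512)(neural, speech)
print(fused.shape)  # torch.Size([4, 512])
```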
Coming soon. Please check back after the paper is available on arXiv.