


CVPR 2025: Nashville, TN, USA - Workshops
- IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE 2025

LatinX in Computer Vision Research Workshop
- Abel A. Reyes Angulo, Sidike Paheding:

NExNet Seg: Neuron Expansion Network for Medical Image Segmentation. 1-10 - Esteban Rivera, Surya Prabhakaran, Markus Lienkamp:

HeAL3D: Heuristical-enhanced Active Learning for 3D Object Detection. 11-20 - Lucas M. Ceschini, Gabriel de Oliveira Ramos, Cláudio R. Jung:

Scaling laws in zero-shot gender classification using CLIP. 21-29 - Javier Ródenas Cumplido, Eduardo Aguilar, Petia Radeva:

Slot Attention-based Feature Filtering for Few-Shot Learning. 30-40 - Pooja Kishore Kumar, Willams de Lima Costa, Renato Nogueira Ferraz e Oliveira, Veronica Teichrieb, Estefania Talavera Martínez:

Emotions in LatAm: A new dataset and benchmark for emotion recognition in Latin America. 41-47 - Yasir Ghunaim, Andrés Villa, Gergo Ignacz, Gyorgy Szekely, Motasem Alfarra, Bernard Ghanem:

Towards Faster and More Compact Foundation Models for Molecular Property Prediction. 48-57 - Nicolas Echevarrieta-Catalan, Ana Ribas-Rodriguez, Francisco Cedron, Odelia Schwartz, Vanessa Aguiar-Pulido:

Enhancing Vision Transformer Explainability Using Artificial Astrocytes. 58-64 - Danny Xie-Li, Fabian Fallas-Moya:

PineSORT: A Simple Online Real-time Tracking Framework for Drone Videos in Agriculture. 65-74
8th Multimodal Learning and Applications Workshop
- Merey Ramazanova, Alejandro Pardo, Humam Alwassel, Bernard Ghanem:

Exploring Missing Modality in Multimodal Egocentric Datasets. 75-85 - Binh M. Le, Shaoyuan Xu, Jinmiao Fu, Zhishen Huang, Moyan Li, Yanhui Guo, Hongdong Li, Sameera Ramasinghe, Bryan Wang:

QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Document Understanding. 86-96 - Zhihao Zhao, Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Behzad Dariush:

Pose-Aware Weakly-Supervised Action Segmentation. 97-107 - Ege Özsoy, Felix Holm, Chantal Pellegrini, Tobias Czempiel, Mahdi Saleh, Nassir Navab, Benjamin Busam:

Location-Free Scene Graph Generation. 108-117 - Antonio Luigi Stefani, Niccolò Bisagno, Nicola Conci, Francesco G. B. De Natale:

SplatTouch: Explicit 3D Representation Binding Vision and Touch. 118-127 - Clément Fuchs, Maxime Zanella, Christophe De Vleeschouwer:

Online Gaussian Test-Time Adaptation of Vision-Language Models. 128-137 - Youngrok Jang, Hyesoo Kong, Gyeonghun Kim, Yejin Lee, Stanley Jungkyu Choi, Kyunghoon Bae:

ICT-QA: Question Answering over Multi-modal Contexts including Image, Chart, and Text Modalities. 138-148 - Maxime Zanella, Clément Fuchs, Ismail Ben Ayed, Christophe De Vleeschouwer:

Vocabulary-free few-shot learning for vision-language models. 149-158 - Ans Munir, Faisal Z. Qureshi, Muhammad Haris Khan, Mohsen Ali:

TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification. 159-169 - Kunal Singh, Shreyas Singh, Mukund Khanna:

TRISHUL: Towards Region Identification and Screen Hierarchy Understanding for Large VLM based GUI Agents. 170-179 - Yuanhao Zou, Zhaozheng Yin:

MVCM: Enhancing Multi-View and Cross-Modality Alignment for Medical Visual Question Answering and Medical Image-Text Retrieval. 180-190 - Kun Li, George Vosselman, Michael Ying Yang:

Multimodal Rationales for Explainable Visual Question Answering. 191-201 - Maria Tzelepi, Vasileios Mezaris:

Improving multimodal hateful meme detection exploiting LMM-generated knowledge. 202-211 - Bouthaina Slika, Fadi Dornaika, Fares Bougourzi, Karim Hammoudi:

Transformer-Based Lung Infection Severity Prediction with Cross Attention and Conditional TransMix Augmentation. 212-221 - Sakib Ahammed, Xia Cui, Wenqi Lu, Moi Hoon Yap:

Skin Lesion Classification Using Dermoscopic Images and Clinical Metadata: Insights from Multimodal Models. 222-230 - Yue Ma, Huantao Ren, Boyu Wang, Jingang Jin, Senem Velipasalar, Qinru Qiu:

LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool. 231-240 - Madhukar Reddy Vongala, Saurabh Srivastava, Jana Kosecka:

Compositional Image-Text Matching and Retrieval by Grounding Entities. 241-250
4th edition of Computer Vision for Metaverse Workshop
- Shiyong Liu, Zhihao Li, Xiao Tang, Jianzhuang Liu:

Direction-Aware Hybrid Representation Learning for 3D Hand Pose and Shape Estimation. 251-260 - Netanel Tamir, Shir Amir, Ranel Itzhaky, Noam Atia, Shobhita Sundaram, Stephanie Fu, Ron Sokolovsky, Phillip Isola, Tali Dekel, Richard Zhang, Miriam Farber:

What Makes for a Good Stereoscopic Image? 261-272 - Seunghyeon Seo, Yeonjin Chang, Jayeon Yoo, Seungwoo Lee, Hojun Lee, Nojun Kwak:

ARC-NeRF: Area Ray Casting for Broader Unseen View Coverage in Few-shot Object Rendering. 273-283 - Jingyu Shi, Achleshwar Luthra, Jiazhi Li, Xiang Gao, Xiyun Song, Zongfang Lin, Xianfeng David Gu, Heather Yu:

OccludeNeRF: Geometry-aware 3D Scene Inpainting with Collaborative Score Distillation in NeRF. 284-294 - Zhexiao Xiong, Zhang Chen, Zhong Li, Yi Xu, Nathan Jacobs:

PanoDreamer: Consistent Text to 360-Degree Scene Generation. 295-304 - Kaichen Zhou, Lanqing Hong, Xinhai Chang, Yingji Zhong, Enze Xie, Hao Dong, Zhihao Li, Yongxin Yang, Zhenguo Li, Wei Zhang:

SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting. 305-316 - Sam Bahrami, Dylan Campbell:

PluckeRF: A Line-based 3D Representation for Few-view Reconstruction. 317-326 - Yu Guo, Zhiqiang Lao, Xiyun Song, Yubin Zhou, Zongfang Lin, Heather Yu:

ePBR: Extended PBR Materials in Image Synthesis. 327-336 - Yaseen, Sonain Jamil:

FaceGest: A Comprehensive Facial Gesture Dataset for Human-Computer Interaction. 337-347 - Jakub Zadrozny, Hakan Bilen:

HumMorph: Generalized Dynamic Human Neural Fields from Few Views. 348-357 - Letian Zhang, Ming Li, Chen Chen, Jie Xu:

IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment. 358-368 - Naoko Sawada, Pedro Miraldo, Suhas Lohit, Tim K. Marks, Moitreya Chatterjee:

FreBIS: Frequency-Based Stratification for Neural Implicit Surface Representations. 369-379 - Wanzhou Liu, Zhexiao Xiong, Xinyu Li, Nathan Jacobs:

DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal. 380-390
2nd MetaFood Workshop
- GaYeon Koh, Hyun-Jic Oh, Jeonghyun Noh, Won-Ki Jeong:

Synthetic Data Augmentation using Pre-trained Diffusion Models for Long-tailed Food Image Classification. 391-400 - Long Li, Fengqing Zhu, Heather A. Eicher-Miller, J. Graham Thomas, Yuning Huang, Edward Sazonov:

Extra-Lightweight AI-Based Privacy Preserving Framework for Egocentric Wearable Cameras. 401-410 - Riddhi Jain, Manasi Patwardhan, Aayush Mishra, Parijat Deshpande, Beena Rai:

Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction. 411-419 - Julio J. Valdés, Stephie Liu, Shawn Yang, Yuhao Chen, Alexander Wong, Pengcheng Xi:

Food Degradation Analysis Using Multimodal Fuzzy Clustering. 420-429 - Sergio Romero-Tapiador, Ruben Tolosana, Blanca Lacruz-Pleguezuelos, Laura Judith Marcos-Zambrano, Guadalupe X. Bazán, Isabel Espinosa-Salinas, Julian Fierrez, Javier Ortega-Garcia, Enrique Carrillo de Santa Pau, Aythami Morales:

Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition. 430-439 - Javier Ródenas Cumplido, Eduardo Aguilar, Petia Radeva:

Stochastic-based Patch Filtering for Few-Shot Learning. 440-449 - Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva:

VolTex: Food Volume Estimation using Text-Guided Segmentation and Neural Surface Reconstruction. 450-457 - Krish Shah, Siddharth Viswanath, Pengcheng Xi, Alexander Wong, Yuhao Chen:

FoodVideoQA: A Novel Baseline Framework for Dietary Monitoring. 458-466 - Joshua Li, Fernando Jose Pena Cantu, Emily Yu, Alexander Wong, Yuchen Cui, Yuhao Chen:

SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos. 467-473 - Ocean Monjur, Md. Toukir Ahmed, Md Wadud Ahmed, Mohammed Kamruzzaman:

Agro-Net: A Convolution-Attention Fusion based hyperspectral model for agro-food quality assessment. 474-481 - Pitikorn Khlaisamniang, Kun Kerdthaisong, Supasate Vorathammathorn, Nutchanon Yongsatianchot, Hirunkul Phimsiri, Amrest Chinkamol, Teermade Thitseesaeng, Kanyakorn Veerakanjana, Kaisorn Kachai, Piyalitt Ittichaiwong, Tossaporn Saengja:

Decomposing Food Images for Better Nutrition Analysis: A Nutritionist-Inspired Two-Step Multimodal LLM Approach. 482-491
Expanding Horizons in AI Benchmarking: Multimodal Approaches
- Andrés Villa, Juan León Alcázar, Alvaro Soto, Bernard Ghanem:

Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models. 492-502 - Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen:

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model. 503-512 - Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S.-H. Gary Chan, Hongyang Zhang:

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models. 513-524 - Forouzan Fallah, Maitreya Patel, Agneet Chatterjee, Vlad I. Morariu, Chitta Baral, Yezhou Yang:

TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark. 525-534 - Giselle Zeno, Nour Jedidi, Steven Gomez:

Choosing 'Right' from Wrong: A Closer Look at Selection Bias in Spatial Multiple-Choice Questions in Large Multimodal Models. 535-544 - Atit Pokharel, Ratun Rahman, Thomas Morris, Dinh C. Nguyen:

Quantum Federated Learning for Multimodal Data: A Modality-Agnostic Approach. 545-554 - Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan:

Revisiting Multi-Modal LLM Evaluation. 555-564 - Yoonshik Kim, Jaeyoon Jung:

KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language. 575-585
The 2nd Workshop on Computer Vision for Videogames
- Yu Wen, Xingke Yang, Aamir Bader Shah, Ruizhi Cao, Miao Pan, Chenhao Xie, Xin Fu:

MoF-Image: Generating Mixture-of-Features Video Game Image Dataset via GPU Rendering Simulation. 586-593 - Alex Zook, Fan-Yun Sun, Josef B. Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay:

GRS: Generating Robotic Simulation Tasks from Real-World Images. 594-603 - Ziyang Zhang, Edgar Simo-Serra:

G-Buffer Supported Neural Screen-space Refraction Baking for Real-Time Global Illumination. 604-611 - Josef B. Spjut:

A Generative AI Game Jam Case Study from October 2024. 612-618 - Awal Ahmed Fime, Md. Zarif Hossain, Saika Zaman, Abdur Rahman Bin Shahid, Ahmed Imteaj:

Towards Trustworthy Autonomous Vehicles with Vision-Language Models Under Targeted and Untargeted Adversarial Attacks. 619-628 - Rahul Nair, Bhanu Tokas, Gabriel Tseng, Esther Rolf, Hannah Kerner:

Classification Drives Geographic Bias in Street Scene Segmentation. 629-638 - Kuan Yew Leong, Jaeseung Han:

Bridging Detection and Re-identification: Evaluating Trustworthiness and Error Propagation in Face Recognition Pipelines. 639-648 - Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Alan L. Yuille:

EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference. 649-659 - Bernd Prach, Christoph H. Lampert:

Intriguing Properties of Robust Classification. 660-669
The 6th Workshop on Fair, Data-Efficient, and Trusted Computer Vision
- Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh S. Rawat:

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning. 670-681 - Shreyank N. Gowda, Boyan Gao, Xiao Gu, Xiao-Bo Jin:

Is Temporal Prompting All We Need For Limited Labeled Action Recognition? 682-692 - Mohsin Ali, Haider Raza, John Q. Gan, Muhammad Haris:

Optimising Vision Transformer Performance on Limited Datasets: A Multi-Gradient Approach. 693-702 - Xinlei Liu, Tao Hu, Peng Yi, Qingtao Pan, Hailong Ma, Yiming Jiang, Baolin Li:

Defending Against Transfer-Based Adversarial Attacks Using SVD-Driven Feature Evolution. 703-711 - Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna, Soumyendu Sarkar:

Coordinated Robustness Evaluation Framework for Vision-Language Models. 712-720 - Mohana Singh, Vivek B. S., Jayavardhana Gubbi, R. Venkatesh Babu:

ProtoPatchNet: An Interpretable Patch-Based Prototypical Network. 721-728 - Nazia Aslam, Kamal Nasrollahi:

Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization. 729-738
10th New Trends in Image Restoration and Enhancement Workshop and Challenges
- Ruthy Katz, Adi Teitel, Moran Mordechay, Adi Falik, Eli Bery, Maya Mayberg:

Rethinking Compressive Sensing: A Compression Framework for Video Super-Resolution. 739-748 - Bo-Kai Ruan, Hong-Han Shuai:

MAD: Makeup All-in-One with Cross-Domain Diffusion Model. 749-758 - Josh Myers-Dean, Kangning Liu, Brian L. Price, Yifei Fan, Jason Kuen, Danna Gurari:

conSAMme: Achieving Consistent Segmentations with SAM. 759-768 - Arshita Gupta, Zhe Zhu, Tien Bau:

STAPLE: Siamese Transformer Assisted Pseudo Label Ensembling for Unsupervised Domain Adaptation in No-Reference IQA. 769-778 - Hanzhou Liu, Chengkai Liu, Jiacong Xu, Peng Jiang, Mi Lu:

XYScanNet: A State Space Model for Single Image Deblurring. 779-789 - Juyong Park, Jihun Song, Gyewan Kim, Yoonsuk Hyun:

Text-Guided Patch Scoring and Local Distortion Guidance for Image Quality Assessment. 790-799 - Fadeel Sher Khan, Joshua Ebenezer, Hamid R. Sheikh, Seok-Jun Lee:

MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling. 800-809 - Andrew Yanzhe Ke, Lei Luo, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Alexandre Chapiro, Rafal Mantiuk:

Training Neural Networks on RAW and HDR Images for Restoration Tasks. 810-819 - Mian Muhammad Naeem Abid, Nancy Mehta, Zongwei Wu, Radu Timofte:

DataFormer: Differential Additive Transformer for Lightweight Semantic Segmentation. 820-831 - Nancy Mehta, Akshay Dudhane, Subrahmanyam Murala, Radu Timofte:

KernFusNet: Implicit Kernel Modulation and Fusion for Blind Super-resolution. 832-842 - Jaskaran Singh Walia, Shravan Venkatraman, Pavithra L. K.:

FUSION: Frequency-guided Underwater Spatial Image recOnstructioN. 843-852 - Donghyun Kim, Seil Kang, Seong Jae Hwang:

FALCON: Fast Image Haze Removal Leveraging Continuous Density Mask. 853-863 - Kento Kawai, Takeru Oba, Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita:

Efficient Burst Super-Resolution with One-step Diffusion. 864-873 - Mykola Lavreniuk, Alla Lavreniuk:

SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation. 874-884 - Amit Monga, Hemkant Nehete, Partha Kaushik, Tharun Kumar Reddy Bollu, Balasubramanian Raman, Gaurav Sharma:

FCTFANet: A Fused CNN-Transformer Feature Aggregator Network for Image Restoration. 885-894 - Jonas Dornbusch, Emanuel Pfarr, Florin-Alexandru Vasluianu, Frank Werner, Radu Timofte:

A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising. 895-904 - David Serrano-Lozano, Francisco A. Molina-Bakhos, Danna Xue, Yixiong Yang, Maria Pilligua, Ramon Baldrich, María Vanrell, Javier Vazquez-Corral:

PromptNorm: Image Geometry Guides Ambient Light Normalization. 905-916 - Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li:

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report. 917-966 - Qing Wang, Yang Wang, Hongyu An, Yi Liu, Liou Zhang, Shijie Zhao:

Expanded SPAN for Efficient Super-Resolution. 967-976 - Zhichao Zhang, Xinyue Li, Wei Sun, Zicheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai:

Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in Text-to-Image Alignment. 977-986 - Pierluigi Zama Ramirez, Fabio Tosi, Luigi Di Stefano, Radu Timofte, Alex Costanzino, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Zhe Zhang, Yang Yang, Wu Chen, Anlong Ming, Mingshuai Zhao, Mengying Yu, Shida Gao, Xiangfeng Wang, Feng Xue, Jun Shi, Yong Yang, Yong A, Yixiang Jin, Dingzhe Li, Aryan Shukla, Liam Frija-Altarac, Matthew Toews, Hui Geng, Tianjiao Wan, Zijian Gao, Qisheng Xu, Kele Xu, Zijian Zang, Jameer Babu Pinjari, Kuldeep Purohit, Mykola Lavreniuk, Jing Cao, Shenyi Li, Kui Jiang, Junjun Jiang, Yong Huang:

NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces. 987-1001 - Sangmin Lee, Eunpil Park, Angel Canelo, Hyunhee Park, Youngjo Kim, Hyung-Ju Chun, Xin Jin, Chongyi Li, Chun-Le Guo, Radu Timofte, Qi Wu, Tianheng Qiu, Yuchun Dong, Shenglin Ding, Guanghua Pan, Weiyu Zhou, Tao Hu, Yixu Feng, Duwei Dai, Yu Cao, Peng Wu, Wei Dong, Yanning Zhang, Qingsen Yan, Simon J. Larsen, Senyan Xu, Xingbo Wang, Ruixuan Jiang, Xin Lu, Marcos V. Conde, Javier Abad-Hernández, Álvaro García-Lara, Daniel Feijoo, Álvaro García, Zeyu Xiao, Zhuoyuan Li:

NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results. 1002-1017 - Weiyu Zhou, Tao Hu, Yixu Feng, Duwei Dai, Yu Cao, Peng Wu, Wei Dong, Yanning Zhang, Qingsen Yan:

Flow-Guided Deformable Alignment with Channel-wise Self-Attention Reconstruct for Efficient Burst HDR Restoration. 1018-1027 - Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Yanning Zhang, Qingsen Yan:

FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement. 1028-1037 - Tianheng Qiu, Qi Wu, Yuchun Dong, Shenglin Ding, Xuan Huang, Hu Wei, Guanghua Pan:

Recursive Multi-Exposure Alignment with Spatiotemporal Decoupling for Efficient Burst HDR and Restoration. 1038-1047 - Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, Xiugang Dong, Hong Gao, Xiangsheng Zhou, Jiancheng Pan, Yanxing Liu, Xiao He, Jiahao Li, Yuze Sun, Xiaomeng Huang, Zhenyu Zhang, Ran Ma, Yuhan Liu, Zijian Zhuang, Shuai Yi, Yixiong Zou, Lingyi Hong, Mingxi Chen, Runze Li, Xingdong Sheng, Wenqiang Zhang, Weisen Chen, Yongxin Yan, Xinguo Chen, Yuanjie Shao, Zhengrong Zuo, Nong Sang, Hao Wu, Haoran Sun, Shuming Hu, Yan Zhang, Zhiguang Shi, Yu Zhang, Chao Chen, Tao Wang, Da Feng, Linhai Zhuo, Ziming Lin, Yali Huang, Jie Me, Yiming Yang, Mi Guo, Mingyuan Jiu, Mingliang Xu, Maomao Xiong, Qunshu Zhang, Xinyu Cao, Yuqing Yang, Dianmo Sheng, Xuanpu Zhao, Zhiyu Li, Xuyang Ding, Wenqian Li:

NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results. 1048-1069 - Xin Lu, Jiarong Yang, Yuanfei Bao, Zihao Fan, Anya Hu, Kunyu Wang, Jie Xiao, Xi Wang, Hongjian Liu, Xueyang Fu, Zheng-Jun Zha:

Advancing Ambient Lighting Normalization via Diffusion Shadow Generation. 1070-1080 - Xin Lu, Yuanfei Bao, Jiarong Yang, Anya Hu, Jie Xiao, Kunyu Wang, Dong Li, Senyan Xu, Kean Liu, Xueyang Fu, Zheng-Jun Zha:

EvenFormer: Dynamic Even Transformer for Real-World Image Restoration. 1081-1091 - Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, Zhengzhong Tu, Yufan Liu, Xiangguang Chen, Zuowei Cao, Minhao Tang, Shan Liu, Kexin Zhang, Jingfen Xie, Yan Wang, Kai Chen, Shijie Zhao, Yunchen Zhang, Xiangkai Xu, Hong Gao, Ji Shi, Yiming Bao, Xiugang Dong, Xiangsheng Zhou, Yaofeng Tu, Ying Liang, Yiwen Wang, Xinning Chai, Yuxuan Zhang, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song, Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, Zicheng Zhang, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Zhi Jin, Jiawei Wu, Wei Wang, Wenjian Zhang, Yuhai Lan, Gaoxiong Yi, Hengyuan Na, Wang Luo, Di Wu, Mingyin Bai, Jiawang Du, Zilong Lu, Zhenyu Jiang, Hui Zeng, Ziguan Cui, Zongliang Gan, Guijin Tang, Xinglin Xie, Kehuan Song, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Puhua Chen, Ha Thu Nguyen, Katrien De Moor, Seyed Ali Amirshahi, Mohamed-Chaker Larabi, Qi Tang, Linfeng He, Zhiyong Gao, Zixuan Gao, Guohua Zhang, Zhiye Huang, Yi Deng, Qingmiao Jiang, Lu Chen, Yi Yang, Xi Liao, Nourine Mohammed Nadir, Yuxuan Jiang, Qiang Zhu, Siyue Teng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull, Meiqin Liu, Chao Yao, Yao Zhao:

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results. 1092-1103 - Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, Trong-Hieu Nguyen-Mau, Minh-Hoang Le, Minh-Khoa Le-Phan, Duy-Nam Ly, Hai-Dang Nguyen, Minh-Triet Tran, Yukang Lin, Yan Hong, Chuanbiao Song, Siyuan Li, Jun Lan, Zhichao Zhang, Xinyue Li, Wei Sun, Zicheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai, Zitong Xu, Huiyu Duan, Jiarui Wang, Guangji Ma, Liu Yang, Lu Liu, Qiang Hu, Xiongkuo Min, Zichuan Wang, Zhenchen Tang, Bo Peng, Jing Dong, Fengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Xin Li, Minhao Lin, Haofeng Chen, Xuanxuan He, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Bo-Cheng Qiu, Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Bo Yu, Zehao Wang, Da Mu, Mingxiu Chen, Junkang Fang, Huamei Sun, Wending Zhao, Zhiyu Wang, Wang Liu, Weikang Yu, Puhong Duan, Bin Sun, Xudong Kang, Shutao Li, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Jiarong He, Zhishan Qiao, Yongqing Huang, Zewen Chen, Zhe Pang, Juan Wang, Jian Guo, Zhizhuo Shao, Ziyu Feng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Zeming Liu, Qingsong Xie, Ruichen Wang, Zhihao Li, Yuqi Liang, Jianqi Bi, Jun Luo, Junfeng Yang, Can Li, Jing Fu, Hongwei Xu, Mingrui Long, Lulin Tang:

NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment. 1104-1125 - Xin Li, Xijun Wang, Bingchen Li, Kun Yuan, Yizhen Shao, Suhang Yao, Ming Sun, Chao Zhou, Radu Timofte, Zhibo Chen:

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: KwaiSR Dataset and Study. 1126-1136 - Haosong Liu, Xiancheng Zhu, Huanqiang Zeng, Jianqing Zhu, Yifan Shi, Jing Chen, Junhui Hou:

LFTramba: Comprehensive Information Learning for Light Field Image Super-Resolution via A Hybrid Transformer-Mamba Framework. 1137-1147 - Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang, Jianyang Yu, Kele Xu, Zisheng Xu, Yong Dou, Watchara Ruangsang, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Jiajie Lu, Xiang Yu, Minmin Yi, Yuanjia Chen, Liwen Zhang, Zijie Jin, Tianyu Zhang, Xin Lu, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Cheng Li, Lian Liu, Wei Song, Heng Sun, Yubo Wang, Jinghua Wang, Guanlan Hong:

NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution. 1148-1171 - Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, Qirui Yang, Fangpu Zhang, Yunlong Lin, Sixiang Chen, Guoxi Huang, Ruirui Lin, Yan Zhang, Jingyu Yang, Huanjing Yue, Jiyuan Chen, Qiaosi Yi, Hongjun Wang, Chenxi Xie, Shuai Li, Yuhui Wu, Kaiyi Ma, Jiakui Hu, Juncheng Li, Liwen Pan, Guangwei Gao, Wenjie Li, Zhenyu Jin, Heng Guo, Zhanyu Ma, Yubo Wang, Jinghua Wang, Wangzhi Xing, Anjusree Karnavar, Diqi Chen, Mohammad Aminul Islam, Hao Yang, Ruikun Zhang, Liyuan Pan, Qianhao Luo, Xin Cao, Han Zhou, Yan Min, Wei Dong, Jun Chen, Taoyi Wu, Weijia Dou, Yu Wang, Shengjie Zhao, Yongcheng Huang, Xingyu Han, Anyan Huang, Hongtao Wu, Hong Wang, Yefeng Zheng, Abhijeet Kumar, Aman Kumar, Marcos V. Conde, Paula Garrido, Daniel Feijoo, Juan C. Benito, Guanglu Dong, Xin Lin, Siyuan Liu, Tianheng Zheng, Jiayu Zhong, Shouyi Wang, Xiangtai Li, Lanqing Guo, Lu Qi, Chao Ren, Shuaibo Wang, Shilong Zhang, Wanyu Zhou, Yunze Wu, Qinzhong Tan, Jieyuan Pei, Zhuoxuan Li, Jiayu Wang, Haoyu Bian, Haoran Sun, Subhajit Paul, Ni Tang, Junhao Huang, Zihan Cheng, Hongyun Zhu, Yuehan Wu, Kaixin Deng, Huang Ouyang, Tianxin Xiao, Fan Yang, Zhizun Luo, Zeyu Xiao, Zhuoyuan Li, Pham Hoang Le Nguyen, Dinh Thien An, Luu Thanh Son, Kiet Van Nguyen, Ronghua Xu, Xianmin Tian, Weijian Zhou, Jiacheng Zhang, Yuqian Chen, Yihang Duan, Yujie Wu, Suresh Raikwar, Arsh Garg, Kritika Kritika, Jianhua Zheng, Xiaoshan Ma, Ruolin Zhao, Yongyu Yang, Yongsheng Liang, Guiming Huang, Qiang Li, Hongbin Zhang, Xiangyu Zheng, A. N. Rajagopalan:

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results. 1172-1183 - Varun Jain, Zongwei Wu, Quan Zou, Louis Florentin, Henrik Turbell, Sandeep Siddhartha, Radu Timofte, Qifan Gao, Linyan Jiang, Qing Luo, Jie Song, Yaqing Li, Summer Luo, Mae Chen, Stefan Liu, Danie Song, Huimin Zeng, Qi Chen, Ajeet Kumar Verma, Shweta Tripathi, Vinit Jakhetiya, Badri N. Subhdhi, Sunil Jaiswal:

NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results. 1184-1194 - Kai Jin, Zeqiang Wei, Angulia Yang, Di Wu, Mingzhi Gao, Xiuzhuang Zhou:

LFTransMamba: A Hybrid Mamba-Transformer Model for Light Field Image Super-Resolution. 1195-1204 - Xiaoning Liu, Zongwei Wu, Florin-Alexandru Vasluianu, Hailong Yan, Bin Ren, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Yanning Zhang, Qingsen Yan, Han Zhou, Wei Dong, Yan Min, Mohab Kishawy, Jun Chen, Pengpeng Yu, Anjin Park, Seung-Soo Lee, Young-Joon Park, Zixiao Hu, Junyv Liu, Huilin Zhang, Jun Zhang, Fei Wan, Bingxin Xu, Hongzhe Liu, Cheng Xu, Weiguo Pan, Songyin Dai, Xunpeng Yi, Qinglong Yan, Yibing Zhang, Jiayi Ma, Changhui Hu, Kerui Hu, Donghang Jing, Tiesheng Chen, Zhi Jin, Hongjun Wu, Biao Huang, Haitao Ling, Jiahao Wu, Dandan Zhan, G. Gyaneshwar Rao, Vijayalaxmi Ashok Aralikatti, Nikhil Akalwadi, Ramesh Ashok Tabib, Uma Mudenagudi, Ruirui Lin, Guoxi Huang, Nantheera Anantrasirichai, Qirui Yang, Alexandru Brateanu, Ciprian Orhei, Cosmin Ancuti, Daniel Feijoo, Juan C. Benito, Álvaro García, Marcos V. Conde, Yang Qin, Raul Balmez, Anas M. Ali, Bilel Benjdira, Wadii Boulila, Tianyi Mao, Huan Zheng, Yanyan Wei, Shengeng Tang, Dan Guo, Zhao Zhang, Sabari Nathan, K. Uma, A. Sasithradevi, B. Sathya Bama, S. Mohamed Mansoor Roomi, Ao Li, Xiangtao Zhang, Zhe Liu, Yijie Tang, Jialong Tang, Zhicheng Fu, Gong Chen, Joe Nasti, John Nicholson, Zeyu Xiao, Zhuoyuan Li, Ashutosh Kulkarni, Prashant W. Patil, Santosh Kumar Vipparthi, Subrahmanyam Murala, Duan Liu, Weile Li, Hangyuan Lu, Rixian Liu, Tengfeng Wang, Jinxing Liang, Chenxin Yu:

NTIRE 2025 Challenge on Low Light Image Enhancement: Methods and Results. 1205-1215 - Yuanfei Bao, Xin Lu, Xingbo Wang, Jiarong Yang, Anya Hu, Kunyu Wang, Jie Xiao, Dong Li, Xueyang Fu, Zheng-Jun Zha:

Frequency-Prior Enhanced Ambient Lighting Normalization via Visual Perceptual Refinement. 1216-1226 - Yingqian Wang, Zhengyu Liang, Fengyuan Zhang, Lvli Tian, Longguang Wang, Juncheng Li, Jungang Yang, Radu Timofte, Yulan Guo, Kai Jin, Zeqiang Wei, Angulia Yang, Di Wu, Mingzhi Gao, Xiuzhuang Zhou, Yue Yan, Yuaho Wang, Shuang Chen, Zeping Tian, Yizhi Hu, Yao Lu, Haosong Liu, Xiancheng Zhu, Huanqiang Zeng, Jianqing Zhu, Yifan Shi, Junhui Hou, Mingyang Yu, Zhijian Wu, Dingjiang Huang, Wenli Zheng, Zekai Xu, Huiyuan Fu, Heng Zhang, Zhijuan Huang, Hongyuan Yu, Zeke Zexi Hu, Haodong Chen, Vera Yuk Ying Chung, Xiaoming Chen, Zean Chen, Yeyao Chen, Gangyi Jiang, Haiyong Xu, Ting Luo, Guanglong Liao, Danhao Zhang, Siyu Zhang, Wendong Mao, Zhongfeng Wang, Sunita Arya, Abhishek Kumar Sinha, S. Manthira Moorthi, Hao Zhang, Hao Sheng, Da Yang, Zhenglong Cui, Shuai Wang, Haotian Zhang, Xingzheng Wang, Yuanbo Huang, Jiahao Lin, Yuhang Lin, Ahmed Salem, Ebrahem Elkady, Hatem Ibrahem, Jae-Won Suh, Hyun-Soo Kang, Changguang Wu, Hao Hou, Pengpeng Li, Peng Huang, Jiangxin Dong, Jinhui Tang:

NTIRE 2025 Challenge on Light Field Image Super-Resolution: Methods and Results. 1227-1246 - Ajeet Kumar Verma, Shweta Tripathi, Vinit Jakhetiya, Badri N. Subudhi, Sunil Jaiswal:

Q-CIDNet: Perceptual Quality aware Color and Intensity Decoupling Network for Video Quality Enhancement. 1247-1253 - Marcos V. Conde, Radu Timofte, Radu Berdan, Beril Besbinar, Daisuke Iso:

RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report. 1254-1268 - Jie Liang, Radu Timofte, Qiaosi Yi, Zhengqiang Zhang, Shuaizheng Liu, Lingchen Sun, Rongyuan Wu, Xindong Zhang, Hui Zeng, Lei Zhang, Tianyu Hao, Lin Wang, Zhe Xiao, Pengzhou Ji, Shupeng Zhong, Xiangming Wang, Jiaqi Yan, Sishun Pan, Ce Wang, Yibin Huang, Zhang Sheng Wang, Haobo Liang, Zhenghao Pan, Jinjian Wu, Yushen Zuo, Yuanbo Zhou:

NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge. 1269-1278 - Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao:

TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs. 1279-1288 - Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Zongwei Wu, Radu Timofte, Yuanfei Bao, Xingbo Wang, Xin Lu, Jiarong Yang, Anya Hu, Kunyu Wang, Jie Xiao, Dong Li, Xueyang Fu, Zheng-Jun Zha, Zihao Fan, Xi Wang, Yurui Zhu, Kean Liu, Senyan Xu, Hongjian Liu, Yupeng Xiao, David Serrano-Lozano, Francisco A. Molina-Bakhos, Danna Xue, Yixiong Yang, Maria Pilligua, Ramon Baldrich, María Vanrell, Javier Vazquez-Corral, Xuan Sun, Zijie Lou, Ting Liu, Kuldeep Purohit, Jameer Babu Pinjari, Yilin Zhang, Huan Zheng, Yanyan Wei, Suiyi Zhao, Shengeng Tang, Zhao Zhang, Yushen Zuo, Zongqi He, Zhe Xiao, Cuixin Yang, Rongkang Dong, Jun Xiao, Kin-Man Lam, Nikhil Akalwadi, Vijayalaxmi Ashok Aralikatti, Dheeraj Damodhar Hegde, Ramesh Ashok Tabib, Uma Mudenagudi, Anas M. Ali, Bilel Benjdira, Wadii Boulila:

NTIRE 2025 Ambient Lighting Normalization Challenge Report. 1289-1300 - Kangning Yang, Jie Cai, Ling Ouyang, Florin-Alexandru Vasluianu, Radu Timofte, Jiaming Ding, Huiming Sun, Lan Fu, Jinlong Li, Chiu Man Ho, Zibo Meng, Mingjia Li, Hainuo Wang, Qiming Hu, Jiarui Wang, Hao Zhao, Jin Hu, Xiaojie Guo, Mengru Yang, Jing He, Yiqing Wang, Zhiyang Chen, Hao Fang, Wei Zhang, Runmin Cong, Dheeraj Damodhar Hegde, Jatin Kalal, Nikhil Akalwadi, Ramesh Ashok Tabib, Uma Mudenagudi, Yu-Fan Lin, Chia-Ming Lee, Chih-Chung Hsu, Mengxin Zhang, Sabari Nathan, K. Uma, A. Sasithradevi, B. Sathya Bama, S. Mohamed Mansoor Roomi, Bilel Benjdira, Anas M. Ali, Wadii Boulila, Wei Dong, Yunzhe Li, Ali Hussein, Han Zhou, Jun Chen, Zeyu Xiao, Zhuoyuan Li:

NTIRE 2025 Challenge on Single Image Reflection Removal in the Wild: Datasets, Methods and Results. 1301-1311 - Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailan Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, Chih-Chung Hsu, Xingbo Wang, Dong Li, Yuxu Chen, Bin Chen, Yuanbo Zhou, Yuanbin Chen, Hongwei Wang, Jiannan Lin, Qinquan Gao, Tong Tong, Zhao Zhang, Yanyan Wei, Wei Dong, Han Zhou, Seyed Amirreza Mousavi, Jun Chen, Haobo Liang, Jiajie Jing, Junyu Li, Yan Yang, Seoyeon Lee, Chaewon Kim, Ziyu Feng, Shidi Chen, Bowen Luan, Zewen Chen, Vijayalaxmi Ashok Aralikatti, G. Gyaneshwar Rao, Nikhil Akalwadi, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudenagudi, Anas M. Ali, Bilel Benjdira, Wadii Boulila, Alexandru Brateanu, Cosmin Ancuti, Tanmay Chaturvedi, Manish Kumar, Anmol Srivastav, Daksh Trivedi, Shashwat Thakur, Kishor P. Upla, Zeyu Xiao, Zhuoyuan Li, Boda Zhou, Shashank Shekhar, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Suiyi Zhao, Bo Wang, Yan Luo, Mingshen Wang, Yilin Zhang:

NTIRE 2025 Image Shadow Removal Challenge Report. 1312-1323 - Lei Sun, Andrea Alfarano, Peiqi Duan, Shaolin Su, Kaiwei Wang, Boxin Shi, Radu Timofte, Danda Pani Paudel, Luc Van Gool, Qinglin Liu, Wei Yu, Xiaoqian Lv, Lu Yang, Shuigen Wang, Shengping Zhang, Xiangyang Ji, Long Bao, Yuqiang Yang, Jinao Song, Ziyi Wang, Shuang Wen, Heng Sun, Kean Liu, Mingchen Zhong, Senyan Xu, Zhijing Sun, Jiaying Zhu, Chengjie Ge, Xingbo Wang, Yidi Liu, Xin Lu, Xueyang Fu, Zheng-Jun Zha, Dawei Fan, Dafeng Zhang, Yong Yang, Siru Zhang, Qinghua Yang, Hao Kang, Huiyuan Fu, Heng Zhang, Hongyuan Yu, Zhijuan Huang, Shouyan Wei, Feng Li, Runmin Cong, Weiqi Luo, Mingyun Lin, Chenxu Jiang, Hongyi Liu, Lei Yu, Weilun Li, Jiajun Zhai, Tingting Lin, Shuang Ma, Sai Zhou, Zhanwen Liu, Yang Wang, Eiffel Chong, Nuwan Bandara, Thivya Kandappu, Archan Misra, Yihang Chen, Zhan Li, Weijun Yuan, Wenzhuo Wang, Boyang Yao, Zhanglu Chen, Yijing Sun, Tianjiao Wan, Zijian Gao, Qisheng Xu, Kele Xu, Yukun Zhang, Yu He, Xiaoyan Xie, Tao Fu, Yashu Guatamkumar Patel, Vihar Ramesh Jain, Divesh Basina, Rishik Ashili, Manish Kumar Manjhi, Sourav Kumar, Prinon Benny, Himanshu Ghunawat, B. Sri Sairam Gautam, Anett Varghese, Abhishek Yadav:

NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results. 1324-1341 - Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li:

The Tenth NTIRE 2025 Image Denoising Challenge Report. 1342-1369 - Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinlong Li, Mengfei Han:

STRRNet: Semantics-guided Two-stage Raindrop Removal Network. 1370-1378 - Xinli Yue, Jianhui Sun, Junda Lu, Liangchao Yao, Fan Xia, Tianyi Wang, Fengyun Rao, Jing Lyu, Yuetang Deng:

Instruction-augmented Multimodal Alignment for Image-Text and Element Matching. 1379-1388 - Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, Baojun Li, Jiamian Huang, Dan Luo, Tao Liu, Weixia Zhang, Bingkun Zheng, Junlin Chen, Ruikai Zhou, Meiya Chen, Yu Wang, Hao Jiang, Xiantao Li, Yuxiang Jiang, Jun Tang, Yimeng Zhao, Bo Hu, Zelu Qi, Chaoyang Zhang, Fei Zhao, Ping Shi, Lingzhi Fu, Heng Cong, Shuai He, Rongyu Zhang, Jiarong He, Zongyao Hu, Wei Luo, Zihao Yu, Fengbin Guan, Yiting Lu, Xin Li, Zhibo Chen, Mengjing Su, Yi Wang, Tuo Chen, Chunxiao Li, Shuaiyu Zhao, Jiaxin Wen, Chuyi Lin, Sitong Liu, Ningxin Chu, Jing Wan, Yu Zhou, Baoying Chen, Jishen Zeng, Jiarui Liu, Xianjin Liu, Xin Chen, Lanzhi Zhou, Hangyu Li, You Han, Bibo Xiang, Zhenjie Liu, Jianzhang Lu, Jialin Gui, Renjie Lu, Shangfei Wang, Donghao Zhou, Jingyu Lin, Quanjian Song, Jiancheng Huang, Yufeng Yang, Changwei Wang, Shupeng Zhong, Yang Yang, Lihuo He, Jia Liu, Yuting Xing, Tida Fang, Yuchun Jin:

NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results. 1389-1402 - Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, Zicheng Zhang, Menghan Hu, Xiongkuo Min, Guangtao Zhai:

An Empirical Study for Efficient Video Quality Assessment. 1403-1413 - Mengjing Su, Yi Wang, Tuo Chen, Chunxiao Li, Shuaiyu Zhao, Jiaxin Wen, Chuyi Lin, Sitong Liu, Ningxin Chu, Yu Zhou:

Quality Assessment for Talking Head Videos via Multi-modal Feature Representation. 1414-1420 - Yiwen Wang, Ying Liang, Yuxuan Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song:

Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution. 1421-1430 - Xinning Chai, Yao Zhang, Yuxuan Zhang, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song:

Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution. 1431-1440 - Arnim Gautam, Aditi Pawar, Aishwarya Joshi, Satya Narayan Tazi, Sachin Chaudhary, Praful Hambarde, Akshay Dudhane, Santosh Kumar Vipparthi, Subrahmanyam Murala:

Pureformer: Transformer-Based Image Denoising. 1441-1449 - Mingyang Yu, Zhijian Wu, Dingjiang Huang:

LFMix: A Lightweight Hybrid Architecture for Light Field Super-Resolution. 1450-1459 - Wei Dong, Yan Min, Han Zhou, Jun Chen:

Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design. 1460-1470 - Wei Dong, Han Zhou, Seyed Amirreza Mousavi, Jun Chen:

Retinex-Guided Histogram Transformer for Mask-Free Shadow Removal. 1471-1481 - Kean Liu, Mingchen Zhong, Senyan Xu, Zhijing Sun, Jiaying Zhu, Chengjie Ge, Xingbo Wang, Xin Lu, Xueyang Fu, Zheng-Jun Zha:

Event-Conditioned Dual-Modal Fusion for Motion Deblurring. 1482-1492 - Zelu Qi, Ping Shi, Chaoyang Zhang, Shuqi Wang, Fei Zhao, Da Pan, Zefeng Ying:

Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model. 1493-1502 - Nickolay Safonov, Alexey Bryntsev, Andrey Moskalenko, Dmitry Kulikov, Dmitriy S. Vatolin, Radu Timofte, Haibo Lei, Qifan Gao, Qing Luo, Yaqing Li, Jie Song, Shaozhe Hao, Meisong Zheng, Jingyi Xu, Chengbin Wu, Jiahui Liu, Ying Chen, Xin Deng, Mai Xu, Peipei Liang, Jie Ma, Junjie Jin, Yingxue Pang, Fangzhou Luo, Kai Chen, Shijie Zhao, Mingyang Wu, Renjie Li, Yushen Zuo, Zhengzhong Tu, Shengyun Zhong:

NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results. 1503-1513 - Egor I. Ershov, Sergey Korchagin, Aleksei Khalin, Artyom Panshin, Arseniy P. Terekhin, Ekaterina Zaychenkova, Georgiy Lobarev, Vsevolod Plokhotnyuk, Denis Abramov, Elisey Zhdanov, Sofia Dorogova, Yasin Mamedov, Nikola Banic, Georgy Perevozchikov, Radu Timofte, Lize Zhang, Yuqian Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Yibin Huang, Guangqi Shao, Xiaotao Wang, Lei Lei, Sishun Pan, Zhiqiang Zhong, Yang Yang, Anas M. Ali, Hamad Aloqayli, Bilel Benjdira, Wadii Boulila, Xiaoyang Ma, Zijun Gao, Leyi Xing, Zongqi He, Yushen Zuo, Zhe Xiao, Kin-Chung Chan, Hanmin Li, Jun Xiao, Kin-Man Lam, Yunpeng Wu, Dmitrij Manzura, Daniil Storonkin, Weixin Guo, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Buda Vampilov, Furkan Kinli, Furkan Kiraç:

NTIRE 2025 Challenge on Night Photography Rendering. 1514-1524 - Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, Pufan Xu, Zhijuan Huang, Shuyuan Cui, Peng Guo, Jiahui Liu, Dongkai Zhang, Heng Zhang, Huiyuan Fu, Huadong Ma, Yanhui Guo, Sisi Tian, Xin Li, Jinwen Liang, Jie Liu, Jie Tang, Gangshan Wu, Zeyu Xiao, Zhuoyuan Li, Yinxiang Zhang, Wenxuan Cai, Vijayalaxmi Ashok Aralikatti, Nikhil Akalwadi, G. Gyaneshwar Rao, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudenagudi, Marcos V. Conde, Alejandro Merino, Bruno Longarela, Javier Abad, Weijun Yuan, Zhan Li, Zhanglu Chen, Boyang Yao, Aagam Jain, Milan Kumar Singh, Ankit Kumar, Shubh Kawa, Divyavardhan Singh, Anjali Sarvaiya, Kishor P. Upla, Raghavendra Ramachandra, Chia-Ming Lee, Yu-Fan Lin, Chih-Chung Hsu, Risheek V. Hiremath, Palani Yashaswini, Yuxuan Jiang, Qiang Zhu, Siyue Teng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull, Jingwei Liao, Yuqing Yang, Wenda Shao, Junyi Zhao, Qisheng Xu, Kele Xu, Sunder Ali Khowaja, Ik Hyun Lee, Snehal Singh Tomar, Rajarshi Ray, Klaus Mueller, Sachin Chaudhary, Surya Vashisth, Akshay Dudhane, Praful Hambarde, Satya Naryan Tazi, Prashant W. Patil, Santosh Kumar Vipparthi, Subrahmanyam Murala, Bilel Benjdira, Anas M. Ali, Wadii Boulila, Zahra Moammeri, Ahmad Mahmoudi-Aznaveh, Ali Karbasi, Hossein Motamednia, Liangyan Li, Guanhua Zhao, Kevin Le, Yimo Ning, Haoxuan Huang, Jun Chen:

NTIRE 2025 Challenge on Image Super-Resolution (x4): Methods and Results. 1525-1535 - Zheng Chen, Jingkai Wang, Kai Liu, Jue Gong, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Jianxing Zhang, Jinlong Wu, Jun Wang, Zheng Xie, Hakjae Jeon, Suejin Han, Hyung-Ju Chun, Hyunhee Park, Zhicun Yin, Junjie Chen, Ming Liu, Xiaoming Li, Chao Zhou, Wangmeng Zuo, Weixia Zhang, Dingquan Li, Kede Ma, Yun Zhang, Zhuofan Zheng, Yuyue Liu, Shizhen Tang, Zihao Zhang, Yi Ning, Hao Jiang, Wenjie An, Kangmeng Yu, Chenyang Wang, Kui Jiang, Xianming Liu, Junjun Jiang, Yingfu Zhang, Gang He, Siqi Wang, Kepeng Xu, Zhenyang Liu, Changxin Zhou, Shanlan Shen, Yubo Duan, Yiang Chen, Jin Guo, Mengru Yang, Jen-Wei Lee, Chia-Ming Lee, Chih-Chung Hsu, Hu Peng, Chunming He:

NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results. 1536-1547 - Jiancheng Pan, Yanxing Liu, Xiao He, Long Peng, Jiahao Li, Yuze Sun, Xiaomeng Huang:

Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection. 1548-1556 - Weixia Zhang, Bingkun Zheng, Junlin Chen, Zhihua Wang:

Multi-Dimensional Quality Assessment for UGC Videos via Modular Multi-Modal Vision-Language Models. 1557-1566 - Yali Huang, Jie Mei, Yiming Yang, Mi Guo, Mingyuan Jiu, Mingliang Xu:

Instance Feature Caching for Cross-Domain Few-Shot Object Detection. 1567-1575
Navigating the Future: Ensuring Trustworthiness in Multi-Modal Open-World Intelligence
- Chenfei Liao, Kaiyu Lei, Xu Zheng, Junha Moon, Zhixiong Wang, Yixuan Wang, Danda Pani Paudel, Luc Van Gool, Xuming Hu:

Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness. 1576-1586 - Zongxia Li, Xiyang Wu, Hongyang Du, Fuxiao Liu, Huy Nghiem, Guangyao Shi:

A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges. 1587-1606 - Long Chen, Yuling Chen, Yun Luo, Hui Dou, Xinyang Zhong:

Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models. 1607-1617 - Haoren Zhao, Tianyi Chen, Zhen Wang:

On the Robustness of GUI Grounding Models Against Image Attacks. 1618-1623 - Lanyun Zhu, Deyi Ji, Tianrun Chen, Peng Xu, Jieping Ye, Jun Liu:

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding. 1624-1633 - Yuchang Su, Renping Zhou, Siyu Huang, Xingjian Li, Tianyang Wang, Ziyue Wang, Min Xu:

Multimodal Generalized Category Discovery. 1634-1643 - Àlex Pujol Vidal, Kamal Nasrollahi, Thomas B. Moeslund, Sergio Escalera:

Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU. 1644-1653 - Eunju Park:

Prompt the Missing: Prompt-Based Robust Audio-Visual Classification under Uncertain Modalities. 1654-1662 - Erum Mushtaq, Zalan Fabian, Yavuz Faruk Bakman, Anil Ramakrishna, Mahdi Soltanolkotabi, Salman Avestimehr:

HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models. 1663-1668 - Stephen D. Liang:

Vision Language Models for Massive MIMO Semantic Communication. 1669-1679
8th International Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues
- Erik Sandström, Ganlin Zhang, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald, Federico Tombari:

Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians. 1680-1691 - Rohit Venkata Sai Dulam, Chandra Kambhamettu:

Salient Object Detection with Dynamic Convolutions. 1692-1702 - Kyle O'Donnell, Chandra Kambhamettu:

Feature Matching in the Dark: Homography-Based RGB-IR Feature Transformation for Low-Light Vision. 1703-1711
Uncertainty Quantification for Computer Vision
- Jisoo Jeong, Hong Cai, Jamie Menjay Lin, Fatih Porikli:

Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties. 1712-1721 - Rupayan Mallick, Sibo Dong, Nataniel Ruiz, Sarah Adel Bargal:

D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition. 1722-1731 - Shadi Alijani, Homayoun Najjaran:

WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts. 1732-1741 - Kowshik Thopalli, Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan:

The Surprising Utility of Group Partitioning in Improving Conformal Prediction of Visual Classifiers under Distributional Shifts. 1742-1751 - Mihir Mulye, Matias Valdenegro-Toro:

Uncertainty Quantification for Gradient-based Explanations in Neural Networks. 1752-1760
The 4th Workshop on Federated Learning for Computer Vision
- Tuo Zhang, Tiantian Feng, Samiul Alam, Dimitrios Dimitriadis, Sunwoo Lee, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr:

GPT-FL: Generative Pre-trained Model-Assisted Federated Learning. 1761-1770 - Joseph Geo Benjamin, Mothilal Asokan, Mohammad Yaqub, Karthik Nandakumar:

FedSECA: Sign Election and Coordinate-wise Aggregation of Gradients for Byzantine Tolerant Federated Learning. 1771-1780 - Yu-Syuan Tseng, Tzu-Chin Hsu, Chih-Ting Liu, Shao-Yi Chien:

FedCAPR: Federated Camera-Aware Unsupervised Person Re-Identification with Identity-Distributed Equalization for Decentralized Data Clustering. 1781-1790 - Ensieh Khazaei, Dimitrios Hatzinakos:

Forget Less, Learn More: Contrastive-Based Federated Class Incremental Learning with a Low-Dimensional Projection Layer. 1791-1800 - Sunny Gupta, Vinay Sutar, Varunav Singh, Amit Sethi:

FedAlign: Federated Domain Generalization with Cross-Client Feature Alignment. 1801-1810 - Ahmed Radwan, Mahmoud Soliman, Omar Abdelaziz, Mohamed Shehata:

FedDG-MoE: Test-Time Mixture-of-Experts Fusion for Federated Domain Generalization. 1811-1820 - Rahmat Izwan Heroza, John Q. Gan, Haider Raza:

FedCIAL: Federated Color-Invariant Adversarial Learning for Enhancing Fairness and Performance in Skin Lesion Classification. 1821-1828 - Ratun Rahman, Atit Pokharel, Dinh C. Nguyen:

Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise. 1829-1838
Mobile AI workshop and associated challenges, 5th edition
- Sudhakar Sah, Ravish Kumar, Darshan C. Ganji, Ehsan Saboori:

ActNAS: Generating Efficient YOLO Models using Activation NAS. 1839-1847 - Zixun Huang, Keling Yao, Zhihao Zhao, Chuanyu Pan, Allen Y. Yang:

Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset. 1848-1857 - Shambhavi Balamuthu Sampath, Judeson Anthony Fernando, Moritz Thoma, Nael Fasfous, Lukas Frickenstein, Pierpaolo Morì, Manoj Rohit Vemparala, Alexander Frickenstein, Ulf Schlichtmann, Walter Stechele:

RepFC: Universal Structural Reparametrization Block for High Performance, Lightweight Deep Neural Networks. 1858-1866 - Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Yuecheng Li, Sai Qian Zhang, Barbara De Salvo:

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. 1867-1877 - Andrei Arhire, Radu Timofte:

Learned Lightweight Smartphone ISP with Unpaired Data. 1878-1887 - Huu-Phong Phan-Nguyen, Anh Dao, Tien-Huy Nguyen, Tuan Quang, Huu-Loc Tran, Tinh-Anh Nguyen-Nhu, Huy-Thach Pham, Quan Nguyen, Hoang M. Le, Quang-Vinh Dinh:

Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection. 1888-1897 - Moritz Thoma, Jorge Villasante, Emad Aghajanzadeh, Shambhavi Balamuthu Sampath, Pierpaolo Morì, Maximilian Groetzinger, Daniil Dylkin, Manoj Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, Daniel Mueller-Gritschneder, Ulf Schlichtmann:

FLAR-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression. 1898-1907 - Andrey Ignatov, Georgy Perevozchikov, Radu Timofte, Zhiyu Zhang, Tianxiao Gao, Yukun Yang, Shiai Zhu, Shihao Wang, Kihwan Yoon, Ganzorig Gankhuyag, Hyeon-Cheol Moon, Taehyun Jeong, Yumi Kim, Suhyeon Lee, Jaehun Baek, Jinwoo Jeong, Eunjun Park, Jun Lee, Heejun Lee, Sungjei Kim, Dafeng Zhang, Yong Yang, Heo Myeong Cheol, Yonghyun Park, Jooho Jeong, Wontae Kim, Kanghwan Lee, Diankai Zhang, Biao Wu, Chengjian Zheng, Shaoli Liu, Si Gao, Ning Wang, Mingshen Wang, Zhao Zhang, Suiyi Zhao, Jinhan Guan, Bo Wang, Yan Luo:

Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2025 Challenge: Report. 1908-1921 - Andrey Ignatov, Georgy Perevozchikov, Radu Timofte, Wu Pan, Song Wang, Dong Zhang, Zhao Ran, Xiaochen Li, Shichang Ju, Diankai Zhang, Biao Wu, Shaoli Liu, Si Gao, Chengjian Zheng, Ning Wang, Yi Feng, Cailu Wan, Xiangji Wu, Hailong Yan, Ao Li, Xiangtao Zhang, Zhe Liu, Ce Zhu, Le Zhang, Jinjie Zhou, Yang Lu, Feng Duo, Runhua Deng, Xuanyu Chen, Shuhui Xie, Guojie Xiao, Zhifeng Wang, Long Peng, Aiwen Jiang:

RGB Photo Enhancement on Mobile GPUs, Mobile AI 2025 Challenge: Report. 1922-1933 - Andrey Ignatov, Georgy Perevozchikov, Radu Timofte, Cheng Li, Lian Liu, Jun Cao, Heng Sun, Wu Pan, Song Wang, Keqiang Yu, Shuo Liu, Hongqin He, Zhenhao Dong, Jianke Chen, Dejun Hao, Keqiang Yu, Tingniao Wang, Xiaoqing Zhou, Dong Zhang, Chunxia Zhang, Jianguang He, Hailong Yan, Ao Li, Xiangtao Zhang, Zhe Liu, Ce Zhu, Le Zhang, Andrei Arhire, Shuo Liu, Junpyo Seo, Fen Xie, Xiuzhi Fang, Chen Wu, Zhangsheng Wang, Pengbo Zhang, Jiazi Huang:

Learned Smartphone ISP on Mobile GPUs, Mobile AI 2025 Challenge: Report. 1934-1946 - Jing Li, Chengyu Wang, Hamid R. Sheikh, Seok-Jun Lee:

CDVS: Compressed Domain On Device Memory Efficient 8K Video SlowMo. 1947-1953 - Chengyu Wang, Jing Li, Saurabh Kumar, Seok-Jun Lee, Hamid R. Sheikh:

Compressed Domain Multiframe Processing. 1954-1963
Data Driven Autonomous Driving Simulation
- Mitchell Goff, Greg Hogan, George Hotz, Armand du Parc Locmaria, Kacper Raczy, Harald Schäfer, Adeeb Shihadeh, Weixing Zhang, Yassine Yousfi:

Learning to Drive from a World Model. 1964-1973 - Alexandru Buburuzan, Anuj Sharma, John Redford, Puneet K. Dokania, Romain Mueller:

MObI: Multimodal Object Inpainting Using Diffusion Models. 1974-1984
2nd Workshop on Urban Scene Modeling: Where Vision meets Photogrammetry and Graphics
- Giovanni Pintore, Uzair Shah, Marco Agus, Enrico Gobbetti:

NadirFloorNet: reconstructing multi-room floorplans from a small set of registered panoramic images. 1985-1994 - Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Junsheng Huang, Wenhao Hu, Shengyu Hao, Jenq-Neng Hwang, Gaoang Wang:

CityGen: Infinite and Controllable City Layout Generation. 1995-2005 - Yixuan Li, Xingjian Ran, Linning Xu, Tao Lu, Mulin Yu, Zhenzhi Wang, Yuanbo Xiangli, Dahua Lin, Bo Dai:

Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians. 2006-2015 - Wenzhao Tang, Weihang Li, Xiucheng Liang, Olaf Wysocki, Filip Biljecki, Christoph Holst, Boris Jutzi:

Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images. 2016-2026 - Luca Barco, Giacomo Blanco, Gaetano Chiriaco, Alessia Intini, Luigi La Riccia, Vittorio Scolamiero, Piero Boccardo, Paolo Garza, Fabrizio Dominici:

Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques. 2027-2035 - Yilei Wang, Giacomo D'Amicantonio, Egor Bondarev:

Near-incident detection in railroad environments: lateral distance estimation from train-mounted monocular camera. 2036-2045
The 12th Workshop on Fine-Grained Visual Categorization
- Lukás Picek, Klára Janousková, Vojtech Cermák, Jiri Matas:

FungiTastic: A Multi-Modal Dataset and Benchmark for Image Categorization. 2046-2056 - Samuel Black, Richard Souvenir:

Fine-grained Few-Shot Classification with Part Matching. 2057-2067 - Nauman Ullah Gilal, Khaled A. Al-Thelaya, Fahad Majeed, Zhihe Lu, Sabri Boughorbel, Jens Schneider, Marco Agus:

CYFLOD: Cyclic Filtering and Loss Damping for Alleviating Noisy Labels in Fine-grained Visual Classification. 2068-2078 - Md. Atabuzzaman, Gino DiMatteo, Hani Alomari, Chiawei Tang, Connor Hale, Adam E. Goode, David Ryan King, Chris Thomas:

Real-Time Ultra-Fine-Grained Surgical Instrument Classification. 2079-2088 - Bianca Lamm, Janis Keuper:

A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification. 2089-2098 - Lukás Adam, Vojtech Cermák, Kostas Papafitsoros, Lukás Picek:

WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals. 2099-2109 - Matthew Walmer, Rose Catherine Kanjirathinkal, Kai Sheng Tai, Keyur Muzumdar, Tai-Peng Tian, Abhinav Shrivastava:

Multi-entity Video Transformers for Fine-Grained Video Representation Learning. 2110-2120 - Taegyeong Lee, Jinsik Bang, Soyeong Kwon, Taehwan Kim:

Multi-aspect Knowledge Distillation with Large Language Model. 2121-2130 - Joona Kareinen, Tuomas Eerola, Kaisa Kraft, Lasse Lensu, Sanna Suikkanen, Heikki Kälviäinen:

Self-Supervised Pretraining for Fine-Grained Plankton Recognition. 2131-2141 - Seyed Mohamad Ali Tousi, Jacket Demby's, Ramy Farag, Gbenga Omotara, Guilherme N. DeSouza:

Combining Vision-Language Models and Weak Supervision for Nuanced Vision Classification Tasks. 2142-2151 - Darshana Saravanan, Naresh Manwani, Vineet Gandhi:

Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning. 2152-2161 - Shahrzad Ziaee, Ahmed Elgammal, Marian Mazzone:

A Fine-grained Artist Identification Method for Authentication and Attribution of Drawings using Hatching Lines. 2162-2173 - Thijs L. van der Plas, Stephen Law, Michael JO Pocock:

Predicting butterfly species presence from satellite imagery using soft contrastive regularisation. 2174-2183
EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
- Islam Mansour, Georg Fischer, Ronny Hänsch, Irena Hajnsek:

Hybrid AI-Physical Modeling for Penetration Bias Correction in X-band InSAR DEMs: A Greenland Case Study. 2184-2193 - Abhishek Kuriyal, Elliot Vincent, Mathieu Aubry, Loïc Landrieu:

CoDEx: Combining Domain Expertise for Spatial Generalization in Satellite Image Analysis. 2194-2203 - Leonard Waldmann, Ando Shah, Yi Wang, Nils Lehmann, Adam J. Stewart, Zhitong Xiong, Xiao Xiang Zhu, Stefan Bauer, John Chuang:

Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation. 2204-2214 - Georgios Voulgaris:

Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation. 2215-2224 - Jonathan Prexl, Michael Recla, Michael Schmitt:

SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data. 2225-2234 - Samuel Scheele, Katherine Picchione, Jeffrey Liu:

LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery. 2235-2243 - Gabriel Tseng, Hannah Kerner, David Rolnick:

Task-Informed Meta-Learning for Remote Sensing. 2244-2253 - Saikat Dutta, Akhil Vasim, Siddhant Gole, Hamid Rezatofighi, Biplab Banerjee:

AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images. 2254-2264 - Burak Ekim, Girmaw Abebe Tadesse, Caleb Robinson, Gilles Quentin Hacheme, Michael Schmitt, Rahul Dodhia, Juan M. Lavista Ferres:

Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation. 2265-2274 - Valérie Zermatten, Javiera Castillo-Navarro, Pallavi Jain, Devis Tuia, Diego Marcos:

EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia. 2275-2285 - Nikita Basargin, Alberto Alonso-González, Irena Hajnsek:

Explainable Physical PolSAR Autoencoders for Soil Moisture Estimation. 2286-2295 - Elliot Vincent, Mehraïl Saroufim, Jonathan Chemla, Yves Ubelmann, Philippe Marquis, Jean Ponce, Mathieu Aubry:

Detecting Looted Archaeological Sites from Satellite Image Time Series. 2296-2307 - Siyuan Xu, Yucheng Wang, Xihaier Luo, Byung-Jun Yoon, Xiaoning Qian:

Scale-Invariant Implicit Neural Representations For Object Counting. 2308-2318 - Hichem Boussaid, Lucrezia Tosato, Flora Weissgerber, Camille Kurtz, Laurent Wendling, Sylvain Lobry:

Visual Question Answering on Multiple Remote Sensing Image Modalities. 2319-2328 - Ragini Bal Mahesh, Ronny Hänsch:

Better Coherence, Better Height: Fusing Physical Models and Deep Learning for Forest Height Estimation from Interferometric SAR Data. 2329-2338 - Tristan Amadei, Enric Meinhardt-Llopis, Carlo de Franchis, Jérémy Anger, Thibaud Ehret, Gabriele Facciolo:

s2p-hd: GPU-Accelerated Binocular Stereo Pipeline for Large-Scale Same-Date Stereo. 2339-2348 - César Leblanc, Lukás Picek, Rémi Palard, Benjamin Deneu, Maximilien Servajean, Pierre Bonnet, Alexis Joly:

Mapping biodiversity at very-high resolution in Europe. 2349-2358 - Hariseetharam Gunduboina, Muhammad Haris Khan, Biplab Banerjee:

FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing. 2359-2372 - Shabnam Choudhury, Yash Salunkhe, Sarthak Mehrotra, Biplab Banerjee:

REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval. 2373-2382 - Elías Masquil, Roger Marí, Thibaud Ehret, Enric Meinhardt-Llopis, Pablo Musé, Gabriele Facciolo:

S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications. 2383-2393 - Benedikt Blumenstiel, Paolo Fraccaro, Valerio Marsocci, Johannes Jakubik, Stefano Maurogiovanni, Mikolaj Czerkawski, Rocco Sedona, Gabriele Cavallaro, Thomas Brunschwiler, Juan Bernabé-Moreno, Nicolas Longépé:

TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data. 2394-2402 - Elena Plekhanova, Damien Robert, Johannes Dollinger, Emilia Arens, Philipp Brun, Jan Dirk Wegner, Niklaus E. Zimmermann:

SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology. 2403-2414
Workshop on Autonomous Driving
- Loveneet Saini, Mirko Meuter, Hasan Tercan, Tobias Meisen:

AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection. 2415-2424 - Michael Hubbertz, Pascal Colling, Qi Han, Tobias Meisen:

Inferring Driving Maps by Deep Learning-based Trail Map Extraction. 2425-2434 - Harsh Yadav, Maximilian Schäfer, Kun Zhao, Tobias Meisen:

LMFormer: Lane based Motion Prediction Transformer. 2435-2444 - Korawat Charoenpitaks, Van-Quang Nguyen, Masanori Suganuma, Kentaro Arai, Seiji Totsuka, Hiroshi Ino, Takayuki Okatani:

TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos. 2445-2455 - Oren Shrout, Ori Nizan, Yizhak Ben-Shabat, Ayellet Tal:

PatchContrast: Self-Supervised Pre-Training for 3D Object Detection. 2456-2466 - Yezhi Shen, Qiuchen Zhai, Fengqing Zhu:

PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization. 2467-2476 - Adam Lilja, Erik Wallin, Junsheng Fu, Lars Hammarstrand:

Exploring Semi-Supervised Learning for Online Mapping. 2477-2487 - Mahan Rafidashti, Ji Lan, Maryam Fatemi, Junsheng Fu, Lars Hammarstrand, Lennart Svensson:

NeuRadar: Neural Radiance Fields for Automotive Radar Point Clouds. 2488-2498 - Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael Jones, Vishal Patel:

Multimodal 3D Object Detection on Unseen Domains. 2499-2509 - Rajeev Yasarla, Shizhong Han, Hong Cai, Fatih Porikli:

DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos. 2510-2519 - Nicola Marinello, Simen Cassiman, Jonas Heylen, Marc Proesmans, Luc Van Gool:

Camera-Only 3D Panoptic Scene Completion for Autonomous Driving through Differentiable Object Shapes. 2520-2529 - Brunó Bence Englert, Tommie Kerssies, Gijs Dubbelman:

What is the Added Value of UDA in the VFM Era? 2530-2540 - Yunheng Xu, Jie Chen, Shuoheng Wang, Xinwen Wang:

TrajGNAS: Heterogeneous Multiagent Trajectory Prediction Based on a Graph Neural Architecture Search. 2541-2550 - Mohammad Altillawi, Fengyi Shen, Liudi Yang, Sai Manoj Prakhya, Ziyuan Liu:

CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes. 2551-2559 - Zhe Huang, Yizhe Zhao, Hao Xiao, Chenyan Wu, Lingting Ge:

DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection. 2560-2570 - Alexander Naumann, Xunjiang Gu, Tolga Dimlioglu, Mariusz Bojarski, Alperen Degirmenci, Alexander Popov, Devansh Bisla, Marco Pavone, Urs Muller, Boris Ivanovic:

Data Scaling Laws for End-to-End Autonomous Driving. 2571-2582 - Daniel C. Moura, Shizhan Zhu, Orly Zvitia:

Nexar Dashcam Collision Prediction Dataset and Challenge. 2583-2591
Pixel-level Video Understanding in the Wild Challenge
- Sakib Reza, Xiyun Song, Heather Yu, Zongfang Lin, Mohsen Moghaddam, Octavia I. Camps:

REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding. 2592-2603 - Alicia Li, Xiaodong Chen, Bohao Liang, Qian Bao, Wu Liu:

M-Adaptor: Text-driven Whole-body Human Motion Generation. 2604-2613 - Jongmin Yu, Zhongtian Sun, Chen Bene Chi, Jinhong Yang, Shan Luo:

Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation. 2614-2624 - Shuming Liu, Chen Zhao, Fatimah Zohra, Mattia Soldan, Alejandro Pardo, Mengmeng Xu, Lama Alssum, Merey Ramazanova, Juan León Alcázar, Anthony Cioppa, Silvio Giancola, Carlos Hinojosa, Bernard Ghanem:

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection. 2625-2635 - Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos:

Studying Image Diffusion Features for Zero-Shot Video Object Segmentation. 2636-2647 - Ding Qi, Shuguang Dou, Jian Liu, Huaixuan Cao, Hao Zhang, Dongsheng Jiang, Cairong Zhao:

MTA-VPS: A Large-scale Benchmark for Video-Based Person Search. 2648-2658 - Xianhang Li, Peng Wang, Xinyu Li, Heng Wang, Hongru Zhu, Cihang Xie:

Efficient VideoMAE via Temporal Progressive Training. 2659-2668 - Henghui Ding, Chang Liu, Nikhila Ravi, Shuting He, Yunchao Wei, Song Bai, Philip Torr:

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild. 2669-2678
The 4th Explainable AI for Computer Vision (XAI4CV) Workshop
- Guillaume Jeanneret, Loïc Simon, Frédéric Jurie:

Disentangling Visual Transformers: Patch-level Interpretability for Image Classification. 2679-2689 - Jongseo Lee, Wooil Lee, Gyeong-Moon Park, Seong Tae Kim, Jinwoo Choi:

PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition. 2690-2699 - Soham Mitra, Atri Sukul, Swalpa Kumar Roy, Pravendra Singh, Vinay Kumar Verma:

ScoreCAM++: Gated Score-Weighted Visual Explanations for CNNs. 2700-2709 - Yihong Wu, Yuwen Heng, Mahesan Niranjan, Hansung Kim:

How does the Machine Perceive Depth for Indoor Single Images with CNN? 2710-2719 - Riccardo Campi, Santiago Borrego, Antonio De Santis, Matteo Bianchi, Andrea Tocchetti, Marco Brambilla:

Towards Synthetic Concept Activation Vectors via Generative Models. 2720-2728 - Valentina Bazyleva, Nicolò Bonettini, Gaurav Bharaj:

X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models. 2729-2739 - Bhat Dittakavi, Bharathi Callepalli, Swarnim Maheshwari, Vineeth Balasubramanian:

PoseGuru: Landmarks for Explainable Pose Correction using Exemplar-Guided Algorithmic Recourse. 2740-2749 - Maguelonne Heritier, Djebril Mekhazni, Cédric Leblond-Ménard, Benoit Godbout, Nathan Guilbaud, Mahdi Alehdaghi, Eric Granger:

ExaM: Unsupervised Concept-Based Representation Learning to Better Explain Models in Vision Tasks. 2750-2759 - Yu Cheng, Arushi Goel, Hakan Bilen:

Visually Interpretable Subtask Reasoning for Visual Question Answering. 2760-2780 - Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana:

gMINT: Gradiant-based Membership Inference Test applied to Image Models. 2781-2790 - Jorge Francisco Ciprián-Sánchez, Josafat-Mattias Burmeister, Rico Richter, Jürgen Döllner:

Explaining 3D Point Cloud Semantic Segmentation Models Through Adversarial Attacks. 2791-2800 - Madhumitha V, Sunayna Padhye, Shanawaj S. Madarkar, Susmit Agrawal, Konda Reddy Mopuri:

Rel-SA: Alzheimer's Disease Detection using Relevance-augmented Self Attention by Inducing Domain Priors in Vision Transformers. 2801-2810
Image Matching: Local Features and Beyond
- Johan Edstedt:

Less Biased Noise Scale Estimation for Threshold-Robust RANSAC. 2811-2820 - Saurabh Pandey, Luca Magri, Federica Arrigoni, Vladislav Golyanik:

Outlier-Robust Multi-Model Fitting on Quantum Annealers. 2821-2830 - Leyla Mirvakhabova, Hong Cai, Jisoo Jeong, Hanno Ackermann, Farhad G. Zanjani, Fatih Porikli:

Learning Optical Flow Field via Neural Ordinary Differential Equation. 2831-2840 - Xiaolong Guo, Min Wang, Hui Wu, Wengang Zhou, Houqiang Li:

Detector-free Image Matching with Lightweight Backbone and Feature Filtering. 2841-2848 - Davide Sferrazza, Gabriele Moreno Berton, Gabriele Trivigno, Carlo Masone:

To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition. 2849-2860 - Gabriele Moreno Berton, Carlo Masone:

MegaLoc: One Retrieval to Place Them All. 2861-2867
2nd Workshop on Human Motion Generation
- Julian Tanke, Takashi Shibuya, Kengo Uchida, Koichi Saito, Yuki Mitsufuji:

Dyadic Mamba: Long-term Dyadic Human Motion Synthesis. 2868-2877 - Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, Huaizu Jiang:

HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models. 2878-2888 - Leo Bringer, Joey Wilson, Kira Barton, Maani Ghaffari:

MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty. 2889-2899 - Elly Akhoundi, Hung Yu Ling, Anup Anand Deshmukh, Judith Bütepage:

SILK: Smooth InterpoLation frameworK for motion in-betweening. 2900-2909 - Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Julian Tanke, Shusuke Takahashi, Yuki Mitsufuji:

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training. 2910-2919 - Inwoo Hwang, Jinseok Bae, Donggeun Lim, Young Min Kim:

Goal-Driven Human Motion Synthesis in Diverse Task. 2920-2930 - Gabriel Maldonado, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, Hamed Tabkhi:

MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation. 2931-2941
Multimodal Algorithmic Reasoning Workshop
- Yi-Lun Lee, Chen-Yu Lee, Wei-Chen Chiu, Yi-Hsuan Tsai:

Exemplar Masking for Multimodal Incremental Learning. 2942-2951 - Sadegh Rahmani-Boldaji, Filip Rybansky, Quoc Vuong, Frank Guerin, Andrew Gilbert:

Human vs. Machine Minds: Ego-Centric Action Recognition Compared. 2952-2962 - Yiqiao Huang, Qi He, Zhaorun Chen, Haopeng Zhang, Hanchao Yu, Zhuokai Zhao:

Autonomous Multimodal Reasoning via Implicit Chain-of-Vision. 2963-2972 - Wei Lin, Muhammad Jehanzeb Mirza, Sivan Doveh, Rogério Feris, Raja Giryes, Sepp Hochreiter, Leonid Karlinsky:

Comparison Visual Instruction Tuning. 2973-2983 - Tan-Hanh Pham, Trong-Duong Bui, Quang Minh Luu, Tan-Huong Pham, Chris Ngo, Truong-Son Hy:

SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging. 2984-2994 - Mohammadmostafa Rostamkhani, Baktash Ansari, Hoorieh Sabzevari, Farzan Rahmani, Sauleh Eetemadi:

Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions. 2995-3004
Workshop on Foundation and Large Vision Models in Remote Sensing
- Christel Chappuis, Gencer Sümbül, Syrielle Montariol, Sylvain Lobry, Devis Tuia:

PAN-RSVQA: Vision Foundation Models as Pseudo-ANnotators for Remote Sensing Visual Question Answering. 3005-3016 - Thomas Kerdreux, Alexandre Tuel, Quentin Febvre, Alexis Mouche, Bertrand Chapron:

Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation. 3017-3027 - Darryl Hannan, John Cooper, Dylan White, Timothy Doster, Henry Kvinge, Yijing Watkins:

Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization. 3028-3037 - Martina Pastorino, Michael Alibani, Nicola Acito, Gabriele Moser:

Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis. 3038-3046 - Anan Yaghmour, Melba M. Crawford, Saurabh Prasad:

A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning. 3047-3056 - Clément Barbier, Baptiste Abeloss, Stéphane Herbin:

Bridging the Modality Gap: Training-free Adaptation of Vision-Language Models for Remote Sensing via Visual Prototypes. 3057-3066 - Paul Borne--Pons, Mikolaj Czerkawski, Rosalie Martin, Romain Rouffet:

MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data. 3067-3075 - Chenyu Li, Zhaojie Pan, Danfeng Hong:

Dynamic State-Control Modeling for Generalized Remote Sensing Image Super-Resolution. 3076-3084 - Miguel Espinosa, Valerio Marsocci, Yuru Jia, Elliot Crowley, Mikolaj Czerkawski:

COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails. 3085-3095 - Pierre Adorni, Minh-Tan Pham, Stéphane May, Sébastien Lefèvre:

Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach. 3096-3106
SyntaGen: 2nd Workshop on Harnessing Generative Models for Synthetic Visual Datasets
- Nhat-Tan Bui, Dinh-Hieu Hoang, Quoc-Huy Trinh, Minh-Triet Tran, Truong Nguyen, Susan Gauch:

NeIn: Telling What You Don't Want. 3107-3115 - Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian:

Noise Consistency Regularization for Improved Subject-Driven Image Synthesis. 3116-3126 - Ying Zhao:

AnomalyHybrid: A Domain-agnostic Generative Framework for General Anomaly Detection. 3127-3136 - Yonwoo Choi:

SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation. 3137-3147 - Pranavi Kolouju, Eric Xing, Robert Pless, Nathan Jacobs, Abby Stylianou:

good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval. 3148-3157 - Li-Syun Hsiung, Jun-Kai Tu, Kuan-Wu Chu, Yu-Hsuan Chiu, Yan-Tsung Peng, Sheng-Luen Chung, Gee-Sern Hsu:

Syn3DTxt: Embedding 3D Cues for Scene Text Generation. 3158-3166 - David C. Wong, Bin Wang, Gorkem Durak, Marouane Tliba, Akshay Chaudhari, Aladine Chetouani, Ahmet Enis Çetin, Cagdas Topel, Nicolo Gennaro, Camila Lopes Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur, Andrew C. Gordon, Ayis Pyrros, Frank H. Miller, Amir Borhani, Hatice Savas, Eric M. Hart, Drew A. Torigian, Jayaram K. Udupa, Elizabeth A. Krupinski, Ulas Bagci:

Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging. 3167-3175
2nd Workshop on Efficient Large Vision Models
- Sepehr Sameni, Simon Jenni, Paolo Favaro:

ViDROP: Video Dense Representation through Spatio-Temporal Sparsity. 3176-3186 - Yifan Li, Wentao Bao, Botao Ye, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong:

Window Token Concatenation for Efficient Visual Large Language Models. 3187-3197 - Wangyu Wu, Xianglin Qiu, Siqi Song, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao:

Prompt Categories Cluster for Weakly Supervised Semantic Segmentation. 3198-3207 - Mustafa Munir, Guihong Li, Md Mostafijur Rahman, Alex Zhang, Radu Marculescu:

From Data to Design: Leveraging Frequency Statistics for Efficient Neural Network Architectures. 3208-3218 - Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana:

SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers. 3229-3238 - Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi:

Efficient Image Generation with Variadic Attention Heads. 3239-3250 - Alex Ergasti, Filippo Botti, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati:

U-Shape Mamba: State Space Model for faster diffusion. 3251-3258 - Tomer Gafni, Asaf Karnieli, Yair Hanani:

Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference. 3259-3269 - Feiyang Wang, Nan Luo, Wangyu Wu:

VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning. 3270-3279 - Navin Ranjan, Andreas E. Savakis:

Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model. 3280-3290 - Fatimah Zohra, Chen Zhao, Shuming Liu, Bernard Ghanem:

Effectiveness of Max-Pooling for Fine-Tuning CLIP on Videos. 3291-3300 - Hanchen Xie, Rose Ma, Jiageng Zhu, Zheda Mai, Wael Abd-Almageed, Zubin Abraham:

Efficiently Mitigating Video Content Misalignment on Large Vision Model with Time-Series Data Alignment. 3301-3307 - Xingzi Xu, Qi Li, Shuwen Qiu, Julien Han, Karim Bouyarmane:

DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform. 3308-3317 - Rebati Raman Gaire, Arman Roohi:

CARN: Complexity-Aware Routing Network for Efficient and Adaptive Inference. 3318-3326 - Surya Selvam, Ravi K. Rajendran, Murugan Sankaradas, Anand Raghunathan, Srimat T. Chakradhar:

SimCache: Similarity Caching for Efficient VLM-based Scene Understanding. 3327-3336 - Steven Walton, Valeriy Klyukin, Maksim Artemev, Denis Derkach, Nikita Orlov, Humphrey Shi:

Distilling Normalizing Flows. 3337-3346 - Sam Pollard, Michael Wray:

Video, How Do Your Tokens Merge? 3347-3356
Test-time Scaling for Computer Vision
- Xiaxu Chen, Wei Li, Chunxu Liu, Chi Xie, Xiaoyan Hu, Chengqian Ma, Feng Zhu, Rui Zhao:

On the Suitability of Reinforcement Fine-Tuning to Visual Tasks. 3357-3361 - Yuming Qiao, Yuechen Wang, Xudong Zhang, Dan Meng:

TTGen: Incorporating Test-time Scaling to Diffusion Models. 3362-3366 - Prabhav Sanga, Jaskaran Singh, Tapabrata Chakraborti:

Get a GRIP on Test Time Adaptation! - Group Robust Inference-Time Policy Optimization for Vision Models. 3367-3376
Women in Computer Vision
- Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut:

Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume. 3377-3386 - Pooja Kumari, Sukhendu Das:

Document Image Rectification using Stable Diffusion Transformer. 3387-3396 - Tushar Shinde, Shivaanee Eswaran:

Uncertainty-guided Style-aware Perceptual Quality Assessment for AI-Generated Images. 3397-3405 - Zixuan Liu, Guangkai Jiang, Siavash H. Khajavi:

LLaVA-SCo: Teach Vision Language Models to Self-Correct. 3406-3415 - Rosa Zuccarà, Georgia Fargetta, Alessandro Ortis, Sebastiano Battiato:

Exploiting Adversarial Learning and Topology Augmentation for Open-Set Visual Recognition. 3416-3424 - Deepak Ravikumar, Efstathia Soufleri, Kaushik Roy:

Improved Out-of-Distribution Detection with Additive Angular Margin Loss. 3425-3432 - Nurjahan Sultana, Wenqi Lu, Xinqi Fan, Moi Hoon Yap:

Domain Adaptation for Skin Lesion: Evaluating Real-World Generalisation. 3433-3443 - Kavitha Viswanathan, Amit Sethi, Shashwat Pathak, Piyush Bharambe, Harsh Choudhary:

Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions. 3444-3453 - Kanimozhi Soundararajan, Sabari Nathan, A. Sasithradevi:

IdolDanceNet: Indian Heritage idol Dance Pose Classification. 3454-3463 - Mika Feng, Koichi Ito, Takafumi Aoki, Tetsushi Ohki, Masakatsu Nishigaki:

Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing. 3464-3472 - Omid Halimi Milani, Amanda Nikho, Marouane Tliba, Lauren Mills, Ahmet Enis Çetin, Mohammed H. Elnagar:

Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment. 3473-3480 - Romala Mishra, Sobhan Kanti Dhara:

Dust to Detail: Restoring Sand-dust Images with Frequency-Guided Attention and Multi-Scale Features. 3481-3490 - Suruchi Kumari, Pravendra Singh:

Leveraging Fixed and Dynamic Pseudo-Labels in Cross-Supervision Framework for Semi-Supervised Medical Image Segmentation. 3491-3501 - Christina Runkel, Kanchana Vaishnavi Gandikota, Jonas Geiping, Carola-Bibiane Schönlieb, Michael Moeller:

Training Data Reconstruction: Privacy due to Uncertainty? 3502-3510
The 5th Workshop of Adversarial Machine Learning on Computer Vision: Foundation Models + X
- Yuwei Chen, Shiyong Chu:

Trustworthy Multi-UAV Collaboration: A Self-Supervised Framework for Explainable and Adversarially Robust Decision-Making. 3511-3522 - Fatemeh Amerehi, Patrick Healy:

Defending Against Frequency-Based Attacks with Diffusion Models. 3523-3533 - Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari:

Attacking Attention of Foundation Models Disrupts Downstream Tasks. 3534-3543 - Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan:

Towards Evaluating the Robustness of Visual State Space Models. 3544-3553 - Zhenshu Ma, Xuan Cai, Changhang Tian, Yuqi Fan, Kemou Jiang, Gangfu Liu, Xuesong Bai, Aoyong Li, Yilong Ren, Haiyang Yu:

FullCycle: Full Stage Adversarial Attack For Reinforcement Learning Robustness Evaluation. 3554-3560 - Siwei Yang, Zeyu Wang, Diego Ortiz Barbosa, Luis Burbano, Murat Kantarcioglu, Alvaro A. Cárdenas, Cihang Xie:

Probing Vulnerabilities of Vision-LiDAR Based Autonomous Driving Systems. 3561-3569 - Brian Pulfer, Yury Belousov, Vitaliy Kinakh, Teddy Furon, Slava Voloshynovskiy:

Task-Agnostic Attacks Against Vision Foundation Models. 3570-3581 - Xuesong Bai, Changhang Tian, Wei Xia, Zhenshu Ma, Haiyang Yu, Yilong Ren:

EL-Attack: Explicit and Latent Space Hybrid Optimization based General and Effective Attack for Autonomous Driving Trajectory Prediction. 3582-3590 - Pedram MohajerAnsari, Amir Salarpour, David Fernandez, Cigdem Kokenoz, Bing Li, Mert D. Pesé:

Attention-Aware Temporal Adversarial Shadows on Traffic Sign Sequences. 3591-3599
The 3rd Workshop on What is Next in Multimodal Foundation Models?
- Yang Jiao, Haibo Qiu, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang:

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding. 3600-3610 - Shehreen Azad, Yash Jain, Rishit Garg, Vibhav Vineet, Yogesh S. Rawat:

Understanding Depth and Height Perception in Large Visual-Language Models. 3611-3620 - Rohit Kundu, Sudipta Paul, Arindam Dutta, Amit Roy-Chowdhury:

Repurposing SAM for User-Defined Semantics Aware Segmentation. 3621-3631 - Chau Pham, Hoang Phan, David S. Doermann, Yunjie Tian:

PLVM: A tuning-free approach for Personalized Large Vision-Language Model. 3632-3641 - Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan:

How Good is my Video-LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs. 3642-3651 - Zane Durante, Ran Gong, Bidipta Sarkar, Naoki Wake, Rohan Taori, Paul Tang, Shrinidhi Kowshika Lakshmikanth, Kevin A. Schulman, Arnold Milstein, Hoi Vo, Ehsan Adeli, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao:

An Interactive Agent Foundation Model. 3652-3662 - Neha Mukund Kalibhat, Priyatham Kattakinda, Sumit Nawathe, Arman Zarei, Nikita Seleznev, Samuel Sharpe, Senthil Kumar, Soheil Feizi:

Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning. 3663-3672
1st International Workshop on Interactive Video Search and Exploration
- Luca Rossetto, George Awad, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoc, Stevan Rudinac, Klaus Schoeffmann:

Overview of the 1st International Workshop on Interactive Video Search and Exploration. 3673-3678 - Heng Liu, Siru Jiang, Fangyun Duan, Yongzhe Lyu, Xiusong Wang, Hanlin Ge, Chao Liang:

CadenceRAG: Context-Aware and Dependency-Enhanced Retrieval Augmented Generation for Holistic Video Understanding. 3679-3688 - Bao Tran Gia, Khiem Le, Tien Do, Tien-Dung Mai, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh:

VRAG: Retrieval-Augmented Video Question Answering for Long-Form Videos. 3689-3698 - Khanh-An C. Quan, Qui Ngoc Nguyen, Duc-Tuan Luu:

Toward Automation in Text-based Video Retrieval with LLM Assistance. 3699-3707 - Tinh-Anh Nguyen-Nhu, Huu-Loc Tran, Nguyen-Khang Le, Minh-Nhat Nguyen, Tien-Huy Nguyen, Hoang-Long Nguyen-Huu, Huu-Phong Phan-Nguyen, Huy-Thach Pham, Quan Nguyen, Hoang M. Le, Quang-Vinh Dinh:

A Lightweight Moment Retrieval System with Global Re-Ranking and Robust Adaptive Bidirectional Temporal Search. 3708-3718 - Huu-Loc Tran, Tinh-Anh Nguyen-Nhu, Huu-Phong Phan-Nguyen, Tien-Huy Nguyen, Nhat-Minh Nguyen-Dich, Anh Dao, Huy-Duc Do, Quan Nguyen, Hoang M. Le, Quang-Vinh Dinh:

Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking. 3719-3729 - Damianos Galanopoulos, Andreas Goulas, Antonios Leventakis, Ioannis Patras, Vasileios Mezaris:

An LLM Framework for Long-form Video Retrieval and Audio-Visual Question Answering Using Qwen2/2.5. 3730-3739 - Ujjwal Sharma, Omar Shahbaz Khan, Stevan Rudinac, Björn Þór Jónsson:

Can Relevance Feedback, Conversational Search and Foundation Models Work Together for Interactive Video Search and Exploration? 3740-3749 - Klaus Schoeffmann, Mario Leopold:

AI-based Video Content Understanding for Automatic and Interactive Multimedia Retrieval. 3750-3758
Workshop on Distillation of Foundation Models for Autonomous Driving
- Yafei Qi, Menghao Yang, Fan Wu, Chen Wang, Yongmin Zhang:

Harmonizing Attention Fields with Knowledge Distillation for Multi-View 3D Object Detection. 3759-3767 - Sheng Yang, Tong Zhan, Shichen Qiao, Jicheng Gong, Qing Yang, Jian Wang, Yanfeng Lu:

ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving. 3768-3777 - Kaidong Li, Tianxiao Zhang, Kuan-Chuan Peng, Guanghui Wang:

PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector. 3778-3787 - Zihao Sheng, Zilin Huang, Yansong Qu, Yue Leng, Sikai Chen:

Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model. 3788-3797 - Ankit Kumar Shaw, Kun Jiang, Tuopu Wen, Chandan Kumar Sah, Yining Shi, Mengmeng Yang, Diange Yang, Xiaoli Lian:

CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates. 3798-3807 - Elahe Yahyapour, Chengbo Ai:

Fairness-Aware Boosting Model for Imbalanced 3D Point Cloud Segmentation in Autonomous Driving. 3808-3816 - Dunant Cusipuma, David Ortega, Victor Flores-Benites, Arturo Deza:

Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru. 3817-3828 - Li Zhong, Ahmed Ghazal, Jun-Jun Wan, Frederik Zilly, Patrick Mackens, Joachim E. Vollrath, Bogdan Sorin Coseriu:

Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation. 3829-3837 - Yujin Wang, Quanfeng Liu, Zhengxin Jiang, Tianyi Wang, Junfeng Jiao, Hongqing Chu, Bingzhao Gao, Hong Chen:

RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving. 3838-3848 - Tzoulio Chamiti, Leandro Di Bella, Adrian Munteanu, Nikos Deligiannis:

ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking. 3849-3858 - Tin Stribor Sohn, Maximilian Dillitzer, Johannes Bach, Jason J. Corso, Tim Brühl, Robin Schwager, Tim Dieter Eberhardt, Eric Sax:

Drive4C: A Closed-Loop Benchmark on What Foundation Models Really Need to Be Capable of for Language-Guided Autonomous Driving. 3859-3869 - Amirhosein Chahe, Lifeng Zhou:

ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models. 3870-3879
Rhobin2025: The Third Rhobin Challenge on Reconstruction of Human-Object Interaction
- Romain Brégier, Fabien Baradel, Thomas Lucas, Salma Galaaoui, Matthieu Armando, Philippe Weinzaepfel, Grégory Rogez:

CondiMen: Conditional Multi-Person Mesh Recovery. 3880-3890 - Ayce Idil Aytekin, Chuqiao Li, Diogo C. Luvizon, Rishabh Dabral, Martin R. Oswald, Marc Habermann, Christian Theobalt:

Physics-based Human Pose Estimation from a Single Moving RGB Camera. 3891-3900 - Xiyuan Kang, Yi Yuan, Xu Dong, Muhammad Awais, Lilian Tang, Josef Kittler, Zhenhua Feng:

Short-term 3D Human Mesh Recovery with Virtual Markers Disentanglement. 3901-3911 - Sonain Jamil:

PoseSynViT: Lightweight and Scalable Vision Transformers for Human Pose Estimation. 3912-3921
VAND: Visual Anomaly and Novelty Detection - 3rd Edition
- YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi:

Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection. 3922-3932 - Mathis Kruse, Bodo Rosenhahn:

Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection. 3933-3944 - Eun-Ju Park, Taekyung Kim, Minju Kim, Hojun Lee, Gil-Jun Lee:

SK-RD4AD: Skip-Connected Reverse Distillation For Robust One-Class Anomaly Detection. 3945-3953 - Amol Khanna, Chenyi Ling, Derek Everett, Edward Raff, Nathan Inkawhich:

Multi-layer Radial Basis Function Networks for Out-of-distribution Detection. 3954-3963 - Guangyao Chen, Kai A. Horstmann, Zhilong Wang, Fengqi You:

Automated Essential Concept Discovery for Few-Shot Out-of-Distribution Detection. 3964-3974 - Xinyi Zhao, Congjing Zhang, Pei Guo, Wei Li, Lin Chen, Chaoyue Zhao, Shuai Huang:

SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models. 3975-3985 - Masud An Nur Islam Fahim, Jani Boutellier:

No-MambAAD: Revitalizing Conv-Only Networks for Unsupervised Anomaly Detection. 3986-3994 - Yu-Chen Lai, Motoharu Sonogashira, Itthisak Phueaksri, Yasutomo Kawanishi:

Scene-Specific Anomalous Relationship Detection Using Scene Graph Summarization. 3995-4003 - Yona Falinie A. Gaus, Brian K. S. Isaac-Medina, Neelanjan Bhowmik, Yam T. Lee, Toby P. Breckon:

Semi-supervised Object-Wise Anomaly Detection for Firearm and Firearm Component Detection in X-ray Security Imagery. 4004-4014 - Aimira Baitieva, Yacine Bouaouni, Alexandre Briot, Dick Ameln, Souhaiel Khalfaoui, Samet Akcay:

Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection. 4015-4025 - Manuel Barusco, Francesco Borsatti, Davide Dalle Pezze, Francesco Paissan, Elisabetta Farella, Gian Antonio Susto:

PaSTe: Improving the Efficiency of Visual Anomaly Detection at the Edge. 4026-4035 - Khaled Dawoud, Zaigham Zaheer, Mustaqeem Khan, Karthik Nandakumar, Abdulmotaleb Elsaddik, Muhammad Haris Khan:

FusedVision: A Knowledge-Infusing Approach for Practical Anomaly Detection in Real-world Surveillance Videos. 4036-4046 - Latha Pemula, Dongqing Zhang, Onkar Dabeer:

Robust AD: A Real World Benchmark Dataset For Robustness in Industrial Anomaly Detection. 4047-4057 - Sassan Mokhtar, Arian Mousakhan, Silvio Galesso, Jawad Tayyub, Thomas Brox:

Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models. 4058-4067 - Tapan Ganatma Nakkina, Yuhao Zhong, Pete Sumethasorn, Haopeng Tian, Satish T. S. Bukkapatnam:

When Textures Deceive: Weakly Supervised Industrial Anomaly Detection with Adapted-Loss CycleGAN. 4068-4077
The 3rd Workshop on Sign Language Recognition, Translation and Production
- Kepeng Wu, Zecheng Li, Weichao Zhao, Hezhen Hu, Wengang Zhou, Houqiang Li:

Cross-Modal Consistency Learning for Sign Language Recognition. 4078-4087 - Razieh Rastgoo, Kourosh Kiani, Sergio Escalera:

Diffusion-Based Continuous Sign Language Generation with Cluster-Specific Fine-Tuning and Motion-Adapted Transformer. 4088-4097 - Sarah N. Alyami, Hamzah Luqman:

CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition. 4098-4108 - Harry Walsh, Edward Fish, Ozge Mercanoglu Sincan, Mohamed Ilyes Lakhal, Richard Bowden, Neil Fox, Bencie Woll, Kepeng Wu, Zecheng Li, Weichao Zhao, Haodong Wang, Wengang Zhou, Houqiang Li, Shengeng Tang, Jiayi He, Xu Wang, Ruobei Zhang, Yaxiong Wang, Lechao Cheng, Sümeyye Meryem Tasyürek, Tugçe Kiziltepe, Hacer Yalim Keles:

SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work. 4109-4119
Workshop on Pixel-level understanding with Vision Foundation Models
- Josh Myers-Dean, Brian L. Price, Yifei Fan, Danna Gurari:

Hierarchical Semantic Segmentation with Autoregressive Language Modeling. 4120-4130 - Yuji Nozawa, Yu-Chieh Lin, Kazumoto Nakamura, Youyang Ng:

Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval. 4131-4141 - M. Arda Aydin, Efe Mert Çirpar, Elvin Abdinli, Gozde Unal, Yusuf Hüseyin Sahin:

ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements. 4142-4152 - Gabriele Rosi, Fabio Cermelli:

Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation. 4153-4163
2nd Workshop on Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures
- Kang Ding, Chunxuan Jiao, Yunze Hu, Kangjie Zhou, Pengying Wu, Yao Mu, Chang Liu:

SwarmDiff: Swarm Robotic Trajectory Planning in Cluttered Environments via Diffusion Transformer. 4164-4173 - Haiyong Yu, Yanqiong Jin, Yonghao He, Wei Sui:

Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization. 4174-4183 - Frank P.-W. Lo, Jianing Qiu, Zeyu Wang, Haibao Yu, Yeming Chen, Gao Zhang, Benny Lo:

AI Hiring with LLMs: A Context-Aware and Explainable Multi-Agent Framework for Resume Screening. 4184-4193 - Junhong Chen, Ziqi Yang, Haoyuan G. Xu, Dandan Zhang, George P. Mylonas:

Multi-Agent Systems for Robotic Autonomy with LLMs. 4194-4204 - Zeyu Wang, Frank Po Wen Lo, Qian Chen, Yongqi Zhang, Chen Lin, Xu Chen, Zhenhua Yu, Alexander J. Thompson, Eric M. Yeatman, Benny P. L. Lo:

An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework. 4205-4215 - Iman Abbasnejad, Xuefeng Liu, Atanu Roy:

Deciding the Path: Leveraging Multi-Agent Systems for Solving Complex Tasks. 4216-4225 - Xiangbo Gao, Yuheng Wu, Rujia Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu:

LangCoop: Collaborative Driving with Language. 4226-4237
Computer Vision for Drug Discovery: Where Are We and What is Beyond?
- Hooman Ramezani, Charlotte Vedrines, Dionne M. Aleman, Daniel Létourneau:

LNTransformer: Lung Nodule Transformer for Sparse CT Segmentation. 4238-4245 - Jakub Kosciukiewicz, Dawid Rymarczyk, Bartosz Zielinski:

HCS-DFC: A Diffusion Classifier for Mode of Action Prediction Using Morphological Profiles. 4246-4251 - Zichao Li, Shiqing Qiu, Zong Ke:

Revolutionizing Drug Discovery: Integrating Spatial Transcriptomics with Advanced Computer Vision Techniques. 4252-4258 - Arijit Patra, Jinge Wu, Honghan Wu, Anshul Thakur:

Towards exploring continual learning for toxicologic pathology in pharmaceutical drug discovery. 4259-4268 - Adib Bazgir, Yuwen Zhang:

Drug Discovery Agent: An Automated Vision Detection System for Drug-Cell Interactions. 4269-4277 - Syed Sameed Husain, Jan Bober, Amaia Irizar, Miroslaw Bober:

Bridging Self-Supervision and Mechanism of Action Discovery in Morphological Profiling. 4278-4285 - Taha Razzaq, Ahmed Rashid Qazi, Asim Iqbal:

Segment AnyNeuron. 4286-4293 - Fatemeh Dashti Ahangar, Jiann-Shiun Yuan:

Bridging Morphology and Molecular Signatures: Multi-Task Deep Learning for Multi-Omics Prediction from Histopathology. 4294-4302 - Lawrence Phillips, Rory M. Donovan-Maiye:

CellRep: A Multichannel Image Representation Learning Model. 4303-4309
7th Safe Artificial Intelligence for All Domains
- Youssef Shoeb, Azarm Nowzad, Hanno Gottschalk:

Out-of-Distribution Segmentation in Autonomous Driving: Problems and State of the Art. 4310-4320 - Ben Batten, Alessio Lomuscio:

Improving Weather-based OOD Generalisation in Lidar-based Object Detection Models via Adversarial Training. 4321-4329 - Kento Oonishi, Tsunato Nakai:

Universal Shape of Strong Remote Adversarial Patches for Object Detection with Convolutional Neural Networks. 4330-4340 - Tharun Anand, Siva Sankar, Pravin Nair:

Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations. 4341-4351 - Sophia Kalanovska, Michael Luck, Christopher Hampson:

CTC: Contribution to Classification of Complex Features. 4352-4361 - Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna, Soumyendu Sarkar:

Robustness Evaluation for Video Models with Reinforcement Learning. 4362-4370 - Mariusz Karol Nowak, Jacek Cyranka, Natalia Maslany, Aleksander Kostuch, Jakub Derbisz, Mateusz Komorkiewicz, Patryk Siwek, Mateusz Wójcik, Dariusz Marchewka, Pawel Skruch:

How Much Noise is there in Labels Generated by Humans? A Method to Validate Automatically Generated Bounding Boxes. 4371-4380 - Muneeb Ahmed Khan, Yujin Choi, Jiho Eum, Heemin Park:

Traffic Sign Recognition Under Visual Perturbations: Shadows, Light Patches, and Simulated Obstructions. 4381-4390 - Thomas Botschen, Konstantin Kirchheim, Frank Ortmeier:

Out-of-Distribution Detection with Adversarial Outlier Exposure. 4391-4400 - Moritz Thoma, Tobias Preintner, Emad Aghajanzadeh, Shambhavi Balamuthu Sampath, Pierpaolo Morì, Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Daniel Mueller-Gritschneder, Ulf Schlichtmann:

Uncertainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation. 4401-4411
21st Workshop on Perception Beyond the Visible Spectrum (PBVS'2025)
- Alexander Ulrichsen, Thomas De Kerf, David Dunphy, Paul Murray, Steve Vanlanduit, Stephen Marshall:

A True Hyperspectral Image Super-Resolution Dataset. 4412-4421 - Valentin Braeutigam, Vanessa Wirth, Ingrid Ullmann, Christian Schüßler, Martin Vossiek, Matthias Berking, Bernhard Egger:

3D Face Reconstruction From Radar Images. 4422-4431 - Stefan Becker, Ann-Kristin Grosselfinger, Jens Bayer, David Münch, Wolfgang Hübner, Michael Arens:

Fusion or Confusion? A Look at Dataset Pooling for Infrared Object Detection. 4432-4441 - Hideaki Kanayama, Mahdi Chamseddine, Suresh Guttikonda, So Okumura, Soichiro Yokota, Didier Stricker, Jason R. Rambach:

ToF-360 - A Panoramic Time-of-flight RGB-D Dataset for Single Capture Indoor Semantic 3D Reconstruction. 4442-4451 - Yuan Luo, Rudolf Hoffmann, Yan Xia, Olaf Wysocki, Benedikt Schwab, Thomas H. Kolbe, Daniel Cremers:

RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning. 4452-4461 - Isaac Corley, Conor Wallace, Sourav Agrawal, Burton Putrah, Jonathan Lwowski:

Aerial Infrared Health Monitoring of Solar Photovoltaic Farms at Scale. 4462-4470 - Inseo Lee, Youngyoon Choi, Joonseok Lee:

GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting. 4471-4480 - Vasyl Vasylenko, Ihor Tymchyshyn, Vitalii Tymchyshyn:

XiEff Representation for Interpretable Near-Field Imaging. 4481-4489 - Wenqi Guo, Yiyang Du, Shan Du:

LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset. 4490-4500 - Alberto Pepe, Yuxin Yao, Joan Lasenby:

Define, Refine, Align: Correspondence-free 3D Line Alignment with Attentional, Equivariant and Rotational Layers. 4501-4511 - Hongzhi Guo, Paul T. Schrader, Erik Blasch:

Enhancing Multi-modal Automatic Target Recognition using Out-of-Distribution Exploitation (MATRODE). 4512-4520 - Yuxin Yao, Yan Zhang, Zhening Huang, Joan Lasenby:

SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos. 4521-4530 - Tyler Rust, Michael Pergeorelis, Chandra Kambhamettu, Colin Kelly:

S-Band SAR Target Classification via 2D and 3D Deep Learning Methods. 4531-4539 - Daniel Stadler, Andreas Specker:

A Strong Baseline for Multi-Person Tracking in Thermal Infrared Imagery. 4540-4550 - Mickael Cormier, Andreas Specker, Jürgen Beyerer:

UPPET: Unified Pedestrian Pose Estimation in Thermal Imaging. 4551-4560 - Ghazal Rouhafzay, Stephen Rowlands, Angel J. Valencia, Shengsong Yang, Pierre Payeur, Haitao Tian, James Dickens:

Multi-Spectral Imaging and Data Fusion for Real-Time Bleeding Detection. 4561-4568 - Anirudh Nanduri, Siyuan Huang, Rama Chellappa:

Cross-Spectral Body Recognition with Side Information Embedding: Benchmarks on LLCM and Analyzing Range-Induced Occlusions on IJB-MDF. 4569-4577 - Manjunath D, Aniruddh Sikdar, Prajwal Gurunath, Sumanth Udupa, Suresh Sundaram:

SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems. 4578-4588 - Hang Zhong, Yu Wang, Shengjie Zhao:

SwinPaste: A Swin Transformer-Based Framework for RGB-Guided Thermal Image Super-Resolution. 4589-4594 - Michael Pergeorelis, Tyler Rust, Chandra Kambhamettu:

Open Dataset and Enhancement Method for Long-wave Thermal Diurnal Material Classification. 4595-4601 - Wassim A. El Ahmar, Ángel D. Sappa, Riad I. Hammoud:

Thermal Pedestrian Multiple Object Tracking Challenge (TP-MOT). 4602-4609 - Priya Kansal, Sabari Nathan:

Dual-Input Frequency-Aware Network for High-Quality Thermal Image Super-Resolution. 4610-4620 - Rafael E. Rivadeneira, Ángel D. Sappa, Riad I. Hammoud, Jiyong Rao, Hang Zhong, Yu Wang, Shengjie Zhao, Zhiwei Zhong, Yung-Hui Li, Shiqi Wang, Qiangqiang Shen, Hanzhang Wang, Xuanqi Zhang:

Thermal Image Super-Resolution Challenge Results - PBVS 2025. 4621-4630 - Nathan Inkawhich, Claire Thorp, Justice Wheelwright, Oliver Nina, Dylan Bowald, Ángel D. Sappa, Erik Blasch:

4th Multi-modal Aerial View Image Challenge: SAR Classification - PBVS 2025. 4631-4639 - Dylan Bowald, Justice Wheelwright, Oliver Nina, Ángel D. Sappa, Riad I. Hammoud, Erik Blasch, Nathan Inkawhich:

3rd Multi-modal Aerial View Image Challenge: Sensor Domain Translation - PBVS 2025. 4640-4649 - Xiaowei Chen, Guoliang Fan:

Probabilistic Perspective-n-lines for Indoor Camera Pose Estimation. 4650-4659 - Hongli Liu, Wang Yu, Shengjie Zhao:

CSRN: Cross-Sensor Robust Recognition Network for Multi-modal Aerial View Object Classification. 4660-4666
Workshop on Computer Vision for Microscopy Image Analysis
- Pawel Tomasz Pieta, Peter Winkel Rasmussen, Anders Bjorholm Dahl, Anders Nymark Christensen:

Fast Sphericity and Roundness approximation in 2D and 3D using Local Thickness. 4667-4677 - Uzair Shah, Marco Agus, Daniya Boges, Vanessa Chiappini, Mahmood Alzubaidi, Jens Schneider, Markus Hadwiger, Pierre J. Magistretti, Mowafa S. Househ, Corrado Calì:

SAM4EM: Efficient memory-based two stage prompt-free segment anything model adapter for complex 3D neuroscience electron microscopy stacks. 4678-4687 - Tristan Piater, Björn Barz, Alexander Freytag:

Prompt-Tuning SAM: From Generalist to Specialist with only 2048 Parameters and 16 Training Images. 4688-4698 - Siyavash Shabani, Sahar A. Mohammed, Bahram Parvin:

A Novel 3D Decoder with Weighted and Learnable Triple Attention for 3D Microscopy Image Segmentation. 4699-4708 - Mary Damilola Aiyetigbo, Wanqi Yuan, Feng Luo, Xin Li, Tong Ye, Nianyi Li:

Generalizable Unsupervised Microscopy Video Denoising via Weighted SpatioTemporal Sampling. 4709-4718 - Hao Chen, Julian Najera, Dagmawit Geresu, Meenal Datta, Cody J. Smith, Scott S. Howard:

Zero-Shot Denoising for Fluorescence Lifetime Imaging Microscopy with Intensity-Guided Learning. 4719-4728 - Mina Gachloo, Akhila Nangineedi, Mahsa Partovi, Fardifa Fathmiul Alam, Tzu-Yu Chu, James Schvaneveldt, Xiaoming Lu, Tirthankar Biswas, Marc R. Birtwistle, Federico Iuricich:

Low-Frame-Rate Cell Tracking: Unmet Needs and Future Directions. 4729-4738 - Yaroslav Prytula, Illia Tsiporenko, Ali Zeynalli, Dmytro Fishman:

IAUNet: Instance-Aware U-Net. 4739-4748 - Vedrana Ivezic, Ashwath Radhachandran, Ekaterina Redekop, Shreeram Athreya, Dongwoo Lee, Vivek Sant, Corey W. Arnold, William Speier:

CytoFM: The first cytology foundation model. 4749-4757 - Jesus Dassaef López-Barrios, Miguel Angel Ontiveros-Torres, Jose Antonio Cantoral-Ceballos:

Beyond Neurofibrillary Tangles: Explainable AI for Microscopic Tauopathy Classification in Immunofluorescence Imaging. 4758-4768
Foundation Models for V2X-Based Cooperative Autonomous Driving
- Jannik Lübberstedt, Esteban Rivera, Nico Uhlemann, Markus Lienkamp:

V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving. 4769-4778 - Tonko E. W. Bossen, Andreas Møgelmose, Ross Greer:

Can Vision-Language Models Understand and Interpret Dynamic Gestures from Pedestrians? Pilot Datasets and Exploration Towards Instructive Nonverbal Commands for Cooperative Autonomous Vehicles. 4779-4788 - Johannes Spöcklberger, Wei Lin, Pedro Hermosilla, Sivan Doveh, Horst Possegger, Muhammad Jehanzeb Mirza:

Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation. 4789-4798
Mechanistic Interpretability for Vision
- Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth:

Disentangling Polysemantic Channels in Convolutional Neural Networks. 4799-4803 - André Longon:

Naturally Computed Scale Invariance in the Residual Stream of ResNet18. 4804-4808 - Matthew Bozoukov:

Uncovering Branch-specialization in InceptionV1 using k sparse autoencoders. 4809-4813 - Ashim Dahal, Saydul Akbar Murad, Nick Rahimi:

Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning. 4814-4818 - Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai:

Decoding Vision Transformers: the Diffusion Steering Lens. 4819-4824 - Matthew W. Shinkle, Mark D. Lescroart:

Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization. 4825-4829 - Sophia J. Abraham, Jonathan D. Hauenstein, Walter J. Scheirer:

Wavelet-Based Mechanistic Interpretability of Vision Transformers via Frequency-Aware Ablations. 4830-4834 - Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng:

Analyzing Hierarchical Structure in Vision Models with Sparse Autoencoders. 4835-4839 - Amar Kumar, Anita Kriz, Barak Pertzov, Tal Arbel:

Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging. 4840-4845 - Zahra Tehraninasab, Amar Kumar, Tal Arbel:

Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation. 4846-4851 - Darshana Saravanan, Makarand Tapaswi, Vineet Gandhi:

Investigating Mechanisms for In-Context Vision Language Binding. 4852-4856
5th International Workshop on Event-Based Vision
- Andreu Girbau-Xalabarder, Jun Nagata, Shinichi Sumiyoshi:

Probabilistic Online Event Downsampling. 4857-4865 - Viktor Rudnev, Gereon Fox, Mohamed Elgharib, Christian Theobalt, Vladislav Golyanik:

Dynamic EventNeRF: Reconstructing General Dynamic Scenes from Multi-view RGB and Event Streams. 4866-4876 - Shintaro Shiba, Quan Kong, Norimasa Kobori:

E-VLC: A Real-World Dataset for Event-based Visible Light Communication And Localization. 4877-4886 - Shintaro Shiba, Quan Kong, Norimasa Kobori:

Augmented Reality Applications Using Active Markers With An Event Camera. 4887-4888 - Jonah Sengupta:

Quadrocular, Neuromorphic Stereo Triangulation and Asynchronous Data Fusion for 3D Object Tracking. 4889-4897 - Vincent Brebion, Julien Moreau, Franck Davoine:

DELTA: Dense Depth from Events and LiDAR using Transformer's Attention. 4898-4907 - Juan Luis Valerdi, Xabier Iturbe:

Best Linear Unbiased Estimation for 2D and 3D Flow with Event-based Cameras. 4908-4917 - Ryo Yamaki, Shintaro Shiba, Guillermo Gallego, Yoshimitsu Aoki:

Iterative Event-based Motion Segmentation by Variational Contrast Maximization. 4918-4927 - Michael C. Daugherty, Matthew DiSalvo, Aaron Goldfain, Alexander Peterson, Edward Kwee, Thomas Germer, Gregory Cooksey, Jagat Budhathoki, Peter Bajcsy:

Nanoparticle Diameter Measurements With Event Camera Tracking. 4928-4937 - Pietro Bonazzi, Christian Vogt, Michael Jost, Lyes Khacef, Federico Paredes-Vallés, Michele Magno:

Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone. 4938-4946 - Gabriele Magrini, Federico Becattini, Giovanni Colombo, Pietro Pala:

EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects. 4947-4955 - Laurie Bose, Piotr Dudek:

Demo: Point-Feature Tracking for Pixel Processor Arrays. 4956-4957 - Hugo Bulzomi, Alimatou Sadia Memudu, Yuta Nakano, Jean Martinet:

Real-Time Pedestrian Detection at the Edge on a Fully Asynchronous Neuromorphic System. 4958-4967 - Ziyun Wang, Friedhelm Hamann, Kenneth Chaney, Wen Jiang, Guillermo Gallego, Kostas Daniilidis:

Event-based Continuous Color Video Decompression from Single Frames. 4968-4978 - Fernando Cladera, Kenneth Chaney, Caroline Pritchard, M. Ani Hsieh, Vijay Kumar, Camillo J. Taylor, Kostas Daniilidis:

Looking into the Shadow: Recording a Total Solar Eclipse with High-resolution Event Cameras. 4979-4983 - Sami Arja, Nimrod Kruger, Alexandre Marcireau, Nicholas Owen Ralph, Saeed Afshar, Gregory Cohen:

Seeing like a Cephalopod: Colour Vision with a Monochrome Event Camera. 4984-4993 - Andreas Ziegler, David Joseph, Thomas Gossard, Emil Moldovan, Andreas Zell:

BiasBench: A reproducible benchmark for tuning the biases of event cameras. 4994-5003 - Ignacio G. Bugueño-Córdova, Javier Ruiz-del-Solar, Rodrigo Verschae:

Human-Robot Navigation using Event-based Cameras and Reinforcement Learning. 5004-5012 - Zhipeng Tang, Shifan Zhu, Zezhou Cheng, Donghyun Kim, Erik G. Learned-Miller:

E-BARF: Bundle Adjusting Neural Radiance Fields from a Moving Event Camera. 5013-5022 - Victor Hoffmann, Valentina Cavinato, Kirk Y. W. Scheper:

Live Demonstration: NeuroTouch - A Neuromorphic Vision-based Tactile Sensor for Real-Time Gesture Recognition. 5023-5024 - Ivan Alberico, Marco Cannici, Giovanni Cioffi, Davide Scaramuzza:

Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction. 5025-5034 - Carl Brander, Giovanni Cioffi, Nico Messikommer, Davide Scaramuzza:

Reading in the Dark with Foveated Event Vision. 5035-5043 - Hesam Araghi, Jan van Gemert, Nergis Tomen:

Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling. 5044-5054 - Muhammad Aitsam, Sergio Davies, Alessandro G. Di Nuovo:

Event-Driven Dynamic Attention for Multi-Object Tracking on Neuromorphic Hardware. 5055-5062 - Shrutarv Awasthi, Anas Gouda, Sven Franke, Jérôme Rutinowski, Frank Hoffmann, Moritz Roidl:

MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection. 5063-5071 - Gokul Raju Govinda Raju, Nikola Zubic, Marco Cannici, Davide Scaramuzza:

Perturbed State Space Feature Encoders for Optical Flow with Event Cameras. 5072-5081 - Muhammad Ahmed Humais, Xiaoqian Huang, Hussain M. Sajwani, Sajid Javed, Yahya H. Zweiri:

Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow. 5082-5091 - Marcin Kowalczyk, Kamil Jeziorek, Tomasz Kryjak:

Learning from Noise: Enhancing DNNs for Event-Based Vision through Controlled Noise Injection. 5092-5102 - Piotr Wzorek, Krzysztof Blachut, Kamil Jeziorek, Tomasz Kryjak:

Live Demonstration: Real-time event-data processing with Graph Convolutional Neural Networks and SoC FPGA. 5103-5104 - Kaustav Chanda, Aayush Atul Verma, Arpitsinh Vaghela, Yezhou Yang, Bharatesh Chakravarthi:

Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distance in Latent Space. 5105-5113 - Ning Zhang, Timothy Shea, Arto V. Nurmikko:

Event-based Tracking and Imaging of Randomly Moving Objects in Dense Dynamical Scattering Media. 5114-5125 - Youssef Farah, Federico Paredes-Vallés, Guido de Croon, Muhammad Ahmed Humais, Hussain M. Sajwani, Yahya H. Zweiri:

EV-LayerSegNet: Self-supervised Motion Segmentation using Event Cameras. 5126-5135 - Yuliang Wu, Han Han, Jinze Chen, Wei Zhai, Yang Cao, Zhengjun Zha:

BRAT: Bidirectional Relative Positional Attention Transformer for Event-based Eye tracking. 5136-5144 - Hongxiang Huang, Xiaopeng Lin, Hongwei Ren, Yue Zhou, Bojun Cheng:

Exploring Temporal Dynamics in Event-based Eye Tracker. 5145-5154 - Hoang M. Truong, Vinh-Thuan Ly, Huy G. Tran, Thuan-Phat Nguyen, Tram T. Doan:

Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling. 5155-5163 - Qinyu Chen, Chang Gao, Min Liu, Daniele Perrone, Yan Ru Pei, Zuowen Wang, Zhuo Zou, Shihang Tan, Tao Han, Guorui Lu, Zhen Xu, Junyuan Ding, Ziteng Wang, Zongwei Wu, Han Han, Yuliang Wu, Jinze Chen, Wei Zhai, Yang Cao, Zhengjun Zha, Nuwan Bandara, Thivya Kandappu, Archan Misra, Xiaopeng Lin, Hongxiang Huang, Hongwei Ren, Bojun Cheng, Hoang M. Truong, Vinh-Thuan Ly, Huy G. Tran, Thuan-Phat Nguyen, Tram T. Doan:

Event-based eye tracking. Event-based Vision Workshop 2025. 5164-5176
Exploring the Next Generation of Data
- Qiushi Guo, Shaoxiang Wang, Chun-Peng Chang, Jason R. Rambach:

CACP: Context-Aware Copy-Paste to Enrich Image Content for Data Augmentation. 5177-5186 - Ekaterina Redekop, Mara Pleasure, Vedrana Ivezic, Zichen Wang, Kimberly Flores, Anthony Sisk, William Speier, Corey W. Arnold:

Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data. 5187-5195 - Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua:

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models. 5196-5206
Open-World 3D Scene Understanding with Foundation Models
- Corentin Dumery, Aoxiang Fan, Ren Li, Nicolas Talabot, Pascal Fua:

Enforcing View-Consistency in Class-Agnostic 3D Segmentation Fields. 5207-5216 - Luis Wiedmann, Luca Wiehe, Dávid Rozenberszki:

DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting. 5217-5226 - Yushan Bai, Shaohu Wang, Rongtao Xu, Yuchuang Tong, Chaoran Xu, Zhengtao Zhang:

Segment Any Primitive: Zero-Shot 3D Primitive Segmentation from Point Cloud. 5227-5235 - Hardik Shah, Jiaxu Xing, Nico Messikommer, Boyang Sun, Marc Pollefeys, Davide Scaramuzza:

ForesightNav: Learning Scene Imagination for Efficient Exploration. 5236-5245 - Jens Piekenbrinck, Christian Schmidt, Alexander Hermans, Narunas Vaskevicius, Timm Linder, Bastian Leibe:

OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting. 5246-5255 - Alexander Rusnak, Frédéric Kaplan:

HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering. 5256-5265
ReGenAI: Second Workshop on Responsible Generative AI
- János Horváth:

ECO-AI - Energy-Conscious Optimization for AI Training. 5266-5270 - Yunzhuo Chen, Jordan Vice, Naveed Akhtar, Nur Al Hasan Haldar, Ajmal Mian:

Dynamic watermarks in images generated by diffusion models. 5271-5277 - Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayakh Islam:

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA. 5278-5282
The 7th Workshop on Precognition: Seeing through the Future
- Yuseon Kim, Kyongseok Park:

SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion. 5283-5292 - Sriram Mandalika, Lalitha V, Athira Nambiar:

PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario. 5293-5301 - Huu-Thien Tran, Thanh-Dat Truong, Khoa Luu:

BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models. 5302-5311 - Quan Tran, Hoang-Thien Nguyen, Thanh-Huy Nguyen, Gia-Van To, Tien-Huy Nguyen, Quan Nguyen:

IGL-DT: Iterative Global-Local Feature Learning with Dual-Teacher Semantic Segmentation Framework under Limited Annotation Scheme. 5312-5321 - Tran Quoc Khanh Le, Nguyen Lan Vi Vu, Ha-Hieu Pham, Xuan-Loc Huynh, Tien-Huy Nguyen, Minh Huu Nhat Le, Quan Nguyen, Hien D. Nguyen:

HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation. 5322-5331
The 6th Annual International Workshop And Prize Challenge on Agriculture-Vision: Challenges and Opportunities for Computer Vision in Agriculture
- Rajhans Singh, Rafael Bidese-Puhl, Kshitiz Dhakal, Sudhir Sornapudi:

Few-Shot Adaptation of Grounding DINO for Agricultural Domain. 5332-5342 - Luca Giovannesi, Paolo Russo, Roberto Beraldi:

Vit4V: a Video Classification Method for the Detection of Varroa Destructor from Honeybees. 5343-5351 - Bradley Ezard, Ling Li, Senjian An:

Multiple Instance Learning for Visual Grain Quality Analysis Without Instance-level Annotation. 5352-5359 - Daiwei Zhang, Joaquin Gajardo, Tomislav Medic, Isinsu Katircioglu, Mike Boss, Norbert Kirchgeßner, Achim Walter, Lukas Roth:

Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting. 5360-5370 - Nitin Rai, Arnold W. Schumann, Nathan Boyd:

PhytoSynth: Leveraging Multi-modal Generative Model for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach. 5371-5380 - Tieqiao Wang, Abhinav Jain, Liqiang He, Cindy Grimm, Sinisa Todorovic:

A Dataset for Semantic and Instance Segmentation of Modern Fruit Orchards. 5381-5391 - Earl Ranario, Lars Lundqvist, Heesup Yun, Brian N. Bailey, J. Mason Earles:

AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification. 5392-5401 - Pedro Cisdeli, Gustavo Nocera Santiago, German Mandrini, Ignacio Antonio Ciampitti:

Maize ear sensing for on-farm yield predictions. 5402-5411 - Keyhan Najafian, Farhad Maleki, Lingling Jin, Ian Stavness:

A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation. 5412-5421 - Numair Nadeem, Muhammad Hamza Asad, Saeed Anwar, Abdul Bais:

MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking. 5422-5432 - Gerardus Croonen, Andreas Trondl, Julia Simon, Daniel Steininger:

SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets. 5433-5442 - Laura von Hirschhausen, Jannes S. Magnusson, Mykyta Kovalenko, Fredrik Boye, Tanay Rawat, Peter Eisert, Anna Hilsmann, Sebastian Pretzsch, Sebastian Bosse:

AppleGrowthVision: A large-scale stereo dataset for phenological analysis, fruit detection, and 3D reconstruction in apple orchards. 5443-5450 - Nazifa Azam Khan, Mikolaj Cieslak, Mark G. Eramian, Ian McQuillan:

Effectiveness of Training with Procedurally Generated Synthetic Images of Crop Plants. 5451-5461 - Manuel Knott, Divinefavour Odion, Sameer Sontakke, Anup Karwa, Thijs Defraeye:

Weakly Supervised Panoptic Segmentation for Defect-Based Grading of Fresh Produce. 5462-5471 - Kibon Ku, Talukder Z. Jubery, Elijah Rodriguez, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy, Baskar Ganapathysubramanian:

SC-NeRF: NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications. 5472-5481 - Amir Ehsan Niaraki Asli, Jansel Herrera-Gerena, Jeremy Roghair, Ali Jannesari:

Maximizing aerial detection of organic objects in non-exhaustively searchable survey area. 5482-5490 - Hamid Kamangir, Mona Hajiesmaeeli, J. Mason Earles:

California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops. 5491-5500 - Naitik Jain, Amogh Joshi, Mason Earles:

iNatAg: Multi-Class Classification Models Enabled by a Large-Scale Benchmark Dataset with 4.7M Images of 2,959 Crop and Weed Species. 5501-5510 - Md Jaber Al Nahian, Tapotosh Ghosh, Farnaz Sheikhi, Farhad Maleki:

Agri-FM+: A Self-Supervised Foundation Model for Agricultural Vision. 5511-5523 - Abdellah Lakhssassi, Toqi Tahamid Sarker, Khaled R. Ahmed, Naoufal Lakhssassi, Khalid Meksem:

SoyStageNet: Balancing Accuracy and Efficiency for Real-Time Soybean Growth Stage Detection. 5524-5533 - Tushar Shinde:

An Efficient and Scalable Framework for Lightweight Crop Disease Recognition in Low-Resource Settings. 5534-5541
8th Workshop and Competition on Affective & Behavior Analysis in-the-wild
- Dominick Reilly, Srijita Das, Srijan Das:

Learning Pose-aware Representations in Vision Transformers for Understanding Activities of Daily Living. 5542-5551 - Mohd Aquib, Nishchal K. Verma, M. Jaleel Akhtar:

Decoupling Identity Confounders for Enhanced Facial Expression Recognition: An Information-Theoretic Approach. 5552-5561 - Sarosij Bose, Hannah Dela Cruz, Arindam Dutta, Elena Kokkoni, Konstantinos Karydis, Amit K. Roy-Chowdhury:

Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation. 5562-5571 - Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Eric Granger, Marco Pedersoli, Simon Bacon, Alice Baird, Chris Gagne, Chunchang Shao, Guanyu Hu, Soufiane Belharbi, Muhammad Haseeb Aslam:

Advancements in Affective and Behavior Analysis: The 8th ABAW Workshop and Competition. 5572-5583 - Hatef Otroshi-Shahreza, Anjith George, Sébastien Marcel:

Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model. 5584-5593 - Ujjal Kr Dutta, Guan-Ming Su:

DAF: Distillation, Augmentation and Filtering based Framework for Efficient Smartphone Human Activity Recognition. 5594-5602 - Ziruo Li, Chi Xu, Xiang Li, Shuqiong Wu, Yasushi Yagi:

Is Multi-Person Gait Recognition Feasible under Mutual Occlusion? A Human Model Regression-based Approach. 5603-5613 - Ankit Birla, Akshay Agarwal:

Advancing Facial Age Progression for Occluded Faces. 5614-5622 - Ahmed S. Abdelrahman, Mohamed A. Abdel-Aty, Quoc Dai Tran:

VRU-CIPI: Crossing Intention Prediction at Intersections for Improving Vulnerable Road Users Safety. 5623-5632 - SangEun Lee, Yubeen Lee, Eunil Park:

EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis. 5633-5642 - JunGyu Lee, Kunyoung Lee, Haesol Park, Ig-Jae Kim, Gi Pyo Nam:

V-NAW: Video-based Noise-aware Adaptive Weighting for Facial Expression Recognition. 5643-5650 - Yuheng Liang, Zheyu Wang, Feng Liu, Mingzhou Liu, Yu Yao:

Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space. 5651-5656 - Helen Schneider, Svetlana Pavlitska, Helen Gremmelmaier, Marius Zöllner:

Datasets for Valence and Arousal Inference: A Survey. 5657-5664 - Josep Cabacas-Maso, Elena Ortega-Beltrán, Ismael Benito-Altamirano, Carles Ventura:

Enhancing Facial Expression Recognition with LSTM through Dual-Direction Attention Mixed Feature Networks and CLIP. 5665-5671 - João Alves, Pia Haubro Andersen, Rikke Gade:

Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment. 5672-5680 - Hajer Guerdelli, Claudio Ferrari, Stefano Berretti, Alberto Del Bimbo:

Multimodal Emotion Prediction in Interpersonal Videos Integrating Facial and Speech Cues. 5681-5690 - Jiho Choi, Sang Jun Lee:

MMDrive: Multi-modal Remote Physiological Signal Measurement Dataset for Driver Status Monitoring. 5691-5698 - Jun Yu, Yongqi Wang, Lei Wang, Yang Zheng, Shengfan Xu:

Interactive Multimodal Framework with Temporal Modeling for Emotion Recognition. 5699-5706 - Jun Yu, Yang Zheng, Lei Wang, Yongqi Wang, Shengfan Xu:

Cross-Modal Facial Expression Recognition with Global Channel-Spatial Attention: Modal Enhancement and Proportional Criterion Fusion. 5707-5714 - Tobias Hallmen, Robin-Nico Kampa, Fabian Deuser, Norbert Oswald, Elisabeth André:

Semantic Matters: Multimodal Features for Affective Analysis. 5715-5724 - Jun Yu, Yunxiang Zhang, Fengzhao Sun, Leilei Wang, Renjie Lu, Lingsi Zhu, Xilong Lu, Yang Zheng, Yongqi Wang:

Towards Robust Multimodal AU Detection: STN-Enhanced Visual Encoding and Audio-Visual Spatial-Temporal Alignment. 5725-5732 - Jun Yu, Lingsi Zhu, Yanjun Chi, Yunxiang Zhang, Yang Zhen, Yongqi Wang, Xilong Lu:

Dual-Stage Cross-Modal Network with Dynamic Feature Fusion for Emotional Mimicry Intensity Estimation. 5733-5740 - Gnana Praveen Rajasekhar, Jahangir Alam, Eric Charton:

United we stand, Divided we fall: Handling Weak Complementarity for Audio-Visual Emotion Recognition in Valence-Arousal Space. 5741-5751 - Narges Rashvand, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Babak Rahimi Ardabili, Hamed Tabkhi:

Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose. 5752-5761 - Chao Yuan, Tianyi Zhang, Guanglin Niu:

Neighbor-Based Feature and Index Enhancement for Person Re-Identification. 5762-5769 - Xilong Lu, Jun Yu, Yunxiang Zhang, Lingsi Zhu, Yang Zheng, Yongqi Wang, Qiang Ling:

Robust Stage-Wise LVLM Adaptation: Multi-Phase Prompt Lora Fine-tuning for Compound Expression Recognition. 5770-5777 - Andrey V. Savchenko, Lyudmila V. Savchenko:

Leveraging Lightweight Facial Models and Textual Modality in Audio-visual Emotional Understanding in-the-Wild. 5778-5788 - Vrushank Ahire, Kunal Shah, Mudasir Nazir Khan, Nikhil Pakhale, Lownish Rai Sookha, Mudasir Ahmad Ganaie, Abhinav Dhall:

MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network. 5789-5799 - Mohammadmahdi Honarmand, Onur Cezmi Mutlu, Parnian Azizian, Saimourya Surabhi, Dennis P. Wall:

Selective Test-time Domain Adaptation Using Fisher Information for Robust Facial Expression Recognition In-the-wild. 5800-5810 - Aviral Chharia, Tianyu Ren, Tomotake Furuhata, Kenji Shimada:

Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task. 5811-5820
11th IEEE International Workshop on Computer Vision in Sports
- Thomas Gossard, Andreas Ziegler, Andreas Zell:

TT3D: Table Tennis 3D Reconstruction. 5821-5831 - Hossein Feizollah Zadeh Khoiee, David R. Labbé, Thomas Romeas, Jocelyn Faubert, Sheldon Andrews:

Multi-person Physics-based Pose Estimation for Combat Sports. 5832-5841 - Daniel Kienzle, Robin Schön, Rainer Lienhart, Shin'ichi Satoh:

Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer. 5842-5851 - Katja Ludwig, Yuliia Oksymets, Robin Schön, Daniel Kienzle, Rainer Lienhart:

Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations. 5852-5861 - Hoang Quoc Nguyen, Ankhzaya Jamsrandorj, Vanyi Chao, Yin May Oo, Muhammad Amrulloh Robbani, Kyung-Ryoul Mun, Jinwook Kim:

VNL-STES: A Benchmark Dataset and Model for Spatiotemporal Event Spotting in Volleyball Analytics. 5862-5871 - Katja Ludwig, Julian Lorenz, Daniel Kienzle, Tuan Bui, Rainer Lienhart:

Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes. 5872-5881 - Victor Gaspar, Anthony Cioppa, Jan Held, Silvio Giancola, Marc Braham, Adrien Deliège, Bernard Ghanem, Marc Van Droogenbroeck:

An End-to-End Pipeline for Virtual Banner Replacement in Football Broadcasts. 5882-5893 - Hong-Qi Chen, Chao-Chi Liao, Yuan-Heng Sun, Cheng-Kuan Lin, Yu-Chee Tseng:

FieldMOT: A Field-Registered Multi-Object Tracking for Sports Videos. 5894-5904 - Fengshun Wang, Qiurui Wang, Dan Chen:

From Beats to Scores: A Multi-Modal Framework for Comprehensive Figure Skating Assessment. 5905-5914 - Christian Keilstrup Ingwersen, Rasmus Tirsgaard, Rasmus Nylander, Janus Nørtoft Jensen, Anders Bjorholm Dahl, Morten Rieger Hannemose:

Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency. 5915-5925 - Tzu-Chun Chiu, Ming-Han Lee, Kun-Ru Wu, Yu-Shuen Wang, Yu-Chee Tseng:

Virtual Pose Coach: A Motion-Retargeting Approach for Pose Training. 5926-5934 - Yin May Oo, Ankhzaya Jamsrandorj, Vanyi Chao, Hoang Quoc Nguyen, Yewon Hwang, Kyung-Ryoul Mun, Jinwook Kim:

Jump-Aware: Player Position Rectification and Identification in Dynamic Sports Using Jump Event Spotting. 5935-5944 - Calvin Yeung, Tomohiro Suzuki, Ryota Tanaka, Zhuoer Yin, Keisuke Fujii:

AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements. 5945-5956 - Amadou S. Sangare, Adrien Maglo, Baptiste Engel, Mohamed Chaouch:

Towards fine-grained spatial control for soccer game image generation. 5957-5966 - Yizhou Xu, Lars Bretzner, Tiesheng Wang, Atsuto Maki:

Skor-xG: SKeleton-ORiented Expected Goal Estimation in Soccer. 5967-5977 - Marc Gutiérrez-Pérez, Antonio Agudo:

SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding. 5978-5987 - Seunghyeon Jung, Seoyoung Hong, Jiwoo Jeong, Seungwon Jeong, Jaerim Choi, Hoki Kim, Woojin Lee:

CaddieSet: A Golf Swing Dataset with Human Joint Features and Ball Information. 5988-5996 - Liang Fan, Xiaoqian Liu, Malcolm Roberts:

Sport Field Calibration with NeRF-guided Camera Optimization from a Single Image. 5997-6006 - Lorenza Prospero, Abdullah Hamdi, João F. Henriques, Christian Rupprecht:

GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers. 6007-6017 - Ruisheng Han, Kanglei Zhou, Amir Atapour-Abarghouei, Xiaohui Liang, Hubert P. H. Shum:

FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment. 6018-6027 - Vladimir Golovkin, Nikolay Nemtsev, Vasyl Shandyba, Oleg Udin, Nikita Kasatkin, Pavel Kononov, Anton Afanasiev, Sergey Ulasen, Andrei Boiarov:

From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction. 6028-6038 - Tomasz Stanczyk, Seongro Yoon, François Brémond:

No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond. 6039-6048 - Yamato Hokari, Ryosuke Hori, Hideo Saito:

Human Mesh Reconstruction of Sports Players with Multiple Dynamic Cameras. 6049-6059 - Liam Salass, Jerrin Bright, Amir Nazemi, Yuhao Chen, John S. Zelek, David A. Clausi:

Ice Hockey Puck Localization Using Contextual Cues. 6060-6069 - Floriane Magera, Thomas Hoyoux, Martin Castin, Olivier Barnich, Anthony Cioppa, Marc Van Droogenbroeck:

Can Geometry Save Central Views for Sports Field Registration? 6070-6079 - Mohamad Dalal, Artur Xarles, Anthony Cioppa, Silvio Giancola, Marc Van Droogenbroeck, Bernard Ghanem, Albert Clapés, Sergio Escalera, Thomas B. Moeslund:

Action Anticipation from SoccerNet Football Video Broadcasts. 6080-6091 - Bhat Dittakavi, Swarnim Maheshwari, Vineeth N. Balasubramanian:

Pose-to-Pose: A New Task and Benchmark for Human Pose Transition in Yoga. 6092-6101 - Lukasz Grad:

Single-Stage Uncertainty-Aware Jersey Number Recognition in Soccer. 6102-6110 - Tiancheng Jiang, Henry Wang, Md Sirajus Salekin, Parmida Atighehchian, Shinan Zhang:

Domain Adaptation of VLM for Soccer Video Understanding. 6111-6121 - Puntawat Ponglertnapakorn, Supasorn Suwajanakorn:

Where Is The Ball: 3D Ball Trajectory Estimation From 2D Monocular Tracking. 6122-6131 - Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés:

Action Valuation in Sports: A Survey. 6132-6142 - Dheeraj Khanna, Jerrin Bright, Yuhao Chen, John S. Zelek:

SportMamba: Adaptive Non-Linear Multi-Object Tracking with State Space Models for Team Sports. 6143-6153 - Qi Gan, Sao Mai Nguyen, Eric Fenaux, Stéphan Clémençon, Mounim A. El-Yacoubi:

Polar Coordinate-Based 2D Pose Prior with Neural Distance Field. 6154-6162 - Anna Maschek, David C. Schedl:

The Way Up: A Dataset for Hold Usage Detection in Sport Climbing. 6163-6171 - Muhammad Saif Ullah Khan, Stephan Krauß, Didier Stricker:

Towards Unconstrained 2D Pose Estimation of the Human Spine. 6172-6181
4th Monocular Depth Estimation Challenge
- Anton Obukhov, Matteo Poggi, Fabio Tosi, Ripudaman Singh Arora, Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden, Shuaihang Wang, Zhenxin Ma, Weijie Chen, Baobei Xu, Fengyu Sun, Di Xie, Jiang Zhu, Mykola Lavreniuk, Haining Guan, Qun Wu, Yupei Zeng, Chao Lu, Huanran Wang, GuangYuan Zhou, Haotian Zhang, Jianxiong Wang, Qiang Rao, Chunjie Wang, Xiao Liu, Zhiqiang Lou, Hualie Jiang, Yihao Chen, Rui Xu, Minglang Tan, Zihan Qin, Yifan Mao, Jiayang Liu, Jialei Xu, Yifan Yang, Wenbo Zhao, Junjun Jiang, Xianming Liu, Mingshuai Zhao, Anlong Ming, Wu Chen, Feng Xue, Mengying Yu, Shida Gao, Xiangfeng Wang, Gbenga Omotara, Ramy Farag, Jacket Demby's, Seyed Mohamad Ali Tousi, Guilherme N. DeSouza, Tuan-Anh Yang, Minh-Quang Nguyen, Thien-Phuc Tran, Albert Luginov, Muhammad Shahzad:

The Fourth Monocular Depth Estimation Challenge. 6182-6195
AI for Creative Visual Content Generation, Editing and Understanding
- Amin Fadaeinejad, Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amaury Depierre, Nikolaus F. Troje, Marcus A. Brubaker, Marc-André Carbonneau:

Geometry-Aware Texture Generation for 3D Head Modeling with Artist-driven Control. 6196-6206 - Aupendu Kar, Guan-Ming Su:

Temporal Consistent Semantic Video Color Transfer from Multiple References. 6207-6215 - Dongchao Wen, Zijian Chen, Weihong Deng, Yujiang Tian, Hongzhi Shi, Yingjie Zhang, Xingchen Cui, Jian Zhao, Lingyan Liang, Mei Wang:

Semantic-Aware Local Image Editing with a Single Mask Operation. 6216-6225 - Mathis Koroglu, Hugo Caselles-Dupré, Guillaume Jeanneret, Matthieu Cord:

OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models. 6226-6236 - Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan:

Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis. 6237-6246 - Nityanand Mathur, Shyam Marjit, Abhra Chaudhuri, Anjan Dutta:

CLIPDraw++: Text-to-Sketch Synthesis with Simple Primitives. 6247-6256 - Alessio Borgi, Luca Maiano, Irene Amerini:

Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation. 6257-6266 - Ruihan Zhang, Borou Yu, Jiajian Min, Yetong Xin, Zheng Wei, Juncheng Nemo Shi, Mingzhen Huang, Xianghao Kong, Nix Liu Xin, Shanshan Jiang, Praagya Bahuguna, Mark Chan, Khushi Hora, Lijian Yang, Yongqi Liang, Runhe Bian, Yunlei Liu, Isabela Campillo Valencia, Patricia Morales Tredinick, Ilia Kozlov, Sijia Jiang, Peiwen Huang, Na Chen, Xuanxuan Liu, Anyi Rao:

Generative AI for Film Creation: A Survey of Recent Advances. 6267-6279 - Alan Dolhasz, Chen Ma, Dave Gausebeck, Kevin Chen, Gregor Miller, Lucas Hayne, Gunnar Hovden, Azwad Sabik, Olaf Brandt, Mira Slavcheva:

Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh. 6280-6290 - Mo Zhou, Josh Myers-Dean, Danna Gurari:

PartStickers: Generating Parts of Objects for Rapid Prototyping. 6291-6301 - Anne-Sofie Maerten, Li-Wei Chen, Stefanie De Winter, Christophe Bossens, Johan Wagemans:

LAPIS: A novel dataset for personalized image aesthetic assessment. 6302-6311 - Anurag Dutta, Arnab Kumar Das, Ruchira Naskar, Rajat Subhra Chakraborty:

WaveDIF: Wavelet sub-band based Deepfake Identification in Frequency Domain. 6312-6321 - Desai Xie, Zhan Xu, Yicong Hong, Hao Tan, Difan Liu, Feng Liu, Arie E. Kaufman, Yang Zhou:

Progressive Autoregressive Video Diffusion Models. 6322-6332 - Masud An Nur Islam Fahim, Nazmus Saqib, Jani Boutellier:

STAM: Zero-Shot Style Transfer using Diffusion Model via Attention Modulation. 6333-6343 - Matthew Poska, Sharon X. Huang, Bin Hwang:

HopNet: Harmonizing Object Placement Network for Realistic Image Generation via Object Composition. 6344-6354
2nd Workshop on Efficient and On-Device Generation (EDGE)
- Jiuqiang Tang, Raman Sorokin, Ekaterina Ignasheva, Grant Jensen, Lin Chen, Juhyun Lee, Andrei Kulik, Matthias Grundmann:

Scaling On-Device GPU Inference for Large Generative Models. 6355-6364 - Elia Peruzzo, Adil Karjauv, Nicu Sebe, Amir Ghodrati, AmirHossein Habibian:

ADAPTOR: Adaptive Token Reduction for Video Diffusion Transformers. 6365-6371 - Weiyun Jiang, Devendra K. Jangid, Seok-Jun Lee, Hamid R. Sheikh:

Latent Patched Efficient Diffusion Model For High Resolution Image Synthesis. 6372-6378 - Mahsa Ardakani, Jinendra Malekar, Ramtin Zand:

LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi. 6379-6388 - Chaitanya Patel, Juan Carlos Niebles, Ehsan Adeli:

AdaVid: Adaptive Video-Language Pretraining. 6389-6398 - Josef Bengtson, David Nilsson, Fredrik Kahl:

Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models. 6399-6408
Domain Generalization: Evolution, Breakthroughs, and Future Horizons
- Xiaobing Yu, Jin Yang, Xiao Wu, Peijie Qiu, Xiaofeng Liu:

FM-LoRA: Factorized Low-Rank Meta-Prompting for Continual Learning. 6409-6418 - Rangel Daroya, Deepak Chandran, Subhransu Maji, Andrea Fanelli:

T-SAM: Transductive Learning for Segment Anything Model. 6419-6428 - Yusaku Takama, Ning Ding, Tatsuya Yokota, Toru Tamaki:

Separating Shared and Domain-Specific LoRAs for Multi-Domain Learning. 6429-6437 - Ashish Singh, Michael Jones, Kuan-Chuan Peng, Anoop Cherian, Moitreya Chatterjee, Erik G. Learned-Miller:

Improving Open-World Object Localization by Discovering Background. 6438-6447 - Jia Wei, Xiaoqi Zhao, Jonghye Woo, Jinsong Ouyang, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu:

Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation. 6448-6458 - Reiji Saito, Kazuhiro Hotta:

Domain Generalization through Attenuation of Domain-Specific Information. 6459-6468 - Taero Kim, Seonggyun Lee, Joonseong Kang, Youngjun Choi, Wonsang Yun, Nicole Hee-Yeon Kim, Ziyu Chen, Lexing Xie, Kyungwoo Song:

IMC: A Benchmark for Invariant Learning under Multiple Causes. 6469-6478 - Renu Sharma, Debasmita Pal, Arun Ross:

Task-conditioned Ensemble of Expert Models for Continuous Learning. 6479-6488 - Kristi Topollai, Anna Choromanska:

Task-Level Contrastiveness for Cross-Domain Few-Shot Learning. 6489-6499 - Rajat Sahay, Andreas E. Savakis:

MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model. 6500-6510 - Behraj Khan, Rizwan Qureshi, Nouman M. Durrani, Tahir Qasim Syed:

Confidence-calibrated covariate shift correction for few-shot classification in Vision-Language Models. 6511-6523 - Syed Bilal Ahsan, Muhammad Ikhalas, Muhammad Muzamil Khan, Sana Ullah, Muhammad Zaigham Zaheer:

ARDGen: Augmentation Regularization for Domain-Generalized Medical Report Generation. 6524-6533 - Aniruddh Sikdar, Arya Kishor, Ishika Kadam, Suresh Sundaram:

PiCaZo: Pixel-Aligned Contrastive Learning for Zero-Shot Domain Adaptation. 6534-6544 - Agil Aghasanli, Yi Li, Plamen Angelov:

Prototype-Based Continual Learning with Label-free Replay Buffer and Cluster Preservation Loss. 6545-6554 - Manjunath D, Shrikar Madhu, Aniruddh Sikdar, Suresh Sundaram:

VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation. 6555-6563 - Trisanth Srinivasan, Santosh V. Patapati:

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications. 6564-6572
Catch UAVs that Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle (UAV) in the Wild and the 4th Anti-UAV Workshop & Challenge
- Yu-Hsi Chen:

Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID. 6573-6582 - Haolin Qin, Tianhao Li, Tingfa Xu, Jingxuan Xu, Yuqiang Fang, Jianan Li:

PPTracker: Tracking UAV Swarms with Prior Prompt. 6583-6590 - Ihsan Emre Üstün, Cevahir Çigla:

The Power of Augmentations in IR Object Detection. 6591-6600 - Wenzhen Wang, Jing Fu, Jiayi Song, Kaiyu Li, Hui Qiao, Jiang Liu, Hao Sun, Xiangyong Cao:

Dist-Tracker: A Small Object-aware Detector and Tracker for UAV Tracking. 6601-6609 - Shimou Ling, Shengkai Gan, Caoxin Wang, Lili Pan, Hongliang Li:

Enhancing Few-Shot Class-Incremental Learning via Frozen Feature Augmentation. 6610-6618 - Xiaolong Cui, Liu Wan, Lingqi Kong, Jimin Li, Chaojie Zhang, Ruohan Zhao, Panlong Wu, Shan He:

StrongSiamTracker: A Siamese Tracker with Dynamic Global Detection for Robust Anti-UAV Tracking. 6619-6629 - Chenxu Peng, Chenxu Wang, Minrui Zou, Danyang Li, Zhengpeng Yang, Yimian Dai, Ming-Ming Cheng, Xiang Li:

A Simple Detector with Frame Dynamics is a Strong Tracker. 6630-6640 - Jiahao Zhang, Yixin Wei, Jinli Zhang, Zongli Jiang, Peiwen Yu, Yufei Ma, Runan Jin:

DLST: Dual-Template Co-Evolution Learning for Robust Long-Term Drone Tracking in Dynamic Environments. 6641-6649 - Erik Tegler, Max Modig, Per Skarin, Kalle Åström, Magnus Oskarsson, Gabrielle Flood:

Detection and Localization of Drones and UAVs Using Sound and Vision. 6650-6658 - Yifei Dong, Fengyi Wu, Sanjian Zhang, Guangyu Chen, Yuzhi Hu, Masumi Yano, Jingdong Sun, Siyu Huang, Feng Liu, Qi Dai, Zhi-Qi Cheng:

Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions. 6659-6673
