The 3rd practical end-to-end image/video compression challenge
1. Challenge motivation and description
End-to-end image/video compression has been a research focus for both academia and industry for over seven years. A number of technologies have been developed, such as auto-encoder networks, probability estimation networks, and conditional end-to-end video coding frameworks. Recently, the performance of both end-to-end image and video compression schemes has surpassed that of H.266/Versatile Video Coding (VVC) under certain test conditions. To promote practical use, we believe it is time to consider the complexity of end-to-end image/video compression schemes, especially the decoding complexity.
This challenge calls for novel end-to-end image/video compression algorithms that achieve good rate-distortion (R-D) performance under certain complexity constraints. As in last year's challenge, we set a weight in the quality metric to balance performance and decoding complexity for both the end-to-end image and video compression tracks. In addition, we add another track that constrains the kMAC/pixel for more hardware-friendly solutions. Another important feature of practical end-to-end image/video compression solutions is cross-platform consistency. Therefore, this year, the submitted bitstreams must be decoded and reconstructed successfully on the platform provided by the organizers.
Participants are required to compress all images/videos defined in the test dataset. In the end-to-end image compression track, the actual bits per pixel (bpp) must not exceed a target bpp, which is set to the bpp of the test image coded by BPG with quantization parameter 28. In the end-to-end video compression track, the actual bitrate must not exceed a target bitrate (kbps), which is set to the bitrate of the test sequence coded by VTM with quantization parameter 27 under the random access configuration.
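The rate constraints above can be verified directly from the bitstream size. A minimal sketch (file names, resolutions, and anchor values in the comment are hypothetical; the organizers' exact measurement may differ):

```python
import os

def bits_per_pixel(bitstream_path, width, height):
    """Actual bpp of a compressed image: total bits / number of pixels."""
    return os.path.getsize(bitstream_path) * 8 / (width * height)

def bitrate_kbps(bitstream_path, num_frames, fps):
    """Actual bitrate of a compressed video sequence in kbps."""
    duration_s = num_frames / fps
    return os.path.getsize(bitstream_path) * 8 / duration_s / 1000.0

# Hypothetical usage: a submission is valid only if the actual rate does
# not exceed the anchor (BPG QP=28 for images, VTM QP=27 for video), e.g.
#   assert bits_per_pixel("I01.bin", 3840, 2160) <= target_bpp
```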
Detailed information on each track is provided below.
2. End-to-end image compression track
2.1 Dataset
Training and Validation Dataset: A collection of about 1600 high-resolution images will be provided as the training and validation dataset. Participants are free to split the provided images into training and validation sets, and are also free to use other datasets for training and validation.
Test Dataset: 20 images at 4K resolution will be used for evaluation. All images will be in the RGB color space and PNG file format. These images will be distributed to all participants before a certain date, and participants are required to compress them within 72 hours.
2.2 Evaluation metrics
2.2.1 Track 1
The performance Q will be evaluated by a weighted sum of the delta PSNR and the decoding complexity,
Q = w × ΔPSNR - dTime
where PSNR is calculated as the average PSNR of the R, G, and B components, and ΔPSNR is obtained by subtracting the PSNR of BPG from that of the proposed method. dTime is the time in seconds used for neural network model loading, entropy decoding, and image reconstruction on a GeForce RTX 4090 GPU provided by the organizers; submitted methods are therefore required to decode successfully on this GPU. w is set to 1 to achieve a good balance between performance and decoding complexity.
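As an illustration, the Track 1 score can be computed as follows; the PSNR and timing numbers in the comment are hypothetical:

```python
def challenge_score(psnr_proposed, psnr_anchor, decode_time_s, w=1.0):
    """Track 1 score: Q = w * dPSNR - dTime.

    psnr_proposed / psnr_anchor: average RGB PSNR (dB) of the proposed
    method and of the BPG anchor; decode_time_s: seconds spent on model
    loading, entropy decoding, and reconstruction on the RTX 4090.
    """
    delta_psnr = psnr_proposed - psnr_anchor
    return w * delta_psnr - decode_time_s

# Hypothetical numbers: a 1.5 dB gain over the anchor that costs
# 0.8 s to decode scores 1.5 - 0.8 = 0.7 with w = 1.
```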
2.2.2 Track 2
The performance Q is evaluated by the PSNR, calculated as the average PSNR of the R, G, and B components. The decoder complexity shall not exceed 100 kMAC/pixel.
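As a rough illustration of the 100 kMAC/pixel budget: a convolutional layer costs in_ch × out_ch × k² MACs per pixel at its working resolution, and dividing by the cumulative downsampling factor (squared) maps this back to input-image pixels. A sketch (the exact accounting used by the organizers may differ):

```python
def conv_kmac_per_pixel(in_ch, out_ch, kernel, downsample=1):
    """Approximate kMAC per input-image pixel for one conv layer.

    Each pixel at the layer's working resolution costs
    in_ch * out_ch * kernel^2 MACs; dividing by the cumulative
    downsampling factor squared maps this back to input pixels.
    """
    macs = in_ch * out_ch * kernel * kernel
    return macs / (downsample ** 2) / 1000.0

# A single 3x3 conv with 192 input and 192 output channels at full
# resolution costs 192 * 192 * 9 / 1000 = 331.776 kMAC/pixel, already
# far above the 100 kMAC/pixel budget, so Track 2 decoders must keep
# heavy layers at downsampled resolutions or use fewer channels.
```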
2.3 Submission requirements
Participants are requested to submit a decoder along with a docker environment and the corresponding script that runs the decoder.
Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like I01.bin.
Participants are requested to submit the decoded images. The decoded images shall be named like I01dec.png.
3. End-to-end video compression track
3.1 Dataset
Training and Validation Dataset: It is recommended to use the UVG and CDVL datasets for training. Participants are free to split the provided videos into training and validation sets, and are also free to use other datasets for training and validation.
Test Dataset: 10 video sequences at 1080p resolution will be used for evaluation. Each sequence contains 96 frames. All sequences will be in the YUV 4:2:0 color space. These video sequences will be distributed to all participants before a certain date, and participants are required to compress them within 72 hours.
3.2 Evaluation metrics
The decoded video sequences will be evaluated in the YUV 4:2:0 color space. The weighted average PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8 of the Y, U, and V components will be used to evaluate the distortion of the decoded video sequences. An anchor of VTM-17.0 coded with QP = 27 under the random access configuration defined in the VTM common test conditions (encoder_randomaccess_vtm.cfg) will be provided. The actual bitrate (kbps) of the bitstream of each video sequence must not exceed the target kbps of the test video coded by the anchor. The intra period of the proposed submission shall be no larger than that used by the anchor.
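The weighted YUV PSNR above can be computed per frame as follows, a minimal NumPy sketch (plane shapes follow 4:2:0 subsampling, so the U and V planes are half the Y resolution in each dimension):

```python
import numpy as np

def psnr(ref, dec, max_val=255.0):
    """PSNR (dB) between two planes of the same shape."""
    mse = np.mean((ref.astype(np.float64) - dec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def weighted_yuv_psnr(ref_y, dec_y, ref_u, dec_u, ref_v, dec_v):
    """Weighted PSNR = (6 * PSNR_Y + PSNR_U + PSNR_V) / 8."""
    return (6.0 * psnr(ref_y, dec_y)
            + psnr(ref_u, dec_u)
            + psnr(ref_v, dec_v)) / 8.0
```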
3.2.1 Track 1
The performance Q will be evaluated by a weighted sum of the delta PSNR and the decoding complexity,
Q = w × ΔPSNR - dTime
where ΔPSNR is calculated by subtracting the PSNR of VTM from that of the proposed method. dTime is the average decoding time per frame, in seconds, used for entropy decoding and video reconstruction on a GeForce RTX 4090 GPU provided by the organizers; submitted methods are therefore required to decode successfully on this GPU. w is set to 1 to provide a good balance between complexity and performance.
3.2.2 Track 2
The performance Q is evaluated by the weighted PSNR of the Y, U, and V components, PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8. The decoder complexity shall not exceed 100 kMAC/pixel.
3.3 Submission requirements
Participants are requested to submit a decoder along with a docker environment and the corresponding script that runs the decoder.
Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like V01.bin.
Participants are requested to submit the decoded video sequences. The decoded video sequences shall be named like V01dec.yuv.
4. Deadlines
Jul. 1: registration for the competition. Teams can register by sending the team name, team members, and institution to lil1@ustc.edu.cn or cmjia@pku.edu.cn
Jul. 1: release of the training and validation dataset
Jul. 31: deadline of the challenge paper submission
Aug. 10: notification of the challenge paper acceptance
Aug. 15: submission of the camera-ready paper
Sept. 1: submission of the decoder and docker environment
Sept. 2: release of the test dataset
Sept. 6: submission of the compressed bitstreams and decoded images/videos.
Sept. 15: winners and leader boards notification.
Oct. 2-4: challenge session at the MMSP 2024 conference. The winners will receive winner certificates provided by the MMSP organization committee. Selected teams will be invited to present at the conference.
5. Organizers
Li Li, University of Science and Technology of China
Chuanmin Jia, Peking University
For any inquiries, please email us at: lil1@ustc.edu.cn; cmjia@pku.edu.cn
6. Sponsorship
This challenge is sponsored by Shanghai Shuangshen Information Technology Co., Ltd (ATTRSense), with a sponsorship of $500 for the winner of each track. ATTRSense is a company targeting "AI for Codec and Codec for AI". A brief introduction of ATTRSense follows:
Founded in June 2020, Shanghai Shuangshen Information Technology Co., Ltd is dedicated to revolutionizing traditional image and video codec technology with AI technology. They aim to provide compression products and solutions ranging from algorithms to chip levels for various industries such as security, power grid, internet, healthcare, and metaverse. Their solutions address the challenges of transmitting, storing, processing, and analyzing large volumes of unstructured data.
More than 80% of the company's personnel work in research and development. The company has recruited talent from top universities in China and abroad, such as Peking University, Zhejiang University, Shanghai Jiaotong University, University of Science and Technology of China, Fudan University, and University of Michigan. It has also developed ANF, an in-house image codec that it describes as the world's first end-to-end AI codec for mobile terminals; the codec supports real-time coding and delivers strong compression performance.
7. End-to-end image/video compression results for MMSP 2024
The winners of each track are announced as follows:
Image compression (track 1) winner: USTC-iVC
Image compression (track 2) winner: USTC-iVC
Video compression (track 1) winner: USTC-iVC
Video compression (track 2) winner: BVI-VC