The 3rd practical end-to-end image/video compression challenge
1. Challenge motivation and description
End-to-end image/video compression has been a research focus for both academia and industry for over seven years. A number of technologies have been developed, such as auto-encoder networks, probability estimation networks, and conditional end-to-end video coding frameworks. Recently, the performance of both end-to-end image and video compression schemes has surpassed that of H.266/Versatile Video Coding (VVC) under certain test conditions. To promote practical use, we believe it is time to consider the complexity of end-to-end image/video compression schemes, especially the decoding complexity.
This challenge calls for novel end-to-end image/video compression algorithms that achieve good rate-distortion (R-D) performance under certain complexity constraints. As in last year's challenge, we set a weight in the quality metric to balance performance and decoding complexity for both the end-to-end image and video compression tracks. In addition, we add another track that constrains the kMAC/pixel for more hardware-friendly solutions. Another important feature of practical end-to-end image/video compression solutions is cross-platform consistency. Therefore, this year, the submitted bitstreams must be decoded and reconstructed successfully on the platform provided by the organizers.
Participants are required to compress all images/videos defined in the test dataset. In the end-to-end image compression track, the actual bits per pixel (bpp) must not exceed a target bpp, which is set to the bpp of the test image coded by BPG with quantization parameter 28. In the end-to-end video compression track, the actual bitrate must not exceed a target bitrate (kbps), which is set to the bitrate of the test sequence coded by VTM with quantization parameter 27 under the random access configuration.
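The rate constraints above can be verified directly from the bitstream size. A minimal sketch (file names, resolutions, and anchor values in the comment are hypothetical; the organizers' exact measurement may differ):

```python
import os

def bits_per_pixel(bitstream_path, width, height):
    """Actual bpp of a compressed image: total bits / number of pixels."""
    return os.path.getsize(bitstream_path) * 8 / (width * height)

def bitrate_kbps(bitstream_path, num_frames, fps):
    """Actual bitrate of a compressed video sequence in kbps."""
    duration_s = num_frames / fps
    return os.path.getsize(bitstream_path) * 8 / duration_s / 1000.0

# Hypothetical usage: a submission is valid only if the actual rate does
# not exceed the anchor (BPG QP=28 for images, VTM QP=27 for video), e.g.
#   assert bits_per_pixel("I01.bin", 3840, 2160) <= target_bpp
```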
Detailed information on each track is provided below.
2. End-to-end image compression track
2.1 Dataset
Training and Validation Dataset: A collection of about 1600 high-resolution images will be provided as the training and validation dataset. Participants are free to split the provided images into training and validation sets, and are also free to use other datasets for training and validation.
Test Dataset: 20 images at 4K resolution will be used for evaluation. All images will be in the RGB color space and PNG file format. These images will be distributed to all participants before a certain date, and participants are required to compress them within 72 hours.
2.2 Evaluation metrics
2.2.1 Track 1
The performance Q will be evaluated by a weighted sum of the delta PSNR and the decoding complexity,
Q = w × ΔPSNR - dTime
where PSNR is calculated as the average PSNR of the R, G, and B components, and ΔPSNR is obtained by subtracting the PSNR of BPG from that of the proposed method. dTime is the time in seconds used for neural network model loading, entropy decoding, and image reconstruction on a GeForce RTX 4090 GPU provided by the organizers; submitted methods are therefore required to decode successfully on this GPU. w is set to 1 to achieve a good balance between performance and decoding complexity.
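As an illustration, the Track 1 score can be computed as follows; the PSNR and timing numbers in the comment are hypothetical:

```python
def challenge_score(psnr_proposed, psnr_anchor, decode_time_s, w=1.0):
    """Track 1 score: Q = w * dPSNR - dTime.

    psnr_proposed / psnr_anchor: average RGB PSNR (dB) of the proposed
    method and of the BPG anchor; decode_time_s: seconds spent on model
    loading, entropy decoding, and reconstruction on the RTX 4090.
    """
    delta_psnr = psnr_proposed - psnr_anchor
    return w * delta_psnr - decode_time_s

# Hypothetical numbers: a 1.5 dB gain over the anchor that costs
# 0.8 s to decode scores 1.5 - 0.8 = 0.7 with w = 1.
```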
2.2.2 Track 2
The performance Q is evaluated by the PSNR, calculated as the average PSNR of the R, G, and B components. The decoder complexity shall not exceed 100 kMAC/pixel.
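As a rough illustration of the 100 kMAC/pixel budget: a convolutional layer costs in_ch × out_ch × k² MACs per pixel at its working resolution, and dividing by the cumulative downsampling factor (squared) maps this back to input-image pixels. A sketch (the exact accounting used by the organizers may differ):

```python
def conv_kmac_per_pixel(in_ch, out_ch, kernel, downsample=1):
    """Approximate kMAC per input-image pixel for one conv layer.

    Each pixel at the layer's working resolution costs
    in_ch * out_ch * kernel^2 MACs; dividing by the cumulative
    downsampling factor squared maps this back to input pixels.
    """
    macs = in_ch * out_ch * kernel * kernel
    return macs / (downsample ** 2) / 1000.0

# A single 3x3 conv with 192 input and 192 output channels at full
# resolution costs 192 * 192 * 9 / 1000 = 331.776 kMAC/pixel, already
# far above the 100 kMAC/pixel budget, so Track 2 decoders must keep
# heavy layers at downsampled resolutions or use fewer channels.
```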
2.3 Submission requirements
Participants are requested to submit a decoder along with a docker environment and the corresponding script that runs the decoder.
Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like I01.bin.
Participants are requested to submit the decoded images. The decoded images shall be named like I01dec.png.
3. End-to-end video compression track
3.1 Dataset
Training and Validation Dataset: It is recommended to use the UVG and CDVL datasets for training. Participants are free to split the provided videos into training and validation sets, and are also free to use other datasets for training and validation.
Test Dataset: 10 video sequences at 1080p resolution will be used for evaluation. Each sequence contains 96 frames. All sequences will be in the YUV 4:2:0 color space. These video sequences will be distributed to all participants before a certain date, and participants are required to compress them within 72 hours.
3.2 Evaluation metrics
The decoded video sequences will be evaluated in the YUV 4:2:0 color space. The weighted average PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8 of the Y, U, and V components will be used to evaluate the distortion of the decoded video sequences. An anchor of VTM-17.0 coded with QP = 27 under the random access configuration defined in the VTM common test conditions (encoder_randomaccess_vtm.cfg) will be provided. The actual bitrate (kbps) of the bitstream of each video sequence must not exceed the target kbps of the test video coded by the anchor. The intra period of the proposed submission shall be no larger than that used by the anchor.
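The weighted YUV PSNR above can be computed per frame as follows, a minimal NumPy sketch (plane shapes follow 4:2:0 subsampling, so the U and V planes are half the Y resolution in each dimension):

```python
import numpy as np

def psnr(ref, dec, max_val=255.0):
    """PSNR (dB) between two planes of the same shape."""
    mse = np.mean((ref.astype(np.float64) - dec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def weighted_yuv_psnr(ref_y, dec_y, ref_u, dec_u, ref_v, dec_v):
    """Weighted PSNR = (6 * PSNR_Y + PSNR_U + PSNR_V) / 8."""
    return (6.0 * psnr(ref_y, dec_y)
            + psnr(ref_u, dec_u)
            + psnr(ref_v, dec_v)) / 8.0
```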
3.2.1 Track 1
The performance Q will be evaluated by a weighted sum of the delta PSNR and the decoding complexity,
Q = w × ΔPSNR - dTime
where ΔPSNR is calculated by subtracting the PSNR of VTM from that of the proposed method. dTime is the average decoding time per frame, in seconds, used for entropy decoding and video reconstruction on a GeForce RTX 4090 GPU provided by the organizers; submitted methods are therefore required to decode successfully on this GPU. w is set to 1 to provide a good balance between complexity and performance.
3.2.2 Track 2
The performance Q is evaluated by the weighted PSNR of the Y, U, and V components, PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8. The decoder complexity shall not exceed 100 kMAC/pixel.
3.3 Submission requirements
Participants are requested to submit a decoder along with a docker environment and the corresponding script that runs the decoder.
Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like V01.bin.
Participants are requested to submit the decoded video sequences. The decoded video sequences shall be named like V01dec.yuv.
4. Deadlines
Jul. 1: registration for the competition. Teams can register by sending the team name, team members, and institution to lil1@ustc.edu.cn or cmjia@pku.edu.cn
Jul. 1: release of the training and validation dataset
Jul. 31: deadline of the challenge paper submission
Aug. 10: notification of the challenge paper acceptance
Aug. 15: submission of the camera-ready paper
Sept. 1: submission of the decoder and docker environment
Sept. 2: release of the test dataset
Sept. 6: submission of the compressed bitstreams and decoded images/videos.
Sept. 15: winners and leader boards notification.
Oct. 2-4: challenge session at the MMSP 2024 conference. The winners will receive winner certificates provided by the MMSP organization committee. Selected teams will be invited to present at the conference.
5. Organizers
Li Li, University of Science and Technology of China
Chuanmin Jia, Peking University
For any inquiries, please email us at: lil1@ustc.edu.cn; cmjia@pku.edu.cn
6. Sponsorship
This challenge is sponsored by Shanghai Shuangshen Information Technology Co., Ltd (ATTRSense), with a sponsorship of $500 for the winner of each track. ATTRSense is a company targeting "AI for Codec and Codec for AI". A brief introduction of ATTRSense follows:
Founded in June 2020, Shanghai Shuangshen Information Technology Co., Ltd is dedicated to revolutionizing traditional image and video codec technology with AI technology. They aim to provide compression products and solutions ranging from algorithms to chip levels for various industries such as security, power grid, internet, healthcare, and metaverse. Their solutions address the challenges of transmitting, storing, processing, and analyzing large volumes of unstructured data.
More than 80% of the company's personnel work in research and development. The company has recruited talent from top universities in China and abroad, such as Peking University, Zhejiang University, Shanghai Jiaotong University, University of Science and Technology of China, Fudan University, and University of Michigan. It has also developed ANF, an in-house image codec that it describes as the world's first end-to-end AI codec for mobile terminals; the codec supports real-time coding and delivers strong compression performance.
7. End-to-end image/video compression results for MMSP 2024
The winners of each track are announced as follows:
Image compression (track 1) winner: USTC-iVC
Image compression (track 2) winner: USTC-iVC
Video compression (track 1) winner: USTC-iVC
Video compression (track 2) winner: BVI-VC