University of Science and Technology of China zhangjie--Home--Publications

Jie Zhang (张结)

Associate professor

Administrative Position:Associate Professor

Education Level:Postgraduate (Doctoral)

Business Address:Room 416 of Kejishiyan west Building, West campus, University of Science and Technology of China (USTC)

Contact Information:jzhang6@ustc.edu.cn

Degree:Dr

Alma Mater:Delft University of Technology (TU Delft)

Discipline:Information and Communication Engineering

Journal publications (Google Scholar):

Qingtian Xu, Jie Zhang*, Miao Sun, Huadong Liang, Xin Li, Zhenhua Ling, Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34:3073-3086, 2026. (pdf, code and USTC EEG-Speech dataset)
Haowei Chen, Jie Zhang*, A Collaborative Microphone Clustering Framework for Multi-Task Distributed Microphone Arrays, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34:3174-3188, 2026. (pdf, code)
Yanwen Li, Jie Zhang*, Huawei Chen, Susanto Rahardja, OBFNet: A Neural Beamforming Network Using Orthonormal Basis Functions, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34: 1342-1357, 2026. (pdf)
Zhengzhe Zhang, Jie Zhang*, Haoyin Yan, Hengshuang Liu, Junhua Liu, A Fully Complex-Valued Underwater Acoustic Signal Enhancement Model for Passive Sonar Systems, IEEE Open J. of Signal Process. (OJSP), 7: 101-115, DOI:10.1109/OJSP.2026.3656063, 2026. (pdf)
Shuang Zhang, Jie Zhang*, Yichi Wang, Haoyin Yan, DOA or Speaker Embedding: Which is Better for Multi-Microphone Target Speaker Extraction, IEEE Signal Processing Letters (SPL), 32: 3350-3354, DOI: 10.1109/LSP.2025.3600168, 2025. (pdf)
Qingtian Xu, Jie Zhang*, Zhenhua Ling, Input-Independent Subject-Adaptive Channel Selection for Brain-Assisted Speech Enhancement, IEEE J. Selected Topics in Signal Process. (JSTSP), 19(4): 658-670, DOI: 10.1109/JSTSP.2025.3558653, 2025. (pdf, code)
Jie Zhang, Haoyin Yan, Xiaofei Li, A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 33: 2312-2325, DOI: 10.1109/TASLPRO.2025.3577387, 2025. (pdf, code and demos)
Jie Zhang, Qingtian Xu, Zhenhua Ling, Haizhou Li, Sparsity-driven EEG channel selection for brain-assisted speech enhancement, Internal Technical Report, 2025. (arXiv preprint, code)
Chengqian Jiang, Haoyin Yan, Jie Zhang*, A Collaborative Neural Speech Enhancement Framework for Distributed Microphone Arrays, submitted to IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 2025.
Miao Sun, Chengli Sun, Cairong Zou, Jie Zhang, Dan Xiang, Modeling of Multi-Electrode Epicardial Electrograms for Conductivity Estimation in Atrial Fibrillation, IEEE Access, 13: 90545-90557, 2025. (pdf)
Qiu-Shi Zhu, Long Zhou*, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei, VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning, IEEE Trans. Multimedia (TMM), 26:1055-1064, 2024. (pdf, code)
Jie Zhang, Rui Tao, Jun Du*, Li-Rong Dai, SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31:3176-3189, 2023. (pdf)
Ye-Qian Du, Jie Zhang*, Xin Fang, Ming-Hui Wu, Zhou-Wang Yang, A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31:3908-3921, 2023. (pdf)
Bin Gu, Jie Zhang*, Wu Guo, A Dynamic Convolution Framework for Session-Independent Speaker Embedding Learning, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31:3647-3658, 2023. (pdf)
Qiu-Shi Zhu, Jie Zhang*, Zi-Qiang Zhang, Li-Rong Dai, A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31: 1927-1939, 2023. (pdf)
Jie Zhang, Rui Tao, Jun Du*, Li-Rong Dai, Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31: 215-228, 2023. (pdf)
Bin Gu, Wu Guo*, Jie Zhang, Memory Storable Network Based Feature Aggregation for Speaker Representation Learning, IEEE/ACM Trans. Audio, Speech, Lang. Process., 31: 643-655, 2023. (pdf)
Guanghui Zhang, Jie Zhang*, Ke Liu, Jing Guo, Jack Y. B. Lee, Haibo Hu, and Vaneet Aggarwal, "DUASVS: A Mobile Data Saving Strategy in Short-form Video Streaming", IEEE Trans. Services Computing, 16(2):1066-1078, 2023. (pdf)
Guanghui Zhang, Jie Zhang*, Yan Liu, Haibo Hu, Jack Lee, Vaneet Aggarwal, "Adaptive Video Streaming with Automatic Quality-of-Experience Optimization", IEEE Trans. Mobile Computing, 22(8):4456-4470, 2023. (pdf)
Jie Zhang, Guanghui Zhang*, Li-Rong Dai, "Frequency-invariant sensor selection for MVDR beamforming in wireless acoustic sensor networks", IEEE Trans. Wireless Communications, 21(12): 10648-10661, 2022. (pdf)
Jie Zhang, Guanghui Zhang*, "A parametric unconstrained beamformer based binaural noise reduction for assistive hearing", IEEE/ACM Trans. Audio, Speech, Lang. Process., 30:292-304, 2022. (pdf)
Jie Zhang, Changheng Li*, "Quantization-aware binaural MWF based noise reduction incorporating external wireless devices", IEEE/ACM Trans. Audio, Speech, Lang. Process., 29:3118-3131, 2021. (pdf)
Jian Tang, Jie Zhang*, Yan Song, Ian Mcloughlin, Li-Rong Dai, "Multi-granularity sequence alignment mapping for encoder-decoder based end-to-end ASR", IEEE/ACM Trans. Audio, Speech, Lang. Process., 29:2816-2828, 2021. (pdf, Python code)
Jie Zhang*, Jun Du, Li-Rong Dai, "Sensor selection for relative acoustic transfer function steered linearly-constrained beamformers", IEEE/ACM Trans. Audio, Speech, Lang. Process., 29:1220-1232, 2021. (pdf)
Jie Zhang*, Huawei Chen, Li-Rong Dai, Richard C. Hendriks, "A study on reference microphone selection for multi-microphone speech enhancement", IEEE/ACM Trans. Audio, Speech, Lang. Process., 29:1220-1232, 2021. (pdf, matlab code)
Jie Zhang*, "Power optimized and power constrained randomized gossip approaches for wireless sensor networks", IEEE Wireless Communications Letters, 10(2):241-245, 2020. (pdf)
Jie Zhang, Pingping Wu*, "Joint sampling synchronization and source localization for wireless acoustic sensor networks", IEEE Communications Letters, 24(5):1020-1023, 2020. (pdf)
Jie Zhang*, Richard Heusdens, Richard C. Hendriks, "Relative acoustic transfer function estimation in wireless acoustic sensor networks", IEEE/ACM Trans. Audio, Speech, Lang. Process., 27(10):1507–1579, 2019. (pdf, matlab code)
Jie Zhang*, Andreas I. Koutrouvelis, Richard Heusdens, Richard C. Hendriks, "Distributed rate-constrained LCMV beamforming", IEEE Signal Process. Letters, 26(5):675–679, 2019. (pdf, matlab examples)
Jie Zhang*, Richard Heusdens, Richard C. Hendriks, "Rate-distributed spatial filtering based noise reduction in wireless acoustic sensor network", IEEE/ACM Trans. Audio, Speech, Lang. Process., 26(11):2015–2026, 2018. (pdf, matlab examples)
Jie Zhang*, Sundeep Prabhakar Chepuri, Richard Heusdens, Richard C. Hendriks, "Microphone subset selection for MVDR beamformer based noise reduction", IEEE/ACM Trans. Audio, Speech, Lang. Process., 26(3):550–563, 2018. (pdf, matlab examples)
Cheng Pang, Hong Liu*, Jie Zhang*, Xiaofei Li, "Binaural sound localization based on reverberation weighting and generalized parametric mapping", IEEE/ACM Trans. Audio, Speech, Lang. Process., 25(8):1618–1632, 2017. (pdf)
Jie Zhang, Hong Liu*, "Robust acoustic localization via time-delay compensation and interaural matching filter", IEEE Trans. Signal Process., 63(18):4771–4783, 2015. (pdf)
Hong Liu, Mengdi Yue, Jie Zhang, "Bi-direction interaural matching filter and decision weighting fusion for sound source localization in noisy environments", IEICE Trans. Information and Systems, 99(12):3192-3196, 2016.

Domestic journal publications:

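Zhengzhe Zhang, Hengshuang Liu, Meng Liu, Jie Zhang, Chengqian Jiang, Xin Fang, A Lightweight Underwater Acoustic Signal Enhancement Method Based on Environmental Priors (in Chinese), Technical Acoustics (声学技术), 2026.
Zhenhua Ling, Jie Zhang, From "Being Able to Hear" to "Hearing Clearly": How Far Are We from Auditory Reconstruction (in Chinese), Guangming Daily (光明日报), 212(05, Language and Script section), Aug. 3, 2025.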
Yubang Zhang, Qiushi Zhu, Qingtian Xu, Jie Zhang*, Integrating time-frequency domain shallow and deep features for speech-EEG match-mismatch of auditory attention decoding, J. Shanghai Jiao Tong Univ. (Sci.), 2025. (pdf, weblink)
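Jie Zhang, De Hu, Xiaolei Zhang, Zhenhua Ling, A Review of Sound Acquisition Theory and Application Methods for Distributed Microphone Arrays (in Chinese), Journal of Data Acquisition and Processing (数据采集与处理), 39(5):1085-1113, 2024. (pdf; 2024 High-Impact Paper, one of 10 selected from 126 papers that year and the only one selected from its issue)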
Luzhen Xu, Haoyin Yan, Maokui He, Zixian Guo, Yeping Zhou, Peiqi Liu, Jie Zhang*, Lirong Dai, Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting Transcription, J. Shanghai Jiao Tong Univ. (Sci.), 2024. (pdf, weblink)
Xiaoyan Shan, Jie Zhang, Lirong Dai, A Study of Music-Related Characteristics in Piano Music Transcription (in Chinese), Journal of Chinese Computer Systems (小型微型计算机系统), 45(10):2305-2311, 2024. (pdf)
Jie Zhang, Luzhen Xu, Lirong Dai, Low-complexity energy-aware sensor selection for noise reduction in distributed microphone networks, J. Uni. of Science and Technology of China (JUSTC), 53(4): 0402, 2023. (pdf)
Cheng Pang, Xiuling Wang, Jie Zhang, Hong Liu*, "Mandarin accent identification based on GMM with multi-feature fusion", J. Huazhong Uni. of Science and Technology: Science edition, (S1):381-384, 2015.

Conference publications:

Chengqian Jiang, Jie Zhang*, Haoyin Yan, CaSNet: Compress-and-Send Network Based Multi-Device Speech Enhancement Model for Distributed Microphone Arrays, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 20906-20910, Barcelona, Spain, May 4-9, 2026. (pdf, code)
Yubang Zhang, Jie Zhang*, Yusan Yasin, NeuroCapsNet: A Neuro-Inspired Capsule Network for Multi-Direction Auditory Spatial Attention Detection, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 22672-22676, Barcelona, Spain, May 4-9, 2026. (pdf, code, slides)
Cheng-Zhan Sui, Shuang Zhang, Jie Zhang*, The USTC-NERCSLIP System for the RASE Challenge, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 21865-21867, Barcelona, Spain, May 5-9, 2026. (Radar Acoustic Speech Enhancement, RASE challenge, the 3rd place, pdf)
Jianwei Cui, Shihao Chen, Yu Gu, Jie Zhang, Liping Chen, Na Li, Chengxing Li, Shan Yang, Lirong Dai, Sinba: Singing-to-Accompaniment Generation with Pitch Guidance via Mamba-Based Language Model, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, USA, Dec. 5-8, 2025.
Yu Guan, Wu Guo, Jie Zhang, Zhijun Zhang, Fusing Multi-layer Features of the Pre-trained Model With Grouped Cross Attention for Spoofing Speech Detection, Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-6, Singapore, 22-24 October, 2025. (pdf)
Zhijun Zhang, Wu Guo, Jie Zhang, Yu Guan, Fusing Blocked Deep Features of Pre-Trained Models for Short-Duration Speaker Verification, Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-6, Singapore, 22-24 October, 2025. (pdf)
Lipeng Dai, Qing Wang*, Jie Zhang, Shengyu Peng, Yu Guan, Wu Guo, Leveraging Multi-Level Features of ATST with Conformer-Based Dual-Branch Network for Sound Event Detection, ISCA Interspeech, pp. 2595-2599, Rotterdam, The Netherlands, Aug. 17-21, 2025. (pdf)
Shengyu Peng, Wu Guo*, Jie Zhang, Yu Guan, Lipeng Dai, Zuoliang Li, Parameter-Efficient Fine-tuning with Instance-Aware Prompt and Parallel Adapters for Speaker Verification, ISCA Interspeech, pp. 1643-1647, Rotterdam, The Netherlands, Aug. 17-21, 2025. (pdf)
Jingyuan Wang, Jie Zhang*, Shihao Chen, Miao Sun, A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf, code)
Yichi Wang, Jie Zhang*, Chengqian Jiang, Weitai Zhang, Zhongyi Ye, Lirong Dai, Leveraging Boolean Directivity Embedding for Binaural Target Speaker Extraction, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf, code)
Jie Zhang*, Chengqian Jiang, Yichi Wang, Haoyin Yan, Miao Sun, Learning-Based Utility Estimation with Application to Speech Enhancement of a Moving Speaker, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Haoyin Yan, Jie Zhang*, Cunhang Fan, Yeping Zhou, Peiqi Liu, LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Keying Zuo#, Qingtian Xu#, Jie Zhang*, Zhenhua Ling, Geometry-Constrained EEG Channel Selection for Brain-Assisted Speech Enhancement, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Shengyu Peng, Wu Guo, Jie Zhang, Zuoliang Li, Yu Guan, Bin Gu, Yang Ai*, A Study of Multi-Scale Feature Learning From Pre-Trained Models on Speaker Verification, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Zuoliang Li, Yang Ai*, Jie Zhang, Shengyu Peng, Yu Guan, Bin Gu, Wu Guo, Aligning Noisy-Clean Speech Pairs at Feature and Embedding Levels for Learning Noise-Invariant Speaker Representations, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Cunhang Fan, Youdian Gao, Zexu Pan, Jingjing Zhang, Hongyu Zhang, Jie Zhang, Zhao Lv, Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hyderabad, India, 2025. (pdf)
Jianwei Cui, Yu Gu, Shihao Chen, Jie Zhang*, Liping Chen, Lirong Dai*, CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder, 39th AAAI Conf. on Artificial Intelligence (AAAI-25), Pennsylvania, USA, 25 February - 4 March 2025. (pdf, demos)
Qiushi Zhu, Jie Zhang*, Yu Gu, Yuchen Hu, Lirong Dai, Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-modal Speech Representation, 38th AAAI Conf. on Artificial Intelligence (AAAI-24), Vancouver, Canada, pp. 19768-19776, 20-27 February 2024. (pdf)
Zhuhai Li, Jie Zhang*, Wu Guo, Multi-layer Feature Augmentation Based Transferable Adversarial Examples Generation for Speaker Recognition, Int. Conf. on Intelligent Computing (ICIC), vol.14865, pp. 373-385, 2024. (pdf, weblink)
Yubang Zhang, Jie Zhang*, Zhenhua Ling, The NERCSLIP-USTC System for Track2 of The First Chinese Auditory Attention Decoding Challenge, The 14th Int. Symposium on Chinese Spoken Lang. Process. (ISCSLP), pp.319-323, Beijing, China, Nov. 2024. (ChineseAAD (Chinese Auditory Attention Decoding) challenge cross-subject track winner) (pdf, code)
Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang*, Rilin Chen, Lirong Dai, LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation, The 14th Int. Symposium on Chinese Spoken Lang. Process. (ISCSLP), pp.309-313, Beijing, China, Nov. 2024. (pdf)
Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang*, Zhenhua Ling, Refining Self-Supervised Learnt Speech Representation using Brain Activations, ISCA Interspeech, pp.1480-1484, Kos Island, Greece, Sep. 2024. (pdf)
Zhuhai Li, Jie Zhang*, Wu Guo, Haochen Wu, Boosting the Transferability of Adversarial Examples with Gradient-Aligned Ensemble Attack for Speaker Recognition, ISCA Interspeech, pp.532-536, Kos Island, Greece, Sep. 2024. (pdf)
Haochen Wu, Wu Guo, Shengyu Peng, Zhuhai Li, Jie Zhang, Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection, ISCA Interspeech, pp.2095-2099, Kos Island, Greece, Sep. 2024. (pdf)
Haochen Wu, Wu Guo, Zhentao Zhang, Wenting Zhao, Shengyu Peng, Jie Zhang, Spoofing Speech Detection by Modeling Local Spectro-Temporal and Long-term Dependency, ISCA Interspeech, pp.507-511, Kos Island, Greece, Sep. 2024. (pdf)
Shihao Chen, Yu Gu, Jie Zhang*, Na Li, Rilin Chen, Liping Chen, Lirong Dai, LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance, ISCA Interspeech, pp.2770-2774, Kos Island, Greece, Sep. 2024. (pdf)
Shengyu Peng, Wu Guo, Haochen Wu, Zuoliang Li, Jie Zhang*, Fine-tune Pre-Trained Models with Multi-Level Feature Fusion for Speaker Verification, ISCA Interspeech, pp.2110-2114, Kos Island, Greece, Sep. 2024. (pdf)
Zuoliang Li, Wu Guo, Bin Gu, Shengyu Peng, Jie Zhang*, Contrastive Learning and Inter-Speaker Distribution Alignment Based Unsupervised Domain Adaptation for Robust Speaker Verification, ISCA Interspeech, pp.3794-3798, Kos Island, Greece, Sep. 2024. (pdf)
Kang Zhong, Jie Zhang*, Wu Guo, Document-Level Machine Translation with Effective Batch-Level Context Representation, Int. Joint Conf. on Neural Networks (IJCNN), pp. 1-8, Yokohama, Japan, June 30- July 5, 2024. (pdf)
Qingtian Xu, Jie Zhang*, Zhenhua Ling, An End-to-End EEG Channel Selection Method with Residual Gumbel Softmax for Brain-Assisted Speech Enhancement, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp.10131-10135, 14-19 April, 2024. (pdf)
Yichi Wang, Jie Zhang*, Shihao Chen, Weitai Zhang, Zhongyi Ye, Xinyuan Zhou, Lirong Dai, A Study of Multichannel Spatiotemporal Features and Knowledge Distillation on Robust Target Speaker Extraction, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp.431-435, 14-19 April, 2024. (pdf)
Zhuhai Li, Wu Guo, Jie Zhang*, Generating High-Quality Adversarial Examples with Universal Perturbation-Based Adaptive Network and Improved Perceptual Loss, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp. 10376-10380, 14-19 April, 2024. (pdf)
Haochen Wu, Jie Zhang*, Zhentao Zhang, Wenting Zhao, Bin Gu, Wu Guo, Robust Spoof Speech Detection based on Multi-scale Feature Aggregation and Dynamic Convolution, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp.10156-10160, 14-19 April, 2024. (pdf)
Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang*, Liping Chen, Lirong Dai, Sifisinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp. 11126-11130, 14-19 April, 2024. (pdf)
Minghui Wu, Luzhen Xu, Jie Zhang*, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang, The USTC-NERCSLIP Systems for the ICMC-ASR Challenge, IEEE Int. Conf. Acoust., Speech, Signal Process. Workshops (ICASSPW), Seoul, Korea, pp.3-4, 14-19 April, 2024. (ICMC-ASR Challenge, the first place of both ASR and ASDR tracks) (pdf, leaderboard)
Shihao Chen, Liping Chen*, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai, Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Korea, pp.11411-11415, 14-19 April, 2024. (pdf)
Pan Deng, Shihao Chen, Weitai Zhang, Jie Zhang*, Lirong Dai, The USTC’s Dialect Speech Translation System for IWSLT 2023, International Conference on Spoken Language Translation (IWSLT), pp. 102-112, Toronto, Canada, July 13-14, 2023. (Winner of the dialect and offline tracks of the IWSLT 2023 speech translation evaluation) (pdf)
Mohan Shi, Jie Zhang*, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Lirong Dai, A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-Party Meetings, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), pp. 1943-1948, Taipei, Taiwan, Oct. 2023. (pdf)
Xiaoying Zhao, Qiushi Zhu, Jie Zhang*, Yeping Zhou, Peiqi Liu, Speech Enhancement with Multi-Granularity Vector Quantization, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), pp. 1937-1942, Taipei, Taiwan, Oct. 2023. (pdf)
Pan Deng, Jie Zhang*, Xinyuan Zhou, Zhongyi Ye, Weitai Zhang, Jianwei Cui, Lirong Dai, Learning Semantic Information from Machine Translation to Improve Speech-To-Text Translation, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), pp. 954-959, Taipei, Taiwan, Oct. 2023. (pdf)
Haochen Wu, Zhuhai Li, Luzhen Xu, Zhentao Zhang, Wenting Zhao, Bin Gu, Yang Ai, Yexin Lu, Jie Zhang*, Zhenhua Ling, Wu Guo, The USTC-NERCSLIP System for the Track 1.2 of Audio Deepfake Detection (ADD 2023) Challenge, Proceedings of IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA), pp. 119-124, August 19, 2023, Macao, China. (First place of Track 1.2 for fake audio detection, http://addchallenge.cn/add2023#RESULTS) (pdf)
Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du*, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Chen, Jie Zhang, Yuzhe Weng, Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023, in Proc. ACM Int. Conf. Multimedia (ACM MM), pp. 9531-9535, 2023. (Third place of Track 1 of the MER 2023 multimodal emotion recognition challenge) (pdf)
Jie Zhang, Qingtian Xu, Qiushi Zhu, Zhenhua Ling, BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions, ISCA Interspeech, Dublin, Ireland, pp. 3117-3121, Aug. 2023. (pdf, code)
Jingyuan Wang, Jie Zhang*, Lirong Dai, Real-Time Causal Spectro-Temporal Voice Activity Detection Based on Convolutional Encoding and Residual Decoding, ISCA Interspeech, Dublin, Ireland, pp. 5062-5066, Aug. 2023. (pdf)
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang*, Lirong Dai, Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction, ISCA Interspeech, Dublin, Ireland, pp. 5047-5051, Aug. 2023. (pdf)
Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang*, Lirong Dai, CASA-ASR: Context-Aware Speaker-Attributed ASR, ISCA Interspeech, Dublin, Ireland, pp. 411-415, Aug. 2023. (pdf)
Jie Zhang, Rui Tao, Li-Rong Dai, A Speech Distortion Weighted Single-Channel Wiener Filter Based STFT-Domain Noise Reduction, The 22nd IEEE Statistical Signal Processing Workshop (SSP), pp. 527-531, Hanoi, Vietnam, July, 2023. (pdf)
Haitao Xu#, Liangfa Wei#, Jie Zhang*, Jianming Yang, Yannan Wang, Tian Gao, Xin Fang, Lirong Dai, A Multi-Scale Lightweight Network for Audio-Visual Speech Enhancement, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1-5, Rhodes Island, Greece, June 2023. (pdf)
Qiu-Shi Zhu, Long Zhou, Jie Zhang*, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai, Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1-5, Rhodes Island, Greece, June 2023. (pdf)
Haoyin Yan, Haitao Xu, Qing Wang, Jie Zhang*, The NERCSLIP-USTC system for the L3DAS23 Challenge Task2: 3D sound event localization and detection (SELD), IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1-2, Rhodes Island, Greece, June 2023. (Second place in the sound event localization task of the L3DAS23 challenge on 3D speech enhancement and 3D sound event localization and detection) (pdf, leaderboard)
Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang*, "Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), Chiang Mai, Thailand, pp. 330-334, Dec. 2022. (pdf)
Guolong Zhong, Hongyu Song, Ruoyu Wang, Lei Sun, Diyuan Liu, Jia Pan, Xin Fang, Jun Du, Jie Zhang*, Li-Rong Dai, "External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge", ISCA Interspeech, Incheon, South Korea, pp. 4860-4864, Sept. 2022. (pdf)
Yeqian Du, Jie Zhang*, Qiu-Shi Zhu, Li-Rong Dai, Minghui Wu, Xin Fang, and Zhouwang Yang, "A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition", ISCA Interspeech, Incheon, South Korea, pp. 2613-2617, Sept. 2022. (pdf)
Hai-Tao Xu, Jie Zhang*, Li-Rong Dai, "Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition", ISCA Interspeech, Incheon, South Korea, pp. 1963-1967, Sept. 2022. (pdf)
Zi-Qiang Zhang, Jie Zhang*, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai, "Learning contextually fused audio-visual representations for audio-visual speech recognition", IEEE Int. Conf. on Image Process. (ICIP), Bordeaux, France, pp. 1346-1350, Oct. 2022. (pdf)
Ao-Ran Gan, Jie Zhang*, Ming-Hui Wu, Xin Fang, Li-Rong Dai, "An experimental comparison between low-resource semi-supervised and high-resource supervised automatic speech recognition models", IEEE Int. Conf. on Multimedia & Expo (ICME), Taipei, Taiwan, pp. 1-6, July, 2022. (pdf)
Qiu-Shi Zhu, Jie Zhang*, Zi-Qiang Zhang, Minghui Wu, Xin Fang, and Li-Rong Dai, "A noise-robust self-supervised pre-training model based speech representation learning for automatic speech recognition", IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 3174-3178, Singapore, May 2022. (pdf)
Xing-yu Chen, Qiu-Shi Zhu, Jie Zhang*, and Li-Rong Dai, "Supervised and self-supervised pretraining based COVID-19 detection using acoustic breathing/cough/speech signals", IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 561-565, Singapore, May 2022. (The Second DiCOVA Competition Winner) (pdf)
Xing-yu Chen, Jie Zhang*, and Li-Rong Dai, "Reference microphone selection and low-rank approximation based multichannel Wiener filter with application to speech recognition", IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 4963-4967, Singapore, May 2022. (pdf, matlab code)
Jie Zhang*, "A parametric unconstrained binaural beamformer based noise reduction and spatial cue preservation for hearing-assistive devices", IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 791-795, Toronto, Canada, 2021. (pdf)
Qiu-Shi Zhu, Jie Zhang*, Minghui Wu, Xin Fang, and Li-Rong Dai, "An improved wav2vec 2.0 pre-training approach using enhanced local dependency modeling for speech recognition", ISCA Interspeech, pp. 4334-4338, Brno, Czechia, Sept. 2021. (pdf)
Muhammad Muzamil Aslam, Jie Zhang*, Bushra Qureshi, Zahoor Ahmed, "Beyond6G-Consensus Traffic Management in CRN, Applications, Architecture and key Challenges", IEEE 11th Int. Conf. on Electronics Information and Emergency Communication (ICEIEC), pp. 182-185, June, 2021. (pdf)
Liangfa Wei, Jie Zhang*, Junfeng Hou, and Li-Rong Dai, "Attentive fusion enhanced audio-visual encoding for transformer based robust speech recognition", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), pp. 638-643, Auckland, New Zealand, Dec. 2020. (pdf)
Jie Zhang*, Richard Heusdens, and Richard C. Hendriks, "Sensor selection and rate distribution based beamforming for wireless acoustic sensor networks", EURASIP Europ. Signal Process. Conf. (EUSIPCO), pp. 1-5, A Coruna, Spain, Sept. 2019. (pdf)
Jie Zhang*, Richard Heusdens, and Richard C. Hendriks, "Rate-distributed binaural LCMV beamforming for assistive hearing in wireless acoustic sensor networks", IEEE Workshop on Sensor Array and Multichannel Signal Process. (SAM), pp. 460-464, Sheffield, U.K., Jul. 2018. (Best Paper Award) (pdf)
Jie Zhang*, Richard C. Hendriks, and Richard Heusdens, "Structured total least squares based internal delay estimation for distributed microphone auto-localization", Int. Workshop Acoustic Signal Enhancement (IWAENC), pp. 1-5, Xi'an, China, Sept. 2016. (Best Paper Finalist) (pdf)
Jie Zhang*, Richard C. Hendriks, and Richard Heusdens, "Greedy gossip algorithm with synchronous communication for wireless sensor networks", In 6th Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux, pp. 228-235, Belgium, April, 2016. (pdf)
Hong Liu, Mengdi Yue, Jie Zhang, "Probabilistic binaural multiple sources localization based on time-delay compensation estimator and clustering analysis", IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 4537-4544, Daejeon, South Korea, Oct. 2016. (pdf)
Ling Chen, Jie Zhang, Guodong Chen, Meng Zhang, and Hong Liu, "Binaural cues estimates based on interaural matching filter for sound source localization", IEEE Int. Conf. on Robotics and Biomimetics (ROBIO), pp. 863-868, Shenzhen, China, 2016. (pdf)
Jie Zhang, Hong Liu, "A dual-channel beamformer based on time-delay compensation estimator and shifted PCA for speech enhancement", IEEE 23rd Int. Conf. on Software, Telecommunications and Computer Networks (SoftCOM), pp. 180-184, Split, Croatia, 2015. (pdf)
Hong Liu, Cheng Pang, Jie Zhang, "Binaural sound source localization based on generalized parametric model and two-layer matching strategy in complex environments", IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 4496-4503, Seattle, WA, USA, May 2015. (pdf)
Cheng Pang, Jie Zhang and Hong Liu, "Direction of arrival estimation based on reverberation weighting and noise error estimator", ISCA Interspeech, pp. 3436-3440, Dresden, Germany, Sept. 2015. (pdf)
Hong Liu, Jie Zhang, "A binaural sound source localization model based on time-delay compensation and interaural coherence", IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 1424-1428, Florence, Italy, May, 2014. (pdf)
Hong Liu, Jie Zhang, Zhuo Fu, "A new hierarchical binaural sound source localization method based on interaural matching filter", IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1598-1605, Hong Kong, China, June, 2014. (pdf)
Baolong Zhao, Zhuo Fu, Jie Zhang, Hong Liu, "Binaural Sound Source Localization Based on Time-delay Compensation and Spatial Grid Matching", IEEE Int. Conf. on Cloud Computing and Intelligence Systems (CCIS), pp. 287-291, Shenzhen, China, 2014.
Mengdi Yue, Ling Chen, Jie Zhang, Hong Liu, "Speaker Age Recognition Based on Isolated Words by Using SVM", IEEE Int. Conf. on Cloud Computing and Intelligence Systems (CCIS), pp. 287-291, Shenzhen, China, 2014.