Talk title: CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table
Time: Tuesday, November 10, 2020, 4:00 PM
Venue: Room 313, Computer Building, Main Campus
Abstract: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple to state, counting k-mers efficiently in datasets with high sequencing depth or large genome size is a challenge. We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and to handle situations where memory is not sufficient to accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that a hash-table-based method remains a feasible and effective solution to the k-mer counting problem.
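To make the problem in the abstract concrete, here is a minimal single-threaded sketch of k-mer counting with a chaining hash table. It is not CHTKC's implementation: it does not attempt the lock-free concurrency or the memory-overflow handling the abstract describes, and the names `Node`, `ChainedKmerTable`, and `count_kmers` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional


class Node:
    """One entry in a bucket's linked list: a k-mer and its count."""

    def __init__(self, kmer: str, next_node: Optional["Node"]) -> None:
        self.kmer = kmer
        self.count = 1
        self.next = next_node


class ChainedKmerTable:
    """Hash table that resolves collisions with per-bucket linked lists.

    Illustrative and single-threaded only; not the CHTKC data structure.
    """

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets: List[Optional[Node]] = [None] * num_buckets

    def add(self, kmer: str) -> None:
        index = hash(kmer) % len(self.buckets)
        node = self.buckets[index]
        # Walk the chain looking for an existing entry for this k-mer.
        while node is not None:
            if node.kmer == kmer:
                node.count += 1
                return
            node = node.next
        # Not found: prepend a new node to the chain.
        self.buckets[index] = Node(kmer, self.buckets[index])

    def counts(self) -> Dict[str, int]:
        result: Dict[str, int] = {}
        for head in self.buckets:
            node = head
            while node is not None:
                result[node.kmer] = node.count
                node = node.next
        return result


def count_kmers(sequence: str, k: int, num_buckets: int = 1024) -> Dict[str, int]:
    """Count every length-k substring of `sequence`."""
    table = ChainedKmerTable(num_buckets)
    for i in range(len(sequence) - k + 1):
        table.add(sequence[i:i + k])
    return table.counts()
```

For example, `count_kmers("ACGTACGT", 3)` slides a window of length 3 over the sequence and tallies each substring it sees. Setting `num_buckets=1` forces every k-mer into one chain, which shows that the counts do not depend on how collisions fall.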
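About the speaker: Guohua Wang (汪国华) is Dean, Professor, and doctoral supervisor at the College of Information and Computer Engineering, Northeast Forestry University. Wang received a Ph.D. in Computer Application Technology from Harbin Institute of Technology in 2009, and from 2009 served successively as Associate Professor, Professor, and doctoral supervisor at the School of Computer Science and Technology, Harbin Institute of Technology. In 2019 Wang moved to Northeast Forestry University as Dean of the College of Information and Computer Engineering and PI at the State Key Laboratory of Tree Genetics and Breeding. Wang was selected for the Ministry of Education's "New Century Excellent Talents Support Program" in 2013 and was a postdoctoral fellow at Johns Hopkins University. Wang is a standing committee member of the China Computer Federation's Bioinformatics Technical Committee and a member of the Artificial Intelligence Society's Bioinformatics and Artificial Life Technical Committee. Main research interests are bioinformatics and artificial intelligence. Wang serves on the editorial board of BMC Genomics and has published more than 50 SCI-indexed international journal papers in journals including Nature Protocols, Nature Reviews Genetics, Nucleic Acids Research, and Bioinformatics. As principal investigator, Wang has led two National 863 Program projects and four National Natural Science Foundation of China projects. In 2018 and 2019 Wang served as a panel review expert for the "Digital Diagnosis and Treatment Equipment R&D" key special project of the 13th Five-Year National Key R&D Program.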