In conjunction with the IEEE
International Parallel and Distributed Processing Symposium
Announcements:
- To register for the HiCOMB workshop, please visit the IPDPS
website and follow the registration link. All IPDPS'22 registered
attendees should automatically be able to access the workshop through
the conference's Virtual Platform.
- HiCOMB '22 Advance Program
- Online HiCOMB Proceedings
Confirmed Keynote and Invited Speakers
Christina Boucher
[Keynote Speaker]
Associate Professor, Department of Computer and Information Science and Engineering
University of Florida, Gainesville, FL, USA
Title: Building scalable indexes that can be
efficiently queried
Abstract: Recently, Gagie et al. proposed a
version of the FM-index, called the r-index, that can store thousands of
human genomes on a commodity computer. We later showed how to build the
r-index efficiently via a technique called prefix-free parsing (PFP) and
demonstrated its effectiveness for exact pattern matching. Exact pattern
matching can be leveraged to support approximate pattern matching, but the
r-index itself cannot efficiently support popular and important queries
such as finding maximal exact matches (MEMs). To address this shortcoming,
Bannai et al. introduced the concept of thresholds, and showed that
storing them together with the r-index enables efficient MEM finding ---
but they did not say how to find those thresholds. We present another
novel algorithm that applies PFP to build the r-index and find the
thresholds simultaneously and in linear time and space with respect to the
size of the prefix-free parse. Our implementation can rapidly find MEMs
between reads and large sequence collections of highly repetitive
sequences. Compared to existing methods, ours used 2 to 11 times less
memory and was 2 to 32 times faster for index construction. Moreover, our
index was less than one thousandth the size of competing indexes for
large collections of human chromosomes.
Biography: Dr. Boucher is an Associate Professor in the
Department of Computer and Information Science and Engineering at the
University of Florida. She has over 85 publications in bioinformatics,
with over two dozen of them in succinct data structures and/or alignment.
She was a keynote speaker at IABD 2019, FAB 2018,
RECOMB-Seq 2016, and the ECCB 2016 Workshop on Pan-Genomics. She is a
recipient of an ESA 2016 Best Paper Award. She oversees the development
and maintenance of several software methods, including MEGARes and
AMRPlusPlus, METAMarc, Kohdista, Vari, VariMerge and, most recently,
Moni. In addition, she has built a team of collaborators in various
biomedical sciences including microbiology, veterinary medicine,
epidemiology, public health, and clinical sciences. She also
actively works on increasing diversity in bioinformatics education.
Her efforts include being a member of the University of Florida’s Implicit
Bias committee, being a panelist for the NSF-funded ACM BCB 2015 Women in
Bioinformatics meeting, serving as a faculty advisor for an ACM-W chapter,
and being an active member of the Diversity Committee for over three
years. She also received a fellowship from The Institute for Learning and
Teaching (TILT) for her course redevelopment and served on the advisory
committee for an NSF Research Traineeships Program. She has been the PC
chair for several conferences, including SPIRE 2020, RECOMB-Seq 2019, and
ACM-BCB 2018. Most recently, she was nominated to serve on the NIH BDMA
Study Section as a Standing Member, and a Board Member of SIG BIO.
Yatish Turakhia
[Invited Speaker]
Assistant Professor, Department of Electrical and Computer Engineering
University of California San Diego, CA, USA
Title: Pandemic-scale phylogenetics
Abstract: Phylogenetics has been central to
the genomic surveillance, epidemiology and contact tracing efforts during
the COVID-19 pandemic. But the massive scale of genomic sequencing has
rendered the pre-pandemic tools quite inadequate for comprehensive
phylogenetic analyses. In this talk, I will discuss a high-performance
computing (HPC) phylogenetic package that we developed to address the
needs imposed by this pandemic. Orders of magnitude gains were achieved by
this package through several domain-specific optimization and
parallelization techniques. The package comprises four programs: UShER,
matOptimize, RIPPLES and matUtils. Using high-performance computing, UShER
and matOptimize maintain and refine daily a massive mutation-annotated
phylogenetic tree consisting of all (>9M currently) SARS-CoV-2
sequences available on online repositories. With UShER and RIPPLES,
individual labs – even with modest compute resources – incorporate
newly-sequenced SARS-CoV-2 genomes on this phylogeny and discover evidence
for recombination in real-time. With matUtils, they rapidly query and
visualize massive SARS-CoV-2 phylogenies. This has empowered scientists
worldwide to study the SARS-CoV-2 evolutionary and transmission dynamics
at an unprecedented scale, resolution and speed. This has laid the
groundwork for future genomic surveillance of most infectious pathogens.
Biography: Dr. Turakhia has been an Assistant Professor in the
Department of Electrical and Computer Engineering at the University of
California San Diego since July 2021. He is also affiliated with the
Department of Computer Science and Engineering and the Bioinformatics and
Systems Biology graduate program at UCSD. His lab is also affiliated with
the Center for Machine-Integrated Computing and Security and the Center for
Microbiome Innovation at UCSD. Previously, he was a postdoctoral scholar
at the Genomics Institute at UC Santa Cruz. Dr. Turakhia obtained his
Ph.D. in Electrical Engineering in 2019 from Stanford University and his
bachelor’s and master’s degrees in Electrical Engineering from the Indian
Institute of Technology (IIT), Bombay in 2014.
The size and complexity of genomic and biomedical big data continue to
grow at a furious pace, and the analysis of these complex, noisy data
sets demands efficient algorithms and high-performance computing
architectures. Hence, high-performance computing (HPC) has become
an integral part of research and development in bioinformatics,
computational biology, and medical and health informatics. The goal of
the HiCOMB workshop is to showcase novel HPC research and technologies
to solve data- and compute-intensive problems arising from all areas of
computational life sciences. The workshop will feature contributed
papers as well as invited talks from reputed researchers in the field.
For peer-reviewed papers, we invite authors to submit original and
previously unpublished work that is at the intersection of
the "pillars" of modern-day computational life sciences and HPC.
More specifically, we encourage submissions from all areas of biology
that can benefit from HPC, and from all areas of HPC that need new
development to address the class of computational problems that
originate from biology.
Areas of interest within computational life sciences include (but are not
limited to):
- Biological sequence analysis (genome assembly, long/short read
data structures, read mapping, clustering, variant analysis, error
correction, genome annotation)
- Computational structural biology (protein structure, RNA
structure)
- Functional genomics (transcriptomics, RNAseq/microarrays, single
cell analysis, proteomics, phospho-proteomics)
- Systems biology and networks (biological network analysis, gene
regulatory networks, metabolomics, molecular pathways)
- Tools for integrated multi-omics and biological databases (network
construction, modeling, link inference)
- Computational modeling and simulation of biological systems
(molecular dynamics, protein structure/docking, dynamic models)
- Phylogeny (phylogenetic tree reconstruction, molecular evolution)
- Microbes and microbiomes (taxonomical binning, metagenomics,
classification, clustering, annotation)
- Biomedical health analytics and biomedical imaging (electronic
health records, precision medicine, image analysis)
- Biomedical literature mining (text mining, ontology, natural
language processing)
- Computational epidemiology (infectious diseases, diffusion
mechanisms)
- Phenomics and precision agriculture (IoT technologies, feature
extraction)
- Visualization of large-scale biomedical data and patient
trajectories
Areas of interest within HPC include (but are not limited to):
- Parallel and distributed algorithms (scalable machine learning,
parallel graph/sequence analytics, combinatorial pattern matching,
optimization, parallel data structures, compression)
- Biological data management, metadata standards such as compliance
to FAIR principles, AI-ready data processing
- Data-intensive computing techniques
(communication-avoiding/synchronization-reducing techniques,
locality-preserving techniques, big data streaming techniques)
- Parallel architectures (multicore, manycore, CPU/GPU, FPGA,
system-on-chip, hardware accelerators, energy-aware architectures,
hardware/software co-design)
- Memory and storage technologies (processing-in-memory, NVRAM,
burst buffers, 3D RAM, parallel/distributed I/O)
- Parallel programming models (libraries, domain specific languages,
compiler/runtime systems)
- Scalable AI/ML frameworks for biological systems, modeling, and
analysis
- Scientific workflows (data management, data wrangling, automated
workflows, productivity)
- Scientific computing (numerical analysis, optimization)
- Empirical evaluations (performance modeling, case-studies)
Submission guidelines
To submit a paper, please upload a PDF file through the Linklings
HiCOMB 2022 submission link:
https://ssl.linklings.net/conferences/ipdps/?page=Submit&id=HiCOMBWorkshopFullSubmission&site=ipdps2022
IPDPS workshops accept submissions in three categories: regular papers (up
to 10 pages), short papers (up to 4 pages), and extended abstracts (1
page). Submitted manuscripts may not exceed ten (10) single-spaced
double-column pages using a 10-point font on 8.5x11 inch pages
(IEEE conference style), including figures, tables, and references
(see the IPDPS Call for Papers for more details). All papers will be reviewed
by three or more referees. This year, the authors of the accepted
papers will be given a choice on whether to have the paper appear in
the IPDPSW Proceedings (which will be digitally indexed and archived
as part of the IEEE Xplore Digital Library). If the authors choose not
to make it part of the proceedings, then the paper will not
be considered archival. In either case, all accepted papers
will be posted online on the workshop website, and all accepted papers
(archived or not) will need to have an oral presentation at the
workshop by one of the authors of the paper.
Important Dates
- Workshop submission deadline (for all categories): February 5, 2022, 11:59pm AoE (extended deadline; originally January 22, 2022)
- Author notification: February 28, 2022
- Final camera-ready papers deadline: March 15, 2022
- Workshop: May 30, 2022
Program Chair
- Mohammed Alser, ETH Zurich
- Rolf Backofen, University of Freiburg
- Irem Boybat, IBM Zurich
- Sarah Bruningk, ETH Zurich
- Somali Chaterji, Purdue University
- Ercument Cicek, Bilkent University
- Priyanka Ghosh, NIH NCBI
- Zam Iqbal, EMBL EBI
- Benjamin Langmead, Johns Hopkins University
- Ryan Layer, University of Colorado Boulder
- Serghei Mangul, University of Southern California
- Ibrahim Numanagic, University of Victoria
- Pierre Peterlongo, University of Rennes
- Knut Reinert, Freie Universität Berlin
- Jared Simpson, Ontario Institute for Cancer Research
- Ewa Szczurek, University of Warsaw
- Yatish Turakhia, UC San Diego
- Leonid Yavits, Bar-Ilan University Technion
- Federico Zambelli, University of Milan
General Chairs
Steering Committee Members
HiCOMB Archive