About me

Usman is a final-year Ph.D. student in Computer Science at Florida International University, Miami, FL, USA. He received his B.S. degree in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2014 and an M.S. degree in Computer Engineering from Western Michigan University in 2018. He was a Graduate Teaching Assistant at Western Michigan University from 2016 to 2018 and is currently working as a Graduate Research Assistant at the Knight Foundation School of Computing and Information Sciences at Florida International University. His research interests include language models for protein sequences, deep learning, artificial intelligence algorithms for big-data bioinformatics, high-performance computing, and parallel and distributed algorithm design.

Research

For a complete list of publications, please click here.

In the evolving field of computational biology, Dr. Usman Tariq’s research applies deep learning models to protein sequencing and cross-modal retrieval. The cornerstone of this research is a self-supervised masked language model that learns protein representations. Trained at an unprecedented scale on 1 billion protein segments in a distributed cluster environment, the model builds on BERT and LLaMA models to improve the performance of downstream models.

Expanding on this, Dr. Tariq developed SpeCollate, a deep learning network designed for cross-modal retrieval. In this work, he introduced SNAP-Loss, a quadruplet-based loss function, and implemented distributed multi-GPU training on supercomputer GPU nodes. Notably, SpeCollate improved protein sequencing accuracy to 95%, a substantial gain over previous tools with less than 70% accuracy. This work also played a pivotal role in securing a US$1 million grant from the National Institutes of Health.

Going a step further, Dr. Tariq carried out an uncertainty analysis of cross-modal embeddings used for retrieval. He formulated three uncertainty metrics to quantify the model’s confidence in its embeddings. These metrics support risk-aware decision-making by letting users gauge the reliability of the model’s predictions and identify potential areas for improvement.

Finally, Dr. Tariq designed DeepAtles, a novel multitask attention network for predicting peptide properties. By learning mass spectra embeddings with an attention-based deep learning architecture, he reduced the search space size for cross-modal retrieval by 90%, streamlining the process significantly.

In addition to developing original research agendas, Usman also works on projects that require extensive internal and external collaboration and coordination. In December 2021, he authored multiple research papers in which he designed graph-based algorithms for processing and analyzing comprehensive mass-spectrometry data to understand the structural similarities in Dissolved Organic Matter (DOM). The project was done in collaboration with researchers from the Biology and Chemistry departments.

Moreover, he has enjoyed designing algorithms for numerous applications, including genomic data compression, fMRI/image classification for autism spectrum disorder (ASD) detection, and machine learning solutions for bioinformatics pipelines.

Overall, Dr. Tariq’s research represents a leap forward in the field, combining innovative techniques, large-scale computational resources, and deep learning models to solve complex biological problems and enhance the reliability of protein sequencing and retrieval.

Work Experience

For a complete list of work experience, click here.

In Summer 2022, I had the opportunity to work at Meta (Facebook) as a Machine Learning Intern, where I designed deep learning architectures for ranking problems. My project was to rank comments under Ads so that the most useful and diverse set of comments is shown to viewers, helping them learn more about the product. I designed, evaluated, and implemented various linear and deep learning-based algorithms, including text embedding models, which showed up to 5.5% event-based revenue growth and an 11.96% increase in Ad conversion rate. During this internship, I not only gained the technical skills to deploy deep learning algorithms on real-world problems but also learned to produce high-quality work in a focused, extremely fast-paced environment. Moreover, as the main point of contact for the project across multiple cross-functional teams, I improved my collaboration and communication skills by presenting ideas to people with different backgrounds and little knowledge of the project.