Ph.D. Research Proposal Exam: Zahra Zare Jousheghani

Thursday, January 23, 2025
1:00 p.m.
IRB-4105
Maria Hoo
301 405 3681
mch@umd.edu

ANNOUNCEMENT: Ph.D. Research Proposal Exam

 

Name: Zahra Zare Jousheghani

Committee:

Prof. Robert Patro, Chair

Prof. Dinesh Manocha

Prof. Cunxi Yu

Date/time: Thursday, January 23, 2025 at 1 pm

Location: IRB-4105 

Title: Enhanced probabilistic modeling leads to improved accuracy in bulk & single-cell RNA-seq transcriptome quantification

Abstract:

Advancements in long-read sequencing technologies have transformed transcriptomics by enabling the sequencing of full-length transcripts, providing unprecedented insights into gene expression and isoform diversity. However, accurate transcript quantification in both bulk and single-cell long-read RNA-seq remains a significant challenge due to technical limitations, sequencing errors, and biases. This proposal focuses on developing enhanced algorithmic and probabilistic modeling techniques to address these challenges and improve the accuracy of transcript quantification in both settings. The work is divided into two major chapters, each tackling distinct aspects of long-read quantification and proposing novel solutions.

The first chapter addresses the challenges of transcript quantification in bulk RNA-seq datasets generated by long-read sequencing technologies, which provide a detailed view of transcript structures and isoform diversity by aggregating data from a population of cells. Despite the potential of these technologies, current quantification methods are hindered by sequencing errors, mapping ambiguities, and limitations in probabilistic models, particularly for transcript assignment. To overcome these issues, we propose a novel probabilistic framework implemented in a software tool called oarfish, which integrates read alignment scores and coverage profiles to improve quantification accuracy, sensitivity to low-abundance isoforms, and robustness against sequencing errors. Evaluations on both simulated and experimental PacBio and ONT datasets demonstrate its effectiveness, while proposed enhancements, such as dynamic coverage updates, factorized likelihood models, and genome-to-transcriptome alignment, would pave the way for broader applications and improved computational efficiency.

The second chapter focuses on single-cell long-read RNA-seq, which enables the study of cellular heterogeneity and dynamic biological processes at single-cell resolution. While long-read scRNA-seq offers advantages such as isoform-level resolution and splicing variation analysis, it faces challenges from sparse data, technical noise, and errors in cell barcodes (CBs) and unique molecular identifiers (UMIs). Building on the bulk RNA-seq framework, we propose adapting the probabilistic model for single-cell datasets by incorporating advanced error correction for CBs and UMIs and refining UMI deduplication methods. While the current focus is on PacBio data, future work will extend compatibility to ONT datasets and integrate innovations from Chapter 1, including dynamic coverage updates, factorized likelihood models, and genome-to-transcriptome alignment, into single-cell quantification.

 

Audience: Faculty 
