POIBM - Poisson batch correction through sample matching
POIBM is a batch factor inference and correction method suited for heterogeneous RNA-seq and other count datasets. It operates by simultaneously inferring the batch factors and a mapping between matching samples. This is advantageous for datasets comprising samples of heterogeneous populations, in which unknown subpopulations match but, for example, the subpopulation fractions vary, so that the global population statistics cannot be matched.
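To make the idea of jointly inferring batch factors and a sample matching more concrete, here is a minimal Python sketch. It is not POIBM's algorithm or API: it assumes a much simpler model in which each source count is Poisson-distributed with mean equal to a per-gene batch factor times the count of a matched target sample, and it alternates between a hard matching step and a closed-form factor update. The function names (`poisson_loglik`, `fit_factors_and_matching`) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of counts given means, dropping the log(y!) term."""
    return sum(y * math.log(mu) - mu for y, mu in zip(counts, means))


def fit_factors_and_matching(source, target, n_iter=10, eps=1e-9):
    """Alternate between matching samples and estimating per-gene factors.

    Assumed toy model (not the POIBM model):
        source[j][g] ~ Poisson(factors[g] * target[match[j]][g])
    """
    n_genes = len(target[0])
    factors = [1.0] * n_genes
    match = [0] * len(source)
    for _ in range(n_iter):
        # Matching step: assign each source sample to the target sample
        # that best explains its counts under the current factors.
        for j, y in enumerate(source):
            scores = [
                poisson_loglik(y, [max(c * x, eps) for c, x in zip(factors, t)])
                for t in target
            ]
            match[j] = scores.index(max(scores))
        # Factor step: Poisson maximum-likelihood estimate per gene,
        # i.e. total source counts over total matched target counts.
        for g in range(n_genes):
            num = sum(y[g] for y in source)
            den = sum(target[match[j]][g] for j in range(len(source)))
            factors[g] = num / max(den, eps)
    return factors, match


# Toy data: two target "subpopulations" and three source samples that are
# the same subpopulations scaled by per-gene factors (2, 0.5, 1), but mixed
# in different proportions (two of the first kind, one of the second).
target = [[100, 10, 50], [10, 100, 50]]
source = [[200, 5, 50], [200, 5, 50], [20, 50, 50]]

factors, match = fit_factors_and_matching(source, target)
print(factors, match)
```

On this toy input the matching assigns the first two source samples to the first target sample and the third to the second, and the recovered per-gene factors equal the scaling used to build the data. Because the subpopulation fractions differ between the two batches, comparing global per-gene means would not recover these factors, which is the situation the paragraph above describes.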
Major features:
Simultaneous batch factor and sample matching inference reveals both the batch correction coefficients and putatively similar phenotypes in the data. The phenotypes need not be prelabeled, which is often difficult for patient-derived samples; they are learned in the process.
Supports sample trimming for datasets that have very little overlap.
The model accounts for the discrete nature of RNA-seq data and models both expression and technical noise, or the lack thereof. It operates on raw count data and infers total RNA factors in the process.
For the details about the method and validation on cancer cell line and patient data, please refer to our publication on the matter.