DIMMSC

DIMMSC is an R package for clustering droplet-based single-cell transcriptomic data. It uses a Dirichlet mixture prior to characterize variation across clusters, and an expectation-maximization (EM) algorithm for parameter inference. The package also reports clustering uncertainty, as the probability that each cell belongs to each cluster.
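
To make the modeling idea concrete, here is a minimal sketch of a Dirichlet-multinomial log-likelihood for a single cell's count vector, written in base R. It assumes that, within a cluster, counts are multinomial with gene proportions drawn from a Dirichlet distribution with parameter vector alpha. The function name dirmult_loglik is hypothetical and is not part of DIMMSC; the package's actual implementation may differ.

```r
# Log-likelihood of one cell's count vector x under a Dirichlet-multinomial
# distribution with concentration vector alpha.
# Illustrative helper only; NOT a function exported by DIMMSC.
dirmult_loglik <- function(x, alpha) {
  stopifnot(length(x) == length(alpha), all(x >= 0), all(alpha > 0))
  total <- sum(x)
  # Multinomial coefficient: log(total! / prod(x_i!))
  log_coef <- lgamma(total + 1) - sum(lgamma(x + 1))
  # Remaining terms come from integrating out the gene proportions
  log_coef +
    lgamma(sum(alpha)) - lgamma(total + sum(alpha)) +
    sum(lgamma(x + alpha) - lgamma(alpha))
}

# Two genes, one count each, flat prior alpha = (1, 1).
# The three outcomes (0,2), (1,1), (2,0) are equally likely, so this is 1/3.
exp(dirmult_loglik(c(1, 1), c(1, 1)))
```

In a mixture of such distributions, an EM algorithm would alternate between computing each cell's cluster probabilities from these likelihoods and updating the per-cluster alpha vectors.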


Download

DIMMSC_0.2.0.tar.gz [Last update: 08/07/2017]

Archive

DIMMSC_0.1.1.tar.gz [04/01/2017]

DIMMSC_0.1.0-archive.tar.gz [02/01/2017]


Installation

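First install the third-party R package cellrangerRkit from 10X Genomics.

Then install DIMMSC with the R command below, replacing DIMMSC.tar.gz with the name of the file you downloaded (e.g. DIMMSC_0.2.0.tar.gz):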

install.packages(pkgs = "DIMMSC.tar.gz", repos = NULL, type = "source")

Or, from a terminal:

R CMD INSTALL DIMMSC.tar.gz


Usage

  • DIMMSC(data, K=2, method_cluster_intial="kmeans", method_alpha_intial="Ronning", maxiter=200, tol=1e-4, lik.tol=1e-2): DIMMSC clustering function

  • plot_tsne_clusters(data, cluster): visualize t-SNE and clusters


Arguments

  • data: a G*C matrix with G genes and C cells

  • K: number of clusters, default is 2

  • method_cluster_intial: method for initializing clusters, "kmeans" (default) or "random"

  • method_alpha_intial: method for initializing the alpha matrix for EM algorithm, "Ronning" (default, Ronning's method, 1989) or "Weir" (Weir and Hill's method, 2002)

  • maxiter: maximum number of iterations, default is 200

  • tol: convergence tolerance for the change in the vector pie between iterations, default is 1e-4

  • lik.tol: convergence tolerance for the change in log-likelihood between iterations, default is 1e-2

  • cluster: a vector of cluster memberships, e.g. the mem element of the DIMMSC output
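
Because data must be laid out with genes in rows and cells in columns, it can help to check the orientation before clustering. The sketch below uses simulated Poisson counts and hypothetical object names (counts_cells_by_genes, data_mat); it only illustrates transposing a cells-by-genes matrix into the G*C layout described above.

```r
set.seed(1)
n_genes <- 50
n_cells <- 20

# Simulated counts stored cells-by-genes (a hypothetical starting layout)
counts_cells_by_genes <- matrix(rpois(n_cells * n_genes, lambda = 2),
                                nrow = n_cells, ncol = n_genes)

# Transpose so that rows are genes and columns are cells (G*C)
data_mat <- t(counts_cells_by_genes)
rownames(data_mat) <- paste0("gene", seq_len(n_genes))
colnames(data_mat) <- paste0("cell", seq_len(n_cells))

# Sanity checks: expected shape and non-negative counts
stopifnot(nrow(data_mat) == n_genes, ncol(data_mat) == n_cells)
stopifnot(all(data_mat >= 0))
```

A matrix prepared this way can then be passed as the data argument.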


Values

  • pie: a vector of pie estimates (the cluster mixing proportions)

  • delta: a C*K matrix giving the probability that each cell belongs to each cluster

  • alpha: a K*G matrix of alpha estimates

  • mem: a vector of cluster memberships

  • loglik: the final log likelihood after iterations

  • AIC: Akaike information criterion (AIC)

  • BIC: Bayesian information criterion (BIC)
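
The delta matrix is what makes clustering uncertainty available. As a rough sketch, the base R code below derives hard assignments and a per-cell uncertainty score from a small made-up delta matrix. The values, the 0.7 cutoff, and the names hard_mem, uncertainty and ambiguous are illustrative assumptions; whether mem is computed exactly this way inside DIMMSC is not stated here.

```r
# Hypothetical C*K matrix for 4 cells and 2 clusters (each row sums to 1)
delta <- matrix(c(0.95, 0.05,
                  0.60, 0.40,
                  0.10, 0.90,
                  0.50, 0.50),
                ncol = 2, byrow = TRUE)

# Hard assignment: the cluster with the highest probability for each cell
hard_mem <- max.col(delta, ties.method = "first")

# Uncertainty score: 1 minus the largest probability in each row
uncertainty <- 1 - apply(delta, 1, max)

# Cells whose best cluster has probability below an arbitrary cutoff of 0.7
ambiguous <- which(apply(delta, 1, max) < 0.7)
```

With a real fit, result$delta would take the place of the made-up matrix.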


Example

    # Load the package
    library(DIMMSC)

    # Load the example data data_DIMMSC
    data(data_DIMMSC)

    # Run DIMMSC
    result <- DIMMSC(data=data_DIMMSC, K=2, method_cluster_intial="kmeans",
                     method_alpha_intial="Ronning", maxiter=200, tol=1e-4, lik.tol=1e-2)

    # Plot t-SNE and clusters
    plot_tsne_clusters(data=data_DIMMSC, cluster=result$mem)
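
Since the output includes AIC and BIC, one possible way to choose K is to fit several values and compare the criteria. The helper below is a sketch, not part of DIMMSC: bic_by_k and its fit_fun argument are hypothetical names, and picking the smallest BIC assumes the usual convention that lower values are preferred.

```r
# Fit a model for several values of K and collect the BIC of each fit.
# fit_fun is any function(data, K) that returns a list with a BIC element.
# Hypothetical helper; not exported by DIMMSC.
bic_by_k <- function(data, k_values, fit_fun) {
  bic <- vapply(k_values, function(k) fit_fun(data, k)$BIC, numeric(1))
  names(bic) <- k_values
  bic
}

# Usage with DIMMSC (requires the package and the example data to be loaded):
# bic <- bic_by_k(data_DIMMSC, 2:5,
#                 function(data, K) DIMMSC(data = data, K = K))
# best_k <- as.integer(names(bic)[which.min(bic)])  # assumes lower BIC is better
```

Passing the fitting function as an argument keeps the helper independent of the package, so it can be tried with any model that reports a BIC.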


Reference

Zhe Sun, Ting Wang, Ke Deng, Xiao-Feng Wang, Robert Lafyatis, Ying Ding, Ming Hu, Wei Chen. DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data. Bioinformatics 2017.


Contact

Please feel free to contact us if you have questions:
Wei Chen (wei.chen@chp.edu) or Ming Hu (hum@ccf.org)

Last update: Aug 2017