---
title: "Migrating from anndata to anndataR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Migrating from anndata to anndataR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r check_on_cran, message=FALSE, warning=FALSE, echo=FALSE}
on_cran <- !identical(Sys.getenv("NOT_CRAN"), "true")
if (on_cran) {
  knitr::opts_chunk$set(eval = FALSE)
  knitr::asis_output(paste0(
    "**Note:** The outputs of this vignette are not rendered on CRAN due to package size limitations. ",
    "Please check the [Migration to anndataR](https://anndata.dynverse.org/articles/migration_to_anndataR.html) ",
    "vignette in the package documentation."
  ))
}
```

## Why migrate?

The `anndata` R package (this package) is superseded by [anndataR](https://anndataR.scverse.org), available on Bioconductor.

| Feature | `anndata` (CRAN) | `anndataR` (Bioconductor) |
|---|---|---|
| **Python dependency** | Required | Optional (only needed for `ReticulateAnnData`) |
| **h5ad I/O** | Via Python | Native R via `rhdf5` (preferred), or via Python through `reticulate` |
| **In-memory backend** | Python-backed | Native R (`InMemoryAnnData`) |
| **HDF5-backed backend** | Python-backed | Native R (`HDF5AnnData`) |
| **Reticulate-backed backend** | Always | Optional (`ReticulateAnnData`) |
| **Seurat interop** | Not supported | `as_Seurat()` / `as_AnnData()` |
| **SingleCellExperiment interop** | Not supported | `as_SingleCellExperiment()` / `as_AnnData()` |
| **Distribution** | CRAN | Bioconductor |

The preferred backends (`InMemoryAnnData`, `HDF5AnnData`) require no Python. For users who already have Python `anndata` installed, the `ReticulateAnnData` backend offers a low-friction starting point; see [below](#using-reticulateanndata-as-a-stepping-stone).

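
As a rough way to see which of the backends in the table above your current R library could support, the sketch below checks only whether the relevant R packages are installed. `check_backends()` is a hypothetical helper, not part of `anndataR`, and it does not verify that a Python environment with `anndata` is actually available for `ReticulateAnnData`.

```{r check-backends}
# Hypothetical helper (not part of anndataR): report which backends look
# usable, judging only by which R packages are installed.
check_backends <- function() {
  has <- function(pkg) requireNamespace(pkg, quietly = TRUE)
  c(
    InMemoryAnnData = has("anndataR"),
    # Assumption: the native HDF5 backend needs rhdf5 alongside anndataR
    HDF5AnnData = has("anndataR") && has("rhdf5"),
    # Assumption: only the R side is checked here, not the Python environment
    ReticulateAnnData = has("anndataR") && has("reticulate")
  )
}

check_backends()
```

Any entry that comes back `FALSE` points at a package that still needs to be installed.
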
## Installation

```{r install}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("anndataR")

# Required for native h5ad reading/writing:
BiocManager::install("rhdf5")

# Optional: SingleCellExperiment conversion
BiocManager::install("SingleCellExperiment")

# Optional: Seurat conversion
install.packages("SeuratObject")
```

## Side-by-side migration guide

### Creating an AnnData object

The `AnnData()` constructor is identical in both packages. The only difference is that `anndataR` returns an `InMemoryAnnData` (a pure-R object) whereas `anndata` returned an `AnnDataR6` wrapping a Python object.

```{r create}
library(anndataR)

ad <- AnnData(
  X = matrix(1:6, nrow = 2),
  obs = data.frame(group = c("a", "b"), row.names = c("s1", "s2")),
  var = data.frame(type = c(1L, 2L, 3L), row.names = c("var1", "var2", "var3")),
  layers = list(spliced = matrix(4:9, nrow = 2)),
  uns = list(a = 1)
)
```

### Reading and writing h5ad files

The function signatures are identical. `anndataR` reads natively without Python.

```{r read-write}
# Reading
ad <- read_h5ad("path/to/file.h5ad") # InMemoryAnnData (default)
ad <- read_h5ad("path/to/file.h5ad", as = "HDF5AnnData") # disk-backed, low memory
sce <- read_h5ad("path/to/file.h5ad", as = "SingleCellExperiment")
obj <- read_h5ad("path/to/file.h5ad", as = "Seurat")

# Writing (works identically to anndata)
write_h5ad(ad, "path/to/output.h5ad")
```

### Slot access and subsetting

Both packages use the same `$` notation and bracket subsetting syntax.

```{r slots}
ad$X; ad$obs; ad$var; ad$obsm; ad$varm; ad$obsp; ad$varp; ad$layers; ad$uns

# Subsetting returns an AnnDataView (native R) rather than a Python-backed view
subset <- ad[1:5, ]
subset <- ad[, c("var1", "var2")]
subset <- ad[ad$obs$group == "a", ]
concrete <- subset$as_InMemoryAnnData() # materialise to a concrete object
```

### Interoperability (new in anndataR)

```{r interop}
# AnnData <-> SingleCellExperiment
sce <- ad$as_SingleCellExperiment()
ad <- as_AnnData(sce)

# AnnData <-> Seurat
obj <- ad$as_Seurat()
ad <- as_AnnData(obj)
```

### Using ReticulateAnnData as a stepping stone

`anndataR`'s `ReticulateAnnData` backend wraps a Python `anndata.AnnData` object via reticulate, implementing the same `AbstractAnnData` interface as the native backends. This is the closest equivalent to the old `anndata` R package: you can call Python tools (e.g. `scanpy`) on the object while also using all `anndataR` slot accessors from R.

```{r reticulate}
library(reticulate)
library(anndataR)

# Make scanpy available in the same Python environment that reticulate uses
py_require("scanpy")
sc <- import("scanpy")

# Read data with scanpy
# Note: anndataR will automatically wrap the resulting Python object in a ReticulateAnnData
url <- "https://cf.10xgenomics.com/samples/cell-exp/6.0.0/SC3_v3_NextGem_DI_CellPlex_CSP_DTC_Sorted_30K_Squamous_Cell_Carcinoma/SC3_v3_NextGem_DI_CellPlex_CSP_DTC_Sorted_30K_Squamous_Cell_Carcinoma_count_sample_feature_bc_matrix.h5"
ad <- sc$read_10x_h5("dataset.h5", backup_url = url)

# Use anndataR slot accessors from R
dim(ad)
head(ad$obs)
rowMeans(ad$X[1:10, ])

# Pass back to scanpy for preprocessing
# Note: anndataR automatically unwraps the Python object when calling scanpy functions.
sc$pp$filter_cells(ad, min_genes = 200L)
sc$pp$filter_genes(ad, min_cells = 3L)
sc$pp$normalize_per_cell(ad)
sc$pp$log1p(ad)

# Write from R when done
write_h5ad(ad, "path/to/output.h5ad")

# Migrate to a different anndataR backend
ad_mem <- ad$as_InMemoryAnnData()
# ad_hdf5 <- ad$as_HDF5AnnData(path = "path/to/disk_backed.h5ad")
# sce <- ad$as_SingleCellExperiment()
# seu <- ad$as_Seurat()
```

## Getting help with anndataR

- Documentation:
- Vignettes: `browseVignettes("anndataR")`
- Issue tracker: