Expressions can pull data into your dataset from other indexed datasets on SolveBio, matching each record to the corresponding entries in the other dataset. Typical use cases include:
- You have a dataset of genetic variants with sample IDs, and you want to link those sample IDs to your internal patient IDs or to other patient-specific metadata stored in another dataset (e.g., progression-free survival in months, cancer tissue type).
- You have a dataset of gene symbols, and you want to know which diseases in NHGRI's Clinical Genomic Database have been associated with each gene.
- You have a dataset of gene IDs (in Ensembl, RefSeq, CCDS, or another format), and you want to get the canonical HUGO gene symbol for each ID.
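All three use cases follow the same pattern: match a field in your records against a field in another dataset, then copy values from the matching entries back onto your records. The plain-Python sketch below illustrates that pattern for the first use case. It does not call SolveBio at all; the field names (sample_id, patient_id, pfs_months, tissue_type) and every value are hypothetical placeholders.

```python
# Hypothetical in-memory stand-ins for two datasets.
variants = [
    {"variant": "var1", "sample_id": "S1"},
    {"variant": "var2", "sample_id": "S2"},
    {"variant": "var3", "sample_id": "S9"},  # no matching sample below
]
patients = [
    {"sample_id": "S1", "patient_id": "P100", "pfs_months": 14.0, "tissue_type": "lung"},
    {"sample_id": "S2", "patient_id": "P200", "pfs_months": 6.5, "tissue_type": "breast"},
]

# Index the source records by the field they share with the target.
by_sample = {p["sample_id"]: p for p in patients}

# Copy the matching patient's metadata onto each variant (None when unmatched).
annotated = []
for v in variants:
    match = by_sample.get(v["sample_id"], {})
    annotated.append({
        **v,
        "patient_id": match.get("patient_id"),
        "pfs_months": match.get("pfs_months"),
        "tissue_type": match.get("tissue_type"),
    })

for row in annotated:
    print(row)
```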
Step 1. Set up your datasets.
Dataset A (target): the dataset you're editing by adding a column. In this example, it's a list of Ensembl gene IDs. Note the name of the field that holds the values you're annotating (gene_id in this example).
Dataset B (source): the dataset you're pulling data from. Here, it's the public HUGO nomenclature dataset maintained on SolveBio, which lists all canonical gene symbols along with many synonyms and IDs. Note the dataset ID, the field you'll match on (ensembl_gene_id), and the annotation field(s) you want to bring into your target dataset (gene_symbol).
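One way to keep track of the details Step 1 asks you to note is to write them down as a small spec and check them against a sample record from each dataset. This is a hypothetical helper, not part of SolveBio: the AnnotationSpec class, the missing_fields function, and the "<HUGO dataset ID>" placeholder are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationSpec:
    """Hypothetical container for the details noted in Step 1."""
    target_field: str       # field in Dataset A (target), e.g. gene_id
    source_dataset_id: str  # ID of Dataset B (source); placeholder below
    match_field: str        # field in Dataset B to match on
    annotation_fields: list = field(default_factory=list)  # fields to copy over


def missing_fields(spec, target_record, source_record):
    """Return the spec's field names that are absent from the sample records."""
    missing = []
    if spec.target_field not in target_record:
        missing.append(spec.target_field)
    for name in [spec.match_field, *spec.annotation_fields]:
        if name not in source_record:
            missing.append(name)
    return missing


spec = AnnotationSpec(
    target_field="gene_id",
    source_dataset_id="<HUGO dataset ID>",  # placeholder, not a real dataset ID
    match_field="ensembl_gene_id",
    annotation_fields=["gene_symbol"],
)

target_example = {"gene_id": "ENSG00000141510"}
source_example = {"ensembl_gene_id": "ENSG00000141510", "gene_symbol": "TP53"}
print(missing_fields(spec, target_example, source_example))  # []
```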
Step 2. Insert a column with your custom expression
Inserting a column opens a blank recipe screen. Copy the following expression into the empty expression box. This expression (broken down later in this walkthrough) annotates every gene ID in your dataset with its canonical HUGO gene symbol, matching on Ensembl gene IDs.
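Conceptually, for each record in Dataset A, the lookup takes the record's gene_id, finds the Dataset B entry whose ensembl_gene_id matches, and returns that entry's gene_symbol. The minimal sketch below models that per-record lookup in plain Python, assuming a tiny in-memory stand-in for the HUGO dataset. It is not SolveBio's expression syntax, annotate_gene_symbol is a hypothetical name, and returning None for unmatched IDs is a choice made for this sketch only.

```python
# Toy stand-in for the HUGO nomenclature dataset (Dataset B).
hugo_records = [
    {"ensembl_gene_id": "ENSG00000141510", "gene_symbol": "TP53"},
    {"ensembl_gene_id": "ENSG00000012048", "gene_symbol": "BRCA1"},
]

# Build the lookup once: ensembl_gene_id -> gene_symbol.
symbol_by_ensembl = {r["ensembl_gene_id"]: r["gene_symbol"] for r in hugo_records}


def annotate_gene_symbol(record):
    """Return the HUGO symbol for record['gene_id'], or None if it has no match."""
    return symbol_by_ensembl.get(record.get("gene_id"))


print(annotate_gene_symbol({"gene_id": "ENSG00000012048"}))  # BRCA1
print(annotate_gene_symbol({"gene_id": "ENSG99999999999"}))  # None (not in the toy index)
```

Building the dictionary once and reusing it keeps each per-record lookup constant-time, which matters when the target dataset is large.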
You can adapt this expression to many other annotation use cases.
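In terms of the details noted in Step 1, adapting the lookup mostly means swapping the source dataset, the match field, and the annotation field. Some sources, such as a gene-to-disease table, can have several entries per key, so a generalized sketch might collect every match into a list. build_lookup, the cgd_like rows, and their field names (gene, condition) are hypothetical; the values are placeholders, not real Clinical Genomic Database content.

```python
from collections import defaultdict


def build_lookup(source_records, match_field, annotation_field):
    """Map each match_field value to every annotation_field value sharing it."""
    index = defaultdict(list)
    for rec in source_records:
        key = rec.get(match_field)
        if key is not None and annotation_field in rec:
            index[key].append(rec[annotation_field])
    return index


# Hypothetical gene-to-condition rows (placeholder values, not real CGD data).
cgd_like = [
    {"gene": "GENE1", "condition": "Condition A"},
    {"gene": "GENE1", "condition": "Condition B"},
    {"gene": "GENE2", "condition": "Condition C"},
]

conditions_by_gene = build_lookup(cgd_like, "gene", "condition")
for symbol in ["GENE1", "GENE2", "GENE3"]:
    # .get() avoids inserting empty entries into the defaultdict.
    print(symbol, conditions_by_gene.get(symbol, []))
```

The same helper covers the one-to-one gene ID case: calling build_lookup(hugo_rows, "ensembl_gene_id", "gene_symbol") would simply yield single-item lists.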
Step 3. Run the expression!
Once your expression is set up, you'll see a preview of the new field.
Press Insert, and your annotation will start. When it completes, you'll have your new annotated dataset, ready to use!
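The preview-then-insert flow can be pictured as computing the new field for a few records first, then for every record. The sketch below is only an analogy for what the Insert step produces (a copy of each record with one extra field); insert_column and lookup_symbol are hypothetical helpers and say nothing about how SolveBio actually runs the annotation.

```python
from itertools import islice


def insert_column(records, new_field, compute):
    """Yield a copy of each record with new_field set to compute(record)."""
    for rec in records:
        yield {**rec, new_field: compute(rec)}


symbol_by_ensembl = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
dataset_a = [
    {"gene_id": "ENSG00000141510"},
    {"gene_id": "ENSG00000012048"},
    {"gene_id": "ENSG00000000000"},  # hypothetical ID with no match
]


def lookup_symbol(rec):
    """Return the gene symbol for this record's gene_id, or None."""
    return symbol_by_ensembl.get(rec["gene_id"])


# Preview: compute the new field for only the first two records.
preview = list(islice(insert_column(dataset_a, "gene_symbol", lookup_symbol), 2))
print(preview)

# Full run: annotate every record; the original records are left unchanged.
annotated = list(insert_column(dataset_a, "gene_symbol", lookup_symbol))
print([r["gene_symbol"] for r in annotated])  # ['TP53', 'BRCA1', None]
```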