The SolveBio Platform is designed to be compatible with stringent security and compliance requirements. All dataset imports and modifications are linked to “commits” which track each change. Commits record what changes were made, when they were made, and who made them. We recommend that all users establish their own internal best practices and workflows for importing data into the platform.
SolveBio supports data import from JSON, VCF (version 4+), CSV, TSV, XML, GFF3, GTF and other files. You can import data into any new or existing private SolveBio dataset. Before importing, make sure your private dataset has been created. You can use the web interface or API to create and manage datasets. Ensure you have the appropriate permissions on the data repository before submitting an import (you need Contribute-level access or higher for the data repository).
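For API-based workflows, the destination dataset can be created (or retrieved) ahead of time with the SolveBio Python client. The sketch below is a minimal example; the API key and the dataset path are placeholders, not values from this guide.

```python
import solvebio

# Authenticate with your API key (or run `solvebio login` on the command line first).
solvebio.login(api_key="YOUR_API_KEY")

# Create the destination dataset if it does not already exist.
# The full path (vault + folder + dataset name) is a placeholder for this example.
dataset = solvebio.Dataset.get_or_create_by_full_path("~/imports/my_private_dataset")
print(dataset.id)
```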
There are three steps to importing data:
Step 1: Upload the file to SolveBio
SolveBio Vaults accept all file types. Files may be up to 5 gigabytes in size, although we recommend splitting large files into multiple parts. Files can be compressed using gzip.
Important: JSON files should contain one JSON record per line. For example:
{"record_number": "1", "field_1": "value 1", "field_2", "value 2"} {"record_number": "2", "field_1": "value 1", "field_2", "value 2"}
Step 2: Start the import
An import requires a file (from step 1) and a destination (a private dataset).
If the uploaded file is a VCF, it will be validated and parsed during this step and converted to a dataset commit. If it is a JSON, CSV, or other file, it will be validated against the destination dataset to ensure field compatibility.
The import validates the input file and generates a commit. The commit then indexes the data in blocks of records.
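With the Python client, an import is started by pointing a DatasetImport at the uploaded file and the destination dataset. The sketch below assumes the placeholder dataset and file paths from the earlier examples; following the dataset's activity waits for validation, the commit, and indexing to finish.

```python
import solvebio

solvebio.login(api_key="YOUR_API_KEY")

# Retrieve the destination dataset and the uploaded file (paths are placeholders).
dataset = solvebio.Dataset.get_by_full_path("~/imports/my_private_dataset")
uploaded_file = solvebio.Object.get_by_full_path("~/imports/variants.vcf.gz")

# Start the import; the file is validated and a commit is created.
imp = solvebio.DatasetImport.create(
    dataset_id=dataset.id,
    object_id=uploaded_file.id
)

# Follow the dataset's activity until the commit finishes indexing.
dataset.activity(follow=True)
```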