DNA forensics
In this tutorial, you will learn how to use the Coretex platform to build a model that predicts the body site a sample was collected from, based on DNA analysis of its microbiome composition. Such a model can then be used for forensic purposes in crime labs.
We will use a curated dataset from the Microbe Atlas Project (MAP).
Data can be downloaded from this link: https://microbeatlas.org/index.html?action=download.
You have to download 2 required files:
samples.info - includes metadata about all samples
sample.[level].mapped.gz - includes OTU tables with mapped OTU counts for each sample
The [level] part of the file name is the similarity cutoff (90%, 94%, 96%, 97%, 98%, or 99%) applied against an SSU rRNA reference composed of 1.5 million full-length sequences. The mapped counts were obtained using the MAPseq tool.
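Before uploading anything, it can help to look at the first few lines of the downloaded OTU table. The following is a minimal sketch, assuming the file is gzip-compressed text; `peek_gzip` is a hypothetical helper written for this tutorial and is not part of Coretex or MAP.

```python
import gzip
from itertools import islice


def peek_gzip(path, n_lines=5):
    """Return the first n_lines of a gzip-compressed text file.

    The file is read as a stream, so a multi-gigabyte archive is not
    decompressed in full just to inspect its header.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


# Usage (the path is a placeholder for the file you downloaded):
#   for line in peek_gzip("path/to/otu_table.gz"):
#       print(line)
```

If the call fails with a gzip error, the assumption about the compression format does not hold for your file.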
Import data to Coretex
The first step is to upload the MAP data you want to train your model on to Coretex.
Next, prepare the previously downloaded files for upload. Before uploading, you should have both files at hand: samples.info and the OTU table file.
Select both files and compress them.
Since the archive is large (at least 2 GB), the upload will take some time, so please be patient.
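The compression step can also be scripted instead of done through a file manager. This is a sketch only: it assumes a zip archive is what you want to upload, and `compress_files` is a hypothetical helper name, not a Coretex function.

```python
import zipfile
from pathlib import Path


def compress_files(paths, archive_path):
    """Write the given files into a single zip archive.

    Files are stored flat (by file name only), so the archive contains
    no folder structure. Zip64 is enabled to allow archives over 2 GB.
    """
    with zipfile.ZipFile(
        archive_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True
    ) as archive:
        for path in map(Path, paths):
            archive.write(path, arcname=path.name)
    return Path(archive_path)


# Usage (file names are placeholders for your two downloaded files):
#   compress_files(["samples.info", "path/to/otu_table"], "map_data.zip")
```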
To learn more about the various ways Coretex lets you move data around (especially large datasets), please visit the Dataset upload page. We will use the Coretex CLI to upload large files.
Let's start with the command:
The output lists all the operations that can be performed on a dataset. Our focus in this tutorial is creating datasets and uploading samples.
Running the previous command also displays the arguments and flags required to upload the dataset successfully.
Let's create a dataset!
To run the Microbiome analysis with the previously uploaded data, follow these steps:
Find the Microbiome analysis Project and select it,
Go to the Workflows screen and open 'microbiome-forensics-workflow',
Fill in the necessary fields and Run parameters (see Parameters Information below),
Click on Run Task,
On the Runs screen, find your Run and click on the right button to see more information about your Run.
Once the Run is marked as Done, click on the Artifacts button on the Run details screen to see the validation predictions.
In the artifacts list, you can find a .csv file that contains model predictions.
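Once the .csv file is downloaded, a quick tally of the predicted values gives a first impression of the results. The sketch below assumes the file has a header row; the column name is passed in by the caller because the actual column names of the artifact are not described here, and `body_site` in the usage comment is purely hypothetical.

```python
import csv
from collections import Counter


def count_predictions(csv_path, column):
    """Count how often each value appears in one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[column] for row in reader)


# Usage (file name and column name are placeholders):
#   for label, count in count_predictions("predictions.csv", "body_site").most_common():
#       print(label, count)
```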
That would be it for this tutorial! To find out more about Coretex check out our other examples of Demo Runs.
Parameters Information
| Name | Default value | Data type |
|---|---|---|
| dataset | null | dataset |
| validation | false | bool |
| trainedModel | null | int |
| datasetType | 0 | int |
| taxonomicLevel | 1 | int |
| sampleOrigin | human | list[str] |
| percentile | 100 | int |
| quantize | false | bool |
| sequencingTechnique | AMPLICON/SHOTGUN/WGS | list[str] |
| cache | true | bool |
| learningRate | 0.3 | float |
| epochs | 100 | int |
| earlyStopping | 0 | int |
| validationSplit | 0.2 | float |
| useGpu | false | bool |
| logLevel | 4 | int |
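If you keep notes on which values you changed between Runs, the defaults listed above can be mirrored in a plain dictionary. This is only a local bookkeeping sketch, not the Coretex API: `DEFAULTS` and `build_parameters` are hypothetical names, and reading `AMPLICON/SHOTGUN/WGS` as a three-item list is an assumption based on the `list[str]` type.

```python
import copy

# Defaults copied from the parameter list above (null -> None).
DEFAULTS = {
    "dataset": None,
    "validation": False,
    "trainedModel": None,
    "datasetType": 0,
    "taxonomicLevel": 1,
    "sampleOrigin": ["human"],
    # Assumption: the slash-separated default stands for three list items.
    "sequencingTechnique": ["AMPLICON", "SHOTGUN", "WGS"],
    "percentile": 100,
    "quantize": False,
    "cache": True,
    "learningRate": 0.3,
    "epochs": 100,
    "earlyStopping": 0,
    "validationSplit": 0.2,
    "useGpu": False,
    "logLevel": 4,
}


def build_parameters(**overrides):
    """Return a copy of DEFAULTS with overrides applied.

    Unknown names raise KeyError so that a typo such as 'epoch'
    is caught instead of being silently ignored.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(DEFAULTS)
    params.update(overrides)
    return params


# Usage: a shorter run with a smaller learning rate.
#   params = build_parameters(epochs=20, learningRate=0.1)
```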