👨‍🔬 DNA forensics

In this tutorial, you will learn how to use the Coretex platform to build a model that predicts the body site a sample was collected from, based on DNA analysis of its microbiome composition. Such a model can be used for forensic purposes in crime labs.

We will use a curated dataset from the Microbiome Atlas Project (MAP).

Data can be downloaded from this link: https://microbeatlas.org/index.html?action=download.

You need to download two files:

  • samples.info - includes metadata about all samples

  • sample.[level].mapped.qz - includes OTU tables with mapped OTU counts for each sample

Each level corresponds to a similarity cutoff (90%, 94%, 96%, 97%, 98%, or 99%) against an SSU rRNA reference composed of 1.5 million full-length sequences. The mappings were obtained using the MAPseq tool.

Import data to Coretex

The first step is to upload to Coretex the MAP data you want to train your model on.

Before uploading, prepare the previously downloaded files: you should have both samples.info and the mapped OTU table file on disk.

Select both files and compress them.
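If you prefer to script the compression step, the following is a minimal Python sketch using only the standard library. It assumes that the archive should contain both files at its root (no parent folders); the function name build_archive and the example file names in the usage note are illustrative and not part of Coretex.

```python
from pathlib import Path
import zipfile


def build_archive(files, archive_path):
    """Compress the given files into a single flat zip archive."""
    files = [Path(f) for f in files]
    missing = [str(f) for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")

    # ZIP_DEFLATED enables compression. zipfile allows ZIP64 extensions
    # by default, so archives larger than 2 GB can be written.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for f in files:
            # arcname=f.name stores the file at the archive root (assumption)
            archive.write(f, arcname=f.name)
    return Path(archive_path)
```

For example, build_archive(["samples.info", "<your mapped OTU file>"], "Atlas Data.zip") would produce an archive ready for upload, with the second path replaced by the file you actually downloaded.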

Because the archive is large (at least 2 GB), the upload will take some time, so please be patient.

To learn more about the ways Coretex lets you move data around (especially large datasets), see the Dataset upload page. In this tutorial we use the Coretex CLI, since the archive is large.

Let's start with the command:

coretex dataset --help

This lists all the operations that can be performed on a dataset. In this tutorial we focus on creating datasets and uploading samples.

coretex dataset create --help

The previous command displays the arguments and flags needed to create the dataset and upload it successfully.

Let's create a dataset!

coretex dataset create Atlas 8770 11 "/Users/user/Downloads/Atlas Data.zip"
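The archive path in the command above contains a space, which is why it is quoted. If you call the CLI from a script, a minimal Python sketch like the one below passes each argument separately so no manual quoting is needed. The helper name dataset_create_command is hypothetical, and the two numeric arguments are copied verbatim from this tutorial's command; check `coretex dataset create --help` for their meaning and for the values that apply to your account.

```python
import shlex
import subprocess


def dataset_create_command(name, positional_args, archive_path):
    """Build the argument list for `coretex dataset create`."""
    return ["coretex", "dataset", "create", name,
            *map(str, positional_args), str(archive_path)]


# Values below are the ones used in this tutorial, not universal defaults
command = dataset_create_command("Atlas", [8770, 11], "/Users/user/Downloads/Atlas Data.zip")

# Show the shell-safe form of the command before running it
print(shlex.join(command))
# prints: coretex dataset create Atlas 8770 11 '/Users/user/Downloads/Atlas Data.zip'

# Uncomment to run the upload (requires an installed and configured Coretex CLI):
# subprocess.run(command, check=True)
```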

To run the Microbiome analysis on the previously uploaded data, follow these steps:

  1. Find the Microbiome analysis Project and select it.

  2. Go to the Workflows screen and open 'microbiome-forensics-workflow'.

  3. Fill in the necessary fields and Run parameters (see Parameters Information at the end of this page).

  4. Click on Run Task.

  5. On the Runs screen, find your Run and click on the right button to see more information about it.

  6. Once the Run is marked as Done, click on the Artifacts button on the Run details screen to see the validation predictions.

  7. In the artifacts list, you can find a .csv file that contains the model predictions.
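After downloading the predictions file, you can inspect it locally. The sketch below uses Python's csv module and makes no assumption about the column names, since they are not listed in this tutorial; it only assumes that the file has a header row. The function names are illustrative.

```python
import csv
from pathlib import Path


def load_predictions(csv_path):
    """Return (column names, rows) from a predictions CSV file."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list
        return list(reader.fieldnames or []), rows


def preview(csv_path, limit=5):
    """Print the column names and the first few prediction rows."""
    columns, rows = load_predictions(csv_path)
    print(f"{len(rows)} rows, columns: {columns}")
    for row in rows[:limit]:
        print(row)
```

Calling preview with the path to the downloaded .csv artifact prints the header and the first five rows, which is usually enough to see how the predictions are laid out.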

That would be it for this tutorial! To find out more about Coretex check out our other examples of Demo Runs.

Parameters Information

Name                | Value                | Data type | Required
--------------------|----------------------|-----------|---------
dataset             | null                 | dataset   |
validation          | false                | bool      |
trainedModel        | null                 | int       |
datasetType         | 0                    | int       |
taxonomicLevel      | 1                    | int       |
sampleOrigin        | human                | list[str] |
percentile          | 100                  | int       |
quantize            | false                | bool      |
sequencingTechnique | AMPLICON/SHOTGUN/WGS | list[str] |
cache               | true                 | bool      |
learningRate        | 0.3                  | float     |
epochs              | 100                  | int       |
earlyStopping       | 0                    | int       |
validationSplit     | 0.2                  | float     |
useGpu              | false                | bool      |
logLevel            | 4                    | int       |
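To keep track of which values you change between Runs, it can help to hold the defaults in one place and check overrides against them. The following Python sketch mirrors the names, values, and data types listed above. It is not the Coretex API: the DEFAULTS dictionary and the with_overrides helper are illustrative, and the list representation of sampleOrigin and sequencingTechnique is an assumption about how the list[str] values are written.

```python
# Defaults as listed in Parameters Information (null is represented as None)
DEFAULTS = {
    "dataset": None,
    "validation": False,
    "trainedModel": None,
    "datasetType": 0,
    "taxonomicLevel": 1,
    "sampleOrigin": ["human"],                              # assumed list form
    "percentile": 100,
    "quantize": False,
    "sequencingTechnique": ["AMPLICON", "SHOTGUN", "WGS"],  # assumed list form
    "cache": True,
    "learningRate": 0.3,
    "epochs": 100,
    "earlyStopping": 0,
    "validationSplit": 0.2,
    "useGpu": False,
    "logLevel": 4,
}


def with_overrides(overrides):
    """Return a copy of DEFAULTS with the given overrides applied."""
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"Unknown parameter: {name}")
        default = DEFAULTS[name]
        # A None default carries no type information here, so any value is accepted.
        # Otherwise the override must have exactly the same type as the default.
        if default is not None and type(value) is not type(default):
            raise TypeError(
                f"{name} expects {type(default).__name__}, got {type(value).__name__}"
            )
        params[name] = value
    return params
```

For example, with_overrides({"epochs": 50, "useGpu": True}) returns the full parameter set with only those two values changed, while a misspelled name or a value of the wrong type raises an error before you start a Run.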
