đŸ–Ĩī¸Project and Task

Project

A Coretex Project is a scoped placeholder for Datasets and Tasks. Users need to define a Project before starting any work. Each Project has a name, a brief description, and an associated Type that defines the scope of the work.

Coretex Task = code

A Coretex Task defines the data processing workflow a user plans to run. It is simply code that reads some data, processes it, and stores the results for further analysis. A Task has a name and a description, and can be created from a set of commonly used Task Templates provided by Coretex. Users who already have their own processing scripts can use them instead of the Coretex-provided templates.

Task structure

To create a new Task, you must first select the Project in which it will reside. A Task can be created empty (it will contain no files, allowing you to build it from scratch), from a local folder on your computer, or from our predefined Coretex templates (curated state-of-the-art code and parameters).

If you create a blank Task or one from a local folder, bear in mind that, at a bare minimum, a Task must contain:

  1. entry script

The runtime environment is a virtual environment of installed software dependencies used by the Task script. Coretex supports two major virtual environment managers: venv and Conda.

Coretex will dynamically determine the type of the environment manager your task uses based on the file structure.

If using Conda, make sure to place your environment configuration files, environment.yml (for Linux) and environment-osx.yml (for macOS), in your Task root.

If using venv, make sure to place your environment configuration file, requirements.txt, in your Task root.
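The file-based detection described above can be pictured with a short sketch. This is not Coretex's actual implementation: the helper name `detect_environment_manager` and the order of the checks (Conda first) are assumptions made for illustration. Only the file names come from the text above.

```python
from pathlib import Path
from typing import Optional

# File names taken from the documentation above.
CONDA_FILES = ("environment.yml", "environment-osx.yml")
VENV_FILE = "requirements.txt"


def detect_environment_manager(task_root: Path) -> Optional[str]:
    """Guess the environment manager from the files in the Task root.

    Hypothetical helper: checking Conda before venv is an assumption,
    not documented Coretex behaviour.
    """
    if any((task_root / name).is_file() for name in CONDA_FILES):
        return "conda"
    if (task_root / VENV_FILE).is_file():
        return "venv"
    # Neither configuration file was found in the Task root.
    return None
```

Calling `detect_environment_manager(Path("my-task"))` on a folder that contains only requirements.txt would return `"venv"` under these assumptions.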

The entry script is the first and only file Coretex runs when executing a Task. Coretex provides a dynamic entry point script, which lets you prepare and store several scripts for different applications in one place. Whether you are training or validating, selecting the script selects the Task run mode. By default, the entry script file is main.py, and it must be located in the root of the Task.

Running a Task's code creates a single Run. Run parameters are defined in a YAML file, task.yaml, in the root of the Task. The example task.yaml below has two parameters, dataset and outputDatasetName, of types Coretex Dataset and string (text), respectively. Both are required, and both have default values that are pre-populated when you prepare to execute the Task.

param_groups:
  - name: inputs
    params:
      - name: dataset
        description: Denoised .fasta sequence dataset
        value: 2880
        data_type: dataset
        required: true
  - name: outputs
    params:
      - name: outputDatasetName
        description: Name of the dataset which will contain output of this experiment
        value: 'BioInformatics: Phylogenetic diversity analysis tree'
        data_type: str
        required: true

A Run can have any number of arbitrary parameters which a user can tune before executing it. Changing the available run parameters (their name, type, description and default value) can only be done by editing task.yaml, while parameter values can be changed before each Run execution. For more details, please check this tutorial.
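The split between defaults stored in task.yaml and values chosen for each Run can be sketched as follows. The dictionary mirrors the example task.yaml as it might look after parsing with a YAML library (parsing is omitted here), and `resolve_parameters` is a hypothetical helper, not part of any Coretex API.

```python
from typing import Any, Dict, Optional

# Mirrors the example task.yaml, as it might look once parsed.
task_config: Dict[str, Any] = {
    "param_groups": [
        {
            "name": "inputs",
            "params": [
                {
                    "name": "dataset",
                    "description": "Denoised .fasta sequence dataset",
                    "value": 2880,
                    "data_type": "dataset",
                    "required": True,
                }
            ],
        },
        {
            "name": "outputs",
            "params": [
                {
                    "name": "outputDatasetName",
                    "description": "Name of the dataset which will contain output of this experiment",
                    "value": "BioInformatics: Phylogenetic diversity analysis tree",
                    "data_type": "str",
                    "required": True,
                }
            ],
        },
    ]
}


def resolve_parameters(
    config: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Apply per-Run overrides on top of the defaults (hypothetical helper)."""
    overrides = overrides or {}
    resolved: Dict[str, Any] = {}
    for group in config["param_groups"]:
        for param in group["params"]:
            name = param["name"]
            # A per-Run value wins; otherwise fall back to the default.
            value = overrides.get(name, param.get("value"))
            if param.get("required") and value is None:
                raise ValueError(f"Missing required parameter: {name}")
            resolved[name] = value
    return resolved
```

For example, `resolve_parameters(task_config, {"outputDatasetName": "My tree"})` keeps the default dataset value and replaces only the output name, which matches the idea that definitions live in task.yaml while values can change per Run.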

Once you have prepared your Task and selected its parameters, the Run can be executed and re-executed with the same configuration multiple times. Bear in mind that when you re-execute an already executed Run, you are in fact rerunning its snapshot from the time of the first execution. This means that any changes made to the Task files in the meantime will not affect the existing Run. To run with the new configuration, execute a new Run either by selecting the Task from the Tasks module or by clicking Run Task in the Runs module and selecting the appropriate Task in the Run Task screen.

Parameter Optimization

In addition to letting you re-execute your Runs with ease, Coretex provides an additional feature called Parameter Optimization. Most Coretex data types can be configured with multiple values, which lets you execute Runs for all possible parameter combinations with a single action.

To minimize the loss, you need to find the parameter values that bring your predictions closest to real-life use cases. Coretex makes this process more transparent by letting you list the values of each parameter before starting your Runs. Executing multiple Runs in the same batch shortens your turnaround time and makes it easier to find the best parameter combination.
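Enumerating every parameter combination is a Cartesian product over the listed values. The sketch below shows that idea in plain Python; it does not use or describe the Coretex implementation, and the parameter names `epochs` and `learningRate` are hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_runs(param_values: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one parameter set per combination of the listed values."""
    names = list(param_values)
    for combination in product(*(param_values[name] for name in names)):
        yield dict(zip(names, combination))


# Hypothetical parameters: two candidate values and three candidate values.
candidates = {"epochs": [10, 20], "learningRate": [0.01, 0.001, 0.0001]}
runs = list(expand_runs(candidates))
print(len(runs))  # 2 * 3 = 6 combinations
```

The number of executions grows multiplicatively with each parameter that has more than one value, so it is worth keeping the candidate lists short.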

To try this feature in action, follow the Hand Recognition tutorial. You will notice how Coretex pre-calculates and configures your Run, creating as many executions as needed.
