Endpoints
Coretex Endpoints are designed to help you easily deploy, track, and scale your production-ready Models.
To deploy a Model on Coretex, you need to have a Model already uploaded to the platform. If you don't have a Model yet, don't worry: Coretex has a library of predefined, production-ready Models (Llama3, Stable Diffusion, RAG, etc.).
Deployment modes
When deploying your Model, the Node you pick to run it on directly affects the priority and speed of execution. A Node can have one of two modes:
Dedicated
Shared
Dedicated Nodes
A dedicated Node can be assigned to execute only one Endpoint. Because of this, it can keep that Endpoint warm for execution at all times. Pick a dedicated Node if you care about high availability and low latency for requests.
Shared Nodes
A shared Node can have multiple different Endpoints assigned for execution. It handles incoming requests through a FIFO (first-in-first-out) queue.
In most cases you will also need to warm up the Endpoint on a shared Node before it can execute requests, because your Endpoint may have been moved to a cold state after a period of inactivity (when there were no requests).
Pick a shared Node if you don't need high availability or low latency for requests.
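To make the shared-Node behaviour concrete, here is a toy Python model of a FIFO queue serving several Endpoints, with a warm-up step for cold Endpoints. It is only an illustration of the description above, not how Coretex is implemented; the class name `SharedNodeSketch` and its methods are hypothetical.

```python
from collections import deque


class SharedNodeSketch:
    """Toy model of a shared Node: one FIFO queue serving several Endpoints."""

    def __init__(self):
        self.queue = deque()  # pending (endpoint, payload) pairs, oldest first
        self.warm = set()     # Endpoints that are currently warm

    def submit(self, endpoint, payload):
        # New requests always join the back of the queue.
        self.queue.append((endpoint, payload))

    def process_next(self):
        # The oldest request is always served first (FIFO).
        endpoint, payload = self.queue.popleft()
        was_cold = endpoint not in self.warm
        if was_cold:
            # Stand-in for the warm-up a cold Endpoint needs before executing.
            self.warm.add(endpoint)
        return endpoint, payload, was_cold
```

In this sketch the first request to each Endpoint pays the warm-up cost, and a request for one Endpoint waits behind earlier requests for other Endpoints on the same Node.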
Deploying a Model from Coretex Library
You can start by opening Coretex and going to the Endpoints page, then pressing the "+ New Endpoint" button:
The next step is selecting the Model you want to deploy and the worker Node to which the Model will be deployed:
And that's it, your Model is deployed. You can skip to the section about sending requests to start using your deployed Model.
Deploying a custom Model
Deploying a custom Model is almost identical to deploying a Model from the Coretex Library. All you need is your Model and the code for running inference with that Model, packaged together on Coretex.
The folder structure of your Model needs to be similar to this:
The function directory must contain:
function.py, a file which contains the code for running inference on your Model
requirements.txt, which lists all of the packages that will be installed when creating the environment in which inference will be executed
You can also add other Python submodules and files that you need to run inference for your Model. You are not limited to just those two files.
This is an example of a function.py file structure:
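As a rough sketch of the shape such a file might take: load the Model once at import time, then expose a single function that turns request data into a response. The entry-point name `response`, the helper `load_model`, the `text` field, and the response dictionary layout are all assumptions made for illustration; check the Coretex Library examples for the interface Coretex actually expects.

```python
from typing import Any, Callable, Dict


def load_model() -> Callable[[str], str]:
    # Hypothetical placeholder: a real function.py would load the Model
    # files packaged alongside it. Here the "Model" just upper-cases text.
    return lambda text: text.upper()


# Load once at import time so every request reuses the same Model instance.
MODEL = load_model()


def response(request_data: Dict[str, Any]) -> Dict[str, Any]:
    """Run inference for one request (name and signature are assumptions)."""
    text = request_data.get("text")
    if not isinstance(text, str):
        # Reject malformed input instead of failing inside the Model.
        return {"code": 400, "error": "expected a string field named 'text'"}
    return {"code": 200, "result": MODEL(text)}
```

Loading the Model at module level rather than inside the request handler keeps per-request latency low, which matters most for an Endpoint that is kept warm.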
Here are examples of what the code for serving Models from the Coretex Library looks like:
Sending requests to deployed Model
To send a request to a deployed Model, you will need the URL provided on the Endpoint dashboard, as well as the token that was generated for that Endpoint. Here are some examples of how you can send a request:
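A minimal Python sketch using only the standard library is shown here. It assumes the Endpoint accepts a JSON body via POST, returns JSON, and takes the token as a Bearer token in the `Authorization` header; all three are assumptions, so adjust them to match how your Endpoint expects to be called. The helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON payload and the Endpoint token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To use it, pass the URL and token copied from the Endpoint dashboard to `build_request`, then hand the result to `send`. On a shared Node, a generous timeout leaves room for a cold Endpoint to warm up.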