Integrate WatsonX with Label Studio

WatsonX offers a suite of machine learning tools, including access to many LLMs, prompt refinement interfaces, and datastores via WatsonX.data. When you integrate WatsonX with Label Studio, you get access to these models and can automatically keep your annotated data up to date in your WatsonX.data tables.

To run the integration, pull this repo and host it locally or in the cloud. Then link the model to your Label Studio project under the Model section of the project settings. To use the WatsonX.data integration, set up a webhook under Webhooks in the project settings, using the following structure for the URL: <link to your hosted container>/data/upload. Set the triggers to ANNOTATION_CREATED and ANNOTATION_UPDATED. For more on webhooks, see our documentation.
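
The webhook can also be created programmatically instead of through the settings page. The following is a minimal sketch, not part of this example's code. It assumes that your Label Studio version exposes the webhooks REST endpoint at /api/webhooks/ and accepts token authentication with an `Authorization: Token ...` header; the function names and the project ID are hypothetical.

```python
import json
import urllib.request

# The two triggers named in this guide.
WEBHOOK_ACTIONS = ["ANNOTATION_CREATED", "ANNOTATION_UPDATED"]


def build_webhook_payload(container_url: str, project_id: int) -> dict:
    """Build a webhook request body that targets <container>/data/upload."""
    return {
        "url": container_url.rstrip("/") + "/data/upload",
        "project": project_id,
        "send_payload": True,
        "send_for_all_actions": False,
        "actions": WEBHOOK_ACTIONS,
    }


def register_webhook(label_studio_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to Label Studio (assumed endpoint: /api/webhooks/)."""
    request = urllib.request.Request(
        label_studio_url.rstrip("/") + "/api/webhooks/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: token auth; newer versions may expect a different scheme.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `register_webhook("http://localhost:8080", api_key, build_webhook_payload("http://localhost:9090", 1))` would point project 1 at a backend hosted on localhost:9090, assuming both services are reachable at those addresses.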

See the configuration notes at the bottom for details on how to set up your environment variables to get the system to work.

For a video demonstration, see Integrating Label Studio with IBM WatsonX.

Before you begin

Before you begin, you must install the Label Studio ML backend.

This tutorial uses the watsonx_llm example.

Setting up your label_config

For this project, we recommend you start with the labeling config as defined below, but you can always edit it or expand it to meet your needs! Crucially, there must be a <TextArea> tag for the model to insert its response into.

<View>
    <Style>
        .lsf-main-content.lsf-requesting .prompt::before { content: ' loading...'; color: #808080; }

        .text-container {
        background-color: white;
        border-radius: 10px;
        box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
        padding: 20px;
        font-family: 'Courier New', monospace;
        line-height: 1.6;
        font-size: 16px;
        }
    </Style>
    <Header value="Context:"/>
    <View className="text-container">
        <Text name="context" value="$text"/>
    </View>
    <Header value="Prompt:"/>
    <View className="prompt">
        <TextArea name="prompt"
                  toName="context"
                  rows="4"
                  editable="true"
                  maxSubmissions="1"
                  showSubmitButton="false"
                  placeholder="Type your prompt here then Shift+Enter..."
        />
    </View>
    <Header value="Response:"/>
    <TextArea name="response"
              toName="context"
              rows="4"
              editable="true"
              maxSubmissions="1"
              showSubmitButton="false"
              smart="false"
              placeholder="Generated response will appear here..."
    />
    
    <Header value="Overall response quality:"/>
    <Rating name="rating" toName="context"/>
</View>
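
Because the model needs a <TextArea> tag to write into, it can help to check a config before saving it. The sketch below is illustrative only and is not taken from the watsonx_llm example. It lists the <TextArea> names in a config and shows the general shape of a Label Studio textarea result that targets one of them; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def textarea_names(label_config: str) -> list[str]:
    """Return the name of every <TextArea> tag, in document order."""
    root = ET.fromstring(label_config)
    return [tag.get("name", "") for tag in root.iter("TextArea")]


def build_textarea_result(from_name: str, to_name: str, text: str) -> dict:
    """Build one result item that fills a <TextArea> with generated text."""
    return {
        "from_name": from_name,  # name of the <TextArea> tag
        "to_name": to_name,      # name of the object tag it annotates
        "type": "textarea",
        "value": {"text": [text]},
    }


# A trimmed-down version of the config above, kept short for illustration.
SAMPLE_CONFIG = """
<View>
    <Text name="context" value="$text"/>
    <TextArea name="prompt" toName="context"/>
    <TextArea name="response" toName="context"/>
</View>
"""

names = textarea_names(SAMPLE_CONFIG)
if "response" not in names:
    raise ValueError("the config needs a <TextArea> for the model response")
```

With the config in this guide, the generated text would go into the tag named response, which annotates the object named context.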

Setting up WatsonX.data

To use your WatsonX.data integration, follow the steps below.

  1. First, get the host and port information of the engine that you’ll be using. To do this, select the Infrastructure Manager in the left sidebar of your WatsonX.data page. Change to list view by clicking the symbol in the upper right-hand corner. From there, click the name of the engine you’ll be using. This opens a pop-up window, where you can see the host and port information under “host”. The port is the part after the : at the end of the URL.
  2. Next, make sure your catalog is set up. To create a new catalog, follow these instructions.
  3. Once your catalog is set up, make sure that the correct schema is also set up. Navigate to your Data Manager and select Create to create a new schema.
  4. With all of this information, you’re ready to update the environment variables listed at the bottom of this page and get started with your WatsonX.data integration!
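
Step 1 above describes the port as the part after the final colon of the host value. As a small sketch of that split, the helper below separates a copied value into the two strings you would put in WATSONX_ENG_HOST and WATSONX_ENG_PORT. The function name and the example hostname are made up for illustration.

```python
def split_host_port(endpoint: str) -> tuple[str, str]:
    """Split '<host>:<port>' (with or without a scheme) into host and port."""
    # Drop a scheme such as https:// if one was copied along with the host.
    without_scheme = endpoint.strip().split("://", 1)[-1].rstrip("/")
    # The port is whatever follows the last colon.
    host, separator, port = without_scheme.rpartition(":")
    if not separator or not port.isdigit():
        raise ValueError(f"expected '<host>:<port>', got {endpoint!r}")
    return host, port


# Hypothetical engine address, used only to show the split.
host, port = split_host_port("engine.example.com:31234")
env_values = {"WATSONX_ENG_HOST": host, "WATSONX_ENG_PORT": port}
```
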

Running with Docker

  1. Start the Machine Learning backend on http://localhost:9090 with the prebuilt image:
docker-compose up
  2. Validate that the backend is running:
$ curl http://localhost:9090/
{"status":"UP"}
  3. Create a project in Label Studio. Then, from the Model page in the project settings, connect the model. The default URL is http://localhost:9090.
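
The curl check can also be scripted, for example to wait for the container before connecting the model. This is a minimal sketch that assumes the backend answers at its root URL with a JSON body containing "status": "UP", as in the curl output shown in this guide; the function names are hypothetical.

```python
import json
import urllib.request


def is_backend_up(body: str) -> bool:
    """Return True when a health response body reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_backend(url: str = "http://localhost:9090/", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and report whether the backend is up."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_backend_up(response.read().decode("utf-8"))
    except OSError:
        # Covers connection errors and timeouts raised by urllib.
        return False
```

Calling `check_backend()` returns False rather than raising when nothing is listening on the port, which makes it easy to use in a retry loop.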

Building from source (advanced)

To build the ML backend from source, you have to clone the repository and build the Docker image:

docker-compose build

Running without Docker (advanced)

To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:

python -m venv ml-backend
source ml-backend/bin/activate
pip install -r requirements.txt

Then you can start the ML backend:

label-studio-ml start ./dir_with_your_model

Configuration

Parameters can be set in docker-compose.yml before running the container.

The following common parameters are available:

  • BASIC_AUTH_USER - Specify the basic auth user for the model server.
  • BASIC_AUTH_PASS - Specify the basic auth password for the model server.
  • LOG_LEVEL - Set the log level for the model server.
  • WORKERS - Specify the number of workers for the model server.
  • THREADS - Specify the number of threads for the model server.
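
If BASIC_AUTH_USER and BASIC_AUTH_PASS are set, clients presumably need to send matching HTTP Basic credentials when they call the model server. The sketch below builds a standard Basic authorization header; the function name is hypothetical, and how the server enforces the credentials is an assumption rather than something this guide specifies.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Build a standard HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be passed as the headers of any HTTP client request to the backend.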

The following parameters allow you to link the WatsonX models to Label Studio:

  • LABEL_STUDIO_URL - Specify the URL of your Label Studio instance. Note that this might need to be http://host.docker.internal:8080 if you are running Label Studio on another Docker container.
  • LABEL_STUDIO_API_KEY - Specify the API key for authenticating your Label Studio instance. You can find this by logging into Label Studio and going to the Account & Settings page.
  • WATSONX_API_KEY - Specify the API key for authenticating into WatsonX. You can generate this by following the instructions here.
  • WATSONX_PROJECT_ID - Specify the ID of your WatsonX project from which you will run the model. The project must have WML capabilities. You can find the ID in the General section of your project, which is accessible by clicking on the project from the homepage of WatsonX.
  • WATSONX_MODELTYPE - Specify the name of the WatsonX model you’d like to use. A full list can be found in IBM’s documentation.
  • DEFAULT_PROMPT - If you want the model to automatically predict on new data samples, you’ll need to provide a default prompt or the path to a default prompt file.
  • USE_INTERNAL_PROMPT - If using a default prompt, set to 0. Otherwise, set to 1.
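
DEFAULT_PROMPT can hold either the prompt text itself or the location of a file containing it. One plausible way to handle both cases is sketched below. This is an assumption about the behavior, not the backend's actual code, and the function name is hypothetical.

```python
import os
from typing import Optional


def resolve_default_prompt(value: Optional[str]) -> Optional[str]:
    """Return the prompt text for a DEFAULT_PROMPT value, or None if unset."""
    if not value:
        return None
    # Assumption: a value that names an existing file is read as a prompt file.
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as prompt_file:
            return prompt_file.read().strip()
    # Otherwise the value is treated as the prompt itself.
    return value
```

For example, `resolve_default_prompt(os.environ.get("DEFAULT_PROMPT"))` would return the literal prompt when the variable holds text and the file contents when it holds a path.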

The following parameters allow you to use the webhook connection to transfer data from Label Studio to WatsonX.data:

  • WATSONX_ENG_USERNAME - Must be ibmlhapikey for the integration to work.

To get the host and port information below, follow the steps under Setting up WatsonX.data.

  • WATSONX_ENG_HOST - the host information for your WatsonX.data Engine
  • WATSONX_ENG_PORT - the port information for your WatsonX.data Engine
  • WATSONX_CATALOG - the name of the catalog for the table you’ll insert your data into. Must be created in the WatsonX.data platform.
  • WATSONX_SCHEMA - the name of the schema for the table you’ll insert your data into. Must be created in the WatsonX.data platform.
  • WATSONX_TABLE - the name of the table you’ll insert your data into. Does not need to be already created.
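
Since a missing variable is easy to overlook, a quick check before starting the container can help. The sketch below is illustrative and not part of the example. It assumes that all of the listed WatsonX and WatsonX.data variables are required, which this guide does not state explicitly, and the function name is hypothetical.

```python
from typing import Mapping

# Variables that link the WatsonX models to Label Studio.
MODEL_VARS = [
    "LABEL_STUDIO_URL",
    "LABEL_STUDIO_API_KEY",
    "WATSONX_API_KEY",
    "WATSONX_PROJECT_ID",
    "WATSONX_MODELTYPE",
]

# Variables used by the webhook connection to WatsonX.data.
DATA_VARS = [
    "WATSONX_ENG_USERNAME",
    "WATSONX_ENG_HOST",
    "WATSONX_ENG_PORT",
    "WATSONX_CATALOG",
    "WATSONX_SCHEMA",
    "WATSONX_TABLE",
]


def find_config_problems(
    env: Mapping[str, str], use_data_integration: bool = True
) -> list[str]:
    """Return a list of human-readable problems found in the environment."""
    required = MODEL_VARS + (DATA_VARS if use_data_integration else [])
    problems = [f"{name} is not set" for name in required if not env.get(name)]
    # The guide states that the engine username must be exactly ibmlhapikey.
    username = env.get("WATSONX_ENG_USERNAME")
    if use_data_integration and username and username != "ibmlhapikey":
        problems.append("WATSONX_ENG_USERNAME must be ibmlhapikey")
    return problems
```

Passing `os.environ` checks the current shell, and an empty list means no problems were found under these assumptions.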