Interactive substring matching

This Machine Learning (ML) backend speeds up auto-labeling in Named Entity Recognition (NER) tasks. After you select a keyword, the backend automatically matches the same keyword elsewhere in the provided text.
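
The matching step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the backend's actual code: the function name `find_keyword_spans` is hypothetical, the search is case-sensitive and non-overlapping, and the region layout (`start`, `end`, `text`, `labels`) follows the shape Label Studio uses for `Labels` regions on a `Text` object.

```python
from typing import Dict, List


def find_keyword_spans(text: str, keyword: str, label: str,
                       from_name: str = "label", to_name: str = "text") -> List[Dict]:
    """Return one Label Studio-style region per occurrence of `keyword` in `text`.

    Hypothetical helper for illustration; the names are assumptions.
    """
    regions: List[Dict] = []
    if not keyword:
        return regions  # an empty keyword would match at every position
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        regions.append({
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {"start": start, "end": end, "text": keyword, "labels": [label]},
        })
        # Resume after this match so occurrences never overlap.
        start = text.find(keyword, end)
    return regions


if __name__ == "__main__":
    sample = "Acme hired Ana. Ana now leads Acme Labs."
    for region in find_keyword_spans(sample, "Acme", "ORG"):
        print(region["value"])
```

Selecting "Acme" once and labeling it ORG would, in this sketch, yield a region for each of the two occurrences in the sample sentence.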

Before you begin

You must first install the Label Studio ML backend.

This tutorial uses the interactive_substring_matching example.

This ML backend works with the default NER template from Label Studio. To use it, select the pre-built NER template when configuring the labeling interface. It is available under Natural Language Processing > Named Entity Recognition.

Here is an example of a labeling configuration that can be used with this ML backend:

<View>
  <Labels name="label" toName="text">
    <Label value="ORG" background="orange" />
    <Label value="PER" background="lightgreen" />
    <Label value="LOC" background="lightblue" />
    <Label value="MISC" background="lightgray" />
  </Labels>
  <Text name="text" value="$text" />
</View>
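
To see which names a backend would need from this configuration, it can be parsed with Python's standard library. The sketch below is only illustrative: `describe_config` is a hypothetical helper, and it assumes a single `Labels` tag and a single `Text` tag, as in the example above.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<View>
  <Labels name="label" toName="text">
    <Label value="ORG" background="orange" />
    <Label value="PER" background="lightgreen" />
    <Label value="LOC" background="lightblue" />
    <Label value="MISC" background="lightgray" />
  </Labels>
  <Text name="text" value="$text" />
</View>
"""


def describe_config(config_xml: str) -> dict:
    """Extract the control tag name, object tag name, task key, and label values."""
    root = ET.fromstring(config_xml.strip())
    labels_tag = root.find(".//Labels")
    text_tag = root.find(".//Text")
    return {
        "from_name": labels_tag.get("name"),        # name of the Labels control tag
        "to_name": labels_tag.get("toName"),        # object tag the labels apply to
        "text_key": text_tag.get("value").lstrip("$"),  # key in the task data
        "labels": [label.get("value") for label in labels_tag.findall("Label")],
    }


if __name__ == "__main__":
    print(describe_config(CONFIG))
```

The `from_name` and `to_name` values are what tie a predicted region back to the `Labels` and `Text` tags in the interface.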

Running with Docker

  1. Start the Machine Learning backend on http://localhost:9090 using the prebuilt image:
docker-compose up
  2. Validate that the backend is running:
$ curl http://localhost:9090/
{"status":"UP"}
  3. Create a project in Label Studio. Then, from the Model page in the project settings, connect the model. The default URL is http://localhost:9090.
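
The same health check can be scripted, for example to wait for the container before connecting the model. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the response body is the JSON shown above.

```python
import json
import urllib.request


def is_backend_up(response_body: str) -> bool:
    """Return True if the health-check body reports status UP."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_backend(url: str = "http://localhost:9090/", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and interpret the response.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_backend_up(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Parsing only; call check_backend() once the container is running.
    print(is_backend_up('{"status":"UP"}'))
```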

Building from source (advanced)

To build the ML backend from source, you have to clone the repository and build the Docker image:

docker-compose build

Running without Docker (advanced)

To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:

python -m venv ml-backend
source ml-backend/bin/activate
pip install -r requirements.txt

Then you can start the ML backend:

label-studio-ml start ./interactive_substring_matching

Configuration

Parameters can be set in docker-compose.yml before running the container.

The following common parameters are available:

  • BASIC_AUTH_USER - Specify the basic auth user for the model server
  • BASIC_AUTH_PASS - Specify the basic auth password for the model server
  • LOG_LEVEL - Set the log level for the model server
  • WORKERS - Specify the number of workers for the model server
  • THREADS - Specify the number of threads for the model server
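
If BASIC_AUTH_USER and BASIC_AUTH_PASS are set, clients presumably need to send matching credentials. The sketch below assumes the model server uses standard HTTP Basic authentication, which this guide does not state explicitly; the helper names are hypothetical, and the client reads the same variable names from its own environment purely for convenience.

```python
import base64
import os
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_health_request(url: str = "http://localhost:9090/") -> urllib.request.Request:
    """Create a request, adding credentials only when both variables are set."""
    request = urllib.request.Request(url)
    user = os.environ.get("BASIC_AUTH_USER")
    password = os.environ.get("BASIC_AUTH_PASS")
    if user and password:
        request.add_header("Authorization", basic_auth_header(user, password))
    return request


if __name__ == "__main__":
    req = build_health_request()
    print(req.full_url, req.has_header("Authorization"))
```

The request object can then be passed to `urllib.request.urlopen` once the server is running.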

Customization

The ML backend can be customized by adding your own models and logic inside the ./interactive_substring_matching directory.
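
As one example of custom logic, the matching rule could be changed to ignore case and skip partial-word hits. The sketch below is a standalone illustration: `find_word_spans` is a hypothetical function, and wiring it into the model code inside ./interactive_substring_matching is not shown because that depends on the example's own implementation.

```python
import re
from typing import List, Tuple


def find_word_spans(text: str, keyword: str, ignore_case: bool = True) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of whole-word occurrences of `keyword`."""
    if not keyword:
        return []
    flags = re.IGNORECASE if ignore_case else 0
    # Lookarounds reject matches that are embedded in a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", flags)
    return [match.span() for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_word_spans("Paris, paris and Parisian", "Paris"))
```

Here "Parisian" is skipped because the keyword is followed by another word character, while "paris" is matched only when case is ignored.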