Reimagine AI development

Whether you are starting from scratch or improving on existing use cases, our next-gen data platform enhances your AI development and turns data into applications within hours.
[Screenshot of the kern product dashboard]

Combining data labeling and data management

kern shines where other approaches fail. We enable data-driven use cases in numerous industries and domains. If needed, fully in-house: on public and private cloud, or on-premises.
Information Integration
The core of kern is Weak Supervision, a technique to automatically integrate noisy labeling heuristics. This can speed up labeling by up to 100x.
Intelligent Slicing
As we enrich your records with valuable metadata, your data can be prioritized and sliced. This saves time and increases quality.
Data Debugging
No manual labeling is 100% correct. We identify potential labeling errors by applying Confident Learning, ensuring the highest possible quality.
Closing Relevance Gaps
kern is designed to integrate subject matter experts into the AI development cycle. Work in collaboration to solve actual pain points.
Seamless Integration
Automate data uploads and downloads and integrate kern into your development cycle with an intuitive API and Python SDK; a short sketch follows this overview.
Data Privacy
The security of your data is our top priority. We offer kern on public and private cloud as well as on-premises, ensuring strong data security.
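
To illustrate the Seamless Integration above, here is a minimal sketch of what an SDK-based automation could look like. The client class and method names below (Client, upload_records, fetch_labels) are illustrative assumptions, not the documented SDK surface:

# Hypothetical sketch of an SDK-based integration; all class and
# method names here are assumptions for illustration, not the
# documented kern SDK API.
from kern import Client  # assumed entry point

client = Client(api_token="YOUR_API_TOKEN", project_id="YOUR_PROJECT_ID")

# Push new records into the project as part of an automated pipeline
client.upload_records([
    {"review": "The new export feature is fantastic!"},
    {"review": "App crashes every time I open settings."},
])

# Pull the weakly supervised labels back for downstream training
labeled = client.fetch_labels(source="weak_supervision")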
from sklearn.tree import DecisionTreeClassifier


class MyClassifier:
    def __init__(self):
        self.model = DecisionTreeClassifier()

    def fit(self, training_corpus):
        # Precomputed transformer embeddings and the manually set labels
        embeddings = training_corpus["embeddings"]["bert-base-uncased"]
        labels = training_corpus["labels"]["manual"]

        # Keep only the embeddings of records that have a manual label
        embeddings = embeddings[: len(labels)]

        self.model.fit(embeddings, labels)

    def predict(self, prediction_corpus):
        # Same embedding space as in fit (key structure aligned for consistency)
        embeddings = prediction_corpus["embeddings"]["bert-base-uncased"]
        return self.model.predict_proba(embeddings)


import knowledge


def lkp_feature_keywords(record):
    # Label the record "Feature" as soon as any token matches a known feature term
    for token in record["review"]:
        if token.text in knowledge.feature_terms:
            return "Feature"

Data Programming

With kern, you can label vast amounts of data within hours or days. Heuristics you can use to develop labels include:
  • Active Learning models, fully available in kern
  • User-defined labeling functions
  • 3rd Party applications
  • Legacy systems
  • Crowdlabeled records
Each of them can be integrated within minutes, and intelligent analysis of their results ensures high-quality training data.
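
A heuristic does not have to be handwritten logic. As a minimal sketch, an existing system can be wrapped as a labeling heuristic; the legacy_system module and its classify function are purely hypothetical:

import legacy_system  # hypothetical client wrapping an existing rule-based service


def lkp_legacy_prediction(record):
    # Reuse the legacy system's (possibly noisy) prediction as a heuristic;
    # returning nothing makes the heuristic abstain on this record
    prediction = legacy_system.classify(record["review"].text)
    if prediction is not None:
        return prediction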

Diverse Data Formats

Our labeling solution is designed to work with any kind of JSON structure. This means that we can work with many different formats, such as CSV files, texts, images or even time series data.
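
For illustration, a single record could look like the following; all field names and values are invented for this sketch:

# One record as a JSON-like structure; everything here is invented
record = {
    "review": "The new export feature is fantastic!",  # free text
    "rating": 4,                                       # numeric metadata
    "created_at": "2022-01-15T09:30:00Z",              # time series-style field
    "tags": ["mobile", "v2.3"],                        # nested list
}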
If you are not sure whether your data format is valid for kern, just leave us a note.
Request demo

Frequently Asked Questions

What is Weak Supervision?

Put simply, Weak Supervision is the automated and intelligent integration of information sources. These sources don't have to be perfect; they can be rules of thumb, for instance Python functions describing textual patterns, Active Learning models, or some external information source like a 3rd party application or crowd labeling. From these sources, cleansed, weakly supervised labels can be derived.
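
A minimal sketch of the idea, using a plain majority vote as the integration step; kern's actual integration is more sophisticated, and all heuristic names and votes here are invented:

from collections import Counter

# Votes of three imperfect heuristics for five records; None = abstain
noisy_label_matrix = [
    ["Feature", "Feature", None],
    [None,      "Bug",     "Bug"],
    ["Feature", "Feature", "Bug"],
    [None,      None,      None],
    ["Bug",     "Bug",     "Feature"],
]


def integrate(votes):
    # Majority vote over non-abstaining sources; None if all abstain
    counts = Counter(v for v in votes if v is not None)
    return counts.most_common(1)[0][0] if counts else None


weak_labels = [integrate(votes) for votes in noisy_label_matrix]
print(weak_labels)  # ['Feature', 'Bug', 'Feature', None, 'Bug']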

What is Active Learning?

As the name suggests, in Active Learning, models are trained during the labeling process. This way, the learning model can continuously make predictions on the data, helping both to auto-label confident records and to identify critical ones. The latter is used, for example, in query scheduling: making use of all available information to pick the next records to be labeled.
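
A minimal sketch of that loop, using uncertainty sampling as the query strategy; the model, data, and thresholds are placeholders:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: 100 embedded records, the first 10 labeled manually
embeddings = np.random.rand(100, 8)
labels = np.array([0, 1] * 5)

model = LogisticRegression()
model.fit(embeddings[: len(labels)], labels)

# Continuously predict on the unlabeled remainder
probas = model.predict_proba(embeddings[len(labels):])
confidence = probas.max(axis=1)

# Confident predictions are candidates for auto-labeling ...
auto_label = np.where(confidence > 0.9)[0] + len(labels)
# ... while the most uncertain records are queried for manual labeling next
query_next = np.argsort(confidence)[:5] + len(labels)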

What is Confident Learning?

Real-world training data isn't labeled 100% correctly. Even datasets like MNIST, a well-known toy dataset that helps new ML engineers enter the field, aren't without errors. The field of Confident Learning aims to detect records which are either mislabeled or could be interpreted in multiple ways. With higher data quality, models can learn to make the right decisions in difficult cases.
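
A minimal sketch of the underlying idea, flagging records whose given label the model finds implausible via per-class confidence thresholds; the probabilities and labels are invented, and kern's actual implementation may differ:

import numpy as np

# Invented predicted probabilities over classes [0, 1] and given labels;
# record 2 was deliberately mislabeled for this sketch
pred_probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.95, 0.05],
    [0.10, 0.90],
    [0.90, 0.10],
])
given_labels = np.array([0, 1, 1, 1, 0])

# Per-class threshold: mean predicted probability among records
# carrying that label (a core ingredient of Confident Learning)
thresholds = np.array([
    pred_probs[given_labels == c, c].mean() for c in range(2)
])

# Flag records whose given label scores below its class threshold
suspect = pred_probs[np.arange(5), given_labels] < thresholds[given_labels]
print(np.where(suspect)[0])  # [2]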

What exactly is an information source?

Information sources are the ingredients for scaling your data labeling. You can think of them as labeling heuristics, but they don't have to be 100% accurate, e.g. simple Python functions expressing some domain knowledge. When you add and run several of these sources, you create what is called a noisy label matrix, which is matched against the reference data you labeled manually. This allows us to analyze correlations, conflicts, overlaps, the number of hits on a dataset, and the accuracy of each information source.
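
A minimal sketch of such an analysis over an invented noisy label matrix, where None marks an abstaining source:

# Invented noisy label matrix: rows = records, columns = information sources
matrix = [
    ["Feature", "Feature", None],
    ["Bug",     None,      "Bug"],
    ["Feature", "Bug",     "Bug"],
    [None,      None,      None],
]

n_records = len(matrix)
n_sources = len(matrix[0])

for s in range(n_sources):
    votes = [row[s] for row in matrix]
    hits = [i for i, v in enumerate(votes) if v is not None]
    coverage = len(hits) / n_records
    # Overlap: another source also fired; conflict: it fired with a different label
    overlaps = sum(
        any(matrix[i][o] is not None for o in range(n_sources) if o != s)
        for i in hits
    )
    conflicts = sum(
        any(
            matrix[i][o] is not None and matrix[i][o] != votes[i]
            for o in range(n_sources) if o != s
        )
        for i in hits
    )
    print(f"source {s}: coverage={coverage:.2f}, overlaps={overlaps}, conflicts={conflicts}")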

How do I know whether my information source is good?

An information source can be “good” with respect to both coverage and precision. For coverage there is basically no limitation at all; for precision we recommend a value above 70%, depending on how many information sources you have. In general, the more information sources you have, the more overlaps and conflicts there will be, and the better the information integration can work.
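
Estimating a single source's coverage and precision against manually labeled reference data could look like this; all values are invented:

# Invented: votes of one information source vs. manual reference labels
source_votes = ["Feature", "Bug", "Feature", None, "Bug"]
reference = ["Feature", "Bug", "Bug", "Feature", "Bug"]

# Precision is only measured on records where the source actually fired
fired = [(v, r) for v, r in zip(source_votes, reference) if v is not None]
coverage = len(fired) / len(source_votes)
precision = sum(v == r for v, r in fired) / len(fired)
print(f"coverage: {coverage:.0%}, precision: {precision:.0%}")  # 80%, 75%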

If you already automatically label data, why should I train a model at all?

Technically, you could use our program for inference. However, the best results are achieved if a Supervised Learning model is trained on the generated labels, as these models generalize better. It’s simply a best practice.
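
A minimal sketch, assuming embeddings and weakly supervised labels have already been exported; the data here is random placeholder material:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder exports: embedded records plus their weakly supervised labels
embeddings = np.random.rand(200, 8)
weak_labels = np.array([0, 1] * 100)

# A supervised model trained on the generated labels generalizes beyond them
model = LogisticRegression().fit(embeddings, weak_labels)
predictions = model.predict(np.random.rand(5, 8))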

Is your software limited to classifications?

No, you can do single- and multi-label multi-class classification as well as named entity recognition. We’re currently aiming to implement further labeling tasks in the area of NLP, such as entity linking. If you need any custom labeling task, let us know.

Which data formats are supported?

We’ve structured our data formats around JSON, so you can upload most file types natively. This includes spreadsheets, text files, CSV data, generic JSON and many more.

I don’t know whether my data would work - who can I contact?

No worries, we’re always happy to help. Just send a message via the chat in the bottom right corner, and someone from our team will gladly help you.

How fast will I get my results?

Information sources typically run for a few seconds to minutes, depending on the payload of your data. As we run functions in containerized environments and enrich text data using spaCy, it might take more time than running them on your local machine. The computation of Weak Supervision also takes a few seconds to minutes.
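
For intuition, the enrichment step corresponds roughly to running a spaCy pipeline over each text, which is also why tokens inside labeling functions expose attributes like .text; which model kern uses internally is an assumption here:

import spacy

# A small English pipeline; the specific model kern uses is an assumption
nlp = spacy.load("en_core_web_sm")

doc = nlp("The new export feature is fantastic!")
for token in doc:
    # Each token carries linguistic metadata usable inside heuristics
    print(token.text, token.lemma_, token.pos_)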

I have less than 1,000 records - do I need this?

Our system is designed for scalability, but you can definitely also benefit with small amounts of data. We provide an intuitive multi-task labeling interface, extensive data management capabilities, well-written documentation and world-class support.

I don’t want to label my data myself - can I outsource this with your tool?

We’ll gladly help you with the data labeling. To do so, please contact our support team using the chat in the bottom right corner of your browser.

How can I reach support?

The easiest way is to use the chat in the bottom right corner of your browser. Someone from our team will contact you within minutes. Alternatively, you can just send a message to h.wenck@kern.ai. Henrik, one of our co-founders, will be in contact with you as soon as possible.

Are you offering consulting or workshops?

Yes, we offer consulting and workshops depending on the size of your project. As part of this, we offer custom labeling solutions and best-practice labeling workshops to ensure high data quality right from the start of your project.

kern is highly secure: we follow industry-leading best practices to keep all of your data safe.

How is my data encrypted?

All of your data is encrypted in transit using HTTPS in order to protect requests from eavesdropping and man-in-the-middle attacks. Additionally, your data is encrypted at rest using AES-256, securing it from unauthorized access.

How often are backups created?

We use a managed database for production, which automatically creates daily backups in the form of snapshots.

Where are the data centers located?

Our application runs solely in three AWS availability zones (data centers) located in Frankfurt, Germany. AWS data centers maintain state-of-the-art physical security, including 24x7x365 surveillance, environmental protection, and extensive secure access policies.

On which OS is the application running?

kern servers run on recent Linux releases with Long Term Support policies and are regularly updated. Our engineering team monitors uptime and can act quickly if errors occur.

How do you ensure operational security?

Only a small number of authorized employees can access user data. Accessing users’ accounts by kern employees is only allowed in exceptional cases, always with your prior permission and for the purpose of resolving a specific issue only.

We use specialized tools for storing and sharing passwords and other sensitive data, and require our employees to use Multi-Factor Authentication for all tools where possible.

Can we use Multi-Factor Authentication?

We allow your users to enable MFA for login with little friction, increasing account security. Additionally, we use a security stack that detects whether your password has been leaked in a recent data breach and validates that the passwords in use are secure.

Is the application available on private cloud or on-premises?

Our free version is available on public cloud only. For private cloud or an on-premises solution, please contact sales.

I have some further questions about your security - who can I contact?

For all further questions, please contact h.wenck@kern.ai.

Become a data pioneer now

Algorithms aren’t the bottleneck. Data is. We shorten AI development from months to days by programmatically scaling your data labeling tasks.