We offer refinery on public cloud, on private cloud, and on-premises.
What is weak supervision?
Weak supervision is a technique for integrating different kinds of noisy and imperfect heuristics, such as labeling functions. It can be used not only to automate data labeling but also, more generally, to improve the quality of your existing labels.
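As a rough illustration of the idea, the sketch below combines the votes of several heuristics with a plain majority vote. This is a deliberately simple stand-in, not the algorithm refinery uses; the `ABSTAIN` marker, the `majority_vote` function, and the example labels are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # hypothetical marker: the heuristic has no opinion on this record


def majority_vote(votes):
    """Combine the votes of several heuristics for one record.

    Returns (label, confidence). Returns (None, 0.0) if every heuristic
    abstained or if the top labels are tied.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return None, 0.0
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, 0.0  # conflict with no clear winner
    label, hits = ranked[0]
    return label, hits / sum(counts.values())


# One row per record, one column per heuristic (made-up example votes).
noisy_votes = [
    ["spam", "spam", ABSTAIN],
    ["ham", "spam", "ham"],
    [ABSTAIN, ABSTAIN, ABSTAIN],
    ["spam", "ham", ABSTAIN],
]
weak_labels = [majority_vote(row) for row in noisy_votes]
```

Records without a clear winner stay unlabeled, which is usually preferable to guessing.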
What is active learning?
As the name suggests, in active learning, models are trained during the labeling process. The model can then continuously make predictions on the data, which helps both to auto-label records it is confident about and to identify critical records. The latter is used, for example, for query scheduling, where all available information is used to pick the next records to be labeled.
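A minimal sketch of this split, assuming the model already returns a probability per label for each record: confident predictions are auto-labeled, and the least confident records are queued for manual labeling. The function name, the 0.9 threshold, and the batch size are hypothetical choices for illustration, not refinery defaults.

```python
def split_by_confidence(probabilities, auto_label_threshold=0.9, batch_size=2):
    """Split records into auto-labeled ones and the next ones to label manually.

    probabilities maps a record id to a dict of label -> predicted probability.
    """
    auto_labels = {}
    uncertain = []
    for record_id, distribution in probabilities.items():
        label, confidence = max(distribution.items(), key=lambda item: item[1])
        if confidence >= auto_label_threshold:
            auto_labels[record_id] = label
        else:
            uncertain.append((confidence, record_id))
    # Least confident records first: these are the most informative to label.
    uncertain.sort()
    next_to_label = [record_id for _, record_id in uncertain[:batch_size]]
    return auto_labels, next_to_label


# Made-up model output for five records.
predictions = {
    "r1": {"positive": 0.97, "negative": 0.03},
    "r2": {"positive": 0.55, "negative": 0.45},
    "r3": {"positive": 0.20, "negative": 0.80},
    "r4": {"positive": 0.08, "negative": 0.92},
    "r5": {"positive": 0.62, "negative": 0.38},
}
auto_labels, next_to_label = split_by_confidence(predictions)
```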
What is confident learning?
Real-world training data is never labeled 100% correctly. Even datasets like MNIST, a well-known toy dataset that helps new ML engineers enter the field, contain errors. The field of confident learning aims to detect records that are either mislabeled or could be interpreted in multiple ways. With higher data quality, models can learn to make the right decisions in difficult cases.
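The core idea can be sketched in a few lines, assuming you have a given label and model-predicted probabilities for each record. This is a simplified version of the per-class threshold rule from confident learning, not a full implementation; the function name and the example data are hypothetical.

```python
from collections import defaultdict


def find_label_issues(given_labels, predicted_probs):
    """Flag records whose given label disagrees with a confident prediction.

    The threshold of a class is the average probability the model assigns to
    that class on the records that carry it as their given label.
    """
    by_class = defaultdict(list)
    for label, probs in zip(given_labels, predicted_probs):
        by_class[label].append(probs[label])
    thresholds = {label: sum(v) / len(v) for label, v in by_class.items()}

    issues = []
    for index, (label, probs) in enumerate(zip(given_labels, predicted_probs)):
        # Classes the model predicts at least as confidently as is typical.
        candidates = {
            c: p for c, p in probs.items() if c in thresholds and p >= thresholds[c]
        }
        if not candidates:
            continue  # ambiguous record: no class is predicted confidently
        likely = max(candidates, key=candidates.get)
        if likely != label:
            issues.append((index, label, likely))
    return issues


# Made-up example: the record at index 2 is labeled "cat" but looks like a dog.
given_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted_probs = [
    {"cat": 0.90, "dog": 0.10},
    {"cat": 0.80, "dog": 0.20},
    {"cat": 0.10, "dog": 0.90},
    {"cat": 0.15, "dog": 0.85},
    {"cat": 0.10, "dog": 0.90},
    {"cat": 0.30, "dog": 0.70},
]
issues = find_label_issues(given_labels, predicted_probs)
```

In practice the probabilities should come from out-of-sample predictions (for example via cross-validation), so the model has not simply memorized the noisy labels.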
What exactly is a heuristic?
Heuristics are the ingredients for scaling your data labeling. They don't have to be 100% accurate; a heuristic can be, for example, a simple Python function that expresses some domain knowledge. When you add and run several of these heuristics, you create what is called a noisy label matrix, which is matched against the reference data that you labeled manually. This allows us to analyze correlations, conflicts, overlaps, the number of hits in a dataset, and the accuracy of each heuristic.
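To make this concrete, here is a sketch of three heuristics written as plain Python functions and the noisy label matrix they produce. The function names, the `text` attribute, and the labels are hypothetical and do not reflect the exact function signature refinery expects.

```python
def lf_contains_refund(record):
    """Hypothetical heuristic: mentions of refunds hint at a complaint."""
    return "complaint" if "refund" in record["text"].lower() else None


def lf_says_thanks(record):
    """Hypothetical heuristic: thanking someone hints at praise."""
    return "praise" if "thank" in record["text"].lower() else None


def lf_many_exclamations(record):
    """Hypothetical heuristic: several exclamation marks hint at a complaint."""
    return "complaint" if record["text"].count("!") >= 2 else None


HEURISTICS = [lf_contains_refund, lf_says_thanks, lf_many_exclamations]


def build_label_matrix(records, heuristics):
    """One row per record, one column per heuristic; None means no hit."""
    return [[heuristic(record) for heuristic in heuristics] for record in records]


records = [
    {"text": "I want a refund!! This is broken!"},
    {"text": "Thanks a lot, works great."},
    {"text": "Thank you, but I still need a refund."},
    {"text": "Where can I find the manual?"},
]
label_matrix = build_label_matrix(records, HEURISTICS)
```

The third record shows a conflict (two heuristics disagree), and the fourth shows a record that no heuristic covers.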
How do I know whether my heuristic is good?
A heuristic can be “good” with respect to both coverage and precision. For coverage, there is essentially no limitation. For precision, we generally recommend a value above 70%, depending on how many heuristics you have. The more heuristics you have, the more overlaps and conflicts occur, and the better weak supervision can work.
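Both numbers are easy to compute by hand. The sketch below assumes one vote per record for a single heuristic and a partially filled list of manually assigned reference labels; the function name and the example data are hypothetical.

```python
def heuristic_stats(votes, reference):
    """Compute coverage and precision of one heuristic.

    votes: the heuristic's label per record, None where it did not hit.
    reference: the manually assigned label per record, None where unlabeled.
    """
    hits = [v for v in votes if v is not None]
    coverage = len(hits) / len(votes) if votes else 0.0

    # Precision can only be measured where a hit meets a manual label.
    comparable = [
        (v, r) for v, r in zip(votes, reference) if v is not None and r is not None
    ]
    if not comparable:
        precision = None
    else:
        precision = sum(v == r for v, r in comparable) / len(comparable)
    return {"coverage": coverage, "precision": precision}


votes = ["spam", None, "spam", "spam", None, "spam", None, None]
reference = ["spam", "ham", "ham", "spam", None, "spam", "ham", None]
stats = heuristic_stats(votes, reference)
```

Here the heuristic hits half of the records and is right on three of the four hits that can be checked, which puts it above the recommended 70% precision.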
If you already automatically label data, why should I train a model at all?
Technically, you could use our program for inference. However, the best results are achieved when a supervised learning model is trained on the generated labels, because such models generalize better. It’s simply a best practice.
Which data formats are supported?
We’ve structured our data formats around JSON, so you can upload most file types natively. This includes spreadsheets, text files, CSV data, generic JSON and many more.
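If you want to see how tabular data maps onto JSON records before uploading, the following sketch converts CSV text into a list of JSON objects using only the Python standard library. It is an illustration of the mapping, not refinery's import code; the function name and the column names are hypothetical.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Turn CSV text with a header row into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


csv_text = "headline,body\nHello,First text\nUpdate,Second text\n"
records = csv_to_records(csv_text)

# Serialized form: a JSON array with one object per record.
json_payload = json.dumps(records, indent=2)
```

Each column becomes an attribute of the record, and all values arrive as strings unless you convert them yourself.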
I don’t know whether my data would work - who can I contact?
No worries, we’re always happy to help. Just send a message via the chat in the bottom right corner of your screen, and someone from our team will gladly help.
How fast will I get my results?
Heuristics typically run for a few seconds to a few minutes, depending on the size of your data. Because we run functions in containerized environments and enrich text data using spaCy, this can take longer than running them on your local machine. The computation of weak supervision also takes a few seconds to a few minutes.
I have less than 1,000 records - do I need this?
Our system is designed for scalability, but you can definitely also see the benefits with small amounts of data. We provide an intuitive multi-task labeling interface, extensive data management capabilities, well-written documentation, and world-class support.
I don’t want to label my data myself - can I outsource this with your tool?
We’ll gladly help you with the data labeling. Check out our pricing options, and reach out to us.
How can I reach support?
The easiest way is to use the chat in the bottom right corner of your browser. Someone from our team will contact you within minutes. Alternatively, you can just send a message to firstname.lastname@example.org. Henrik, one of our co-founders, will be in contact with you as soon as possible.
Are you offering consulting or workshops?
Yes, we offer consulting and workshops, depending on the size of your project. These include custom labeling solutions and workshops on labeling best practices, to ensure high data quality right from the beginning of your project.
Kern AI follows industry-leading best practices to keep all of your data secure.
How is my data encrypted?
All of your data is encrypted in transit using HTTPS to protect requests from eavesdropping and man-in-the-middle attacks. Additionally, your data is encrypted at rest using AES-256, securing it against unauthorized access.
How often are backups created?
We use a managed database in production, which automatically creates a backup of the data every day in the form of a snapshot.
Where are the data centers located?
Our application runs solely in three AWS availability zones (data centers) located in Frankfurt, Germany. AWS data centers maintain state-of-the-art physical security, including 24x7x365 surveillance, environmental protection, and extensive secure access policies.
On which OS is the application running?
Kern AI servers run on recent Linux releases with long-term support policies and are updated regularly. Our engineering team monitors uptime and can act quickly if errors occur.
How do you ensure operational security?
Only a small number of authorized employees can access user data. Kern AI employees may access users’ accounts only in exceptional cases, always with your prior permission, and only to resolve a specific issue.
We use specialized tools for storing and sharing passwords and other sensitive data, and we require our employees to use multi-factor authentication for all tools where possible.
Can we use multi-factor authentication?
Your users can enable MFA for login to reduce friction and increase security. Additionally, we use a security stack that detects whether your password has been leaked in a recent data breach and validates that the passwords in use are secure.
Is the application available on private cloud or on-premises?
Our free version is available on public cloud only. For private cloud or an on-premises solution, please contact sales.
I have some further questions about your security - who can I contact?
For all further questions, please contact email@example.com.