If you want accuracy and trust that scale with your forum, learn how building AI content moderation with a human-in-the-loop process can deliver both.

Building AI Content Moderation with Human-in-the-Loop

Content moderation is no longer optional. As your platform scales up, the volume of content generated by its users grows faster than your team can handle manually. This is where you turn to AI for help, but keep in mind that automation alone is not enough.

A modern moderation system combines artificial intelligence with human judgement, and that combination gives you the most accurate results. Human intuition brings fairness and awareness of the context of each situation. This approach, known as human-in-the-loop (HITL), is now the industry standard.

This hybrid system lets you manage your platform even when it is flooded with complex content. This guide shows you how to implement a system that is both reliable and scalable, using the best practices that leading platforms follow today.

Start with Smart Detection Systems

These days everyone is chronically online. As a result, your platform will see heavy traffic, and some of it will be unwanted. This is where an AI moderation system pays off. For a sustainable human-in-the-loop moderation system, you first need an accurate detection layer.

AI tools should handle a few tasks for you. An AI detector can identify patterns in user posts, classify each content type separately, and help you prioritize moderation queues efficiently. But remember: at the detection stage, you are not aiming for perfection.

You want your detection system to catch the most harmful content circulating on your forum. Even if it makes a few mistakes, it is better to err on the side of flagging risky content. At the same time, it should not wrongly flag safe content too often; false positives frustrate your users and can reduce traffic to your forum.

Human input comes in when someone re-reads or re-watches the content that has been flagged. Humans also set the moderation rules applied to the forum and make the final decisions.
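To make this concrete, here is a minimal sketch in Python of how a detector's output could feed a prioritized human review queue. The harm score, the 0.9 and 0.3 cut-offs, and the field names are all illustrative assumptions, not a prescription for any particular moderation model or API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class ReviewItem:
    priority: float                      # lower value pops first
    post_id: str = field(compare=False)
    text: str = field(compare=False)

review_queue: list[ReviewItem] = []      # heap ordered by priority

def detect(post_id: str, text: str, harm_score: float) -> str:
    """Route a post based on a detector's harm score (0.0 to 1.0).

    `harm_score` would come from whatever moderation model you use;
    the thresholds here are placeholders to tune for your platform.
    """
    if harm_score >= 0.9:
        return "removed"                 # near-certain violation
    if harm_score >= 0.3:
        # Uncertain: queue for a human, highest-risk content first.
        heapq.heappush(review_queue, ReviewItem(-harm_score, post_id, text))
        return "pending_review"
    return "published"                   # likely safe, let it through
```

A moderator then pops items off `review_queue` and makes the final call, so borderline content never disappears without a human seeing it.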

Design a Tiered Moderation Workflow

When you run a platform, you want strategic approaches that strengthen customer retention. A tiered moderation workflow acts as a three-level filter with human-in-the-loop, one that does not drive traffic away from your website.

Tier one is automated filtering, where your AI tool works on its own without your aid. This tier handles the obviously bad content coming into your forum. Spam links, inappropriate images, and words you have identified as harmful can all be filtered automatically; they are easy for AI tools to identify and remove instantly.

Tier two is for content the AI flags as potentially harmful but is not completely sure about: not clearly bad, but not fully safe either. Think of a comment that might offend some groups, or a joke that could be misunderstood. The AI flags the post, which temporarily hides it until you review it and decide whether to remove it or make it visible again.

Tier three is where a human makes the complex decisions that AI cannot be trusted to handle. When a person flags someone else’s content, you need to understand the intent behind the post. This tier also covers content your policy does not clearly address. Here you go through the write-up or video yourself and make the decision entirely on your own.
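A minimal sketch of the three tiers as a single routing function may help. The spam pattern, banned-word list, and the `ai_flagged` / `user_reported` fields are hypothetical stand-ins for your own filters, model output, and reporting system.

```python
import re

# Tier-one filters: unambiguous, pattern-matchable violations.
SPAM_LINK = re.compile(r"https?://(bit\.ly|tinyurl\.com)/\S+", re.IGNORECASE)
BANNED_WORDS = {"badword1", "badword2"}  # placeholder word list

def route(post: dict) -> str:
    """Return the moderation tier a post lands in.

    `post` is assumed to look like:
      {"text": str, "ai_flagged": bool, "user_reported": bool}
    """
    words = set(post["text"].lower().split())

    # Tier 1: obviously bad content, removed automatically.
    if SPAM_LINK.search(post["text"]) or words & BANNED_WORDS:
        return "tier1_auto_removed"

    # Tier 2: the model suspects harm but is not certain;
    # hide the post until a moderator reviews it.
    if post.get("ai_flagged"):
        return "tier2_hidden_pending_review"

    # Tier 3: user reports and policy gray areas need human judgement.
    if post.get("user_reported"):
        return "tier3_human_decision"

    return "published"
```

For example, `route({"text": "win big at https://bit.ly/xyz", "ai_flagged": False, "user_reported": False})` returns `"tier1_auto_removed"` with no human involvement, while a user-reported post falls straight through to tier three.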

Build Feedback Loops That Improve AI

AI did not build itself; everything an AI tool knows, humans fed into it. Every time you let your visitors correct a wrong decision the AI has made, the system learns and becomes more effective, and your human-in-the-loop setup starts paying off.

For example, say your AI tool flags a centuries-old painting as harmful and your visitors collectively disagree. Your system will learn to stop flagging that type of content in the future.

At the opposite end of the spectrum, your human audience might notice a new harmful slang trend that the AI has not picked up on yet. Their feedback can teach your system to flag those words going forward.

As your human audience teaches, your AI tool learns. Slowly but surely, your system will make fewer mistakes and deliver faster decisions that match your policies much more closely.
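One simple way to capture this feedback is to log every human correction next to the model's original verdict and retrain on the log periodically. The JSONL layout and field names below are just one possible scheme, not a required format.

```python
import json
from datetime import datetime, timezone

FEEDBACK_LOG = "moderation_feedback.jsonl"  # assumed log location

def record_correction(post_id: str, text: str,
                      ai_label: str, human_label: str) -> None:
    """Append one human correction as a line of JSON.

    Every disagreement between `ai_label` and `human_label` becomes
    a labelled training example for the next fine-tuning run.
    """
    entry = {
        "post_id": post_id,
        "text": text,
        "ai_label": ai_label,        # what the model decided
        "human_label": human_label,  # what the humans decided
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# The painting example above: the model said harmful, reviewers overruled it.
record_correction("post_123", "Renaissance-era painting",
                  ai_label="harmful", human_label="safe")
```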

Define Clear Moderation Policies

Once your website is up and running and you decide to implement an AI moderation system with human-in-the-loop, remember that the tool will only be as effective as the policies behind it. Set out clear rules, meaning rules that are specific, measurable, and consistent.

What does it actually mean to have specific rules? It means clearly defining for your AI tool what content is acceptable and what should be discarded immediately. Your AI system should know which posts or comments are threats of violence and which are common jokes. The more specific your policies, the more effective your AI assistant will be.

For a rule to be measurable, your AI system must be able to check it easily. For example, if someone posts an explicit image, or uses language that has no place in a public forum, your AI tool can flag it with confidence. AI cannot reliably read intent from text and judge accordingly, so the more measurable the rule, the better the results you will get from your tool.

Consistency comes from training your AI model to treat the same type of content the same way. As long as it makes the same decision every time, your audience will not sense bias in your website.
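Specific, measurable, consistent rules translate naturally into configuration that the system evaluates the same way every time. Here is a sketch of what such a policy definition might look like; the categories, signal names, and thresholds are illustrative assumptions.

```python
# Each rule is specific (a named category), measurable (a score or
# pattern the system can check), and consistent (one fixed action).
MODERATION_POLICY = {
    "explicit_image": {
        "signal": "nsfw_score",      # hypothetical classifier output
        "threshold": 0.85,
        "action": "remove",
    },
    "violent_threat": {
        "signal": "threat_score",
        "threshold": 0.80,
        "action": "remove_and_escalate",
    },
    "spam_link": {
        "signal": "url_pattern",     # deterministic regex check
        "pattern": r"https?://bit\.ly/\S+",
        "action": "remove",
    },
}
```

Because every rule maps one measurable signal to one fixed action, the same post always gets the same outcome, which is exactly the consistency your audience will judge you on.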

Use Confidence Scores to Route Decisions

A confidence score system makes it easier to review decisions made by your AI tool. It also makes the human-in-the-loop system easier to run, because it tells you which items to review first and which can be handled automatically. But what exactly is a confidence score system?

To start, you define the scoring bands yourself, and the AI reports its confidence that a piece of content is harmful. Say your bands are as follows: 90% means the content is very likely harmful, around 50% means the model is unsure, and 20% means the content is likely safe.

At a score of 90% or above, you can direct your AI tool to delete the content automatically and audit the decision later. When the score lands in the uncertain middle of the range, tell the tool to flag the content for review; you then check it manually and decide whether it was actually harmful or just a false alarm. Content at the low end of the scale can be published without intervention.

With this system, you will know exactly when the AI tool is confused and needs your assistance. Based on the score, you can also decide which content needs attention first and which can wait.
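Here is the same idea expressed as a routing function. The bands mirror the example above; in practice you would tune the cut-offs to your own tolerance for false positives.

```python
def route_by_confidence(harm_confidence: float) -> str:
    """Route content by the model's confidence (0.0 to 1.0) that it is harmful.

    Bands follow the example above: >= 0.9 very likely harmful,
    the middle of the range uncertain, <= 0.2 likely safe.
    """
    if harm_confidence >= 0.9:
        # Near-certain: remove automatically, spot-check later.
        return "auto_removed"
    if harm_confidence > 0.2:
        # The model is unsure: a human makes the final call.
        return "flagged_for_human_review"
    # Likely safe: publish without intervention.
    return "published"

print(route_by_confidence(0.93))  # auto_removed
print(route_by_confidence(0.55))  # flagged_for_human_review
print(route_by_confidence(0.10))  # published
```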

Train Human Moderators Effectively

If you run an ecommerce business or a social media website, you will need a team working alongside you. Every day, thousands of posts come in, some of which may not be appropriate for your forum, and it would be impossible to review them all yourself.

This is why it makes sense to have a team of moderators share the burden and run the human-in-the-loop system with you. They are not just reviewers, though; they need training and support from you. Set up clear guidelines for your moderators and teach them what is allowed and what you will not accept. Without proper guidelines, their decisions will become inconsistent.

Training your moderators once is not enough to expect consistent results. Sometimes you will change the rules yourself, and sometimes new types of content surface that your team has no experience with. Regular training also keeps fresh in their minds which content is acceptable and which is not.

Remember that your moderators will view content daily that can be disturbing or even traumatizing. They will need your support. Arrange mental health counselling for those who need it, and rotate shifts so that no one person carries too much of the burden. At the end of the day, your team will be as good to you as you are to them.

Final Thoughts

A human-in-the-loop system is an AI moderation system in which you actively participate in the decision making. The best way to introduce AI models is to use them first as detection tools while you remain the one making all the decisions.

You can also design a tiered system: in the first tier, AI takes care of the easy cases, while in the second and third tiers you step in with your expertise to make the final call. You can even bring your audience into the loop by letting them correct your AI tool when it makes a wrong decision.

For your AI system to work efficiently, however, you have to set the rules precisely and in a way that a machine learning algorithm can act on. You can use confidence scores to gauge how certain the AI is that a piece of content is harmful, which also lets you prioritize your decisions.

Lastly, if you have a team of moderators helping with reviews, define your policies clearly for them and train them regularly. Support them well, and you will have the most effective AI moderation system that human-in-the-loop can deliver.

