Word Obfuscator

Started by ., Dec 26, 2015, 03:10 AM


.

This is part of a new snippet I started working on tonight for better bad-word detection, since it seems people really want the server to do all the work, and I'm not pleased with the current approaches. I had to think a bit outside the box to come up with a few solutions that don't kill the CPU when filtering massive numbers of words. I'm still testing the ideas, and these snippets were stripped out of that system in the hope that they might be useful to some.

The first thing I had to do was create a list of all possible bad words and all their possible l33t combinations. I wasn't silly enough to do that manually, so I wrote a small function to automate the job for me. I mean, that's what programming is for, right?

What you have here is a l33t translator that generates every possible l33t combination of a given word:
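// Lookup table indexed by character code; unlisted characters stay null.
local g_l33tDb = array(256);

g_l33tDb['a'] = ["4", "@", "/-\\"];
g_l33tDb['b'] = ["8", "|3"];
g_l33tDb['c'] = ["("];
g_l33tDb['d'] = ["|)"];
g_l33tDb['e'] = ["3"];
g_l33tDb['f'] = ["|=", "pH"];
g_l33tDb['g'] = ["9", "6"];
g_l33tDb['h'] = ["|-|", "#"];
g_l33tDb['i'] = ["1", "|", "!"];
g_l33tDb['j'] = [";"];
g_l33tDb['k'] = ["|{", "|<"];
g_l33tDb['l'] = ["|_", "[]_", "|"];
g_l33tDb['m'] = ["|\\/|", ")v("];
g_l33tDb['n'] = ["|\\|", "/\\/"];
g_l33tDb['o'] = ["0", "()"];
g_l33tDb['p'] = ["|>"];
g_l33tDb['q'] = ["0,"];
g_l33tDb['r'] = ["|2"];
g_l33tDb['s'] = ["5", "$"];
g_l33tDb['t'] = ["+", "7"];
g_l33tDb['u'] = ["|_|", "\\_/"];
g_l33tDb['v'] = ["\\/"];
g_l33tDb['w'] = ["\\/\\/", "\\X/", "\\^/"]; // stray trailing space removed from "\\X/ "
g_l33tDb['x'] = ["><"];
g_l33tDb['y'] = ["'/"];
g_l33tDb['z'] = ["2"];

function WordObfuscator(word)
{
    if (typeof(word) != "string")
    {
        throw "Expected word to be string but got " + typeof(word);
    }
    else if (word.len() <= 0)
    {
        return [word];
    }

    word = word.tolower();

    // Every variant is kept as an array of pieces, one per character of the
    // original word, so replacing piece i always hits the intended letter.
    // Searching the string with find() only hits the first matching character,
    // so with repeated letters (e.g. "hello") some combinations came out twice
    // and others (only the second 'l' replaced) were never generated.
    local pieces = [];
    for (local i = 0; i < word.len(); ++i) pieces.push(word.slice(i, i + 1));

    local variants = [pieces], c = 0;

    for (local i = 0; i < word.len(); ++i)
    {
        c = word[i];

        // Skip characters that are outside the table or have no l33t forms.
        if (c < 0 || c >= g_l33tDb.len() || g_l33tDb[c] == null) continue;

        local tmp = [];
        foreach (sub in g_l33tDb[c])
        {
            foreach (v in variants)
            {
                local nv = clone v; // shallow copy of the pieces
                nv[i] = sub;
                tmp.push(nv);
            }
        }
        variants.extend(tmp);
    }

    // Glue the pieces of every variant back into a single string.
    local words = [];
    foreach (v in variants)
    {
        local r = "";
        foreach (p in v) r += p;
        words.push(r);
    }

    return words;
}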
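Since every letter is either kept or swapped for one of its l33t forms, and those choices are independent, the number of variants is the product of (1 + number of forms) over the letters of the word. That is why the list grows so quickly: for "anus" it is 4 * 3 * 3 * 3 = 108, which matches the 108 lines of output below. A rough sketch of that calculation, using a hypothetical CountL33tVariants helper that is not part of the original snippet, could be used to estimate the size of the list before generating it:

```squirrel
// Hypothetical helper (not part of the original snippet): counts how many
// strings WordObfuscator would return for `word` without generating them.
// `db` is assumed to be shaped like g_l33tDb: an array indexed by character
// code holding an array of l33t forms, or null for letters without any.
function CountL33tVariants(word, db)
{
    local total = 1, c = 0;
    word = word.tolower();

    for (local i = 0; i < word.len(); ++i)
    {
        c = word[i];
        // Characters without l33t forms (or outside the table) have one choice.
        if (c < 0 || c >= db.len() || db[c] == null) continue;
        // Either keep the letter or swap in one of its forms.
        total *= 1 + db[c].len();
    }
    return total;
}

// Small subset of the post's table, just enough for the example word.
local db = array(256);
db['a'] = ["4", "@", "/-\\"];
db['n'] = ["|\\|", "/\\/"];
db['u'] = ["|_|", "\\_/"];
db['s'] = ["5", "$"];

print(CountL33tVariants("anus", db).tostring()); // 108 = 4 * 3 * 3 * 3
```

A word with no listed letters still counts as 1, matching the single unchanged word the generator returns.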

Taking the following bad word as an example:
local wa = WordObfuscator("anus");

foreach (w in wa) print(w);

This produces the following output:
[SCRIPT]  anus
[SCRIPT]  4nus
[SCRIPT]  @nus
[SCRIPT]  /-\nus
[SCRIPT]  a|\|us
[SCRIPT]  4|\|us
[SCRIPT]  @|\|us
[SCRIPT]  /-\|\|us
[SCRIPT]  a/\/us
[SCRIPT]  4/\/us
[SCRIPT]  @/\/us
[SCRIPT]  /-\/\/us
[SCRIPT]  an|_|s
[SCRIPT]  4n|_|s
[SCRIPT]  @n|_|s
[SCRIPT]  /-\n|_|s
[SCRIPT]  a|\||_|s
[SCRIPT]  4|\||_|s
[SCRIPT]  @|\||_|s
[SCRIPT]  /-\|\||_|s
[SCRIPT]  a/\/|_|s
[SCRIPT]  4/\/|_|s
[SCRIPT]  @/\/|_|s
[SCRIPT]  /-\/\/|_|s
[SCRIPT]  an\_/s
[SCRIPT]  4n\_/s
[SCRIPT]  @n\_/s
[SCRIPT]  /-\n\_/s
[SCRIPT]  a|\|\_/s
[SCRIPT]  4|\|\_/s
[SCRIPT]  @|\|\_/s
[SCRIPT]  /-\|\|\_/s
[SCRIPT]  a/\/\_/s
[SCRIPT]  4/\/\_/s
[SCRIPT]  @/\/\_/s
[SCRIPT]  /-\/\/\_/s
[SCRIPT]  anu5
[SCRIPT]  4nu5
[SCRIPT]  @nu5
[SCRIPT]  /-\nu5
[SCRIPT]  a|\|u5
[SCRIPT]  4|\|u5
[SCRIPT]  @|\|u5
[SCRIPT]  /-\|\|u5
[SCRIPT]  a/\/u5
[SCRIPT]  4/\/u5
[SCRIPT]  @/\/u5
[SCRIPT]  /-\/\/u5
[SCRIPT]  an|_|5
[SCRIPT]  4n|_|5
[SCRIPT]  @n|_|5
[SCRIPT]  /-\n|_|5
[SCRIPT]  a|\||_|5
[SCRIPT]  4|\||_|5
[SCRIPT]  @|\||_|5
[SCRIPT]  /-\|\||_|5
[SCRIPT]  a/\/|_|5
[SCRIPT]  4/\/|_|5
[SCRIPT]  @/\/|_|5
[SCRIPT]  /-\/\/|_|5
[SCRIPT]  an\_/5
[SCRIPT]  4n\_/5
[SCRIPT]  @n\_/5
[SCRIPT]  /-\n\_/5
[SCRIPT]  a|\|\_/5
[SCRIPT]  4|\|\_/5
[SCRIPT]  @|\|\_/5
[SCRIPT]  /-\|\|\_/5
[SCRIPT]  a/\/\_/5
[SCRIPT]  4/\/\_/5
[SCRIPT]  @/\/\_/5
[SCRIPT]  /-\/\/\_/5
[SCRIPT]  anu$
[SCRIPT]  4nu$
[SCRIPT]  @nu$
[SCRIPT]  /-\nu$
[SCRIPT]  a|\|u$
[SCRIPT]  4|\|u$
[SCRIPT]  @|\|u$
[SCRIPT]  /-\|\|u$
[SCRIPT]  a/\/u$
[SCRIPT]  4/\/u$
[SCRIPT]  @/\/u$
[SCRIPT]  /-\/\/u$
[SCRIPT]  an|_|$
[SCRIPT]  4n|_|$
[SCRIPT]  @n|_|$
[SCRIPT]  /-\n|_|$
[SCRIPT]  a|\||_|$
[SCRIPT]  4|\||_|$
[SCRIPT]  @|\||_|$
[SCRIPT]  /-\|\||_|$
[SCRIPT]  a/\/|_|$
[SCRIPT]  4/\/|_|$
[SCRIPT]  @/\/|_|$
[SCRIPT]  /-\/\/|_|$
[SCRIPT]  an\_/$
[SCRIPT]  4n\_/$
[SCRIPT]  @n\_/$
[SCRIPT]  /-\n\_/$
[SCRIPT]  a|\|\_/$
[SCRIPT]  4|\|\_/$
[SCRIPT]  @|\|\_/$
[SCRIPT]  /-\|\|\_/$
[SCRIPT]  a/\/\_/$
[SCRIPT]  4/\/\_/$
[SCRIPT]  @/\/\_/$
[SCRIPT]  /-\/\/\_/$

It shouldn't take long to realize that each output word means exactly the same thing as the input; it's just written with different symbols to make it harder for censor systems to catch. As you can see, it becomes really hard to come up with a way of writing a bad word that this function hasn't already generated. Try any harder than that and the word won't be understood anyway.
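The post doesn't show how the detection side consumes this list, so what follows is only a minimal sketch under my own assumptions. One cheap option is to store every generated variant as a key in a Squirrel table once, when the script loads, so checking a chat message costs one hash lookup per word instead of a scan over the whole list. The names BuildBadWordTable and ContainsBadWord are hypothetical, splitting on spaces with the standard string library's split() is an assumption, and a short hard-coded list stands in for the output of WordObfuscator to keep the example self-contained:

```squirrel
// Hypothetical sketch: turn a list of generated variants into a lookup table.
// Table keys give (average) constant-time membership tests via `in`.
function BuildBadWordTable(variants)
{
    local t = {};
    foreach (v in variants) t[v.tolower()] <- true;
    return t;
}

// Returns true if any space-separated word of `message` is a known variant.
// Assumes the standard string library (which provides split) is loaded.
function ContainsBadWord(message, table)
{
    foreach (token in split(message.tolower(), " "))
    {
        if (token in table) return true;
    }
    return false;
}

// Stand-in for part of the output of WordObfuscator("anus").
local bad = BuildBadWordTable(["anus", "4nus", "@nu$", "/-\\|\\|\\_/5"]);

print(ContainsBadWord("you are an @nu$", bad).tostring()); // true
print(ContainsBadWord("a-n.u-s", bad).tostring());         // false
```

Exact lookups like this only catch spellings that were actually generated, so a separated form such as "a-n.u-s" slips through.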

I apologize for the bad words. This was just an example.
.

DizzasTeR


Xmair

Seriously, you're a genius.

Credits to Boystang!

VU Full Member | VCDC 6 Coordinator & Scripter | EG A/D Contributor | Developer of VCCNR | Developer of KTB | Ex-Scripter of EAD

KAKAN

oh no

EK.IceFlake

slc: hey look my new system you cant type anus no matter how obfuscated
nerd: a-n.u-s

DizzasTeR

Quote from: NE.CrystalBlue on Dec 26, 2015, 08:45 AM
slc: hey look my new system you cant type anus no matter how obfuscated
nerd: a-n.u-s

So you're saying no matter how much one tries, you're going to be a b1tch and try to swear anyway? :D

KAKAN

Quote from: NE.CrystalBlue on Dec 26, 2015, 08:45 AM
slc: hey look my new system you cant type anus no matter how obfuscated
nerd: a-n.u-s
Do you know why he posted this? Actually, you don't. That's why you posted that, right?
If not, then tell me: why did he post it?

.

@NE.CrystalBlue, nobody is stopping you from creating all those versions yourself. Take them one by one and begin:
a.nus
a..nus
a...nus
a.n-us
a..n--us
a...n--us
a.n-u=s
a..n--u==s
a...n--u==s
... and many more ASCII characters to substitute...
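Doing this by hand is the same kind of job WordObfuscator automates, and it shows how quickly the list explodes. As a hypothetical sketch (the SeparatorVariants name and the separator set are made up for illustration), the following generates every way of putting one separator, or none, into each gap between the letters of a word. With three choices ("", "." and "-") in each of the three gaps of "anus" there are already 3 * 3 * 3 = 27 variants, before counting repeated separators such as "a..nus" or combining them with the l33t forms:

```squirrel
// Hypothetical sketch: every way of placing one separator (or none) in each
// gap between the letters of `word`. Include "" in `seps` for "no separator".
function SeparatorVariants(word, seps)
{
    if (word.len() == 0) return [word];

    // Start with the first letter, then append each following letter with
    // every possible separator in front of it.
    local out = [word.slice(0, 1)];
    for (local i = 1; i < word.len(); ++i)
    {
        local letter = word.slice(i, i + 1), grown = [];
        foreach (prefix in out)
        {
            foreach (sep in seps) grown.push(prefix + sep + letter);
        }
        out = grown;
    }
    return out;
}

local v = SeparatorVariants("anus", ["", ".", "-"]);
print(v.len().tostring()); // 27: three choices in each of the three gaps
```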

And sure, you can understand it because you already know what it's supposed to mean. But when I look at "a-n.u-s", I'd be asking whether it's a clan name or something, because all the separators make it look like an acronym.
.

EK.IceFlake

Quote from: Doom_Kill3R on Dec 26, 2015, 09:17 AM
Quote from: NE.CrystalBlue on Dec 26, 2015, 08:45 AM
slc: hey look my new system you cant type anus no matter how obfuscated
nerd: a-n.u-s

So you're saying no matter how much one tries, you're going to be a b1tch and try to swear anyway? :D
I'm obviously referring to the nerd and not me