A Browser Extension to hide the troll comments on YouTube – An Inside Look

Intro

My contribution to the “Hide Fedora” project has mostly been providing the web backend for the extension. Before I get into the details of how the backend works, I’d like to give some insight into the purpose of the extension. First, from the original author, here is why he made it:

I created the extension to learn how to make browser extensions and what better way than making something that I could actually use – Extension Creator

Asked about future plans for the extension, his answer was a simple “World Domination”.

The Whole Process

How the Hide Fedora process works.

Background

If you browse a lot of YouTube videos that you find on Reddit’s /r/videos subreddit, you will notice that the comment sections of these videos are often sabotaged by users identifying with /r/redditarmie (not linking this one). They post “troll” comments posing as fedora-wearing people, making fun of Reddit and of various types of people, and they even steal other people’s profile pictures and use them as their own. YouTube isn’t doing anything about it. Luckily I found the Hide Fedora project, and I noticed that its method of adding new users was rather old school:

  1. Find someone with a “Fedora” comment
  2. Go to their Google+ profile
  3. Get the Google+ profile ID from the URL
  4. Add this to a JSON file in a GitHub repo
  5. Submit a pull request

How we handle the “reviewing”

The extension itself was loading the JSON file from GitHub, which is not the best way to load the data, but it was still a good base. I reached out, and the author agreed to use my Cloudflare-backed site to provide the backend and serve the JSON file the extension uses. The process for adding new IDs changed from the steps above to a faster, more efficient setup that separates user actions from administrator actions. Users do the following:

  1. If a new “fedora member” who isn’t already blocked pops up in the comment feed, the user simply hits the “Report and Ban” button. Once they do, the comment is removed, the profile is added to a local blacklist, and the report is submitted for review for our global blacklist. Here is an image of the button:

    Report and Ban Button

So for the user, if they see something that they believe is a fedora comment they simply click this button, and they have nothing else to worry about. From the admin standpoint here is what happens:

  1. We view a page behind a secure login that lists the reports. Reports are sorted by number of reports, highest first; ties are broken by submission date, oldest first.
  2. We have the following information to judge a report: Profile ID, Comment, Date, IP and # of Reports. This looks like the following:

image

        </li> 

        <li>
          If you look at the ID column you will notice this is also a link, and contains a <strong>(^)</strong> symbol at the end of it. The symbol links us to the YouTube video that the comment was reported on, and the ID itself links to the Google+ profile. This allows us to investigate reports better.
        </li>
        <li>
          We found it was too time consuming to go through reports one by one, clicking each ID, checking out the profile, and going back to make the decision. So I added the ability to hover over the ID. When you do this, a box pops up showing the username and profile picture of the user in question. An ajax call uses <a href="https://developers.google.com/+/api/" title="">Google+&#8217;s API</a> to fetch the user&#8217;s name and profile picture. Here is what the popup looks like: <p>
            <div id="attachment_331" style="width: 287px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_08_12-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-2" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-331" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_08_12-Review-_-Hide-Fedora.jpg?resize=277%2C335&#038;ssl=1" alt="Review Popup Example" width="277" height="335" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_08_12-Review-_-Hide-Fedora.jpg?w=277&ssl=1 277w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_08_12-Review-_-Hide-Fedora.jpg?resize=248%2C300&ssl=1 248w" sizes="(max-width: 277px) 100vw, 277px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                Review Popup Example
              </p>
            </div></li> 

            <li>
              Based on this data we simply approve or reject the user.
            </li></ol> 
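
            <p>
              The review ordering described above (most reports first, ties broken by oldest submission) can be sketched in a few lines of Python. This is only an illustration: the <code>Report</code> record and its field names are hypothetical, since the actual backend code is not shown in this post.
            </p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Report:
    # Hypothetical field names, chosen for illustration only.
    profile_id: str
    comment: str
    first_reported: datetime
    report_count: int


def review_order(reports):
    """Most-reported first; ties are broken by oldest submission date."""
    return sorted(reports, key=lambda r: (-r.report_count, r.first_reported))


queue = review_order([
    Report("111", "example a", datetime(2014, 12, 18), 3),
    Report("222", "example b", datetime(2014, 12, 10), 25),
    Report("333", "example c", datetime(2014, 12, 15), 3),
])
print([r.profile_id for r in queue])  # → ['222', '333', '111']
```

            <p>
              Negating the count in the sort key keeps a single ascending sort while still putting the highest counts first, and the date as the second key handles the ties.
            </p>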

            <h2>
              How we handle abuse
            </h2>

            <p>
              As we are targeting a rather abusive and troll-oriented community, we have obviously had some backlash. Users have tried to submit bulk false reports via scripting, but we now have a per-IP throttle in place that makes those attempts far less effective. Also, because we use Cloudflare, most proxies trigger a captcha, so scripting through proxies does not really work either. Users who abuse the system are quickly banned.
            </p>
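
            <p>
              To give an idea of what a per-IP throttle can look like, here is a minimal sliding-window sketch in Python. It is not the code running on the server; the class name, the limit, and the window length are all assumptions made for the example.
            </p>

```python
import time
from collections import defaultdict, deque


class IpThrottle:
    """Allow at most `limit` reports per IP within a sliding `window` (seconds).

    Hypothetical helper for illustration; limit and window are made-up values.
    """

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted reports

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
throttle = IpThrottle(limit=2, window=60.0, clock=lambda: fake_now[0])
results = [throttle.allow("203.0.113.7") for _ in range(3)]
print(results)  # → [True, True, False]

fake_now[0] = 61.0  # the earlier hits have now expired
later = throttle.allow("203.0.113.7")
print(later)  # → True
```

            <p>
              A scripted burst from one address is cut off after the first few reports, while an ordinary user reporting the occasional comment never notices the limit.
            </p>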

            <p>
              The fedora members also like to use our extension to report perfectly legitimate comments and users. This kind of abuse produces a lot of reports with only one reporter each, which means we get to them last. The important, real fedora users are removed quickly, and the false reports are dealt with later. False reports are quick to spot: we simply hover over the ID, make sure the user is clean, and move on. Very fast and efficient.
            </p>

            <h2>
              Other types of problems we come across
            </h2>

            <p>
              The problem with our setup is that it is rather hard to tell users exactly what is &#8220;right&#8221; and what is &#8220;wrong&#8221; to report. So we end up with a lot of actual spam comments on YouTube being reported. Not only that, but racial comments also tend to be reported. More generally still, people report comments they simply don&#8217;t like. And a lot of the time these get <strong>a lot</strong> of reports too! Here are some examples:
            </p>

            <div id="attachment_332" style="width: 829px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_18_24-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-3" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-332" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_18_24-Review-_-Hide-Fedora.jpg?resize=672%2C43&#038;ssl=1" alt="Sample 1" width="672" height="43" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_18_24-Review-_-Hide-Fedora.jpg?w=819&ssl=1 819w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_18_24-Review-_-Hide-Fedora.jpg?resize=300%2C19&ssl=1 300w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                There are 25 reports on this. The username and profile picture of the user are totally normal. 25 people reported this just because they didn&#8217;t like the comment, not because it met our standards for blocking.
              </p>
            </div>

            <div id="attachment_333" style="width: 836px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_21_02-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-4" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-333" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_21_02-Review-_-Hide-Fedora.jpg?resize=672%2C52&#038;ssl=1" alt="Sample 2" width="672" height="52" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_21_02-Review-_-Hide-Fedora.jpg?w=826&ssl=1 826w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_21_02-Review-_-Hide-Fedora.jpg?resize=300%2C23&ssl=1 300w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                Just one word, but 3 people didn&#8217;t like it. They use the report button like it is a dislike button.
              </p>
            </div>

            <div id="attachment_334" style="width: 833px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_22_04-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-5" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-334" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_22_04-Review-_-Hide-Fedora.jpg?resize=672%2C45&#038;ssl=1" alt="Sample 3" width="672" height="45" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_22_04-Review-_-Hide-Fedora.jpg?w=823&ssl=1 823w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_22_04-Review-_-Hide-Fedora.jpg?resize=300%2C20&ssl=1 300w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                This is clear spam, yet people use our button instead of YouTube&#8217;s button to report it. We don&#8217;t care about spam; if we blacklisted all the spammers we&#8217;d have a huge list!
              </p>
            </div>

            <p>
              And here is what some real reports look like:
            </p>

            <p>
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_20_34-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-6" data-rl_title="" data-rl_caption="" title=""><img class="alignnone size-full wp-image-335" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_20_34-Review-_-Hide-Fedora.jpg?resize=672%2C104&#038;ssl=1" alt="Sample 4" width="672" height="104" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_20_34-Review-_-Hide-Fedora.jpg?w=824&ssl=1 824w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_20_34-Review-_-Hide-Fedora.jpg?resize=300%2C46&ssl=1 300w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>
            </p>

            <p>
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_25_20-Review-_-Hide-Fedora.jpg?ssl=1" data-rel="lightbox-image-7" data-rl_title="" data-rl_caption="" title=""><img class="alignnone size-full wp-image-336" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_25_20-Review-_-Hide-Fedora.jpg?resize=672%2C126&#038;ssl=1" alt="Sample 6" width="672" height="126" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_25_20-Review-_-Hide-Fedora.jpg?w=825&ssl=1 825w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_25_20-Review-_-Hide-Fedora.jpg?resize=300%2C56&ssl=1 300w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>
            </p>


            <h1>
              Technical Details/Stats!
            </h1>

            <p>
              Now for the fun part!
            </p>

            <h2>
              Approval rate
            </h2>

            <p>
              At the time of writing we have just hit 1007 blacklisted users out of a total of 8156 reports. That means the approval rate on reports is about <strong>12.35%</strong>. Yes, we only approve roughly 12% of the reports we get!
            </p>
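
            <p>
              For anyone who wants to check the arithmetic, the approval rate is simply the number of blacklisted users divided by the number of reports. A quick Python check, using only the two figures quoted above:
            </p>

```python
blacklisted = 1007  # approved (blacklisted) users, from the post
reports = 8156      # total reports received, from the post

rate = blacklisted / reports * 100
print(f"{rate:.2f}%")  # → 12.35%
```

            <p>
              The exact value is a little under 12.35%, so it reads as 12.34% when truncated to two decimals and 12.35% when rounded. Either way, roughly one report in eight is approved.
            </p>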

            <h2>
              How the user gets the updated list
            </h2>

            <p>
              The list is fetched from a JSON endpoint on my server every time a YouTube video is loaded. The file is cached, and the cache is rebuilt every hour and also whenever a new ID is approved. Why rebuild it every hour? This ensures we serve a fresh copy in case a user is removed directly in the database or via an appeal. The extension also stores the list locally in case the server is down.
            </p>

            <p>
              As soon as we approve an ID, users get the new blacklist the next time they load a YouTube video. Fairly instant.
            </p>
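
            <p>
              The caching behaviour (rebuild hourly, and rebuild immediately when an ID is approved) could look something like the following Python sketch. The <code>BlacklistCache</code> class and the <code>load_ids</code> callback are hypothetical stand-ins for the real server code and database query, which are not shown here.
            </p>

```python
import json
import time


class BlacklistCache:
    """Serve a cached JSON body; rebuild it when stale or explicitly invalidated.

    Hypothetical sketch: `load_ids` stands in for the real database query.
    """

    def __init__(self, load_ids, ttl=3600.0, clock=time.monotonic):
        self._load_ids = load_ids  # callable returning the approved IDs
        self._ttl = ttl            # one hour, as described in the post
        self._clock = clock
        self._body = None
        self._built_at = None

    def invalidate(self):
        """Call this when a new ID is approved."""
        self._body = None

    def get(self):
        now = self._clock()
        if self._body is None or now - self._built_at >= self._ttl:
            self._body = json.dumps(sorted(self._load_ids()))
            self._built_at = now
        return self._body


# Usage with an in-memory "database" and a fake clock.
db = {"111", "222"}
fake_now = [0.0]
cache = BlacklistCache(lambda: db, clock=lambda: fake_now[0])

first = cache.get()    # built from the database
db.add("333")
cached = cache.get()   # still the old copy, nothing invalidated it
cache.invalidate()     # an admin approves an ID
fresh = cache.get()    # rebuilt, now includes "333"

db.discard("111")      # removed directly in the database (e.g. an appeal)
fake_now[0] = 3600.0   # an hour later
hourly = cache.get()   # the TTL picks up the removal
print(first, cached, fresh, hourly, sep="\n")
```

            <p>
              The explicit invalidation covers approvals, while the hourly expiry is the safety net for changes that bypass the approval path, such as manual removals.
            </p>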

            <h2>
              How much traffic do we get?
            </h2>

            <p>
              According to the <a href="https://chrome.google.com/webstore/detail/hide-fedora/acjgabfifnnmmlckmnijdbijgbfpedde" title="">Chrome</a>, <a href="https://addons.mozilla.org/en-US/firefox/addon/hide-fedora/" title="">Firefox</a>, and <a href="https://addons.opera.com/en/extensions/details/hide-fedora/" title="">Opera</a> pages we have 16 859 active users. Using <a href="http://www.awstats.org/" title="">AWStats</a> to analyze my server&#8217;s log files, we can see that we actually get a lot of usage!
            </p>

            <h3>
              Traffic Summaries for month of December (20 days)
            </h3>

            <h4>
              Summary
            </h4>

            <div id="attachment_337" style="width: 409px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_40_32-Statistics-for-jhvisser.com-2014-12-main.jpg?ssl=1" data-rel="lightbox-image-8" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-337" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_40_32-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=399%2C59&#038;ssl=1" alt="We have served 8 253 194 requests, most of which are to the JSON page." width="399" height="59" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_40_32-Statistics-for-jhvisser.com-2014-12-main.jpg?w=399&ssl=1 399w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_40_32-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=300%2C44&ssl=1 300w" sizes="(max-width: 399px) 100vw, 399px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                We have served 8 253 194 requests, most of which are to the JSON page.
              </p>
            </div>

            <h4>
              Growth
            </h4>

            <p>
              As we began to rise in popularity, you can see the number of requests rise:
            </p>

            <div id="attachment_338" style="width: 752px" class="wp-caption alignnone">
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?ssl=1" data-rel="lightbox-image-9" data-rl_title="" data-rl_caption="" title=""><img class="size-full wp-image-338" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=672%2C675&#038;ssl=1" alt="We serve an average of 5GB per day!" width="672" height="675" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?w=742&ssl=1 742w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=150%2C150&ssl=1 150w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=300%2C300&ssl=1 300w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_42_25-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=160%2C160&ssl=1 160w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>

              <p class="wp-caption-text">
                We serve an average of 5GB per day!
              </p>
            </div>

            <h4>
              Hourly
            </h4>

            <p>
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_43_17-Statistics-for-jhvisser.com-2014-12-main.jpg?ssl=1" data-rel="lightbox-image-10" data-rl_title="" data-rl_caption="" title=""><img class="alignnone size-full wp-image-339" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_43_17-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=662%2C411&#038;ssl=1" alt="2014-12-19 01_43_17-Statistics for jhvisser.com (2014-12) - main" width="662" height="411" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_43_17-Statistics-for-jhvisser.com-2014-12-main.jpg?w=662&ssl=1 662w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_43_17-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=300%2C186&ssl=1 300w" sizes="(max-width: 662px) 100vw, 662px" data-recalc-dims="1" /></a>
            </p>

            <h4>
              Page-URL
            </h4>

            <p>
              This is where you can see that the JSON is being loaded the most:
            </p>

            <p>
              <a href="/static/old-wordpress-uploads/2014/12/2014-12-19-01_44_32-Statistics-for-jhvisser.com-2014-12-main.jpg?ssl=1" data-rel="lightbox-image-11" data-rl_title="" data-rl_caption="" title=""><img class="alignnone size-full wp-image-340" src="/static/old-wordpress-uploads/2014/12/2014-12-19-01_44_32-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=672%2C107&#038;ssl=1" alt="2014-12-19 01_44_32-Statistics for jhvisser.com (2014-12) - main" width="672" height="107" srcset="/static/old-wordpress-uploads/2014/12/2014-12-19-01_44_32-Statistics-for-jhvisser.com-2014-12-main.jpg?w=1163&ssl=1 1163w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_44_32-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=300%2C48&ssl=1 300w, /static/old-wordpress-uploads/2014/12/2014-12-19-01_44_32-Statistics-for-jhvisser.com-2014-12-main.jpg?resize=1024%2C164&ssl=1 1024w" sizes="(max-width: 672px) 100vw, 672px" data-recalc-dims="1" /></a>
            </p>

            <p>
              You can see that the average size of the JSON page is 13.22KB, a fairly small file. <em>submit.php</em> is the endpoint for submitting reports. <em>ajax.php</em> is what we use to approve or reject reports (authenticated page). And <em>getProfile.php</em> is used to grab the profile info for the reported IDs.
            </p>


            <h1>
              New stuff
            </h1>

            <p>
              Just some recent changes that required some code.
            </p>

            <h2>
              Store profile name and profile picture (url)
            </h2>

            <p>
              Recently I added the ability to fetch a user&#8217;s profile name and image on submission, so we no longer even need the hover popup featured above. It works great, but I also needed to get this data for over 12 000 existing rows. We want all our records to be up to date, even if the report is already completed. To avoid sending over 12 000 API requests to Google within a short time, I ran a cronjob every so often to work through the backlog. Here is a little example of it in action:
            </p>

            <p>
              <a href="http://imgur.com/Cvsrzs2"><img title="source: imgur.com" src="https://i2.wp.com/i.imgur.com/Cvsrzs2.gif?w=672" alt="" data-recalc-dims="1" /></a>
            </p>
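
            <p>
              The backfill idea is simple: each cron run fills in a limited number of rows and leaves the rest for the next run. Here is a rough Python sketch of that pattern. The row layout, the batch size, and the <code>fetch_profile</code> callback are assumptions for the example; the real job calls Google&#8217;s API and updates the database.
            </p>

```python
def backfill_batch(rows, fetch_profile, batch_size=100):
    """Fill in name/picture for at most `batch_size` rows; return how many are left.

    Rows whose "name" is still None are treated as not yet fetched.
    `fetch_profile` is a hypothetical callable returning (name, picture_url).
    """
    pending = [r for r in rows if r.get("name") is None]
    for row in pending[:batch_size]:
        name, picture_url = fetch_profile(row["profile_id"])
        row["name"] = name
        row["picture_url"] = picture_url
    return max(len(pending) - batch_size, 0)


def fake_fetch(profile_id):
    # Stand-in for the real API call.
    return f"User {profile_id}", f"https://example.invalid/{profile_id}.jpg"


rows = [{"profile_id": str(i), "name": None, "picture_url": None} for i in range(250)]

# Each loop iteration stands in for one cron run.
remaining = []
while True:
    left = backfill_batch(rows, fake_fetch, batch_size=100)
    remaining.append(left)
    if left == 0:
        break
print(remaining)  # → [150, 50, 0]
```

            <p>
              Capping the work per run spreads the API requests out over time, and because each run only looks at rows that are still missing data, it is safe to run the job as often as needed.
            </p>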

            <h1>
              End
            </h1>

            <p>
              I hope you enjoyed reading all of this! If you did read this far and are interested in joining our review team, feel free to leave a comment below. We get an increasing number of reports each day and are always looking for volunteers to come on, even just once a day, to review a few reports. If you have any questions about the post, feel free to leave those below as well!
            </p>