RE: Topic Blockers ⁂ starbreaker.org

In Topic Blockers, Manuel Moreale writes:

Anyway, one thing I’d love to have — and I know that’s something basically impossible to build — is a topic blocker. I’d happily pay money for a tool that allows me to completely hide certain topics from the web.

I can’t help but wonder if this would really be as hard to implement as Manuel thinks. For one thing, I would love to have my browser and feed reader not load pages or articles whose URLs or text contain one or more of a set of keywords that I have specified. For example, I don’t want to ever read anything posted on Substack or read about some bullshit some random asshole posted on Twitter or Reddit. Or, on a less serious note, maybe I’m sick of hearing about a particular celebrity and want to filter out any mention of them.

While I don’t like Mastodon because it repeats most of Twitter’s mistakes, here’s credit where it’s due: they provide solid keyword filtering so that you can maintain some control over your feed. If a half-assed knockoff of Twitter can do it, why can’t browsers and feed readers?

There must be a way. After all, I can filter by keyword with grep on a GNU/Linux or BSD system, for Crom’s sake.

This is a MEWNIX system. My cat knows this.

For example, if I wanted to search a single text file on a real (as in POSIX-compatible) computer for all lines that did not contain the word ‘fuck’, I could do the following:

grep -v fuck ~/foo.txt

Suppose I wanted to look for lines in a file that don’t contain any of George Carlin’s “Seven Dirty Words”: shit, piss, fuck, cunt, cocksucker, motherfucker, and tits. Well, grep lets you specify multiple expressions with the -e flag, like so:

grep -v -e "shit" -e "piss" -e "fuck" -e "cunt" -e "cocksucker" -e "tits" ~/foo.txt

Careful readers may have noticed that I did not include ‘motherfucker’ in my multi-expression command above. While I could have done so, I didn’t bother to be explicit because ‘motherfucker’ contains ‘fuck’. grep -v excludes every line containing a word that contains ‘fuck’, so lines with ‘motherfucker’ are excluded too.

Just watch out for the Scunthorpe problem. This sort of relatively naïve filtering is where you’re most likely to run into it.
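To see the problem in action, here is a small sketch. The sample file and its contents are made up for illustration, and it assumes a grep that supports -w (whole-word matching), which GNU and BSD grep both do.

```shell
# Hypothetical sample file, made up for this demonstration.
sample=$(mktemp)
printf '%s\n' \
    'I grew up near Scunthorpe.' \
    'Nothing to see here.' \
    'What a load of shit.' > "$sample"

# Naive substring match: the innocent Scunthorpe line gets thrown out too.
naive=$(grep -v -i -e 'cunt' -e 'shit' "$sample")

# Whole-word match (-w): a line is dropped only when a keyword stands
# alone as a word. -i ignores case, so "Shit" would be caught as well.
careful=$(grep -v -i -w -e 'cunt' -e 'shit' "$sample")

printf 'naive:\n%s\n\ncareful:\n%s\n' "$naive" "$careful"
rm -f "$sample"
```

The trade-off is that -w cuts both ways: with it, ‘fuck’ no longer matches ‘motherfucker’, so compound words would need their own -e entries.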

The real fun starts when we want to search multiple files for lines that don’t contain a set of keywords. If we only want to search files in the current directory, we need only change the filename to something more generic, like *.txt. Otherwise, if we want to drill down into subdirectories, we need to use grep’s recursion switch, -r.

grep -rv -e "shit" -e "piss" -e "fuck" -e "cunt" -e "cocksucker" -e "tits" .
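If you want the recursion but only for *.txt files, one option is to let find pick the files and hand them to grep. This is a sketch under stated assumptions: the directory and file names below are hypothetical, and the keyword list is shortened to keep it readable.

```shell
# Throwaway directory tree with hypothetical sample files.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'hello\nshit happens\n' > "$dir/a.txt"
printf 'fuck this\n' > "$dir/sub/b.txt"
printf 'shit\n' > "$dir/sub/notes.md"

# find walks the tree and passes only *.txt files to grep.
# The extra /dev/null operand guarantees grep sees more than one file,
# so every output line is prefixed with its filename.
matches=$(cd "$dir" && find . -type f -name '*.txt' \
    -exec grep -v -e 'shit' -e 'fuck' /dev/null {} + | sort)

printf '%s\n' "$matches"
rm -rf "$dir"
```

Here notes.md is never searched, and b.txt contributes nothing because its only line contains a keyword.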

I suggest piping output into less if you want to read through it, but you’ll probably nope out fast because raw grep output for multiple files lists the filename with a colon afterward, then prints the line that doesn’t contain the specified keywords. If you want a list of files, you’ll want different flags. Trimming the output down to filenames with cut and sort -u won’t do, because -v works line by line: a file with one filthy line and one clean line would still make the list. Fortunately, GNU and BSD grep have -L, which lists only the files in which no line matches, and since we’re on a POSIX system we can still pipe the output through other commands such as sort:

grep -rL -e "shit" -e "piss" -e "fuck" -e "cunt" -e "cocksucker" -e "tits" . \
    | sort

Now we’ve got something more manageable, a list of files that don’t contain any words that provoke pearl-clutching at the FCC, the MPA film ratings board, the ESRB, or whoever decides which albums should get a “parental advisory: explicit fucking lyrics” sticker.
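Typing out a row of -e flags gets old fast. Here is a sketch of the same idea with the keywords kept in a file, one per line, and handed to grep with -f. The sample files and their names are hypothetical, and -r and -L (list files with no matching lines) are assumed to be available, as they are in GNU and BSD grep.

```shell
# Throwaway directory with hypothetical sample files.
dir=$(mktemp -d)
printf 'A perfectly polite note.\n' > "$dir/clean.txt"
printf 'Polite at first.\nThen: oh, shit.\n' > "$dir/mixed.txt"

# One keyword per line, kept outside the searched directory so the
# blocklist doesn't turn up in its own results.
blocklist=$(mktemp)
printf '%s\n' shit piss fuck cunt cocksucker tits > "$blocklist"

# -F: treat keywords as fixed strings, not regular expressions.
# -i: ignore case.  -f: read the keywords from a file.
# -L: print only the names of files with no matching line at all.
clean_files=$(cd "$dir" && grep -rLiF -f "$blocklist" . | sort)

printf '%s\n' "$clean_files"
rm -rf "$dir" "$blocklist"
```

Note that mixed.txt is left out even though its first line is clean: -L judges whole files, whereas -v judges individual lines.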

Really Simple Filtering?

Admittedly, most people aren’t UNIX hackers bashing out commands into a terminal (emulator). And there are UNIX hackers far more elite than I; there isn’t that much gray in the beard I’d have if my wife didn’t like her criminals smooth. So, let’s talk about RSS feeds.

On GNU/Linux, I just use elfeed in Emacs as my feed reader. If I want to filter articles by keyword, I can write custom filter functions in Emacs Lisp. I don’t recommend this, of course, to people who are not already acolytes of the Church of Emacs.

Let’s consider something a bit more practical for people whose hobbies aren’t as perverse as mine. When I’m on macOS and iOS, I prefer NetNewsWire. It’s free and it works. It makes good methadone for bathroom doomscrolling, too.

However, NetNewsWire doesn’t support keyword filtering. Technically, they could do it since the app uses SQLite to store data, but the developers decided that the trade-offs to performance and battery usage aren’t worth it. Personally, I’d like to decide that for myself, but in this case I’d have to fork NetNewsWire and learn Swift so that I could implement it myself.

There is another way, though. While you can import individual feeds directly into NetNewsWire, the app also works with online RSS platforms like Feedbin, Feedly, BazQux, Inoreader, NewsBlur, The Old Reader, and FreshRSS. For example, in FreshRSS (an instance is provided by 32bit.cafe) you can automatically mark articles as read using search filters. Inoreader provides similar functionality.

The Web itself incites to thoughts of violence...

Now we come to web browsers themselves, which are guilty of a multitude of sins:

They don’t make it easy to specify your own stylesheet, if they still allow that at all.
They don’t let you define your own JavaScript functions for websites.
They might let you block third-party cookies (with some scaremongering), but third-party JavaScript? Fugeddaboutit.
They don’t help people find RSS feeds even if they’re referenced in <head>.
If they render RSS/Atom feeds at all, it’s as raw XML without even the crappy default stylesheets that web pages get if they don’t specify a stylesheet.

Admittedly, Mozilla Firefox isn’t quite as egregious about this sort of user-hostile behavior as other browsers; they’ll let programmers implement most of this functionality in extensions. The one extension I recommend most is uBlock Origin. It’s more than an ad blocker; it’s a fairly powerful engine for muting elements by keyword.

If I’m on Reddit, for example, I don’t want to see any links from Fox News. (We don’t read Wynand — or Murdoch.) All I’d have to do is create a custom rule like old.reddit.com##.link:has-text(foxnews).

I can go even further if I want: I can block display of any page on any site that so much as mentions Substack. All it takes is one custom rule: *##html:has-text(substack). Even websites that are hosted by Substack but have custom domains, like “www.thebignewsletter.com”, are subject to this rule. Because it applies to every website and targets the <html> element, it’s the nuclear option.

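Of course, such blanket rules are powerful because has-text matches anywhere in an element’s text, and it will take a regular expression instead of a plain keyword if you wrap the pattern in slashes. You’ve got to be careful with those.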

If, for example, you just wanted to filter some of the shit out of Google’s results, you might instead want uBlacklist. I use it to avoid seeing crap from Pinterest, Medium, Substack, ScreenRant, and GameRant in search results.

A word on limitations: You can’t grep a video. Nor do keyword searches work well on audio. You can only hope that the text surrounding multimedia contains something you can use for filtering. Nothing I’ve described above works inside smartphone apps. If you’re using the Bluesky apps for iOS or Android instead of the website, you’re at Bluesky’s mercy.

Nothing is ever easy...

If you were thinking of objecting that nothing I’ve discussed in this post is useful to people who aren’t “power users”, please don’t. I already know. I also know that helping people become more knowledgeable and better able to use tech for their own ends doesn’t scale. Even if I could teach, people must still be willing to learn.

This is the real problem. It isn’t as impossible as Manuel Moreale might think to filter out some of the shit clogging the Web without censorship, but it might as well be for most people. Most people have more pressing concerns than learning to dick around with browser extensions and regular expressions to prevent browsers from showing them material they’d rather not see. Just like most people aren’t going to learn how to use web feeds instead of social media apps in the first place.

And I’m not the guy who can solve this problem. It’s not amenable to technological solutions. Even if it were, doing so might set the stage for further censorship.

Suppose, for a moment, that web browsers supported keyword muting. What’s to stop governments from demanding that certain words be muted by the browser? I’m sure the CCP would love it if Chrome and Firefox muted words like ‘democracy’, “Tiananmen Square”, and “Is Xi Jinping a Pooh bear or a pedobear? Only Christopher Robin knows for sure, and he’s incommunicado.” Hell, what’s to stop Florida from demanding that browsers mute keywords related to LGBT issues? What’s to stop religious organizations from demanding mutes on keywords related to sexuality, reproductive health, bodily autonomy, or freethinking?

It’s one of those situations where one can get so excited by the possibilities that one ignores ethics and consequences. It’s fortunate that our “leaders” aren’t power users or techies. The problem is that their money is as good as anybody else’s, and principles are all too often a privilege when you’re broke, in debt, or just living paycheck to paycheck.