
Building a command line tool to help with Google searches

By John Lekberg on August 15, 2020.


A lot of my Google searches are specific and require lots of typing, e.g.

site:cppreference.com OR site:cplusplus.com realloc

This searches for "realloc" on cppreference.com and cplusplus.com. To reduce the amount of typing, I built a command line tool that allows me to use macros that expand into larger phrases. E.g. I can write

{cpp} realloc

instead of writing

site:cppreference.com OR site:cplusplus.com realloc

In this week's post, I'll show you how I built this tool, using the pathlib and re modules to parse the macros and the urllib.parse module to construct the search URL.

Script source code

search-google

#!/usr/bin/env python3

import pathlib
import re
import sys
import urllib.parse
from contextlib import suppress

macro_re = re.compile(
    r"(?m)^(?P<name>\w+):(?P<replace>.*)$"
)
macro_path = pathlib.Path.home() / "search-google-macro.txt"

# This limit prevents recursive macros from causing the
# program to enter an infinite loop.
MAX_MACRO_SUBSTITIONS = 32


def MAIN():
    import argparse

    parser = argparse.ArgumentParser(
        prefix_chars=":",
        description="""
        Search Google for a list of terms. Macro
        substitutions look like '{macro}', e.g. '{gov}
        differential diagnosis'. If no terms are given,
        print the path to the macro file.
        """,
    )
    parser.add_argument(
        "term", nargs="*", help="Search for this term."
    )
    args = parser.parse_args()

    if len(args.term) == 0:
        print(macro_path)
    else:
        macro = load_macros(macro_path)

        query = " ".join(args.term)
        with suppress(KeyError):
            for _ in range(MAX_MACRO_SUBSTITIONS):
                query = query.format_map(macro)

        url = "https://www.google.com/search?q="
        url += urllib.parse.quote_plus(query)

        print(url)


def load_macros(path):
    """Load macros from a file.

    If the file doesn't exist, then return an empty
    dictionary.

    path -- pathlib.Path. The macro file.
    """
    if not path.is_file():
        return {}
    else:
        return {
            match["name"].strip(): match["replace"].strip()
            for match in macro_re.finditer(path.read_text())
        }


if __name__ == "__main__":
    MAIN()

Because the parser is created with prefix_chars=":", the help flag is :h rather than -h:

$ search-google :h
usage: search-google [:h] [term [term ...]]

Search Google for a list of terms. Macro substitutions look
like '{macro}', e.g. '{gov} differential diagnosis'. If no
terms are given, print the path to the macro file.

positional arguments:
  term        Search for this term.

optional arguments:
  :h, ::help  show this help message and exit

Using the script to search Google

Here's what my macro file looks like:

~/search-google-macro.txt

c: {cpp}
cpp: (site:cppreference.com OR site:cplusplus.com)
edu: site:edu
goe: ({gov} OR {edu})
gov: site:gov
hn: site:news.ycombinator.com
js: site:developer.mozilla.org
movie: (site:rottentomatoes.com OR site:imdb.com)
py: site:python.org

Here are some example searches:
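As a rough sketch of what such searches produce, the snippet below mirrors the script's expansion and escaping steps. The `macros` dictionary copies a few entries from the macro file above, and the helper name `search_url` is hypothetical (it is not part of the script).

```python
import urllib.parse

# A few entries copied from the macro file shown above.
macros = {
    "cpp": "(site:cppreference.com OR site:cplusplus.com)",
    "hn": "site:news.ycombinator.com",
    "py": "site:python.org",
}


def search_url(query, macros, max_passes=32):
    """Expand {macro} references, then build a Google search URL.

    Hypothetical helper that mirrors the steps in the script.
    """
    try:
        for _ in range(max_passes):
            query = query.format_map(macros)
    except KeyError:
        pass  # unknown macro: keep the query as it currently stands
    prefix = "https://www.google.com/search?q="
    return prefix + urllib.parse.quote_plus(query)


for q in ["{cpp} realloc", "{hn} haskell", "{py} pathlib"]:
    print(search_url(q, macros))
# https://www.google.com/search?q=%28site%3Acppreference.com+OR+site%3Acplusplus.com%29+realloc
# https://www.google.com/search?q=site%3Anews.ycombinator.com+haskell
# https://www.google.com/search?q=site%3Apython.org+pathlib
```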

I could copy and paste these URLs into my browser. But since I use macOS, I created a shell function that passes the URL to the open command:

$ gg() { open "$(./search-google "$@")" ; }
$ gg {movie} alien 1979
(Opens URL in my browser.)
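On systems without the open command, one possible alternative is Python's standard webbrowser module. This is only a sketch, not part of the script: the function name `open_search` and its `opener` parameter are hypothetical, and the parameter exists so the function can be exercised without launching a browser.

```python
import webbrowser


def open_search(url, opener=webbrowser.open):
    """Open a search URL in the default browser.

    `opener` is a hypothetical hook; by default it is
    webbrowser.open, which hands the URL to the system browser.
    """
    return opener(url)
```

Calling open_search with the URL printed by search-google would then play the role of the gg shell function.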

How the script works

I place the macro file in my home directory using pathlib.Path.home.

I read from the macro file using pathlib.Path.read_text and re.finditer. I use a regular expression to extract the macros from the macro file, and I use str.strip to remove excess whitespace from the macros.
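To see the parsing step in isolation, here is a small sketch that applies the same regular expression to an in-memory string instead of a file. The `sample` text is made up for illustration.

```python
import re

# Same pattern as in the script: "name: replacement", one per line.
macro_re = re.compile(r"(?m)^(?P<name>\w+):(?P<replace>.*)$")

# Made-up sample input standing in for the macro file.
sample = """\
edu: site:edu
gov: site:gov
goe: ({gov} OR {edu})
this line has no macro name
"""

macros = {
    m["name"].strip(): m["replace"].strip()
    for m in macro_re.finditer(sample)
}
print(macros)
# {'edu': 'site:edu', 'gov': 'site:gov', 'goe': '({gov} OR {edu})'}
```

Lines that don't start with a word followed by a colon are skipped, because the pattern is anchored to the start of each line.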

I use str.format_map to perform the macro substitutions.

Instead of performing substitutions until I hit a fixed point, I perform a finite number of substitutions. (This is the constant MAX_MACRO_SUBSTITIONS.) I do this because it is an easy way to prevent unbounded recursion, and, in practice, my macros are not complex enough to cause this strategy to fail.
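The effect of the fixed number of passes can be sketched as follows. The name `MAX_PASSES`, the helper `expand`, and the self-referential `loop` macro are hypothetical and exist only to illustrate the behavior.

```python
MAX_PASSES = 32  # hypothetical stand-in for the script's constant

macros = {
    "edu": "site:edu",
    "gov": "site:gov",
    "goe": "({gov} OR {edu})",
    "loop": "{loop} again",  # deliberately self-referential
}


def expand(query, macros, max_passes=MAX_PASSES):
    """Apply str.format_map a fixed number of times."""
    for _ in range(max_passes):
        query = query.format_map(macros)
    return query


# Nested macros are fully resolved after two passes;
# the remaining passes leave the string unchanged.
print(expand("{goe} vaccine schedule", macros))
# (site:gov OR site:edu) vaccine schedule

# A recursive macro grows on every pass but still terminates.
print(expand("{loop}", macros, max_passes=3))
# {loop} again again again
```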

A macro substitution fails with a KeyError if I reference a macro that doesn't exist, e.g. {xyxhkj}. I handle this with the context manager contextlib.suppress, which catches the KeyError and stops further substitutions without terminating the whole program.
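A short sketch of this behavior, using a hypothetical helper `expand` and a one-entry macro dictionary. One consequence is that when a query mixes a known macro with an unknown one, the failing pass assigns nothing, so the known macro is left unexpanded too.

```python
from contextlib import suppress

macros = {"py": "site:python.org"}


def expand(query, macros, max_passes=32):
    """Expand macros, giving up quietly on an unknown name."""
    with suppress(KeyError):
        for _ in range(max_passes):
            query = query.format_map(macros)
    return query


print(expand("{py} pathlib", macros))
# site:python.org pathlib

print(expand("{xyxhkj} pathlib", macros))
# {xyxhkj} pathlib

# format_map raises before the assignment happens, so the
# query keeps its value from before the failing pass.
print(expand("{py} {xyxhkj}", macros))
# {py} {xyxhkj}
```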

I build the URL by combining a constant prefix, https://www.google.com/search?q=, with the query string. I use urllib.parse.quote_plus to properly escape the query string. E.g.

>>> import urllib.parse
>>> query = "site:news.ycombinator.com haskell"
>>> print(query)
site:news.ycombinator.com haskell
>>> print(urllib.parse.quote_plus(query))
site%3Anews.ycombinator.com+haskell

In conclusion...

In this week's post, you learned how to build a command line tool that helps you search Google. You learned how to use the pathlib and re modules to parse the macros. And you used the urllib.parse module to properly construct the URL.

My challenge to you:

Read about the different Google search operators. ("Refine web searches" via google.com.)

Create a few macros that would be useful to you.

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!


(If you spot any errors or typos on this post, contact me via my contact page.)