Building a command line tool to help with Google searches
By John Lekberg on August 15, 2020.
A lot of my Google searches are specific and require lots of typing, e.g.
site:cppreference.com OR site:cplusplus.com realloc
This searches for "realloc" on cppreference.com and cplusplus.com. To reduce the amount of typing, I built a command line tool that allows me to use macros that expand into larger phrases. E.g. I can write
{cpp} realloc
instead of writing
site:cppreference.com OR site:cplusplus.com realloc
In this week's post, I'll show you how I built this tool. You will learn:
- How to use the pathlib and re modules to parse macros from a text file.
- How to construct a Google search URL using urllib.parse.
Script source code
search-google
#!/usr/bin/env python3

import pathlib
import re
import urllib.parse
from contextlib import suppress

# One macro per line, in the form "name: replacement".
macro_re = re.compile(
    r"(?m)^(?P<name>\w+):(?P<replace>.*)$"
)

macro_path = pathlib.Path.home() / "search-google-macro.txt"

# This limit prevents recursive macros from causing the
# program to enter an infinite loop.
MAX_MACRO_SUBSTITIONS = 32


def MAIN():
    import argparse

    parser = argparse.ArgumentParser(
        prefix_chars=":",
        description="""
        Search Google for a list of terms. Macro
        substitutions look like '{macro}', e.g. '{gov}
        differential diagnosis'. If no terms are given,
        print the path to the macro file.
        """,
    )
    parser.add_argument(
        "term", nargs="*", help="Search for this term."
    )
    args = parser.parse_args()

    if len(args.term) == 0:
        print(macro_path)
    else:
        macro = load_macros(macro_path)
        query = " ".join(args.term)
        # Each pass expands one level of nested macros. An unknown
        # macro raises KeyError, which stops the expansion early.
        with suppress(KeyError):
            for _ in range(MAX_MACRO_SUBSTITIONS):
                query = query.format_map(macro)
        url = "https://www.google.com/search?q="
        url += urllib.parse.quote_plus(query)
        print(url)


def load_macros(path):
    """Load macros from a file.

    If the file doesn't exist, then return an empty
    dictionary.

    path -- pathlib.Path. The macro file.
    """
    if not path.is_file():
        return {}
    else:
        return {
            match["name"].strip(): match["replace"].strip()
            for match in macro_re.finditer(path.read_text())
        }


if __name__ == "__main__":
    MAIN()
$ search-google :h
usage: search-google [:h] [term [term ...]]

Search Google for a list of terms. Macro substitutions look
like '{macro}', e.g. '{gov} differential diagnosis'. If no
terms are given, print the path to the macro file.

positional arguments:
  term        Search for this term.

optional arguments:
  :h, ::help  show this help message and exit
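The parser is created with prefix_chars=":", which is why the help flag is :h instead of -h. The post doesn't say why; my guess (an assumption) is that it keeps search terms starting with a minus sign, such as Google's exclusion operator in -site:linkedin.com, from being mistaken for command line options. The sketch below compares the two settings using parse_known_args; build_parser is a hypothetical helper, not part of the script.

```python
import argparse


def build_parser(prefix_chars):
    """Hypothetical helper: a parser shaped like search-google's."""
    parser = argparse.ArgumentParser(prefix_chars=prefix_chars)
    parser.add_argument("term", nargs="*")
    return parser


argv = ["{hn}", "rust", "-site:linkedin.com"]

# Default "-" prefix: the exclusion term looks like an unknown option,
# so argparse leaves it over instead of treating it as a search term.
args, extras = build_parser("-").parse_known_args(argv)
print(args.term, extras)

# ":" prefix: "-site:linkedin.com" is just another positional term.
args, extras = build_parser(":").parse_known_args(argv)
print(args.term, extras)
```

With the default prefix, parse_args (rather than parse_known_args, used here only so the leftover can be printed) would exit with an "unrecognized arguments" error.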
Using the script to search Google
Here's what my macro file looks like:
~/search-google-macro.txt
c: {cpp}
cpp: (site:cppreference.com OR site:cplusplus.com)
edu: site:edu
goe: ({gov} OR {edu})
gov: site:gov
hn: site:news.ycombinator.com
js: site:developer.mozilla.org
movie: (site:rottentomatoes.com OR site:imdb.com)
py: site:python.org
Here are some example searches:
- Search for the C++ function realloc:

  $ search-google {cpp} realloc
  https://www.google.com/search?q=%28site%3Acppreference.com+OR+site%3Acplusplus.com%29+realloc

- Search for information on orthostatic hypotension:

  $ search-google {goe} orthostatic hypotension
  https://www.google.com/search?q=%28site%3Agov+OR+site%3Aedu%29+orthostatic+hypotension

- Search for discussion about Verilog or VHDL on Hacker News:

  $ search-google {hn} verilog OR vhdl
  https://www.google.com/search?q=site%3Anews.ycombinator.com+verilog+OR+vhdl

- Search for reviews for the movie Alien (1979):

  $ search-google {movie} alien 1979
  https://www.google.com/search?q=%28site%3Arottentomatoes.com+OR+site%3Aimdb.com%29+alien+1979
I could copy and paste these URLs into my browser. But since I use macOS, I created a shell function that passes the URL to the open command:
$ gg() { open "$(./search-google "$@")" ; }
$ gg {movie} alien 1979
(Opens the URL in my browser.)
How the script works
I place the macro file in my home directory using pathlib.Path.home.
I read from the macro file using pathlib.Path.read_text and re.finditer. I use a regular expression to extract the macros from the macro file, and I use str.strip to remove excess whitespace from the macros.
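To see what the regular expression accepts, here is a sketch that runs the script's macro_re over part of the macro file shown earlier, held in a string instead of read from disk. The (?m) flag makes ^ and $ match at every line boundary, so each "name: replacement" line becomes one match, and everything after the first colon (including further colons) is the replacement. Lines that don't start with a word followed by a colon are simply skipped; treating that as a way to write comments is my observation, not a documented feature.

```python
import re

# Same pattern as the script.
macro_re = re.compile(r"(?m)^(?P<name>\w+):(?P<replace>.*)$")

macro_text = """\
c: {cpp}
cpp: (site:cppreference.com OR site:cplusplus.com)
edu: site:edu
goe: ({gov} OR {edu})
gov: site:gov
# this line is skipped: '#' is not a word character
"""

# Same dictionary comprehension as load_macros.
macros = {
    match["name"].strip(): match["replace"].strip()
    for match in macro_re.finditer(macro_text)
}
print(macros["cpp"])  # (site:cppreference.com OR site:cplusplus.com)
print(sorted(macros))  # ['c', 'cpp', 'edu', 'goe', 'gov']
```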
I use str.format_map to perform the macro substitutions.
Instead of repeating substitutions until the query stops changing (a fixed point), I perform a fixed number of passes: the constant MAX_MACRO_SUBSTITIONS, which is 32. Each pass expands one level of nesting, so a macro like goe, which refers to gov and edu, needs two passes. I do this because it is an easy way to prevent unbounded recursion, and, in practice, my macros are not nested deeply enough to cause this strategy to fail.
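Each str.format_map call replaces every {name} in the current string but does not rescan the text it just inserted, which is why one pass expands exactly one level. A sketch that traces the passes; the expand helper and the self-referential loop macro are hypothetical, added for illustration.

```python
from contextlib import suppress

MAX_MACRO_SUBSTITIONS = 32  # same limit as the script

macros = {
    "c": "{cpp}",
    "cpp": "(site:cppreference.com OR site:cplusplus.com)",
    "edu": "site:edu",
    "gov": "site:gov",
    "goe": "({gov} OR {edu})",
    "loop": "{loop}!",  # hypothetical macro that refers to itself
}


def expand(query, macros, limit=MAX_MACRO_SUBSTITIONS, trace=False):
    """Hypothetical helper: the script's bounded substitution loop."""
    with suppress(KeyError):
        for i in range(limit):
            query = query.format_map(macros)
            if trace:
                print(f"pass {i + 1}: {query}")
    return query


expand("{goe} orthostatic hypotension", macros, limit=3, trace=True)
# pass 1: ({gov} OR {edu}) orthostatic hypotension
# pass 2: (site:gov OR site:edu) orthostatic hypotension
# pass 3: (site:gov OR site:edu) orthostatic hypotension

# A recursive macro never reaches a fixed point, but the limit stops it.
print(expand("{loop}", macros, limit=4))  # {loop}!!!!
```

A fixed-point loop would need extra cycle detection to cope with a macro like loop; the bounded loop avoids that at the cost of a few redundant passes once the query has stopped changing.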
A macro substitution fails if I reference a macro that doesn't exist, e.g. {xyxhkj}: str.format_map raises a KeyError. I handle this by using the context manager contextlib.suppress. This catches the KeyError and stops further substitutions without terminating the whole program, leaving the query as it was after the last successful pass.
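One consequence: the KeyError aborts the whole pass, so if an unknown macro is present from the start, even valid macros in the same query stay unexpanded. The sketch below shows that, then an alternative that is not what the script does: a dict subclass whose __missing__ method leaves unknown names in place as literal text (str.format_map looks keys up with mapping[key], so a dict subclass's __missing__ is consulted). expand and KeepUnknown are hypothetical names.

```python
from contextlib import suppress

macros = {"cpp": "(site:cppreference.com OR site:cplusplus.com)"}


def expand(query, mapping, limit=32):
    """Hypothetical helper: the script's loop, stopping at an unknown macro."""
    with suppress(KeyError):
        for _ in range(limit):
            query = query.format_map(mapping)
    return query


# "{xyxhkj}" is unknown, so the first pass raises KeyError and
# "{cpp}" is never expanded either.
print(expand("{cpp} {xyxhkj} realloc", macros))


class KeepUnknown(dict):
    """Hypothetical alternative: keep unknown macros as literal text."""

    def __missing__(self, key):
        return "{" + key + "}"


# Known macros expand; "{xyxhkj}" passes through unchanged.
print(expand("{cpp} {xyxhkj} realloc", KeepUnknown(macros)))
```

The trade-off is that a typo in a macro name then goes to Google as literal text instead of leaving the whole query untouched, which makes the mistake a little less obvious.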
I build the URL by combining a constant prefix, https://www.google.com/search?q=, with the query string.
I use urllib.parse.quote_plus to properly escape the query string. E.g.

>>> import urllib.parse
>>> query = "site:news.ycombinator.com haskell"
>>> print(query)
site:news.ycombinator.com haskell
>>> print(urllib.parse.quote_plus(query))
site%3Anews.ycombinator.com+haskell
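quote_plus turns spaces into + and percent-encodes characters such as :, ( and ). As a cross-check (again, not what the script does), urllib.parse.urlencode builds the same query string, because it uses quote_plus by default, and parse_qs decodes it back to the original search text. The query below is the expanded {goe} example from earlier.

```python
import urllib.parse

query = "(site:gov OR site:edu) orthostatic hypotension"

# The script's approach: constant prefix + quote_plus.
url = "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

# Alternative: let urlencode build the "q=..." part.
url2 = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})

print(url == url2)  # True

# Decoding the query string recovers the original search text.
params = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)
print(params["q"][0] == query)  # True
```

urlencode becomes the more convenient choice if you later want extra parameters in the URL, since it handles the & separators for you.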
In conclusion...
In this week's post, you learned how to build a command line tool that helps you search Google. You learned how to use the pathlib and re modules to parse the macros, and how to use the urllib.parse module to properly construct the URL.
My challenge to you:
Read about the different Google search operators. ("Refine web searches" via google.com.)
Create a few macros that would be useful to you. Some ideas:
- Create a macro that restricts searches to ruby-lang.org.
- Create a macro that searches for upcoming 5K races in a specified city.
- Create a macro that searches a person's name, but excludes results from LinkedIn and Twitter.
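If you want to check your macros before adding them to your macro file, here is one possible answer to the first and third ideas, run through the same parse, expand and quote steps as the script. The macro names rb and nosocial and the search_url helper are hypothetical; the leading minus is Google's operator for excluding results.

```python
import re
import urllib.parse
from contextlib import suppress

# Hypothetical additions to ~/search-google-macro.txt.
macro_text = """\
rb: site:ruby-lang.org
nosocial: -site:linkedin.com -site:twitter.com
"""

# Parse exactly as load_macros does.
macro_re = re.compile(r"(?m)^(?P<name>\w+):(?P<replace>.*)$")
macros = {
    m["name"].strip(): m["replace"].strip()
    for m in macro_re.finditer(macro_text)
}


def search_url(terms):
    """Hypothetical helper mirroring the script's else-branch."""
    query = " ".join(terms)
    with suppress(KeyError):
        for _ in range(32):
            query = query.format_map(macros)
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)


print(search_url(["{rb}", "Enumerable"]))
print(search_url(['"Ada Lovelace"', "{nosocial}"]))
```

On the command line, the second search would be search-google '"Ada Lovelace"' {nosocial}, with the single quotes keeping the double quotes away from the shell.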
If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)