howdoi was gpt


When I first used the howdoi CLI util like 10 years ago, I thought it was magic. You ask it a question and it spits out an answer and an example implementation. I still kinda do, even though I know it's just searching for my query and returning the results in plaintext. Actually, I wasn't totally sure about that, so I had a look through the source, which I'll highlight here.

how it works

But first, here's how it's used:

Part of the reason I like it is that it doesn't open up the shiny, gorg, button-filled, enticingly tabbed browser, so you can keep it moving.

Very cool, right? If you pass the -a flag, it will display the full text of the answer instead of just the part it has picked out (see below for how it decides that), along with the link to where it fetched the answer:

By default it returns 1 answer, but passing -n NUM will give you that many results. Here's -n 3 on the same question:

If you're thinking, "Tyler, the results leave a lot to be desired aesthetically," then, like any modern util, you can colorize with -c:

You can also make your own lil stash of stuff and view the entries you've saved, etc. Prettyyyyyy sick, amirite.

internal highlights

Btw, a cameo for minimalist terminal tools: behold the file explorer ranger. To poke through the code rather than back-button through directories on GitHub, I just cloned the repo, cd'd into it and typed ranger, which presents you with:

Howdoi is written in Python and can be installed with brew. Ditto ranger, btw.

It starts by checking for env variables:

import os

if os.getenv('HOWDOI_DISABLE_SSL'):  # Set http instead of https
    SCHEME = 'http://'
    VERIFY_SSL_CERTIFICATE = False
else:
    SCHEME = 'https://'
    VERIFY_SSL_CERTIFICATE = True
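To poke at that toggle without exporting anything in your shell, here's a tiny sketch that pulls the same logic into a function taking a plain dict of env vars. The name pick_scheme is made up for illustration; howdoi itself just sets module-level constants when it's imported.

```python
import os
from typing import Mapping, Tuple


def pick_scheme(env: Mapping[str, str] = os.environ) -> Tuple[str, bool]:
    """Hypothetical helper mirroring the snippet above: any non-empty
    HOWDOI_DISABLE_SSL value switches to plain http and skips cert checks."""
    if env.get('HOWDOI_DISABLE_SSL'):
        return 'http://', False
    return 'https://', True


print(pick_scheme({}))                           # ('https://', True)
print(pick_scheme({'HOWDOI_DISABLE_SSL': '1'}))  # ('http://', False)
print(pick_scheme({'HOWDOI_DISABLE_SSL': '0'}))  # ('http://', False)
```

Since it's a plain truthiness check on a string, even HOWDOI_DISABLE_SSL=0 turns SSL off. Only unset or empty leaves https on.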

Another env var, HOWDOI_URL, sets which site queries get scoped to (the default is stackoverflow.com), and the search URL templates are keyed by engine name:

SUPPORTED_SEARCH_ENGINES = ('google', 'bing', 'duckduckgo')

URL = os.getenv('HOWDOI_URL') or 'stackoverflow.com'

# later

SEARCH_URLS = {
    'bing': SCHEME + 'www.bing.com/search?q=site:{0}%20{1}&hl=en',
    'google': SCHEME + 'www.google.com/search?q=site:{0}%20{1}&hl=en',
    'duckduckgo': SCHEME + 'duckduckgo.com/html?q=site:{0}%20{1}&t=hj&ia=web'
}
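Here's roughly what filling in one of those templates produces. I'm using the stdlib urllib.parse.quote for the percent-encoding, and build_search_url (including its fall-back-to-google behavior for unknown engines) is my own hypothetical helper, not code from the repo.

```python
from urllib.parse import quote

SCHEME = 'https://'
SEARCH_URLS = {
    'bing': SCHEME + 'www.bing.com/search?q=site:{0}%20{1}&hl=en',
    'google': SCHEME + 'www.google.com/search?q=site:{0}%20{1}&hl=en',
    'duckduckgo': SCHEME + 'duckduckgo.com/html?q=site:{0}%20{1}&t=hj&ia=web',
}


def build_search_url(query, site='stackoverflow.com', engine='google'):
    """Hypothetical helper: fill a search template with site + quoted query."""
    # Assumption for this sketch: unknown engine names fall back to google
    template = SEARCH_URLS.get(engine, SEARCH_URLS['google'])
    # {0} is the site the search is scoped to, {1} the percent-encoded query
    return template.format(site, quote(query))


print(build_search_url('format date bash'))
# https://www.google.com/search?q=site:stackoverflow.com%20format%20date%20bash&hl=en
```

The site: operator is doing the heavy lifting here: the search engine does the ranking, and howdoi only has to look at Stack Overflow pages.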

Search engines will occasionally block howdoi, and here's how it detects that:

BLOCK_INDICATORS = (
    'form id="captcha-form"',
    'This page appears when Google automatically detects requests coming from your computer '
    'network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service'
)

def _is_blocked(page):
    for indicator in BLOCK_INDICATORS:
        if page.find(indicator) != -1:
            return True

    return False

_is_blocked gets used in one of the meat & tatas fns, which also uses the SEARCH_URLS dict from above (I wasn't sure what that's called in Python, heh, but it's just a plain dict):

def _get_links(query):
    search_engine = os.getenv('HOWDOI_SEARCH_ENGINE', 'google')
    search_url = _get_search_url(search_engine).format(URL, url_quote(query))

    try:
        result = _get_result(search_url) # hey I'm _get_result
    except requests.HTTPError:
        logging.info('Received HTTPError')
        result = None
    if not result or _is_blocked(result):
        raise BlockError('Temporary block by search engine')
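The nice thing about that last if is that a failed request and a captcha page end up in the same place. Here's a stripped-down sketch of the same flow with the network call swapped for any callable you pass in, so it can be tried offline. fetch_or_raise, the trimmed indicator list, and the plain OSError standing in for requests.HTTPError are all hypothetical.

```python
import logging


class BlockError(RuntimeError):
    """Same name as in the snippet above; defined here so the sketch stands alone."""


BLOCK_INDICATORS = ('form id="captcha-form"',)  # trimmed list, just for the sketch


def fetch_or_raise(url, fetch):
    """Return the page text, or raise BlockError on a failed fetch or a block page."""
    try:
        result = fetch(url)
    except OSError:  # stand-in for requests.HTTPError in this sketch
        logging.info('Received HTTPError')
        result = None
    # no result and a captcha page are treated exactly the same way
    if not result or any(ind in result for ind in BLOCK_INDICATORS):
        raise BlockError('Temporary block by search engine')
    return result


def captcha_page(url):
    return '<form id="captcha-form">prove you are human</form>'


try:
    fetch_or_raise('https://example.com', captcha_page)
except BlockError as err:
    print(err)  # Temporary block by search engine
```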

The result in _get_links is supplied by the fn where all the network stuff happens:

def _get_result(url):
    # the howdoi_session is just requests.session() defined earlier and terminated afterwards
    try:
        resp = howdoi_session.get(url, headers={'User-Agent': _random_choice(USER_AGENTS)},
                                  proxies=get_proxies(),
                                  verify=VERIFY_SSL_CERTIFICATE,
                                  cookies={'CONSENT': 'YES+US.en+20170717-00-0'})
        resp.raise_for_status()  # turn 4xx/5xx responses into requests.HTTPError
        return resp.text
    except requests.exceptions.SSLError:
        # (howdoi's error logging is trimmed here; the error is re-raised)
        raise
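The _random_choice(USER_AGENTS) bit is a cheap trick: each request claims to come from a randomly picked browser, which presumably makes howdoi look a little less like a bot. Here's a sketch of just the kwarg-building part so it runs without touching the network. build_request_kwargs and the two user-agent strings are made up, and I'm using random.choice where howdoi has its own _random_choice. The returned dict is meant to be splatted into requests.Session().get(url, **kwargs).

```python
import random

USER_AGENTS = (  # placeholder strings, not howdoi's actual list
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:120.0) Gecko/20100101 Firefox/120.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0',
)


def build_request_kwargs(verify_ssl=True, proxies=None, rng=random):
    """Hypothetical helper: keyword args for requests.Session.get,
    mirroring the ones passed in _get_result above."""
    return {
        'headers': {'User-Agent': rng.choice(USER_AGENTS)},  # fresh UA per call
        'proxies': proxies or {},
        'verify': verify_ssl,
        # copied as-is from the snippet above (looks like it pre-answers a consent page)
        'cookies': {'CONSENT': 'YES+US.en+20170717-00-0'},
    }


# usage (needs the requests package and a network connection):
# with requests.Session() as session:
#     resp = session.get(url, **build_request_kwargs())
#     resp.raise_for_status()
```

Passing rng in makes it easy to seed with random.Random(...) when you want repeatable picks.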

Back to _get_links: a version of it that checks a cache first gets used in the tatas to the meat & tatas, def _get_answers(args), which I'll break down:

initial_pos = args['pos'] - 1
final_pos = initial_pos + args['num_answers']
question_links = question_links[initial_pos:final_pos]
search_engine = os.getenv('HOWDOI_SEARCH_ENGINE', 'google')

First it works out which question links to fetch: num_answers (the -n value) says how many, and pos (set with -p, default 1) says where to start.
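A quick sketch of that slice with made-up links. pos is 1-based, hence the - 1. The name pick_links is hypothetical.

```python
def pick_links(links, pos=1, num_answers=1):
    """Hypothetical helper: the same slice as the _get_answers snippet above."""
    initial_pos = pos - 1                 # pos is 1-based
    final_pos = initial_pos + num_answers
    return links[initial_pos:final_pos]   # slicing past the end just truncates


links = ['q/1', 'q/2', 'q/3', 'q/4']
print(pick_links(links))                        # ['q/1']
print(pick_links(links, num_answers=3))         # ['q/1', 'q/2', 'q/3']
print(pick_links(links, pos=3, num_answers=5))  # ['q/3', 'q/4']
```

Because Python slices clamp to the list length, asking for more answers than there are links just gives you fewer back instead of an IndexError.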

with Pool() as pool:
    answers = pool.starmap(
        _get_answer_worker,
        [(args, link) for link in question_links]
    )

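Here it uses pool.starmap with a worker to fetch the answers in parallel. starmap is like the builtin map (which ofc applies a fn to every item of an iterable), except each item is a tuple that gets unpacked into the fn's arguments, so _get_answer_worker(args, link) runs once per link across a pool of processes.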

The catch with starmap is that it converts the provided iterable into a list, submits every item as a task to the process pool, and then blocks until all tasks are complete, so nothing comes back until the slowest answer has been fetched.
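To see both points (the tuple unpacking and the all-at-once blocking) without spinning up processes, here's a sketch using ThreadPool from multiprocessing.pool, which has the same starmap API as Pool but runs on threads. fake_worker is a stand-in for _get_answer_worker. Switching back to a real Pool would also need the usual if __name__ == '__main__': guard on platforms that spawn processes.

```python
from multiprocessing.pool import ThreadPool


def fake_worker(args, link):
    """Stand-in for _get_answer_worker: gets one unpacked (args, link) tuple."""
    return {'link': link, 'answer': f"answer for {link} (color={args['color']})"}


args = {'color': False}
links = ['q/1', 'q/2', 'q/3']

with ThreadPool(processes=3) as pool:
    # starmap calls fake_worker(args, link) for every tuple and blocks until
    # all calls have finished; results come back in input order
    answers = pool.starmap(fake_worker, [(args, link) for link in links])

print([a['link'] for a in answers])  # ['q/1', 'q/2', 'q/3']

# The builtin map hands the worker the whole tuple as ONE argument,
# so you'd have to unpack it yourself:
serial = list(map(lambda pair: fake_worker(*pair), [(args, link) for link in links]))
print(serial == answers)  # True
```

If you'd rather handle answers as they arrive, pool.imap (with a small wrapper that unpacks the tuple) yields results lazily, in order, instead of waiting for the whole batch.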

The worker calls the… corn on the cob of fns, get_answer:

from pyquery import PyQuery as pq

html = pq(page)

first_answer = html('.answercell').eq(0) or html('.answer').eq(0)

instructions = first_answer.find('pre') or first_answer.find('code')
args['tags'] = [t.text for t in html('.post-tag')]

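This uses pq (PyQuery, which is basically jQuery for Python) to grab the first element with the class that wraps answers (.answercell, falling back to .answer), then finds the relevant <pre> or <code> elements inside it. It also scrapes the question's tags while it's there.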
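If you don't have pyquery around, the core idea (first answer cell, then its first <pre>) can be sketched with the stdlib html.parser instead. This is a swapped-in technique, not howdoi's code: it only looks at <pre>, skips the .answer fallback and the tag scraping, and the class name FirstAnswerCode plus the sample HTML are made up.

```python
from html.parser import HTMLParser


class FirstAnswerCode(HTMLParser):
    """Collect the text of the first <pre> inside the first div.answercell."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0        # > 0 while inside the first answer cell
        self.seen_answer = False  # only the first answer cell counts
        self.in_pre = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        classes = (dict(attrs).get('class') or '').split()
        if not self.seen_answer and tag == 'div' and 'answercell' in classes:
            self.seen_answer = True
            self.div_depth = 1
        elif self.div_depth:
            if tag == 'div':
                self.div_depth += 1   # nested div inside the answer cell
            elif tag == 'pre':
                self.in_pre = True

    def handle_endtag(self, tag):
        if self.done or not self.div_depth:
            return
        if tag == 'pre' and self.in_pre:
            self.in_pre = False
            self.done = True          # we only want the first <pre>
        elif tag == 'div':
            self.div_depth -= 1       # back to 0 means we left the answer cell

    def handle_data(self, data):
        if self.in_pre and not self.done:
            self.chunks.append(data)


def first_answer_code(page):
    parser = FirstAnswerCode()
    parser.feed(page)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.chunks)


page = ('<div class="question"><pre>not this</pre></div>'
        '<div class="answercell post-layout"><p>Try:</p>'
        '<pre><code>ls -la</code></pre></div>')
print(first_answer_code(page))  # ls -la
```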

If someone has passed the -a flag to get the full text, it returns the whole kit and caboodle (still running any <pre>/<code> children through _format_output); otherwise, it returns just the first pre/code element's content:

if args['all']:  # an elif in the source; the earlier branch is trimmed here
    texts = []
    for html_tag in first_answer.items(f'{answer_body_cls} > *'):
        current_text = get_text(html_tag)
        if current_text:
            if html_tag[0].tag in ['pre', 'code']:
                texts.append(_format_output(args, current_text))
            else:
                texts.append(current_text)
    text = '\n'.join(texts)
else:
    text = _format_output(args, get_text(instructions.eq(0)))
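Stripped of the PyQuery bits, the -a branch is just "walk the answer's direct children, format the code ones, join with newlines." Here's a pure-Python sketch where the answer body is a hypothetical list of (tag, text) pairs and fmt_code stands in for _format_output. The default branch is simplified: it takes the first <pre> or <code> child, where the real code prefers <pre> and also searches nested elements.

```python
from typing import Callable, List, Tuple


def render_answer(children: List[Tuple[str, str]], show_all: bool,
                  fmt_code: Callable[[str], str] = lambda s: s) -> str:
    """Hypothetical sketch of the -a / default branches above."""
    if show_all:
        texts = []
        for tag, text in children:
            if text:  # skip empty children, like `if current_text:` above
                texts.append(fmt_code(text) if tag in ('pre', 'code') else text)
        return '\n'.join(texts)
    # default: only the first <pre>/<code> child (simplified)
    for tag, text in children:
        if tag in ('pre', 'code'):
            return fmt_code(text)
    return ''


body = [('p', 'Use ls:'), ('pre', 'ls -la'), ('p', ''), ('p', 'Done.')]
print(render_answer(body, show_all=False))  # ls -la
print(render_answer(body, show_all=True, fmt_code=str.upper))
# Use ls:
# LS -LA
# Done.
```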

Back where we used the cutely named starmap, the results of the worker (i.e. get_answer) get filtered and numbered before being handed back to howdoi herself:

answers = [a for a in answers if a.get('answer')]
for i, answer in enumerate(answers, 1):
    answer['position'] = i

return answers or False
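Those lines do two things: drop any worker results that came back without an answer, then number the survivors from 1 so the positions have no gaps. A quick sketch with made-up worker output; number_answers is a hypothetical name.

```python
def number_answers(answers):
    """Mirror of the snippet above: drop empty answers, then number from 1."""
    answers = [a for a in answers if a.get('answer')]
    for i, answer in enumerate(answers, 1):
        answer['position'] = i
    # returns False (not an empty list) when nothing survives, like the original
    return answers or False


raw = [{'answer': 'ls -la'}, {'answer': ''}, {'answer': 'ls -lh'}]
print(number_answers(raw))
# [{'answer': 'ls -la', 'position': 1}, {'answer': 'ls -lh', 'position': 2}]
print(number_answers([{'answer': None}]))  # False
```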

Finally, in the command-line runner, it writes the result to stdout after an OS check:

import os
import sys

howdoi_result = howdoi(args) # invocation of `howdoi`

if os.name == 'nt': # windows
    print(howdoi_result)
else:
    utf8_result = howdoi_result.encode('utf-8', 'ignore')
    sys.stdout.buffer.write(utf8_result)
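Why encode and write bytes to sys.stdout.buffer instead of just printing? My read: print encodes with whatever encoding stdout was opened with, and a terminal or pipe that isn't UTF-8 can raise UnicodeEncodeError on characters that show up in answers all the time. Writing UTF-8 bytes straight to the underlying binary buffer sidesteps that. Here's a sketch with a fake ASCII-only stdout to show the difference; write_utf8 and fake_stdout are hypothetical names.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to the binary buffer under a text stream."""
    if stream is None:
        stream = sys.stdout
    stream.flush()  # flush any pending text first so output stays in order
    stream.buffer.write(text.encode('utf-8', 'ignore'))
    stream.buffer.flush()


def fake_stdout(encoding):
    """A text stream over an in-memory byte buffer, like a terminal with that encoding."""
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding)


text = 'caf\u00e9 \u2192 done'

try:
    plain = fake_stdout('ascii')
    plain.write(text)  # the print()-style path: the text layer must encode
    plain.flush()
except UnicodeEncodeError:
    print('plain text write failed')

out = fake_stdout('ascii')
write_utf8(text, out)  # the howdoi-style path: bytes skip the text layer
print(out.buffer.getvalue())  # b'caf\xc3\xa9 \xe2\x86\x92 done'
```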

Pretty cool! Definitely not magic. Although it was to me and would be to a small Victorian child.

lol btw

While looking at this, it came to my attention that there is a flag, -x, that explains how it got the answer, ha ha ha. It bumps up a logging level var, which switches on the log output I stripped when summarizing above. It looks like this:

I also discovered that there is a VS Code extension, because of course there is. You write a comment like // howdoi your query, then select howdoi in the command palette, and it'll paste the text answer into your editor.

What a meta screenshot, amirite, ladies.

:~)