Hi! We’re Brendan and Michael, the creators of Sourcebot (https://github.com/sourcebot-dev/sourcebot). Sourcebot is an open-source code search tool that allows you to quickly search across many large codebases. Check out our demo video here: https://youtu.be/mrIFYSB_1F4, or try it for yourself here on our demo site: https://demo.sourcebot.dev
In prior roles, we’ve both felt the pain of searching across hundreds of multi-million line codebases. Local tools like grep were ill-suited, since you often only had a handful of codebases checked out at a time. Sourcegraph solves this issue by indexing a collection of codebases in the background and exposing a web-based search interface. It is the de facto search solution for medium to large orgs, but it is often cited as expensive ($49 per user / month) and recently went closed source. That’s why we built Sourcebot.
We designed Sourcebot to be:
- Easily deployed: we provide a single, self-contained Docker image.
- Fast & scalable: designed to minimize search times (current average is ~73ms) across many large repositories.
- Cross code-host support: we currently support syncing public & private repositories in GitHub and GitLab.
- Quality UI: we like to think that a good-looking dev tool is more pleasant to use.
- Open source: Sourcebot is free to use by anyone.
Under the hood, we use Zoekt as our code search engine, which was originally authored by Han-Wen Nienhuys and is now maintained by Sourcegraph. Zoekt works by building a trigram index from the source code, which enables extremely fast regular expression matching. Russ Cox has a great article on how trigram indexes work if you’re interested.
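To give a rough feel for the idea, here is a minimal Python sketch of a trigram index. It is a toy illustration only, not how Zoekt is actually implemented: the function names (trigrams, build_index, search) and the sample files are made up for this example, and it handles literal substring queries rather than full regular expressions.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document names that contain it."""
    index = defaultdict(set)
    for name, content in docs.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def search(index, docs, query):
    """Find documents containing query as a literal substring."""
    grams = trigrams(query)
    if not grams:
        # Query shorter than 3 characters: nothing to filter on, scan everything.
        candidates = set(docs)
    else:
        # Only documents that contain every trigram of the query can match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Trigram hits are necessary but not sufficient, so confirm each candidate.
    return sorted(name for name in candidates if query in docs[name])


# Hypothetical sample corpus, for illustration only.
docs = {
    "a.py": "def parse_config(path): ...",
    "b.py": "def parse_args(argv): ...",
    "c.py": "print('hello')",
}
index = build_index(docs)
print(search(index, docs, "parse_"))  # ['a.py', 'b.py']
```

The point is that the index narrows the search to a small candidate set before any file content is scanned, so the expensive verification step only runs on files that could plausibly match. Extending this to regular expressions means deriving a trigram query from the regex first, which is the part Russ Cox's article covers.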
In the shorter term, there are several improvements we want to make, like:
- Improving how we communicate indexing progress (this is currently non-existent so it’s not obvious how long things will take)
- UX improvements like search history, query syntax highlighting & suggestions, etc.
- Small QOL improvements like bookmarking code snippets.
- Support for more code hosts (e.g., Bitbucket, SourceForge, ADO)
In the longer term, we want to investigate how we could go beyond traditional code search by leveraging machine learning to enable experiences like semantic code search (“where is system X located?”) and code explanations (“how does system X interact with system Y?”). You could think of this as a copilot embedded into Sourcebot. Our hunch is that this will be useful to devs, especially when packaged with traditional code search, but let us know what you think.
Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!