
New open source robots.txt projects  |  Google Search Central Blog



Monday, September 21, 2020

Last year we released the robots.txt parser and matcher that we use in our production systems to
the open source world. Since then, we’ve seen people build new tools with it, contribute to the
open source library (effectively improving our production systems, thanks!), and release new
language versions such as Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to
robots.txt that were made possible by two interns working on the Search Open Sourcing team,
Andreea Dutulescu and
Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea.
The project provides a testing tool that can validate whether, or to what extent, a robots.txt
parser follows the Robots Exclusion Protocol. Currently there is no official and thorough way to
assess the correctness of a parser, so Andreea built a tool that developers can use to build
robots.txt parsers that follow the protocol.
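
To give a feel for what a specification test looks like, here is a minimal sketch in Python. It is not Andreea's framework: the `CASES` table, `is_allowed`, and `run_cases` are hypothetical names made up for illustration, and Python's standard-library `urllib.robotparser` merely stands in as the parser under test. Each case pairs a robots.txt body, a user agent, and a URL with the verdict we assume the protocol expects.

```python
from urllib.robotparser import RobotFileParser

# Two robots.txt bodies reused by the cases below (illustrative only).
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"
BLOCK_FOOBOT = "User-agent: FooBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

# Hypothetical cases: (robots.txt body, user agent, URL, expected "allowed?").
CASES = [
    (BLOCK_PRIVATE, "FooBot", "https://example.com/private/page.html", False),
    (BLOCK_PRIVATE, "FooBot", "https://example.com/public/page.html", True),
    ("", "FooBot", "https://example.com/anything", True),  # empty file allows all
    (BLOCK_FOOBOT, "FooBot", "https://example.com/", False),
    (BLOCK_FOOBOT, "BarBot", "https://example.com/", True),
]


def is_allowed(robots_txt, user_agent, url):
    """Adapter around the parser under test (here: the stdlib parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def run_cases(cases, matcher):
    """Run every case through `matcher` and collect the ones that disagree."""
    failures = []
    for robots_txt, agent, url, expected in cases:
        actual = matcher(robots_txt, agent, url)
        if actual != expected:
            failures.append((robots_txt, agent, url, expected, actual))
    return failures


if __name__ == "__main__":
    failed = run_cases(CASES, is_allowed)
    print(f"{len(CASES) - len(failed)} of {len(CASES)} cases passed")
```

An empty list from `run_cases(CASES, is_allowed)` means the parser agreed with every expectation in the table; each entry in a non-empty list identifies one case where it diverged, which is one way to express "to what extent" a parser follows the protocol.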

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java
is the 3rd most popular programming language on GitHub and it’s extensively used at Google as
well, so it’s no wonder it’s been the most requested language port. The parser is a 1-to-1
translation of the C++ parser in terms of functions and behavior, and it’s been thoroughly tested
for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java
robots.txt parser in Google production systems, and we hope that you’ll find it useful, too.
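
Parity testing of this kind can be sketched as follows. This is only an illustration, not the harness Ian used: `find_disagreements` and `toy_matcher` are hypothetical names, the three-entry corpus is made up, and the toy matcher is a deliberately simplified stand-in (it ignores user-agent groups entirely) rather than a real robots.txt parser. The idea is to run a reference implementation and a port over the same inputs and report every input on which they differ.

```python
from functools import partial


def toy_matcher(robots_txt, user_agent, path, case_sensitive_keys=False):
    """Toy stand-in for a parser: a path is blocked if any Disallow value
    is a prefix of it. `user_agent` is accepted but ignored on purpose."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not case_sensitive_keys:
            key = key.capitalize()  # "disallow" and "DISALLOW" -> "Disallow"
        if key == "Disallow" and value and path.startswith(value):
            return False
    return True


def find_disagreements(reference, port, corpus):
    """Return every input on which the two matchers give different answers."""
    mismatches = []
    for robots_txt, user_agent, path in corpus:
        expected = reference(robots_txt, user_agent, path)
        actual = port(robots_txt, user_agent, path)
        if expected != actual:
            mismatches.append((robots_txt, user_agent, path, expected, actual))
    return mismatches


# Made-up corpus of (robots.txt body, user agent, path) triples.
CORPUS = [
    ("User-agent: *\nDisallow: /tmp/", "FooBot", "/tmp/x"),
    ("User-agent: *\ndisallow: /tmp/", "FooBot", "/tmp/x"),
    ("User-agent: *\nDisallow:", "FooBot", "/tmp/x"),
]

reference = toy_matcher
# A "port" with a deliberate bug: directive names are matched case-sensitively.
buggy_port = partial(toy_matcher, case_sensitive_keys=True)

if __name__ == "__main__":
    for mismatch in find_disagreements(reference, buggy_port, CORPUS):
        print("parity failure:", mismatch)
```

In this sketch only the lowercase `disallow:` entry exposes the planted bug. That is why a large and varied corpus matters for parity testing: a difference in behavior shows up only when some input happens to exercise it.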

As usual, we welcome your contributions to these projects. If you’ve built something with the
C++ robots.txt parser or with these new releases, let us know so we can potentially help you
spread the word! If you’ve found a bug, help us fix it by opening an issue on GitHub or by
contributing a pull request directly. If you have questions or comments about these projects,
catch us on Twitter!

It was our genuine pleasure to host Andreea and Ian, and we’re sad that their internship is
ending. Their contributions help make the Internet a better place and we hope that we can
welcome them back to Google in the future.


