On 27th May, a leak of 2,500+ internal Google documents caused ripples across the search community. The leak consisted of extensive internal documentation that appears to come from Google’s Content Warehouse API. It contains detailed information about 14,000+ signals that are, or may be, used to rank search results.
This marks another public setback for Google. Last year, the tech giant was forced to disclose some of the inner workings of its search product during a landmark antitrust trial brought by the US Justice Department.
Unlike the unprecedented Yandex code leak in January 2023, this latest leak is believed to have been accidental, with the documentation spreading to public indices where it was discovered, assessed, and shared by members of the search community. On 29th May, Google confirmed the data’s authenticity.
What is in the leak?
The leak revealed various factors that could influence how Google’s algorithm ranks pages.
It appears to confirm the existence of navBoost, a ranking factor that feeds signals to the algorithm based on user clicks. This works in conjunction with a wealth of data Google collects from Chrome users. Although Google has long denied that clicks are a ranking factor, the documentation indicates that it collects clicks and post-click behaviour and assesses them in its ranking algorithms.
The leak also appears to confirm that Google uses a metric called siteAuthority, suggesting that a website’s overall authority positively affects how its pages rank. The company had previously denied that such a metric existed. The documentation also references a feature called smallPersonalSite, though it is unclear whether Google uses it to promote or demote such sites.
The leak also shows the use of golden documents, which appears to be a flag for giving “additional weight to human-labeled” documents, in contrast to “automatically labelled annotations”. This could mean that a Google employee can manually flag a specific URL to boost it in the search results.
The documentation indicates the existence of whitelists for certain topics, such as isElectionAuthority and isCovidLocalAuthority, which would explain the high level of curation around specific queries.
What was Google’s response?
Google issued the following statement in response to the leak: “We would caution against making inaccurate assumptions about search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”
What the documentation does not reveal is which of the 14,000+ leaked ranking signals are actually in production, or how much weight each signal carries in Google’s algorithms. Google declined to comment on individual signals.
What does this mean for Google Search?
In truth, nothing really changes, but Google will have to work hard to regain the search community’s trust. It did not dispute the accuracy or validity of the leaked data; it only argued that the data lacked context.
Google also said that its ranking systems change over time as it works to improve its services, and that it will share whatever information it can with the community.
One thing is clear though: everything Google releases in the foreseeable future about Google Search will be heavily scrutinised through the lens of this leak.
© Digitalis Media Ltd.