Google Dorking: Manual and Automated Methods for Finding Hidden Information

by | Aug 2, 2022 | Articles, Write up


Introduction

Everyday Internet users turn to Google to answer their questions, find out which laptop or smartphone is the best, and satisfy their curiosity. After all, it’s just a Web index, right? Not quite. If you use Google like any other Internet user, you are only scratching the surface of its enormous search capabilities.

For most Internet users, Google is a synonym for the Internet: it is the first site they open, and it’s where they find everything they need. In reality, Google is just one of many search engines that index the Web.

It’s estimated that Google processes 5.6 billion searches per day while holding 92% of the worldwide search engine market share.

With that in mind, you can use a search engine to find hidden information on public websites, vulnerabilities exposed on public servers, and much more publicly available data, often with a single search query, using a technique called Dorking.

So, let’s dive deeper into the world of Dorking. All you need is an Internet connection and a browser.

What is Dorking?

Dorking is a way of using search engines to their full capabilities to find publicly available information that is not necessarily visible at first. A dork is simply a search query that uses one or more advanced techniques, called operators, to reveal that information.

Many Internet users think that Dorking only works on Google but that’s a mistake. Dorking can be employed across various search engines, not just on Google. Any search engine like Bing, Yahoo, and DuckDuckGo can accept a search term or a string of search terms in order to return matching results. However, even if two search engines support the same operators, they often return different search results due to the difference in the indexing of each search engine.

Search engines also accept more advanced operators that refine those search terms. An operator is a keyword or phrase that has a specific meaning for the search engine. Operators like ‘intext’, ‘intitle’, ‘filetype’, ‘cache’, and ‘site’ are some of the best known.

Each operator is followed by a colon, which is followed immediately by the relevant search term with no space in between, e.g. inurl:login

These operators make a search query more specific, which can help locate administrator panels, login pages that are not supposed to be publicly accessible, and much more ‘juicy’ information.
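The operator syntax described above can be sketched in a few lines of Python. This is only an illustration: the helper names build_dork and search_url are made up for this example, and the search URL is the ordinary Google results address with the query passed in the q parameter.

```python
from urllib.parse import quote_plus


def build_dork(operator: str, term: str) -> str:
    """Join an operator and its term with a colon and no surrounding spaces."""
    return f"{operator.strip().lower()}:{term.strip()}"


def search_url(query: str) -> str:
    """Return a Google search URL with the query URL-encoded (assumed URL shape)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


dork = build_dork("inurl", " login")
print(dork)              # the operator and the term end up joined by a bare colon
print(search_url(dork))  # the colon is percent-encoded inside the URL
```

Pasting the printed URL into a browser is equivalent to typing the dork into the search box.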

Since Google is the most widely used search engine, and the Google Search index includes hundreds of billions of web pages with over 100,000,000 gigabytes in size, we will be focusing mainly on Google Dorking aka Google Hacking using Google’s search engine.

 

SS1

 


Google Dorking/Hacking

 

History

Google Dorking has been documented since the early 2000s. Johnny Long, aka j0hnnyhax, was a pioneer of Google Dorking, and the first man to post his own definition back in December 2002 on his site ihackstuff.com where he described it as: ‘An inept or Foolish person as revealed by Google’.

SS2

 

Google Dorking

As we already know, Google crawls publicly reachable websites and indexes whatever it finds on them, including sensitive information.

An ordinary search query asks a question semantically, either by writing it out in full or by searching with keywords. Google Dorking is based on reverse engineering the way search engines crawl and index the Internet. It uses search functions beyond their semantic role, expanding what the search engine can do for people who want to explore content and reach specific services, pages, and so on.

Using the expanded capabilities of Google’s search engine can lead to the discovery of information that could be used for fraud, information about yourself or your company, and information that assists in the investigation of governments or corporations. Dorking can also expose vulnerabilities and endpoints where access is supposed to be restricted but the restriction was forgotten.

It can be used by security researchers or hackers to find critical information about a website, individual or software, etc., but it can also be used by normal people (students, content writers, companies) to save time and get better information from their query by filtering the search results.

 

How to use Google Hacking manually with special operators

As mentioned above, operators are keywords that have a particular meaning for the search engine. Each operator is followed by a colon and then the relevant terms of the query. These operators allow search engines to target more specific information, such as certain strings of text in the body of a webpage or files hosted on a given URL. Note that not all advanced search techniques rely on operators. When you surround your search terms with quotation marks, you tell the search engine to return only results that contain that exact phrase.

Take a look below: searching for the same phrase with quotation marks produced 99.7% fewer results than the query without them.

SS3

SS4

 

Some of the most used operators:

There are many existing dork operators and they vary across search engines. The most used operators can be found in the list below:

  • cache:<keyword> → It is used to find the cached version of a page. Google stores a cached copy of a web page so that it can be viewed even if the site isn’t available. The operator opens the most recent cached version of the page, provided that the page is indexed.

e.g.

cache:https://twitter.com

SS5

  • inurl:<keyword> → It finds pages that include the keyword in the URL.

e.g.

inurl:login

It fetches generic results which include the keyword “login” in the URL.

SS6

 

  • allinurl:<keyword> → Similar to ‘inurl’, but it fetches results containing all the specified keywords in the URL.

 

e.g.

allinurl:login portal

SS7

 

 

  • site:<domain> → It is used to limit the search results to a given domain only.

e.g.

site:blackhatethicalhacking.com

 

SS8

 

  • intitle:<keyword> → It is used to find pages with the keyword in the title. It returns pages that contain the keyword in their HTML title tag.

e.g.

intitle:login

 

SS9

 

  • allintitle:<keyword> → Similar to ‘intitle’, but it only returns results containing all the specified keywords in the title tag.

e.g.

allintitle:login portal

 

SS10

 

  • intext:<keyword> → It is used to find pages containing the keyword somewhere in the content of the page, including meta-information.

e.g.

intext:login portal

 

SS11

 

  • allintext:<keyword> → Similar to ‘intext’, but it only returns results containing all the specified keywords somewhere in the page.
  • filetype:<extension> → It restricts results to those of a certain file type, like docx, pdf, xlsx, etc.

e.g.

twitter filetype:pdf

It fetches PDF results that contain the keyword ‘twitter’.

SS12

 

 

Advanced Dorking – Ways to use and combine operators

The magic comes when you combine different operators at once. For more advanced Google Dorking, combine the existing operators with different keywords to maximize their effectiveness. Note that a multi-word search term should be surrounded by double quotation marks, e.g. intitle:"multiple keywords". Placing an all-caps ‘OR’ between search terms prompts the search engine to return results that match one term or the other, e.g. passwords filetype:xls OR passwords filetype:csv.
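The combination rules above (quoting multi-word terms, grouping alternatives with OR) can be expressed as small string helpers. This is a minimal sketch; the function names phrase, any_of, and combine are hypothetical and exist only for this illustration.

```python
def phrase(text: str) -> str:
    """Wrap a multi-word term in double quotes so it is matched as one phrase."""
    return f'"{text}"' if " " in text else text


def any_of(*clauses: str) -> str:
    """Join alternative clauses with an all-caps OR inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def combine(*parts: str) -> str:
    """Join query parts with single spaces, skipping empty parts."""
    return " ".join(p for p in parts if p)


query = combine("passwords", any_of("filetype:xls", "filetype:csv"))
print(query)
print("intitle:" + phrase("multiple keywords"))
```

Grouping the alternatives in parentheses keeps the shared keyword outside the OR, so it does not need to be repeated on both sides.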

 

Find files under a domain name

Dork:

<keyword> site:<website.com> (filetype:pdf OR filetype:xlsx OR filetype:docx)

Find files that contain the keyword ‘password’ under a domain name.

e.g.

password site:<website.com> filetype:pdf

 

Find all indexed pages for a specific domain

Dork:

site:<website.com>

SS13

 

Find subdomains for a specific domain

Dork:

site:*.<website.com> -www

Notice the wildcard (*) operator used to find all the subdomains belonging to the specified domain, and the combination with the exclusion (-) operator to exclude all the ‘www’ results.
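Once the subdomain dork has returned some results, the unique hostnames still have to be pulled out of the result URLs. The sketch below shows one way to do that with the standard library; the function name subdomains and the sample URLs are invented for the example and do not come from a real search.

```python
from urllib.parse import urlparse


def subdomains(urls, domain):
    """Return the sorted, unique hostnames under `domain`, ignoring the www host."""
    domain = domain.lower().lstrip(".")
    found = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Keep only hosts that end in ".<domain>" and are not the www host.
        if host.endswith("." + domain) and host != "www." + domain:
            found.add(host)
    return sorted(found)


# Made-up result URLs for illustration only.
results = [
    "https://blog.example.com/post/1",
    "https://www.example.com/",
    "http://shop.example.com/cart",
    "https://blog.example.com/about",
    "https://example.org/",
]
print(subdomains(results, "example.com"))
```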

 

Find non-HTTPS web pages

Dork:

site:<website.com> -inurl:https

Again, the exclusion (-) operator is combined with another operator (inurl) to exclude all the HTTPS results for the specified domain.

SS14

Sometimes a page is indexed as non-HTTPS, but when you click through it actually redirects to the HTTPS version. Always click on the results to double-check.

 

Find social profiles or search for a keyword across multiple websites at once

Dork:

<keyword> (site:facebook.com | site:twitter.com | site:linkedin.com)

The pipe (|) operator is the same as OR. It is used here to return results that match the keyword on any of the websites provided. You can tweak the dork to add or replace social media sites.

Also, this dork can be used to find keywords for any website, not only social media profiles.

SS15

 

Find open webcams

You can also find exposed webcams by searching for the specific strings that their web interfaces expose to Google (if they were accidentally left open on the Web without restrictions such as passwords).

Dork:

intitle:"webcamXP" inurl:8080

This is a type of webcam software that, if left exposed, will probably let you watch the live feed.

SS16

SS17

 

Find plain text passwords on Pastebin

Find exposed passwords in pastebin.com

Dork by Anirudh Kumar Kushwaha:

site:pastebin.com "@gmail.com password"

SS18

 

You can also tweak it a bit to find admin passwords.

Dork by Saumyajeet Das:

site:pastebin.com "admin password"

SS19

 

Millions of combinations could be used to find everything a search engine has already indexed. We’ve just scratched the surface of what is possible. Feel free to explore more advanced dorks on the Google Hacking Database (GHDB).

 

Find vulnerability reports from multiple tools

This dork finds reports generated by security scanning tools such as Nmap, Nessus, Acunetix, etc.

Dork by SACHIN KATTIMANI:

intitle:"report" ("qualys" | "acunetix" | "nessus" | "netsparker" | "nmap") filetype:html

You can also tweak it to return PDF files instead of HTML, but the HTML version returns more reports from top companies than the PDF version.

SS20

 

 

Google Hacking Database (GHDB)

SS21

 

The Google Hacking Database (GHDB) is an index of search queries (dorks) that security researchers have found to reveal sensitive data exposed by vulnerable servers and web applications. It currently holds approximately 7,500 Google dorks and is updated almost daily with new ones. It was launched by Johnny Long in the early 2000s to serve penetration testers.

It is composed of simple and advanced dorks and is also sorted into different categories such as Files Containing Juicy Info, Pages Containing Login Portals, Vulnerable Files, etc.

It is easy to use: you can filter the results by category or author, or use the search function to find exactly what you are looking for.

 

Using Google Dorking on yourself

Google dorking can also be used to protect your data. A couple of years ago, while practicing with Google dorks, I managed to find my personal details, such as my home address, Gmail address, and phone number, along with hundreds of other people’s data, using just one simple dork: intext:<my_telephone_number>.

The data was located inside a .txt file belonging to a supplement company. The file contained every order that the website had received and processed, along with the details of the people who placed the orders. I contacted the company and informed them that the file shouldn’t be publicly accessible, and the matter was fixed in time. You cannot know what information about you is publicly available on the Web until you search for it.

Some other Google dorks that you can use to find information about yourself or your website are:

  • <your_name> filetype:pdf
  • <your_name> (intext:<phone_number> | intext:<email> | intext:<address>)
  • site:<your_website> (filetype:doc | filetype:xls | filetype:txt | filetype:pdf)
  • ip:<your_servers_IP> (filetype:doc | filetype:xls | filetype:txt | filetype:pdf) (on Bing, which supports the ip: operator; Google does not)

 

While dorks may help threat actors find vulnerable websites and web servers, they can also help website admins protect their own sites by showing them what is actually publicly available.
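If you want to run this kind of self-check regularly, the dorks above can be generated from your own details. The following is a rough sketch under stated assumptions: the function name self_audit_dorks and the sample values are hypothetical, and the generated strings simply mirror the patterns listed above.

```python
def self_audit_dorks(name, phone=None, email=None, site=None,
                     filetypes=("doc", "xls", "txt", "pdf")):
    """Build a list of self-audit dorks from whichever details are supplied."""
    dorks = [f'"{name}" filetype:pdf']

    # Personal details are quoted so each one is matched as an exact string.
    details = [f'intext:"{value}"' for value in (phone, email) if value]
    if details:
        dorks.append(f'"{name}" (' + " | ".join(details) + ")")

    if site:
        types = " | ".join(f"filetype:{ext}" for ext in filetypes)
        dorks.append(f"site:{site} ({types})")
    return dorks


# Placeholder values, not real personal data.
for dork in self_audit_dorks("Jane Doe", phone="555-0100", site="example.com"):
    print(dork)
```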

 

Is Google Dorking Illegal?

While it may seem intimidating, there is nothing illegal about Google dorking, provided you are only using it to refine your search results and not trying to download from or access vulnerable websites or servers. It’s just a passive search with advanced operators: you are essentially using a search engine to its maximum capabilities.

It’s also worth noting that Google tracks your searches, so if you access sensitive data or search for malicious content, it will probably flag you and block you from further searching. (The block can easily be bypassed with a VPN, but if you are not using one, you risk getting your home or company IP blocked.)

 

 

Automating Google Dorking with Pagodo – Demo

Google Dorking manually becomes tedious once you want to run more than one query, and Pagodo automates the process. It’s a tool written in Python that ships with various Dorking lists. With a single command, you can run hundreds of dork queries against the Google search engine and save the results for later use. Its dork lists can also be updated with the latest dorks found in the GHDB.

Dork list categories included:

1: “Footholds”

2: “Files Containing Usernames”

3: “Sensitive Directories”

4: “Web Server Detection”

5: “Vulnerable Files”

6: “Vulnerable Servers”

7: “Error Messages”

8: “Files Containing Juicy Info”

9: “Files Containing Passwords”

10: “Sensitive Online Shopping Info”

11: “Network or Vulnerability Data”

12: “Pages Containing Login Portals”

13: “Various Online Devices”

14: “Advisories and Vulnerabilities”

 

Installation

Since Pagodo is a Python tool, you will need Python to be installed on your machine.

Select the directory of your choice and use git clone to clone the repository into a new directory named ‘pagodo’:

git clone https://github.com/opsdisk/pagodo.git

cd pagodo

ls

SS22

And then install the requirements from the requirements.txt.

pip install -r requirements.txt

 

Running the tool

After all the requirements are installed, it’s important to run ghdb_scraper.py before running the tool.

The script checks the GHDB online for Google dorks, downloads the freshest ones, and updates them in the ‘dorks’ directory.

python3 ghdb_scraper.py -i

SS23

Run the help (-h) parameter to check the help section.

python3 pagodo.py -h

SS24

The help section is actually pretty straightforward, so let’s run the tool with a simple command.

python3 pagodo.py -d blackhatethicalhacking.com -g dorks/sensitive_directories.dorks

There are a few arguments to take into account, both to avoid being blocked by Google and to limit how many results are returned for each dork (for example, the first 20 results found in Google instead of the default 100).

The -l argument sets the number of results returned per dork. We will use the first 20 for this example.

The -e argument sets the minimum delay in seconds between searches to avoid being blocked by Google. We will set it to 37 seconds because that is the value recommended by the authors of Pagodo.

The -j argument, aka the ‘jitter factor’, adds some randomness to the lookup times. If you set it to 1.5, for example, the 37-second delay is randomly varied by a factor of up to 1.5 for every search performed.

The -s argument saves the results found to a text file, e.g. -s /path/results.txt. If you do not specify a name, a datetimestamped one is generated.

The -o argument saves the output to a JSON file, e.g. -o /path/results.json. Again, if you do not specify a name, a datetimestamped one is generated.
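To get a feel for what a delay combined with a jitter factor does, here is one plausible reading of those two arguments in Python. It is an assumption made for illustration, not Pagodo’s actual implementation: each wait is drawn at random between the base delay and the base delay multiplied by the jitter factor. The function name next_delay is hypothetical.

```python
import random


def next_delay(base_delay: float, jitter: float, rng=random) -> float:
    """Pick a wait time between base_delay and base_delay * jitter seconds."""
    low, high = sorted((base_delay, base_delay * jitter))
    return rng.uniform(low, high)


# With a 37-second delay and a jitter factor of 1.5, every wait falls
# somewhere between 37 and 55.5 seconds under this assumed model.
rng = random.Random(0)  # seeded only so repeated runs print the same values
for _ in range(3):
    print(round(next_delay(37, 1.5, rng), 1))
```

Randomized waits like these make the traffic look less like a fixed-interval script, which is the point of adding jitter in the first place.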

python3 pagodo.py -d blackhatethicalhacking.com -g dorks/sensitive_directories.dorks -s -e 37 -l 20 -j 1.5
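If you plan to run the same scan against several dork lists, it can help to assemble the command programmatically. The sketch below only builds the argument list using the flags shown in this article; the helper name pagodo_command is hypothetical, and the flags may differ in other Pagodo versions, so check python3 pagodo.py -h first.

```python
import shlex


def pagodo_command(domain, dorks_file, delay=37, max_results=20,
                   jitter=1.5, save=True):
    """Build the pagodo.py argument list using the flags from this article."""
    args = ["python3", "pagodo.py", "-d", domain, "-g", dorks_file]
    if save:
        args.append("-s")  # no file name given, so a datetimestamped one is used
    args += ["-e", str(delay), "-l", str(max_results), "-j", str(jitter)]
    return args


cmd = pagodo_command("blackhatethicalhacking.com",
                     "dorks/sensitive_directories.dorks")
print(shlex.join(cmd))  # prints the command as a single shell-safe string
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside the pagodo directory to launch the scan.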

SS25

 

As you can see, it takes a long time for the Google Dorking search to come to an end. You will have to wait until it finishes and then check the results for any sensitive directories that were found.

If Google is still blocking you after entering the arguments for delaying the time between searches, you could use proxychains4 to avoid being blocked.

 

Install proxychains4

apt install proxychains4 -y

Then put proxychains4 in front of the pagodo.py command, and each search lookup will go through a different proxy (a different IP).

proxychains4 python3 pagodo.py -d blackhatethicalhacking.com -g dorks/sensitive_directories.dorks -s -e 37 -l 20 -j 1.5

 

Google Hacking – A powerful hacking tool, always available

Google is always available: you only need a browser and an Internet connection to use its power. Combining your knowledge of Dorking operators with your imagination could lead to extraordinary results, both on a personal and professional level.

Einstein said, “Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.” That applies perfectly to Google Dorking. Your knowledge of Dorking operators may be limited, but you can still imagine and put into practice something valuable that nobody has thought of before.

Google hacking is a powerful tool. It can help a security researcher find better ways to make a website more secure, thus improving their work, and it can help a student find better resources and study more efficiently.

Google Dorking is practically an immense advantage for everyone that uses the Internet regularly.

 

We hope that this write-up has taught you something new. If you enjoyed it, the best way to support us is to share it! If you’d like to hear more about us, you can find us on LinkedIn, Twitter, and YouTube.

 
