How is Google finding pages which don't have any links to them?

We've got an interesting question from Danny in Bucharest, who wants to know: how can Googlebot crawl and index pages that don't have any links to them on my website?

Each day I find two or three pages in the index that don't have any links to them on my site.

The pages are generated by the search field of my website.

Okay, so you almost threw me for a loop, Danny.

I read the entire sentence, and I was ready to give one answer, and then the last sentence changed my answer completely.

So let me answer it both ways, starting with the beginning part of your question.

How can Google index stuff even when there aren't links pointing to my particular page?

Well, people can always submit a URL or something like that, but a lot of people don't realize how many links there are just sort of floating around on the web.

So it could be that you don't realize that someone is linking to a page on your site, even though they are.

So we can start from a very obscure, esoteric page, follow that link, and find a deep page on your own site.

And because we only return a subsample of all the links we know about when you do a link: search on a particular URL, we might know about a link that you don't know about.

So that's how I started to answer your question.

And then you said the pages are generated by the search field of my website, and that completely changes the nature of the question.

So in April of 2008, Jayant Madhavan and Alon Halevy did a blog post where they talked about crawling through HTML forms.

They later got it published as a paper.

And so the basic idea is that in some cases, when we see a search form, Google can try to fill out that form, as long as the form is simple enough.

So suppose, for example, you have your website, your main root page, and you can't get to any other part of your site except through a drop-down.

Googlebot can enumerate the values in that drop-down.

Maybe it's the 50 states in the United States, and we can try submitting each one.

Okay, well, what if we set the state to Kentucky, or what if we set the state to California?

And then if that opens up new pages for us to discover and crawl, that can let us crawl through a search form.
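
To make the drop-down example concrete, here is a minimal Python sketch of enumerating a form's drop-down values and building one candidate URL per value. It is only an illustration of the idea, not Googlebot's actual code: the sample page, the `/search` action, the `state` field name, and the helper names `SelectFormParser` and `candidate_urls` are all assumptions made up for this example, and it only handles forms submitted with GET.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical page: the only way into the site is a drop-down of states.
SAMPLE_PAGE = """
<form action="/search" method="get">
  <select name="state">
    <option value="">Choose a state</option>
    <option value="KY">Kentucky</option>
    <option value="CA">California</option>
  </select>
</form>
"""


class SelectFormParser(HTMLParser):
    """Collect the first form's action/method and every <select> option value."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.options = {}      # select name -> list of option values
        self._current = None   # name of the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "select":
            self._current = attrs.get("name")
            if self._current:
                self.options.setdefault(self._current, [])
        elif tag == "option" and self._current:
            value = attrs.get("value")
            if value:  # skip placeholders such as "Choose a state"
                self.options[self._current].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def candidate_urls(page_url, html):
    """Return one URL per drop-down value, for GET forms only."""
    parser = SelectFormParser()
    parser.feed(html)
    if parser.action is None or parser.method != "get":
        return []
    base = urljoin(page_url, parser.action)
    return [
        base + "?" + urlencode({name: value})
        for name, values in parser.options.items()
        for value in values
    ]


for url in candidate_urls("https://example.com/", SAMPLE_PAGE):
    print(url)
```

Each printed URL is a page a crawler could then fetch and scan for new links, which is the sense in which a form can open up new pages to discover.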

Now, in general, we don't crawl through a ton of search forms because they can be very complex.

Sometimes they want credit card numbers, and Googlebot is very broke; it doesn't have a credit card number.

But in some situations, where there might be only one or two input elements, we do have the ability to try to find out whether we can search through that form to find new content.
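
A rough way to picture "simple enough" is a filter that counts the fields a user would have to fill in and rejects anything that looks sensitive. The Python sketch below is purely illustrative: the limit of two inputs is taken from the "one or two input elements" remark above, while the ignored input types, the sensitive-name hints, and the helper names `FormFieldCollector` and `looks_simple` are assumptions for this example, not documented Google behavior.

```python
from html.parser import HTMLParser

MAX_INPUTS = 2  # assumption based on "only one or two input elements"
IGNORED_INPUT_TYPES = {"submit", "button", "reset", "image", "hidden"}
SENSITIVE_TYPES = {"password", "file"}
SENSITIVE_NAME_HINTS = ("card", "cvv", "password")  # hypothetical hints


class FormFieldCollector(HTMLParser):
    """Record (kind, name) for every field a user would have to fill in."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "input":
            kind = (attrs.get("type") or "text").lower()
            if kind not in IGNORED_INPUT_TYPES:
                self.fields.append((kind, name))
        elif tag in ("select", "textarea"):
            self.fields.append((tag, name))


def looks_simple(form_html):
    """Return True if the form has 1..MAX_INPUTS fields and none look sensitive."""
    collector = FormFieldCollector()
    collector.feed(form_html)
    if not 1 <= len(collector.fields) <= MAX_INPUTS:
        return False
    for kind, name in collector.fields:
        if kind in SENSITIVE_TYPES:
            return False
        if any(hint in name for hint in SENSITIVE_NAME_HINTS):
            return False
    return True


search_form = '<form><input type="text" name="q"><input type="submit"></form>'
checkout_form = (
    '<form><input name="card_number"><input name="expiry">'
    '<input name="cvv"></form>'
)
print(looks_simple(search_form))
print(looks_simple(checkout_form))
```

A single search box passes this filter, while a checkout form asking for a card number does not.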

Now, if that's something that you're not interested in, maybe you don't want any of those pages crawled, you can always use robots.txt to do a disallow on the search, or search form, or whatever the area is that you go to whenever you submit the search form.

We try to be very polite.

You can read more about it.
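
As a sketch of what such a disallow might look like, the Python snippet below embeds a small robots.txt and checks it with the standard library's `urllib.robotparser`. The `/search` path and the `example.com` URLs are assumptions for illustration; the right path is whatever URL your own search form actually submits to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything under /search for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# Search result pages are blocked; the rest of the site stays crawlable.
blocked = rules.can_fetch("Googlebot", "https://example.com/search?q=widgets")
allowed = rules.can_fetch("Googlebot", "https://example.com/about")
print(blocked, allowed)
```

Checking a rule this way before deploying it is a cheap guard against accidentally blocking more of the site than intended.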

If you search for "Googlebot crawl through HTML forms" or something like that, you can read about the sort of forms that we will and won't crawl through.

But it's all part of the process where we try to discover as much of the web as possible and crawl it as comprehensively as we can, so that we can return it to you in under half a second.