Submitting your URL to Google

Submitted on 2007-04-18 13:14:55

Google is primarily a fully automatic search engine, with no human intervention in the search process. It uses robots known as 'spiders' to crawl the web regularly, looking for updated pages and new websites to include in the Google index. This robot software follows hyperlinks from site to site. Google does not require you to submit your URL for inclusion in the index, because the spiders find sites automatically. However, you can submit a URL manually by going to the Google website and clicking the related link. Note that Google does not accept payment of any sort for site submission or for improving your website's PageRank. Also, submitting your site through the Google website does not guarantee that it will be listed in the index.
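The link-following step can be illustrated with Python's standard library. This is a minimal sketch, not Google's crawler: the function name `extract_links` is hypothetical, and it only shows how a spider might pull the hyperlinks out of one fetched page so they can be visited next.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # #fragment, since a fragment only points into the same page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html_text):
    """Return the hyperlinks a spider would queue from this page."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

For example, `extract_links("http://example.com/dir/page.html", '<a href="other.html">x</a>')` resolves the relative link to `http://example.com/dir/other.html`.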

Cloaking: Sometimes a webmaster programs the server to return different content to Google than it returns to regular users, usually in an attempt to manipulate search engine rankings. This practice is called cloaking because it conceals the actual website and returns distorted pages to the search engines crawling the site. It can mislead users about what they will find when they click a search result. Google strongly disapproves of the practice and may ban a website found to be cloaking.
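A site owner who wants to confirm that their own server is not cloaking (perhaps unintentionally, through a plugin or an outside consultant) could request the same page twice, once identifying as a crawler and once as a browser, and compare the results. The sketch below assumes a hypothetical `fetch(url, user_agent)` callable that returns page text; the User-Agent strings are illustrative only.

```python
from typing import Callable

# Hypothetical fetcher signature: (url, user_agent) -> page text.
Fetcher = Callable[[str, str], str]

# Illustrative User-Agent strings; not guaranteed to match real crawlers exactly.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def normalize(text: str) -> str:
    """Collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.split())


def looks_cloaked(fetch: Fetcher, url: str) -> bool:
    """True if the page served to a 'crawler' differs from the one served to a 'browser'."""
    as_crawler = normalize(fetch(url, CRAWLER_UA))
    as_browser = normalize(fetch(url, BROWSER_UA))
    return as_crawler != as_browser
```

An exact comparison like this will also flag pages with rotating ads or timestamps, and it only catches cloaking keyed on the User-Agent header, not cloaking keyed on the visitor's IP address.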

Google Services: The following are some of the popular and innovative services offered by Google, which are continually being improved.

Google Answers: Google Answers is an interesting cross between an online marketplace and a virtual classroom. Those who wish to participate must register with Google Answers. Researchers with considerable expertise in online research answer questions posted by other users, for a fee. When users post a question, they also state the price they are willing to pay if the question is answered. When a researcher answers the question, the payment goes to that researcher. The questions and the discussion that follows are publicly viewable, and other registered users can add their own opinions and insights.

There is a non-refundable listing fee of $0.50 per question plus an additional 'price' you set for your question that reflects how much you're willing to pay for an answer. Three-quarters of your question price goes directly to the Researcher who answers your question; the remaining 25 percent goes to Google to support the service.
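The fee schedule is simple arithmetic, sketched below with Python's `decimal` module to avoid floating-point rounding errors on money. The function name is hypothetical, rounding the Researcher's share to the nearest cent is an assumption, and because the text does not say who receives the listing fee, the sketch only splits the question price.

```python
from decimal import Decimal, ROUND_HALF_UP

LISTING_FEE = Decimal("0.50")       # non-refundable, charged once per question
RESEARCHER_SHARE = Decimal("0.75")  # three-quarters of the question price
CENT = Decimal("0.01")


def answers_costs(price):
    """Work out who pays and receives what for one answered question.

    price is the amount the asker set, as a string such as "10.00".
    Rounding the Researcher's share to the nearest cent is an assumption.
    """
    p = Decimal(price)
    researcher = (p * RESEARCHER_SHARE).quantize(CENT, rounding=ROUND_HALF_UP)
    return {
        "asker_pays": LISTING_FEE + p,         # listing fee plus question price
        "researcher_gets": researcher,         # 75% of the question price
        "google_price_share": p - researcher,  # the remaining 25% of the price
    }
```

For a question priced at $10.00, the asker pays $10.50 in total, the Researcher receives $7.50, and Google keeps $2.50 of the price.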

Google Groups: Google Groups is an online discussion forum that contains the entire archive of Usenet discussion groups dating back to 1981. These discussions cover the full range of human discourse and offer a fascinating look at evolving viewpoints, debate and advice on every subject from politics to technology. Using Google's search feature, users can search all of this information in a database of more than 800 million posts.

Google's Image Search: Google offers a wide collection of images from around the web; its database contains more than 425 million images. To search, a user simply enters a query in the image search box and clicks the "Search" button. On the results page, clicking a thumbnail shows a larger version of the image along with the web page on which it appears. By default, Google's Image Search applies its mature content filter to a user's first search. The filter removes many adult images, but no filter can guarantee with 100% accuracy that all mature content will be removed from the results.

Google analyzes the text on the page near the image, the image caption and dozens of other factors to determine what an image shows. It also uses several sophisticated algorithms to remove duplicates and to ensure that the highest-quality images are presented first in the results. Google's Image Search also supports advanced search techniques such as Boolean operators.

Google's Catalog Search: Google offers a unique service in the form of its Catalog Search, which makes it easy to find information published in mail-order catalogs that were not previously available online. It includes the full content of hundreds of mail-order catalogs selling everything from industrial adhesives to clothing and home furnishings. Catalog Search can help whether you are shopping for yourself or for your business.

Printed copies of the catalogs are scanned, and their text is converted into a format that users can search. The same sophisticated algorithm used by Google Web Search is then applied to the catalogs, so that the most recent and relevant ones are displayed. Google is not associated with any catalog vendor and is not liable for any misuse of this service by users.

Froogle: The name 'Froogle' combines the word 'frugal', meaning 'penny-wise' or 'economical', with 'Google'. Currently in beta (testing) form, Froogle is a recent concept from Google. Google's spidering software crawls the web looking for information about products for sale online. Froogle focuses entirely on product search, applying Google's search technology to locate stores that sell the items you want and pointing you to those specific stores.

Just like Google Web Search, Froogle ranks store sites based only on their relevance to the search terms users enter. Google does not accept payment for placement within its actual search results. Froogle also includes product information submitted electronically by merchants. Its search results are generated automatically by Google's ranking software.

AltaVista builds its index by sending out a crawler (a robot program) that captures text and brings it back.

The main crawler is called "Scooter." Scooter sends out thousands of threads simultaneously. Twenty-four hours a day, seven days a week, Scooter and its cousins access thousands of pages at a time, like thousands of blind users grabbing text, pulling it back and throwing it into the indexing machines so that the next day that text can be in the index. At the same time, they pull every hyperlink they find on those pages into a list of where to go next.
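The loop described above (fetch many pages at once, hand their text to the indexer, and queue every new hyperlink) can be sketched with a thread pool. This is a toy under stated assumptions: `WEB` is a hypothetical in-memory stand-in for the web and `fetch` a hypothetical fetcher, so the example runs without network access. It is not Scooter's actual design.

```python
from concurrent.futures import ThreadPoolExecutor

# A tiny in-memory stand-in for the web: URL -> (page text, outgoing links).
# Purely hypothetical; a real crawler would fetch each URL over HTTP.
WEB = {
    "a.html": ("home page", ["b.html", "c.html"]),
    "b.html": ("about us", ["a.html", "d.html"]),
    "c.html": ("products", ["d.html"]),
    "d.html": ("contact", []),
}


def fetch(url):
    """Return (text, links) for a URL; unknown URLs yield no text and no links."""
    return WEB.get(url, ("", []))


def crawl(start, workers=8):
    index = {}          # url -> captured text (what goes to the indexing machines)
    frontier = [start]  # the list of where to go next
    seen = {start}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch every page in the current frontier concurrently.
            results = pool.map(fetch, frontier)
            next_frontier = []
            for url, (text, links) in zip(frontier, results):
                index[url] = text
                for link in links:
                    if link not in seen:  # queue each page only once
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return index
```

The `seen` set is what keeps the crawler from fetching the same page repeatedly when pages link back to each other, as `b.html` links back to `a.html` here.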

On a typical day, Scooter and its cousins visit over 10 million pages. If many other pages link to yours, your chances of being found increase. But if this is your own personal site, or a brand-new web page, that is not very likely.

AltaVista has an incredibly large database of websites, so searches often return hundreds of thousands of matches. AltaVista's spider goes only about three pages deep into your site. Keep this in mind if you have topical pages that cannot be reached within three clicks of the main page: you will have to submit them separately.
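To find pages that fall outside that three-click reach, you can compute each page's click depth from the home page with a breadth-first search over your own site's link map. A minimal sketch, assuming you already have the links as a dictionary of page to linked pages; the function names and the site map are hypothetical, and treating "three pages deep" as three clicks follows the paragraph above.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search from the home page; returns {page: clicks from home}."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def pages_to_submit(links, home, max_depth=3):
    """Pages deeper than max_depth clicks, or unreachable, that may need manual submission."""
    depth = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depth.get(p, max_depth + 1) > max_depth)
```

With a site map in which `/d` is four clicks from `/` and `/orphan` has no incoming links, `pages_to_submit` returns both pages as candidates for manual submission.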

You cannot tell AltaVista how to index your site; that is all done by its spider. But you can go to AltaVista's site and give the spider a nudge by submitting specific pages, so that it knows to visit and index them. Once you have done that, it is all up to your META tags and your page's content. After its initial visit, AltaVista's spider may revisit your site each month.

AltaVista's ranking algorithms reward keywords in the <TITLE> tag. If a keyword is not in the title tag, the page will likely not appear anywhere near the top of the results for that keyword. AltaVista also rewards keywords that appear near one another and keywords near the beginning of a page.
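The three signals above can be combined into a toy scoring function to see how they interact. This is an illustration only: `toy_score` and its weights (10, 5, 5) are invented and are not AltaVista's actual formula, and measuring proximity from each keyword's first occurrence is a deliberate simplification.

```python
def toy_score(query_terms, title, body):
    """Toy relevance score built from the three signals described above.

    The weights (10, 5, 5) are invented for illustration only.
    """
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    positions = []
    for term in (t.lower() for t in query_terms):
        if term in title_words:
            score += 10.0                   # keyword appears in <TITLE>
        if term in body_words:
            first = body_words.index(term)  # position of first occurrence
            positions.append(first)
            score += 5.0 / (1 + first)      # earlier in the page scores higher
    if len(positions) >= 2:
        spread = max(positions) - min(positions)
        score += 5.0 / (1 + spread)         # keywords close together score higher
    return score
```

With these weights, the same body text scores 30 when the title contains both query words but only 10 when it contains neither, which mirrors the advice to put your keywords in the title.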

Add a Page: Adding a page through AltaVista's Add URL form doesn't guarantee that the page will be listed, and it usually takes around 4 to 6 weeks for a page to show up. You don't need any special authority to "add a page." AltaVista is not a directory like Yahoo!, where information providers have to submit information and prove they are who they say they are. AltaVista simply goes to the address and brings back whatever text it finds there.

If you give it the URL of a page that doesn't exist, it will get an Error 404, meaning there is no such page. If that page was in the index, AltaVista will remove it from the index the next day.
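The removal rule can be sketched as a recheck against a toy index. The HTTP lookup uses the standard `urllib.request` module; the index dictionary and the `recheck` function are hypothetical stand-ins for AltaVista's internals, and `status_of` can be replaced with a fake for testing.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def http_status(url, timeout=10):
    """Return the HTTP status code for url, e.g. 200 or 404.

    Network failures (DNS errors, timeouts) still raise URLError.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the error.
        return err.code


def recheck(index, url, status_of=http_status):
    """Re-crawl one submitted URL and drop it from a toy index if it now returns 404."""
    if status_of(url) == 404:
        index.pop(url, None)
    return index
```

Submitting a deleted page's address triggers exactly this kind of recheck, which is why resubmitting old URLs clears them out of the index.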

This matters for several reasons. Say you have changed the directory structure of your website. First, go to AltaVista and use Add a Page for all the old addresses, to remove the outdated information from the index. Then use Add a Page for all the new addresses. Similarly, if you made an embarrassing typo or posted a document you shouldn't have, and then removed that page from the web, you can use Add URL for that page to make sure the information does not persist in the index.

What AltaVista Doesn't Index: AltaVista doesn't index everything. In fact, features that web designers add to sites at great expense may block crawlers, meaning those pages will never be indexed or found through search engines. As a result, those sites may end up spending far more on promotion than they otherwise would have.

Here are some kinds of pages AltaVista doesn't index. They highlight the importance of using plain text on your web pages.

First, sites that require any kind of registration or password lock out AltaVista. Keep in mind that a web crawler cannot fill out a form of any kind: if a form must be filled out to reach the next page, the crawler stops right there. If you want to gather information about your users or members but also want your pages indexed, make registration optional.

Similarly, the AltaVista crawler cannot retrieve content from a database, because it cannot fill out a query form. If your database content is largely text, consider publishing the same content as plain static HTML pages so that it can be indexed and found.
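One way to follow that advice is a small export script that renders each database row as its own static page, plus an index page of ordinary links so the crawler can reach every page without a form. A minimal sketch, assuming a hypothetical `articles(slug, title, body)` table in SQLite and URL-safe slugs; adapt the query to your own schema.

```python
import html
import sqlite3
from pathlib import Path


def render_static_pages(conn):
    """Render one static HTML page per row of a hypothetical articles(slug, title, body) table."""
    pages = {}
    links = []
    for slug, title, body in conn.execute("SELECT slug, title, body FROM articles ORDER BY slug"):
        t, b = html.escape(title), html.escape(body)
        pages[f"{slug}.html"] = (f"<html><head><title>{t}</title></head>"
                                 f"<body><h1>{t}</h1><p>{b}</p></body></html>")
        links.append(f'<li><a href="{slug}.html">{t}</a></li>')
    # An index page of plain links gives the crawler a path to every exported page.
    pages["index.html"] = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    return pages


def write_pages(pages, out_dir):
    """Save the rendered pages as ordinary files the web server can serve."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in pages.items():
        (out / name).write_text(content, encoding="utf-8")
```

Putting each record's title in the page's <TITLE> tag also helps with the ranking advice given earlier.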

Dynamic pages also block AltaVista's spiders. While it's great to give visitors unique experiences tailored to their needs, the techniques used to do that can stop most search engines, including AltaVista, from indexing your content, which could greatly reduce your potential traffic. Dynamically generated pages are created on the fly from a variety of elements held in databases. When the AltaVista crawler arrives at such a page, it captures the content but stops there and does not follow the links, because it sees ahead of it a potentially infinite number of pages: a black hole that could crash it.

Active Server Pages (.asp) with question marks in their URLs fall into this category; the question mark indicates that a script builds the page, rather than the server returning static content.
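To check how much of a site is exposed to this problem, you could sort its URLs by whether they contain the question mark that starts a query string. A small sketch; the function name and sample URLs are hypothetical, and the rule implements only the "?" signal described above, not any particular engine's exact policy.

```python
def split_static_dynamic(urls):
    """Separate URLs with a query string (the '?' described above) from plain static ones."""
    static, dynamic = [], []
    for url in urls:
        # Ignore any #fragment, then look for the '?' that starts a query string.
        without_fragment = url.split("#", 1)[0]
        (dynamic if "?" in without_fragment else static).append(url)
    return static, dynamic
```

Pages in the `dynamic` list are the ones worth republishing as static pages, as suggested for database content.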

If you have information inside frames, that will probably be a hindrance, though not an absolute barrier. AltaVista indexes the outer frameset as a distinct page, and it also indexes each pane of the frame window as a separate page. That means that if the content matching a query is in a pane, visitors who click the result will see that pane and only that pane, not the full page as it was designed. So if you want visitors from search engines to see your pages the way they were intended, provide non-frames versions of those pages alongside the framed ones, and submit the non-frames versions with Add URL.

AltaVista also can't index text that is embedded in graphics. Search engines simply cannot "see" such text unless the webmaster adds ALT text to the image, describing it and listing the important words. However, pictures as pictures can be indexed for AltaVista's Image search.
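A quick audit of a page's images can show which ones give a search engine no text at all. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Records the src of every <img> whose ALT text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # A missing alt (None) and an empty alt="" both leave the crawler nothing to read.
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src", "?"))


def images_missing_alt(html_text):
    audit = AltAudit()
    audit.feed(html_text)
    audit.close()
    return audit.missing
```

An empty `alt=""` is appropriate for purely decorative images, so treat the flagged list as a set of candidates to review, not errors to fix blindly.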

Text that appears in multimedia files (audio and video) cannot be indexed, but those same files can be indexed for AltaVista's Multimedia search.

Information generated by Java applets or written in XML cannot be indexed, and neither can Acrobat (PDF) files. However, technology exists that will enable AltaVista to convert those files to an indexable form.

Exceptionally large pages also present a problem for AltaVista. As a pragmatic compromise to optimize performance, AltaVista fully indexes only the first 64 KB of text on any single page. It harvests the hyperlinks from the whole document to follow later, but it indexes only those first 64 KB. So if you want to post an entire book, it's best to break it up into chapters so that all the text can be indexed.
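If a long text has no convenient chapter breaks, you can group its paragraphs into pages that each stay under the limit. A minimal sketch, assuming "64 KB" means 65,536 bytes of UTF-8 text and that paragraphs are separated by blank lines; the function name is hypothetical, and splitting at real chapter headings would read better where they exist.

```python
LIMIT = 64 * 1024  # assumes "64 KB" means 65,536 bytes of text


def split_for_indexing(text, limit=LIMIT):
    """Group paragraphs into pages whose UTF-8 size stays within limit.

    A single paragraph larger than limit is kept whole in its own page
    (and would still be truncated by the crawler).
    """
    pages, current, size = [], [], 0
    for para in text.split("\n\n"):
        para_size = len(para.encode("utf-8"))
        sep = 2 if current else 0  # bytes for the "\n\n" that rejoins this paragraph
        if current and size + sep + para_size > limit:
            pages.append("\n\n".join(current))
            current, size, sep = [], 0, 0
        current.append(para)
        size += sep + para_size
    if current:
        pages.append("\n\n".join(current))
    return pages
```

Joining the returned pages with blank lines reproduces the original text exactly, so nothing is lost in the split.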

Comments, such as <!--change this every Friday-->, aren't indexed at all. They are intended as private notes, not visible to website visitors except through View/Page Source.
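The sketch below approximates the text a full-text indexer sees on a page, with comments dropped. It uses Python's `html.parser`, which routes comments to `handle_comment` rather than `handle_data`; the names are hypothetical, and it does not skip the contents of <script> or <style> blocks.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects the text a full-text indexer would see; comments are skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

    def handle_comment(self, data):
        # Comments like <!--change this every Friday--> are private notes: ignore them.
        pass


def indexable_text(html_text):
    parser = VisibleText()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)
```

Running a page through a function like this is a quick way to see how little plain text remains once markup and comments are stripped away.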

Also consider technical factors. If a site has a slow connection, the crawler may time out. Very complex pages, too, may time out before the crawler can harvest the text. If your site has a hierarchy of directories, put the most important information high, not deep: AltaVista presumes that the higher you place information, the more important it is, and crawlers may not venture deeper than three, four or five directory levels.
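To spot pages buried too deep in the directory hierarchy, you can count the directory levels in each URL's path. A minimal sketch using `urllib.parse.urlsplit`; the function names are hypothetical, and the default limit of three is the most conservative of the figures given above.

```python
from urllib.parse import urlsplit


def directory_depth(url):
    """Number of directory levels between the site root and the page.

    http://example.com/index.html    -> 0
    http://example.com/a/b/page.html -> 2
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # The last segment is the page itself unless the path ends in "/".
    if path.endswith("/"):
        return len(segments)
    return max(len(segments) - 1, 0)


def too_deep(urls, max_levels=3):
    """URLs whose pages sit more than max_levels directories below the root."""
    return [u for u in urls if directory_depth(u) > max_levels]
```

Pages reported by `too_deep` are candidates for moving higher in the hierarchy or for submitting individually.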

Above all, remember the obvious: full-text search engines such as AltaVista index text. You may well be tempted to use fancy and expensive design techniques, but they can block search engine crawlers or leave your pages with very little plain text to index.

By Ken Mathie



Add your comment

Name:(required)
E-mail address:(optional)
Comment:(required)
Repeat the number for validation: (required)

Browse by Tags:


Related Articles:

Text Link Ads

Statistics

Total 296 articles submitted
Latest submission at January 28, 2008 15:13

Feedback

Use this email below to send us your suggestions and feedback. We value your opinion.
info (at) theitarticles.com