How To Control Search Engine Robots
Wouldn't it be nice to be able to leave some code in your web site that tells the search engine crawlers to make your site number one? Unfortunately, a robots.txt file or robots meta tag won't do that, but they can help the crawlers index your site better and keep the unwanted ones out.
First, a few definitions:
Search Engine Spiders or Crawlers - A web crawler (also known as a web spider) is a program which browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.
A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies.
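The visiting loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PAGES dictionary is a made-up stand-in for real pages (a real crawler would fetch each URL over HTTP and parse its HTML), and the function name crawl and the max_pages limit are hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

def crawl(seeds, max_pages=100):
    """Visit pages breadth-first and return the URLs in the order visited."""
    frontier = deque(seeds)  # the list of URLs still to visit
    seen = set(seeds)        # URLs already queued, so no page is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Stand-in for "fetch the page and extract its hyperlinks".
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

order = crawl(["http://example.com/"])
```

The seen set plays the role of a very simple crawl policy: it stops the crawler from looping forever when pages link back to each other.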
Robots.txt - The robots exclusion standard or robots.txt protocol is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website.
The robots.txt protocol is purely advisory and relies on the cooperation of the web robot, so marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However, the file is necessarily publicly available and is easily checked by anyone with a web browser.
The robots.txt patterns are matched by simple comparison against the start of the URL path, so care should be taken to make sure that patterns matching directories have the final '/' character appended: otherwise all files with names starting with that pattern will match, rather than just those in the directory intended.
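A small Python sketch makes the trailing '/' pitfall concrete. It assumes the simple "path starts with pattern" matching described above; the function name is_blocked and the example paths are hypothetical, and real crawlers may support extra pattern syntax that is not modelled here.

```python
def is_blocked(path, disallow_patterns):
    """Return True if the path starts with any non-empty Disallow pattern."""
    return any(path.startswith(pattern) for pattern in disallow_patterns if pattern)

# Without the final '/', the pattern also catches unrelated files.
without_slash = ["/private"]
with_slash = ["/private/"]

in_directory = is_blocked("/private/notes.html", without_slash)
lookalike_file = is_blocked("/private-offers.html", without_slash)
lookalike_with_slash = is_blocked("/private-offers.html", with_slash)
```

Here /private-offers.html is blocked by the pattern /private but not by /private/, which is usually what the site owner intended.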
Meta Tag - Meta tags are HTML elements that provide structured data about a web page (data about data).
In the early 2000s, search engines veered away from reliance on Meta tags, as many web sites used inappropriate keywords, or were keyword stuffing to obtain any and all traffic possible.
Some search engines, however, still take Meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing websites that cheat (by repeating the same keyword several times to get a boost in the search ranking). Instead of going up in the rankings, these websites go down or, on some search engines, are removed from the index completely.
Index a site - The act of crawling your site and gathering information.
How can the robots.txt file and meta tag help you?
In the robots.txt file you can tell harmful web crawlers to leave your web site alone, and give helpful hints to the ones you want to crawl your site. Here is an example of how to stop a web crawler from crawling your site:
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /
ia_archiver is the crawler name for the Wayback Machine, which you may have heard of, and the / after Disallow tells ia_archiver not to index any of your site. The # lets you write comments to yourself so you can keep track of what you typed.
Type the three lines above into a plain-text editor such as Notepad and save the file as robots.txt in the root directory of your web site. Well-behaved web crawlers look for this file first at a web site before doing anything else. This helps the crawler do its job, and lets the web site owner tell the spider what to do. Say, for instance, you have some data that you don't want the crawlers to see (like duplicate content for other browser referrer pages).
You can deter crawlers from indexing the 'duplicate' directory by typing the rule below into your robots.txt file. If you would rather have the robots.txt file created for you, visit www.rietta.com/robogen. To validate your robots.txt file and make sure it works properly, you can visit www.searchengineworld.com/cgi-bin/robotcheck.cgi
User-agent: *
Disallow: /duplicate/
The * after User-agent says that this rule applies to all crawlers, and /duplicate/ after Disallow tells all crawlers to ignore this directory and not search it. Each group of User-agent and Disallow lines must be separated from the next group by a blank line in order for the file to work correctly. So this is how you would combine the two rules above in one robots.txt file:
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
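If you have Python available, one way to check how a rule-following crawler would read this file is the standard library's urllib.robotparser module. The sketch below is illustrative only: www.example.com and the crawler name ExampleBot are placeholders, and other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The combined robots.txt file from the article, one group per crawler.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the text directly, no network access

SITE = "http://www.example.com"  # placeholder domain

# ia_archiver is shut out of the whole site.
archiver_home = parser.can_fetch("ia_archiver", SITE + "/index.html")
# Any other crawler (ExampleBot is a made-up name) may fetch normal pages...
other_home = parser.can_fetch("ExampleBot", SITE + "/index.html")
# ...but not anything under /duplicate/.
other_duplicate = parser.can_fetch("ExampleBot", SITE + "/duplicate/page.html")
```

In this sketch archiver_home and other_duplicate come out False and other_home comes out True, which matches the intent of the two rules.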
One very important thing to note: anyone can read the robots.txt file of a site. So if you have a directory that you don't want anyone to know about, don't list it in the robots.txt file. If that directory is not linked to from anywhere on your web site, crawlers that discover pages by following links are unlikely to find it anyway.
An alternative way to block indexing is to put a robots meta tag into the page itself. It looks like this:
<meta name="robots" content="noindex,nofollow">
You put this into the <head> section of your web page. This line tells the robot crawlers not to index (search) the page and not to follow any of the hyperlinks on the page. As another example, <meta name="robots" content="noindex,follow"> tells the robot crawlers not to index the page, but to follow the hyperlinks on it.
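To see the meta tag from the crawler's side, here is a rough Python sketch of how a program might read the directives out of a page using the standard html.parser module. The class name RobotsMetaParser, the helper robots_directives and its agent parameter are all hypothetical, and the sample page is made up for the example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from meta tags whose name matches `agent`."""

    def __init__(self, agent="robots"):
        super().__init__()
        self.agent = agent
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != self.agent:
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex,follow".
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())

def robots_directives(html, agent="robots"):
    """Return the set of directives found in the page's robots meta tag."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    return parser.directives

# Made-up sample page using the second tag from the article.
PAGE = '<html><head><meta name="robots" content="noindex,follow"></head><body></body></html>'

directives = robots_directives(PAGE)
may_index = "noindex" not in directives    # should the page be indexed?
may_follow = "nofollow" not in directives  # should its links be followed?
```

For the sample page, may_index is False and may_follow is True: the crawler skips indexing the page but still follows its links.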
Did you know that Google has its own tag?
It looks like this:
<meta name="googlebot" content="noindex,nofollow,noarchive">
This tells the Google robot crawler not to index the page, not to follow any of the links, and not to store a cached version of the page. Blocking the cached version is useful if you update the content on your site frequently, because it prevents web users from seeing outdated content that is still held in the cache. You can use this tag to talk specifically to Google's robots, either to avoid complications or because you are optimizing your site for Google's search engine.
This concludes this month's article.
Until the next article have a great day!
Copyright © Michael Rock Web development contractor (Web Design and Hosting) Internet Presence http://www.TheInternetPresence.com
About the Author
The owner of this registered company has over twenty years' experience with DOS, Windows business applications, numerous programming languages, artistic development, and web design. Other areas of interest include web marketing, web promotion, and business marketing and development. After being persuaded by those who praised his work, he decided to go into business for himself, and he highly recommends that everyone else do the same.