Questions about robots.txt and sitemaps

#1
01-01-2008, 06:01 PM
Casual Pligger
Pligg Version: 9.8.2
Pligg Template: Vera
 
Join Date: Dec 2007
Posts: 59
Hi all!

After following some advice in the Stop Indexing Duplicate Content thread, I thought I had everything ready for Google, but I was wrong.
I think these are simple questions. I hope I can get some good answers.

1. I submitted my sitemaps through Google Webmaster Tools. There were 57 URLs, but Google has indexed only 7. Why? Screenshot:

HERE Google says they can't guarantee that every page will be indexed, but only 7 out of 57... I think something is wrong.
What do you think? Is this normal?
Will they index the other pages later, or are they leaving them out for reasons such as duplicate content?
You can check my robots.txt file HERE, but I think that file is OK.
2. As you can see, my robots.txt asks Google not to index pages such as /search, but they are indexing them anyway:
site:estosesale.com/news/search - Google Search
I think this may have happened because I created the robots.txt file after they crawled my site. Should I expect these results to disappear in the future?

Thanks in advance!

Juan
#2
01-01-2008, 07:02 PM
Pligg Donor
Pligg Version: 9.9.5
 
Join Date: Sep 2007
Posts: 192
Juan...

Yes, you are correct. Google crawled your site BEFORE you added the robots.txt file to your domain. Google crawls sites so fast these days that it's almost a joke for anyone to ask "How do I get my site listed in Google?" (Ha!) The only thing you can do now is wait until Google removes those entries. They will, but how soon is entirely up to them.
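If you want to confirm locally that your rules really do block the search pages, Python's standard `urllib.robotparser` can evaluate a robots.txt against sample URLs. This is a minimal sketch: the `Disallow: /news/search` rule and the example URLs are hypothetical stand-ins, since the actual robots.txt from this thread isn't reproduced here. Keep in mind that `robotparser` does simple prefix matching and does not implement Google's wildcard extensions, and that it only tells you whether crawling is allowed, not when Google will drop already-indexed entries.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real file from the thread is not shown.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/search
"""


def is_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://estosesale.com/news/search?search=foo",  # should be blocked
        "http://estosesale.com/news/story.php?id=1",     # hypothetical story URL
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Because only a `User-agent: *` group is defined, Googlebot falls back to those default rules.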

In Google Webmaster Tools, click your domain name, then Diagnostics, then Content analysis. From there, see how many pages "might" be causing trouble for Google. In particular, take note of the title tag and meta tag issues Google reports. My guess is that you have MANY. To fix these issues, you will have to apply a fix I wrote, located here.

Yes, I know, NOT ANOTHER FIX I HAVE TO IMPLEMENT?!! Welcome to the world of SEO.

After applying the meta/title fix, you should be all right. Indexing takes time, so even if you submit a sitemap, Google will only take what it wants at the time, so don't worry. Eventually, after all the fixes have been implemented, your site will be indexed correctly. Just be patient, though, as you still have a lot to fix before everything is sorted out in Google.
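Before worrying about the gap between submitted and indexed URLs, it can help to confirm how many URLs the sitemap actually lists. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a standard sitemaps.org 0.9 `<urlset>` file; the two sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical two-entry sitemap; a real one would be read from disk or the site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://estosesale.com/news/</loc></url>
  <url><loc>http://estosesale.com/news/story.php?id=1</loc></url>
</urlset>"""

if __name__ == "__main__":
    urls = sitemap_urls(SAMPLE)
    print(len(urls), "URLs listed in the sitemap")
```

If this count matches the "URLs submitted" figure, the sitemap itself is fine and the remaining gap is down to Google's own indexing decisions.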
#3
01-01-2008, 08:24 PM
Casual Pligger
Pligg Version: 9.8.2
Pligg Template: Vera
 
Join Date: Dec 2007
Posts: 59
Thanks, blaze, I REALLY appreciate your effort (I think this isn't the first time I've thanked you, lol).
I had already read that thread and planned to do it, but it seemed difficult, so I haven't been brave enough to try it yet.

Google Webmaster Tools > Diagnostics > Content analysis says "We didn't detect any content issues with your site". I guess they need more time, because I don't think everything is OK.

Anyway, if my girlfriend lets me spend another day on the computer, I'll get to work on the meta/title fix.

Thanks!
#4
01-09-2008, 06:31 PM
New Pligger
Pligg Version: 0.9.8
Pligg Template: Yget
 
Join Date: Dec 2007
Posts: 23
I only got this:
URLs restricted by robots.txt (67). I guess patience is key.
#5
01-09-2008, 07:10 PM
Pligg Donor
Pligg Version: 9.9.5
 
Join Date: Sep 2007
Posts: 192
Quote:
Originally Posted by spahiu
I only got this:
URLs restricted by robots.txt (67). I guess patience is key.
The key is to NOT have anything bad reported. The robots.txt entry being reported is not a bad thing; it's just telling you that the file is working and Google cannot spider those pages, which is what you wanted.

If everything else is clean and no pages show up under duplicate content, then you're in good hands. However, if you start seeing some of your pages listed under duplicate title tags and duplicate meta tags, then you have problems and you need to fix those issues.
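To spot duplicate titles yourself before Google reports them, you could group pages by their `<title>` text. A rough sketch using Python's standard `html.parser`; the `pages` dictionary mapping URL to HTML source is a hypothetical input (fetching the pages is left out), and it checks titles only, not meta descriptions.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._seen_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._seen_title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._seen_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse whitespace so "Foo\n  Bar" and "Foo Bar" compare equal.
        return " ".join("".join(self._parts).split())


def duplicate_titles(pages):
    """Map every title shared by two or more pages to the URLs using it.

    `pages` is a hypothetical dict of {url: html_source}.
    """
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        parser.close()
        by_title[parser.title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    pages = {
        "/news/story.php?id=1": "<html><head><title>Esto se sale</title></head></html>",
        "/news/story.php?id=2": "<html><head><title>Esto se sale</title></head></html>",
        "/news/story.php?id=3": "<html><head><title>Another story</title></head></html>",
    }
    print(duplicate_titles(pages))
```

Any title that maps to more than one URL is a candidate for the "duplicate title tags" list in Webmaster Tools.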
#6
01-10-2008, 08:03 AM
New Pligger
Pligg Version: 0.9.8
Pligg Template: Yget
 
Join Date: Dec 2007
Posts: 23
Quote:
Originally Posted by blaze
The key is to NOT have anything bad reported. The robots.txt entry being reported is not a bad thing; it's just telling you that the file is working and Google cannot spider those pages, which is what you wanted.

If everything else is clean and no pages show up under duplicate content, then you're in good hands. However, if you start seeing some of your pages listed under duplicate title tags and duplicate meta tags, then you have problems and you need to fix those issues.
Thx, yeah, I figured that out. Just pointing out what I have on that page.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Xml sitemaps new version | mihai | Free Modules | 64 | 11-08-2011 04:00 PM
SEO hacks and XML Sitemaps (and RSS feeds) | rethomas07 | Questions and Comments | 8 | 08-17-2009 05:10 AM
XML Sitemaps | graphicsguru | Questions and Comments | 13 | 02-02-2008 11:55 AM
A paid solution to sitemaps | Divisive Cotton | Questions and Comments | 5 | 12-10-2007 02:32 AM
Google Sitemaps | LeoNel | Questions and Comments | 2 | 10-25-2006 09:03 PM

