
Yahoos at Yahoo (Read 1781 times)

    RunningAHEAD.com was unresponsive for a good part of this morning.  I call it unresponsive rather than unavailable because the server was so overloaded that it couldn't keep up with new requests.  If you were lucky, you might have gotten a page every so often.  Normally, this doesn't happen because the server has plenty of capacity.

    Saturday night is the least busy time of the week for the web server.  The server schedules its weekly maintenance tasks during this down time to reduce user impact.  This works great most of the time.

    Along with legitimate user traffic, RA also gets requests from search engines such as Google, Yahoo and MSN.  These crawlers download new pages and index them so that they can point you to the right places when you search for information.  Since the search engines do not know when information is updated, they download pages from each website periodically to see if anything has changed.

    Every day, Yahoo requests the most pages by far.  In fact, Yahoo downloads more pages from RA than Google, Bing (MSN) and all the other web crawlers combined.  I don't know why it has to crawl so frequently, especially when it is responsible for only 4.75% of all search engine traffic to RA.  Google, on the other hand, is responsible for 89.8% of the referrals.  Server logs show that the Yahoo crawler requests pages from RA every minute of every hour of every day.

    Normally, this is no problem because the RA server is capable of handling the load.  However, when the server went into maintenance mode last night, Yahoo doubled and even tripled its crawl rate.  In maintenance mode, the server could not handle this spike in traffic and the requests started to back up, resulting in RA being unresponsive.

    Perhaps this is why Google is so dominant in the search engine market.  It doesn't download as frequently or as much as Yahoo, yet new forum posts on RA will appear on Google within a few hours, sometimes within a few minutes.  The Google crawler reduces its crawl rate if the server takes longer to return the requested page.  The Yahoo crawler does the exact opposite by increasing its crawl rate, thus making the problem worse.
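
    To illustrate the kind of backoff described above, here is a minimal sketch of a crawler that waits longer between requests when the server responds slowly.  This is not Google's actual algorithm, which I don't know.  The function name, the 0.5 second target and the delay limits are all made-up values for illustration.

```python
def next_delay(current_delay, response_time, target_time=0.5,
               min_delay=1.0, max_delay=60.0):
    """Return the pause (in seconds) a polite crawler waits before its next request.

    All thresholds here are hypothetical; a real crawler would tune them.
    """
    if response_time > target_time:
        # Server is struggling: back off sharply by doubling the pause.
        new_delay = current_delay * 2
    else:
        # Server is healthy: speed up again, but only gradually.
        new_delay = current_delay * 0.9
    # Keep the pause within the allowed range.
    return max(min_delay, min(max_delay, new_delay))
```

    A crawler that increases its rate when responses slow down would be doing the reverse of the first branch, which is why it makes an overloaded server worse.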

    Other websites have reported similar problems for years.  Given that Yahoo generates so little traffic while consuming so many resources, I am considering banning Yahoo from crawling RunningAHEAD.


    Needs more cowbell!

      Perhaps this is why Google is so dominant in the search engine market.  It doesn't download as frequently or as much as Yahoo, yet new forum posts on RA will appear on Google within a few hours, sometimes within a few minutes.  The Google crawler reduces its crawl rate if the server takes longer to return the requested page.  The Yahoo crawler does the exact opposite by increasing its crawl rate, thus making the problem worse.

      Other websites have reported similar problems for years.  Given that Yahoo generates so little traffic while consuming so many resources, I am considering banning Yahoo from crawling RunningAHEAD.

       

      I think I would, if it were my choice to make.  I haven't used Yahoo for anything in years.  But I've seen threads/posts appear in Google searches within hours of creation, as you pointed out (noticed this just recently when looking at info. re: ankle sprain recovery).  Very cool.

      Kirsten - aka "Auntie Kirsten"

      '14 Goals:

      • 2 olympic distance duathlons -- 6 days apart -- PR at least 1

      • 130#s (and stay there, gotdammit!)


         Given that Yahoo generates so little traffic while consuming so many resources, I am considering banning Yahoo from crawling RunningAHEAD.

         I agree with Zoomy. I'd go ahead with banning Yahoo from crawling RA.

        Use your momentum...keep going.  You know you can make it.


        A Saucy Wench

          who uses yahoo for searches anyway? Ban the yahoos!

           

           

          But I really do appreciate the explanation behind the scenes! 

          I have become Death, the destroyer of electronic gadgets

           

          "When I got too tired to run anymore I just pretended I wasnt tired and kept running anyway" - dd, age 7

            Eric, if you are looking for feedback, I would recommend (although I do not know if it is possible) limiting the number of requests per hour or minute that the server answers for Yahoo.  I would not totally dismiss 4.75% of search engine traffic.  While a revenue stream is not your primary intent today, it is still traffic that may have an impact on future growth or site value, since overall traffic is one metric of a site's value (for sale or advertising potential).


            Anyway, that is my 2 cents.


            Dennis


            A Dance with Monkeys

              What is Yahoo?

              Test: once upon a time there were fools who ran the harpeth hills flying monkey marathon.  And then they drank beer.

              Now, look at the time stamp...


              A Dance with Monkeys

                Not on the search engines yet...


                A Dance with Monkeys


                A Saucy Wench

                  Not on the search engines yet...

                   Now I have the "can you hear me now" commercial in my head. 

                  I have become Death, the destroyer of electronic gadgets

                   

                  "When I got too tired to run anymore I just pretended I wasnt tired and kept running anyway" - dd, age 7


                  A Dance with Monkeys

                    Not on the search engines yet...

                     

                    Are you looking at the right place?  Click

                      Dennis,
                      I could try throttling Slurp, but from what I read, Yahoo sometimes does not honor the request.
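
                      If the crawler ignores a polite request to slow down, the limit can be enforced on the server instead.  Below is a rough sketch of a token bucket that caps how many requests per minute get a real response.  It is not how RA works; the class name, the function name and the numbers are all hypothetical, and matching on "slurp" in the user agent string is an assumption about how the crawler identifies itself.

```python
import time


class CrawlerThrottle:
    """Token bucket: allows `rate_per_minute` requests on average, with short bursts."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0   # tokens added per second
        self.capacity = float(burst)         # maximum tokens the bucket holds
        self.tokens = float(burst)           # start with a full bucket
        self.clock = clock                   # injectable clock, handy for testing
        self.last = clock()

    def allow(self):
        """Return True if a request may be served now, consuming one token."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(user_agent, throttle):
    """Pick an HTTP status: 503 for a throttled crawler request, 200 otherwise."""
    # Assumption: the crawler's user agent string contains "Slurp".
    if "slurp" in user_agent.lower() and not throttle.allow():
        return 503  # Service Unavailable; a Retry-After header could accompany it
    return 200
```

                      Regular visitors never touch the bucket, so only the crawler gets turned away when it exceeds its allowance.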


                      A Dance with Monkeys

                        Are you looking at the right place?  Click

                         

                        Your text string has been around longer than mine.  Mine is still not there.


                        A Dance with Monkeys

                          I just went to see Despicable Me.  Good show, had fun.  The popcorn was not bad.  However, upon my return, Google matches the string "once upon a time there were fools who ran the harpeth hills flying monkey marathon.  And then they drank beer" while Yahoo does not.
                            On a related note, Google is scary.  Here's the Google result when I searched for "Yahoos at Yahoo":

                            RunningAHEAD - Topic: Yahoos at Yahoo

                            7 posts - 5 authors - Last post: 4 hours ago
                            who uses yahoo for searches anyway? Ban the yahoos! But I really do appreciate the explanation behind the scenes! ...
                            www.runningahead.com/.../f475caced4ce4266ae492ddeefb026b7 - 4 hours ago

                            I created this forum from scratch (you may criticize my insanity some other time).  There is no other forum like it, yet it is evident that Google figured out that it's a forum, and its format.

