Dee-Nee Forums

General => News => Topic started by: Gantry on 10/11/11, 02:21:49 PM

Title: Baidu
Post by: Gantry on 10/11/11, 02:21:49 PM
These guys LOVE the forums:

http://en.wikipedia.org/wiki/Baidu

14 mofos right now spidering away, seems like they are always on. 

Tried searching for "harry spilman" "rbi baseball" and "gantry rbi baseball" and no hits for dee-nee on the first page :'(

try yourself - http://www.baidu.com/
Title: Re: Baidu
Post by: Shooty on 10/11/11, 06:11:52 PM
More like Bye-du.   ;D
Title: Re: Baidu
Post by: Gerlost on 10/11/11, 08:22:54 PM
More like Bi Dudes!  ^-^
Title: Re: Baidu
Post by: nightwulf on 10/14/11, 04:34:28 PM
Quote
According to the China Digital Times, Baidu has a long history of being the most proactive and restrictive online censor in the search arena. Documents leaked in April 2009 from an employee in Baidu's internal monitoring and censorship department show a long list of blocked websites and censored topics on Baidu search.[39] In May 2011, pro-democracy activists sued Baidu for violating the U.S. constitution by the censorship it conducts, in accord with the demand of the Chinese government.[40]

Maybe that explains the lack of hits even though their spiders are here 24 hours a day. Clearly our debauchery is a threat to communism.
Title: Re: Baidu
Post by: Gantry on 10/14/11, 05:27:21 PM
FREE DEE NEE, OCCUPY BEIJING !!!!
Title: Re: Baidu
Post by: nomaaa on 10/16/11, 02:30:57 AM
for some reason, i don't think smoking pot on the lawn of city hall in beijing would go down as smoothly
Title: Re: Baidu
Post by: fightonusc on 10/18/11, 03:11:26 AM
(http://cdn.uproxx.com/assets/images/155/155_8200bd30e1295fe9cc6d80b5f200742c.jpg)

imo
Title: Re: Baidu
Post by: nomaaa on 10/18/11, 06:23:21 PM
that's a good opinion to have imo
Title: Re: Baidu
Post by: Gantry on 10/24/11, 12:17:19 PM
20 Baidu spiders all up in my shit right now
Title: Re: Baidu
Post by: fknmclane on 10/24/11, 05:34:15 PM
Quote from: MikeDEEK!
/shits pants
//beats off with own shit
Title: Re: Baidu
Post by: Mike D. on 10/24/11, 05:51:10 PM
I really hate spiders.
Title: Re: Baidu
Post by: Shooty on 10/24/11, 07:10:46 PM
Users active in past 5 minutes:
Shooty, BDawk, Yahoo!, Google, Baidu (19)


At least Yahoo! and Google have joined the forums.  Good to have those guys on board.
Title: Re: Baidu
Post by: Gantry on 10/31/11, 11:48:02 AM
Going to have to figure out a way to block them, they request 3 times as many pages as all normal users combined.  Forums running like shit today, web & db services periodically using up a ton of CPU.

Their IP ranges are consistent, may just block them via apache or iptables...
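
If the crawler addresses really do sit in a few consistent ranges, the block list can be sanity-checked before it goes into apache or iptables. A minimal Python sketch, assuming a 180.76.5.0/24 range purely for illustration (the real list would have to come from the access logs) and hypothetical helper names is_blocked and iptables_rules:

```python
import ipaddress

# Assumed range for illustration only; not a confirmed list of Baidu networks.
BLOCKED_NETWORKS = [ipaddress.ip_network("180.76.5.0/24")]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def iptables_rules(networks=BLOCKED_NETWORKS):
    """Render one DROP rule per network as a shell command string."""
    return [f"iptables -A INPUT -s {net} -j DROP" for net in networks]


if __name__ == "__main__":
    print(is_blocked("180.76.5.88"))
    print(is_blocked("8.8.8.8"))
    for rule in iptables_rules():
        print(rule)
```

Dropping at the firewall keeps the requests from ever reaching httpd, which is probably the cheaper option if the web server is the part that's struggling.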
Title: Re: Baidu
Post by: GDavis on 10/31/11, 11:51:29 AM
I'd block them via apache

Tommy Seebach Band - apache (http://www.youtube.com/watch?v=yo4glASbEh4#)
Title: Re: Baidu
Post by: Gantry on 10/31/11, 11:59:32 AM
Haven't posted that on Sperling's wall in awhile, guess it's time
Title: Re: Baidu
Post by: Reds on 10/31/11, 12:03:37 PM
Might I suggest you try fucking Baidu?
Title: Re: Baidu
Post by: Gantry on 10/31/11, 12:04:02 PM
fuck this, I'm rebooting.  Probably won't fix but there's a new kernel to install anyway.

[ultimate]Are the forums slow for anyone else this morning?[/ultimate]
Title: Re: Baidu
Post by: Reds on 10/31/11, 12:07:53 PM
Maybe berating it?
Title: Re: Baidu
Post by: Gantry on 10/31/11, 12:10:06 PM
Speedy after the reboot so far...
Title: Re: Baidu
Post by: Reds on 10/31/11, 12:13:22 PM
Berating usually does work.
Title: Re: Baidu
Post by: Gantry on 10/31/11, 12:14:13 PM
Then why do I always fuck it first?
Title: Re: Baidu
Post by: Reds on 10/31/11, 01:25:54 PM
What kind of question is that?
Title: Re: Baidu
Post by: Shooty on 10/31/11, 04:40:05 PM
Quote from: Reds on 10/31/11, 12:13:22 PM
Berating usually does work.

So does bereting:

(http://i55.tinypic.com/1z6h7kn.jpg)
Title: Re: Baidu
Post by: fknmclane on 10/31/11, 05:13:34 PM
Forums were indeed slow as shit this morning.
Title: Re: Baidu
Post by: Gantry on 10/31/11, 05:24:23 PM
Yeah the reboot did wonders.  don't know what the issue was, but I falsely accused Baidu.  I apologize to all of China...
Title: Re: Baidu
Post by: GDavis on 10/31/11, 05:31:38 PM
Have you checked the kernel?
Title: Re: Baidu
Post by: Gantry on 12/28/11, 06:26:03 PM
Site's been up 2 minutes and there are already 25 separate connections from Baidu. My uninformed guess is that they, along with an apache memory leak, are causing the issues.
Title: Re: Baidu
Post by: TβG on 12/28/11, 06:55:36 PM
so can you ban whoever that is?
Title: Re: Baidu
Post by: Gantry on 12/29/11, 06:02:36 PM
Not easily, but I'm sure there's a way to tweak the firewall to not allow any access from China. 
Title: Re: Baidu
Post by: nightwulf on 12/29/11, 07:30:42 PM

[nightwulf@deenee httpd]# grep "180\.76\.5" access_log | grep "robots"
180.76.5.88 - - [25/Dec/2011:14:54:36 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.50 - - [26/Dec/2011:21:24:28 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.101 - - [27/Dec/2011:11:41:30 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.169 - - [29/Dec/2011:17:30:01 -0600] "GET /robots.txt HTTP/1.1" 404 291


Names changed to protect the innocent. It appears Baidu is at least reading robots.txt. I'll try using that to get them off our backs, though it may take a day or two to see any change. If that doesn't work, I guess we're left with IP banning. Could maybe come up with something based on user-agent, but I don't like the overhead that would create.
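
The same check as the grep above can be turned into a count per address block, which helps when deciding what an IP ban would need to cover. A rough Python sketch, assuming the log is in the usual common log format shown above; the regex and the hits_by_prefix helper are made up for illustration. Note that the 404s in those lines mean there was no robots.txt in place when the spiders asked for it.

```python
import re
from collections import Counter

# Common log format prefix: client IP, two fields, [timestamp], then "METHOD PATH ...
LOG_LINE = re.compile(r'^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')

# The four lines from the grep output above.
SAMPLE = """\
180.76.5.88 - - [25/Dec/2011:14:54:36 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.50 - - [26/Dec/2011:21:24:28 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.101 - - [27/Dec/2011:11:41:30 -0600] "GET /robots.txt HTTP/1.1" 404 291
180.76.5.169 - - [29/Dec/2011:17:30:01 -0600] "GET /robots.txt HTTP/1.1" 404 291
"""


def hits_by_prefix(lines, path=None):
    """Count requests per first-three-octets prefix, optionally for one path only."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, _method, req_path = match.groups()
        if path is not None and req_path != path:
            continue
        counts[".".join(ip.split(".")[:3])] += 1
    return counts


if __name__ == "__main__":
    for prefix, hits in hits_by_prefix(SAMPLE.splitlines(), "/robots.txt").items():
        print(prefix, hits)
```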
Title: Re: Baidu
Post by: nightwulf on 12/29/11, 07:38:38 PM
Robots.txt now politely asking all Baidu spider user-agents (as found here (http://www.baidu.com/search/spider_english.html)) to fuck off. We'll see if they actually follow it.
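
For anyone wanting to check that rules like these say what they're meant to say, Python's standard urllib.robotparser can be pointed at the text directly. A minimal sketch: the robots.txt below is an assumption (the actual file isn't shown in this thread), "Baiduspider" is the commonly documented user-agent token, and is_allowed is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration; the forum's real robots.txt may differ.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the given user agent may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/index.php"))
```

This only shows what a well-behaved crawler should do with the file; whether a given spider actually honors it still has to be read off the access log.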
Title: Re: Baidu
Post by: nightwulf on 12/29/11, 10:20:27 PM
So far Google and MSN have parsed robots.txt; haven't seen Baidu read it (though there are still 19+ of their spiders spidering). Found a couple old httpd processes going absolutely batshit on CPU usage; restarted httpd and they all came back. 95+% usage at times. Just saw one at 99.5%. Total CPU usage is constantly high, all in httpd processes. Maybe we need to look at the web server ...
Title: Re: Baidu
Post by: Reds on 12/29/11, 10:26:24 PM
Nightwulf, have you tried fucking it?
Title: Re: Baidu
Post by: Felonious Gunk on 12/29/11, 10:37:09 PM
Quote from: Reds on 12/29/11, 10:26:24 PM
Nightwulf, have you tried fucking it?
Also, is it possible you acted in error by berating it before fucking it?
Title: Re: Baidu
Post by: Gantry on 12/30/11, 09:21:11 AM
Quote from: nightwulf on 12/29/11, 10:20:27 PM
So far Google and MSN have parsed robots.txt; haven't seen Baidu read it (though there are still 19+ of their spiders spidering). Found a couple old httpd processes going absolutely batshit on CPU usage; restarted httpd and they all came back. 95+% usage at times. Just saw one at 99.5%. Total CPU usage is constantly high, all in httpd processes. Maybe we need to look at the web server ...

Something is definitely up with apache, last night I noticed total memory usage per top with only 75MB of RAM free.  Restarted httpd and it went to 2.7GB free.  Restarting mysql only freed up 50MB or so.  I think there's a memory leak or something of the sort with apache. 
Title: Re: Baidu
Post by: TβG on 12/30/11, 10:34:28 AM
do the servers have alzheimer's?
Title: Re: Baidu
Post by: nightwulf on 12/30/11, 01:39:45 PM
Quote from: Reds on 12/29/11, 10:26:24 PM
Nightwulf, have you tried fucking it?

I wouldn't fuck it with Mike D's dick.
Title: Re: Baidu
Post by: nightwulf on 12/30/11, 01:42:18 PM
Quote from: Gantry on 12/30/11, 09:21:11 AM
Something is definitely up with apache, last night I noticed total memory usage per top with only 75MB of RAM free.  Restarted httpd and it went to 2.7GB free.  Restarting mysql only freed up 50MB or so.  I think there's a memory leak or something of the sort with apache.

Well, I'll look around the internets but I'm more of a "./configure; make; make install" guy, so CentOS is really your ball. In the meantime I'll just restart httpd a couple times a day or so.
Title: Re: Baidu
Post by: nightwulf on 12/30/11, 09:29:36 PM
Baidu spiders appear to be laying off. httpd CPU usage is still high, but significantly lower than it was yesterday. Damn Chinese ...
Title: Re: Baidu
Post by: Flood on 04/26/12, 01:33:13 PM
bump...any news on CPU usage?
Title: Re: Baidu
Post by: nightwulf on 04/26/12, 05:54:41 PM
Why yes. Nowhere near as bad as it was, still spiking occasionally.

Given that and the shutdowns we've had recently, maybe we should hold a Dee-Nee bake sale for some new hardware. Socket AM3 stuff would be fine for our needs and it's dirt cheap ... DDR3 RAM is stupid cheap ...
Title: Re: Baidu
Post by: Flood on 04/27/12, 08:12:51 AM
Somebody set up the donation thingy and I'm sure you'll get plenty of cash/cache.
Title: Re: Baidu
Post by: Gantry on 04/30/12, 08:31:40 PM
Really need to put this thing on a virtual machine, upgrade CentOS (I can't upgrade Mediawiki anymore thanks to php, the fuck?) and put in better hardware.  Oh yeah, and get rid of the main site - still running that on another server running RedHat 7 software from over 10 years ago.

In reality though, the site is plenty fine performance wise - biggest performance issues are that the bandwidth simply blows at my office.  Since I have a static IP and a 50/5 connection at home, I really should move it there.  It's a 3 megabit double-T1 at the office and it's completely saturated...

I should do lots of things, none of which will probably get done.  Hope that helps!
Title: Re: Baidu
Post by: Flood on 05/01/12, 09:53:30 AM
internet slumlord imo
Title: Re: Baidu
Post by: nightwulf on 05/04/12, 06:16:24 PM
Quote from: Gantry on 04/30/12, 08:31:40 PM
Really need to put this thing on a virtual machine, upgrade CentOS (I can't upgrade Mediawiki anymore thanks to php, the fuck?) and put in better hardware.  Oh yeah, and get rid of the main site - still running that on another server running RedHat 7 software from over 10 years ago.

In reality though, the site is plenty fine performance wise - biggest performance issues are that the bandwidth simply blows at my office.  Since I have a static IP and a 50/5 connection at home, I really should move it there.  It's a 3 megabit double-T1 at the office and it's completely saturated...

I should do lots of things, none of which will probably get done.  Hope that helps!

I agree that we're not really "bad." The power supply issues recently gave me the idea to look at the server hardware (what I can glean from a terminal window anyway). A low-power triple core AM3, motherboard, and 16 gigs of RAM wouldn't even run $200, and that'd be enough hardware for Dee-Nee for decades. The RAM would be killer if you're wanting to run things on different VMs, and I'm sure offloading httpd and mysql to different cores would make a world of difference. Plus I've been looking at other add-ons (replacing the shitty chat is a big one) and those ajax-based ones always warn "if you're on a shared machine, you run the risk of a nasty e-mail from your sysadmin regarding processor time."

But I don't mean to sound like I'm bitching. I'm fine either way. Just throwing things out there.
Title: Re: Baidu
Post by: fknmclane on 05/04/12, 06:58:39 PM
Fucking nightwulf, always bitching.
Title: Re: Baidu
Post by: fightonusc on 05/04/12, 07:27:46 PM
What a fucking Bitchy McBitcherson.
Title: Re: Baidu
Post by: rdub on 05/04/12, 08:15:57 PM
I don't know if this is going to be enough for nightwulf

(http://2.bp.blogspot.com/-5cVa12RVnEc/T5814l0JwiI/AAAAAAAAACI/gPl0rLHD1p0/s1600/ChedCheeseWheel.gif)