SQUID - zamrzani

Dobry den,

zacal se mi objevovat problem s funkcnosti SQUIDa kde se nemohu dopatrat po pricine. Proste a jednoduse po nejake dobe behu prestane odpovidat na request pozadavky. Prochazel jsem logy zda nekde nenajdu nejaky zaznam proc se tak chova, ale nikde nic. Jedine co se objevi priblizne kolem casu kdy prestane fungovat je tento zapis v cache.log:

2010/01/06 13:55:06| WARNING: HTTP header contains NULL characters {Accept: /^M
Content-Type: application/x-www-form-urlencoded}

Nicmene to si nemyslim ze bude ten problem. Nasledne staci tedy SQUID restartovat a dalsi dva, tri dny bezi bez problemu. Osobne mam podezreni na nadestatek pameti, ale tak nejak nevim jak to overit. Prosim poradite mi nekdo zda lze nekde vycist kde by mohl byt problem, ci zda zkusit nejak podrobnejsi debug zaznam provozu?

Dekuju Martin S.

Zapnout debug by nebylo od veci -d a -X (viz man squid). Pokud se tim nechces vic zaobirat, dej si do cronu kazdy den restart squidu… jinak tez

11.20 [47]Debugging Squid

If you believe you have found a non-fatal bug (such as incorrect HTTP
processing) please send us a section of your cache.log with debugging to
demonstrate the problem. The cache.log file can become very large, so
alternatively, you may want to copy it to an FTP or HTTP server where we
can download it.

It is very simple to enable full debugging on a running squid process.
Simply use the -k debug command line option:

     % ./squid -k debug

Zkusil jsme dve veci. Doslo k rozsireni pameti na serveru a SQUID spusten v debug modu. Takze uvidime co bude dal. Diky za radu

Mas posledni verzi Squid? Vim ze pred par mesici byl bug ve Squidu a projevovalo se to hlavne na Red Hat. Snad bylo i oficialni ohlaseni protoze jsme to dostali u nas v praci.
Meli jsme tu uplne stejny problem. Squid prestal fungovat po par dnech a zadna stopa v logu. Nakonec jsme updatovali squid a vsechno je OK.

FL

Mam verzi squid-3.0.STABLE20-2.fc11.i586, coz je jak se divam posledni. Uz mi to zacina dost zneprijemnovat zivot tak mozna zkusim odnistalovat balicek a zkusim zdrojovou verzi primo od squida. Bohuzel ani pravidelny restart behen noci nepomaha.
M.

Pokud muzes pust na proces zamrzleho, ci neodpovidajiciho squidu strace, treba ti ukaze v jake smycce se mota, nebo na co ceka.

Mimochodem v repu jsou dva squidy
squid-3.0.STABLE13-1.fc11.i386.rpm
squid-3.0.STABLE20-2.fc11.i586.rpm

Zkus ten druhy. Jak ses na tom se SELinuxem?

OK, zkusim ten druhy SQUID a uvidime. Jina SELinux mam disable.
Jeste jedna vec je to divny, ale cca 10 minut pred timto zvlastnim stavem se v cache.log objevi tento zaznam:

2010/01/18 15:00:47| WARNING: HTTP header contains NULL characters {Accept: /^M
Content-Type: application/x-www-form-urlencoded}

psal jsme to uz na zacatku a podle toho co jsme nasel by to nemelo na to mit vliv, ale zarazi me ta pravidelnost. No nic jdu zkouset dal …

Diky M.

Tak se to chova stejne i s verzi squid-3.0.STABLE13-1.fc11.i386.rpm. Strace sjme sice pustil, ale nejak se v te zmeti zaznamu nevyznam, takze mi to asi nepomuze. Jdu zkusit instalovat zdrojovou verzi. To uz je asi posledni poku :frowning:
M.

Vystup si muzes presmerovat do souboru. Neni potreba rozumet kazdemu radku, jde o to ziskat obecny prehled co asi squid dela kdyz je “zasekly”. Pokud strace vypisuje spoustu informaci, znamena to, ze squid neco dela, tzn. neni skutecne zamrzly, otazka je co dela a to chce ten vypis trochu prostudovat.

Bohuzel problem stale trva. Probehl upgrade na novou verzi SQUIDa, ale nepomohlo to. Mam vystup z strace v dobe kdy je squid “zamrzly”, ale nic moc jsme tam nevycetl. Pripadne log neakm vystavim pokud by mel nekdo chut a naladu mi s timto problemem pomoci. Prikladam cast vypisu:

09:27:39.115527 gettimeofday({1264667259, 115552}, NULL) = 0
09:27:39.115591 gettimeofday({1264667259, 115614}, NULL) = 0
09:27:39.115655 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.125528 gettimeofday({1264667259, 125553}, NULL) = 0
09:27:39.125591 gettimeofday({1264667259, 125615}, NULL) = 0
09:27:39.125656 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.135527 gettimeofday({1264667259, 135553}, NULL) = 0
09:27:39.135591 gettimeofday({1264667259, 135614}, NULL) = 0
09:27:39.135655 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.145528 gettimeofday({1264667259, 145553}, NULL) = 0
09:27:39.145591 gettimeofday({1264667259, 145615}, NULL) = 0
09:27:39.145656 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.155527 gettimeofday({1264667259, 155552}, NULL) = 0
09:27:39.155590 gettimeofday({1264667259, 155613}, NULL) = 0
09:27:39.155654 epoll_wait(6, {{EPOLLIN, {u32=13, u64=13805992705920073741}}}, 16384, 10) = 1
09:27:39.158331 gettimeofday({1264667259, 158362}, NULL) = 0
09:27:39.158412 accept(13, {sa_family=AF_INET, sin_port=htons(2317), sin_addr=inet_addr(“192.168.1.133”)}, [16]) = 15
09:27:39.158538 getsockname(15, {sa_family=AF_INET, sin_port=htons(3128), sin_addr=inet_addr(“192.168.1.1”)}, [16]) = 0
09:27:39.158681 fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
09:27:39.158746 fcntl64(15, F_SETFD, FD_CLOEXEC|0x2) = 0
09:27:39.158818 fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
09:27:39.158878 fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:27:39.158949 epoll_ctl(6, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=15, u64=20831755321737231}}) = 0
09:27:39.159033 accept(13, 0x13b74e0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
09:27:39.159119 gettimeofday({1264667259, 159145}, NULL) = 0
09:27:39.159191 epoll_wait(6, {{EPOLLIN, {u32=15, u64=20831755321737231}}}, 16384, 10) = 1
09:27:39.159259 gettimeofday({1264667259, 159281}, NULL) = 0
09:27:39.159323 read(15, “GET http://www.idnescz/ HTTP/1.1”…, 4095) = 404
09:27:39.159401 gettimeofday({1264667259, 159425}, NULL) = 0
09:27:39.159637 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 16
09:27:39.159734 fcntl64(16, F_GETFL) = 0x2 (flags O_RDWR)
09:27:39.159794 fcntl64(16, F_SETFD, FD_CLOEXEC|0x2) = 0
09:27:39.159854 bind(16, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr(“0.0.0.0”)}, 16) = 0
09:27:39.159926 fcntl64(16, F_GETFL) = 0x2 (flags O_RDWR)
09:27:39.159981 fcntl64(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:27:39.160044 setsockopt(16, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:27:39.160142 sendto(7, “\312j\1\0\0\1\0\0\0\0\0\0\3www\7idnescz\0\0\1\0\1”, 29, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“194.228.2.1”)}, 16) = 29
09:27:39.160310 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.169528 gettimeofday({1264667259, 169555}, NULL) = 0
09:27:39.169596 gettimeofday({1264667259, 169621}, NULL) = 0
09:27:39.169663 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.179528 gettimeofday({1264667259, 179553}, NULL) = 0
09:27:39.179591 gettimeofday({1264667259, 179615}, NULL) = 0
09:27:39.179656 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.189545 gettimeofday({1264667259, 189574}, NULL) = 0
09:27:39.189614 gettimeofday({1264667259, 189638}, NULL) = 0
09:27:39.189680 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.199528 gettimeofday({1264667259, 199553}, NULL) = 0
09:27:39.199591 gettimeofday({1264667259, 199615}, NULL) = 0
09:27:39.199657 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.209536 gettimeofday({1264667259, 209565}, NULL) = 0
09:27:39.209604 gettimeofday({1264667259, 209628}, NULL) = 0
09:27:39.209670 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.219528 gettimeofday({1264667259, 219553}, NULL) = 0
09:27:39.219591 gettimeofday({1264667259, 219614}, NULL) = 0
09:27:39.219675 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.229524 gettimeofday({1264667259, 229549}, NULL) = 0
09:27:39.229588 gettimeofday({1264667259, 229611}, NULL) = 0
09:27:39.229652 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.239527 gettimeofday({1264667259, 239551}, NULL) = 0
09:27:39.239590 gettimeofday({1264667259, 239613}, NULL) = 0
09:27:39.239654 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.249527 gettimeofday({1264667259, 249551}, NULL) = 0
09:27:39.249589 gettimeofday({1264667259, 249612}, NULL) = 0
09:27:39.249653 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.260175 gettimeofday({1264667259, 260210}, NULL) = 0
09:27:39.260258 gettimeofday({1264667259, 260283}, NULL) = 0
09:27:39.260334 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.269532 gettimeofday({1264667259, 269558}, NULL) = 0
09:27:39.269595 gettimeofday({1264667259, 269619}, NULL) = 0
09:27:39.269661 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.279527 gettimeofday({1264667259, 279553}, NULL) = 0
09:27:39.279590 gettimeofday({1264667259, 279614}, NULL) = 0
09:27:39.279655 epoll_wait(6, {}, 16384, 10) = 0
09:27:39.289526 gettimeofday({1264667259, 289551}, NULL) = 0
09:27:39.289589 gettimeofday({1264667259, 289612}, NULL) = 0

sendto(7, “\312j\1\0\0\1\0\0\0\0\0\0\3www\7idnescz\0\0\1\0\1”, 29, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(“194.228.2.1”;)}, 16) = 2

Zeby cekal na odpovad od DNS? Ta invalidni adresa tam je schvalne?

Omlouvam se, ato je moje chyba. Pochopitelne to neni validni dotaz. v URL byl preklep tak to ani nemohlo fungovat. Toto je jiz spravny vypis.

09:27:07.065527 gettimeofday({1264667227, 65554}, NULL) = 0
09:27:07.065593 gettimeofday({1264667227, 65617}, NULL) = 0
09:27:07.065659 epoll_wait(6, {}, 16384, 10) = 0
09:27:07.075541 gettimeofday({1264667227, 75572}, NULL) = 0
09:27:07.075611 gettimeofday({1264667227, 75636}, NULL) = 0
09:27:07.075679 epoll_wait(6, {{EPOLLIN, {u32=13, u64=13805992705920073741}}}, 16384, 10) = 1
09:27:07.077870 gettimeofday({1264667227, 77902}, NULL) = 0
09:27:07.077951 accept(13, {sa_family=AF_INET, sin_port=htons(2290), sin_addr=inet_addr(“192.168.1.104”)}, [16]) = 16
09:27:07.078047 getsockname(16, {sa_family=AF_INET, sin_port=htons(3128), sin_addr=inet_addr(“192.168.1.1”)}, [16]) = 0
09:27:07.078124 fcntl64(16, F_GETFL) = 0x2 (flags O_RDWR)
09:27:07.078183 fcntl64(16, F_SETFD, FD_CLOEXEC|0x2) = 0
09:27:07.078244 fcntl64(16, F_GETFL) = 0x2 (flags O_RDWR)
09:27:07.078301 fcntl64(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:27:07.078372 epoll_ctl(6, EPOLL_CTL_ADD, 16, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=16, u64=20831755321737232}}) = 0
09:27:07.078572 accept(13, 0x13b74e0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
09:27:07.078756 gettimeofday({1264667227, 78796}, NULL) = 0
09:27:07.078880 epoll_wait(6, {{EPOLLIN, {u32=16, u64=20831755321737232}}}, 16384, 10) = 1
09:27:07.078966 gettimeofday({1264667227, 78992}, NULL) = 0
09:27:07.079042 read(16, “GET http://www.seznam.cz/ HTTP/1”…, 4095) = 2436
09:27:07.079130 gettimeofday({1264667227, 79164}, NULL) = 0
09:27:07.079397 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 18
09:27:07.079508 fcntl64(18, F_GETFL) = 0x2 (flags O_RDWR)
09:27:07.079572 fcntl64(18, F_SETFD, FD_CLOEXEC|0x2) = 0
09:27:07.079633 bind(18, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr(“0.0.0.0”)}, 16) = 0
09:27:07.079707 fcntl64(18, F_GETFL) = 0x2 (flags O_RDWR)
09:27:07.079763 fcntl64(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:27:07.079826 setsockopt(18, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:27:07.079914 read(18, 0xbf97b18d, 65535) = -1 ENOTCONN (Transport endpoint is not connected)
09:27:07.079986 close(18) = 0
09:27:07.080521 epoll_ctl(6, EPOLL_CTL_MOD, 16, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=16, u64=24569716899053584}}) = 0
09:27:07.080644 epoll_wait(6, {{EPOLLOUT, {u32=16, u64=24569716899053584}}}, 16384, 10) = 1
09:27:07.080720 gettimeofday({1264667227, 80746}, NULL) = 0
09:27:07.080789 write(16, “HTTP/1.0 503 Service Unavailable”…, 4243) = 4243
09:27:07.080928 gettimeofday({1264667227, 80958}, NULL) = 0
09:27:07.081026 epoll_wait(6, {{EPOLLOUT, {u32=16, u64=24569716899053584}}}, 16384, 10) = 1
09:27:07.081103 gettimeofday({1264667227, 81128}, NULL) = 0
09:27:07.081169 write(16, " 28 Jan 2010 08:27:07 GMT by mai"…, 84) = 84
09:27:07.081261 gettimeofday({1264667227, 81287}, NULL) = 0
09:27:07.081432 write(8, “1264667227.081 2 192.168.1.”…, 113) = 113
09:27:07.081669 write(10, “1264667227.081 RELEASE -1 FFFFFF”…, 150) = 150
09:27:07.081806 read(16, 0xbf97bb7d, 65535) = -1 EAGAIN (Resource temporarily unavailable)
09:27:07.081885 epoll_ctl(6, EPOLL_CTL_MOD, 16, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=16, u64=20795170790309904}}) = 0
09:27:07.081956 epoll_ctl(6, EPOLL_CTL_DEL, 16, {0, {u32=16, u64=20795170790309904}}) = 0
09:27:07.082020 close(16) = 0
09:27:07.082130 epoll_wait(6, {}, 16384, 10) = 0
09:27:07.091543 gettimeofday({1264667227, 91574}, NULL) = 0
09:27:07.091616 gettimeofday({1264667227, 91641}, NULL) = 0
09:27:07.091685 epoll_wait(6, {}, 16384, 10) = 0
09:27:07.101544 gettimeofday({1264667227, 101574}, NULL) = 0
09:27:07.101615 gettimeofday({1264667227, 101639}, NULL) = 0

Tzn. ze na ten posledni dotaz pred zacyklenim ti odpovi squit 503 service not available, duvodem k tomu je
read(18, 0xbf97b18d, 65535) = -1 ENOTCONN (Transport endpoint is not connected)

Podle me je cas obratit se na autory squidu - tedy je otazka proc k teto situaci dochazi, viz napr.


ale takto to vypada ze je to neosetrena situace…