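Shandong Airlines' WAS service went down. Luckily we had set up two-node hot standby for them; otherwise web access would have been affected, which would have been a real headache. This is probably the third time. The last time this web server had trouble, it was most likely a CGI script problem; this time it's WebSphere. Faced with log files several megabytes long, I had no choice but to grit my teeth and read them line by line. I'm not very familiar with IBM WAS, so it was heavy going!~~~
Looks like I really need to find time to learn WebSphere properly!!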
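Environment: Red Hat AS4 U4, WAS 6.1, Oracle 10g database, with RoseHA providing two-node hot standby.
I spent the afternoon reading the WAS logs and the HTTP Server logs, and it doesn't look like WAS itself caused the problem.
The HTTP Server log (error_log) contains the following: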
[Tue Jul 01 09:22:34 2008] [error] [client 192.168.2.1] request failed: error reading the headers
[Tue Jul 01 09:22:39 2008] [error] [client 192.168.2.1] request failed: error reading the headers
[Tue Jul 01 09:22:41 2008] [error] [client 192.168.2.1] request failed: error reading the headers
[Tue Jul 01 09:22:41 2008] [error] [client 192.168.2.1] request failed: error reading the headers
[Tue Jul 01 09:24:29 2008] [error] [client 192.168.2.1] request failed: error reading the headers
[Tue Jul 01 09:24:29 2008] [error] [client 192.168.2.1] request failed: error reading the headers
piped log program '(null)' failed unexpectedly
piped log program '(null)' failed unexpectedly
[Tue Jul 01 09:53:08 2008] [warn] child process 26523 still did not exit, sending a SIGTERM
[Tue Jul 01 09:53:08 2008] [warn] child process 2847 still did not exit, sending a SIGTERM
[Tue Jul 01 09:53:08 2008] [warn] child process 28784 still did not exit, sending a SIGTERM
[Tue Jul 01 09:53:08 2008] [warn] child process 24247 still did not exit, sending a SIGTERM
.......................(omitted)...........................
[Tue Jul 01 09:53:14 2008] [error] child process 1753 still did not exit, sending a SIGKILL
[Tue Jul 01 09:53:14 2008] [error] child process 2536 still did not exit, sending a SIGKILL
[Tue Jul 01 09:53:15 2008] [notice] caught SIGTERM, shutting down
[Tue Jul 01 09:57:20 2008] [notice] IBM_HTTP_Server/6.1 Apache/2.0.47 configured -- resuming normal operations
[Tue Jul 01 09:57:20 2008] [notice] Core file limit is 0; core dumps will be not be written for server crashes
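The log contains "piped log program '(null)' failed unexpectedly", and nothing at all was logged after that until 09:53, so my guess is that the HTTP Server went down somewhere between 09:24 and 09:53.
My impression is that the HTTP Server child processes could not exit normally, resources were exhausted, and the server went down.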
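Rather than reading several megabytes of error_log by eye, a small script can pull out which child processes had to be signalled. Below is a minimal sketch in Python, assuming the IHS/Apache message text shown in the excerpt above; the function names `summarize_stuck_children` and `ignored_sigterm` are my own, not part of any IBM tool.

```python
import re

# Matches IHS/Apache lines such as:
# [Tue Jul 01 09:53:08 2008] [warn] child process 26523 still did not exit, sending a SIGTERM
CHILD_RE = re.compile(
    r"child process (\d+) still did not exit, sending a (SIGTERM|SIGKILL)"
)


def summarize_stuck_children(lines):
    """Return {"SIGTERM": pids, "SIGKILL": pids} for children the parent had to signal."""
    result = {"SIGTERM": set(), "SIGKILL": set()}
    for line in lines:
        m = CHILD_RE.search(line)
        if m:
            result[m.group(2)].add(int(m.group(1)))
    return result


def ignored_sigterm(summary):
    """PIDs that were sent SIGTERM and still had to be killed with SIGKILL."""
    return summary["SIGTERM"] & summary["SIGKILL"]
```

Usage would be something like `with open("error_log", errors="replace") as f: summary = summarize_stuck_children(f)`. A large `ignored_sigterm(summary)` set would support the idea that children were hung rather than merely slow to exit.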
[Tue Jul 01 09:53:15 2008] [notice] caught SIGTERM, shutting down
[Tue Jul 01 09:57:20 2008] [notice] IBM_HTTP_Server/6.1 Apache/2.0.47 configured -- resuming normal operations
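Between these two lines there is a gap of about four minutes. Either an administrator noticed the outage and restarted the service, or RoseHA detected that the primary server was down, failed over to the standby, and then switched back to the primary at 09:57. In the second case, the log entries for those four minutes should be on the standby server.
Next, the WAS SystemOut.log entries between 09:53 and 09:57: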
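Silent stretches like 09:24 to 09:53 and 09:53 to 09:57 can also be found mechanically. This is a minimal sketch assuming the `[Tue Jul 01 09:22:34 2008]` timestamp format of the error_log above and an English/C locale for the day and month names; `find_gaps` and the 3-minute default threshold are my own choices, not anything from IHS.

```python
import re
from datetime import datetime, timedelta

# Leading error_log timestamp, e.g. "[Tue Jul 01 09:22:34 2008]"
TS_RE = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def find_gaps(lines, threshold=timedelta(minutes=3)):
    """Yield (previous, next) timestamp pairs that are more than `threshold` apart.

    Lines without a leading timestamp, such as
    "piped log program '(null)' failed unexpectedly", are skipped.
    """
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        if prev is not None and ts - prev > threshold:
            yield prev, ts
        prev = ts
```

Run against the excerpt above, this would report the 09:24:29 to 09:53:08 silence and the roughly four-minute gap before "resuming normal operations".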
[08-7-1 9:53:43:455 CST] 00009fdf SRTServletRes W WARNING: Cannot set status. Response already committed.
[08-7-1 9:53:43:455 CST] 00009fdf SRTServletRes W WARNING: Cannot set header. Response already committed.
[08-7-1 9:53:50:995 CST] 0001bfc0 SystemOut O select FLIGHT_NO,departure_airport,arrival_airport,TO_CHAR(STD,'HH24:MI'),NVL(TO_CHAR(ATD,'HH24:MI'),'-'),TO_CHAR(STA,'HH24:MI'),NVL(TO_CHAR(ATA,'HH24:MI'),'-'),DECODE(AC_TYPE,'B73G','B737','B737','B733','CRJ','CRJ2','CR7','CRJ7',AC_TYPE) from T2001 where FLIGHT_DATE =to_date('null','yyyy-mm-dd') and FLIGHT_NO='null' AND (FLIGHT_TYPE IN('J','N','U')) order by STD asc
[08-7-1 9:54:10:663 CST] 000252f8 TCPChannel I TCPC0002I: TCP Channel TCP_3 has stopped listening on host * (IPv6) port 9043.
[08-7-1 9:54:10:664 CST] 000252f8 TCPChannel I TCPC0002I: TCP Channel TCP_2 has stopped listening on host * (IPv6) port 9080.
[08-7-1 9:54:10:665 CST] 000252f8 TCPChannel I TCPC0002I: TCP Channel TCP_4 has stopped listening on host * (IPv6) port 9443.
[08-7-1 9:54:10:666 CST] 000252f8 TCPChannel I TCPC0002I: TCP Channel TCP_1 has stopped listening on host * (IPv6) port 9060.
[08-7-1 9:54:10:670 CST] 000252f8 ApplicationMg A WSVR0217I: Stopping application: SchedulerCalendars
[08-7-1 9:54:10:676 CST] 000252f8 EJBContainerI I WSVR0041I: Stopping EJB jar: Calendars.jar
[08-7-1 9:54:10:679 CST] 000252f8 EJBContainerI I WSVR0059I: EJB jar stopped: Calendars.jar
[08-7-1 9:54:10:684 CST] 000252f8 ApplicationMg A WSVR0220I: Application stopped: SchedulerCalendars
[08-7-1 9:54:10:685 CST] 000252f8 ApplicationMg A WSVR0217I: Stopping application: filetransfer
[08-7-1 9:54:10:693 CST] 000252f8 ServletWrappe I SRVE0253I: [filetransfer] [/FileTransfer] [transfer]: Destroy successful.
[08-7-1 9:54:10:723 CST] 000252f8 ApplicationMg A WSVR0220I: Application stopped: filetransfer
[08-7-1 9:54:10:724 CST] 000252f8 ApplicationMg A WSVR0217I: Stopping application: ManagementEJB
[08-7-1 9:54:10:727 CST] 000252f8 EJBContainerI I WSVR0041I: Stopping EJB jar: mejb.jar
[08-7-1 9:54:10:729 CST] 000252f8 EJBContainerI I WSVR0059I: EJB jar stopped: mejb.jar
[08-7-1 9:54:10:733 CST] 000252f8 ApplicationMg A WSVR0220I: Application stopped: ManagementEJB
[08-7-1 9:54:10:734 CST] 000252f8 ApplicationMg A WSVR0217I: Stopping application: shair_query_war
[08-7-1 9:54:10:744 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_hbsk.jsp]: Destroy successful.
[08-7-1 9:54:10:745 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_result_hbsk.jsp]: Destroy successful.
[08-7-1 9:54:10:745 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_hbdt.jsp]: Destroy successful.
[08-7-1 9:54:10:747 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_result_flightno.jsp]: Destroy successful.
[08-7-1 9:54:10:748 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_result_hbsk_caow.jsp]: Destroy successful.
[08-7-1 9:54:10:749 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_result_hbdt.jsp]: Destroy successful.
[08-7-1 9:54:10:749 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/error.jsp]: Destroy successful.
[08-7-1 9:54:10:750 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_hbdt_index.jsp]: Destroy successful.
[08-7-1 9:54:10:751 CST] 000252f8 ServletWrappe I SRVE0253I: [shair_query_war] [/query] [/query_hbsk_caow.jsp]: Destroy successful.
[08-7-1 9:54:10:758 CST] 000252f8 ApplicationMg A WSVR0220I: Application stopped: shair_query_war
[08-7-1 9:54:10:759 CST] 000252f8 ApplicationMg A WSVR0217I: Stopping application: shair_root_war
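These entries show WAS stopping its listeners and applications, which suggests the server was being restarted at this point.
The following entries appear over and over in WAS's SystemOut.log: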
[08-7-1 9:28:46:942 CST] 0001bfc0 SRTServletReq E SRVE0133E: An error occurred while parsing parameters. java.net.SocketTimeoutException: Async operation timed out
[08-7-1 9:28:46:969 CST] 00009fdf SRTServletReq E SRVE0133E: An error occurred while parsing parameters. java.net.SocketTimeoutException: Async operation timed out
[08-7-1 9:28:46:978 CST] 00009fdf ServletWrappe E SRVE0068E: Uncaught exception thrown in one of the service methods of the servlet: /html/portal/render_portlet.jsp. Exception thrown: java.lang.NullPointerException
[08-7-1 9:28:46:980 CST] 00009fdf ServletWrappe E SRVE0068E: Uncaught exception thrown in one of the service methods of the servlet: MainServlet. Exception thrown: javax.servlet.ServletException
[08-7-1 9:28:46:983 CST] 00009fdf WebApp E [Servlet Error]-[MainServlet]: java.lang.NullPointerException
[08-7-1 9:33:34:553 CST] 0001bfc0 SRTServletReq E SRVE0133E: An error occurred while parsing parameters. java.net.SocketTimeoutException: Async operation timed out
[08-7-1 9:33:34:562 CST] 0001bfc0 ServletWrappe E SRVE0068E: Uncaught exception thrown in one of the service methods of the servlet: /html/portal/render_portlet.jsp. Exception thrown: java.lang.NullPointerException
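To see how often these errors repeat and on which threads, the SystemOut.log lines can be tallied by message ID. A rough sketch, assuming the basic WAS 6.1 layout seen above (`[timestamp] threadId component level message`) and that message IDs look like `SRVE0133E`; `error_summary` is a hypothetical helper of mine, and lines without a message ID (such as the `[Servlet Error]` one) are keyed by component name instead.

```python
import re
from collections import Counter

# [08-7-1 9:28:46:942 CST] 0001bfc0 SRTServletReq E SRVE0133E: ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<thread>[0-9a-fA-F]{8})\s+(?P<component>\S+)\s+"
    r"(?P<level>[A-Z])\s+(?P<rest>.*)$"
)
# WAS message IDs: 4-5 letter prefix, 4 digits, severity letter, then a colon
MSGID_RE = re.compile(r"^([A-Z]{4,5}\d{4}[A-Z]):")


def error_summary(lines):
    """Count error-level (E) entries per (message ID or component, thread ID)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("level") != "E":
            continue
        idm = MSGID_RE.match(m.group("rest"))
        key = idm.group(1) if idm else m.group("component")
        counts[(key, m.group("thread"))] += 1
    return counts
```

`error_summary(open("SystemOut.log", errors="replace")).most_common(10)` would then show whether the SRVE0133E timeouts cluster on a few threads.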
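These errors also fall within the window when the HTTP Server went down. I wasn't on site, and the Shandong Airlines network administrator, worried about a long outage, restarted the server right away. So apart from the logs there is little detailed information, and I don't know what state the processes were in. Would `ps` have shown lots of zombie httpd processes?! Had there been no rush to restart, one could have run `netstat -al | grep SYN` to check for a large number of half-open SYN connections; if there were many, a DoS or DDoS attack would be quite likely. But Shandong Airlines has a firewall, which should block that kind of attack! Alternatively, perhaps MaxClients and MaxRequestsPerChild in httpd.conf are set too low, leading to the "child process XXXX still did not exit, sending a SIGTERM" messages.
Given my limited expertise, all of this is pure speculation! I'll talk it over with colleagues and ask IBM engineers for help.
Sigh!~~~ Clearly I still have a lot to learn~~~~~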
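For next time, the SYN check does not need netstat: on Linux the same socket table is exposed in `/proc/net/tcp` (and `/proc/net/tcp6` for IPv6), where the `st` column holds the TCP state in hex (`03` is SYN_RECV, `0A` is LISTEN). This is a minimal sketch assuming that standard Linux layout; `count_tcp_states` is a hypothetical helper, and port 80 in the usage note is only an assumed IHS listen port.

```python
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text, local_port=None):
    """Count sockets per TCP state, optionally only those on one local port."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        port = int(local.rsplit(":", 1)[1], 16)  # port is hex after the last colon
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts
```

For example, `count_tcp_states(open("/proc/net/tcp").read(), local_port=80)["SYN_RECV"]`; a count far above normal during an outage would point toward a SYN flood rather than hung httpd children.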