Ticket #99 (closed defect: fixed)

Opened 8 weeks ago

Last modified 2 days ago

Tigase server uses too much CPU

Reported by: phearnot Owned by: kobit
Priority: major Milestone:
Component: tigase.server Version: 4.0.x
Keywords: Cc:
Estimated Number of Hours: 0 Add Hours to Ticket: 0
Billable?: yes Total Hours: 0

Description

Current Tigase server svn version (rev 1212) consumes too much CPU. After less than 10 hours of uptime top showed as much as 99.7% CPU usage by java process.

Log files are clean (no errors have been reported). This issue reoccurs after a restart. The server runs with at most two user sessions active.

Java version: 1.6.0_06-b02

Attachments

Change History

  Changed 8 weeks ago by kobit

  • status changed from new to assigned

Thank you for your report.

This is very strange because the server already runs for weeks on a few installations and it doesn't happen anywhere else. What operating system do you use? What database do you use?

Could you please attach your configuration file?

Could you please change the logging level to debug for the server component?

If this is possible I would be happy to access your machine remotely to better diagnose the problem.

  Changed 8 weeks ago by kobit

More questions: do you use any extra components in your installation? like MUC or Pubsub, if you can send me you config file I would find it out.

  Changed 8 weeks ago by kobit

Just got an idea.... Such a high CPU consumption may actually happen in case of misconfiguration. An XMPP packet is send within the Tigase server which is supposed to be processed by one of the component but the component may refuse to process the packet and sends it back.

This happens when the hostname and vhosts are set incorrectly. Therefore seeing your configuration might help indeed....

  Changed 8 weeks ago by phearnot

I'm using Ubuntu 8.04. The database is PostgreSQL 8.2.7 with JDBC3 8.3-603 driver.

I have recently built pubsub component from svn. The jar file is in the classpath, but upon startup Tigase claims that no plugin implementation found for pubsub. I didn't alter pubsub-related portion of the config file.

Indeed, the problem may be related to incorrect configuration, since sometimes I got 'Packet dropped' message in the log file.

Here are xml and conf conf files, and console and log files.

Unfortunately, I cannot provide you with access to the machine I'm running Tigase on, sorry about that.

  Changed 8 weeks ago by phearnot

This could possibly help:

phearnot@ser:~$ ps -ef | grep tigase
phearnot 10741     1  0 12:16 pts/0    00:00:11 /usr/lib/jvm/java-6-sun/bin/java 
-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Djdbc.drivers=org.postgresql.Driver -Djdbc.username=tigase 
-Djdbc.password=tigase -server -Xms100M -Xmx100M 
-cp ./jars/tigase-server.jar:./libs/postgres.jar:./libs/tigase-muc.jar:
./libs/tigase-pubsub.jar:./libs/tigase-utils.jar:./libs/tigase-xmltools.jar tigase.server.XMPPServer 
-c etc/tigase-default.xml 
--user-db pgsql --user-db-uri jdbc:postgresql://localhost/renee_im_tigase --auth-db pgsql
postgres 10778  5209  0 12:16 ?        00:00:00 postgres: tigase renee_im_tigase 127.0.0.1(49525) idle                                                                                                                            
postgres 10779  5209  0 12:16 ?        00:00:00 postgres: tigase renee_im_tigase 127.0.0.1(49526) idle                                                                                                                            
phearnot 11640 10654  0 17:02 pts/0    00:00:00 grep tigase

  Changed 8 weeks ago by kobit

Hi,

Actually the last entry helps the most. My impression is you use an old XML configuration with the new version of the server. That's the problem. I hope that's the problem because it is quite easy to solve then.

If you don't have any custom entries in the XML config file my suggestion to quickly solve the problem is to remove the XML config file and restart the server. A new configuration will be generated using settings you provided in the command line arguments and all defaults appropriate for version 4.0.0.

Ideally you should really stop using long command line arguments and start using init.properties file. This is much safer way to set initial settings and gives you also much more flexibility.

Please have a look at the online documentation for configuring Tigase 4.0.0.

In you case you should change command line arguments to: -c etc/tigase-default.xml --property-file etc/init-pgsql.properties

and your init.properties file would contains at least:

--user-db=pgsql
--user-db-uri jdbc:postgresql://localhost/renee_im_tigase
# Line below is optional, by default --user-db option is used anyway
--auth-db pgsql

If you have any other own modifications in the XML config file you can aslo include them in the init.properties file. Refer to the documentation for details how to do it.

If you have any problems please let me know.... I will close the ticket within next couple days if i am not hearing from you.

  Changed 8 weeks ago by kobit

Regarding pubsub plugin warning....

This is a bit confusing. Pubsub used to be implemented as plugin in Tigase. (Plugin - small thing). But it grew to a bigger thing and got separated to a component (component - big thing) in a separate subproject. I have just forgot to remove old pubsub plugin from the list of plugins loaded by default.

To learn how to setup tigase to load pubsub and muc component please have a look at this guide: http://www.tigase.org/en/content/configuring-tigase-server-load-component

Or alternatively use the Tigase installer which contains configuration functionality.... ups, it is broken right now... we are releasing a new version today.

  Changed 8 weeks ago by anonymous

Thank you very much for your help!

I've updated my configuration. I'll watch the process and report the stats.

Btw, Tigase was running for more than a day since the last incident, and load average was around 0.02.

  Changed 8 weeks ago by kobit

yes, this is what I would expect. It is actually hard to get more that 10% of CPU load in normal circumstances....

Let me know how it goes.

I am releasing a new version today or tomorrow with a few minor fixes in MUC and other parts and I will also put a note to regenerate configuration when upgrading from an old version.

  Changed 7 weeks ago by anonymous

Unfortunately, the problem persists after an update.

phearnot@ser:~$ top
top - 11:10:49 up 24 days, 23:37,  1 user,  load average: 1.05, 1.18, 1.17
Tasks:  97 total,   1 running,  94 sleeping,   0 stopped,   2 zombie
Cpu(s): 11.1%us,  0.7%sy,  0.0%ni, 86.7%id,  1.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3367924k total,  2469424k used,   898500k free,    61764k buffers
Swap:  1405648k total,     1500k used,  1404148k free,  2088284k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3115 phearnot  20   0  257m  44m 6040 S 99.3  1.3 383:38.20 java
    1 root      20   0  1808  828  608 S  0.0  0.0   0:01.71 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
phearnot@ser:~$ ps -ef | grep tigase
phearnot  3115     1 63 01:10 ?        06:25:04 /usr/lib/jvm/java-6-sun/bin/java -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Djdbc.drivers=org.postgresql.Driver -server -Xms100M -Xmx100M -cp ./jars/tigase-server.jar:./libs/postgres.jar:./libs/tigase-muc.jar:./libs/tigase-pubsub.jar:./libs/tigase-utils.jar:./libs/tigase-xmltools.jar tigase.server.XMPPServer -c etc/tigase-default.xml --property-file etc/init-pgsql.properties
postgres  3165  5209  0 01:10 ?        00:00:00 postgres: tigase renee_im_tigase 127.0.0.1(50483) idle                                   

  Changed 7 weeks ago by phearnot

Sorry, I forgot to log in.

  Changed 7 weeks ago by kobit

Alright, I will use your config files and try to replicate it to find out what is going on. Will keep you posted.

  Changed 7 weeks ago by phearnot

One more suspicious message from logs, apart from missing implementation of pubsub:

2008-11-17 10:28:50  ConnectionManager$Watchdog$1.check()  INFO:   s2s: Max inactive time exceeded, stopping: 192.168.1.44_5269_72.14.242.20_64564
2008-11-17 11:58:50  ConnectionManager$Watchdog$1.check()  INFO:   s2s: Max inactive time exceeded, stopping: 192.168.1.44_5269_72.14.242.20_1951

It appears that CPU starts growing around the time these messages get dumped to the logs.

follow-up: ↓ 16   Changed 7 weeks ago by kobit

Those log entries are perfectly normal, almost... Normally s2s (server-to-server) connection is opened when there is something to send to the remote server. The connections stays opened as long as it is active and after certain time of inactivity it is closed. These log entries say the s2s connection was inactive and therefore was closed.

What is not normal however (but it still may be correct) is that your Tigase server is available at private IP address. If your installation is behind NAT then this is still perfectly correct, if not it means your server tries to establish s2s connections which can't be validated and it can not send out any messages.

  Changed 7 weeks ago by kobit

Hi, looked in your log file again. There are some strange things in there but the log is very brief in displaying any information. What client do you use? I can see things like this:

2008-11-14 19:25:59  PacketFilter.process()              WARNING:  No 'to' address, can't deliver packet: 
<iq type="set" from="phearnot@renee.ru/Telepathy" id="328236330418">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <publish node="http://laptop.org/xmpp/activities">
      <item> <activities xmlns="http://laptop.org/xmpp/activities"/></item>
    </publish> 
  </pubsub>
</iq>

Looks like 'to' address is missing. This is however handled correctly by the server and this doesn't cause 100% CPU usage. There might be however other this kind of packets causing incorrect behavior. I would be very keen to get to the bottom of this. Could you please set the debug for the server: --debug=server in init.properties, remove the XML config file, restart the server and send me logs again when this happens? I am hoping to see what kind of packet causes this problem. Hopefully your logs would be full of debug messages with packet caused high CPU usage.

in reply to: ↑ 14   Changed 7 weeks ago by phearnot

Replying to kobit:

If your installation is behind NAT then this is still perfectly correct, if not it means your server tries to establish s2s connections which can't be validated and it can not send out any messages.

My server is indeed behind the NAT.

I'm using Pidgin 2.5.2 (mostly; both on Windows and Ubuntu), Psi 0.11 (again, on both OS). Client name you've mentioned belongs to Empathy 2.24.1.

Thanks for your hints. I'll fix the config file and wait 'till it happens next time.

By the way, maybe there's some way to profile Tigase remotely?

  Changed 3 weeks ago by kobit

  • status changed from assigned to closed
  • resolution set to fixed

I think I found a possible cause to this and fixed it. In some cases when the network connection is broken but this is not detected properly by the TCP/IP layer the watchdog thread gets stuck on the connection check which falls in infinite loop trying to send a space character. It seems to be fixed now.

Add/Change #99 (Tigase server uses too much CPU)

Author



Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.