We're finding that the UniversalForwarder (UL) freezes up after Java garbage-collection out-of-memory errors.

Our current Java command line to start UL is as follows:

java -jar $SECURONIX_HOME/agent/UniversalForwarder.jar -server -Xms16g -Xmx20g -XX:+UseG1GC -XX:MaxPermSize=512m -XX:+AggressiveOpts -XX:+UseCompressedOops -Xmn2g -Dsolr.solr.home=/securonix/securonix_home/solr -Dlog4j.configuration=file:/securonix/securonix_home/agent/conf/log4j.properties
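One detail worth checking in the command line above is argument order. The `java` launcher treats everything after `-jar <file>` as arguments to the application's `main` method, not as JVM options. Unless UniversalForwarder parses and applies those arguments itself (the thread does not say), `-Xms`, `-Xmx`, `-XX:+UseG1GC` and the `-D` properties as written may never reach the JVM. The class below is a hypothetical diagnostic (not part of Securonix) that prints what the JVM actually received and flags arguments that look like JVM options but arrived in `main()`.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmArgsCheck {

    // Heuristic, not exhaustive: covers -X..., -D... and -server/-client,
    // which are the kinds of options used in the UL command line.
    static boolean looksLikeJvmOption(String arg) {
        return arg.startsWith("-X") || arg.startsWith("-D")
                || arg.equals("-server") || arg.equals("-client");
    }

    // Returns the main() arguments that were probably meant for the JVM.
    static List<String> misplacedJvmOptions(String[] mainArgs) {
        List<String> misplaced = new ArrayList<>();
        for (String arg : mainArgs) {
            if (looksLikeJvmOption(arg)) {
                misplaced.add(arg);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        // Options the JVM itself received (those placed before -jar or the class name).
        System.out.println("JVM options: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // Effective maximum heap, roughly what -Xmx resolved to.
        System.out.printf("Max heap: %d MiB%n",
                Runtime.getRuntime().maxMemory() / (1024 * 1024));
        List<String> misplaced = misplacedJvmOptions(args);
        if (!misplaced.isEmpty()) {
            System.out.println("Passed to main() instead of the JVM: " + misplaced);
        }
    }
}
```

For example, after `javac JvmArgsCheck.java`, running `java -Xmx512m JvmArgsCheck -Xmx20g` lists `-Xmx512m` under the JVM options and reports `-Xmx20g` as passed to `main()`. If UL's startup shows the same pattern, the conventional form puts every JVM option before `-jar`, e.g. `java -server -Xms16g -Xmx20g -XX:+UseG1GC ... -jar $SECURONIX_HOME/agent/UniversalForwarder.jar`.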

Increasing Xmx to 30g appears to have increased frequency from once a week to every 2-3 days.

Is there a known solution for this issue?

Thanks in advance, Jason

asked 20 Apr '15, 15:32


JasonBlue

The first errors seen are org.quartz.SchedulerException instances caused by java.lang.OutOfMemoryError: GC overhead limit exceeded. These repeat every so often, and some minutes later (less than an hour) UL freezes after displaying an org.quartz.SchedulerException caused by java.lang.OutOfMemoryError: Java heap space.

(20 Apr '15, 15:45) JasonBlue
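Since the GC overhead errors precede the freeze by some minutes, an alarm that fires when the heap is still nearly full right after a collection could give earlier warning. A minimal sketch using the standard `java.lang.management` collection-usage threshold API; the class name and the 85% threshold are arbitrary assumptions, and to observe UL's heap this logic would have to run inside (or connect over JMX to) the UL process, which this thread does not cover.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class HeapAfterGcAlarm {

    // Bytes corresponding to a fraction of a pool's maximum size.
    static long thresholdBytes(long maxBytes, double fraction) {
        if (maxBytes <= 0 || fraction <= 0 || fraction > 1) {
            throw new IllegalArgumentException("max must be > 0 and 0 < fraction <= 1");
        }
        return (long) (maxBytes * fraction);
    }

    // Registers a listener, then arms a collection-usage threshold on every heap pool
    // that supports one. The JVM then emits a notification when a pool's usage,
    // measured right after a GC, exceeds that threshold.
    static int install(double fraction) {
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) notification.getUserData());
                System.err.printf("Pool %s still uses %d MiB right after GC%n",
                        info.getPoolName(), info.getUsage().getUsed() / (1024 * 1024));
            }
        };
        // The platform MemoryMXBean also implements NotificationEmitter.
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(listener, null, null);

        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax(); // -1 when the pool has no defined maximum
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported() && max > 0) {
                pool.setCollectionUsageThreshold(thresholdBytes(max, fraction));
                armed++;
            }
        }
        return armed;
    }

    public static void main(String[] args) {
        int armed = install(0.85);
        System.out.println("Armed collection-usage thresholds on " + armed + " heap pool(s)");
    }
}
```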

Jason,

Could you please provide the following information?

  1. Has the data source volume increased in recent weeks?
  2. Have any new correlation rules been added to these data sources?
  3. Can you share the server specs (memory)? We could then see whether memory can be increased on the UF. Also, when you increase the Xmx value to 30G, please also increase the Xms value to 28G. That way a larger heap is allocated at startup and we do not rely on dynamic heap growth.

answered 21 Apr '15, 02:35


Aditya

  1. No. The average number of events per day has been steady for the past four months, with monthly averages ranging from 183 to 193 million events and April so far at 188 million.

  2. No.

  3. 48GB of memory, soon to be 64GB, on a server dedicated to the UF and Syslog-NG. I increased the UF heap to 32GB (both min and max) on 4/20 at noon, and the problem reappeared on 4/21 at 21:30. Specifying -XX:InitiatingHeapOccupancyPercent=70 did not prevent the issue.

Planning to triage with Saurabh today (ticket 824775). Jason
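One arithmetic point about the 32GB setting: with the default 8-byte object alignment, HotSpot can use compressed ordinary object pointers only when the maximum heap is somewhat below 32 GiB, so `-Xmx32g` turns off the `-XX:+UseCompressedOops` requested in the original command line and object references grow from 4 to 8 bytes. A 32GB heap may therefore hold less live data than one just under the threshold; whether that matters for UL is not established in this thread. The hypothetical class below prints the flag value the running JVM actually settled on.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CompressedOopsCheck {

    static final long GIB = 1024L * 1024 * 1024;

    // Rough rule of thumb only: compressed oops need a max heap below 32 GiB
    // (with default alignment). The VM flag read in main() is the authoritative answer.
    static boolean compressedOopsExpected(long maxHeapBytes) {
        return maxHeapBytes < 32 * GIB;
    }

    public static void main(String[] args) {
        // maxMemory() can be slightly below -Xmx, so the heuristic is only indicative.
        long maxHeap = Runtime.getRuntime().maxMemory();
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Value the running 64-bit HotSpot JVM actually uses for this flag.
        String actual = hotspot.getVMOption("UseCompressedOops").getValue();
        System.out.printf("Max heap: %.1f GiB, compressed oops expected: %b, UseCompressedOops=%s%n",
                (double) maxHeap / GIB, compressedOopsExpected(maxHeap), actual);
    }
}
```

Running it with `-Xmx31g` and then `-Xmx32g` is a quick way to see the flag flip on a given JDK.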


answered 22 Apr '15, 11:38


JasonBlue

Another possible cause is a large IP correlation reference data set in the 'activityuseripmapping' table. If that table holds many entries, then every time IP correlation runs on a data source, the application tries to keep all of those entries in memory for the duration of the correlation.

We suggest purging older entries from that table (data older than the expiry time set in the IP correlation configuration) to improve performance and speed up correlation.

In version 4.6, the job that purges older entries from the activityuseripmapping table can be scheduled from the Housekeeping section.
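The purge rule amounts to dropping every mapping older than the configured expiry before correlation has to hold it in memory. A minimal in-memory sketch of that rule, assuming a hypothetical `IpMapping` type with a `lastSeen` timestamp and a 30-day expiry chosen only for illustration; the real table schema and expiry value are not given in this thread, and in practice the purge would run against the table itself (for example through the 4.6 Housekeeping job).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class IpMappingPurge {

    // Hypothetical stand-in for one row of the activityuseripmapping table.
    static final class IpMapping {
        final String ipAddress;
        final String userId;
        final Instant lastSeen;

        IpMapping(String ipAddress, String userId, Instant lastSeen) {
            this.ipAddress = ipAddress;
            this.userId = userId;
            this.lastSeen = lastSeen;
        }
    }

    // Keeps only mappings last seen at or after (now - expiry); older ones are dropped
    // so correlation never has to load them into memory.
    static List<IpMapping> purgeExpired(List<IpMapping> mappings, Duration expiry, Instant now) {
        Instant cutoff = now.minus(expiry);
        List<IpMapping> kept = new ArrayList<>();
        for (IpMapping m : mappings) {
            if (!m.lastSeen.isBefore(cutoff)) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative sample rows, not real data.
        Instant now = Instant.parse("2015-04-22T16:00:00Z");
        List<IpMapping> rows = new ArrayList<>();
        rows.add(new IpMapping("10.0.0.1", "alice", now.minus(Duration.ofDays(2))));
        rows.add(new IpMapping("10.0.0.2", "bob", now.minus(Duration.ofDays(40))));
        List<IpMapping> kept = purgeExpired(rows, Duration.ofDays(30), now);
        System.out.println("Kept " + kept.size() + " of " + rows.size() + " mappings");
    }
}
```

Running `main` prints `Kept 1 of 2 mappings`: the 40-day-old row falls outside the 30-day window.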


answered 22 Apr '15, 16:03


Aditya

Asked: 20 Apr '15, 15:32

Seen: 2,629 times

Last updated: 22 Apr '15, 16:03