Queues - dialparties.agi & high CPU load

A friend of mine has an inbound call center which now has 30 agents. I set him up with an older HP DL360, which has been working well for a couple of years. The only issue has been with call distribution: there's an IVR that times out to a Queue.
Despite the "Ring All" setting, the phones will ring at different times, and without any predictability. What caught my eye is the CPU starts working hard, and is pretty quickly maxed out with multiple dialparties.agi consuming the vast majority. With the high CPU load, the sound breaks down...
Anytime a Queue is invoked, the CPU is taxed heavily. Phones will ring sporadically, or not at all. Any Queue configuration I've tested has the problem. Ring Groups do not have the problem, so for now, our "solution" is to use groups, although we lose a lot of functionality.
I've done everything I could read about to fix this. There are no loops. All the phones are registered and set as static agents. I've cloned the setup to a much higher performance machine, and find the same problem. With fewer extensions, the problem diminishes.
Wondering if anyone that uses queues has luck with all phones ringing simultaneously (in a "ringall" config) and a more normal load on the CPU...
This is on Asterisk 1.4.21.2, PIAF 1.3, FreePBX 2.5.
Any ideas?
Thanks,
Peter
Fortel



ringall is going to be very taxing on your system as dialparties.agi will be called for everyone. There are some workarounds at different levels that you can use.
For starters, you probably want to get a hold of the 2.6 version of queues and use a setting which includes ringinuse as that will avoid attempting to call agents (members) that the queue knows are already taking a queue call. This does have some drawbacks as you will see in the tooltip when setting that setting.
This will still try to ring every agent that is not on a call from the queue, even if you have Skip Busy Agent set (dialparties will be called but will not ring them). If you have this situation (a lot of agents on non-queue calls) then you can have a look at this patch:
https://issues.asterisk.org/view.php?id=15168
If you go that route you will need some additional patches to the queue code, which I have but which have not been checked into svn yet because the Asterisk patch above is still being discussed. You may alternatively want to have a look at #3562 and #3496, which can help if you are in standard FreePBX "extension" mode.
Alternatively, you may want to set up queue penalties to keep so many from ringing unless the lower penalties are all busy/unavailable. Just be aware that if you do this, and any single queue agent is available but does not answer their phone (it rings), then the higher penalties will not be called.
Philippe Lindheimer - FreePBX Project Leader
FreePBX Training Opportunities - Click Here
Get Official Paid Support - Click Here
Queues, dialparties.agi, CPU load
Thanks for the quick response, Philippe!
I will study the information- thanks for posting the links.
Not sure where to get Queues 2.6 though...
Peter
Fortel
We're having a very similar issue.
We're using Asterisk version 1.4.25.1 with FreePBX version 2.5.1.5. We have fewer than 10 queues, each with static agents. Three of the queues have 12 static members each, with all members using Polycom SIP phones, set to the ringall strategy.
Everything works great except that several times a day Asterisk crashes and restarts itself. Afterward I notice there are stuck dialparties.agi processes (this morning I noticed several that were 10 hours old) that consume up to 99% CPU.
The asterisk system ended with exit status 139 on signal 11.
I'm sure they're probably related. Any idea if one is causing the other? Would the above fixes address my issues or is there something more fundamental going on?
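For reference, exit status 139 follows the usual shell convention of 128 plus the signal number, so signal 11 here is SIGSEGV (a segmentation fault). A minimal sketch of that arithmetic follows; the function name is purely illustrative and not part of Asterisk or FreePBX:

```shell
#!/bin/sh
# Decode a shell-style exit status into a signal number and name.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the name of that signal
        printf 'signal %s (%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'normal exit, code %s\n' "$status"
    fi
}

decode_exit_status 139
```

A segfault in Asterisk itself would leave any AGI children it had spawned without a parent, which is at least consistent with finding orphaned dialparties.agi processes afterward, though it does not say which problem came first.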
Thanks
I haven't heard that one before. Something is hanging those processes. How much memory do you have on your system?
Philippe Lindheimer - FreePBX Project Leader
We have 4GB of ram on that box.
BTW, I submitted a paid ticket with log files if that would help... ZPV-610725
Thanks
I hate to start arbitrarily swapping out software... but I'll need a bulletproof vest to get past the call center director if I can't come up with something. :)
Calls are being dropped and then callers call back and curse out our operators. It's not pretty. Any idea/guess?
Thanks
escape2mtns,
If you submitted a paid ticket then that team will have a look and work with you as needed. The fact that dialparties is hanging is troublesome. There may be something with your php.ini, but it is best to leave that discussion to the engineer who is going to help you.
Philippe Lindheimer - FreePBX Project Leader
I just noticed the following lines in /var/log/messages at 9:45:49 this morning...
Jul 7 09:45:49 freepbx php: /var/lib/asterisk/agi-bin/recordingcheck[114]: Undefined offset: 1
Jul 7 10:01:49 freepbx last message repeated 2 times
Could this be related?
no - it's probably an undefined variable in recordingcheck that may need to be fixed, but it would not be related to what you are seeing.
Philippe Lindheimer - FreePBX Project Leader
Is it possible that we're using a version that's too new? Should we downgrade?
Looking through the log files I noticed the following on multiple occasions:
app_queue.c: The device state of this queue member, Local/200@from-internal/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
[Jul 7 10:39:29] VERBOSE[27346] logger.c: dialparties.agi: Methodology of ring is 'none'
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: Added extension 200 to extension map
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: Extension 200 cf is disabled
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: Extension 200 do not disturb is disabled
[Jul 7 10:39:29] VERBOSE[27346] logger.c: dialparties.agi: Extension 200 has ExtensionState: 0
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: Checking CW and CFB status for extension 200
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: dbset CALLTRACE/200 to 6514427031
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- dialparties.agi: Filtered ARG3: 200
[Jul 7 10:39:29] VERBOSE[27353] logger.c: == Manager 'admin' logged off from 127.0.0.1
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- AGI Script dialparties.agi completed, returning 0
[Jul 7 10:39:29] DEBUG[27346] app_macro.c: Executed application: AGI
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- Executing [s@macro-dial:7] Dial("Local/200@from-internal-1eb8,2", "SIP/200||trM(auto-blkvm)") in new stack
[Jul 7 10:39:29] VERBOSE[27346] logger.c: -- Called 200
[Jul 7 10:39:29] VERBOSE[27343] logger.c: -- Local/200@from-internal-1eb8,1 is ringing
[Jul 7 10:39:30] VERBOSE[27346] logger.c: -- SIP/200-082f0f98 is ringing
[Jul 7 10:39:31] VERBOSE[27346] logger.c: -- SIP/200-082f0f98 answered Local/200@from-internal-1eb8,2
[Jul 7 10:39:31] VERBOSE[27346] logger.c: -- Executing [s@macro-auto-blkvm:1] Set("SIP/200-082f0f98", "__MACRO_RESULT=") in new stack
[Jul 7 10:39:31] DEBUG[27346] app_macro.c: Executed application: Set
[Jul 7 10:39:31] VERBOSE[27346] logger.c: -- Executing [s@macro-auto-blkvm:2] DBdel("SIP/200-082f0f98", "BLKVM/191/SIP/SIP1-b6d083a8") in new stack
[Jul 7 10:39:31] VERBOSE[27346] logger.c: -- DBdel: family=BLKVM, key=191/SIP/SIP1-b6d083a8
[Jul 7 10:39:31] DEBUG[27346] app_macro.c: Executed application: dbDel
[Jul 7 10:39:31] DEBUG[27346] app_dial.c: Macro exited with status 0
[Jul 7 10:39:31] DEBUG[27343] app_queue.c: Dunno what to do with control type -1
[Jul 7 10:39:31] VERBOSE[27343] logger.c: -- Local/200@from-internal-1eb8,1 answered SIP/SIP1-b6d083a8
[Jul 7 10:39:31] VERBOSE[27343] logger.c: -- Stopped music on hold on SIP/SIP1-b6d083a8
[Jul 7 10:39:31] WARNING[27343] app_queue.c: The device state of this queue member, Local/200@from-internal/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
Could this be related? I looked in the UPGRADE.txt file and I don't see anything.
In the UPGRADE.txt file I did see this:
* The exit behavior of the AGI applications has changed. Previously, when
a connection to an AGI server failed, the application would cause the channel
to immediately stop dialplan execution and hangup. Now, the only time that
the AGI applications will cause the channel to stop dialplan execution is
when the channel itself requests hangup. The AGI applications now set an
AGISTATUS variable which will allow you to find out whether running the AGI
was successful or not.
Previously, there was no way to handle the case where Asterisk was unable to
locally execute an AGI script for some reason. In this case, dialplan
execution will continue as it did before, but the AGISTATUS variable will be
set to "FAILURE".
A locally executed AGI script can now exit with a non-zero exit code and this
failure will be detected by Asterisk. If an AGI script exits with a non-zero
exit code, the AGISTATUS variable will be set to "FAILURE" as opposed to
"SUCCESS".
Could the new handling for AGI scripts be causing the dialparties to hang? I did see many log entries where the scripts exited with non-zero.
Thanks
Call center with 8 to 10 dialparties.agi active
I am no guru, but this is what is happening to me here. After a reboot, CPU load on a dual-core 3.04GHz system is up to 30% in the morning, and at some point during the morning the queues receive calls but no longer ring the agents. After a reload it starts working again, but CPU load goes up to 187% (dual core) with various instances of dialparties.agi active and consuming everything there is. At night I can once again reset the box and everything returns to normal until the next day. I have the following setup:
Ast 1.4.25.1
FPBX: 2.5.1.5 100 Ext, 5 Queues, 2 SIP trunks
Server HP DL380, Dual 3.04GHz, 2GB mem, SCSI Raid 5
Traffic: 1000 calls a day incoming, 15000 calls a day outgoing
Any suggestions are very very welcome.
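One way to confirm that hung dialparties.agi instances are what is eating the CPU is to list the ones that have been alive far longer than a single call setup should take. The sketch below is only an illustration: the function name, the 300-second threshold, and the use of the `etimes` column (available in newer procps versions of `ps`, possibly not on older distributions) are all assumptions.

```shell
#!/bin/sh
# Print the PIDs of dialparties.agi processes older than a threshold.
# Input on stdin: one process per line as "PID ELAPSED_SECONDS COMMAND...".
stale_dialparties() {
    max_age=$1
    awk -v max="$max_age" '$2 + 0 > max && /dialparties\.agi/ { print $1 }'
}

# Live usage, assuming this ps supports the "etimes" (elapsed seconds) column:
#   ps -eo pid=,etimes=,args= | stale_dialparties 300
```

Reading from stdin keeps the filter testable against a saved `ps` listing, so it can be tried on a copy of the output before pointing it at a production box.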
It's probably a coincidence but we're all 3 running on HP DL360/DL380's.
We've dropped all our queues down to 5 or fewer static agents to see how it goes today.
I just spoke with one of the Digium Select distributors that installs many large (100+ seat) call centers. He said that if you're doing a lot of calls or static queue members then you want to use the SIP device as a queue member instead of the Local channel. He explained that would avoid use of dialparties.agi which really doesn't come into play much in these environments.
dialparties.agi
This is good information. I'd be interested to see how you implement the workaround, and what the result is. With the inbound call center we work with, we've tried more powerful hardware, different brand hardware, etc. Clearly, that's not the right path.
Thanks,
Peter
Fortel
He told me to edit the queues_additional.conf and change the member= lines from:
member=Local/102@from-internal/n,0
to
member=SIP/102,0
(using extension 102 as an example)
We tried this on a test queue and confirmed that it works but unfortunately it breaks assumptions in a Queue Monitoring app we developed. We've got to fix this monitoring app for the new style of members before we can fully test.
I guess we'll need to put a sed search/replace script in /etc/amportal.conf as a POST_RELOAD script to set these members after FreePBX queue changes.
He said that they absolutely have these issues when using the default local channel members and running them through dialparties.agi and that they absolutely don't have an issue when they set members as the direct sip device.
How to implement members as SIP and not Local
Sounds extremely logical to me. The problem I have is that we make changes to the Queues on a regular basis from FPBX. Where could we "fix" this in the scripts that write the modifications to the queues_additional.conf file?
Is FPBX development considering this change for call center solutions like ours?
There are a LOT of implications if you just go and change those to SIP devices that the Digium Select distributor likely has no idea about.
He is correct that a heavy call center environment is not at all what FreePBX is tuned for. However, the numbers you are mentioning here are not heavy.
There are also new abilities in queues in 1.6 that have found their way into 1.4.26 (I think) that can pass device state information on to the queue. (The issue is that there are shortcomings in what they implemented. I have a patch in the Asterisk bug tracker to do it 'right' using hints, which works against 1.4. I have tested it but not stress tested it. I also have patches against the queues module in 2.6 to take advantage of these changes. The problem is that none of it is checked into svn, because unless that patch finds its way into 1.4, or at least a version of it into 1.6, there is no point in getting the changes into FreePBX. There are a couple of bugs in our tracker with information about all these patches also.)
Philippe Lindheimer - FreePBX Project Leader
Thanks for the insight.
I'd love to take advantage of all the new stuff when they get it working but I need to get to a point where we're not disconnecting callers tomorrow. I don't have a bullet-proof vest and I'm not sure how to get by the call center director and live to tell about it. I've got a family to think about. :)
The FreePBX engineer I spoke with today said that he thought this sounded like a reasonable solution. We discussed the fact that I won't have asterisk-based DND, Call Forward, etc. but we're not using those anyway. This is a call center environment where the agents put the phone devices on DND when they're away and we don't use call forwarding.
What are the other implications? All I can go on is recommendations from the experts on how to fix this and right now I'm hearing that "in the field" this is how it's fixed.
I'm open to anything...
Thanks
Speschko -
We created a file /usr/local/bin/freepbx_fix_queue_members.sh that contains:
(don't forget to make the file executable)
Then we changed/added a line to /etc/amportal.conf that reads:
POST_RELOAD=/usr/local/bin/freepbx_fix_queue_members.sh
Now, every time we change the static queue members in FreePBX, before Asterisk is reloaded, the members are switched to the SIP devices.
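A minimal sketch of a script along these lines, based only on the member= format quoted earlier in the thread. This is not the poster's original script: the function name and the commented-out default path are assumptions, and `sed -i` assumes GNU sed.

```shell
#!/bin/sh
# Rewrite Local-channel queue members to direct SIP members in a queues file:
#   member=Local/102@from-internal/n,0  ->  member=SIP/102,0
fix_queue_members() {
    conf=$1
    # -i edits the file in place (GNU sed). The two capture groups keep the
    # extension number and the penalty; other lines are left untouched.
    sed -i 's#^member=Local/\([0-9][0-9]*\)@from-internal/n,\([0-9][0-9]*\)$#member=SIP/\1,\2#' "$conf"
}

# Hypothetical invocation; adjust the path to wherever queues_additional.conf lives.
# fix_queue_members /etc/asterisk/queues_additional.conf
```

Anchoring the pattern to the whole line means members that are not plain numeric extensions (outside numbers, other channel types) are skipped rather than mangled.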
Test with SIP
I have just tested and it works for me.
What I did was change "Local" to "SIP" and delete the "from-internal" part in "/var/www/html/admin/modules/queues/page.queues.php", and then save changes to the queues from within FPBX. This successfully changed Local to SIP in queues_additional.conf, as escape2mtns describes above.
This is working well at the moment. I will observe tomorrow, under high traffic conditions, how this works out.
Same result different methods
Hahaha, mine is obviously prone to breaking when the Queues module is updated by FPBX, but it was easier for me to edit the PHP script file, seeing that I'm not such a Linux "boffin".
Are there any drawbacks you see in using SIP instead of Local, such as problems transferring, or should everything else work the same?
UPDATE
Philippe,
thank you for your input because I was sure that there were implications....but at the moment I need a solution for the dialparties.agi script hanging every day and going to 180% load on my server.
What main implications does this change to SIP have? ...because since I have done it my server has gone down to 20% CPU usage and I have perfect voice quality on all calls which is my first objective obviously seeing that this is a 100 agent call center.
What is your suggestion for someone like me who is not very savvy with programming? I have installed 1.4.26RC5, seeing that you had mentioned something having been implemented in 1.4.25 or 26, but still today I was having the same issues.
How can I get hold of the other solution you mention with the "hints"?
Thanks for your input.
BTW, I'm absolutely not trying to cross purposes with Philippe. I really appreciate him and his efforts.
But I would like to mention ... because I couldn't find this info anywhere and I looked... that the FreePBX engineer I spoke with today said that they have seen bugs with the 1.4.23, 1.4.24, and 1.4.25 versions of Asterisk and that they recommend all installations stick with the 1.4.22 level unless something else requires that you upgrade.
From a call center perspective, it seems like the SIP queue member is the way to go. I've now run across other forum posts (here I think) that talk about how using the Local channel for queue members causes Asterisk to see the Agent as in-use even if the call is transferred away and that the Agent isn't 'available' again until that call disconnects from whoever received the call transfer.
So far, so good
At the end of the first full business day using SIP queue members, we've now gone over 24 hrs without a crash/restart or hung dialparties.agi (for the first time).
System uptime: 1 day, 2 hours, 37 minutes, 59 seconds
CPU: 0%
Everything else worked as expected. So far so good.
Same here But I am worried about the implications
Philippe,
We have had a day with over 15K calls and no crashes or hanging dialparties.agi. I need to know what we can do to use SIP instead of Local to avoid these problems in the future or is this something that has to be corrected in Asterisk??
What exactly are we losing by using the SIP instead of the Local call type?
3rd day of operation with no restarts or high cpu load
I am happy as a pig in sh.t. The call center has been stable for 3 days now using SIP instead of Local. I wonder when the Asterisk team will find out what they broke in the way the AGIs are handled. I'm no expert but I'd say there is something gravely wrong somewhere there.
Here is the change I made in:
/var/www/html/admin/modules/queues/page.queues.php
Changed from: (just search for "Local")
$members[$key] = "Local/".$members[$key]."@from-internal/n,".$penalty_val;
to:
$members[$key] = "SIP/".$members[$key].",".$penalty_val;
This is obviously a temporary fix until the queues module is updated by FPBX again and at some point in time I hope there is a solution for the dialparties hanging issue to be able to revert back to the original way of handling these calls as intended by FPBX.
Working well...
We put Speschko's script in the production server yesterday, and it appears to fix our issue. Relative to the previous condition with dialparties.agi and the associated hit, the CPU is now loafing.
This is a great relief!
We'll look forward to an official fix, and will make a donation now to help.
Thanks,
Peter
Fortel
Donate link?
I must be tired.
Is there a donate mechanism somewhere? A link perhaps?
Thanks,
Peter
Fortel
warlock67,
By sending directly to SIP, first off, it requires you to be running in extension mode, though that is usually not an issue for most people, as that is the default and most don't know there is even another mode that you can run in.
The biggest issue you run into will be if those agents subsequently transfer calls to someone else which potentially requires going to that person's voicemail. Instead of dropping into voicemail, the call will be hung up. There are some other anomalies related to that that may occur, but it's late on a Friday so it's not coming to me right now.
The other issue which is probably not a problem for you, is that all numbers may not always be SIP. There could be other devices, they could be other dialplan, they could be outside numbers. Again, none of that may be important in your situation.
There is an ability to use Asterisk Agents, agent.conf type, that is un-supported, keeping in mind that I think that is being deprecated in 1.6. If you put Annn as your static agent number, it will treat it like an agent which can get a similar result that you are doing.
Philippe Lindheimer - FreePBX Project Leader
Quick way to kill call waiting?
After a few days of operation, I can say the above fix as relayed by escape2mtns has worked for us.
But there are some issues, and currently I'm wondering how to kill call waiting for all the devices in the queues. So, anyone with some ideas?
Thanks,
Peter
Fortel
ringinuse=no maybe
Philippe Lindheimer - FreePBX Project Leader
Call waiting...
You're very fast! I was just studying that as a possibility! Obviously, I have a lot to learn...
Thanks,
Peter
Fortel
you may want to download the 2.6 version of queues, it has options that include that in it.
Philippe Lindheimer - FreePBX Project Leader
Call Waiting - Queues
Okay, setting ringinuse=no, by itself, didn't work.
But as documented elsewhere, an edit to queues_post_custom.conf does appear to do the trick. We have several queues, so we put the changes in this format:
[queuenumber](+)
[nextqueuenumber](+)
ringinuse=no
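As far as I understand Asterisk's config syntax, a `[name](+)` header only appends the lines that follow it to that section, so if every queue needs the setting, each stanza would carry its own ringinuse=no line. A small sketch that prints such stanzas for a list of queue numbers; the function name and the queue numbers 600 and 601 are made up for illustration:

```shell
#!/bin/sh
# Print one "[queue](+)" stanza per queue number, each with its own
# ringinuse=no line, in the style used for queues_post_custom.conf.
ringinuse_stanzas() {
    for q in "$@"; do
        printf '[%s](+)\nringinuse=no\n\n' "$q"
    done
}

# Hypothetical queue numbers; review the output before appending it
# to queues_post_custom.conf.
ringinuse_stanzas 600 601
```

Generating the stanzas from a list keeps the custom file consistent when queues are added or removed later.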
Now I guess I'll find out the repercussions...
Peter
Fortel
that's why I suggested looking at the 2.6 version of the module, it has an option for ringinuse, and several other things.
Though, come to think about it, there may be some dependencies that make it require 2.6 core changes. But then again, you could just load all of 2.6. I plan on wrapping the tarball up and making it a bit more formal within the next couple weeks, but it is pretty solid overall. And there is some nice enhancement for queues.
Philippe Lindheimer - FreePBX Project Leader
Queues 2.6
I did consider the 2.6 suggestion, but prefer to wait for the tarball. This is a production system I'm now experimenting on, and the end users are pretty jumpy.
But I am looking forward to trying 2.6.
Thanks for all the help!
Peter
Fortel
Possible Fix
Could someone with this problem try the fix I've put at
https://issues.asterisk.org/view.php?id=14639#bugnotes
If it's that, then it's a bug with phpagi, not with Asterisk or FreePBX.
I was having symptoms pretty much as here, and have just spent today diagnosing it, and the fix I've put at the above link solves it for me (as well as other 'unexplained delays' issues), so it may help others, and it would be good to see if it works for other people as well.
Actually after my last post, I think the bug IS in FreePBX, and I've reported that here:
http://www.freepbx.org/trac/ticket/3792#preview
along with the fix that is needed in dialparties.agi
a simple one line fix that definitely works is to change
$astman = new AGI_AsteriskManager( );
to
$astman = new AGI_AsteriskManager( );
$astman->pagi = & $AGI;
You may be able to change the line to
$astman = $AGI->new_AsteriskManager();
instead. That does seem to work for me, but it does change some other things so I'm not as confident about it
That's interesting. What version of Asterisk are you running because there may be an even better fix, which is no connection to the Manager. (Assuming this is a fix to the issues in this thread).
In core module version 2.6, I had made the following change to dialparties.agi:
I have back ported the EXTENSION_STATE() function to 1.4 as it was quite trivial, so if you do the same, you could use this on 1.4, thus the other checks. The crux is, dialparties.agi does not use a manager connection any longer if either of the conditions are met, which is an even better solution. Assuming of course, that this is really related to the problems reported here.
Philippe Lindheimer - FreePBX Project Leader
I'm using Asterisk 1.6.0.9
The (current?) FreePBX 2.5.1 download doesn't have code as you describe, and the 'is_ext_avail' function uses the manager connection, so that's what I've been working with.
Using the original dialparties.agi, an incoming call to a single extension would have a second or two delay before the internal extension started ringing, even though the outside line would give the ring tone immediately. With my suggested fix, the inside line starts ringing almost immediately. Queues are similar, with them ringing immediately, and no excessive CPU usage, delays or failure.
I don't know if you changed the code to what you described to try and fix this delay problem (because it would), but using the AGI->new_AsteriskManager() function instead of 'new AGI_AsteriskManager()' does fix it, and seems a simpler change, and it will work with Asterisk 1.4 as well :)
I'm not sure why the PHP 'error_log' function is so slow, but if that is different speeds on different computers (eg because of different PHP configurations), that would explain why some people have problems, and other people can't reproduce it.
I'm also not sure why phpagi uses error_log if the AGI_AsteriskManager isn't linked to an AGI instance, so maybe that should be reported as a bug in phpagi as well.
pscs,
your observation will be looked at, it is interesting and I was not familiar with creating a manager session like you describe, so don't know what, if any, implications it has.
The change I described above was put into 2.6 simply because we are usually very conservative about introducing anything into such 'core' components of the system, such as dialparties, on released versions. The change was made because we have been wanting to remove the need to hit the manager from dialparties since 'the beginning of time', but until the new function, which is 1.6 only, the manager was the only way to do that.
It would be really nice if Asterisk would simply backport that officially to 1.4. But they almost never add new features to a released version, which I understand and agree with, though one could almost argue it is a 'bug' not to have an extension state function at the dialplan level when they have a device state and have had extension state available through the AGI since 1.2 (and maybe even prior to 1.2?).
Lastly, whether it is a bug in phpagi or FreePBX, last I checked phpagi is not maintained and we have been maintaining it within our project (it has deviated quite a bit from the one on SourceForge). We have already fixed quite a few bugs. So ... the buck stops here either way :)
Philippe Lindheimer - FreePBX Project Leader
Let's examine this. I'm going to 'think out loud' here to see if we can get to the bottom of this. Here are the only differences, to the best that I can tell (and I have only had 1 coffee so ...).
First, this is influenced by my current phpagi.conf file which includes:
The config setting of the astman handle as is today are:
ASM config: Array ( [festival] => Array ( [text2wave] => /usr/src/festival/bin/text2wave [tempdir] => /var/lib/asterisk/sounds/tmp/ ) [asmanager] => Array ( [server] => localhost [port] => 5038 [username] => admin [secret] => amp111 ) )
Then with your change of:
ASM config: Array ( [festival] => Array ( [text2wave] => /usr/src/festival/bin/text2wave [tempdir] => /var/lib/asterisk/sounds/tmp/ ) [asmanager] => Array ( [server] => localhost [port] => 5038 [username] => admin [secret] => amp111 [festival] => Array ( [text2wave] => /usr/src/festival/bin/text2wave [tempdir] => /var/lib/asterisk/sounds/tmp/ ) [asmanager] => Array ( [server] => localhost [port] => 5038 [username] => admin [secret] => amp111 ) [phpagi] => Array ( [error_handler] => 1 [debug] => [admin] => [tempdir] => /var/spool/asterisk//tmp/ ) ) )
Looking through the code, I'm not seeing the above config differences having an effect. The only difference between the way it is now and your change is the config that is passed into the astman handle resulting from:
The changes that you pointed out mean that with the new call, asm->pagi is set, telling the astman handle that it has access to AGI, so to speak (e.g. running in an AGI session). So ... this looks like it may be the crux of the difference, because the other configs above don't look like they make a difference for AMI, but this function:
Which means it writes to error_log if you have not set pagi (as is the case today) but to conlog() if you have. The conlog function is:
Since in my case, config['phpagi']['debug'] is false, nothing gets written. And if a lot was otherwise getting written, that could very well bog things down (though it's strange that writing out log messages would have such a dramatic effect).
Now be aware that the PHPAGI Config module, if being used, has settings for debug and error which can further influence the behavior. (If you are seeing the debug message as verbose messages, then you probably want to turn that level off as well to reduce the noise).
So ... I think you may be right in that we want to change the code, though will probably only do it initially in 2.6 since we always approach such fundamental changes cautiously, but seems like it should be:
===================================================================
--- dialparties.agi (revision 7882)
+++ dialparties.agi (working copy)
@@ -62,7 +62,7 @@
 // If we are 1.6 then we have the EXTENSION_STATE() function and don't need to use the manager
 //
 if (!$has_extension_state) {
-    $astman = new AGI_AsteriskManager( );
+    $astman = $AGI->new_AsteriskManager();
     if (!$astman->connect("127.0.0.1", $ampmgruser , $ampmgrpass)) {
         exit (1);
     }
Philippe Lindheimer - FreePBX Project Leader
I agree that it's very odd that the logging slows things down so much.
On the 'EventState' AGI call, about 100+ events are returned, each of those results in a 'process_event' (or something) call - those don't actually do anything since no event processor is registered - however each one does log a line saying 'no event processor registered' (or words to that effect).
For some reason, on the PCs I've tried it on here, error_log takes about 0.1 seconds per call, so those 100 events take about 10 seconds to log. Simply commenting out the error_log line in phpagi made a dramatic difference to performance.
I really don't know why error_log takes so long here - and I'm not in the mood to start debugging PHP as well :) The simpler solution for me was to stop phpagi from calling it unnecessarily. I guess error_log probably hasn't been optimised since it's only meant to be used for errors, not for general diagnostic logging in the way that phpagi seems to use it in some circumstances.
(I can put full debug logging on in Asterisk and it makes virtually no difference to performance, so I don't think the slowness of error_log is to do with my PC's hardware.)
Question for testing
Question to Philippe and PSCS,
what changes should I make for testing? Is it sufficient to change only dialparties.agi, or do I need to take out the logging feature as well?
At the moment I am still using SIP instead of Local, but as Philippe mentions there are some drawbacks to this, such as the call transfer issue, so I would like to get back to a working solution with "Local" but without the long wait times, e.g. in transfers.
Appreciate your response here.
Update Call center using Local
I have just made that one single change in dialparties.agi and changed the queues back to use Local instead of SIP, and have been running for an hour without a hitch. I will post tomorrow or in the next couple of days on whether we are error free and without dialparties.agi hanging. Thanks Philippe and everyone else involved up to now. It is fantastic to see results this way. I will keep contributing with support tickets and time.
warlock67,
the change has already been checked into 2.6 #3792. If you test on 2.5 and have some solid results it will help to drive us to back port it to 2.5. My general approach is to introduce it cautiously, even though it looks pretty straight forward per the long post above. Further testing will help convince us that it should go into 2.5 (and maybe 2.4 also).
I do still plan on getting a tarball of 2.6 beta out within the next week I hope as well, but that will be beta though pretty solid.
Philippe Lindheimer - FreePBX Project Leader
Knock on wood....
Up to now this seems to have taken care of the problem: I have been up for more than 3 hours with no hanging dialparties.agi, and I have had some peaks of 40 simultaneous incoming and outgoing calls on this box, which would usually have crippled it before.
Since the change to dialparties.agi we have processed some 275 incoming calls and over 500 minutes on 4 inbound queues with around 35 static agents (using penalties within the queues) and no hangs, so this really looks like it did the trick...
warlock67,
if you could make sure to ping the ticket in the next day or two once you are pretty certain this is solid, it will trigger me to look at merging that fix into 2.5, it is sounding pretty convincing to me.
Philippe Lindheimer - FreePBX Project Leader