Ticket #5646 (closed Bugs: fixed)

Opened 1 year ago

Last modified 10 months ago

DYNAMICHINTS stops generate_hints.php exec

Reported by: MPG Assigned to:
Priority: regression Milestone: 2.11
Component: FreePBX Framework Version: 2.10-branch
Keywords: DYNAMICHINTS generated_hints Cc:
Confirmation: Pending Distro:
Backend Engine: Asterisk 1.8 Distro Ver:
Backend Ver: 10.1.3 SVN Revision (if applicable):

Description

I'm using device and users config, with dynamic hints disabled, when dhints are enabled , the scripts generate_hints.php and generate_queue_hints.php hangups for ever ( I can see this in ps xa|grep php )

My config is asterisk-10.1.3 + freepbx 2.10.0.0 with all modules update.

1st : can any php config are blocking this ?

2nd : Can I run dynamic login extensions with DYNAMICHINTS enables ?

3 : I try debug, but I am lost something in generate_hints.php code.

Regards,

Marcio Gomes

Change History

(follow-up: ↓ 13 ) 03/02/12 10:45:21 changed by p_lindheimer

  • owner changed.
  • priority changed from blocker to regression.
  • confirmation changed from Unreviewed to Need Feedback.
  • component changed from Asterisk Configuration to FreePBX Framework.

I have not tested this on Asterisk 10.

Can you try running the scripts directly from the command line and see if there is any further feedback or errors? The scripts will do no harm when run, they should simply generate the hints as output.

03/04/12 20:33:03 changed by bodywave

Thank goodness somebody else finally reported this problem. It has been driving me crazy since the 2nd week of January 2012. I am having this problem on two servers. Both are running Asterisk 1.8.9.2 but one is FreePBX 2.9.0.9 and the other is FreePBX 2.10.0.0. Here is what I am doing to work around the problem:

1. Every time I click the Apply bar, I quickly go and find the process of "/usr/bin/php -q /var/lib/asterisk/bin/generate_hints.php *80 nocampon dnd" and I kill this process to unhang FreePBX. With 2.9, I have plenty of time to do this, and even though it doesn't generate any hints, at least it still clears the flag so that a reload isn't pending anymore. With 2.10, I need to be fast, otherwise the new GUI will timeout and leave the reload flag set.

2. Next, I must login as root and run asterisk -rx "core reload" because this will fix the hints. It always works because I can run "core show hints" before and afterwards, and the number of hints is much higher afterwards. I tried "dialplan reload" but it wasn't good enough; only "core reload" fixes the hints.

3. One of my affected servers is not using Queues, so the hints are not considered critical, just convenient for BLF. However, the other server is using Queues. On that server, it is critical that I run "core reload" after every reboot. Otherwise, all queue agents have a state of "(Invalid)" and callers never get thru. As soon as I run "core reload" the agents' state is fixed and they immediately start to receive queue calls. This indicates that the hints fail to be generated when Asterisk initially starts after boot, but there is never any noticeable delay during boot (i.e. it does not seem to "hang" except when triggered by reloads in FreePBX).

Some additional information:

I can successfully run the command line while logged in as root in a bash shell. It runs quickly and displays all the expected hint lines. Then it ends gracefully (without error).

I have double-checked all ownership and permissions on /var/lib/php/session, amportal.conf, everything in /etc/asterisk, /tmp, and /var/tmp.

When I first started troubleshooting this problem, I got sidetracked by a slightly related problem. DYNAMICHINTS=True causes the following line to be inserted in extensions_additional.conf:

#exec /var/lib/asterisk/bin/generate_hints.php *80 nocampon dnd

but this is not entirely safe. I believe it should be:

#exec /var/lib/asterisk/bin/generate_hints.php "*80" nocampon dnd

or

#exec /var/lib/asterisk/bin/generate_hints.php \*80 nocampon dnd

If the * is not escaped, and there happens to be any filename in /tmp which matches the mask *80, then the exec command will get expanded by /bin/sh to:

/usr/bin/php -q /var/lib/asterisk/bin/generate_hints.php resetmodem.8980 nocampon dnd

In my case, I had a file /tmp/resetmodem.8980 which was created by my HylaFax? "wedged" script. Manually editing extensions_additional.conf was futile, of course, since it gets regenerated, but I was able to get past this problem for the moment by cleaning out /tmp. Unfortunately, generate_hints.php continued to hang.

While it is hung, I have examined all the files in use by the process, and nothing looks unusual. I can also see that it is successfully opening a manager connection (tcp connection to 127.0.0.1:5038). The process consumes 1-2% CPU for a few seconds, then it drops to 0% and stays there until I kill it.

Sorry for the long post but hopefully this info will help to track down the problem.

03/05/12 12:24:53 changed by p_lindheimer

The *80 should not be an issue as it is not running in a bash shell, it is being launched directly from Asterisk and thus there is no substitution going on. There may be an issue when running it from a shell independently during testing for the reasons you mention.

If it is hanging when reloads are happening but runs fine independently then before anything else proceed it with "sudo -u asterisk" when testing it so that it runs as the same user as will be the case from running it from asterisk. Otherwise the script is going to have to be instrumented to see where it is hanging. There are plenty of systems where it runs without incident so we need to determine what differs on your systems to see why this is going on.

03/05/12 15:54:14 changed by reidcanavan

I would like to echo that we are experiencing this problem ( generate_hints.php hangs system ).

We upgraded from 2.9 to 2.10 and flipped over to device and user mode. After adjusting the devices / users we then activated the dynamic hinting. After doing this it was found that the generate_hints.php hangs.

Running the lastest version of AsteriskNow / Asterisk ( 1.8.9.3 ).

(follow-up: ↓ 6 ) 03/05/12 16:00:45 changed by reidcanavan

I should note that the hang occurs on both generate_queue_hints.php and generate_hints.php.

Killing the tasks and doing a core reload does not circumvent the hangs being seen.

Running those php files from root does not make any difference either.

(in reply to: ↑ 5 ; follow-ups: ↓ 7 ↓ 9 ) 03/05/12 16:12:37 changed by p_lindheimer

Replying to reidcanavan:

Running those php files from root does not make any difference either.

Did you try running them as user asterisk also as requested above with sudo -u and does that run fine or does that hang?

(in reply to: ↑ 6 ) 03/05/12 16:31:28 changed by reidcanavan

Replying to p_lindheimer:

Using sudo -u asterisk or root it appears they run as there is a response with a return back to the command line.

It appears to get stuck running the *45 for queue_hints and *80, *84 for hints.

I tried removing the modules campon and intercom, etc with the same results yielded.

There is only one queue on the system with one static agent.

Running from sudo under either user will also hang on those scripts using the parameters from ps xa. Running the php scripts without parameters returns valid results.

03/05/12 16:37:42 changed by p_lindheimer

  • confirmation changed from Need Feedback to Pending.

This looks like an Asterisk bug that has been introduced somewhere after 1.8.5.0 where it can run successfully. If memory servers me right, this happened once before a few years ago.

My suspected issue is as follows:

#exec calls our script which creates a connection into AMI to obtain information from Asterisk. This of course happens at the time that the new dialplan is being loaded. If they made a change in Asterisk that would lock out the manager connection because the dialplan is being reloaded, it can create a form of deadlock. While loading the dialplan, they need to allow AMI to run and this appears to have been broken.

That's a theory, this will need more detailed analysis to determine exactly what point is hanging in the scripts so we can track down and file the bug with Asterisk if it has not already been detected.

(in reply to: ↑ 6 ) 03/05/12 16:44:54 changed by reidcanavan

I have found that by running the scripts while there is a call by asterisk, the scripts will hang. However, if I ensure that asterisk has not called these scripts and run them manually that they return the results as expected.

I would presume that your theory is correct based on these results.

In the interim I will turn off dynamic hinting.

03/05/12 17:09:41 changed by p_lindheimer

if you have adhoc devices you need dynamic hints and should revert back to an earlier asterisk. If you do not use adhoc devices then you do not need dynamic hints. We are currently in the process of trying to determine which Asterisk release introduced this issue so we can track down the bug and have it properly reported. I have determined that the manager connection gets opened but ends up hanging when doing a "database get" of the AMPUSER AstDB object which would make sense.

03/05/12 19:17:44 changed by p_lindheimer

confirmed to work on 1.8.8.0 and confirmed to fail on 1.8.9.1 so something between then has introduced this issue.

03/05/12 19:47:13 changed by p_lindheimer

(in reply to: ↑ 1 ) 03/05/12 23:11:06 changed by MPG

Replying to p_lindheimer:

I have not tested this on Asterisk 10. Can you try running the scripts directly from the command line and see if there is any further feedback or errors? The scripts will do no harm when run, they should simply generate the hints as output.

All off then run without problems.. in example :

asterisk@zap1:/etc$ /var/lib/asterisk/bin/generate_hints.php nointercom *84 exten => 1002,hint,DAHDI/82 exten => *841002,hint,ccss:DAHDI/82 exten => 1031,hint,DAHDI/80 exten => *841031,hint,ccss:DAHDI/80 exten => 1034,hint,DAHDI/64 exten => *841034,hint,ccss:DAHDI/64 exten => 1041,hint,DAHDI/84 exten => *841041,hint,ccss:DAHDI/84 exten => 1051,hint,DAHDI/72 exten => *841051,hint,ccss:DAHDI/72 exten => 1061,hint,DAHDI/75 exten => *841061,hint,ccss:DAHDI/75 exten => 1062,hint,DAHDI/73 exten => *841062,hint,ccss:DAHDI/73 exten => 1012,hint,DAHDI/66 exten => *841012,hint,ccss:DAHDI/66 exten => 1013,hint,DAHDI/67 exten => *841013,hint,ccss:DAHDI/67 exten => 1014,hint,DAHDI/68 exten => *841014,hint,ccss:DAHDI/68 exten => 1015,hint,DAHDI/69 exten => *841015,hint,ccss:DAHDI/69 exten => 1032,hint,DAHDI/89 exten => *841032,hint,ccss:DAHDI/89 exten => 1011,hint,DAHDI/65 exten => *841011,hint,ccss:DAHDI/65 exten => 1233,hint,SIP/1233 exten => *841233,hint,ccss:SIP/1233 exten => 1033,hint,DAHDI/63 exten => *841033,hint,ccss:DAHDI/63

03/06/12 06:46:35 changed by MPG

I attached the traces in the asterisk issue

https://issues.asterisk.org/jira/browse/ASTERISK-19487

03/07/12 23:05:53 changed by p_lindheimer

(In [13780]) add Reload action and call instead of CLI module reload to trigger reload, re #5646 fixes the issue but other versions of FreePBX still broke as it's a real Asterisk bug

03/07/12 23:22:38 changed by p_lindheimer

(In [13789]) Merged revisions 13780-13781 via svnmerge from http://www.freepbx.org/v2/svn/freepbx/branches/2.10

........

r13780 | p_lindheimer | 2012-03-07 20:05:52 -0800 (Wed, 07 Mar 2012) | 1 line

add Reload action and call instead of CLI module reload to trigger reload, re #5646 fixes the issue but other versions of FreePBX still broke as it's a real Asterisk bug

........

r13781 | p_lindheimer | 2012-03-07 20:14:46 -0800 (Wed, 07 Mar 2012) | 1 line

fixes #5492 callback issue re #5565 reverts r13406

........

03/09/12 17:37:39 changed by bodywave

I applied changeset 13780 to my FreePBX 2.9.0.9 system which was having this problem, and I can confirm that this workaround solution is effective.

03/09/12 18:03:41 changed by p_lindheimer

(In [13792]) Merged revisions 13780 via svnmerge from http://www.freepbx.org/v2/svn/freepbx/branches/2.10

........

r13780 | p_lindheimer | 2012-03-07 20:05:52 -0800 (Wed, 07 Mar 2012) | 1 line

add Reload action and call instead of CLI module reload to trigger reload, re #5646 fixes the issue but other versions of FreePBX still broke as it's a real Asterisk bug

........

(follow-up: ↓ 20 ) 03/13/12 05:52:40 changed by MPG

I found a new problem, after amportal stops ( or asterisk stop ), when you call amportal start ( or a single asterisk daemon ), core show hints are empty, after a reload all works fine.

(in reply to: ↑ 19 ; follow-ups: ↓ 21 ↓ 22 ) 03/13/12 11:35:10 changed by p_lindheimer

Replying to MPG:

I found a new problem, after amportal stops ( or asterisk stop ), when you call amportal start ( or a single asterisk daemon ), core show hints are empty, after a reload all works fine.

Your issue appears to be a different bug and is also an issue with Asterisk. I have confirmed it on Asterisk 10.2.0, please indicate what version of Asterisk you are running as this will probably be another Asterisk ticket though needs to be confirmed if the issue is the same issue or not with them. The issue is that a connection to the AMI fails when called from the #exec at initial boot. It would be extremely useful if this could be tested on 1.8.8.0 or earlier 1,8, 1.6 or 1.4 versions to see how long it has been an issue, and in particular if 1.8.8.0 or earlier work which would point to the same Asterisk bug being a most likely cause.

(in reply to: ↑ 20 ) 03/13/12 13:56:59 changed by p_lindheimer

Replying to p_lindheimer:

Replying to MPG:

I found a new problem, after amportal stops ( or asterisk stop ), when you call amportal start ( or a single asterisk daemon ), core show hints are empty, after a reload all works fine.

Your issue appears to be a different bug and is also an issue with Asterisk. I have confirmed it on Asterisk 10.2.0, please indicate what version of Asterisk you are running as this will probably be another Asterisk ticket though needs to be confirmed if the issue is the same issue or not with them. The issue is that a connection to the AMI fails when called from the #exec at initial boot. It would be extremely useful if this could be tested on 1.8.8.0 or earlier 1,8, 1.6 or 1.4 versions to see how long it has been an issue, and in particular if 1.8.8.0 or earlier work which would point to the same Asterisk bug being a most likely cause.

I have confirmed that this is still an issue on 1.8.8.0 so does not appear to be related with this Asterisk bug. This needs to be checked on 1.4 and/or 1.6.X to see if the problem existed back then or not and then a new ticket created once more details are there though it's inevitably something that will have to be defined fixed in Asterisk. The amportal (freepbx_engine) script could be modified to send a reload after startup but that is a workaround and we want to get to a resolution first.

(in reply to: ↑ 20 ) 03/13/12 14:20:52 changed by p_lindheimer

Replying to p_lindheimer:

Replying to MPG:

I found a new problem, after amportal stops ( or asterisk stop ), when you call amportal start ( or a single asterisk daemon ), core show hints are empty, after a reload all works fine.

Your issue appears to be a different bug and is also an issue with Asterisk. I have confirmed it on Asterisk 10.2.0, please indicate what version of Asterisk you are running as this will probably be another Asterisk ticket though needs to be confirmed if the issue is the same issue or not with them. The issue is that a connection to the AMI fails when called from the #exec at initial boot. It would be extremely useful if this could be tested on 1.8.8.0 or earlier 1,8, 1.6 or 1.4 versions to see how long it has been an issue, and in particular if 1.8.8.0 or earlier work which would point to the same Asterisk bug being a most likely cause.

#5677 opened against this issue, I have confirmed it is a problem on both 1.4.21-ish, and 1.4.32. Seems like an Asterisk but they will have to determine that.

03/15/12 18:08:13 changed by reidcanavan

Dynamic hints was working but has stopped since upgrading to the latest FreePBX modules today.

03/27/12 10:25:49 changed by p_lindheimer

(In [13920]) fixes #5725 1.4 and below don't have a manager Reload action, re #5646 which was introduced in framework 2.9.0.11 only affects Asterisk prior to 1.4

06/15/12 16:19:16 changed by p_lindheimer

  • status changed from new to closed.
  • resolution set to fixed.

(In [14148]) fixes #5646 reload Asterisk after initial start if using Dynamic Hints to get over Asterisk bug where #exec fails on first startup

07/11/12 17:22:22 changed by p_lindheimer

(In [14253]) Merged revisions 13971-13972,13979,14026,14031,14048-14049,14052,14054,14060,14082,14119,14128,14133-14135,14148-14149,14151-14153,14159,14205,14218 via svnmerge from http://www.freepbx.org/v2/svn/freepbx/branches/2.10

........

r13971 | p_lindheimer | 2012-04-14 10:56:49 -0700 (Sat, 14 Apr 2012) | 1 line

don't make manager creds readonly now that they are auto updated when changed

........

r13972 | p_lindheimer | 2012-04-14 11:43:05 -0700 (Sat, 14 Apr 2012) | 1 line

fixes #5716 and fixes #5763 set reasonable timeouts with wget, port 5060 is still an issue and the server needs to be checked as something has changed there

........

r13979 | p_lindheimer | 2012-04-17 17:12:25 -0700 (Tue, 17 Apr 2012) | 1 line

this is supplied by core now

........

r14026 | p_lindheimer | 2012-04-27 13:18:08 -0700 (Fri, 27 Apr 2012) | 1 line

fixes #5742 conflict with html ID on Advanced Settings page breaks abiliity to change this one setting, will go live with next framework module update

........

r14031 | mbrevda | 2012-04-30 07:58:12 -0700 (Mon, 30 Apr 2012) | 1 line

fixes #5743 by namsepacing ari css stuff

........

r14048 | p_lindheimer | 2012-05-03 07:19:25 -0700 (Thu, 03 May 2012) | 1 line

fixes #5533 harmless warning message by not checking the filesize of the log file the first time if it does not exists yet

........

r14049 | p_lindheimer | 2012-05-03 08:54:56 -0700 (Thu, 03 May 2012) | 1 line

check if dbug file exists before calculatin size to remove otherwise harmless warning message re #5533

........

r14052 | p_lindheimer | 2012-05-04 08:26:35 -0700 (Fri, 04 May 2012) | 1 line

revert r12549 merged in error as well as remove undefined index all re #5692 which may fix issues but needs testing

........

r14054 | p_lindheimer | 2012-05-04 09:12:50 -0700 (Fri, 04 May 2012) | 1 line

don't duplicate this first setting re #5692

........

r14060 | p_lindheimer | 2012-05-07 15:03:50 -0700 (Mon, 07 May 2012) | 1 line

revert accidental checkin of part of r13972 re #5812

........

r14082 | p_lindheimer | 2012-05-18 15:16:54 -0700 (Fri, 18 May 2012) | 1 line

incorrect parsing of REQUEST variable results in streamlined ajax path's not being honored

........

r14119 | mbrevda | 2012-05-30 00:10:38 -0700 (Wed, 30 May 2012) | 1 line

function to recursivly remove a directory

........

r14128 | p_lindheimer | 2012-06-11 08:21:11 -0700 (Mon, 11 Jun 2012) | 1 line

fixes #5874 variable name typo

........

r14133 | p_lindheimer | 2012-06-14 11:22:37 -0700 (Thu, 14 Jun 2012) | 1 line

remove some advanced settings no longer valid

........

r14134 | p_lindheimer | 2012-06-14 12:37:43 -0700 (Thu, 14 Jun 2012) | 1 line

makes spnosor image and associated link configurable as is freepbx image

........

r14135 | p_lindheimer | 2012-06-14 12:39:02 -0700 (Thu, 14 Jun 2012) | 1 line

closes #5833 USEDEVSTATE defaults to true

........

r14148 | p_lindheimer | 2012-06-15 13:19:15 -0700 (Fri, 15 Jun 2012) | 1 line

fixes #5646 reload Asterisk after initial start if using Dynamic Hints to get over Asterisk bug where #exec fails on first startup

........

r14149 | mbrevda | 2012-06-18 08:48:21 -0700 (Mon, 18 Jun 2012) | 1 line

re #5879, #5839 - add bootstrap hook; write writetimeout to mamanger.conf; cosmetics

........

r14151 | mbrevda | 2012-06-19 05:36:10 -0700 (Tue, 19 Jun 2012) | 1 line

syntaxt error

........

r14152 | p_lindheimer | 2012-06-19 08:23:43 -0700 (Tue, 19 Jun 2012) | 1 line

add 2.10.1 upgrade dir, migration for writetimeout re #5839, and remove some advanced settings that are no longer used

........

r14153 | GameGamer?43 | 2012-06-19 15:10:37 -0700 (Tue, 19 Jun 2012) | 1 line

closes #5885 - check recordings in cdr prior to enabling or disabling the delete button

........

r14159 | p_lindheimer | 2012-06-27 16:51:15 -0700 (Wed, 27 Jun 2012) | 1 line

don't take out the description or it doesn't show up in module admin when not online

........

r14205 | p_lindheimer | 2012-06-30 13:11:53 -0700 (Sat, 30 Jun 2012) | 1 line

Creating release 2.10.1

........

r14218 | p_lindheimer | 2012-06-30 13:30:57 -0700 (Sat, 30 Jun 2012) | 1 line

remove some of the security auditing code that got checked in error and not used or ready

........