[Question] About a combination of Pacemaker1.1.14 and crmsh2.1.5. #129

Closed
HideoYamauchi opened this issue Apr 5, 2016 · 16 comments

@HideoYamauchi
Contributor

Hi All,

I combined Pacemaker 1.1.14 and crmsh 2.1.5 on RHEL 7.2 and tested the history function.

I started a node and formed a cluster.

[root@rh72-01 ~]# crm_mon -1 -Af
Last updated: Tue Apr  5 10:06:05 2016          Last change: Tue Apr  5 10:06:01 2016 by root via cibadmin on rh72-01
Stack: corosync
Current DC: rh72-01 (version 1.1.14-70404b0) - partition WITHOUT quorum
1 node and 1 resource configured

Online: [ rh72-01 ]

 prmDummy       (ocf::heartbeat:Dummy): Started rh72-01

Node Attributes:
* Node rh72-01:

Migration Summary:
* Node rh72-01:

I then caused a failure in the resource by deleting its state file.

[root@rh72-01 ~]# rm -rf /var/run/resource-agents/Dummy-prmDummy.state 
[root@rh72-01 ~]# crm_mon -1 -Af
Last updated: Tue Apr  5 10:06:38 2016          Last change: Tue Apr  5 10:06:01 2016 by root via cibadmin on rh72-01
Stack: corosync
Current DC: rh72-01 (version 1.1.14-70404b0) - partition WITHOUT quorum
1 node and 1 resource configured

Online: [ rh72-01 ]

 prmDummy       (ocf::heartbeat:Dummy): Started rh72-01

Node Attributes:
* Node rh72-01:

Migration Summary:
* Node rh72-01:
   prmDummy: migration-threshold=1000000 fail-count=1 last-failure='Tue Apr  5 10:06:33 2016'

Failed Actions:
* prmDummy_monitor_10000 on rh72-01 'not running' (7): call=7, status=complete, exitreason='No process state file found',
    last-rc-change='Tue Apr  5 10:06:33 2016', queued=0ms, exec=0ms

I then tried to use the history function of crmsh, but it does not seem to work correctly.

[root@rh72-01 rhel7]# crm --version
2.1.5 (Build unknown)
[root@rh72-01 rhel7]# crm history
crm(live)history# latest
INFO: fetching new logs, please wait ...
[1] 10:07:16 [FAILURE] rh72-01 Exited with error code 120
ERROR: no transitions found in the source
crm(live)history# info
Source: live
Created on: Tue Apr  5 09:54:32 JST 2016
By: report -Z -Q -f Tue Apr  5 08:54:30 2016 /var/cache/crm/history/live
Period: 2016-04-05 09:07:16 - 2016-04-05 09:54:30
Nodes: rh72-01
Groups: 
Resources: prmDummy
Transitions: 
crm(live)history# peinputs v
Date       Start    End       Filename      Client     User       Origin
====       =====    ===       ========      ======     ====       ======
crm(live)history# resource prmDummy
ERROR: Dummy(prmDummy)[17229]:: unknown string format

Is some setting missing that the history function needs?
Is the history function supported with the combination of Pacemaker 1.1.14 and crmsh 2.1.5?

I ran the same steps with Pacemaker 1.1.12 and crmsh 2.1.4, and there the history function worked correctly.

[root@rh67-01 ~]# crm --version
2.1.4 (Build unknown)
[root@rh67-01 ~]# crm history
crm(live)history# latest
WARNING: pssh not installed, slow live updates ahead
INFO: retrieving information from cluster nodes, please wait ...
WARNING: end of transition rh67-01:pe-input-3 not found in logs (transition not complete yet?)
Transition rh67-01:pe-input-4 (11:34:46 - 11:34:46):
        total 2 actions: 2 Complete
Apr  5 11:34:46 rh67-01 Dummy(prmDummy)[13184]: ERROR: No process state file found
Apr  5 11:34:46 rh67-01 crmd[13126]:   notice: te_rsc_command: Initiating action 2: stop prmDummy_stop_0 on rh67-01 (local)
Apr  5 11:34:46 rh67-01 crmd[13126]:   notice: te_rsc_command: Initiating action 4: start prmDummy_start_0 on rh67-01 (local)
crm(live)history# info
Source: live
Created on: Tue Apr  5 11:34:55 JST 2016
By: report -Z -Q -f Tue Apr  5 10:34:51 2016 /var/cache/crm/history/live
Period: 2016-04-05 10:34:51 - 2016-04-05 11:34:52
Nodes: rh67-01
Groups: 
Resources: prmDummy
Transitions: 0 1 2 3 4
crm(live)history# peinputs v
Date       Start    End       Filename      Client     User       Origin
====       =====    ===       ========      ======     ====       ======
2016-04-05 11:34:26 11:34:26  pe-input-0    no-client  no-user    no-origin
2016-04-05 11:34:26 11:34:26  pe-input-1    no-client  no-user    no-origin
2016-04-05 11:34:46 11:34:46  pe-input-2    no-client  no-user    no-origin
2016-04-05 11:34:46 11:34:46  pe-input-3    no-client  no-user    no-origin
2016-04-05 11:34:46 11:34:46  pe-input-4    no-client  no-user    no-origin
crm(live)history# resource prmDummy
Apr  5 11:34:26 rh67-01 crmd[13126]:   notice: te_rsc_command: Initiating action 5: start prmDummy_start_0 on rh67-01 (local)
Apr  5 11:34:46 rh67-01 Dummy(prmDummy)[13184]: ERROR: No process state file found
Apr  5 11:34:46 rh67-01 crmd[13126]:   notice: te_rsc_command: Initiating action 2: stop prmDummy_stop_0 on rh67-01 (local)
Apr  5 11:34:46 rh67-01 crmd[13126]:   notice: te_rsc_command: Initiating action 4: start prmDummy_start_0 on rh67-01 (local)

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented Apr 5, 2016

Hello,

Thank you for this report! The problem is a log format I have not previously encountered. I will create a fix for this.
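
The thread does not say which part of the log format was new. As a rough illustration only, here is a minimal Python sketch of a tolerant syslog-line parser. It assumes the difference is the optional `<facility.level>` tag seen in the Pacemaker 1.1.14 logs quoted later in this thread. The names `SYSLOG_RE` and `parse_syslog_line` are hypothetical and are not taken from crmsh.

```python
import re
from datetime import datetime

# Month lookup avoids depending on the process locale.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed layout: "Mon DD HH:MM:SS host [<facility.level>] tag: message".
# The <facility.level> part is optional so both log styles are accepted.
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<mi>\d{2}):(?P<s>\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?:<(?P<prio>[\w.]+)>\s+)?"
    r"(?P<tag>[^:\s]+):\s*"
    r"(?P<msg>.*)$"
)


def parse_syslog_line(line, year):
    """Return a dict for a recognized line, or None instead of raising."""
    m = SYSLOG_RE.match(line)
    if m is None or m.group("mon") not in MONTHS:
        return None
    try:
        when = datetime(year, MONTHS[m.group("mon")], int(m.group("day")),
                        int(m.group("h")), int(m.group("mi")), int(m.group("s")))
    except ValueError:  # e.g. an impossible day or hour
        return None
    return {
        "time": when,
        "host": m.group("host"),
        "priority": m.group("prio"),  # None when the tag is absent
        "tag": m.group("tag"),
        "message": m.group("msg"),
    }
```

Syslog lines carry no year, so the caller has to supply one. Returning None for unrecognized lines lets a caller skip them rather than abort with a parse error.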

@HideoYamauchi
Contributor Author

Hi Kristoffer,

Thank you for your comment.

All right!
I will wait for your fix.

Best Regards,
Hideo Yamauchi.

krig added the bug label Apr 5, 2016
krig self-assigned this Apr 5, 2016
@krig
Contributor

krig commented Apr 6, 2016

Hello Hideo-san,

I apologize, but it will take a little time for me to publish an updated version of crmsh 2.1.5 with a fix for this problem, as I am currently travelling.

If it is any help, the master branch currently does not suffer from this issue; I seem to have repaired the problem since releasing 2.1.5. I will take care of this as soon as I am able.

Thank you,
Kristoffer

@HideoYamauchi
Contributor Author

Hi Kristoffer,

Thank you for your comment.

Okay!
I have other investigations to work on, so I will wait a little longer.

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented Apr 27, 2016

Hello, and apologies again for taking so long to respond. I have now released 2.1.6 which should fix this issue (and others).

https://github.com/ClusterLabs/crmsh/releases/tag/2.1.6

krig closed this as completed Apr 27, 2016
@HideoYamauchi
Contributor Author

Hi Kristoffer,

Great!
I will test your fix.

Many thanks!
Hideo Yamauchi.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Apr 28, 2016

Hi Kristoffer,

There still seems to be a problem with event matching.

Because the contents of the Pacemaker log have changed, EVENT_PATTERNS needs to be updated.

EVENT_PATTERNS="
membership crmd.*(NEW|LOST)|pcmk.*(lost|memb|LOST|MEMB):
quorum crmd.*Updating.quorum.status|crmd.*quorum.(lost|ac?quir)
pause Process.pause.detected
resources lrmd.*(start|stop)
stonith crmd.*Exec|stonith-ng.*log_oper.*reboot|stonithd.*(requests|(Succeeded|Failed).to.STONITH|result=)
start_stop Configuration.validated..Starting.heartbeat|Corosync.Cluster.Engine|Executive.Service.RELEASE|Requesting.shutdown|Shutdown.complete
"

Here is the result of matching each EVENT_PATTERNS entry against the Pacemaker 1.1.14 log.
(I formed a two-node cluster and ran these commands after performing a STONITH operation.)

[root@rh72-01 ~]# grep -E "crmd.*(NEW|LOST)|pcmk.*(lost|memb|LOST|MEMB):" /var/log/messages
[root@rh72-01 ~]# grep -E "crmd.*Updating.quorum.status|crmd.*quorum.(lost|ac?quir)" /var/log/messages
Apr 28 11:10:42 rh72-01 <local1.notice> crmd[1252]:  notice: Membership 632: quorum acquired (2)
Apr 28 11:11:14 rh72-01 <local1.notice> crmd[1252]:  notice: Membership 644: quorum lost (1)
[root@rh72-01 ~]# grep -E "Process.pause.detected" /var/log/messages
[root@rh72-01 ~]# grep -E "lrmd.*(start|stop)" /var/log/messages
Apr 28 11:10:54 rh72-01 <local1.info> lrmd[1249]:    info: executing - rsc:prmDummy action:start call_id:14
Apr 28 11:10:54 rh72-01 <local1.info> lrmd[1249]:    info: finished - rsc:prmDummy action:start call_id:14 pid:1326 exit-code:0 exec-time:12ms queue-time:0ms
Apr 28 11:10:54 rh72-01 <local1.info> lrmd[1249]:    info: executing - rsc:prmStonith2-1 action:start call_id:15
Apr 28 11:10:56 rh72-01 <local1.info> lrmd[1249]:    info: finished - rsc:prmStonith2-1 action:start call_id:15  exit-code:0 exec-time:1108ms queue-time:0ms
[root@rh72-01 ~]# grep -E "crmd.*Exec|stonith-ng.*log_oper.*reboot|stonithd.*(requests|(Succeeded|Failed).to.STONITH|result=)" /var/log/messages
Apr 28 11:11:09 rh72-01 <local1.notice> crmd[1252]:  notice: Executing reboot fencing operation (9) on rh72-02 (timeout=60000)
[root@rh72-01 ~]# grep -E "Configuration.validated..Starting.heartbeat|Corosync.Cluster.Engine|Executive.Service.RELEASE|Requesting.shutdown|Shutdown.complete" /var/log/messages
Apr 28 11:10:06 rh72-01 <daemon.info> systemd:Starting Corosync Cluster Engine...
Apr 28 11:10:06 rh72-01 <local1.notice> corosync[1239]: [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Apr 28 11:10:07 rh72-01 <daemon.info> corosync:Starting Corosync Cluster Engine (corosync): [  OK  ]
Apr 28 11:10:07 rh72-01 <daemon.info> systemd:Started Corosync Cluster Engine.
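
The grep checks above can also be reproduced in a short Python sketch, which makes it easier to see at a glance which categories no longer match. The patterns are copied from EVENT_PATTERNS and the sample lines are a subset of the grep output in this comment. The names `SAMPLE_LOG` and `count_matches` are hypothetical helpers, not part of crmsh or hb_report.

```python
import re

# Patterns copied verbatim from EVENT_PATTERNS in this comment.
EVENT_PATTERNS = {
    "membership": r"crmd.*(NEW|LOST)|pcmk.*(lost|memb|LOST|MEMB):",
    "quorum": r"crmd.*Updating.quorum.status|crmd.*quorum.(lost|ac?quir)",
    "pause": r"Process.pause.detected",
    "resources": r"lrmd.*(start|stop)",
    "stonith": r"crmd.*Exec|stonith-ng.*log_oper.*reboot"
               r"|stonithd.*(requests|(Succeeded|Failed).to.STONITH|result=)",
    "start_stop": r"Configuration.validated..Starting.heartbeat"
                  r"|Corosync.Cluster.Engine|Executive.Service.RELEASE"
                  r"|Requesting.shutdown|Shutdown.complete",
}

# A few Pacemaker 1.1.14 log lines taken from the grep output in this comment.
SAMPLE_LOG = [
    "Apr 28 11:10:42 rh72-01 <local1.notice> crmd[1252]:  notice: Membership 632: quorum acquired (2)",
    "Apr 28 11:11:14 rh72-01 <local1.notice> crmd[1252]:  notice: Membership 644: quorum lost (1)",
    "Apr 28 11:10:54 rh72-01 <local1.info> lrmd[1249]:    info: executing - rsc:prmDummy action:start call_id:14",
    "Apr 28 11:11:09 rh72-01 <local1.notice> crmd[1252]:  notice: Executing reboot fencing operation (9) on rh72-02 (timeout=60000)",
    "Apr 28 11:10:06 rh72-01 <local1.notice> corosync[1239]: [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.",
]


def count_matches(lines):
    """Count, per event category, how many lines the pattern matches."""
    compiled = {name: re.compile(pat) for name, pat in EVENT_PATTERNS.items()}
    return {name: sum(1 for line in lines if rx.search(line))
            for name, rx in compiled.items()}


if __name__ == "__main__":
    for name, hits in count_matches(SAMPLE_LOG).items():
        print(f"{name:12s} {hits}")
```

On these sample lines the membership and pause categories report zero matches, which agrees with the empty grep results above. No pause event necessarily occurred in this run, so only the membership result points at a pattern that needs updating.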

I think EVENT_PATTERNS needs to be corrected.

What do you think?

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented Apr 28, 2016

Hi,

Yes, that is probably true. We already have different patterns for 1.1.8
and earlier in crmsh itself.

Can you create a report that shows the problem and email me? I will update
the event patterns.
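
As a rough illustration of keeping separate pattern sets per Pacemaker version, here is a minimal Python sketch. The 1.1.8 cutoff comes from the comment above. The function names and the "legacy"/"current" labels are hypothetical and do not reflect how crmsh actually organizes its patterns.

```python
import re

# Cutoff taken from the comment above: 1.1.8 and earlier use other patterns.
LEGACY_MAX = (1, 1, 8)


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '1.1.14-70404b0'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in m.groups())


def pattern_set_for(version_text):
    """Return a hypothetical label naming which pattern set applies."""
    if parse_version(version_text) <= LEGACY_MAX:
        return "legacy"
    return "current"
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "1.1.14" sorts before "1.1.8", which would wrongly classify 1.1.14 as an old version.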

On Thu, Apr 28, 2016, 03:02 Hideo Yamauchi notifications@github.com wrote:

Hi Kristoffer,

There still seems to be a problem with event matching.

Because the contents of the Pacemaker log have changed, EVENT_PATTERNS needs
to be updated.

EVENT_PATTERNS="
membership crmd.*(NEW|LOST)|pcmk.*(lost|memb|LOST|MEMB):
quorum crmd.*Updating.quorum.status|crmd.*quorum.(lost|ac?quir)
pause Process.pause.detected
resources lrmd.*(start|stop)
stonith crmd.*Exec|stonith-ng.*log_oper.*reboot|stonithd.*(requests|(Succeeded|Failed).to.STONITH|result=)
start_stop Configuration.validated..Starting.heartbeat|Corosync.Cluster.Engine|Executive.Service.RELEASE|Requesting.shutdown|Shutdown.complete
"

Supporting older Pacemaker versions seems to require a different
EVENT_PATTERNS for each version.

What do you think?

Best Regards,
Hideo Yamauchi.



@krig
Contributor

krig commented Apr 28, 2016

For the next major release, I am considering removing events.txt from the report, since we now have crm history events, which provides a more sophisticated analysis.

@HideoYamauchi
Contributor Author

Hi Kristoffer,

Please give me a little time.
Japan has a holiday period of about one week starting tomorrow.
I will be at work on Monday, so I will send you a report then.

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented Apr 28, 2016

No problem, enjoy your vacation :)

@HideoYamauchi
Contributor Author

Hi Kristoffer,

Tomorrow evening I will send the hb_report that I collected with Pacemaker 1.1.14.
Is krig@koru.se the right address?

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented May 1, 2016

Hi,

Yes, that is OK :)

Cheers,
Kristoffer


@HideoYamauchi
Contributor Author

Hi Kristoffer,

I sent an email in the evening.
Did the email arrive?

Best Regards,
Hideo Yamauchi.

@krig
Contributor

krig commented May 2, 2016

Hello Hideo-san,

I checked my spam folder and found the message. I suppose it was flagged
due to the attachment, but it has arrived now.

I will get back to you when I have looked at the log data.

Thank you!
Kristoffer


@HideoYamauchi
Contributor Author

Hi Kristoffer,

Okay!
Sorry... My ybb.ne.jp address may not be a good one to send from.

Many Thanks!
Hideo Yamauchi.
