Uwe Bartels | 25 Apr 13:37 2011

failback setup problem

Hi,

I'm using pgpool-II 3.0.3 with streaming replication.
I coded the failback scenario/script for the slave server, and the script itself works fine.

I have now configured the failback script in pgpool.conf, and during testing the following error message comes up:
2011-04-09 06:42:18 LOG:   pid 16863: starting recovering node 1
2011-04-09 06:42:18 ERROR: pid 16863: start_recover: could not connect master node.

[root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 0
adt-db01 5432 1 0.500000
[root@adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898 postgres postgres 1
adt-db02 5432 3 0.500000
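
For readers following along, the two pcp_node_info lines above can be decoded with a few lines of Python. This is only a sketch: it assumes the output fields are hostname, port, status and load-balance weight, and that the status codes carry the meanings given in the pgpool-II documentation (1 and 2 for a node that is up, 3 for a node that is down). The helper name parse_node_info is hypothetical.

```python
# Sketch: decode one line of pcp_node_info output as shown above.
# Field order and status meanings are assumptions based on the pgpool-II
# documentation; check the docs for the version you are running.

STATUS_MEANING = {
    0: "initializing (normally never shown by PCP)",
    1: "up, no connections yet",
    2: "up, connections are pooled",
    3: "down",
}

def parse_node_info(line):
    """Split 'host port status weight' into a dict with a readable status."""
    host, port, status, weight = line.split()
    status = int(status)
    return {
        "host": host,
        "port": int(port),
        "status": status,
        "meaning": STATUS_MEANING.get(status, "unknown"),
        "weight": float(weight),
    }

if __name__ == "__main__":
    for line in ("adt-db01 5432 1 0.500000", "adt-db02 5432 3 0.500000"):
        info = parse_node_info(line)
        print(f"{info['host']}:{info['port']} -> {info['meaning']}")
```

Under these assumptions node 0 (adt-db01) is reported as up and node 1 (adt-db02) as down, which matches the situation described here: the master is reachable and node 1 is the one to be recovered.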

The pcp commands and pgpoolAdmin report that the master is up and running, and I'm able to connect to the master both directly and through pgpool.
So what's wrong? So far everything else works fine.

Best Regards,
Uwe

_______________________________________________
Pgpool-general mailing list
Pgpool-general@...
http://pgfoundry.org/mailman/listinfo/pgpool-general
Tatsuo Ishii | 26 Apr 01:06 2011

Re: failback setup problem

Assuming you have set recovery_user and recovery_password correctly, I'm
not sure what's going on. IMO, this error message is very rare. It is so
rare that a bug in the error path went unnoticed for a long time. Could
you please try the attached patch? The patch adds a little more useful
information to the error message above.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Attachment (recovery.patch): text/x-patch, 664 bytes
Uwe Bartels | 26 Apr 07:24 2011

Re: failback setup problem

Hi Tatsuo,

Thanks, that message already helped. I had tried to recover the postgres server with the failback_command.

I wasn't aware of these recovery_* parameters yet.
So I use the recovery_* parameters for recovering the failed postgres server, and the failback_command to attach the postgres server to pgpool, right?

Best Regards,
Uwe


Tatsuo Ishii | 26 Apr 07:34 2011

Re: failback setup problem

> thanks, that message already helped. I tried to recover the postgres server
> with the failback_command.

You are welcome.

> I didn't realize these recovery_* parameters yet.
> So I use the recovery_* parameters for recovering the failed postgres
> server.

pgpool-II connects to the backend to issue some SQL commands, including
CHECKPOINT. The recovery_* parameters define the user and password for
that connection. Usually they are those of the PostgreSQL superuser
(postgres).
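
Since a missing or empty recovery user or password is an easy thing to overlook, a quick check of pgpool.conf may help. The sketch below assumes the usual "name = 'value'" layout of pgpool.conf with "#" comments, and the parameter names recovery_user and recovery_password; the function names are hypothetical, and values containing "#" or quotes are not handled.

```python
import re

# Matches lines such as:  recovery_user = 'postgres'   # optional comment
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*'?([^'#\n]*)'?")

def read_settings(conf_text):
    """Return {name: value} for every 'name = value' line; comment lines are skipped."""
    settings = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings

def missing_recovery_settings(conf_text):
    """List the recovery connection parameters that are absent or empty."""
    settings = read_settings(conf_text)
    return [name for name in ("recovery_user", "recovery_password")
            if not settings.get(name)]

if __name__ == "__main__":
    sample = """
    recovery_user = 'postgres'
    recovery_password = ''      # left empty by mistake
    """
    print(missing_recovery_settings(sample))
```

An empty result means both parameters are set; it says nothing about whether the credentials are actually accepted by the backend.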

> And the failback_command to attach the postgres server into pgpool
> right?

If you want to do something special, for example sending mail to the
DBA, then you might want to specify it.  Otherwise you can leave it empty.
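
As an illustration of the "mail to the DBA" case, a failback_command could point at a small script like the one below. This is a sketch under assumptions: it expects pgpool to pass the node id and host name as arguments (for example via the %d and %h placeholders), the script name and the mail addresses are made up, and it only composes the message and prints it instead of sending it.

```python
import sys
from email.message import EmailMessage

# Hypothetical pgpool.conf entry (path and script name are assumptions):
#   failback_command = '/path/to/notify_failback.py %d %h'

def build_failback_mail(node_id, host, recipient="dba@example.com"):
    """Compose (but do not send) a notification that a node was re-attached."""
    msg = EmailMessage()
    msg["Subject"] = f"pgpool: node {node_id} ({host}) attached again"
    msg["From"] = "pgpool@example.com"   # placeholder sender address
    msg["To"] = recipient
    msg.set_content(
        f"Backend node {node_id} on host {host} was re-attached to pgpool.\n"
    )
    return msg

if __name__ == "__main__":
    # Demo values are used when no arguments are given.
    args = sys.argv[1:3] if len(sys.argv) >= 3 else ["1", "adt-db02"]
    print(build_failback_mail(*args))
```

Actually delivering the message (for example through a local mail transfer agent) is left out on purpose, since that part depends on the site.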
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Uwe Bartels | 28 Apr 08:40 2011

Re: failback setup problem

Hi Tatsuo,

OK, now it's working fine. Thanks for your help.

After getting through that initial setup for the first time, I'd like to give you some feedback about information that is (at least for me) missing from the documentation.

- please document the control flow during recovery, e.g.
connect to master server with recovery_user/recovery_password (connection check)
run recovery_1st_stage_command
run checkpoint
run recovery_2nd_stage_command
...
run failback_command
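
The control flow listed above can be written down as a small sketch. The step order is the one proposed in this message and is not verified against the pgpool-II source; the step elided as "..." is left out, and the callables are hypothetical stand-ins for what pgpool does internally.

```python
# Step names follow the list above (the "..." entry is intentionally omitted).
RECOVERY_STEPS = [
    "connect to master as recovery_user",
    "run recovery_1st_stage_command",
    "run checkpoint",
    "run recovery_2nd_stage_command",
    "run failback_command",
]

def run_recovery(actions):
    """Run each step in order; stop at the first step whose action returns False.

    `actions` maps a step name to a zero-argument callable.
    Returns (completed_steps, failed_step), where failed_step is None on success.
    """
    completed = []
    for step in RECOVERY_STEPS:
        if not actions[step]():
            return completed, step
        completed.append(step)
    return completed, None

if __name__ == "__main__":
    actions = {step: (lambda: True) for step in RECOVERY_STEPS}
    # Simulate the failure from the start of this thread: the connection check fails.
    actions[RECOVERY_STEPS[0]] = lambda: False
    done, failed = run_recovery(actions)
    print("completed:", done)
    print("failed at:", failed)
```

Seen this way, the "could not connect master node" error belongs to the very first step, before any of the configured commands are run.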

- please document that recovery_1st_stage_command and recovery_2nd_stage_command are system calls made by the current postgres master server in the PGDATA directory, and that the failback_command is a shell script command or system call made from the pgpool server. I needed to search for it in the source code.

I have a different approach to recovering the postgres server: I recover from an existing backup. I do that because it is faster and it doesn't put additional I/O load on the server that has just been activated. I guess (or hope) most people will have an existing backup.
So my question is whether recovering the failed server via a SQL command is the optimal approach. And what if both servers have failed? Am I then unable to use the pcp tools or pgpoolAdmin for recovery?

I'm asking because I worked for several years as the person responsible for IT production, and I learned a little about how administrators think and work. They are happy if they have a defined recovery procedure (or better, ONE).
Where am I going with this? I'm asking whether it would make sense to recode or reduce the recovery procedure to one system call, e.g. failback_command.
Most people have their backup and restore functionality coded and ready for training and/or disaster. If they could simply use this very same functionality within pgpoolAdmin, that would be great.

It might be that I have overlooked something (as before) and this is already possible. If so, please tell me how.

Best Regards,
Uwe


Tatsuo Ishii | 28 Apr 08:58 2011

Re: failback setup problem

Thank you for the feedback!
I'm looking forward to seeing your suggestions.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Tatsuo Ishii | 30 Apr 09:44 2011

Re: failback setup problem

> - please document the control flow during recovery, e.g.
> connect to master server with recovery_user/recovery_password (connection
> check)
> run recovery_1st_stage_command
> run checkpoint
> run recovery_2nd_stage_command
> ...
> run failback_command
> 
> - please document that recovery_1st_stage_command and
> recovery_2nd_stage_command are system calls made by the current postgres
> master server in the PGDATA directory, and that the failback_command is a
> shell script command or system call made from the pgpool server. I needed to
> search for it in the source code.

Sorry for the inconvenience. I will add this information to the docs as you suggested.

> I have a different approach of recovering the postgres server. I'm
> recovering from an existing backup. I do that because it is faster and I
> don't put additional i/o load on the just activated server. I guess (or
> hope) most people will have an existing backup.
>
> So my question is whether recovering the failed server via a SQL
> command is the optimal approach. What if both servers failed? Would I then
> be unable to use the pcp tools or pgpoolAdmin for recovering?

I'm not sure what you are trying to do here. If "backup" means it was
created by pg_dumpall, I don't think your approach works. Streaming
replication requires a base backup (binary backup), which is managed by
pg_start_backup/pg_stop_backup.
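
To make the difference from a dump concrete, the outline of a file-level base backup can be sketched as follows. This only builds and prints the steps; it does not execute anything. The label, paths and host name are placeholders, rsync is just one common way to copy the data directory, and the function name is hypothetical.

```python
def base_backup_steps(label, pgdata, standby_host, standby_pgdata):
    """Return the ordered steps of a file-level base backup as (kind, text) pairs."""
    return [
        # 1. Tell the primary that a backup starts (forces a checkpoint).
        ("sql", f"SELECT pg_start_backup('{label}');"),
        # 2. Copy the data directory while the server keeps running.
        ("shell", f"rsync -a --exclude=postmaster.pid {pgdata}/ "
                  f"{standby_host}:{standby_pgdata}/"),
        # 3. Tell the primary that the backup is finished.
        ("sql", "SELECT pg_stop_backup();"),
    ]

if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    for kind, text in base_backup_steps(
        "pgpool_recovery", "/var/lib/pgsql/data", "adt-db02", "/var/lib/pgsql/data"
    ):
        print(f"[{kind}] {text}")
```

A backup taken with pg_dumpall is a logical dump and has no such start/stop bracket, which is why it cannot serve as the starting point for a streaming-replication standby.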

Uwe Bartels | 30 Apr 11:17 2011

Re: failback setup problem



On 30 April 2011 09:44, Tatsuo Ishii <ishii-r5vX20e9KLfqq2nvvmkE/A@public.gmane.org> wrote:
> Sorry for the inconvenience. I will add this information to the docs as you suggested.

thanks.

 

>> I have a different approach of recovering the postgres server. I'm
>> recovering from an existing backup. I do that because it is faster and I
>> don't put additional i/o load on the just activated server. I guess (or
>> hope) most people will have an existing backup.
>>
>> So my question is whether recovering the failed server via a SQL
>> command is the optimal approach. What if both servers failed? Would I then
>> be unable to use the pcp tools or pgpoolAdmin for recovering?
>
> I'm not sure what you are trying to do here. If "backup" means it was
> created by pg_dumpall, I don't think your approach works. Streaming
> replication requires a base backup (binary backup), which is managed by
> pg_start_backup/pg_stop_backup.

Yes, of course. I have a backup created as described in http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP.

My question is: what happens if both servers fail somehow? Can I still use pgpoolAdmin to recover a database server?

Best..
Uwe

 

Tatsuo Ishii | 30 Apr 11:42 2011

Re: failback setup problem

> Yes of course I have a backup created as described in
> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP
> .
> 
> My question is: what happens if both servers fail somehow? Can I still use
> pgpoolAdmin to recover a database server?

No, pgpoolAdmin cannot recover a database server in this situation.
You have to recover the database manually.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Uwe Bartels | 30 Apr 14:34 2011

Re: failback setup problem


On 30 April 2011 11:42, Tatsuo Ishii <ishii-r5vX20e9KLfqq2nvvmkE/A@public.gmane.org> wrote:
> My question is, what happens if both server failed somehow? Can I still use
> pgpoolAdmin to recover a database server?

No, pgpoolAdmin cannot recover a database server in this situation.
You have to recover the database manualy.
Yes. I thought that of course.
So my question was, if you leave the whole restore process up to one called recovery script - including optional checkpointing etc. pgpool would be more flexible and simplier in terms of supporting different recovery procedures.

By he way - where does pgpool actually store the information about attached/detached servers?

Best Regards,
Uwe


 

Tatsuo Ishii | 1 May 00:01 2011

Re: failback setup problem

>> >> I'm not sure what you are trying to do here. If "backup" means it was
>> >> created by pg_dump_all, I don't think your approach works. Streaming
>> >> replication requires a base backup(binary backup) which is managed by
>> >> pg_start_backup/pg_stop_backup.
>> >>
>> >
>> > Yes of course I have a backup created as described in
>> >
>> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP

If you keep WAL archives along with the base backup, then it should of
course be possible to recover the primary PostgreSQL from them. The
question is how to recover the standbys. In my understanding you need to
recreate them from the recovered primary anyway, and that will require a
*new* base backup.
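
For illustration only, recreating a standby from the recovered primary could
be sketched roughly as below, in Python. The sketch only builds the command
strings and does not run them. The function name, the PGDATA path, the backup
label, and the use of psql and rsync as the postgres user are assumptions made
for this example; only pg_start_backup/pg_stop_backup come from this thread,
and the host names are the ones from the original post.

```python
import shlex
from typing import List


def base_backup_commands(primary_host: str, pgdata: str, standby_host: str,
                         label: str = "standby_rebuild") -> List[str]:
    """Build (but do not run) shell commands for a fresh base backup.

    Assumption: the commands are executed on the primary, which pushes its
    data directory to the standby with rsync between the two backup calls.
    """
    # Keep the label simple so it can be embedded in the SQL literal safely.
    if not label.replace("_", "").isalnum():
        raise ValueError("label must contain only letters, digits and '_'")

    psql = "psql -h {} -U postgres -d postgres -c".format(shlex.quote(primary_host))
    start_sql = "SELECT pg_start_backup('{}')".format(label)
    stop_sql = "SELECT pg_stop_backup()"

    src = pgdata.rstrip("/") + "/"
    dest = "{}:{}".format(standby_host, src)

    return [
        # 1. Tell the primary that a base backup is starting.
        "{} {}".format(psql, shlex.quote(start_sql)),
        # 2. Copy the cluster directory; the pid file must not be copied.
        "rsync -a --exclude=postmaster.pid {} {}".format(
            shlex.quote(src), shlex.quote(dest)),
        # 3. Tell the primary that the base backup has finished.
        "{} {}".format(psql, shlex.quote(stop_sql)),
    ]


if __name__ == "__main__":
    # Hypothetical PGDATA path; adjust to the real installation.
    for cmd in base_backup_commands("adt-db01", "/var/lib/pgsql/data", "adt-db02"):
        print(cmd)
```

A script like this would typically be what recovery_1st_stage_command points
at, but that wiring is not shown here.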

>> > My question is, what happens if both server failed somehow? Can I still
>> use
>> > pgpoolAdmin to recover a database server?
>>
>> No, pgpoolAdmin cannot recover a database server in this situation.
>> You have to recover the database manualy.
>>
> Yes. I thought that of course.
> So my question was, if you leave the whole restore process up to one called
> recovery script - including optional checkpointing etc. pgpool would be more
> flexible and simplier in terms of supporting different recovery procedures.

Good question. I will check the current code to see whether your idea
is feasible while I update the docs.

> By he way - where does pgpool actually store the information about
> attached/detached servers?

In shared memory.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii | 1 May 11:50 2011

Re: failback setup problem

>>> >> I'm not sure what you are trying to do here. If "backup" means it was
>>> >> created by pg_dump_all, I don't think your approach works. Streaming
>>> >> replication requires a base backup(binary backup) which is managed by
>>> >> pg_start_backup/pg_stop_backup.
>>> >>
>>> >
>>> > Yes of course I have a backup created as described in
>>> >
>>> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP
> 
> If you keep wal archives along with the base backup, then it should be
> possible to recover primary PostgreSQL from them of course. Question
> is, how to recover standbys. In my understanding you need to recreate
> them from the recovered primary anyway and it will require *new* base
> backup.
> 
>>> > My question is, what happens if both server failed somehow? Can I still
>>> use
>>> > pgpoolAdmin to recover a database server?
>>>
>>> No, pgpoolAdmin cannot recover a database server in this situation.
>>> You have to recover the database manualy.
>>>
>> Yes. I thought that of course.
>> So my question was, if you leave the whole restore process up to one called
>> recovery script - including optional checkpointing etc. pgpool would be more
>> flexible and simplier in terms of supporting different recovery procedures.
> 
> Good question. I will inspect current code if your idea is possible
> while updating docs.

Here are the steps executed in the recovery procedure (described in the
doc updated today). Note that CHECKPOINT is not performed in the procedure.
If all PostgreSQL servers are down, you would want to skip #1 and #2 and
execute #3, #4 and #5 by ssh. What do you think?

   1. Pgpool-II connects to primary server's template1 database as
   user = recovery_user, password = recovery_password.

   2. Primary server executes pgpool_recovery function.

   3. pgpool_recovery function executes
   recovery_1st_stage_command. Note that PostgreSQL executes functions
   with database cluster as the current directory. Thus
   recovery_1st_stage_command is executed in the database cluster
   directory.

   4. Primary server executes pgpool_remote_start function. This
   function executes a script named "pgpool_remote_start" in the
   database cluster directory, and it executes pg_ctl command on the
   standby server to be recovered via ssh. pg_ctl will start
   postmaster in background. So we need to make sure that postmaster
   on the standby actually starts.

   5. pgpool-II tries to connect to the standby PostgreSQL as user =
   recovery_user and password = recovery_password. It connects to the
   "postgres" database if possible; otherwise "template1" is
   used. pgpool-II retries for recovery_timeout seconds. On success,
   it goes to the next step.

   6. If failback_command is not empty, the pgpool-II parent process
   executes the script.

   7. After failback_command finishes, pgpool-II restarts all child
   processes.
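
As a rough illustration of the sequence above, here is a minimal Python
sketch. It is not pgpool-II code: the names plan_steps and wait_for_standby
and the try_connect callback are hypothetical, and the sketch only models the
ordering of the steps and the retry behaviour of step 5. The rule for the case
where all servers are down follows the suggestion above (skip #1 and #2, run
#3 to #5 by hand).

```python
import time
from typing import Callable, List

# Paraphrased summaries of the seven steps listed above.
RECOVERY_STEPS = {
    1: "pgpool-II connects to template1 on the primary as recovery_user",
    2: "primary executes the pgpool_recovery function",
    3: "recovery_1st_stage_command runs in the database cluster directory",
    4: "pgpool_remote_start starts the standby postmaster via ssh",
    5: "pgpool-II polls the standby for up to recovery_timeout seconds",
    6: "failback_command runs if it is not empty",
    7: "pgpool-II restarts all child processes",
}


def plan_steps(primary_up: bool) -> List[int]:
    """Return the step numbers to run, in order.

    Steps 1 and 2 need a connection to a running primary, so when every
    server is down only 3, 4 and 5 remain (to be driven manually via ssh).
    """
    if primary_up:
        return sorted(RECOVERY_STEPS)
    return [3, 4, 5]


def wait_for_standby(try_connect: Callable[[], bool],
                     recovery_timeout: float,
                     interval: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Model of step 5: retry the connection until it works or time runs out."""
    deadline = clock() + recovery_timeout
    while True:
        if try_connect():
            return True          # standby accepted the connection
        if clock() >= deadline:
            return False         # recovery_timeout exceeded
        sleep(interval)


if __name__ == "__main__":
    for number in plan_steps(primary_up=False):
        print(number, RECOVERY_STEPS[number])
```

The clock and sleep parameters exist only so the retry loop can be exercised
without real waiting; a real check would replace try_connect with an actual
connection attempt to the standby.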

>> By he way - where does pgpool actually store the information about
>> attached/detached servers?
> 
> On shared memory.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
Sandeep Thakkar | 16 May 12:57 2011

Re: failback setup problem

"After failback_command finishes, pgpool-II restart all child  processes." . How about the pgpool server process (pgpool -d -D)? Does it also get affected when I execute pcp_recovery_node? I would like to see this source code..  Which files should I look into?

Thanks.

Tatsuo Ishii | 17 May 11:08 2011

Re: failback setup problem

You mean the pgpool parent process? It is not restarted by failover or
by any pcp command. BTW, the source code of the pgpool main process is
in main.c.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> "After failback_command finishes, pgpool-II restart all child   processes." . 
> How about the pgpool server process (pgpool -d -D)? Does it also get affected 
> when I execute pcp_recovery_node? I would like to see this source code..  Which 
> files should I look into?
Tatsuo Ishii | 1 May 11:42 2011

Re: failback setup problem

>> Hi Tatsuo,
>> 
>> ok. now its working fine. thanks for your help.
>> 
>> after getting through that initial setup for the first time, I'd like to
>> give you some feedback about the (at least for me) missing information in
>> the documentation.
>> 
>> - please document the control flow during recovery, e.g.
>> connect to master server with recovery_user/recovery_password (connection
>> check)
>> run recovery_1st_stage_command
>> run checkpoint
>> run recovery_2nd_stage_command
>> ...
>> run failback_command
>> 
>> - please document that recovery_1st_stage_command and
>> recovery_2nd_stage_command are system calls made by the current postgres
>> master server in the PGDATA directory, and that failback_command is a shell
>> script command or system call made from the pgpool server. I needed to
>> search for it in the source code.
> 
> Sorry for inconvenience. I will add info to the docs as you suggested.

Done.

>> I have a different approach of recovering the postgres server. I'm
>> recovering from an existing backup. I do that because it is faster and I
>> don't put additional i/o load on the just activated server. I guess (or
>> hope) most people will have an existing backup.
>>
>> So my question is - if the aproach of recovering the failed server via a sql
>> command is optimal? What if both servers failed? then I'm not able to use
>> pcp-tools or pgpoolAdmin for recovering?
> 
> I'm not sure what you are trying to do here. If "backup" means it was
> created by pg_dump_all, I don't think your approach works. Streaming
> replication requires a base backup(binary backup) which is managed by
> pg_start_backup/pg_stop_backup.
> 
>> I'm asking because I worked for several years as an
>> it-production-responsible and I learned a little how administrators
>> think/work. They are happy if they have a (or better ONE) defined recovery
>> procedure.
>> Where am I getting? I'm asking you if it would makes sense to recode or
>> reduce the recovery procedure code to one system call e.g. failback_command.
>> Most people have their backup and restore functionality coded and ready for
>> training and/or desaster. If they could simply use this very same
>> functionality within pgpooladmin that would be great.
>> 
>> It might be that i have overseen something (as before) and this is already
>> possible. If so please tell me how.
>> 
>> Best Regards,
>> Uwe
>> 
>> 
>> On 26 April 2011 07:34, Tatsuo Ishii <ishii@...> wrote:
>> 
>>> > thanks, that message already helped. I tried to recover the postgres
>>> server
>>> > with the failback_command.
>>>
>>> You are welcome.
>>>
>>> > I didn't realize these recovery_* parameters yet.
>>> > So I use the recovery_* parameters for recovering the failed postgres
>>> > server.
>>>
>>> pgpool-II connects to backend to issue some SQLs including CHECKPOINT.
>>> recovery_* parameters define the user and password for the connection.
>>> Usually they are for PostgreSQL super user (postgres).
>>>
>>> > And the failback_command to attach the postgres server into pgpool
>>> > right?
>>>
>>> If you want to do something special, for example mailing to DBA, then
>>> you might want to specify it.  Otherwise you can leave it empty.
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>> > Best Regards,
>>> > Uwe
>>> >
>>> >
>>> > On 26 April 2011 01:06, Tatsuo Ishii <ishii@...> wrote:
>>> >
>>> >> > I'm using pg-pool-II 3.0.3 with streaming replication.
>>> >> > I coded the failback scenario/script for the slave server and the
>>> script
>>> >> > itself works fine.
>>> >> >
>>> >> > I now configured the failback script in pgpool.conf and during testing
>>> an
>>> >> > error message comes up:
>>> >> > 2011-04-09 06:42:18 LOG:   pid 16863: starting recovering node 1
>>> >> > 2011-04-09 06:42:18 ERROR: pid 16863: start_recover: could not connect
>>> >> > master node.
>>> >> >
>>> >> > [root <at> adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898
>>> >> postgres
>>> >> > postgres 0
>>> >> > adt-db01 5432 1 0.500000
>>> >> > [root <at> adt-web01 pgpool-II-3.0.3]# pcp_node_info 10 adt-web01 9898
>>> >> postgres
>>> >> > postgres 1
>>> >> > adt-db02 5432 3 0.500000
>>> >> >
>>> >> > pcp commands and pgpooladmin report that the master is up and running
>>> and
>>> >> > I'm able to connect to the master directly and through pgpool.
>>> >> > So what's wrong? So far everything else works fine.
>>> >>
>>> >> Assuming you have set recovery_user and recovery_passwd correctly, I'm
>>> >> not sure what's going on. IMO, the error message is very rare. It's so
>>> >> rare and there's a bug in the error path, which had not been found for
>>> >> long time. Can please try attached patch? The patch add a little bit
>>> >> usefull info to the error message above.
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese: http://www.sraoss.co.jp
>>> >>
>>>