
RFA Recovery and Fail-over

Veerapath Rungruengrayubkul
Developer Advocate

Overview

This article describes concepts related to fail-over and recovery in RFA C++ and RFA .Net, covering RFA behaviors, the relevant configuration parameters and other necessary information. All topics in this article apply to Consumer applications, but only the connection-related topics (i.e. connection recovery and fail-over) apply to Non-Interactive Provider applications. The article is intended for users who are interested in connection and item resiliency in RFA applications.

This article assumes a basic understanding of the RFA request/response process and of RFA configuration.

 

Connection Recovery

RFA automatically performs connection recovery on behalf of Consumer and Non-interactive Provider applications. If a connection is lost, RFA periodically attempts to reconnect until the connection is reestablished. Once a connection is reestablished, RFA will re-login on behalf of the application using the same credentials used during the initial login.

However, there are some situations in which the connection is not recovered. If the Login stream is closed (e.g. the login is denied or the max retry count is reached), RFA will not recover the connection. In this case, the application needs to reconnect to the server by sending a login request again.

After the connection is recovered, Non-interactive Provider applications need to re-publish the directory refresh message and all items in their watch list.
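
The division of work described above can be summarized in a small decision function. This is only an illustrative sketch, not RFA code: the function name, its parameters and the returned strings are hypothetical, and it looks at the connection level only (item recovery is covered later in this article).

```python
from enum import Enum


class StreamState(Enum):
    """Login stream states used in this sketch (hypothetical names)."""
    OPEN = "Open"
    CLOSED = "Closed"


def next_action(login_stream, connection_recovered, is_niprov):
    """Return what the application has to do after a connection event."""
    if login_stream is StreamState.CLOSED:
        # RFA stops recovering once the Login stream is closed.
        return "send a new login request"
    if connection_recovered and is_niprov:
        # RFA re-logs in, but published content must be sent again.
        return "re-publish directory refresh and all items"
    # While the Login stream is open, RFA reconnects and re-logs in itself.
    return "nothing, RFA recovers the connection"
```

For example, a Non-interactive Provider whose connection has just been recovered gets "re-publish directory refresh and all items", while any application whose Login stream was closed gets "send a new login request".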

 

Connection Fail-over

RFA can fail over to other servers once a connection has failed, using the servers listed in the "serverList" configuration parameter. By default, RFA connects to the first server in the list. If that connection fails, RFA tries the next server in the list, from left to right. Once RFA fails to connect to the last server in the list, it goes through the servers in the list again. Users can also set the "serverSelectionOrder" parameter to true to make RFA select servers randomly.

Sample scenario: Connection Fail-over with serverSelectionOrder = false

There are 3 servers in the serverList configuration and serverSelectionOrder is set to false. RFA tries to connect to the first server, serverA. If the connection to serverA fails or serverA goes down, RFA automatically tries to connect to the next server, serverB, and so on.

Configurations:

\Connections\Connection_RSSL\serverList                      = “serverA, serverB, serverC”

\Connections\Connection_RSSL\serverSelectionOrder = false

Diagram: Connection Fail-over with serverList


Sample scenario: Connection Fail-over with serverSelectionOrder = true

There are 3 servers in the serverList configuration and serverSelectionOrder is set to true. RFA randomly selects a server from the list for each connection attempt or retry. The diagram assumes that the random server order is serverB => serverA => serverC.

Configurations:

\Connections\Connection_RSSL\serverList                      = “serverA, serverB, serverC”

\Connections\Connection_RSSL\serverSelectionOrder = true

Diagram: Connection Fail-over with serverList
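
The two scenarios can be reproduced with a short simulation. It is a sketch under stated assumptions, not RFA's implementation: the function names are hypothetical, and random selection is modeled as one shuffle of the list per pass, which is only one possible reading of "select server randomly".

```python
import random


def connection_order(server_list, server_selection_order, rng=None):
    """Return the order in which servers are tried in one pass over the list."""
    servers = [s.strip() for s in server_list.split(",")]
    if server_selection_order:
        # Assumption: random selection is modeled as a shuffled pass.
        rng = rng or random.Random()
        rng.shuffle(servers)
    return servers


def first_reachable(order, reachable):
    """Return the first server in 'order' that accepts the connection."""
    for server in order:
        if server in reachable:
            return server
    return None  # the whole pass failed; RFA would start over


order = connection_order("serverA, serverB, serverC", False)
print(order, "->", first_reachable(order, {"serverB", "serverC"}))
```

With serverSelectionOrder = false and serverA down, the simulation settles on serverB, as in the first scenario.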


Max Retry Count

RFA provides another configuration parameter, "maxRetryCount", which determines the number of times that RFA will retry an unsuccessful connection, not including the first attempt. By default, RFA retries the connection indefinitely; however, if maxRetryCount is set to a positive number, RFA retries the connection only that number of times. Once the number of connection retries reaches the configured number, RFA closes the Login stream by generating a Login status message with DataState=Suspect and StreamState=Closed. This means that RFA will not recover the connection once the max retry count is reached. In this case, the application needs to re-register the Login manually to establish a connection again.

Below is a sample of the Login status generated once the max retry count is reached.

  streamState : Closed

  dataState   : Ok

  statusCode  : NotOpen

  statusText  : Max Retry Connect Limit reached.

If maxRetryCount and serverList are both configured, a retry is counted only when the connection attempts to all servers in the serverList have failed.
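
The interaction between maxRetryCount and serverList can be sketched as follows. This is a simulation under assumptions, not RFA code: the function and the try_connect callback are hypothetical, None stands for the default "retry indefinitely" behavior, and one full pass over the server list is treated as a single attempt.

```python
def connect_with_retries(servers, try_connect, max_retry_count=None):
    """Try every server per pass; count a retry only after a full failed pass.

    max_retry_count=None models the default (retry indefinitely).
    Returns the connected server, or None when the retry limit is reached.
    """
    retries = 0
    while True:
        for server in servers:
            if try_connect(server):
                return server
        # Every server in the list failed during this pass.
        if max_retry_count is not None and retries >= max_retry_count:
            return None  # RFA would close the Login stream here
        retries += 1


attempts = []


def always_down(server):
    """Hypothetical connect callback that records the attempt and fails."""
    attempts.append(server)
    return False


result = connect_with_retries(["serverA", "serverB"], always_down, max_retry_count=2)
print(result, len(attempts))
```

With two servers and maxRetryCount = 2, the sketch makes one initial pass plus two retry passes, i.e. six connection attempts, before giving up.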

 

Item Recovery

Item recovery can be performed automatically by RFA for all streaming item requests that a Consumer application makes, if the application requests SingleOpen. For example, assume that the connection to a provider was lost. If the client requested SingleOpen, RFA will re-request all streaming items from the provider as soon as the connection is re-established. However, if SingleOpen was not requested by the client, the client is responsible for re-requesting all items when the connection is re-established.

By default, RFA requests SingleOpen if the consumer application does not pass the attribute in its Login request. In other words, RFA performs automatic item recovery by default.

Once items are recovered, the application receives changes in item status via RespStatus. The recovery continues until the item stream is closed. A stream can be closed in various scenarios, such as permission denied or an invalid item, in which the StreamState changes to StreamState::ClosedRecover, StreamState::Closed, StreamState::Redirect or StreamState::NonStreaming. Some stream states can also be affected by the Single Open and Allow Suspect Data attributes.
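
As a rough illustration of these rules, the sketch below models the SingleOpen default and the stream states that end recovery. The function names and the dictionary-based login attributes are hypothetical; the state names are the ones listed above, taken as delivered to the application.

```python
# Stream states that, per the text above, mean the item stream is closed.
CLOSING_STREAM_STATES = {"ClosedRecover", "Closed", "Redirect", "NonStreaming"}


def effective_single_open(login_attribs):
    """SingleOpen is requested by default when the application omits it."""
    return bool(login_attribs.get("SingleOpen", True))


def rfa_keeps_recovering(login_attribs, stream_state):
    """True while RFA, not the application, is responsible for recovery."""
    if not effective_single_open(login_attribs):
        return False  # the application must re-request items itself
    return stream_state not in CLOSING_STREAM_STATES
```

An application that sends no SingleOpen attribute therefore gets automatic recovery for a stream in the Open state, but not for one that has reached Closed.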

 

Single Open and Allow Suspect Data

SingleOpen and AllowSuspectData are elements passed via the AttribInfo.Attrib of the Login request message, and they can affect how state information is processed.

 

  • SingleOpen indicates that the consumer would like automatic item recovery performed for all streaming item requests that it makes. If the provider supports SingleOpen, which is indicated in the SingleOpen attribute of the Login response, the provider should drive the recovery of item streams. Otherwise, the RFA consumer application should drive any recovery instead. When recovery is driven, an item status with DataState::Suspect and StreamState::Open is passed to the consumer application. Otherwise, DataState::Suspect and StreamState::ClosedRecoverable are passed instead.
  • AllowSuspectData, when suspect data is not allowed, indicates that the consumer application does not want to receive suspect status. If a status message with DataState::Suspect and StreamState::Open is received from the network, a status message with DataState::Suspect and StreamState::ClosedRecoverable is passed instead.

 

If any SingleOpen and AllowSuspectData configuration causes a behavior contradiction (e.g., SingleOpen indicates the provider should handle recovery, but AllowSuspectData indicates that the consumer does not want to receive suspect status), SingleOpen behavior takes precedence.
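
The rules in this section can be expressed as a small conversion function. This is one reading of the text above, written as a sketch with hypothetical names and plain strings for the states; the table in the RFA C++ RDM Usage Guide remains the authoritative description.

```python
def delivered_state(stream_state, data_state, single_open, allow_suspect_data):
    """Return the (stream_state, data_state) passed to the consumer application."""
    if data_state != "Suspect":
        return stream_state, data_state  # only suspect states are converted
    if single_open:
        # SingleOpen takes precedence: recoverable streams are kept open.
        if stream_state in ("Open", "ClosedRecoverable"):
            return "Open", "Suspect"
        return stream_state, data_state
    if not allow_suspect_data and stream_state == "Open":
        # The consumer does not want suspect data on an open stream.
        return "ClosedRecoverable", "Suspect"
    return stream_state, data_state


print(delivered_state("Open", "Suspect", single_open=False, allow_suspect_data=False))
print(delivered_state("Open", "Suspect", single_open=True, allow_suspect_data=False))
```

The second call shows the contradiction case: SingleOpen is requested while suspect data is not allowed, and the stream stays Open/Suspect because SingleOpen wins.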

 

The following table is from the RFA C++ RDM Usage Guide. The table shows how a provider can convert messages to honor the consumer’s SingleOpen and AllowSuspectData settings. The first column in the table shows the provider’s actual RespStatus.StreamState and RespStatus.DataState. Each subsequent column shows how this state information can be modified to follow that column's specific SingleOpen and AllowSuspectData settings.


Connection List

Connection list is another configuration option, with which RFA concurrently establishes connections for all RSSL/RSSL_NIPROV connections in the list. Login requests are sent on all connections to receive credential information. All servers in the connection list are therefore expected to use the same Login credentials.

RFA also aggregates the Login responses from each connection and passes only aggregated login state changes to the application.

The rules for aggregation of stream and data states are as follows:

  • If at least one connection is Open/Ok, client's login state = RespStatus::Open/RespStatus::Ok
  • If all connections' login states are Open/Suspect or Disconnected, client's login state = RespStatus::Open/RespStatus::Suspect
  • If at least one connection responds with a login stream state of Closed, client’s login state = RespStatus::Closed
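
The aggregation rules above can be sketched as a single function. The names and the tuple representation are hypothetical, and checking for Closed first is an assumption made so that the first and third rules cannot both match.

```python
def aggregate_login_state(connection_states):
    """Aggregate per-connection login states into the client's login state.

    connection_states: list of (stream_state, data_state) tuples; a
    disconnected connection is represented as ("Disconnected", None).
    """
    if any(stream == "Closed" for stream, _ in connection_states):
        # The rule only names the stream state, so no data state is returned.
        return ("Closed", None)
    if any((stream, data) == ("Open", "Ok") for stream, data in connection_states):
        return ("Open", "Ok")
    # All remaining connections are Open/Suspect or Disconnected.
    return ("Open", "Suspect")


print(aggregate_login_state([("Open", "Ok"), ("Disconnected", None)]))
```

One healthy connection is enough for the application to see Open/Ok, even if the other connections are down.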

When aggregating the overall state of the login, RFA ensures that login responses received from multiple connections are compatible. If responses are incompatible, then all of the login streams are closed, their corresponding network connections are disconnected, and the application receives a Closed Status message for the Login Stream.

Login responses are considered incompatible if:

  • One of the login connections responds with a Closed Status message.
  • The AttribInfo AppName, NameType, AppId, InstanceId, or Position of the connections does not match.
  • The AttribInfo ProvidePermissionProfile or ProvidePermissionExpressions do not match and the client requested them either explicitly or implicitly.
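
A compatibility check along these lines might look like the sketch below. The dictionary layout and function name are hypothetical; the attribute names are the ones listed above.

```python
IDENTITY_FIELDS = ("AppName", "NameType", "AppId", "InstanceId", "Position")
PERMISSION_FIELDS = ("ProvidePermissionProfile", "ProvidePermissionExpressions")


def responses_compatible(responses, permissions_requested=True):
    """Check login responses from several connections for compatibility.

    responses: list of dicts with 'stream_state' and 'attribs' keys.
    permissions_requested: True if the client asked for the permission
    attributes, explicitly or implicitly.
    """
    if not responses:
        return True
    if any(r["stream_state"] == "Closed" for r in responses):
        return False
    fields = IDENTITY_FIELDS + (PERMISSION_FIELDS if permissions_requested else ())
    first = responses[0]["attribs"]
    # Every compared attribute must match the first connection's value.
    return all(
        r["attribs"].get(field) == first.get(field)
        for r in responses[1:]
        for field in fields
    )
```

If the check fails, the behavior described above applies: all login streams are closed and the application receives a Closed status for the Login stream.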

The Connection List is available for both Consumer and Non-Interactive Provider applications. However, there are some differences in terms of item requesting and publishing.

For Consumer apps, the service lists from each connection are aggregated, and the Consumer application can request items from the aggregated list. When the application requests an item, RFA sends the item request to the most suitable service within the aggregated service list. In addition, if the same service name is provided by multiple servers, RFA adds only one service to its internal source directory list, logs the following message, and sends requests to the first service listed in the directory payload.

Warning: Received Source Name from Connection <Connection...> is same as the one from Connection <Connection...>

However, if the service goes down, RFA will not recover the item from the service on another connection. To allow RFA to recover items from another connection, the Service Group configuration parameters need to be used.

The following illustration demonstrates how RFA aggregates services from multiple connections. Each connection provides different services. The application receives the list of aggregated services via the directory response message. Item requests are routed to the connection providing the requested service.

Sample scenario: Consumer with Connection List

There are 3 RSSL connections in the ConnectionList configuration. Each connection provides different services. Connection_1 provides two services: A and B. Connection_2 provides one service: C. The last connection provides three services: D, E and F. RFA establishes connections to all RSSL connections and then sends Login and Directory requests. Once the consumer application sends a request for the Directory, it receives the aggregated list of services: A, B, C, D, E and F. The application can freely request items from these services. The requests are routed to the appropriate connection.

Configuration:

\Sessions\Session1\ConnectionList = “Connection_1, Connection_2, Connection_3”

Diagram: Item requests on aggregated service list.
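
Service aggregation and request routing can be modeled as a simple lookup table. This is a sketch, not RFA's internal logic: the function name and the service layout are illustrative, and "the first connection that announces a service wins" is used here as an approximation of the duplicate-name behavior described earlier.

```python
def aggregate_services(connections):
    """Build a service -> connection routing table.

    connections: mapping of connection name -> list of service names,
    in connection-list order. When a service name appears more than
    once, only the first occurrence is kept and the duplicate is recorded.
    """
    routes = {}
    duplicates = []
    for connection, services in connections.items():
        for service in services:
            if service in routes:
                # (service, connection kept, connection ignored)
                duplicates.append((service, routes[service], connection))
                continue
            routes[service] = connection
    return routes, duplicates


routes, duplicates = aggregate_services({
    "Connection_1": ["A", "B"],
    "Connection_2": ["C"],
    "Connection_3": ["C", "D", "E", "F"],  # "C" duplicated on purpose
})
print(sorted(routes), routes["C"], duplicates)
```

In this illustrative layout the application sees services A to F, and requests for C are routed to Connection_2 only; nothing in the table moves them to Connection_3 if Connection_2 goes down.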


For Non-Interactive Provider apps, RFA establishes connections to all ADHs in the Connection List. The application needs to register the Login request. After the connections are established, the application needs to publish Directory and Item responses. RFA fans out the responses to all connections established to ADHs on the application’s behalf.

Sample scenario: Non-Interactive Provider with Connection List

There are three RSSL_NIPROV connections in the ConnectionList. RFA simultaneously connects to all connections.

Configuration:

\Sessions\Session1\ConnectionList = “Connection_1, Connection_2, Connection_3”

Diagram: Non-Interactive Provider with connectionList
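
The fan-out behavior amounts to publishing once and delivering to every established connection. The sketch below illustrates this with hypothetical data structures; it does not use the RFA API.

```python
def fan_out(message, connections):
    """Deliver one published message to every established connection.

    connections: mapping of connection name -> dict with an 'up' flag
    and a 'sent' list that collects delivered messages.
    Returns the names of the connections that received the message.
    """
    delivered = []
    for name, connection in connections.items():
        if connection["up"]:
            connection["sent"].append(message)
            delivered.append(name)
    return delivered


connections = {
    "Connection_1": {"up": True, "sent": []},
    "Connection_2": {"up": False, "sent": []},  # ADH currently unreachable
    "Connection_3": {"up": True, "sent": []},
}
print(fan_out("directory refresh", connections))
```

The connection that is down misses the message, which is why a Non-Interactive Provider has to re-publish its directory and items after that connection recovers.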


Service Group

A service group combines multiple concrete services. Multiple services providing data for the same items are grouped together, providing redundancy in case one of the services goes down. This also covers other scenarios in which one of the services cannot provide data, e.g. when a connection goes down.

The service group can be used for concrete services provided on either a single connection or multiple connections. If a single session contains multiple connections that provide the same service, service aliases must be created for those services.

Please note that the Service Group supports only Consumer apps, not Non-Interactive Provider apps.

Sample scenario: Consumer with Service Group and Connection List

There are three connections in the connectionList. Connection_1 and Connection_2 each provide a service with the same name, A. In this case, service aliases are needed to differentiate service A on each connection. The service group named “SG1” contains three services: Connection_1:A, Connection_2:A and Connection_3:D.

Once the application calls the registerClient() function to request an item from the “SG1” service, RFA initially sends an item request to a service in the service group. If the subscribed service goes down, RFA automatically recovers the item stream by sending the request to another service in the group. The item refresh message is passed to the application as an unsolicited refresh message.

Configurations:

\Connections\Connection_1\serviceList = “A-1”

\Connections\Connection_2\serviceList = “A-2”

\Services\A-1\feedName = “A”

\Services\A-2\feedName = “A”

\Sessions\Session1\connectionList = “Connection_1, Connection_2, Connection_3”

\Sessions\Session1\serviceGroupList = “SG1”

\ServiceGroups\SG1\serviceList = “A-1,A-2,D”

 

Diagram: Initial request with Service Group and Connection List


Diagram:  Item recovery with Service Group and Connection List
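
The recovery behavior of a service group can be sketched as follows. This is a simplified model with hypothetical class and method names; in particular, picking the first available member service is an assumption, since the text only says that RFA sends the request to "a service in the service group".

```python
class ServiceGroup:
    """Toy model of a service group with item recovery between members."""

    def __init__(self, name, services):
        self.name = name
        self.services = list(services)   # configured order, e.g. A-1, A-2, D
        self.available = set(services)   # services currently up
        self.active = None               # service holding the item stream

    def request(self):
        """Send the item request to the first available member service."""
        self.active = next(
            (s for s in self.services if s in self.available), None
        )
        return self.active

    def service_down(self, service):
        """Mark a service as down and recover the stream if it was active."""
        self.available.discard(service)
        if service == self.active:
            # The recovered stream delivers its refresh as unsolicited.
            return self.request()
        return self.active


sg1 = ServiceGroup("SG1", ["A-1", "A-2", "D"])
print(sg1.request())            # initial request
print(sg1.service_down("A-1"))  # stream recovered on another service
```

With the SG1 configuration above, the item is first requested from A-1 and moves to A-2 when A-1 goes down; if every member service is down, the sketch returns None.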

References

For further details, please check out the following resources: