Cambium Networks - CBRS Issues for Federated Wireless Customers – Incident details


CBRS Issues for Federated Wireless Customers

Resolved
Degraded performance
Started almost 2 years ago
Lasted about 4 hours

Affected

CBRS

Degraded performance from 3:49 AM to 7:36 AM

CBRS Partner - Federated Wireless

Degraded performance from 3:49 AM to 7:36 AM

Updates
  • Update

    The team at Federated Wireless have updated this investigation and sent the following details:

    ================

    The Federated Wireless Spectrum Controller is currently running Release 3.9, as originally planned. While these statements do not constitute an RCA, they should clarify what we’ve observed and what we’ve been able to do to work around some of the issues.

    FEBRUARY 3, 2023, APPROXIMATELY 5AM EASTERN TIME: On Thursday night, February 2, Federated Wireless performed the upgrade from Spectrum Controller Release 3.8 to Release 3.9. At the conclusion of the upgrade two issues occurred:
    a. Approximately 1,000 devices belonging to three separate customers were unable to connect to the SAS, while most of the remaining customer base of over 150,000 devices operated normally.
    b. A final step in our long-standing blue-green upgrade/downgrade process applies only to CBSDs that use TCP keepalives. Federated performs a “rolling restart” to break the persistent TCP connections between devices and the old SAS instance so that they are automatically reestablished on the new SAS instance. Shortly after performing the rolling restart, over 100,000 devices unexpectedly transitioned from authorized state into either registered or granted state. It was determined that this state transition occurred as a result of steps taken to solve the problem described in part a above. Devices recovered on their own after several hours.
    During the day on Friday, February 3, Federated determined that the problem impacting 1,000 devices was caused by an authentication certificate cipher negotiation issue introduced by an open-source component of the SAS as part of the upgrade to 3.9. To get these customers back online, Federated decided to revert to Release 3.8 that same night. Federated created and approved an emergency change request to notify customers of the downgrade.
    FEBRUARY 3, 2023, APPROXIMATELY 10PM EASTERN TIME: The downgrade to Release 3.8 proceeded smoothly until the rolling restart step, after which the same massive state transition unexpectedly occurred. Federated decided to back out the changes and return to Release 3.9. Devices recovered on their own after several hours.
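
For operators who want to quantify a state transition like the one described above on their own fleet, the following is a minimal sketch, not Federated's tooling. It assumes you can export two snapshots mapping a device ID to its state before and after an event; the function name `count_regressions` and the simplified state ordering are hypothetical and used only for illustration.

```python
from collections import Counter

# Simplified ordering of the three states named in the update.
# This ranking is an assumption made for illustration only.
STATE_RANK = {"registered": 0, "granted": 1, "authorized": 2}


def count_regressions(before, after):
    """Count devices that dropped to a lower state between two snapshots.

    `before` and `after` map a device ID to one of the state names above.
    Devices missing from `after` are skipped rather than counted.
    """
    regressions = Counter()
    for device_id, old_state in before.items():
        new_state = after.get(device_id)
        if new_state is None:
            continue
        if STATE_RANK[new_state] < STATE_RANK[old_state]:
            regressions[(old_state, new_state)] += 1
    return regressions
```

The result is keyed by `(old_state, new_state)`, so drops from authorized to granted and from authorized to registered are reported separately.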

    Troubleshooting on February 4, 2023 provided Federated Wireless with a more granular approach to working around the authentication certificate problem: affected customers were able to reconnect by using the ECC authentication certificate instead of the RSA one. The Federated team will work to get the cipher negotiation issue fixed in a subsequent release.
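
As a rough aid for checking which certificate type a given TLS 1.2 connection negotiated, the sketch below classifies an OpenSSL-style cipher suite name as ECC or RSA. It is an illustrative assumption, not part of the Federated workaround: the helper name `cert_auth_type` is hypothetical, and it only inspects the name string.

```python
def cert_auth_type(cipher_name):
    """Guess the certificate authentication type from an OpenSSL-style name.

    Returns "ECC" for ECDSA suites, "RSA" for suites that name RSA,
    "unspecified" for TLS 1.3 names (which do not encode the certificate
    type), and "unknown" otherwise. Some OpenSSL names for static-RSA
    suites omit the RSA token, so "unknown" does not rule RSA out.
    """
    name = cipher_name.upper()
    if name.startswith("TLS_"):
        return "unspecified"
    tokens = name.split("-")
    if "ECDSA" in tokens:
        return "ECC"
    if "RSA" in tokens:
        return "RSA"
    return "unknown"
```

In Python, the negotiated suite name for an open connection is the first element of the tuple returned by `ssl.SSLSocket.cipher()`, which can be passed to this helper.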

    While exactly how the rolling restart caused the state transitions is still under investigation, steady-state operation of the SAS is known to be normal and will remain so as long as no attempt is made to upgrade or downgrade the SAS version. No such attempts are planned.

    Federated Wireless is happy to address any questions about this. Many Cambium customers can directly open support cases with Federated Wireless to ask questions, and we invite them to do so.

    Federated will also have an RCA completed within 72 business hours and will make that available to Cambium and Cambium CBRS customers.

  • Resolved

    The Federated team confirmed that the SAS is back to its regular operational state. Customers still having issues getting back to the Authorized state can check CBSD-SAS communication to make sure the CBSD is sending messages to the SAS, as the SAS is currently able to respond to requests normally.
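
One way to act on this advice is to look for CBSDs that have stopped sending requests. The sketch below is a hypothetical example, assuming you can collect the timestamp of each device's most recent outbound SAS request from your own logs; the function name `silent_cbsds` and the five-minute default threshold are illustrative assumptions, not values from Cambium or Federated Wireless.

```python
from datetime import datetime, timedelta


def silent_cbsds(last_request_at, now, max_silence=timedelta(minutes=5)):
    """Return the IDs of CBSDs whose last request is older than `max_silence`.

    `last_request_at` maps a CBSD ID to the datetime of its most recent
    request to the SAS. The threshold is an assumed default; pick one
    that suits your own deployment.
    """
    return sorted(
        cbsd_id
        for cbsd_id, seen in last_request_at.items()
        if now - seen > max_silence
    )
```

Devices returned by this check are the ones to inspect first, since the SAS side was reported to be responding normally.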

  • Monitoring

    We have heard from the team at Federated Wireless that the revert procedure to version 3.9 is complete. Error rates have dropped and a fix is in place.

  • Identified

    We are continuing to work on a fix for this incident.

  • Investigating

    We are currently investigating this incident.