THIS HAS BEEN RESOLVED.
----------------------
Dear clients,
we have a small update regarding our last announcement.
1) ARP Resolving Issues in a Larger (VPS) VLAN (after upgrade to new Routing Stack and changing network style)
In the night from Thursday to Friday, 18 October, we changed the mode to native 40G for every port-channel member on this channel (2x40G).
For this we needed to restart some of the line cards to apply the new configuration.
At first there were no errors, but in the morning we saw errors on the link again.
This is caused by the microcode on the optical transceivers/lasers. Since our supplier had previously traced this back to the transceiver microcode,
we will try this evening or on Saturday morning to replace the transceivers with modules from another supplier. Until now, the only spare 40G modules we had in stock were 24 of the same model from the same supplier.
However, no FPC restart or anything else that would cause temporary packet loss is needed; thanks to the redundant design of the channel, traffic can continue to flow while
these modules are being replaced.
We have already debugged this in depth: we gathered information, reviewed our setup and configurations, and analyzed traffic in the network using NetFlow/sFlow collectors, exporting the traffic data to our analyzers. A lot of work has been done in the meantime, and there is no other disturbing factor.
In particular, this is not causing any issues or performance loss on most paths in the network that might be connected via this channel.
Other channels using other transceivers, as well as the 100G ones, are working without problems too.
2) Failure and Hardware Investigation/Replacement of node
10GKVMVPS28.hostslick.com (formerly 10GKVMVPS15.hostslick.com):
We have received our shipment of new hardware and SSDs to fully rebuild this node.
As mentioned, we did not have enough 2TB drives in stock (this node has 24x 2TB SSDs).
The node has been reinstated under the name 10GKVMVPS28.hostslick.com.
Currently we are running some tests, and we expect to have all VPS that were served by this node back online within the next 12-24 hours at the latest.
The process of re-provisioning and processing all tickets will begin within the next hour.
As we are currently not restocking VPS capacity very much (for us it is a secondary product; we focus on the dedicated server market), we unfortunately did not have enough free space to move everyone to another node.
This is also because this node was one of the bigger ones, so its failure caught us at a very bad time.
However, our compensation offer still stands, and everyone affected will receive one month of free service on top as well as €10 of compensation credit.
___________________________________________________________________________________________________________
-- Initial announcement, 15 October 2024 --
We hereby announce delays in our Support System due to two ongoing issues.
We ask for just a little patience now as everything is about to be fixed. We will get back to you soon. Please do not open multiple or new tickets, as we prioritize the oldest ones first.
1) ARP Resolving Issues in a Larger (VPS) VLAN (after upgrade to new Routing Stack and changing network style)
Two 40G interfaces (80 Gbit channel) from our Juniper MX960 to part of our network where we run Arista switches were split into 4x10G breakout mode from Juniper to Arista, with multiple links aggregated using LACP. ARP resolution failures and multicast delivery problems were observed. The root cause was traced back to the LACP hash algorithm, which was not appropriately distributing multicast or broadcast traffic (such as ARP requests) across the links.
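For clients interested in the technical background, the short sketch below illustrates the effect in a simplified way. It uses hypothetical MAC addresses and a simplified hash, not the actual Junos/Arista hashing implementation: every ARP request is a broadcast to ff:ff:ff:ff:ff:ff, so a hash keyed only on the destination MAC places all of that traffic on a single member link, while a hash that also includes the source spreads it across the bundle.

import hashlib

MEMBERS = 4  # e.g. a 4x10G breakout bundle (hypothetical size)

def dst_only_member(dst_mac):
    # Member selection keyed only on the destination MAC.
    return hashlib.sha256(dst_mac.encode()).digest()[0] % MEMBERS

def src_dst_member(src_mac, dst_mac):
    # Member selection keyed on source and destination MAC.
    return hashlib.sha256((src_mac + "|" + dst_mac).encode()).digest()[0] % MEMBERS

BROADCAST = "ff:ff:ff:ff:ff:ff"  # every ARP request targets this address
hosts = ["52:54:00:00:00:%02x" % i for i in range(6)]  # hypothetical VPS nodes

for mac in hosts:
    print(mac, "dst-only member:", dst_only_member(BROADCAST),
          "src+dst member:", src_dst_member(mac, BROADCAST))
# dst-only: every ARP broadcast lands on the same member link;
# src+dst: the broadcasts are spread across the bundle.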
This issue was mainly seen in a larger VLAN where we placed the VPS nodes due to the volume of ARP traffic.
Dedicated server clients (or even most VPS customers) might not have noticed any issues as they are either in their own VLAN or routed with their own ASN/IPs in separate VLANs, or if not, in smaller shared VLANs.
However, in this larger VPS VLAN (where VPS nodes are also connected via this channel), we saw degraded network speeds to some nodes and packet loss.
Resolution:
We are updating the load-balancing hash configuration as well as switching back to a native 2x40Gbit setup to fully resolve the issue tonight.
We have already reviewed our entire network configuration and setup and have worked hard over the past 48 hours to address this.
2) Failure and Hardware Investigation/Replacement:
10GKVMVPS28.hostslick.com (formerly 10GKVMVPS15.hostslick.com)
Unfortunately, at a very inconvenient time, we experienced a major failure of the 10GKVM15 node, which primarily served some special deals from last year. It failed during the same days when we were busy with the network upgrades to move to the new routing stack.
We discovered that more than one disk failed in the same mirrored span, which in RAID10 means data loss. We have informed the affected clients about compensation and reinstalled their service.
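As a small illustration of why this leads to data loss (a simplified sketch only; the disk numbering and pair layout are hypothetical, not our exact array layout): a RAID10 array survives multiple disk failures as long as no mirrored pair loses both of its members.

def raid10_survives(total_disks, failed):
    # True if no mirrored pair (2i, 2i+1) has lost both of its disks.
    pairs = [(i, i + 1) for i in range(0, total_disks, 2)]
    return all(not (a in failed and b in failed) for a, b in pairs)

# 24x 2TB SSDs modeled as 12 mirrored pairs (layout assumed for illustration)
print(raid10_survives(24, {3, 8, 17}))  # True: the failures hit different pairs
print(raid10_survives(24, {8, 9}))      # False: both disks of one pair are gone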
Further investigation with the SSD supplier was conducted. We use SSDs from the same supplier (just a different model) in many servers. We have never had issues with them before, but this specific model, which was used to build this node, suddenly showed many problems. We replaced the failed drives and reinstalled the node. After that, we re-provisioned the clients.
However, two days later, six more disks failed, which was extremely frustrating for us. We had to order new ones, as we did not have 2TB SSDs in stock (the node had 24x2TB SSDs), and we are still in contact with the manufacturer. We have never experienced such an issue before, despite having racks full of VPS nodes. We understand the frustration from affected clients and are doing everything we can to resolve the situation.
We are currently working to get the node back online, and we expect this to happen by Wednesday evening at the latest. During this time, we have created a dedicated support ticket status/category to better organize tickets related to this issue and respond to every client as soon as possible to restore your services fully.
Compensation:
We are offering €10 credit and an additional month of free service to every affected client.
Friday, October 18, 2024