Overview
A significant bug has been identified in Loki v3.0.0: the backend crashes with a segmentation violation (SIGSEGV) when index_gateway.mode is set to ring. The issue was first reported by user @awoimbee on March 20, 2024, and has since been discussed extensively, with multiple users encountering similar problems.
Description of the Bug
When the Loki backend runs with index_gateway.mode set to ring, the process crashes with a segmentation fault caused by a nil pointer dereference. The error trace typically looks like this:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x223f470]
goroutine 1 [running]:
github.com/grafana/loki/pkg/loki.(*Loki).updateConfigForShipperStore(0xc000638be0?)
/src/loki/pkg/loki/modules.go:709 +0xb0
...
Because the panic occurs in goroutine 1 during startup, the backend never becomes ready and enters a CrashLoopBackOff state.
Reproduction
The problem has been reproduced consistently under the following conditions:
- Infrastructure: Kubernetes
- Deployment tool: Helm
- Loki version: 3.0.0
- Configuration: index_gateway.mode set to ring
Workaround
As a temporary solution, users have been advised to change index_gateway.mode from ring to simple in the configuration file. This has been confirmed to prevent the crash:
index_gateway:
  mode: simple
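For Helm-based deployments like the one described under Reproduction, the same setting can probably be applied through chart values rather than by editing the rendered configuration. The following is a minimal sketch, assuming the grafana/loki Helm chart and its loki.structuredConfig values key. Neither the chart nor the key is named in the original report, so adjust the path to match the chart actually in use:

```yaml
# values.yaml (hypothetical override file; assumes the grafana/loki Helm chart)
loki:
  # Assumption: structuredConfig is merged into the rendered Loki configuration.
  structuredConfig:
    index_gateway:
      mode: simple  # workaround: avoid "ring" on Loki 3.0.0
```

After upgrading the release with this override, it is worth inspecting the rendered Loki configuration to confirm that the mode actually changed before relying on the workaround.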
Resolution and Further Developments
The Grafana Loki team has acknowledged this issue and a fix has been implemented to address the nil pointer dereference during the bloomstore initialization. The fix can be tracked and reviewed in this commit.
Key Updates:
- A fix for the nil pointer dereference was merged on May 3, 2024.
- The problem persists for some users even after updating to the latest patches.
- Additional related issues are being tracked and addressed, such as those mentioned in issue #13208.
User Reports and Feedback
Multiple users have reported encountering this issue and have shared their experiences and configurations in the issue thread. For instance:
- @Nissou31 reported the issue while deploying a scalable Loki 3.0.0 setup.
- @alexandergoncharovaspecta faced a similar problem with a three-pod setup where one pod was crashing.
- @sslny57 and @abh shared that changing the mode from ring to simple resolved their crash issues, but they encountered other configuration problems.
Conclusion
The Loki v3.0.0 backend crash triggered by index_gateway.mode: ring is a critical bug affecting users running Loki in Kubernetes environments. A temporary workaround is available, and a fix for the nil pointer dereference has been merged, although some users report that the problem persists after updating. Users are encouraged to update their Loki deployments and monitor the GitHub issue tracker for further updates and resolutions.
For more details and ongoing discussions, visit the GitHub issue page.