Critical Issue in Loki v3.0.0: Backend Crashes Due to Index Gateway Mode Setting

Overview

A bug in Loki v3.0.0 causes the backend to crash with a segmentation violation (SIGSEGV) when index_gateway.mode is set to ring. User @awoimbee first reported the issue on March 20, 2024, and multiple users have since reported the same crash in the issue thread.

Description of the Bug

When running the Loki backend with the configuration setting index_gateway.mode set to ring, the system crashes with a segmentation fault due to a nil pointer dereference. The error trace typically looks like this:

go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x223f470]

goroutine 1 [running]:
github.com/grafana/loki/pkg/loki.(*Loki).updateConfigForShipperStore(0xc000638be0?)
	/src/loki/pkg/loki/modules.go:709 +0xb0
...

Because the process panics on startup, Kubernetes restarts the backend pod repeatedly and it ends up in the CrashLoopBackOff state.
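To make the failure mode concrete, here is a minimal Go sketch of a nil pointer dereference of the kind the trace describes. It is not Loki's code: the types gatewayConfig and ringConfig, and the functions unsafeReplicas, safeReplicas, and panics, are hypothetical names invented for illustration. The sketch only shows how reading a field through a pointer that was never initialized triggers the runtime panic, and how a guard turns that into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// ringConfig and gatewayConfig are hypothetical stand-ins, not Loki's real types.
type ringConfig struct {
	ReplicationFactor int
}

type gatewayConfig struct {
	Mode string
	Ring *ringConfig // stays nil unless an earlier init step sets it
}

// unsafeReplicas reads through Ring without checking it.
// It panics with a nil pointer dereference when Ring is nil.
func unsafeReplicas(c *gatewayConfig) int {
	return c.Ring.ReplicationFactor
}

// safeReplicas guards the dereference and reports an error instead of panicking.
func safeReplicas(c *gatewayConfig) (int, error) {
	if c == nil || c.Ring == nil {
		return 0, errors.New("ring not initialized")
	}
	return c.Ring.ReplicationFactor, nil
}

// panics reports whether calling f results in a panic.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	cfg := &gatewayConfig{Mode: "ring"} // Ring deliberately left nil

	fmt.Println("unguarded access panics:", panics(func() { unsafeReplicas(cfg) }))

	if _, err := safeReplicas(cfg); err != nil {
		fmt.Println("guarded access returns an error:", err)
	}
}
```

In a real service the unguarded version takes the whole process down at startup, which is why the pod never becomes ready.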

Reproduction

The problem has been reproduced consistently under the following conditions:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
  • Loki version: 3.0.0
  • Configuration: index_gateway.mode set to ring

Workaround

As a temporary solution, users have been advised to change the index_gateway.mode from ring to simple in the configuration file. This has been confirmed to prevent the crash:

yaml
index_gateway:
  mode: simple
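Before rolling out a 3.0.0 upgrade, it can help to check which mode a rendered configuration actually uses. The following Go sketch is one rough way to do that. The function indexGatewayMode is a hypothetical helper, not part of Loki or Helm, and it performs a naive line scan rather than real YAML parsing, so it assumes block-style YAML indented with spaces and a mode key directly under index_gateway.

```go
package main

import (
	"fmt"
	"strings"
)

// indexGatewayMode is a hypothetical helper that scans config text for the
// `mode` key nested under `index_gateway`. It is not a YAML parser: it only
// understands block-style YAML indented with spaces.
func indexGatewayMode(config string) (string, bool) {
	inBlock := false
	blockIndent := 0
	for _, line := range strings.Split(config, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // skip blank lines and comments
		}
		indent := len(line) - len(strings.TrimLeft(line, " "))
		if inBlock && indent <= blockIndent {
			inBlock = false // a sibling or parent key ends the index_gateway block
		}
		if trimmed == "index_gateway:" {
			inBlock, blockIndent = true, indent
			continue
		}
		if inBlock && strings.HasPrefix(trimmed, "mode:") {
			value := strings.TrimSpace(strings.TrimPrefix(trimmed, "mode:"))
			return strings.Trim(value, `"'`), true
		}
	}
	return "", false
}

func main() {
	cfg := "index_gateway:\n  mode: ring\n"
	if mode, ok := indexGatewayMode(cfg); ok && mode == "ring" {
		fmt.Println("index_gateway.mode is ring; consider simple on Loki 3.0.0")
	}
}
```

A proper YAML library would be more robust. The line scan is used here only to keep the sketch free of external dependencies.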

Resolution and Further Developments

The Grafana Loki team has acknowledged the issue and implemented a fix for the nil pointer dereference that occurs during bloomstore initialization. The fix can be tracked and reviewed in this commit.

Key Updates:

  • A fix for the nil pointer dereference was merged on May 3, 2024.
  • The problem persists for some users even after updating to the latest patches.
  • Additional related issues are being tracked and addressed, such as those mentioned in issue #13208.

User Reports and Feedback

Multiple users have reported encountering this issue and have shared their experiences and configurations in the issue thread. For instance:

  • @Nissou31 reported the issue while deploying a scalable Loki 3.0.0 setup.
  • @alexandergoncharovaspecta faced a similar problem with a three-pod setup where one pod was crashing.
  • @sslny57 and @abh shared that changing the mode from ring to simple resolved their crash issues, but they encountered other configuration problems.

Conclusion

The backend crash triggered by index_gateway.mode: ring in Loki v3.0.0 is a critical bug that has been reproduced in Kubernetes deployments installed with Helm. Switching the mode to simple works around the crash, and a fix for the nil pointer dereference has been merged, although some users report that the problem persists after updating. Users are encouraged to update their Loki deployments and monitor the GitHub issue tracker for further updates and resolutions.

For more details and ongoing discussions, visit the GitHub issue page.
