How to use Argus
This document explains how to use STACKIT Argus with the ODJ. If you are looking for Argus-specific documentation, you can find the Official Argus Documentation at this link; it is available in English and German.
What components will be deployed with Argus?
While STACKIT Argus itself offers a variety of support for metrics, logs and traces, it is mainly a sink for this kind of data. ODJ therefore installs and configures a couple of standard tools on your SKE cluster that feed your Argus instance with the proper data. The following tools are installed when Argus is enabled for your infra:
- prometheus
- promtail
- Metrics exporters for STACKIT-based components such as PostgreSQL, MySQL, MongoDB and RabbitMQ
These deployments are created and properly configured within the namespace odj-monitoring.
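If you want to verify what was installed, you can list the workloads ODJ created in that namespace; note that promtail is typically rolled out as a DaemonSet, so it appears next to the deployments:
# List the monitoring workloads created in the odj-monitoring namespace
kubectl get deployments,daemonsets,pods -n odj-monitoring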
How do I access my argus instance?
When Argus is enabled within your infra, ODJ automatically adds the credentials and URLs of your Argus instance to your cluster within the namespace odj-monitoring. You can access these credentials using kubectl:
# WORK environment SKE cluster
kubectl get secret -o yaml ru-siwk-argus -n odj-monitoring
# LIVE environment SKE cluster
kubectl get secret -o yaml ru-silv-argus -n odj-monitoring
The secret within the odj-monitoring namespace holds all necessary data to access and manage your Argus instance. It contains the following data:
apiVersion: v1
kind: Secret
metadata:
labels:
app.kubernetes.io/managed-by: odj
name: ru-siwk-argus
namespace: odj-monitoring
stringData:
argus_alerting_url: https://alerting.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_dashboard_url: https://portal.stackit.cloud/projects/[project-id]/service/[argus-id]/argus-dashboard/instances/[argus-id]/overview
argus_grafana_initial_admin_password: SOME_PASSWORD
argus_grafana_initial_admin_user: SOME_USER
argus_grafana_url: https://ui.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_id: [argus-id]
argus_jaeger_traces_url: 4361332c-9519-gj.traces.stackit7.argus.eu01.stackit.cloud:443
argus_jaeger_ui_url: https://4361332c-9519-jui.traces.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_logs_push_url: https://logs.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]/loki/api/v1/push
argus_logs_url: https://logs.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_metrics_push_url: https://push.metrics.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]/api/v1/receive
argus_metrics_url: https://storage.api.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_otlp_traces_url: 4361332c-9519-op.traces.stackit7.argus.eu01.stackit.cloud:443
argus_password: SOME_PASSWORD
argus_targets_url: https://metrics.stackit7.argus.eu01.stackit.cloud/instances/[argus-id]
argus_username: SOME_USERNAME
argus_zipkin_spans_url: https://4361332c-9519-zk.traces.stackit7.argus.eu01.stackit.cloud/api/v2/spans
type: Opaque
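When you read the secret back from the cluster, the values under data are base64-encoded by Kubernetes. As a small example (using the WORK secret name from above), you can extract and decode a single key, for example the Grafana URL:
# Extract and base64-decode one key from the Argus secret (WORK environment example)
kubectl get secret ru-siwk-argus -n odj-monitoring \
  -o jsonpath='{.data.argus_grafana_url}' | base64 --decode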
How can I limit the metrics transferred to my argus instance?
Argus comes with different plans, and each plan supports a different amount of metrics that can be transferred to it per minute. ODJ gives you the possibility to filter the metrics transferred to your Argus instance. The Prometheus instance within your SKE cluster is configured by default to scrape and transfer all metrics. Depending on your workloads, this can exceed the limits of your Argus plan. Exceeding the plan will lead to 429 errors in the logs of the Prometheus instance. Furthermore, you can see the amount of transferred metrics on the default Argus dashboard, which is automatically available in the Grafana of your Argus instance.
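As a quick check, you can search the logs of the Prometheus instance for such 429 errors; the deployment name prometheus-deployment used here is the one referenced later in this document:
# Check the Prometheus logs for 429 (rate limit) errors
kubectl logs -n odj-monitoring deployment/prometheus-deployment | grep "429"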
The config map prometheus-server-conf-custom within the namespace odj-monitoring enables you to whitelist and blacklist metrics from your applications as well as from infrastructure components such as the NGINX ingress controller. You can edit the config map prometheus-server-conf-custom with the following command:
kubectl edit cm prometheus-server-conf-custom -n odj-monitoring
This config map comes by default with guidance on how to configure the metrics scraping and transfer of the Prometheus instance within your SKE cluster. The content of this guidance is shown below:
# THIS Configuration will be only created ONCE by ODJ
# In case you want your defaults back you can simply DELETE this file and rerun the INFRA-Run of ODJ
#
# Feel free to edit this default configuration to whitelist/blacklist certain metrics
# Be aware that you need to restart the prometheus after changes in this ConfigMap
# since there is NO Hot-Reload!
#
# Please test your configuration changes locally using promtool
#
# Documentation: https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/
# ./promtool check config prometheus.yml
#
# Rolling Restart kubectl command
# kubectl -n odj-monitoring rollout restart deployment prometheus-deployment
#
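The exact scrape jobs in your config map depend on the defaults created by ODJ, but the filtering itself is plain Prometheus configuration. The following is a generic sketch of a whitelist (or blacklist) via metric_relabel_configs; the job name and metric names are placeholders that need to be adapted to the scrape configs already present in your config map:
scrape_configs:
  - job_name: kubernetes-pods          # placeholder: use the job names already present in your config map
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      # Whitelist: keep only the metrics you really need in Argus ...
      - source_labels: [__name__]
        regex: "up|http_requests_total|my_app_.*"   # placeholder metric names
        action: keep
      # ... or blacklist: drop known high-cardinality metrics instead
      # - source_labels: [__name__]
      #   regex: "go_gc_.*"
      #   action: drop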
Every change to the config map requires a restart of your Prometheus instance with the following command:
kubectl -n odj-monitoring rollout restart deployment prometheus-deployment
Please ensure that your changes are valid; an invalid configuration will prevent the Prometheus instance from starting, and no metrics will be transferred to your Argus instance.
Grafana
Grafana is a great tool to visualize metrics, logs and traces with dashboards. While provisioning the Argus instance, ODJ provides you by default with the initial admin user and its password. This admin account allows you to manage users within your Grafana instance.
Dashboards - how to build them?
Building the right dashboard can be challenging and depends on the metrics, logs or traces your application(s) provide. Good inspiration can be found in the Grafana Dashboard Directory, where you can download community-provided dashboards or use them as a starting point for your own. Furthermore, Grafana comes with extensive documentation, which can be found here: Grafana Dashboards Documentation
Traces - How can I push traces to argus?
Argus comes with great support for Jaeger- and OpenTelemetry-based traces. The target URLs for pushing your traces can be found as explained in the section How do I access my argus instance?
ODJ automatically inserts the following environment variables when Argus is enabled within your infra:
ODJ_EE_MONITORING_TRACING_OTLP_URL=THE_OTLPTracesURL
ODJ_EE_MONITORING_TRACING_USER=THE_Username
ODJ_EE_MONITORING_TRACING_PASSWORD=The_Password
This enables you to read them within your application and push the traces to your Argus instance. In the following you can see a small Golang-based code example which provides tracing functionality based on these environment variables:
package tracing
import (
"context"
"crypto/tls"
"encoding/base64"
"fmt"
"log/slog"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/credentials/insecure"
"dev.azure.com/schwarzit-wiking/schwarzit.siam-input/_git/masterdata-condenser.git/internal/config"
)
const (
authKey = "Authorization"
totalDialOptions = 2
)
// GRPCExporter creates a trace.SpanExporter using a gRPC connection to a tracing backend.
func GRPCExporter(
ctx context.Context, cfg *config.TracerConfig, logger *slog.Logger,
) (sdktrace.SpanExporter, error) {
ctx, cancel := context.WithTimeout(ctx, cfg.Timeout)
defer cancel()
opts := make([]grpc.DialOption, 0, totalDialOptions)
switch {
case cfg.Username != "" && cfg.Password != "":
opts = append(opts,
// Disable "G402 (CWE-295): TLS MinVersion too low. (Confidence: HIGH, Severity: HIGH)":
// Go has a minimum TLS version 1.2 set. By creating an empty tls.Config we're following that minimum version.
//
// To comply with this linter's rule, we'd need to add a minimum TLS version -- making the team revisit the code
// on a future Go version where the minimum TLS version is updated (e.g. due to a crypto CVE), or making the app
// less robust when preventing transport layer version downgrade attacks
//
// #nosec G402
grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
grpc.WithPerRPCCredentials(basicAuth{
username: cfg.Username,
password: cfg.Password,
}),
)
default:
opts = append(opts, grpc.WithTransportCredentials(insecure.NewCredentials()))
}
conn, err := grpc.DialContext(ctx, cfg.URL, opts...)
if err != nil {
return nil, fmt.Errorf("failed to create gRPC connection to collector: %w", err)
}
exporter, err := otlptracegrpc.New(context.Background(), otlptracegrpc.WithGRPCConn(conn))
if err != nil {
return noopExporter{}, err
}
return exporter, nil
}
type basicAuth struct {
username string
password string
}
// GetRequestMetadata implements the credentials.PerRPCCredentials interface
//
// It returns a key-value (string) map of request headers used in basic authorization.
func (b basicAuth) GetRequestMetadata(_ context.Context, _ ...string) (map[string]string, error) {
return map[string]string{
authKey: "Basic " + base64.StdEncoding.EncodeToString([]byte(b.username+":"+b.password)),
}, nil
}
// RequireTransportSecurity implements the credentials.PerRPCCredentials interface.
func (basicAuth) RequireTransportSecurity() bool {
return true
}
//nolint:revive // returning a private concrete type, but it is only usable internally
func NoopExporter() noopExporter {
return noopExporter{}
}
type noopExporter struct{}
func (noopExporter) ExportSpans(_ context.Context, _ []sdktrace.ReadOnlySpan) error {
return nil
}
func (noopExporter) Shutdown(_ context.Context) error {
return nil
}
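Below is a minimal sketch of how the exporter above could be wired into an OpenTelemetry tracer provider using the environment variables mentioned earlier. It assumes that config.TracerConfig exposes the URL, Username, Password and Timeout fields used above; adjust the import paths and the timeout value to your own project:
package main

import (
	"context"
	"log/slog"
	"os"
	"time"

	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"

	// Placeholder import paths: replace them with the packages of your own module.
	"dev.azure.com/schwarzit-wiking/schwarzit.siam-input/_git/masterdata-condenser.git/internal/config"
	"dev.azure.com/schwarzit-wiking/schwarzit.siam-input/_git/masterdata-condenser.git/internal/tracing"
)

func main() {
	ctx := context.Background()

	// Build the tracer configuration from the environment variables injected by ODJ.
	cfg := &config.TracerConfig{
		URL:      os.Getenv("ODJ_EE_MONITORING_TRACING_OTLP_URL"),
		Username: os.Getenv("ODJ_EE_MONITORING_TRACING_USER"),
		Password: os.Getenv("ODJ_EE_MONITORING_TRACING_PASSWORD"),
		Timeout:  10 * time.Second, // assumed value; pick a timeout that fits your startup behaviour
	}

	exporter, err := tracing.GRPCExporter(ctx, cfg, slog.Default())
	if err != nil {
		slog.Error("failed to create trace exporter, falling back to noop", "error", err)
		exporter = tracing.NoopExporter()
	}

	// Register a tracer provider that batches spans and pushes them to Argus.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// From here on, otel.Tracer("my-service") can be used to create spans.
}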