Troubleshooting Common TscMon Issues and Fixes

Symptoms: Service won’t start, shows “failed” in systemctl or exits immediately.
Likely causes: Configuration syntax error, missing dependencies, corrupted binary, or permission issues.

Fixes:

Check status and logs:

    sudo systemctl status tscmon
    sudo journalctl -u tscmon --no-pager -n 200

Validate the configuration file (assumed here to be /etc/tscmon/tscmon.yml):

    tscmon --config-test /etc/tscmon/tscmon.yml

If there is no built-in checker, lint the YAML instead:

    yamllint /etc/tscmon/tscmon.yml

Verify dependencies are installed (e.g., required databases, language runtimes). Reinstall packages if needed:
```
sudo apt-get install --reinstall tscmon
```

Check file permissions and ownership for config, binaries, and data directories:

    sudo chown -R tscmon:tscmon /var/lib/tscmon /etc/tscmon
    sudo chmod -R 750 /var/lib/tscmon

If the binary is corrupted, replace it with one from a verified release, then restart the service:
```
sudo systemctl restart tscmon
```
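Before running a recursive `chown`, it can be useful to see which paths actually have the wrong owner. The following is a minimal sketch, not part of TscMon: `find_wrong_owner` is a hypothetical helper name, and the `tscmon` user and the two directories are the ones assumed in the steps above.

```shell
# List every path under the given directories that is NOT owned by the
# expected user. Returns 0 if all paths are owned correctly, 1 if any
# offenders were printed, 2 on bad arguments or an unknown user.
find_wrong_owner() {
    local expected_user=$1
    shift
    [ "$#" -ge 1 ] || return 2
    # `find -user` errors out on unknown users, so check up front.
    if ! id -u "$expected_user" >/dev/null 2>&1; then
        echo "unknown user: $expected_user" >&2
        return 2
    fi
    local offenders
    # `! -user` selects entries whose owner differs from the expected user.
    offenders=$(find "$@" ! -user "$expected_user" -print 2>/dev/null)
    if [ -n "$offenders" ]; then
        printf '%s\n' "$offenders"
        return 1
    fi
    return 0
}

# Example (run as root so that every path is readable):
#   find_wrong_owner tscmon /var/lib/tscmon /etc/tscmon
```

An empty result means ownership is probably not the cause of the startup failure, so the recursive `chown` can be skipped.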

Symptoms: TscMon process consumes excessive CPU/RAM, causing system slowdowns.
Likely causes: Heavy polling frequency, large number of monitored targets, memory leak, or inefficient plugin.
Fixes:
1. Identify offending process threads:
```
top -H -p $(pgrep -d, -f tscmon)
# Replace <PID> with the process ID reported by top
sudo perf top -p <PID>
```
2. Reduce polling frequency and batch checks in config (increase intervals, add jitter).
3. Temporarily disable nonessential plugins/modules to isolate the culprit.
4. Update to latest TscMon release (may include performance fixes).
5. If memory leak suspected, enable core dumps and collect heap profile (if supported). Restart service after collecting diagnostics.
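Before collecting a heap profile, it can help to confirm that memory really grows over time rather than just being high. The sketch below samples resident memory from `/proc`. It assumes Linux, and `rss_kb` and `sample_rss` are hypothetical helper names, not TscMon commands.

```shell
# Print the resident set size (VmRSS, in kB) of a process.
# Linux-only: reads /proc/<pid>/status.
rss_kb() {
    local pid=$1
    awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status"
}

# Take `count` samples, `interval` seconds apart, printing
# "<epoch_seconds> <rss_kb>" on each line.
sample_rss() {
    local pid=$1
    local count=$2
    local interval=$3
    local i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "$pid")"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: one sample per minute for an hour, for the oldest matching process:
#   sample_rss "$(pgrep -o -f tscmon)" 60 60 > /tmp/tscmon-rss.log
```

A steadily rising second column under a constant workload points toward a leak; a flat but high value points more toward the number of targets or the polling frequency.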

Symptoms: Dashboard shows no data or old timestamps.
Likely causes: Ingestion pipeline stalled, time synchronization issues, or exporter failures.
Fixes:
1. Verify TscMon is publishing metrics (check local metrics endpoint, e.g., http://localhost:9100/metrics).
2. Inspect ingestion logs (message queue, TSDB) for errors or backpressure.
3. Confirm system time is correct:
```
timedatectl status
sudo ntpstat || sudo systemctl restart systemd-timesyncd
```
4. Check exporters on monitored hosts are reachable and running; test connectivity with curl or telnet.
5. Clear any metric ingestion queues if they’re backed up, then restart the ingestion component.
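The first check in the list above can be scripted roughly as follows. This is a sketch under assumptions: it presumes the endpoint serves Prometheus-style text (one sample per line, `#` for comments), which the source does not confirm, and `metrics_sample_count` and `check_metrics_endpoint` are hypothetical helper names.

```shell
# Count samples in Prometheus-style text read from stdin:
# non-empty lines that are not `#` comments.
metrics_sample_count() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }'
}

# Fetch the endpoint and fail if it is unreachable or publishes no samples.
# The default URL is the example endpoint mentioned in the steps above.
check_metrics_endpoint() {
    local url=${1:-http://localhost:9100/metrics}
    local body
    local n
    # -f: fail on HTTP errors, -sS: quiet but still report errors.
    if ! body=$(curl -fsS --max-time 5 "$url"); then
        echo "endpoint unreachable: $url" >&2
        return 1
    fi
    n=$(printf '%s\n' "$body" | metrics_sample_count)
    echo "$url: $n samples"
    [ "$n" -gt 0 ]
}
```

If the endpoint responds with a healthy sample count but the dashboard is still stale, the problem is more likely downstream in ingestion or in clock skew than in TscMon itself.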

Symptoms: Expected alerts absent, or alerts fire too frequently/with wrong severity.
Likely causes: Alerting rule misconfiguration, wrong thresholds, silences/maintenance windows active, or time-window misalignment.
Fixes:
1. Review alert rules for logic errors (evaluation window, aggregation, labels).
2. Test rules locally with sample metric data (use tscmon alert-testing tool or query language).
3. Check for active silences or muted receivers.
4. Verify notification channels (email, PagerDuty, Slack) are configured and credentials valid.
5. Adjust thresholds and add annotations explaining rationale.
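Evaluation-window mistakes are easier to spot when the rule logic is replayed against a handful of sample values. The sketch below is not TscMon's rule language or its alert-testing tool; it is a stand-alone illustration of a "value above threshold for N consecutive samples" condition, with `would_fire` as a hypothetical helper name.

```shell
# Read one numeric sample per line on stdin and decide whether a rule of the
# form "value > threshold for N consecutive samples" would fire.
# Exit status: 0 = would fire, 1 = would not fire.
would_fire() {
    local threshold=$1
    local for_samples=$2
    awk -v t="$threshold" -v n="$for_samples" '
        # Extend the streak on a breach, reset it on any sample at or below t.
        { if ($1 + 0 > t + 0) streak++; else streak = 0 }
        streak >= n { fired = 1; exit }
        END { exit (fired ? 0 : 1) }
    '
}

# Example: CPU samples against a threshold of 90 held for 3 samples.
#   printf '95\n80\n96\n97\n' | would_fire 90 3 && echo fires || echo quiet
```

In the commented example the dip to 80 resets the streak, so the rule stays quiet even though three of the four samples exceed the threshold. The same effect explains many "expected alert never fired" reports when the evaluation window is longer than the spike.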

Symptoms: Users cannot log in, API calls return 401/403.
Likely causes: Token expiry, misconfigured OAuth/OIDC, incorrect role mappings, or LDAP issues.
Fixes:
1. Check authentication provider status and logs (OIDC, LDAP).
2. Validate client IDs, secrets, and callback URLs.
3. Inspect role/permission mappings in TscMon config.
4. Rotate or refresh tokens if expired; ensure time sync between systems.
5. Test API with an admin token to isolate client vs