背景
knative 0.14.0
实际修改可能与贴出来的代码不符,贴出来的代码只是为了方便快速实现功能
在支持了前面的定制功能后,集群中部署ksvc服务时会报IngressNotConfigured错误
原因分析
首先根据错误提示及日志信息,可以发现是在做健康检查的时候出的问题,期望得到200,但是得到了404
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
func (m *Prober) probeVerifier(item *workItem) prober.Verifier {
return func(r *http.Response, _ []byte) (bool, error) {
// In the happy path, the probe request is forwarded to Activator or Queue-Proxy and the response (HTTP 200)
// contains the "K-Network-Hash" header that can be compared with the expected hash. If the hashes match,
// probing is successful, if they don't match, a new probe will be sent later.
// An HTTP 404/503 is expected in the case of the creation of a new Knative service because the rules will
// not be present in the Envoy config until the new VirtualService is applied.
// No information can be extracted from any other scenario (e.g. HTTP 302), therefore in that case,
// probing is assumed to be successful because it is better to say that an Ingress is Ready before it
// actually is Ready than never marking it as Ready. It is best effort.
switch r.StatusCode {
case http.StatusOK:
hash := r.Header.Get(network.HashHeaderName)
switch hash {
case "":
m.logger.Errorf("Probing of %s abandoned, IP: %s:%s: the response doesn't contain the %q header",
item.url, item.podIP, item.podPort, network.HashHeaderName)
return true, nil
case item.ingressState.hash:
return true, nil
default:
m.logger.Warnf("unexpected hash: want %q, got %q", item.ingressState.hash, hash)
return true, nil
}
// 日志中报错的地方,探活希望得到200,但是得到了404
case http.StatusNotFound, http.StatusServiceUnavailable:
return false, fmt.Errorf("unexpected status code: want %v, got %v", http.StatusOK, http.StatusNotFound)
default:
m.logger.Errorf("Probing of %s abandoned, IP: %s:%s: the response status is %v, expected 200 or 404",
item.url, item.podIP, item.podPort, r.StatusCode)
return true, nil
}
}
}
|
其实这时候大致也能猜到是什么原因了,因为我们定制了通过USN进行过滤,探活的时候,Url中其实是没有USN的。下一步就是顺藤摸瓜,找到探活对应的代码验证我们的猜想,也比较简单
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
// processWorkItem processes a single work item from workQueue.
// It returns false when there is no more items to process, true otherwise.
func (m *Prober) processWorkItem() bool {
...
// probePath /healthz
probeURL.Path = path.Join(probeURL.Path, probePath)
ok, err := prober.Do(
item.context,
transport,
probeURL.String(),
prober.WithHeader(network.UserAgentKey, network.IngressReadinessUserAgent),
prober.WithHeader(network.ProbeHeaderName, network.ProbeHeaderValue),
m.probeVerifier(item))
...
}
|
可以看到探活的时候就是拿Path拼上/heathz,验证了我们的猜想
修复
修改也就比较简单了,在添加wotkitem时,预先把USN添加到path中即可
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
func (l *gatewayPodTargetLister) ListProbeTargets(ctx context.Context, ing *v1alpha1.Ingress) ([]status.ProbeTarget, error) {
...
// Use sorted hosts list for consistent ordering.
for i, host := range gatewayHosts[gatewayName].List() {
newURL := *target.URLs[0]
newURL.Host = host + ":" + target.Port
var usn string
if ing.Annotations != nil {
usn = ing.Annotations["serverless.kakuchuxing.com/usn"]
}
newURL.Path = path.Join(newURL.Path, usn)
qualifiedTarget.URLs[i] = &newURL
...
}
|
总结
通过这个问题也看到了对于一些细节和关键流程掌握的还不够,还是需要进行系统性的学习。至于健康检查的逻辑,和k8s的健康检查稍有不同,参考这篇文章