Answering questions with GPT based on existing website content

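While reading the ChatGPT documentation recently, I came across the article How to build an AI that can answer questions about your website. It explains how to build an AI bot that answers questions about your website from the site's existing content. The scenario is common: your company keeps a lot of useful documentation on some site, or you run a blog dedicated to one topic, or a product has detailed online usage documentation. With GPT available, we can send the site's content to GPT as context, and GPT answers the user's questions based on that context.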

Below, using my personal website as the example and the OpenAI ChatGPT API as the tool, we build such a question-answering program.

Overview of the steps

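Overall, we need to do the following:

  1. Download the entire website.
  2. Extract the core content from the useful web pages.
  3. Compute text embeddings for the extracted text and put them into a vector database.
  4. When a question arrives, first search the vector database with the question text, then send the search results as context, together with the question, to ChatGPT to get the answer.

The concrete code and steps follow.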

Download the entire website

The wget command makes it easy to download a website.

$ wget -r https://www.tianxiaohui.com 
        ...省略...
FINISHED --2024-01-06 19:42:12--
Total wall clock time: 11m 28s
Downloaded: 3611 files, 133M in 3m 37s (625 KB/s)

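With the -r option, wget downloads the whole website recursively and stores the files in directories that mirror the URL paths. Here it downloaded 3611 files, 133M in total. My site clearly does not have that many articles: the download also includes many image links and a number of category pages.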
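To see what the 3611 files actually consist of, a small script can group them by extension. This is only a sketch: the function name count_by_extension is made up for illustration, and it assumes wget wrote its output into a directory named after the host.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: str) -> Counter:
    """Count regular files under root, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without a suffix are grouped under a readable label.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling count_by_extension("www.tianxiaohui.com").most_common() from the download folder would list the extensions from most to least frequent, which makes it easier to decide which files to keep.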

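Extract the core content from the useful web pages

Browsing these pages by hand shows that we only need the HTML pages that contain a full article; some category pages (for example, the collection of posts from March 2023) are not needed. Once an article's HTML is loaded, we only need the article body, not the menu links or the category links in the sidebar. That gives us the following code: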

import os

from bs4 import BeautifulSoup


def fetch_docs(folder: str) -> list[str]:
    # Iterate over the folder and keep only files ending in .html
    html_docs = [f for f in os.listdir(folder) if f.endswith('.html')]

    txts = []
    for html_doc in html_docs:
        # encoding='utf-8' is an assumption about how the pages were saved
        with open(os.path.join(folder, html_doc), 'r', encoding='utf-8') as file:
            # Parse the HTML with BeautifulSoup
            soup = BeautifulSoup(file, 'html.parser')
            # Keep only the article body; category pages have none and are skipped
            post = soup.find('div', class_='post-content')
            if post:
                # Strip newlines and carriage returns, turn tabs into spaces
                txt = post.get_text().replace("\n", "").replace("\t", " ").replace("\r", "")
                # print(txt)  # inspect the text to decide on further processing
                txts.append(txt)
            else:
                print("no article found in " + html_doc)
    print(len(txts))
    return txts


fetch_docs("/Users/eric/Downloads/blogs/www.tianxiaohui.com/index.php/Troubleshooting")

Embed the extracted text and put it into a vector database

We use OpenAI embeddings, with FAISS as the vector store for similarity search.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS


embeddings_model = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    fetch_docs("/Users/eric/Downloads/blogs/www.tianxiaohui.com/index.php/Troubleshooting"), embedding=embeddings_model
)
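To make the idea of similarity search concrete, here is a toy version that needs no API key. It is a sketch only: the bag-of-words embed function stands in for a real embedding model, and the linear scan in top_k is not how FAISS is implemented. All names are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "analyze a java heap dump to find the cause of an OOM",
        "download a website with wget",
        "linux inode and dentry notes",
    ]
    print(top_k("how to analyze java OOM", docs, k=1))
```

A real embedding model captures meaning rather than shared words, but the retrieval step has the same shape: embed the question, score it against every stored vector, keep the best k.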

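Retrieve relevant content from the vector database and call GPT to generate the answer

First we turn the FAISS vector store into a retriever and retrieve the relevant documents, then pass the question, together with the context built from those documents, to ChatGPT to get the answer.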

from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI


question = "如何分析Java应用OOM问题?"
retriever = vectorstore.as_retriever(search_kwargs={"k": 3, "score_threshold": .5})
docs = retriever.get_relevant_documents(question)
doc_contents = [doc.page_content for doc in docs]

prompt = PromptTemplate.from_template("here is the question: {query}, this is the contexts: {context}")
chain = prompt | ChatOpenAI()
result = chain.invoke({"query": question, "context": doc_contents})
print(result)
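The template above receives the context as a Python list, which, as far as I can tell, is rendered in its list representation, brackets and quotes included. A possible refinement is to join the passages into numbered blocks before formatting the prompt. The sketch below uses plain string handling only; build_prompt and its wording are hypothetical, not part of LangChain.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Join retrieved passages into numbered blocks and append the question."""
    numbered = "\n\n".join(f"[{i}] {c.strip()}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

The resulting string could be sent to the chat model in place of the template output. Numbering the passages also makes it possible to ask the model which passage an answer came from.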

Summary

With a few dozen lines of code, we have turned an existing knowledge website into a first, basic version of an AI that can answer questions.

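Using Spoon, a static code analysis tool

Official site: https://spoon.gforge.inria.fr/launcher.html
Spoon is a static code analysis tool built on Eclipse JDT. It breaks source code down into packages, modules, classes, methods, statements, expressions and other syntactic units, which are in turn linked by parent-child containment relationships. It can compile, check, analyze, filter, replace and transform source code.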

AST elements

Structural elements that make up the syntax tree: https://spoon.gforge.inria.fr/structural_elements.html
Code elements that make up code blocks, in detail: https://spoon.gforge.inria.fr/code_elements.html

Set up the project

Spoon can analyze a plain Java project, a Maven project, or a JAR file (via decompilation).

// Analyze the source code of a Maven project
MavenLauncher launcher = new MavenLauncher("/Users/tianxiaohui/codes/myProj", SOURCE_TYPE.APP_SOURCE,"/Users/tianxiaohui/apache-maven-3.9.1/");

launcher.getEnvironment().setComplianceLevel(17);
launcher.getEnvironment().setNoClasspath(true); // some classes are not provided, e.g. those in the servlet jar
launcher.buildModel();

CtModel model = launcher.getModel();

There are three cases when resolving a reference:

  1. The source code is available: reference.getDeclaration() = reference.getTypeDeclaration().
  2. No source code, only the binary (jar): reference.getDeclaration() = null, and reference.getTypeDeclaration() is built through reflection, with isShadow = true.
  3. Neither source code nor binary: reference.getDeclaration() = reference.getTypeDeclaration() = null.

What is said above about getTypeDeclaration also applies to getFieldDeclaration and getExecutableDeclaration.

Common code analyses

List all packages and classes in the source code

for(CtPackage p : model.getAllPackages()) {
  System.out.println("package: " + p.getQualifiedName());
}
// list all classes of the model
for(CtType<?> s : model.getAllTypes()) {
  System.out.println("class: " + s.getQualifiedName());
}

Find the definition of a method

When you know a method's name and want to see its concrete definition, you can look it up as follows.

    public static void findMethodDefinition(CtModel ctModel, String clazzName, String methodName) {
        CtClass<?> foundClass = ctModel.getElements(new TypeFilter<CtClass<?>>(CtClass.class) {
            @Override
            public boolean matches(CtClass<?> clazz) {
                return clazz.getQualifiedName().equals(clazzName);
            }
        }).stream().findFirst().orElse(null);

        if (foundClass != null) {
            //System.out.println("Found class definition: " + foundClass);
            foundClass.getMethodsByName(methodName).forEach(m -> {
                System.out.println("Found method definition: " + m.getSignature());
                System.out.println(m.toString());
            });
        } else {
            System.out.println("Class definition not found for: " + clazzName);
        }
    }

    public static void findInvocationPoints(CtModel ctModel, String className, String methodName) {
        System.out.println(" Method " + className + "." + methodName + " is called by:");
        findInvocation(ctModel, className, methodName, 0, false);
    }

Find where instances of a class are constructed

Sometimes we need to find where a class is instantiated; the following code does that.

    public static void findNewClassConstruct(CtModel ctModel, String clazzName) {
        long start = System.currentTimeMillis();
        ctModel.getRootPackage().getElements(new TypeFilter<>(CtConstructorCall.class)).forEach(e -> {
            if (e.getExecutable().getDeclaringType().getQualifiedName().equals(clazzName)) {
                System.out.println(clazzName + " is created at: " + e.getPosition());
            }
        });
        System.out.println("time0: " + (System.currentTimeMillis() - start));
    }
    public static void findNewClassConstruct1(CtModel ctModel, String clazzName) {
        long start = System.currentTimeMillis();
        ctModel.getRootPackage().getElements(new TypeFilter<>(CtConstructorCall.class)).forEach(e -> {
            System.out.println(e.getType());
            String type = e.getType().toString();
            if (type.equals(clazzName)) {
                System.out.println(clazzName + " is created at: " + e.getPosition());
            }
        });
        System.out.println("time1: " + (System.currentTimeMillis() - start));
    }

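Find the call sites of a method

When a method is invoked, the declared type of the receiver may be the class itself, an interface it implements, or a direct or indirect superclass. To cover every possibility, you have to check each superclass and each implemented interface as well. The method below only looks at direct calls through the given type.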

    /**
     *  here we only find the invocation with the exactly class name and method name, not the declared method in
     *  Interface and parent class.
     *  Sometimes you want to find all the invocation points of a method, include the declared method in Interface and parent class.
     *  ex:
     *    IHello hello = new Hello();
     *    hello.sayHello();
     *    In this case, the invocation point of sayHello() is in IHello, not in Hello.
     * @param ctModel
     * @param className
     * @param methodName
     * @param depth
     * @param recursive
     */
    private static void findInvocation(CtModel ctModel, String className, String methodName, int depth, boolean recursive) {

        List<CtInvocation<?>> invocations = ctModel.getElements(new TypeFilter<CtInvocation<?>>(CtInvocation.class) {
            @Override
            public boolean matches(CtInvocation<?> element) {
                return element.getExecutable().getSignature().toString().equals(methodName) && containsRefType(element.getReferencedTypes(), className);
            }
        });

        for (CtInvocation<?> invocation : invocations) {
            CtExecutable<?> caller = invocation.getParent(CtExecutable.class);
            if (caller != null) {
                for (int i = 0; i < depth; i++) {
                    System.out.print("\t");
                }
                System.out.print(" - " + caller.getParent(CtClass.class).getPackage() + "." + caller.getParent(CtClass.class).getSimpleName() + "." + caller.getSignature());
                if (caller.getThrownTypes().size() > 0) {
                    System.out.print(" throws ");
                    System.out.print(caller.getThrownTypes().stream().map(t -> t.toString()).collect(Collectors.joining( ", ")));
                }
                System.out.println(" at line " + invocation.getPosition().getLine());
                if (recursive) {
                    findInvocation(ctModel, caller.getParent(CtClass.class).getQualifiedName(), caller.getSignature(), 1 + depth, recursive);
                }
            }
        }
    }
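The Javadoc above notes that a call made through an interface or a parent class is missed. One way to widen the search is to first collect every supertype of the class and then repeat the lookup for each of them. The sketch below shows only that collection step, in Python rather than with the Spoon API; the parents mapping and the type names are hypothetical.

```python
def all_supertypes(type_name: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect every direct and indirect supertype (classes and interfaces)."""
    seen: set[str] = set()
    stack = list(parents.get(type_name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Continue upwards through the supertypes of this type.
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    # Hypothetical hierarchy: Hello implements IHello and extends BaseHello.
    parents = {"Hello": ["IHello", "BaseHello"], "BaseHello": ["IGreeter"]}
    # Types through which Hello.sayHello() might be invoked, besides Hello itself.
    print(sorted(all_supertypes("Hello", parents)))
```

With Spoon, the same information would come from the model itself (superclass and super-interface references of each type) instead of a hand-written dictionary.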

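Find all code that throws exceptions

The following code finds every point in the code where an exception is thrown. You can of course filter by exception type.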

    public static void findThrowStatements(CtModel ctModel) {
        List<CtThrow> throwStatements = ctModel.getElements(new TypeFilter<>(CtThrow.class));

        // Process each throw statement
        for (CtThrow throwStatement : throwStatements) {
            CtExecutable<?> executable = throwStatement.getParent(CtExecutable.class);
            if (executable != null) {
                System.out.println(executable.getParent(CtClass.class).getPackage() + " - " + executable.getParent(CtClass.class).getSimpleName() + "." + executable.getSimpleName()
                        + " at line " + throwStatement.getPosition().getLine());
            }
        }
    }

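Find classes or methods with a specific annotation

Sometimes you want to find the classes that carry a specific annotation: for example, some annotations define all the APIs of a system, and others mark APIs that are about to be deprecated.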

findWithAnnotation(ctModel, javax.ws.rs.ApplicationPath.class).forEach(clazz -> {
            System.out.println(clazz.getAnnotation(javax.ws.rs.ApplicationPath.class).value());
        });


public static List<CtClass> findWithAnnotation(CtModel ctModel, Class<? extends Annotation> annotationType) {
        return ctModel.getRootPackage().getElements(new AnnotationFilter<>(annotationType));
}

List the functions called by a given function

Given a specific function, we can list which other functions it calls.

public static void findNextCalls(CtModel ctModel, String pkg, String clazz, String methodName) {
        //find this method
        ctModel.getElements(new TypeFilter<CtMethod<?>>(CtMethod.class) {
            @Override
            public boolean matches(CtMethod<?> method) {
                if (!(method.getParent() instanceof CtClass)) {
                    return false;
                }

                CtClass cz = (CtClass) method.getParent();
                String curClazz = cz.getSimpleName();
                String curPkg = null != cz.getPackage() ? cz.getPackage().getQualifiedName() : "-";

                if (method.getSimpleName().equals(methodName)) {
                    System.out.println(curPkg + " - " + curClazz + " - " + method.getSimpleName());
                }
                return pkg.equals(curPkg) && clazz.equals(curClazz) && method.getSimpleName().equals(methodName);
            }
        }).forEach(m -> {
            System.out.println("Method found: " + m.getSimpleName());
            //find next calls
            List<CtInvocation<?>> invocations = m.getElements(new TypeFilter<>(CtInvocation.class));
            for (CtInvocation<?> invocation : invocations) {
                // Check if the invocation is a client call
                CtExecutableReference exec = invocation.getExecutable();
                System.out.println(m.getSimpleName() + " -> " + invocation.getExecutable().getDeclaringType() + "."
                        + invocation.getExecutable().getSignature());
            }
        });
    }

Linux file system study notes

These are notes on chapter 13, the file system part, from a Linux kernel study group at my company.

inode

In Linux and other Unix-like operating systems, an inode (index node) is a data structure that stores information about a file or directory except its name and its actual data. Each file or directory has an inode that contains important metadata about the file or directory.

The information stored in an inode includes:

  1. File size
  2. Device ID
  3. User ID (owner)
  4. Group ID
  5. File permissions
  6. Timestamps: last access, last modification, and last status change (some file systems, such as ext4, also record a creation time)
  7. Number of links (how many file names point to this inode)
  8. Pointers to the disk blocks that store the file's data

The inode number is a unique identifier for the inode within the filesystem. You can view the inode information of a file or directory using the ls -i or stat command in the terminal.
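The same inode metadata is available programmatically. As a small sketch, the Python script below creates a file, adds a hard link, and shows that the link count goes up while both names share one inode. The helper name hard_link_demo and the file names are made up for illustration, and the script assumes a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo() -> tuple:
    """Create a file, add a hard link, and return (links_before, links_after, same_inode)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "x")
        alias = os.path.join(d, "x-alias")
        with open(original, "w") as f:
            f.write("hello")
        links_before = os.stat(original).st_nlink
        os.link(original, alias)                  # a second name for the same inode
        info, alias_info = os.stat(original), os.stat(alias)
        return links_before, info.st_nlink, info.st_ino == alias_info.st_ino


if __name__ == "__main__":
    print(hard_link_demo())
```

The file name lives in the directory, not in the inode, which is why two names can point at one inode and why the link count is part of the inode's metadata.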

dentry

In the Linux kernel, a dentry (directory entry) is a data structure that represents a specific inode in the cache. It's a key component of the Virtual File System (VFS) layer, which provides a common interface for all file systems.

The dentry object contains information about the inode, the file name, and pointers to the parent and child dentries, forming a dentry tree that represents the directory hierarchy. This structure allows the kernel to quickly look up files and directories, improving the efficiency of file system operations.

Dentries are stored in a dentry cache (dcache), which keeps track of recently accessed dentries to speed up subsequent file and directory lookups. When a file is accessed, the kernel first checks the dcache. If the dentry is found, the kernel can resolve the path without re-reading the directories along it from disk, which can significantly improve performance.

The stat command

The stat command displays information about a file, a path, or a file system. See the examples below.

# Show information about the file system that contains /tmp
$ stat -f /tmp/
File: "/tmp/"
    ID: 7ae4c0a51947813 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 122221576  Free: 73122076   Available: 66895184
Inodes: Total: 31121408   Free: 28709496
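The figures printed by stat -f can also be read from a program. The sketch below uses Python's os.statvfs; the function name fs_summary and the dictionary keys are chosen here for illustration, and the mapping of fields to the stat -f labels is my reading of the two outputs.

```python
import os


def fs_summary(path: str) -> dict:
    """Return block and inode totals for the file system containing path."""
    vfs = os.statvfs(path)
    return {
        "block_size": vfs.f_bsize,
        "fundamental_block_size": vfs.f_frsize,
        "blocks_total": vfs.f_blocks,
        "blocks_free": vfs.f_bfree,
        "blocks_available": vfs.f_bavail,   # free blocks usable by unprivileged users
        "inodes_total": vfs.f_files,
        "inodes_free": vfs.f_ffree,
        "max_name_length": vfs.f_namemax,
    }


if __name__ == "__main__":
    for key, value in fs_summary("/tmp").items():
        print(f"{key}: {value}")
```

The gap between free and available blocks is space reserved for the superuser, which is why the two numbers differ in the output above.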

Example

When we create an empty file and an empty directory, we see the following:

supra@suprabox:~/Downloads$ touch x
supra@suprabox:~/Downloads$ stat x
  File: x
  Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 10748017    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/   supra)   Gid: ( 1000/   supra)
Access: 2023-12-01 21:05:46.177415933 -0800
Modify: 2023-12-01 21:05:46.177415933 -0800
Change: 2023-12-01 21:05:46.177415933 -0800
 Birth: 2023-12-01 21:05:46.177415933 -0800
supra@suprabox:~/Downloads$ mkdir y
supra@suprabox:~/Downloads$ stat y
  File: y
  Size: 4096          Blocks: 8          IO Block: 4096   directory
Device: fd01h/64769d    Inode: 11153180    Links: 2
Access: (0775/drwxrwxr-x)  Uid: ( 1000/   supra)   Gid: ( 1000/   supra)
Access: 2023-12-01 21:06:28.061716185 -0800
Modify: 2023-12-01 21:06:28.061716185 -0800
Change: 2023-12-01 21:06:28.061716185 -0800
 Birth: 2023-12-01 21:06:28.061716185 -0800

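Here is a detailed explanation of the output for the empty file x:

  1. File: x: the file name is x.
  2. Size: 0: the file size is 0 bytes, because the file was just created and nothing has been written to it.
  3. Blocks: 0: the file occupies 0 blocks, consistent with its size of 0.
  4. IO Block: 4096: the file system's I/O block size is 4096 bytes.
  5. regular empty file: this is a regular, empty file.
  6. Device: fd01h/64769d: the number of the device that holds the file, in hexadecimal and in decimal.
  7. Inode: 10748017: the file's inode number is 10748017.
  8. Links: 1: the hard link count is 1, meaning only one file name points to this inode.
  9. Access: (0664/-rw-rw-r--): the permissions are 0664, i.e. the owner and the group can read and write, while others can only read.
  10. Uid: ( 1000/ supra): the owner's user ID is 1000 and the user name is supra.
  11. Gid: ( 1000/ supra): the owning group's ID is 1000 and the group name is supra.
  12. Access: 2023-12-01 21:05:46.177415933 -0800: the time the file was last accessed.
  13. Modify: 2023-12-01 21:05:46.177415933 -0800: the time the file's content was last modified.
  14. Change: 2023-12-01 21:05:46.177415933 -0800: the time the file's status (metadata) was last changed.
  15. Birth: 2023-12-01 21:05:46.177415933 -0800: the time the file was created.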
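The two notations in the Access and Device fields can be checked with a few lines of Python. This is just a sketch using the standard stat module; the mode values combine the file-type bits with the permissions shown above for x and y.

```python
import stat

file_mode = 0o100664      # regular-file type bits plus permissions 0664 (file x)
dir_mode = 0o040775       # directory type bits plus permissions 0775 (directory y)

file_text = stat.filemode(file_mode)          # symbolic form of the mode
dir_text = stat.filemode(dir_mode)
permission_bits = oct(stat.S_IMODE(file_mode))  # permission bits without the file type

device_decimal = 0xFD01   # "fd01h" from the Device field, as a decimal number

print(file_text, dir_text, permission_bits, device_decimal)
```

The leading character of the symbolic form comes from the file-type bits, which is how stat tells the regular file apart from the directory even though both have similar permissions.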

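Java: why did the lock I need go missing?

A production problem: some Tomcat threads got stuck, and more and more of them over time. A restart fixes it temporarily but is not a long-term solution. See the figure below: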
TomcatBusyThread.png

Identify the stuck threads

Pick any server that still shows the symptom and take a thread dump. It shows stuck threads like the following (excerpt):

"MyTaskExecutor-127" #407 daemon prio=5 os_prio=0 tid=0x00007ff6d0019000 nid=0x1da waiting on condition [0x00007ff4159d7000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000075e438328> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:837)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:872)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1202)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:213)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:951)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x000000075e5b2488> (a java.io.BufferedInputStream)
    at sun.net.www.MeteredStream.read(MeteredStream.java:134)
    - locked <0x000000075e5b4400> (a sun.net.www.http.KeepAliveStream)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3471)

This is clearly an operation making an outbound HTTPS request. It is stuck in SSLSocketImpl$AppInputStream.read(), where it needs to acquire a lock.

Environment

OpenJDK 1.8.362. The JDK version matters because, as we will see later, the JDK code in this area has changed a great deal and differs from version to version.

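Initial analysis

At first I assumed that no read timeout had been set, causing an endless wait. But the application configuration shows that it is set, and the heap dump confirms that it is indeed set, as shown below: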
timeout.png

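Why does it wait forever even though connect and read timeouts are set?

The stack shows that the connection is already established (newly created or taken from the KeepAliveCache), so the connect timeout phase is over and no longer applies.
The read timeout, in turn, is only applied in poll(), or in the C code that wraps it (Java_java_net_SocketInputStream), and this thread has not reached that point yet.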

Who holds the lock this thread is waiting for?

The thread dump readily shows that the lock is held by the following thread.

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007ff8508d3000 nid=0x26 in Object.wait() [0x00007ff80f427000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007570016f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000007570016f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:188)

   Locked ownable synchronizers:
    - <0x000000075e438328> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e438410> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e613df8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e6682c0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e6a3e90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e6b4070> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075e6c51f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075ee84098> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000075f636998> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x00000007610a3e08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x0000000766af21d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x00000007759c8fd8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078c1bed50> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078d6fb888> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078e2b8ff0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078f448fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078f592e50> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078f5a9430> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - <0x000000078f5bed40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

The Locked ownable synchronizers section above shows that the Finalizer thread holds many locks. The first one is exactly the lock our earlier thread is trying to acquire.

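So what is the Finalizer doing?

From the stack above, together with the source code, we can see that the Finalizer thread is actually waiting for the next object whose finalize() method needs to run. The heap dump also shows that no objects are currently queued: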
lock.png

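A contradictory picture

This thread owns the lock (in fact it owns many, as the Locked ownable synchronizers section shows), yet it is not doing anything with it; it is currently idle. So there is no way for it to release the lock: it acquired the lock at some point and now can never release it.

Anything waiting for this lock will therefore wait forever.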
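The effect is easy to reproduce in miniature. The sketch below is a Python analogy, not the JDK code path: one thread takes a lock and finishes without releasing it, and a later caller can never get it. The thread name and function name are made up for illustration.

```python
import threading

lock = threading.Lock()


def acquire_and_vanish() -> None:
    """Take the lock and return without releasing it (no try/finally)."""
    lock.acquire()


owner = threading.Thread(target=acquire_and_vanish, name="Finalizer-like")
owner.start()
owner.join()                          # the owning thread has finished its work

# A second caller now waits for a lock that nobody will ever release.
# The timeout only keeps this demo from hanging; ReentrantLock.lock() has none.
got_it = lock.acquire(timeout=0.5)
print("acquired:", got_it)
```

In the Java case the waiters call lock() without a timeout, so they park indefinitely, which matches the growing number of stuck Tomcat threads.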

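Why does this happen?

Judging from the data so far, this is rare and happens only once every few days. Most likely the Finalizer thread acquired the lock while running the finalize() method of sun.security.ssl.SSLSocketImpl and never released it.

So I read the source of sun.security.ssl.SSLSocketImpl for this version and found that almost every use of the lock follows the try{}finally{} pattern, with the lock released in the finally{} block. A run that completes normally cannot leave the lock held.

The only remaining possibility is that finalize() did not run to completion: it was interrupted somewhere between acquiring and releasing the lock. The JDK gives no guarantee that finalize() will run at all, when it will run, or whether it will finish. That is why finalize() has been deprecated since JDK 9.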

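Reflection

If finalize() had started running on this sun.security.ssl.SSLSocketImpl, then at some point the JVM GC must have judged it unreachable. Some mysterious force must then have pulled it back from the dead, and it is now being used by another thread.
When an AbstractOwnableSynchronizer lock is held by a thread, it records the owning thread in its exclusiveOwnerThread field. The heap dump confirms that this lock is indeed owned by the Finalizer.
lock2.png
Here the thread is Finalizer and state is 1, meaning this ReentrantLock$NonfairSync has been entered once.

What mysterious force brought it back to life?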

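Exploring how the read timeout of a Java URL connection is implemented

We fetch a web page with the plain java.net.HttpURLConnection, set the timeouts, and then capture the call stacks with OS-level tracing tools. The test runs on Ubuntu 22.04 with OpenJDK 11. JDK 11 uses the poll system call to wait for network events and the recvfrom system call to read the data, so those are the two system calls we intercept. As shown below, once the symbol table has been injected for perf, both perf and bpftrace can resolve JIT-compiled Java methods, whereas strace and gdb cannot.
Tools used, in order: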

  1. strace
  2. perf
  3. bpftrace
  4. gdb

Java code

We reuse the code that ChatGPT produced earlier. Note the connect timeout and read timeout values set here; they show up again later.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class URLTest {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            try {
                URL url = new URL("http://www.tianxiaohui.com");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                con.setRequestMethod("GET");
                con.setConnectTimeout(997); // connect timeout: 997 ms
                con.setReadTimeout(719); // read timeout: 719 ms
                System.out.println(i + "Response code: " + con.getResponseCode());
                BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                String line;
                StringBuilder response = new StringBuilder();
    
                while ((line = in.readLine()) != null) {
                    response.append(line);
                }
                in.close();
                //System.out.println(response.toString());
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println(e.getMessage());
            }
            Thread.sleep(2000);
        }
    }
}

Preparation

Compile and run the program with the following commands:

$ javac URLTest.java
$ java -XX:+PreserveFramePointer -XX:-TieredCompilation -XX:CompileThreshold=1 URLTest

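Generate the symbol table for the JIT-compiled code. See the post "bpftrace 探测 Java 运行时栈-实践" (probing the Java runtime stack with bpftrace, in practice).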

strace

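Run strace directly against the Java process, tracing the poll and recvfrom events. The timeout argument of the poll system call is 997 in one call and 719 in the other. They come from the JDK's C functions Java_java_net_PlainSocketImpl_socketConnect and Java_java_net_SocketInputStream_socketRead0 respectively.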

$ sudo strace --stack-trace -f -e trace=poll  -p 1097556
strace: Process 1097556 attached with 18 threads
[pid 1097557] poll([{fd=5, events=POLLOUT}], 1, 997 <unfinished ...>
[pid 1097670] +++ exited with 0 +++
[pid 1097557] <... poll resumed>)       = 1 ([{fd=5, revents=POLLOUT}])
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__poll+0x4f) [0x118dbf]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(NET_Poll+0xab) [0xff2b]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(Java_java_net_PlainSocketImpl_socketConnect+0x1ec) [0xd24c]
 > unexpected_backtracing_error [0x7f93742cbd95]
[pid 1097557] poll([{fd=5, events=POLLIN|POLLERR}], 1, 719) = 1 ([{fd=5, revents=POLLIN}])
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__poll+0x4f) [0x118dbf]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(NET_Timeout+0xed) [0x1019d]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(Java_java_net_SocketInputStream_socketRead0+0xdf) [0xe75f]
 > unexpected_backtracing_error [0x7f93742daa34]


$ sudo strace --stack-trace -f -e trace=recvfrom  -p 1097556
strace: Process 1097556 attached with 18 threads
[pid 1097557] recvfrom(5, "HTTP/1.1 301 Moved Permanently\r\n"..., 8192, MSG_DONTWAIT, NULL, NULL) = 242
 > /usr/lib/x86_64-linux-gnu/libc.so.6(recv+0x6e) [0x1278ae]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(NET_NonBlockingRead+0xb0) [0xf1e0]
 > /usr/lib/jvm/java-11-openjdk-amd64/lib/libnet.so(Java_java_net_SocketInputStream_socketRead0+0xf9) [0xe779]
 > unexpected_backtracing_error [0x7f93742daa34]
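The pattern visible in the strace output, a poll with the timeout followed by a non-blocking receive, can be imitated in a few lines. The sketch below is a Python illustration of that pattern on Linux, not a transcription of the JDK's C code; the function name read_with_timeout is made up, and a socket pair stands in for the HTTP connection.

```python
import select
import socket


def read_with_timeout(sock: socket.socket, size: int, timeout_ms: int) -> bytes:
    """Wait up to timeout_ms for data with poll(), then do a non-blocking recv."""
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLERR)
    if not poller.poll(timeout_ms):           # empty list: nothing became readable
        raise TimeoutError("Read timed out")
    return sock.recv(size, socket.MSG_DONTWAIT)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        read_with_timeout(a, 8192, 719)       # no data sent yet, so this times out
    except TimeoutError as exc:
        print("timeout:", exc)
    b.sendall(b"HTTP/1.1 301 Moved Permanently\r\n")
    print(read_with_timeout(a, 8192, 719))
    a.close()
    b.close()
```

The point of the design is that the timeout belongs to the wait, not to the read itself: once poll reports data, the receive returns immediately with whatever is available.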

perf

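With perf you first record the data, then open the command-line report and enter a single entry to see the full stack. The stack reads from top to bottom, with the most recent frame at the bottom.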

$ sudo perf record -p 1097556  -e "syscalls:sys_enter_poll" -ag
$ ls -lah perf.data
$ sudo perf report 
# Select a sample and press Enter to see the details, as below
Samples: 8  of event 'syscalls:sys_enter_poll', Event count (approx.): 8
  Children      Self  Trace output
-   37.50%    37.50%  ufds: 0x7f937753c438, nfds: 0x00000001, timeout_msecs: 0x000002cf
     start_thread
     ThreadJavaMain
     JavaMain
     jni_CallStaticVoidMethod
     jni_invoke_static
     JavaCalls::call_helper
     call_stub
     Interpreter
     Ljava/net/HttpURLConnection;::getResponseCode
     Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream
     Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream0
     Lsun/net/www/http/HttpClient;::parseHTTP
     Lsun/net/www/http/HttpClient;::parseHTTPHeader
     Ljava/io/BufferedInputStream;::read
     Ljava/io/BufferedInputStream;::read1
     Ljava/io/BufferedInputStream;::fill
     Ljava/net/SocketInputStream;::read
     Ljava/net/SocketInputStream;::socketRead0
     Java_java_net_SocketInputStream_socketRead0
     __poll
+   25.00%    25.00%  ufds: 0x7f937754c378, nfds: 0x00000001, timeout_msecs: 0x000003e5
+   12.50%    12.50%  ufds: 0x7f937754a888, nfds: 0x00000001, timeout_msecs: 0x00000000
+   12.50%    12.50%  ufds: 0x7f937754a888, nfds: 0x00000001, timeout_msecs: 0x00001385
+   12.50%    12.50%  ufds: 0x7f937754a888, nfds: 0x00000001, timeout_msecs: 0x00001388
$ sudo perf record -p 1097556  -e "syscalls:sys_enter_recvfrom" -ag
$ sudo perf report 
Samples: 2  of event 'syscalls:sys_enter_recvfrom', Event count (approx.): 2
  Children      Self  Trace output
-  100.00%   100.00%  fd: 0x00000005, ubuf: 0x7f937753c4c0, size: 0x00002000, flags: 0x00000040, addr: 0x00000000, addr_len: 0x00000000
     start_thread
     ThreadJavaMain
     JavaMain
     jni_CallStaticVoidMethod
     jni_invoke_static
     JavaCalls::call_helper
     call_stub
     Interpreter
     Ljava/net/HttpURLConnection;::getResponseCode
     Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream
     Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream0
     Lsun/net/www/http/HttpClient;::parseHTTP
     Lsun/net/www/http/HttpClient;::parseHTTPHeader
     Ljava/io/BufferedInputStream;::read
     Ljava/io/BufferedInputStream;::read1
     Ljava/io/BufferedInputStream;::fill
     Ljava/net/SocketInputStream;::read
     Ljava/net/SocketInputStream;::socketRead0
     Java_java_net_SocketInputStream_socketRead0
     __libc_recv

bpftrace

Use the syscall tracepoint events that bpftrace exposes to capture the user-space stack.

$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_poll /pid==1097556/{ @[ustack(20)] = count(); }'
Attaching 1 probe...
^C

@[
    poll+79
    Java_java_net_SocketInputStream_socketRead0+223
    Ljava/net/SocketInputStream;::socketRead0+244
    Ljava/net/SocketInputStream;::read+224
    Ljava/io/BufferedInputStream;::fill+784
    Ljava/io/BufferedInputStream;::read1+176
    Ljava/io/BufferedInputStream;::read+252
    Lsun/net/www/http/HttpClient;::parseHTTPHeader+444
    Lsun/net/www/http/HttpClient;::parseHTTP+1004
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream0+1420
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream+196
    Ljava/net/HttpURLConnection;::getResponseCode+96
    Interpreter+4352
    call_stub+138
    JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+883
    jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .constprop.1]+682
    jni_CallStaticVoidMethod+352
    JavaMain+3441
    ThreadJavaMain+13
    start_thread+755
]: 1
$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_recvfrom /pid==1097556/{ @[ustack(20)] = count(); }'
Attaching 1 probe...
^C

@[
    recvfrom+116
]: 2
@[
    __GI___recv+110
    Java_java_net_SocketInputStream_socketRead0+249
    Ljava/net/SocketInputStream;::socketRead0+244
    Ljava/net/SocketInputStream;::read+224
    Ljava/io/BufferedInputStream;::fill+784
    Ljava/io/BufferedInputStream;::read1+176
    Ljava/io/BufferedInputStream;::read+252
    Lsun/net/www/http/HttpClient;::parseHTTPHeader+444
    Lsun/net/www/http/HttpClient;::parseHTTP+1004
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream0+1420
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream+196
    Ljava/net/HttpURLConnection;::getResponseCode+96
    Interpreter+4352
    call_stub+138
    JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+883
    jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .constprop.1]+682
    jni_CallStaticVoidMethod+352
    JavaMain+3441
    ThreadJavaMain+13
    start_thread+755
]: 4

gdb

Attach with gdb, then set breakpoints to reach the stack we want. gdb shows more than the other tools, including the two steps between Java_java_net_SocketInputStream_socketRead0 and glibc's __GI___poll: NET_ReadWithTimeout and NET_Timeout.

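# Attach gdb to the process
$ sudo gdb --pid=1097556
# Switch to the thread we need
(gdb) thread 2
# Set a breakpoint on socketRead0 first, rather than on poll directly
(gdb) break Java_java_net_SocketInputStream_socketRead0
(gdb) cont   # runs until the breakpoint above is hit
(gdb) bt     # check whether this is the stack we want
(gdb) break __GI___poll
(gdb) cont   # this stops at the point we are after; the session output follows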
(gdb) break __GI___poll
Breakpoint 5 at 0x7f9378c6dd70: file ../sysdeps/unix/sysv/linux/poll.c, line 27.
(gdb) cont
Continuing.

Thread 2 "java" hit Breakpoint 5, __GI___poll (fds=fds@entry=0x7f937753c438, nfds=nfds@entry=1, timeout=719, timeout@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>) at ../sysdeps/unix/sysv/linux/poll.c:27
27    ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  __GI___poll (fds=fds@entry=0x7f937753c438, nfds=nfds@entry=1, timeout=719,
    timeout@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>) at ../sysdeps/unix/sysv/linux/poll.c:27
#1  0x00007f937402419d in poll (__timeout=719, __nfds=1, __fds=0x7f937753c438) at /usr/include/x86_64-linux-gnu/bits/poll2.h:39
#2  NET_Timeout (env=env@entry=0x7f937001bb48, s=s@entry=5, timeout=<error reading variable: That operation is not available on integers of more than 8 bytes.>,
    nanoTimeStamp=nanoTimeStamp@entry=1643674031468882) at ./src/java.base/linux/native/libnet/linux_close.c:433
#3  0x00007f937402275f in NET_ReadWithTimeout (timeout=<optimized out>, len=8192, bufP=0x7f937753c4c0 "", fd=5, env=0x7f937001bb48)
    at ./src/java.base/unix/native/libnet/SocketInputStream.c:55
#4  Java_java_net_SocketInputStream_socketRead0 (env=0x7f937001bb48, this=<optimized out>, fdObj=<optimized out>, data=0x7f937754c518, off=0, len=8192, timeout=719)
    at ./src/java.base/unix/native/libnet/SocketInputStream.c:127
#5  0x00007f93742daa34 in ?? ()
#6  0x00000000000002cf in ?? ()

How the timeout is implemented

From the stacks above, it is easy to see how the timeout we care about is used:

  1. Java_java_net_SocketInputStream_socketRead0 - https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/java.base/unix/native/libnet/SocketInputStream.c#L91C1-L91C44
  2. NET_ReadWithTimeout - https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/java.base/unix/native/libnet/SocketInputStream.c#L50
  3. NET_Timeout - https://github.com/openjdk/jdk11/blob/master/src/java.base/linux/native/libnet/linux_close.c#L415
  4. poll - https://github.com/openjdk/jdk11/blob/master/src/java.base/linux/native/libnet/linux_close.c#L441

The places that actually use the timeout are poll and the loop that Java_java_net_SocketInputStream_socketRead0 runs through NET_ReadWithTimeout.
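To make the role of that loop concrete, here is a simplified Java model of the idea: wait for readability with the remaining time budget, try a non-blocking read, and go around again if the wake-up brought no data. It is only a sketch of the pattern, not the JDK's native code. The names ReadTimeoutSketch, Poller and NonBlockingReader are hypothetical stand-ins invented for illustration, and it only models a positive timeout.

```java
import java.net.SocketTimeoutException;

final class ReadTimeoutSketch {
    /** Hypothetical stand-in for poll(2): true if the fd became readable within waitMillis. */
    interface Poller {
        boolean awaitReadable(long waitMillis);
    }

    /** Hypothetical stand-in for a non-blocking read: byte count, or -1 for "would block". */
    interface NonBlockingReader {
        int tryRead();
    }

    static int readWithTimeout(Poller poller, NonBlockingReader reader, long timeoutMillis)
            throws SocketTimeoutException {
        // Fix the deadline once, so repeated wake-ups cannot extend the total wait.
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            // Either the budget is used up, or the wait itself ran out of time.
            if (remainingMillis < 1 || !poller.awaitReadable(remainingMillis)) {
                throw new SocketTimeoutException("Read timed out");
            }
            int n = reader.tryRead();
            if (n != -1) {
                return n; // data arrived (or end of stream, in a real reader)
            }
            // Woken up without data: loop again with whatever time is left.
        }
    }

    public static void main(String[] args) throws Exception {
        // The first wake-up delivers nothing, the second one delivers 42 bytes.
        int[] calls = {0};
        int n = readWithTimeout(ms -> true, () -> calls[0]++ == 0 ? -1 : 42, 719);
        System.out.println("read " + n + " bytes after " + calls[0] + " attempts");

        try {
            // A poller that never reports readiness ends in a timeout.
            readWithTimeout(ms -> false, () -> -1, 50);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the timeout is a total budget: each pass through the loop waits only for the time that is left, which is why both the wait call and the loop have to know about it.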

If we switch to https

Stack for connect after switching to https:

    poll+79
    Java_java_net_PlainSocketImpl_socketConnect+492
    Ljava/net/PlainSocketImpl;::socketConnect+213
    Ljava/net/AbstractPlainSocketImpl;::doConnect+328
    Ljava/net/AbstractPlainSocketImpl;::connect+444
    Ljava/net/SocksSocketImpl;::connect+2528
    Ljava/net/Socket;::connect+592
    Lsun/security/ssl/SSLSocketImpl;::connect+96
    Lsun/net/NetworkClient;::doConnect+432
    Lsun/net/www/http/HttpClient;::openServer+72
    Lsun/net/www/http/HttpClient;::openServer+348
    Lsun/net/www/protocol/https/HttpsClient;::<init>+972
    Lsun/net/www/protocol/https/HttpsClient;::New+1444
    Lsun/net/www/protocol/http/HttpURLConnection;::plainConnect0+3272
    Lsun/net/www/protocol/http/HttpURLConnection;::plainConnect+256
    Lsun/net/www/protocol/https/AbstractDelegateHttpsURLConnection;::connect+100
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream0+1996
    Lsun/net/www/protocol/http/HttpURLConnection;::getInputStream+192
    Ljava/net/HttpURLConnection;::getResponseCode+116
    LURLTest;::main+660

Stack for read after switching to https:

    poll+79
    Java_java_net_SocketInputStream_socketRead0+223
    Ljava/net/SocketInputStream;::socketRead0+244
    Ljava/net/SocketInputStream;::read+204
    Lsun/security/ssl/SSLSocketInputRecord;::read+52
    Lsun/security/ssl/SSLSocketInputRecord;::readHeader+144
    Lsun/security/ssl/SSLSocketInputRecord;::bytesInCompletePacket+72
    Lsun/security/ssl/SSLSocketImpl;::readApplicationRecord+348
    Lsun/security/ssl/SSLSocketImpl$AppInputStream;::read+552
    Ljava/io/BufferedInputStream;::fill+760
    Ljava/io/BufferedInputStream;::read+676
    Lsun/net/www/http/ChunkedInputStream;::fastRead+116
    Lsun/net/www/http/ChunkedInputStream;::read+584
    Lsun/net/www/protocol/http/HttpURLConnection$HttpInputStream;::read+116
    Lsun/nio/cs/StreamDecoder;::readBytes+444
    Lsun/nio/cs/StreamDecoder;::implRead+392
    Lsun/nio/cs/StreamDecoder;::read+432
    Ljava/io/BufferedReader;::fill+724
    Ljava/io/BufferedReader;::readLine+1132
    LURLTest;::main+304
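
Both stacks still end in poll, so the connect and read timeouts set on the connection take effect over https as well. The following is a minimal sketch for observing a read timeout without depending on an external site. It assumes plain http against a local ServerSocket that never answers, which keeps the example self-contained. The class name ReadTimeoutDemo and the 300 ms value are made up for illustration and are not part of the original URLTest.

```java
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

final class ReadTimeoutDemo {
    /** Returns the elapsed milliseconds until the read timed out, or -1 if a response arrived. */
    static long timeRead(int readTimeoutMillis) throws Exception {
        // The listening socket lets the TCP handshake complete, but nothing ever
        // accepts the connection or writes a response, so the client blocks in read.
        try (ServerSocket server = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/").toURL();
            // NO_PROXY keeps any system proxy settings out of the picture.
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(1000);            // budget for the connect phase
            conn.setReadTimeout(readTimeoutMillis);  // budget for each blocking read
            long start = System.nanoTime();
            try {
                conn.getResponseCode();              // sends the request, then waits for the status line
                return -1;
            } catch (SocketTimeoutException e) {
                return (System.nanoTime() - start) / 1_000_000L;
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read timed out after about " + timeRead(300) + " ms");
    }
}
```

While this program is blocked in getResponseCode, the same perf, bpftrace or gdb commands can be pointed at its pid, provided the read timeout is made long enough to attach in time.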