Technical Articles
review rerank_by_model
dive into rerank_by_model
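Resetting a MySQL Password via Docker
How to safely and effectively reset the password of a MySQL instance running in Docker when you have forgotten it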
Basic Linear Algebra Subprograms
Basic Linear Algebra Subprograms
Common similarity comparisons
common similarity searches
General Matrix-Matrix Multiplication
A detailed explanation of General Matrix Multiplication (GEMM)
The Two FP8 Formats
The two FP8 formats (E4M3 and E5M2)
dot-product in vLLM
Excerpts of the dot-product computation in vLLM
The FMA Instruction
A fundamental arithmetic instruction in modern CPUs and GPUs
pragma unroll
A detailed explanation of pragma unroll
Dense vs Sparse
Dense and Sparse overview
forward Method Implementation (Embedding)
A walkthrough of the forward method (Embedding)
Display and Debug Traits in Rust
Display and Debug Traits in Rust
From Input Embeddings to Context Vectors
Hyperparameter overview
Hyperparameter overview
Overfitting and Underfitting
Overfitting and Underfitting in Machine Learning
Training a Neural Network
A comprehensive guide on training neural networks using PyTorch, covering data loading, preprocessing, training loops, evaluation metrics, and model saving.
Hello World in CUDA
A simple CUDA program that prints 'Hello, World!' from a GPU thread.
nn module
An overview of the nn module in PyTorch, providing a high-level interface for building neural networks.
Optimization Algorithms
An overview of optimization algorithms used in training neural networks.
Loss Functions
An overview of loss functions in neural networks and their importance in training models.
Activation Functions
An overview of activation functions in neural networks
Artificial Neurons
An introduction to artificial neurons, their structure, and how they function in neural networks.
Neural Networks
An overview of neural networks, their architecture, and applications in AI.
Excerpt - Love Poetry
Exploring the essence of love through poetry, capturing its fleeting nature and enduring impact.
Custom Autograd Functions
Learn how to create custom autograd functions in PyTorch for complex operations.
Mean and Variance
Understanding mean and variance in deep learning and statistics.
Bias Vector
Understanding the role of bias vectors in multi-head attention mechanisms.
Gemini Embedding Models
Explore Gemini's text embedding models for advanced NLP tasks.
Estimating the GPU Memory Needed to Run Large Models
GPU Memory Requirement Calculator for AI Models
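Causal Self-Attention and Multi-Head Attention
PyTorch implementations of causal self-attention and an efficient multi-head attention module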
Single Head Self-Attention
A complete implementation of Single Head Self-Attention
An Implementation of a Simple Self-Attention Mechanism
A simple self-attention mechanism implementation
A PyTorch FishData Example
A PyTorch FishData example
trainable weight self Attention
simple self attention -> trainable weight self Attention
Basic Machine Learning Concepts
What is machine learning?
PyTorch Cheatsheet
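A quick reference for common PyTorch operations
Broadcasting in PyTorch
A detailed explanation of PyTorch broadcasting
The Difference Between rank and ndim in PyTorch
A detailed explanation of the key differences between rank and ndim (number of dimensions) in PyTorch tensors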
Training with multiple GPUs
Training with multiple GPUs - template code with PyTorch
AI Glossary
AI glossary - common abbreviations and terms
Calculating a Model's Parameter Size
A code snippet for calculating a model's parameter size
Greedy search decoding
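An example of greedy search decoding
Dictionary Comprehensions
Basic usage of dictionary comprehensions in Python
List Comprehensions
Basic usage of list comprehensions in Python
Read - July 2025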
Book reading records for July 2025
Chat with llama.cpp
A hands-on guide to creating an interactive chat interface using llama.cpp, covering model loading, prompt engineering, and response streaming.
Comparison of llama.cpp Tokenizer Types
A comparison of llama.cpp tokenizer types.
llama.cpp simple source code
A llama.cpp Hello World.
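Setting Up a llama.cpp Development Environment and Running the simple Example
A detailed guide: configure a llama.cpp development environment from scratch, then build and run the simple example program.
SFT vs LoRA: Comparing Fine-Tuning Techniques for Large Models
An in-depth look at the core principles, use cases, and performance of supervised fine-tuning (SFT) and low-rank adaptation (LoRA), to help developers choose the best fine-tuning approach.
HelloCuda Series: CUDA CheckP2P
Check the P2P communication capability between CUDA devices and learn how to optimize data transfer between GPUs.
HelloCuda Series: CUDA nsys Profiler
Analyze CUDA program performance with the NVIDIA nsys Profiler and learn how to optimize GPU compute efficiency.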
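HelloCuda Series: CUDA Thrust Basic
An introduction to the basics of the CUDA Thrust library, showing how to do efficient data processing and implement algorithms on the GPU.
HelloCuda Series Chapter 2: CUDA Architecture
A close look at the CUDA architecture, exploring the GPU's hardware structure, its compute units, and their use in parallel computing.
HelloCuda Series: Dynamic Parallelism
An in-depth discussion of the CUDA dynamic parallelism programming model, showing how to achieve more flexible parallel computation on the GPU.
One-Hot Encoding
Learn the concept of one-hot encoding, where it is used, and why it matters in machine learning.
HelloCuda Series Chapter 1: CUDA Overview
A close look at the CUDA programming model, exploring the GPU's parallel computing capability and its use in data processing.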
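HelloCuda Series Chapter 3: CUDA Parallel Programming
An in-depth discussion of the CUDA parallel programming model, showing how to use the GPU efficiently for large-scale data processing and compute tasks.
HelloCuda Series Chapter 4: CUDA Profiling
An in-depth discussion of CUDA profiling tools, showing how to optimize the performance and resource utilization of GPU applications.
Understanding `with torch.no_grad()` in PyTorch
This article takes a close look at the `with torch.no_grad()` context manager in PyTorch, explaining what it does, when to use it, and equivalent ways to achieve the same effect.
Ternary Conditional Expressions in Python
Learn about ternary conditional expressions in Python.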
OCaml Function Definition
Learn how to define functions in OCaml, including syntax, examples, and best practices.
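CNN, FNN, and RNN in Deep Learning: Network Architectures and How Their Applications Differ
An introduction to the basic concepts, structures, and use cases of feedforward neural networks (FNN), convolutional neural networks (CNN), and recurrent neural networks (RNN).
Basic TorchServe Usage
A comprehensive guide to get you started with deploying your FNN model using TorchServe.
Installing PyTorch with CUDA
Detailed steps for setting up a PyTorch CUDA environment, including creating a Conda environment, installing PyTorch with CUDA support, and verifying the installation.
Basic PyTorch Operations
An introduction to basic PyTorch operations, including tensor creation, reshaping, slicing, concatenation, transposition, and matrix operations.
A Summary of Cast Operations in C++
A summary of type conversion operators in C++, including `static_cast`, `dynamic_cast`, `const_cast`, `reinterpret_cast`, `bit_cast`, `duration_cast`, and C-style casts, covering the purpose, characteristics, and example code for each.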
Install conda on Ubuntu
Install conda on Ubuntu
Mathematics in machine learning
Understanding the role of mathematics in machine learning
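Understanding [[maybe_unused]]: The Right Way to Handle Unused Variables and Functions
An introduction to the [[maybe_unused]] attribute introduced in C++17 and how to handle unused variables and functions gracefully.
Rust Notes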
Rust is a systems programming language focused on safety, speed, and concurrency. It uses a unique ownership model to manage memory without a garbage collector.
The C++ language considers six member functions as special.
Configuring C/C++ Header Paths in VSCode
How to configure header search paths for C/C++ projects in VSCode.
outs.guts.firing.boots
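An analysis of the different verbs media outlets used when reporting RFK Jr.'s dismissal of a vaccine expert panel, and the stances those verbs imply.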
Read - June 2025
Book reading records for June 2025
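Fine-Tuning OpenAI Models in Practice: Building an Enterprise-Grade Automatic Email Reply System
This article explains in detail how to fine-tune an OpenAI model to build an automatic email reply system that matches a company's terminology and writing style, with a complete code implementation and best practices.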
NVIDIA GPU Architectures Explained: From Tesla to Ampere
A comprehensive guide to NVIDIA's GPU architectures, exploring key innovations from Tesla to Ampere for developers and tech enthusiasts.
C++11/14/17/20 Core Features Cheat Sheet
Concise reference for key C++ features across modern standards, helping developers write efficient and maintainable code.
Blanket Implementation and Trait Bound: A Complete Guide
A comprehensive guide to Rust's trait system, covering blanket implementations and trait bounds with practical examples. Explores advanced patterns, performance implications, and standard library internals.
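A Summary of CUDA Memory Types and Their Characteristics
An in-depth look at the CUDA memory hierarchy, covering how registers, shared memory, global memory, and the other memory types work, along with optimization strategies. Includes solutions for bank conflicts, techniques for optimizing memory access patterns, and a comparative analysis of real-world performance.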
Vulkan Graphics pipeline basics
Detailed examination of each Vulkan graphics pipeline stage and their rendering responsibilities
Drawing Your First Triangle with Vulkan: A Step-by-Step Guide
From VkInstance to vkCmdDraw: The essential steps to render your first Vulkan triangle
50 Universal Technical Phrases (Set 2)
Designed to help you articulate ideas clearly and professionally in technical discussions, interviews, or presentations.
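PartialEq and Eq: Understanding the Differences in Equality Comparison and How to Implement Them
A full analysis of the differences between Rust's PartialEq and Eq traits, from mathematical definitions to practical use cases, helping developers implement and use these two equality relations correctly.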
must_use
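The complete explanation of the #[must_use] attribute from the official Rust documentation, including usage rules for types, functions, traits, and other cases.
Send and Sync: An In-Depth Analysis
A full analysis of how Rust's Send and Sync traits work, how they are implemented, and where they apply, covering auto-implementation rules, key points for manual implementation, and best practices for common concurrency patterns.
Send and Sync: Code Examples
Basic code examples demonstrating how Send transfers ownership across threads and how Sync allows shared access to immutable data.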
RewardInfo in agave
Reward information received by a Solana account.
The Triple Constraint of Clone + Send + Sync
trait AppendVecScan: Send + Sync + Clone
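The Three Rules of Variance: Covariant vs Contravariant vs Invariant
An analysis of the core differences between covariance (&T), contravariance (fn(T)), and invariance (&mut T) in Rust's type system, and the key rules for safe generic programming.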
50 Universal Technical Phrases
Designed to help you articulate ideas clearly and professionally in technical discussions, interviews, or presentations.
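Compile-Time vs Runtime Code in Rust: Core Characteristics and a Practical Guide
An in-depth analysis of how compile-time and runtime code executes in Rust and how to tell the two apart, with practical examples and optimization tips.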
AIGC Hoopics
AIGC Hoopics is a comprehensive and general-purpose AI-generated content service. It is designed for scalability, flexibility, and efficiency, leveraging the power of Rust and Shell scripting to empower users in creating and managing AI-powered solutions.
hoopics-admin-restful-api
A performant admin backend for hoopics image sharing platform, rebuilt in Rust with Actix-Web and Diesel ORM. Features multi-database support and follows EggJS-inspired architecture.
SGX Attacker
SGX Attacker is an experimental project designed to explore vulnerabilities in Intel's Software Guard Extensions (SGX). It uses a combination of C++, Shell scripting, and Makefile to simulate and analyze potential attack vectors.
The Time Machine
The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His pale grey eyes shone and twinkled, and his usually pale face was flushed and animated...
English Reading Journey Since 2016
Daily English book reading project started September 1, 2016