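Fine-Tuning DeepSeek R1 (Reasoning Model)
DeepSeek has shaken up the AI landscape, challenging OpenAI's dominance by releasing a new series of advanced reasoning models. The best part? The model weights are openly released, so anyone can download and use them for free. You can watch the video tutorial on fine-tuning DeepSeek below.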
https://youtube.com/watch?v=qcNmOItRw4U%3Fsi%3DBwYocwA6CmoPRzvx
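In this tutorial, we will fine-tune the DeepSeek-R1-Distill-Llama-8B model on a medical chain-of-thought dataset from Hugging Face. This distilled DeepSeek-R1 model was created by fine-tuning the Llama 3.1 8B model on data generated with DeepSeek-R1. It shows reasoning capabilities similar to those of the original model.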

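Introducing DeepSeek R1
Chinese AI company DeepSeek AI has open-sourced its first-generation reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero, which rival OpenAI's o1 on reasoning tasks such as math, coding, and logic. You can read our complete guide to DeepSeek R1 to learn more.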
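DeepSeek-R1-Zero
DeepSeek-R1-Zero is the first open-source model trained entirely with large-scale reinforcement learning (RL), rather than supervised fine-tuning (SFT), as the initial step. This approach lets the model explore chain-of-thought (CoT) reasoning on its own, solve complex problems, and iteratively refine its outputs. However, it suffers from repetitive reasoning steps, poor readability, and language mixing, all of which hurt its clarity and usability.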
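DeepSeek-R1
DeepSeek-R1 was introduced to overcome the limitations of DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning, which provides a solid foundation for both reasoning and non-reasoning tasks.
This multi-stage training lets the model reach performance comparable to OpenAI-o1 on math, code, and reasoning benchmarks while improving the readability and coherence of its outputs.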
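DeepSeek Distillation
Alongside the large language models, which need substantial compute and memory to run, DeepSeek has also introduced distilled models. These smaller, more efficient models have shown that they can still deliver strong reasoning performance.
Ranging from 1.5B to 70B parameters, these models retain strong reasoning capabilities, with DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini on multiple benchmarks.
The smaller models inherit the reasoning patterns of the larger model, which demonstrates the effectiveness of the distillation process.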

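Read the DeepSeek-R1: Features, o1 Comparison, Distilled Models & More blog to learn about its key features, development process, distilled models, access, pricing, and comparison with OpenAI o1.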
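Fine-Tuning DeepSeek R1: Step-by-Step Guide
To fine-tune the DeepSeek R1 model, follow the steps below:
1. Setting up
For this project, we use Kaggle as our cloud IDE because it offers free access to GPUs, which are often more powerful than those available in Google Colab. To begin, launch a new Kaggle notebook and add your Hugging Face token and Weights & Biases token as secrets.
You can add secrets by navigating to the Add-ons tab in the Kaggle notebook interface and selecting the Secrets option.
After setting up the secrets, install the unsloth Python package. Unsloth is an open-source framework designed to make fine-tuning large language models (LLMs) about 2x faster and more memory-efficient.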
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
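Before moving on, it can help to confirm that the packages used in the rest of this tutorial are importable in the current kernel. The helper below is a minimal sketch using only the standard library; `missing_packages` is a hypothetical name, and the list is simply the set of top-level packages imported later in this tutorial.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level packages imported later in this tutorial.
required = ["unsloth", "trl", "transformers", "datasets", "wandb"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package was found. The check only looks packages up and does not import them, so it stays fast even for heavy libraries.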
Log in to the Hugging Face CLI using the Hugging Face API token that we securely retrieve from Kaggle Secrets.
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)
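The `kaggle_secrets` module only exists inside Kaggle notebooks. If you run the same code elsewhere, one option is to fall back to an environment variable. The sketch below assumes that approach; `get_token` is a hypothetical helper, and the choice of `HF_TOKEN` as the variable name is an assumption you can change.

```python
import os

def get_token(secret_name, env_var):
    """Read a token from Kaggle Secrets, falling back to an environment variable."""
    try:
        from kaggle_secrets import UserSecretsClient  # only available on Kaggle
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Not on Kaggle, or the secret is not attached: use the environment.
        return os.environ.get(env_var)

hf_token = get_token("HUGGINGFACE_TOKEN", "HF_TOKEN")
print("Token found:", hf_token is not None)
```

The helper returns `None` when neither source has the token, so you can decide whether to stop or continue without logging in.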
Log in to Weights & Biases (wandb) using your API key and create a new project to track the experiments and fine-tuning progress.
import wandb

wb_token = user_secrets.get_secret("wandb")
wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type="training",
    anonymous="allow"
)
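If no Weights & Biases key is available, one way to keep the notebook running is to switch wandb to offline mode through the `WANDB_MODE` environment variable before `wandb.init` is called. This is a small sketch of that idea, not part of the original notebook; `configure_wandb_mode` is a hypothetical helper.

```python
import os

def configure_wandb_mode(token):
    """Set WANDB_MODE to 'online' when a token exists, otherwise 'offline'."""
    mode = "online" if token else "offline"
    os.environ["WANDB_MODE"] = mode
    return mode

# With no token, runs are stored locally instead of being uploaded.
print(configure_wandb_mode(None))
```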
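2. Loading the model and tokenizer
For this project, we load the Unsloth version of DeepSeek-R1-Distill-Llama-8B. We also load the model with 4-bit quantization to optimize memory usage and performance.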
from unsloth import FastLanguageModel

max_seq_length = 2048  # maximum sequence length in tokens
dtype = None           # None lets Unsloth detect the dtype automatically
load_in_4bit = True    # load the weights with 4-bit quantization

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token,
)
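To see why 4-bit loading matters on a free GPU, here is a rough back-of-the-envelope calculation. It is a sketch under simplifying assumptions: it counts the weights only (no activations, KV cache, optimizer state, or quantization overhead) and uses a nominal 8 billion parameters rather than the exact count.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

n_params = 8e9  # nominal "8B"; the exact parameter count differs slightly
for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(n_params, bits):.1f} GiB")
```

Under these assumptions the weights shrink from roughly 14.9 GiB at 16 bits to roughly 3.7 GiB at 4 bits, which leaves room for training on a single mid-range GPU.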
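3. Model inference before fine-tuning
To create a prompt style for the model, we define a system prompt that includes placeholders for the question and for the response generation. The prompt guides the model to think step by step and give a logical, accurate response.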
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>{}"""
In this example, we insert a medical question into prompt_style, convert it to tokens, and then pass the tokens to the model to generate a response.
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
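The string handling in the inference code above can be tried without a GPU. The sketch below uses a shortened stand-in for the prompt template (same two placeholders) and a made-up decoded string in place of real model output; `build_prompt` and `extract_response` are hypothetical helper names.

```python
# Shortened stand-in for prompt_style with the same two placeholders.
demo_prompt_style = """### Question:
{}
### Response:
<think>{}"""

def build_prompt(question, partial_response=""):
    """Fill the question and leave the response open after the <think> tag."""
    return demo_prompt_style.format(question, partial_response)

def extract_response(decoded_text):
    """Return everything after the first '### Response:' marker."""
    return decoded_text.split("### Response:", 1)[1]

prompt = build_prompt("What causes stress urinary incontinence?")
# Pretend the model appended this continuation to the prompt.
decoded = prompt + "\nsome reasoning\n</think>\nsome answer"
print(extract_response(decoded))
```

Passing an empty string for the second placeholder leaves the prompt ending in `<think>`, so the model continues from inside the reasoning block. The sketch splits with `maxsplit=1`, which differs from a plain `split(...)[1]` only if the marker happens to appear again in the generated text.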
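Even without fine-tuning, the model successfully generated a chain of thought and reasoned before giving its final answer. The reasoning process is enclosed in <think></think> tags.
So why do we still need fine-tuning? The reasoning, although detailed, is long-winded rather than concise. In addition, the final answer is presented as bullet points, which deviates from the structure and style of the dataset we want to fine-tune on.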
<think>
Okay, so I have this medical question to answer. Let me try to break it down. The patient is a 61-year-old woman with a history of involuntary urine loss during activities like coughing or sneezing, but she doesn't leak at night. She's had a gynecological exam and a Q-tip test. I need to figure out what cystometry would show regarding her residual volume and detrusor contractions.
First, I should recall what I know about urinary incontinence. Involuntary urine loss during activities like coughing or sneezing makes me think of stress urinary incontinence. Stress incontinence typically happens when the urethral sphincter isn't strong enough to resist increased abdominal pressure from activities like coughing, laughing, or sneezing. This usually affects women, especially after childbirth when the pelvic muscles and ligaments are weakened.
The Q-tip test is a common diagnostic tool for stress urinary incontinence. The test involves inserting a Q-tip catheter, which is a small balloon catheter, into the urethra. The catheter is connected to a pressure gauge. The patient is asked to cough, and the pressure reading is taken. If the pressure is above normal (like above 100 mmHg), it suggests that the urethral sphincter isn't closing properly, which is a sign of stress incontinence.
So, based on the history and the Q-tip test, the diagnosis is likely stress urinary incontinence. Now, moving on to what cystometry would show. Cystometry, also known as a filling cystometry, is a diagnostic procedure where a catheter is inserted into the bladder, and the bladder is filled with a liquid to measure how much it can hold (residual volume) and how it responds to being filled (like during a cough or sneeze). This helps in assessing the capacity and compliance of the bladder.
In a patient with stress incontinence, the bladder's capacity might be normal, but the sphincter's function is impaired. So, during the cystometry, the residual volume might be within normal limits because the bladder isn't overfilled. However, when the patient is asked to cough or perform a Valsalva maneuver, the detrusor muscle (the smooth muscle layer of the bladder) might not contract effectively, leading to an increase in intra-abdominal pressure, which might cause leakage.
Wait, but detrusor contractions are usually associated with voiding. In stress incontinence, the issue isn't with the detrusor contractions but with the sphincter's inability to prevent leakage. So, during cystometry, the detrusor contractions would be normal because they are part of the normal voiding process. However, the problem is that the sphincter doesn't close properly, leading to leakage.
So, putting it all together, the residual volume might be normal, but the detrusor contractions would be normal as well. The key finding would be the impaired sphincter function leading to incontinence, which is typically demonstrated during the Q-tip test and clinical history. Therefore, the cystometry would likely show normal residual volume and normal detrusor contractions, but the underlying issue is the sphincter's inability to prevent leakage.
</think>
Based on the provided information, the cystometry findings in this 61-year-old woman with stress urinary incontinence would likely demonstrate the following:
1. **Residual Volume**: The residual volume would be within normal limits. This is because the bladder's capacity is typically normal in cases of stress incontinence, where the primary issue lies with the sphincter function rather than the bladder's capacity.
2. **Detrusor Contractions**: The detrusor contractions would also be normal. These contractions are part of the normal voiding process and are not impaired in stress urinary incontinence. The issue is not with the detrusor muscle but with the sphincter's inability to prevent leakage.
In summary, the key findings of the cystometry would be normal residual volume and normal detrusor contractions, highlighting the sphincteric defect as the underlying cause of the incontinence.<|end▁of▁sentence|>
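Because the reasoning sits between <think> and </think>, it is easy to separate it from the final answer when post-processing outputs like the one above. The following is a minimal sketch; `split_reasoning` is a hypothetical helper, and the end-of-sequence string is copied from the printed output, so treat it as an assumption about how your tokenizer renders it.

```python
def split_reasoning(response, eos="<|end▁of▁sentence|>"):
    """Split a decoded response into (reasoning, answer) around the think tags."""
    text = response.replace(eos, "")
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()

sample = "\n<think>\nstep one\n</think>\nfinal answer<|end▁of▁sentence|>"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Comparing the word count of the reasoning part before and after fine-tuning is one simple way to check whether the output became more concise.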
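4. Loading and processing the dataset
We slightly change the prompt style for processing the dataset by adding a third placeholder for the complex chain-of-thought column.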
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>
{}
</think>
{}"""
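Write a Python function that creates a "text" column in the dataset, built from the training prompt style. It fills the placeholders with the question, the chain of thought, and the answer.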
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    # With batched=True, each column arrives as a list of values.
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
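To see what the formatting function receives and returns, it can be exercised on a toy batch without loading the model. This sketch assumes a shortened template and a placeholder end-of-sequence string; the two rows are invented for illustration, and `format_batch` mirrors the logic above under a different name.

```python
EOS_TOKEN = "<eos>"  # placeholder; in the tutorial this is tokenizer.eos_token

demo_train_prompt_style = """### Question:
{}
### Response:
<think>
{}
</think>
{}"""

def format_batch(examples):
    """Build one training string per row from a column-oriented batch."""
    texts = []
    for question, cot, response in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        filled = demo_train_prompt_style.format(question, cot, response)
        texts.append(filled + EOS_TOKEN)
    return {"text": texts}

# A batch is a dict of columns, each holding one value per row.
batch = {
    "Question": ["Q1", "Q2"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["A1", "A2"],
}
out = format_batch(batch)
print(out["text"][0])
```

Appending the end-of-sequence token to every sample is what teaches the model where an answer ends; without it, generation tends to run on until the token limit.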
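We load the first 500 samples from the FreedomIntelligence/medical-o1-reasoning-SFT dataset, which is available on the Hugging Face Hub. After that, we create the text column by mapping the dataset with the formatting_prompts_func function.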
from datasets import load_dataset

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "en",
    split="train[0:500]",
    trust_remote_code=True,
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]
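Since the model was loaded with a sequence limit of 2048 tokens, it is worth checking how many formatted samples would be truncated. The sketch below is tokenizer-agnostic: `count_over_limit` is a hypothetical helper that accepts any callable returning a list of tokens, and the demo uses whitespace splitting as a stand-in, which undercounts real tokens.

```python
def count_over_limit(texts, tokenize, max_tokens):
    """Count texts whose token count exceeds max_tokens."""
    return sum(1 for text in texts if len(tokenize(text)) > max_tokens)

# Stand-in tokenizer: whitespace splitting (real token counts will be higher).
toy_texts = ["a b c", "a b c d e f", "a"]
print(count_over_limit(toy_texts, str.split, max_tokens=4))
```

With the real tokenizer, you could pass something like `lambda t: tokenizer(t)["input_ids"]` together with `dataset["text"]` and `max_seq_length`.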
As we can see, the text column contains the system prompt, the instruction, the chain of thought, and the answer.
"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. 
Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\n</think>\nCystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<|end▁of▁sentence|>"
5. Setting up the model
Using the target modules, we set up the model by adding low-rank adapters (LoRA) to it.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
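To get a feel for how small the adapters are, we can estimate the number of trainable parameters. A LoRA adapter of rank r on a linear layer with shape (d_in, d_out) adds r * (d_in + d_out) weights. The layer dimensions below are assumptions based on the published Llama 3.1 8B configuration and are not stated in this tutorial, so treat the total as an estimate.

```python
def lora_param_count(r, shapes, n_layers):
    """Each adapted (d_in, d_out) layer adds r * (d_in + d_out) trainable weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers

# Assumed Llama 3.1 8B dimensions (not stated in this tutorial).
hidden, intermediate, kv_dim, n_layers = 4096, 14336, 1024, 32
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}
total = lora_param_count(r=16, shapes=shapes, n_layers=n_layers)
print(f"Trainable LoRA parameters: {total:,}")
print("LoRA scaling (lora_alpha / r):", 16 / 16)
```

If the dimension assumptions hold, this comes to roughly 42 million trainable parameters, well under 1% of the 8B base weights. With lora_alpha equal to r and use_rslora disabled, the adapter update is applied with a scaling factor of 1.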
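Next, we set up the training arguments and the trainer by providing the model, the tokenizer, the dataset, and other important training parameters that govern the fine-tuning process. The argument names below follow the TRL version used when this tutorial was written; depending on your TRL version, some of them (such as dataset_text_field and max_seq_length) may need to be passed through a config object instead.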
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
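Two of these settings are easier to reason about with a little arithmetic: how many samples the run actually sees, and how the learning rate evolves. The sketch below assumes a single GPU and reimplements a linear warmup followed by linear decay, which is our reading of what lr_scheduler_type="linear" with warmup_steps=5 does; `linear_schedule_lr` is a hypothetical helper, not a library function.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, max_steps):
    """Learning rate with linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (max_steps - step) / max(1, max_steps - warmup_steps)
    return base_lr * max(0.0, remaining)

per_device_batch, grad_accum, max_steps = 2, 4, 60
effective_batch = per_device_batch * grad_accum  # assuming a single GPU
print("Sequences per optimizer step:", effective_batch)
print("Sequences seen in total:", effective_batch * max_steps)

for step in (0, 5, 30, 60):
    lr = linear_schedule_lr(step, 2e-4, warmup_steps=5, max_steps=60)
    print(f"step {step:>2}: lr = {lr:.2e}")
```

With 8 sequences per step and 60 steps, the run processes 480 sequences, slightly less than one pass over the 500 samples. This is why the comment in the training arguments suggests switching to num_train_epochs for full training runs.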
6. Model training
Run the following command to start training.
trainer_stats = trainer.train()
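The training run took 44 minutes. The training loss decreased gradually, which is a good sign that the model is fitting the training data.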
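If you prefer to check the loss trend in code rather than by eye, the trainer keeps its logged metrics in trainer.state.log_history, a list of dictionaries. The sketch below works on any list shaped like that; the values shown are made up for illustration and are not results from this run, and both helper names are hypothetical.

```python
def loss_curve(log_history):
    """Extract (step, loss) pairs from a list of logged metric dicts."""
    return [(entry["step"], entry["loss"]) for entry in log_history if "loss" in entry]

def is_decreasing_overall(curve):
    """True if the last logged loss is lower than the first."""
    return len(curve) >= 2 and curve[-1][1] < curve[0][1]

# Made-up values shaped like trainer.state.log_history; not results from this run.
toy_history = [
    {"step": 10, "loss": 1.9},
    {"step": 20, "loss": 1.5},
    {"step": 30, "loss": 1.4},
    {"step": 30, "train_runtime": 100.0},  # summary entries carry no "loss" key
]
curve = loss_curve(toy_history)
print(curve)
print("Decreasing overall:", is_decreasing_overall(curve))
```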

You can view the full model evaluation report on the Weights & Biases dashboard by logging in to the website and opening the project.

If you run into issues running the code above, refer to the Fine-Tuning DeepSeek R1 (Reasoning Model) Kaggle notebook.
7. Model inference after fine-tuning
To compare the results, we ask the fine-tuned model the same question as before and see what has changed.
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
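The output now follows the style of the dataset: the reasoning is more direct, and the final answer is a single paragraph rather than a bulleted list. In that sense, the fine-tuning worked. Note, however, that the content of this particular answer (increased residual volume and increased detrusor contractions) disagrees with the reference answer in the dataset sample shown in step 4 (normal residual volume and no involuntary detrusor contractions). This short 60-step run on 500 samples is therefore best read as a change in style, not as evidence of improved accuracy.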
<think>
Okay, so let's think about this. We have a 61-year-old woman who's been dealing with involuntary urine loss during things like coughing or sneezing, but she's not leaking at night. That suggests she might have some kind of problem with her pelvic floor muscles or maybe her bladder.
Now, she's got a gynecological exam and a Q-tip test. Let's break that down. The Q-tip test is usually used to check for urethral obstruction. If it's positive, that means there's something blocking the urethra, like a urethral stricture or something else.
Given that she's had a positive Q-tip test, it's likely there's a urethral obstruction. That would mean her urethra is narrow, maybe due to a stricture or some kind of narrowing. So, her bladder can't empty properly during activities like coughing because the urethral obstruction is making it hard.
Now, let's think about what happens when her bladder can't empty. If there's a urethral obstruction, the bladder is forced to hold more urine, increasing the residual volume. That's because her bladder doesn't empty completely. So, her residual volume is probably increased.
Also, if her bladder can't empty properly, she might have increased detrusor contractions. These contractions are usually stronger to push the urine out. So, we expect her detrusor contractions to be increased.
Putting it all together, if she has a urethral obstruction and a positive Q-tip test, we'd expect her cystometry results to show increased residual volume and increased detrusor contractions. That makes sense because of the obstruction and how her bladder is trying to compensate by contracting more.
</think>
Based on the findings of the gynecological exam and the positive Q-tip test, it is most likely that the cystometry would reveal increased residual volume and increased detrusor contractions. The positive Q-tip test indicates urethral obstruction, which would force the bladder to retain more urine, thereby increasing the residual volume. Additionally, the obstruction can lead to increased detrusor contractions as the bladder tries to compensate by contracting more to expel the urine.<|end▁of▁sentence|>
8. Saving the model locally
Now, let's save the adapter, the full model, and the tokenizer locally so that we can use them in other projects.
new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)
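Note that the adapter, the tokenizer, and the merged model are all written to the same directory here. To see what ended up on disk, a small standard-library helper is enough. This is a sketch rather than part of the original notebook; `list_saved_files` is a hypothetical name, and the demo runs on a temporary directory so that it works without a trained model.

```python
import tempfile
from pathlib import Path

def list_saved_files(directory):
    """Return {relative file path: size in bytes} for a saved model directory."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

# Demo on a temporary directory holding a single small file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    print(list_saved_files(tmp))
```

In the notebook, you would call it with the directory name used for saving, for example `list_saved_files("DeepSeek-R1-Medical-COT")`.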

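9. Pushing the model to Hugging Face Hub
We also push the adapter, the tokenizer, and the model to the Hugging Face Hub so that the AI community can make use of this model by integrating it into their systems.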
new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")

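Source: kingabzpro/DeepSeek-R1-Medical-COT · Hugging Face
The next step in your learning journey is to deploy the model to the cloud. You can follow the How to Deploy LLMs with BentoML guide, which provides a step-by-step process for deploying large language models efficiently and cost-effectively using tools such as BentoML and vLLM.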
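Conclusion
The AI field is changing rapidly. The open-source community is gaining ground in the market, challenging the dominance of the proprietary models that have ruled the AI landscape for the past three years.
Open-source large language models (LLMs) are becoming better, faster, and more efficient, which makes it easier than ever to fine-tune them with modest compute and memory resources.
In this tutorial, we explored the DeepSeek R1 reasoning model and learned how to fine-tune its distilled version for a medical question-answering task. A fine-tuned reasoning model not only improves performance but can also be applied in critical fields such as medicine, emergency services, and healthcare.
In response to the launch of DeepSeek R1, OpenAI has introduced two powerful tools: OpenAI's o3, a more advanced reasoning model, and OpenAI's Operator AI agent, which is powered by the new Computer-Using Agent (CUA) model and can autonomously browse websites and perform tasks.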