WrapperThis script finetunes a vision language model using Unsloth’s fast training framework.
It supports vision tasks by converting raw image-caption samples into a conversation format,
adding vision-specific LoRA adapters, and training using TRL’s SFTTrainer with UnslothVisionDataCollator.