
Week 08 [flux-dev-lora-trainer]

For this week's assignment, I decided to experiment with fine-tuning an image model as a way to deepen my understanding of the process. I chose to work with the ostris/flux-dev-lora-trainer and implemented it in Node.js. The code provided by Replicate was user-friendly and straightforward to set up.

I uploaded a collection of images from a project I worked on earlier this semester, hoping the model would learn to reproduce a similar style. Initially, I used the trigger word "TOK," which was suggested in the example code, just to verify that the model was functioning correctly. However, the results felt somewhat random. The generated images only closely matched my reference style when I also uploaded a specific reference image during generation, which, while effective, felt a bit like "cheating" rather than relying on the fine-tuned style model.
I also wasn't happy with the results, but I suspect they could be improved with the right fine-tuning settings.

To help the model better associate the trigger word with my dataset, I later changed it to "GEMMA" so that my prompts would consistently reference my images.

However, it didn't seem to help, and I'm not sure why.

Link to the fine-tuned model

Link to the video (I still need to deploy it)

Examples of the training data images:

Examples of the results (which were generated with a reference image):

import Replicate from "replicate";
import dotenv from "dotenv";
import fs from "fs";

dotenv.config();

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});


// Kick off a LoRA fine-tune on Replicate with ostris/flux-dev-lora-trainer.
async function trainModel() {
  try {
    const training = await replicate.trainings.create(
      "ostris",
      "flux-dev-lora-trainer",
      "e440909d3512c31646ee2e0c7d6f6f4923224863a6a10c494606e79fb5844497", // trainer version
      {
        destination: "michalshoshannyu/4pptest", // where the fine-tuned model is saved
        input: {
          steps: 1000,
          lora_rank: 16,
          optimizer: "adamw8bit",
          batch_size: 1,
          resolution: "512,768,1024",
          autocaption: true, // auto-generate captions for the training images
          input_images: "https://raw.githubusercontent.com/michalshoshannyu/dataset/main/data.zip",
          trigger_word: "GEMMA", // word that activates the learned style in prompts
          learning_rate: 0.0004,
          wandb_project: "flux_train_replicate",
          wandb_save_interval: 100,
          caption_dropout_rate: 0.05,
          cache_latents_to_disk: false,
          wandb_sample_interval: 100,
        },
      }
    );

    console.log("Training started:", training.status);
    console.log("Training ID:", training.id);
  } catch (error) {
    console.error("Error starting training:", error);
  }
}

trainModel();