FLUX + LoRA: The Easiest Way to Create a LoRA for Flux
Summary
- I discovered a new image generation model called Flux, which can produce stunning images but has certain limitations, such as refusing to generate specific art styles and famous personalities.
- To work around these limitations, I can create LoRAs or fine-tuned models by injecting my own training data, which lets me get more personalized results.
- Training a LoRA model costs between $2 and $10 and requires about 20 images, with a training time of roughly 30 minutes (see the sketch after this list).
- It is essential to prepare a set of images and, ideally, captions explaining what is in each image to get good results from training.
- I can use a vision model called LLaVA to generate the captions automatically, but I should consider writing captions manually for maximum control and precision.
- Parameters such as the number of steps and the learning rate influence the training process and the final results, and may require some experimentation.
- After training, I was surprised by the quality of the generated images, getting better results than anything I had previously achieved with other models.
- I plan to explore using the trained models in UIs like ComfyUI, but I realized I had skipped some steps needed to download them properly.
- To avoid future problems, I learned the importance of documenting every step and setting I use during training, which will help me reproduce my results later.
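As a minimal sketch of what kicking off such a training run might look like with Replicate's Python client: the trainer slug, version hash, destination, and ZIP URL below are placeholders I am assuming for illustration, not values taken from the video.

```python
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN in the environment

# Hypothetical example: trainer slug, version hash, destination, and ZIP URL
# are placeholders -- check Replicate's trainer page for the real values.
training = replicate.trainings.create(
    version="ostris/flux-dev-lora-trainer:<version-hash>",
    destination="your-username/eai-person",
    input={
        "input_images": "https://example.com/training-images.zip",
        "trigger_word": "XEAIX",
        "steps": 1000,
    },
)
print(training.status)  # poll until the training reports "succeeded"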
Quotes from Endangered AI
“Creativity is the engine of innovation”
“Limits do not define our potential”
“To innovate is to dare to go beyond boundaries”
“Every idea counts, even the smallest”
“Success belongs to those who persevere”
How to take action?
I would suggest starting by using the Flux model to generate images without spending much money or time. To do that, I can create my own LoRA models by adding my training data. It costs between $2 and $10 and can be done in as little as 30 minutes with about 20 images. First, I will prepare a collection of images with captions explaining what they show. That will help me get good results from the start.
A good approach would be to use a vision model such as LLaVA to generate the captions automatically. However, I could also write captions manually for finer control. By adding specific details, such as the brand of glasses or clothing, I can make my images even more personalized; a small sketch of the caption-file convention follows below.
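Here is a minimal sketch of that convention, pairing each image with a same-named .txt caption file. The folder name, file names, and caption text are all illustrative assumptions, not taken from the video.

```python
from pathlib import Path

# Hypothetical dataset folder; each image gets a caption file
# with the same stem, e.g. 01.jpg -> 01.txt.
dataset = Path("training-images")
captions = {
    "01.jpg": "XEAIX, a man of Indian descent with a light beard, wearing Armani glasses",
    "02.jpg": "XEAIX smiling at the camera, short dark hair, indoor lighting",
}

for image_name, caption in captions.items():
    txt_path = dataset / Path(image_name).with_suffix(".txt").name
    txt_path.write_text(caption, encoding="utf-8")
```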
It would also be wise to experiment with the training parameters, such as the number of steps and the learning rate. Over time, I would note my adjustments and results so I can reproduce my successes. I could also explore using interfaces like ComfyUI to run my LoRA models after training.
Finally, I would learn from each step so as not to repeat mistakes, such as forgetting to configure the model download correctly. That would help me create better-quality images and grow my creative process.
Transcript
If you haven't been living under a rock, you've surely heard of the new image generation model everyone is talking about: Flux. It's so good that people are even calling it a possible contender to Midjourney. With over 12 billion parameters, there's not a lot that Flux can't do. Or can it? If you've been using Flux a lot, you may have started to notice certain limitations in what it can generate, particularly art styles, famous personalities, and other types of images that it refuses to produce. Probably put in place initially as a security measure, these lockdowns can severely limit the creativity that can be achieved with Flux.

To address this, as with Stable Diffusion, you can create LoRAs or fine-tunes by injecting training data with the missing information into the model, allowing you to generate images based on what you want, as long as you've provided the training data. Today we're going to explore an easy way to generate your own LoRA using online tools, download it, and use it in ComfyUI. If you're interested in creating a LoRA on your own hardware using a tool like SimpleTuner or AI Toolkit, please let me know in the comments section below and I'll create a video specifically for that, aimed at advanced users. Today's video, however, is meant to be broad, easy to understand, and to explain the concepts for everybody.

Before we begin, let's have a quick refresher on the difference between a LoRA and a fine-tune. Both are techniques to customize and train models. In both cases you'll need to prepare a set of images with the subject, art style, or concept you're training, ideally with captions that explain what's happening in those images, and run them through training software. The main difference is that a LoRA is a standalone file that can be used with the base model or a variation of it; previously, in Stable Diffusion, if you trained a LoRA you could use it with base Stable Diffusion or any fine-tune. A fine-tune, on the other hand, is an all-in-one model that contains the training data and the base model together. Typically you would use fine-tunes for bigger batches of data, or if you wanted better results, because the training data is injected at an earlier stage of the image generation process, resulting in better understanding and better output. However, a lot of the time a LoRA was good enough, particularly for anime images. Today we're going to explore how to do a LoRA, and based on the results I've seen, you may not even need to explore fine-tuning. I'm Endangered AI, and let's get plugged in.

So here we are on the Replicate page; the link is down in the description below. Probably the first thing you'll want to do is set up a payment method over here under Pricing. Don't worry, it's not very expensive to train your own LoRA: it should cost anywhere between $2 and $10, depending on the parameters and the training time you put in. From what I've seen, a relatively basic training run with about 20 images shouldn't take more than 30 minutes; obviously, the more images and captions you have, the longer it will take.

Okay, so here we are on the Replicate training page, and we're going to attempt two trainings today: one of myself, and one of Final Fantasy character styles. This way we can show what you need to do when training a face or a person, and how the process varies when you're training a style. The first thing you'll want to do, before anything else, is put together a set of images to use as your training data.
Here I have a whole bunch of screenshots of myself, and here are a whole bunch of Final Fantasy 7 images that I've collected from the internet. Once you've done that, head on over to the Replicate training page (link is down below in the description) and let's go through each of the fields and parameters to understand what everything does.

Up here at the top we've got Destination. This is going to be the name of the model; in this case let's call it "endangered-ai-person" and click Create New Model. This creates a location on Replicate where the model will be stored.

Next we've got the input images. This is really important, because there are a couple of ways you can upload the images, and they can significantly affect your training. If we come back here to the training images, we're going to start with the Endangered AI folder: we zip it, and this will be the file we upload. It's important to note that in here we have only images. Further down on the page you'll see an autocaption flag. To successfully train a LoRA model you need to include the training images as well as text explaining what is in each image. Replicate uses a vision model called LLaVA 1.5 13B, and it does its best to describe what is happening in each image. However, if you want to take a fine-tooth comb to it and caption the images yourself, because there are certain things you want to reference, or you want to explicitly point out particular elements, styles, or features in the images, you can do that by creating a text file for each image included here.

For example, for image 01 I would create a new text file and simply describe the key elements of the image that are important to me. Let's assume that LLaVA would describe "a brown male with a beard and glasses". If that's not good enough for us, we might want to specify, for example, the brand of the glasses; let's say these are Armani glasses. Looking at this description, you can see some elements I've put in that may not have been picked up automatically: I've specified the brand of the glasses, and I've also specified a light beard. While the LLaVA model may pick up on the beard, it may not describe it the way we want. Say our training images contain different lengths of beard and we want to be able to specify that length at inference time: this is the opportunity to do that. Once you've done that, save the file with the exact same name as the image file, in this case 01.txt. That way, when we zip everything together and upload it to the trainer, it knows which caption belongs to which image. For today, though, we're just going to use the autocaptioning; I wanted to explain what the manual captioning process is like in case you want to try it out.

So if we come back here and grab our ZIP file, let's go ahead and upload it. If we're not including manual captions, you might be wondering how we reference the character or component in the images we're training on. This is where the trigger word comes in: it's effectively the token you assign to the primary element of the images. In this case it's a person, so we'll just use "XEAIX". Typically you want to use something that isn't normally found in the English language, so it doesn't conflict with any other concepts that exist within the model.
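As a rough illustration of this packaging step (the folder name and file layout are assumptions, not taken from the video), here is one way to zip the dataset while keeping each image paired with its optional caption file:

```python
import zipfile
from pathlib import Path

dataset = Path("training-images")  # hypothetical folder of .jpg (+ optional .txt) pairs

with zipfile.ZipFile("training-images.zip", "w") as archive:
    for image in sorted(dataset.glob("*.jpg")):
        archive.write(image, image.name)
        caption = image.with_suffix(".txt")
        if caption.exists():  # captions are optional when using autocaptioning
            archive.write(caption, caption.name)
```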
That way, when we reference XEAIX, the model can find the training images we've uploaded as well as their relationships with everything else in the model.

Then, as we come down, we've got autocaption. Again, these are the captions that will be created by LLaVA. If you're not captioning manually, that doesn't mean you have no control over how the captions appear, because there's also an autocaption prefix and suffix. This is another opportunity to add specific elements that are present across all the images and that you want the model to train on. In this case, just to make sure it references a man, we can say "a man of Indian descent". Now every caption will include "a man of Indian descent", the trigger word, and then whatever the generated caption is. The suffix works exactly the same way as the prefix, except that whatever you put in it goes at the end.

As we come down to Number of Steps: this is the number of steps the model will train on the data. Typically more is better, but not always. Overdoing it can result in a phenomenon called overtraining, where the model, or the LoRA, takes a little too much influence from what you're training, and you can end up with images full of artifacts, or with the model forcibly putting elements from the training images in places where they shouldn't be. For the purposes of this experiment we're just going to leave it at the default of 1,000, as that sits well within the training range. Depending on the result of your training, though, you may want to come back and increase the steps if you feel it hasn't picked up the concepts aggressively enough, or reduce them if you're seeing too much artifacting from the training images.

Learning rate is how much the model learns at each step, and you can tweak this in conjunction with the steps. You might, for example, increase the learning rate but decrease the number of steps, or vice versa; these can produce different results in what the model finally outputs, and some experimentation will be required to see how these parameters affect the final model.

Finally, we have LoRA rank. The description says it needs to be 16, 32, 64, or 128, and that higher ranks take longer to train but capture more complex features. This is especially useful if you have images with a lot of information, components, and intricacy, and your captions capture that. To give an example, let's find an image with a bit of complexity, something like this, where there's a lot going on: it appears to be a cafe, it's nighttime, we've got neon lights, and inside the cafe there's a person, furniture in a kind of garden-furniture style, and some vending machines. This would be an opportunity to increase the LoRA rank, as long as our captions capture all of those elements.

The training took approximately 20 minutes, a little shorter than the 30 minutes I had estimated; both trainings completed in 19 minutes. So let's go have a look at them. To access your models, head over to your dashboard, click on Models, and all the models you've trained should be there. I also did the Final Fantasy 7 style one, so, to save time, we're going to try them both out now. If we jump into the first one, let's prompt up a selfie in Paris.
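To make those knobs concrete, here is a hedged sketch of the training inputs as a dictionary. The key names mirror the form fields discussed above, but the exact field names in Replicate's API are an assumption, and the learning-rate value is purely illustrative.

```python
# Hypothetical training inputs mirroring the form fields discussed above;
# the exact key names in Replicate's API may differ.
training_input = {
    "input_images": "https://example.com/training-images.zip",
    "trigger_word": "XEAIX",
    "autocaption": True,
    "autocaption_prefix": "a man of Indian descent",
    "steps": 1000,          # raise if concepts aren't picked up, lower if artifacting appears
    "learning_rate": 4e-4,  # illustrative value; tune together with steps
    "lora_rank": 32,        # 16/32/64/128; higher ranks capture more complex features
}
```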
We're going to use our token: "selfie of XEAIX in Paris", and let's see what we get. We can see here it's starting the process, loading in the LoRA, and we should have the image momentarily. Well, I definitely have a bit more hair than I typically do, but besides that, the image is actually fantastic. I am thoroughly impressed. I mean, look at this: it's even picked up the refraction you get from glasses. I will say it's a little extreme, it doesn't typically look like that, but these results are better than anything I've ever been able to do with Stable Diffusion. Even the hair: even though it's given me a little more than I usually have, that is the texture of my hair when it's frizzled. Let's try this again and have another go. This is what I was referring to about not even needing to fine-tune: this LoRA is giving me better results than anything I've ever been able to achieve with SDXL fine-tuning. Here we go, another fantastic image. It's not perfect, but it's 90 to 95% of the way there. I could show this to people and they would not be able to recognize that it's AI unless they started to dig in deep. It's got the coloring of my beard perfectly, and all the major facial features. This is genuinely impressive.

Right, let's try out the Final Fantasy one and see if we can get that style transposition. Let's go back to our dashboard, Models, and grab the FF7 style. Let's try "blonde woman in FF style", keeping the prompt simple at first. Okay, it's not quite what I had in mind; it doesn't quite get that PS1 style, but it kind of does. This is absolutely fascinating: it's like an upscaled version of what a Final Fantasy 7 remaster should look like. Let's take this a step further; this is fun. "Selfie in Paris of large black-haired man." Okay, not quite as good as what we had before; there's some funny business happening here. The background is realistic, whereas the main character is very much an anime style, a really cool-looking anime style. Again, it's got a lot of those facets of the Final Fantasy 7 style, similar to what we saw in the previous image, just a bit more anime-styled.

This is where doing some manual captioning can come in handy. Because we let the autocaptioner run, we don't know what or how it captioned the images. If you do it manually, you know exactly what terms you used to describe each image, and then you can use those terms in your prompts to get exactly what you want. So here, we've thrown in "3D"; it is giving us a bit more of a 3D look, but the background is still very realistic, and the image looks like a cel-shaded 3D anime style. I would say the performance here is maybe not as good as the training on myself. Again, we only picked 20 images, there was a lot going on in them, and perhaps with better captioning we could get better results.

So what happens if we now want to download these LoRAs and use them in ComfyUI? Let's go ahead and do that. Unfortunately, don't be me: earlier, when we were going through the training parameters, there were fields for a Hugging Face repo ID and a Hugging Face token, and we skipped over them. If you want to download the model for use with ComfyUI, that is actually the easiest way to do it. So, to do it the way you should, head on over to Hugging Face and log in.
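As a minimal sketch of running the trained model from code instead of the web UI, assuming Replicate's standard run API (the model slug below is a placeholder for your own trained model):

```python
import replicate

# Hypothetical model slug; substitute the name of your trained model on Replicate.
output = replicate.run(
    "your-username/endangered-ai-person",
    input={"prompt": "selfie of XEAIX in Paris"},
)
print(output)  # typically one or more generated image URLs
```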
Head over to the top-right menu, click on New Model, and go ahead and create the model. In this case we're going to call it "eai-person". You can choose to make it public or private; then go ahead and create it. Once it's done, you'll have a location for the model here, though there's nothing in it yet. We're going to grab this URL over here and drop it into the Repo ID field, then head back over to Hugging Face, go to Settings, then Access Tokens, and create a new token. Give it a name, and give it write capabilities, since we're writing a model to it; in this case we'll call the token "replicate". Go ahead and create the token, copy it into the Hugging Face Token field, and then create the training.

Unfortunately, because I didn't do that, the model is not available for me to download and test out in ComfyUI right now. However, if you're interested in seeing the results, what I was planning to do in this part of the video was to combine the two LoRAs to create Final Fantasy style images of myself. If you're interested in checking out those results, I'll be posting the workflow on my Patreon, and the result images on my Discord and my blog. So like, subscribe, and click the bell icon if you want to have a look at those images; I'll create a post here on YouTube when I've done some of those experiments. Finally, if you found this video helpful, please head on over to my Patreon and check it out; your support there is absolutely invaluable in creating these videos.
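If the Hugging Face fields were filled in at training time, pulling the LoRA down for ComfyUI might look like the sketch below. The repo ID and the weights filename are assumptions; check your Hugging Face repo for the actual file name, and adjust the ComfyUI path to your install.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub
import shutil

# Hypothetical repo and filename -- check your Hugging Face repo for the real ones.
lora_path = hf_hub_download(
    repo_id="your-username/eai-person",
    filename="lora.safetensors",
)

# Copy into ComfyUI's LoRA folder (path depends on your installation).
shutil.copy(lora_path, "ComfyUI/models/loras/eai-person.safetensors")
```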