How to Use the Hugging Face Inference API

Video

By Fahd Mirza on 04/26/2024, duration 08:05

Summary

  • I learn that using Hugging Face models locally is simpler than I thought.
  • I don't need to download the models or buy credits; a simple API is enough to make calls.
  • I recommend creating a free Hugging Face account to get an access token, which can reduce rate limiting.
  • To use a model, I need to install the huggingface_hub library with pip, then export my token in my coding environment.
  • I can easily run inference using a simple HTTP POST method, which makes the process fast and efficient.
  • I find it is best to stay within the free usage limits to avoid being blocked, and the largest models can sometimes be overloaded.
  • I note that a Pro account at $9 per month comes with fewer limitations and a better overall experience.
  • I find it interesting that hosting my own fine-tuned model is very affordable, at only $0.032 per hour.
  • I take note of the importance of choosing the right model for my needs, since more specific models can sometimes give better answers.

How to take action?

I would suggest starting by creating a free account on Hugging Face. It only takes a few minutes and lets you get an access token, which can reduce rate limiting. After that, I recommend installing the huggingface_hub library with pip. It is a simple process that gives you easy access to the models.
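
A minimal setup sketch, assuming the token is exported as the HF_TOKEN environment variable (the name the library reads by default):

    # Install the client library first:
    #   pip install huggingface_hub
    import os
    from huggingface_hub import login

    # Log in with the access token created on the Hugging Face website.
    # If HF_TOKEN is already exported, most huggingface_hub calls will
    # also pick it up automatically without an explicit login().
    login(token=os.environ["HF_TOKEN"])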

Once you have installed the library, export the token in your coding environment. This makes inference straightforward. For example, you can make API calls using a simple HTTP POST method to get responses from the models. It is fast, and it can be done at almost no cost.
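
As an illustration, here is a minimal inference sketch with InferenceClient; the repo ID is only an example (Phi-3 mini, the model used in the video), and any text-generation model from the Hub should work:

    from huggingface_hub import InferenceClient

    # Any text-generation repo ID from the Hub can go here.
    repo_id = "microsoft/Phi-3-mini-4k-instruct"

    # The client picks up the HF_TOKEN environment variable if no token is passed.
    client = InferenceClient(model=repo_id, timeout=120)

    # Under the hood this is an HTTP POST to the hosted Inference API;
    # the generated text comes back as a plain string.
    response = client.text_generation("What is happiness?", max_new_tokens=200)
    print(response)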

To avoid being blocked, stay within the free usage limits. Small models work well and often give excellent answers. If you run into rate-limit errors, wait a little before retrying.
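
If you prefer to call the API with a plain HTTP POST, the same advice applies. Here is a hedged sketch with requests, assuming the classic api-inference.huggingface.co endpoint, that backs off when the free tier throttles:

    import os
    import time
    import requests

    API_URL = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    payload = {"inputs": "What is happiness?", "parameters": {"max_new_tokens": 200}}

    # Retry a few times when we are throttled (HTTP 429) or the model
    # is still loading on the free tier (HTTP 503).
    for attempt in range(5):
        r = requests.post(API_URL, headers=headers, json=payload)
        if r.status_code in (429, 503):
            time.sleep(10)
            continue
        r.raise_for_status()
        print(r.json()[0]["generated_text"])
        break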

Also think about testing different models to see which one best fits your needs. If you want a better experience, consider the Pro account at $9 per month, which comes with fewer limitations.
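
A quick way to compare models is to run the same prompt through a shortlist of repo IDs; the two IDs below are only examples (Phi-3 mini and TinyLlama, both mentioned in the video):

    from huggingface_hub import InferenceClient

    prompt = "Write me a Python program to reverse a list."

    # Swap in any text-generation models you want to compare.
    for repo_id in ["microsoft/Phi-3-mini-4k-instruct",
                    "TinyLlama/TinyLlama-1.1B-Chat-v1.0"]:
        client = InferenceClient(model=repo_id, timeout=120)
        print(f"--- {repo_id} ---")
        print(client.text_generation(prompt, max_new_tokens=200))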

Finally, if you have a model you are fine-tuning, know that hosting it is very affordable, at only $0.032 per hour. This can be an excellent option if you want to move your projects forward efficiently.
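
Once a dedicated Inference Endpoint is deployed, InferenceClient also accepts its URL in place of a repo ID; the URL below is a placeholder for whatever your endpoint page shows:

    from huggingface_hub import InferenceClient

    # Placeholder URL; copy the real one from your endpoint's page.
    endpoint_url = "https://my-endpoint.us-east-1.aws.endpoints.huggingface.cloud"

    client = InferenceClient(model=endpoint_url, timeout=120)
    print(client.text_generation("What is happiness?", max_new_tokens=200))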

Quotes from Fahd Mirza

"Il est très simple d'utiliser n'importe quel modèle de Hugging Face localement juste avec l'API"

– Fahd Mirza

 

"Vous n'avez même pas besoin de vous connecter à Hugging Face"

– Fahd Mirza

 

"Si vous ne l'abusez pas, vous pouvez l'utiliser même dans vos petites applications gratuitement"

– Fahd Mirza

 

"C'est tellement facile de faire des inférences avec cela"

– Fahd Mirza

 

"La tarification est très abordable par rapport à tout autre fournisseur d'hébergement"

– Fahd Mirza

Transcript

One of the common misconceptions is that if you want to use Hugging Face models locally, you first have to download them and then do the inference, or you might have to buy some credits from Hugging Face or create a Space and then use it. The thing is, it is very, very simple to use any Hugging Face model locally just with the API. So if you are looking to make API calls to a Hugging Face model, just like we do with OpenAI models, you can do it in a very simple way, and I'm going to show you in this video how to do it. For this you don't even have to log into Hugging Face, and you don't even need an API key or access token, but I would highly suggest that you grab one, because otherwise there is a lot of rate limiting, since a lot of people are using it for free.

The first step: create an account on Hugging Face if you don't have one already. Then, on the top right, just click on the three bars or your profile picture, go to Settings, then on the left-hand side click on Access Tokens and create a new token; it is free. Give it any name, "read" access is fine, and generate the token. Once you have your token, go to your local code editor, wherever you are looking to do the inference with the Hugging Face model; I am using VS Code. First, you need to do two things: install the huggingface_hub library with pip, and then export the HUGGINGFACE_TOKEN or HF_TOKEN environment variable, putting in the token you just created on the Hugging Face website.

Once that is done and these are in your environment, you simply need to run this code. Let me run it; the prompt which I'm passing is simply "what is happiness". Let me run it and then I will explain the code. Let's wait for it to come back. There you go. You see that all I have done is put in my prompt, and it has given me a response, a very fine response from the model. The model which I'm using is Phi-3 mini, and you can put in any model from Hugging Face. There are some big models which sometimes refuse to answer because there is so much load, or it is quite expensive, but all in all I have found that you can easily use most models like this, and if you don't abuse it, you can use it even in your small applications for free.

So first I'm just importing these libraries, then I'm specifying my model's repo ID, which you can of course grab from the Hugging Face website. Let me quickly show you how to do that: just go to Hugging Face, like I'm here, then search for your model. Once you have your model, this is the whole repo name: the model developer, which in this case is microsoft, slash your model name. Simply click on this icon, these two squares, and it will copy it. Then go back to your code editor and paste it here; I already have it. Then instantiate your LLM client with the InferenceClient from huggingface_hub, with a timeout of 120 seconds, which is more than enough, two minutes. And this is a function which I have defined, call_llm: we pass our prompt to the inference client and we get the response back with a simple HTTP POST method, and we are just generating 200 tokens. You can increase it, but because it is the free tier, I would suggest keeping it like this, so there is less chance that you are throttled for rate limiting. And it is going to respond in a string format, because it returns the JSON. So I'm just storing the response back from this function and then printing out the response.

This is how easy it is to do inference with it. For example, let me change the prompt here; I'm just going to ask a coding question: write me a Python program to reverse the list. Let's save it, go down, clear my screen, and run that again. Sometimes you will get some sort of rate limiting; in that case, just wait for a few minutes or seconds and then try again. But this time, you see, because I'm not abusing it, it has given me a perfect answer: this Python program which is reversing the list. This is how you do it, and how easy this is. Similarly, you can change the model: for example, you can replace this Phi with maybe TinyLlama or any model of your choice, and then you can run the inference again. For example, let me change the prompt here; I'm just going to ask it to write me something else and see what it does. I'm going to save it, clear the screen, and run it again. You see, I think it is a chat model, so maybe that is why, but the inference is working.

Let me now show you the pricing at Hugging Face, because that is important to know if you start facing throttling errors and that sort of stuff. Here you can see the Hugging Face Hub; this is the free account which I am using at the moment. You can see that I can upload and download any sort of models, and I can create as many private repos as I want; I already have plenty. We have already seen that we can use that inference, and there is community support in the forum, which is quite good. But the Pro account is also quite good, in my humble opinion; it is just $9 per month, you can do a lot of stuff, and there is less rate limiting; there still is some, but less. And then the Enterprise Hub is quite advanced, quite generous, and there is very little throttling there, but of course if you are really hammering it, it is going to throttle your access; it starts at $20 per user per month.

Then we have Spaces hardware. I have already done a video on ZeroGPU, which you can check out on my channel, where I explain how you can create it; you can even use GPUs there, but if you want to simply use a CPU, you can easily do it. And then Inference Endpoints, which is what we also use: you can have your dedicated endpoint, where, if you have your own custom model which you fine-tuned with AutoTrain or anything else, you upload it and you want to host your own model, a fine-tuned or trained one, you can host it and then simply access it like this. It is very cheap; as you can see, $0.032 per hour. If you compare it with any other hosting provider, you will be surprised how cheap this pricing is. And of course there are a lot of other things that you can check out; I will drop the link in the video's description.

I hope that you enjoyed it. If you like the content, please consider subscribing to the channel, and if you are already subscribed, then please share it with your network, as it helps a lot. Thanks for watching.
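
The transcript above walks through a small script; the following is a hedged reconstruction of it, not the author's exact code. The repo ID and the call_llm helper name come from the spoken description, and post() reflects 2024-era huggingface_hub releases (newer versions prefer task methods such as text_generation):

    import json
    from huggingface_hub import InferenceClient

    # "Phi-3 mini" as described in the video; the exact repo ID is assumed.
    repo_id = "microsoft/Phi-3-mini-4k-instruct"

    llm_client = InferenceClient(model=repo_id, timeout=120)

    def call_llm(inference_client: InferenceClient, prompt: str) -> str:
        # A raw HTTP POST to the hosted model, asking for up to 200 new tokens.
        response = inference_client.post(
            json={"inputs": prompt, "parameters": {"max_new_tokens": 200}},
            task="text-generation",
        )
        # The API answers with JSON bytes like [{"generated_text": "..."}].
        return json.loads(response.decode())[0]["generated_text"]

    print(call_llm(llm_client, "What is happiness?"))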