Overview
This is a personal note on hosting Hugging Face models on AWS Lambda for serverless inference, based on the following article.
https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/
Additionally, I cover providing an API using Lambda function URLs and CloudFront.
Hosting Hugging Face Models on AWS Lambda
Preparation
For this section, I referred to the document introduced at the beginning.
https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/
First, run the following commands. I created a virtual environment called venv, but this is not strictly required.
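As a sketch, the setup looks like this. I am assuming the sample repository is aws-samples/zero-administration-inference-with-aws-lambda, the project linked from the AWS blog post; adjust the paths if your checkout differs.

```shell
# Clone the sample project from the AWS blog post
git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda.git
cd zero-administration-inference-with-aws-lambda

# Optional: isolate dependencies in a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the CDK app's Python dependencies
pip install -r requirements.txt
```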
Note
The blog post says to run cdk deploy at this point. Doing so did let me deploy the stack and run tests from the Lambda console. However, when I later created and called a Lambda function URL (described below), several errors occurred, so the following modifications are needed first.
inference/*.py
When the function is invoked via a function URL, the request parameters arrive in queryStringParameters, so the handler needs to read them from there. POST requests require further changes, since the payload arrives in the request body instead.
Below is an example of the changes to sentiment.py. By changing the pipeline arguments, you can also run inference with your own custom model.
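A minimal sketch of the handler change follows. The actual sentiment.py in the sample repository is structured differently; the helper name extract_text and the lazy pipeline loading here are my own, and only the event-parsing logic reflects the modification described above.

```python
import base64
import json

_nlp = None  # cache the pipeline across warm invocations


def _get_pipeline():
    global _nlp
    if _nlp is None:
        from transformers import pipeline
        # Passing model="..." here would run inference with a custom model instead
        _nlp = pipeline("sentiment-analysis")
    return _nlp


def extract_text(event):
    """Pull the 'text' parameter out of a Lambda event.

    Function URL GET requests put parameters in queryStringParameters;
    POST requests carry a (possibly base64-encoded) JSON body; direct
    Lambda test events may pass parameters at the top level.
    """
    if event.get("queryStringParameters"):
        return event["queryStringParameters"].get("text", "")
    if event.get("body"):
        body = event["body"]
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        return json.loads(body).get("text", "")
    return event.get("text", "")


def handler(event, context):
    result = _get_pipeline()(extract_text(event))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```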
inference/Dockerfile
Next, add a character encoding specification to the Dockerfile.
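One way to do this is shown below. The exact lines are my assumption; the point is to force a UTF-8 locale so that non-ASCII request text is encoded and decoded correctly inside the container.

```dockerfile
# Force a UTF-8 locale so non-ASCII text is handled correctly
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PYTHONIOENCODING=utf-8
```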
Deploy
After that, run the following to deploy.
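The deploy step is the standard CDK flow; run it from the project root, with the virtual environment activated if you created one.

```shell
# One-time setup per account/region, if you have not used CDK there before
cdk bootstrap

# Build the container image and deploy the stack
cdk deploy
```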
Lambda Function URL
The following article is helpful as a reference.
https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/
Create a function URL from the Lambda console. This time, I simply specified “NONE” for the “Auth type” and enabled “Configure cross-origin resource sharing (CORS)”.

Additionally, I selected “*” (all) for “Allow methods”.

With this in place, you can run inference by requesting a URL like the following.
https://XXX.lambda-url.us-east-1.on.aws/?text=i am happy
The following result is obtained.
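The response body is the JSON-serialized output of the sentiment pipeline, which is a list of label/score objects; for the request above it looks something like this (the score value is illustrative):

```json
[{"label": "POSITIVE", "score": 0.9998}]
```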
CloudFront
For this configuration as well, the following article is helpful as a reference.
https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/
The important point is to set Query strings to All in the Origin request settings. Without this, CloudFront does not forward the query string to Lambda, so the same result is returned even when the ?text= parameter changes.
With CloudFront in front, you can run inference via a URL like the following. You can also attach a custom domain.
https://yyy.cloudfront.net/?text=i am happy
Summary
I introduced how to host Hugging Face models on AWS Lambda for serverless inference. There may be some inaccuracies, but I hope this serves as a useful reference.