Overview
This is a personal note on hosting Hugging Face models on AWS Lambda for serverless inference, based on the following article.
https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/
Additionally, I cover providing an API using Lambda function URLs and CloudFront.
Hosting Hugging Face Models on AWS Lambda
Preparation
For this section, I referred to the document introduced at the beginning.
https://aws.amazon.com/jp/blogs/compute/hosting-hugging-face-models-on-aws-lambda/
First, run the following commands. I created a virtual environment called venv, but this is not strictly required.
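As a sketch, the setup looks like this. I am assuming the sample repository is aws-samples/zero-administration-inference-with-aws-lambda, the project linked from the AWS blog post; adjust the paths if your checkout differs.

```shell
# Clone the sample project from the AWS blog post
git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda.git
cd zero-administration-inference-with-aws-lambda

# Optional: isolate dependencies in a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the CDK app's Python dependencies
pip install -r requirements.txt
```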
Note
The blog post says to run cdk deploy at this point. Doing so did let me deploy the stack and run tests from the Lambda console. However, when I later created and called a Lambda function URL (described below), several errors occurred, so the following modifications are needed first.
inference/*.py
When the function is invoked via a function URL, the request parameters arrive in queryStringParameters, so the handler needs to read them from there. POST requests require further changes, since the payload arrives in the request body instead.
Below is an example of the changes to sentiment.py. By changing the pipeline arguments, you can also run inference with your own custom model.
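A minimal sketch of the handler change follows. The actual sentiment.py in the sample repository is structured differently; the helper name extract_text and the lazy pipeline loading here are my own, and only the event-parsing logic reflects the modification described above.

```python
import base64
import json

_nlp = None  # cache the pipeline across warm invocations


def _get_pipeline():
    global _nlp
    if _nlp is None:
        from transformers import pipeline
        # Passing model="..." here would run inference with a custom model instead
        _nlp = pipeline("sentiment-analysis")
    return _nlp


def extract_text(event):
    """Pull the 'text' parameter out of a Lambda event.

    Function URL GET requests put parameters in queryStringParameters;
    POST requests carry a (possibly base64-encoded) JSON body; direct
    Lambda test events may pass parameters at the top level.
    """
    if event.get("queryStringParameters"):
        return event["queryStringParameters"].get("text", "")
    if event.get("body"):
        body = event["body"]
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        return json.loads(body).get("text", "")
    return event.get("text", "")


def handler(event, context):
    result = _get_pipeline()(extract_text(event))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```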
inference/Dockerfile
Next, add a character encoding specification to the Dockerfile.
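One way to do this is shown below. The exact lines are my assumption; the point is to force a UTF-8 locale so that non-ASCII request text is encoded and decoded correctly inside the container.

```dockerfile
# Force a UTF-8 locale so non-ASCII text is handled correctly
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PYTHONIOENCODING=utf-8
```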
Deploy
After that, run the following to deploy.
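The deploy step is the standard CDK flow; run it from the project root, with the virtual environment activated if you created one.

```shell
# One-time setup per account/region, if you have not used CDK there before
cdk bootstrap

# Build the container image and deploy the stack
cdk deploy
```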
Lambda Function URL
The following article is helpful as a reference.
https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/
Create a function URL from the Lambda console. This time, I simply specified “NONE” for the “Auth type” and enabled “Configure cross-origin resource sharing (CORS)”.

Additionally, I selected “*” (all) for “Allow methods”.

With this in place, you can run inference by requesting a URL like the following.
https://XXX.lambda-url.us-east-1.on.aws/?text=i am happy
The following result is obtained.
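The response body is the JSON-serialized output of the sentiment pipeline, which is a list of label/score objects; for the request above it looks something like this (the score value is illustrative):

```json
[{"label": "POSITIVE", "score": 0.9998}]
```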
CloudFront
For this configuration as well, the following article is helpful as a reference.
https://dev.classmethod.jp/articles/integrate-aws-lambda-with-cloudfront/
The important point is to set Query strings to All in the Origin request settings. Without this, CloudFront does not forward the query string to Lambda, so the same result is returned even when the ?text= parameter changes.
With CloudFront in front, you can run inference via a URL like the following. You can also attach a custom domain.
https://yyy.cloudfront.net/?text=i am happy
Summary
I introduced how to host Hugging Face models on AWS Lambda for serverless inference. There may be some inaccuracies, but I hope this serves as a useful reference.