Handling File Upload in Mule API using Multipart Form Data

    Handling and processing large files via an API can be challenging because the bytes have to be encrypted and sent over a network that may have bandwidth and security restrictions. However, in my experience it is still possible to send a moderately large file (less than 5 MB) using a multipart/form-data POST request in Mule. This works because Mule supports streaming, which simply means that the whole payload is NOT read and loaded into memory at once; instead, the data is read in chunks, or buffers, of a specific size. The HTTP Listener is one of the endpoints that support streaming.

    Transfers of moderately large files can be done in Mule. I haven't explored very large files yet, but the most common solution in the past has been secure FTP transfer. Transferring very large files via an API may be possible, but on CloudHub it may require a much larger worker (maybe 1 GB of RAM or more). In an on-prem runtime environment, it may cause an Out-of-Memory error that also affects other running applications, so this should be designed carefully. The best way to mitigate the risk is to impose a size limit on the requests a client application can send, and this can be applied at the API gateway level.

    A Mule API can receive and process a file sent as a multipart/form-data request. To do this, define the POST body in the RAML with the multipart/form-data media type, and declare each uploaded file as a property of type file.
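    As a minimal sketch of such a definition, assuming RAML 1.0: the title and the /upload resource path are hypothetical names chosen for illustration, while csvDocument is the part name used in the rest of this post.

```raml
#%RAML 1.0
title: File Upload API   # hypothetical title

/upload:                 # hypothetical resource path
  post:
    body:
      multipart/form-data:
        properties:
          csvDocument:   # name of the form-data part
            type: file
            required: true
```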


    For example, suppose I expect a CSV file to be submitted to the API endpoint. I would define a form-data part named csvDocument of type file. Note that this definition does not restrict the file to a specific media type. At this point, the file can be of any type: a PDF file, a CSV file, a JPG file, or anything else. You therefore need to check the media type of the file before processing it further, and apply proper error handling if it is not the expected type. Mule parses each part into headers and content. The headers are where you can check the media type of the received file. Using DataWeave, you can access both the headers and the content from the payload.

The following DataWeave expression returns the headers of the csvDocument part:

    payload.parts.csvDocument.headers

{
  "Content-Disposition": {
    name: "csvDocument",
    filename: "personal_data.csv",
    subtype: "form-data"
  },
  "Content-Type": "application/octet-stream"
}

    I found this a bit tricky because, for a CSV file, the Content-Type came back as application/octet-stream, which is a generic binary type. For a PDF file, however, the Content-Type came back as application/pdf, which is fine and can be used for error handling.
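    A header check along these lines could be sketched in DataWeave 2.0 as follows. The rule of accepting application/octet-stream only when the file name ends in .csv is my own assumption for illustration, as are the output field names; adjust the list of accepted types to whatever your clients actually send.

```dataweave
%dw 2.0
output application/json

// Headers of the csvDocument part, as parsed by Mule
var headers = payload.parts.csvDocument.headers
var contentType = headers."Content-Type" default ""
var filename = headers."Content-Disposition".filename default ""

// Assumed list of acceptable types; octet-stream is included because
// that is what was observed for CSV uploads
var acceptedTypes = ["text/csv", "application/csv", "application/octet-stream"]
---
{
  filename: filename,
  contentType: contentType,
  // Assumed rule: accepted type AND a .csv file extension
  looksLikeCsv: (acceptedTypes contains contentType)
      and (lower(filename) endsWith ".csv")
}
```

    The result can then drive a Choice router or a raise-error step when looksLikeCsv is false.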

    One way to handle the CSV case is to use the following DataWeave expression to read the content as application/csv. If it does not raise an error, the content could at least be parsed as CSV. You need this expression in any case, because the content is originally in binary format and cannot be transformed directly into another format such as JSON or XML.

    read(payload.parts.csvDocument.content, 'application/csv')
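    To turn a parsing failure into a value you can branch on, rather than an exception, the read can be wrapped with the try function from the dw::Runtime module. This is only a sketch: it assumes a DataWeave version that ships dw::Runtime (2.2 or later, as far as I know), and the valid, rows and reason field names are hypothetical.

```dataweave
%dw 2.0
import try from dw::Runtime
output application/json

// try returns an object with a success flag and either result or error
var attempt = try(() -> read(payload.parts.csvDocument.content, "application/csv"))
---
if (attempt.success)
  { valid: true, rows: attempt.result }
else
  { valid: false, reason: attempt.error.message }
```

    Keep in mind that a successful read only shows the bytes could be parsed as CSV. Checking for the column names you expect is a stricter test.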

    To test this, you can go to the APIkit Console and choose a file to upload.


    I hope this post is helpful. Please let me know if you have any comments or questions. :)
