Network Working Group E. Nebel Request For Comments: 1867 L. Masinter Category: Experimental Xerox Corporation November 1995 Form-based File Upload in HTML Status of this Memo This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited. 1. Abstract Currently, HTML forms allow the producer of the form to request information from the user reading the form. These forms have proven useful in a wide variety of applications in which input from the user is necessary. However, this capability is limited because HTML forms don't provide a way to ask the user to submit files of data. Service providers who need to get files from the user have had to implement custom user applications. (Examples of these custom browsers have appeared on the www-talk mailing list.) Since file-upload is a feature that will benefit many applications, this proposes an extension to HTML to allow information providers to express file upload requests uniformly, and a MIME compatible representation for file upload responses. This also includes a description of a backward compatibility strategy that allows new servers to interact with the current HTML user agents. The proposal is independent of which version of HTML it becomes a part. 2. HTML forms with file submission The current HTML specification defines eight possible values for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT. In addition, it defines the default ENCTYPE attribute of the FORM element using the POST METHOD to have the default value "application/x-www-form-urlencoded". This proposal makes two changes to HTML: 1) Add a FILE option for the TYPE attribute of INPUT. 2) Allow an ACCEPT attribute for INPUT tag, which is a list of media types or type patterns allowed for the input. In addition, it defines a new MIME media type, multipart/form-data, and specifies the behavior of HTML user agents when interpreting a form with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> tags. These changes might be considered independently, but are all necessary for reasonable file upload. The author of an HTML form who wants to request one or more files from a user would write (for example): <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST> File to process: <INPUT NAME="userfile1" TYPE="file"> <INPUT TYPE="submit" VALUE="Send File"> </FORM> The change to the HTML DTD is to add one item to the entity "InputType". In addition, it is proposed that the INPUT tag have an ACCEPT attribute, which is a list of comma-separated media types. ... (other elements) ... <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX | RADIO | SUBMIT | RESET | IMAGE | HIDDEN | FILE )"> <!ELEMENT INPUT - 0 EMPTY> <!ATTLIST INPUT TYPE %InputType TEXT NAME CDATA #IMPLIED -- required for all but submit and reset VALUE CDATA #IMPLIED SRC %URI #IMPLIED -- for image inputs -- CHECKED (CHECKED) #IMPLIED SIZE CDATA #IMPLIED --like NUMBERS, but delimited with comma, not space MAXLENGTH NUMBER #IMPLIED ALIGN (top|middle|bottom) #IMPLIED ACCEPT CDATA #IMPLIED --list of content types > ... (other elements) ... 3. Suggested implementation While user agents that interpret HTML have wide leeway to choose the most appropriate mechanism for their context, this section suggests how one class of user agent, WWW browsers, might implement file upload. 3.1 Display of FILE widget When a INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a "Browse" button or selection method. Selecting the "Browse" button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose let the list of file names be manually edited. If an ACCEPT attribute is present, the browser might constrain the file patterns prompted for to match those with the corresponding appropriate file extensions for the platform. 3.2 Action on submit When the user completes the form, and selects the SUBMIT element, the browser should send the form data and the content of the selected files. The encoding type application/x-www-form-urlencoded is inefficient for sending large quantities of binary data or text containing non-ASCII characters. Thus, a new media type, multipart/form-data, is proposed as a way of efficiently sending the values associated with a filled-out form from client to server. 3.3 use of multipart/form-data The definition of multipart/form-data is included in section 7. A boundary is selected that does not occur in any of the data. (This selection is sometimes done probabilisticly.) Each field of the form is sent, in the order in which it occurs in the form, as a part of the multipart stream. Each part identifies the INPUT name within the original HTML form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as application/octet-stream. If multiple files are selected, they should be transferred together using the multipart/mixed format. While the HTTP protocol can transport arbitrary BINARY data, the default for mail transport (e.g., if the ACTION is a "mailto:" URL) is the 7BIT encoding. The value supplied for a part may need to be encoded and the "content-transfer-encoding" header supplied if the value does not conform to the default encoding. [See section 5 of RFC 1521 for more details.] The original local file name may be supplied as well, either as a 'filename' parameter either of the 'content-disposition: form-data' header or in the case of multiple files in a 'content-disposition: file' header of the subpart. The client application should make best effort to supply the file name; if the file name of the client's operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description. On the server end, the ACTION might point to a HTTP URL that implements the forms action via CGI. In such a case, the CGI program would note that the content-type is multipart/form-data, parse the various fields (checking for validity, writing the file data to local files for subsequent processing, etc.). 3.4 Interpretation of other attributes The VALUE attribute might be used with <INPUT TYPE=file> tags for a default file name. This use is probably platform dependent. It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.