
      How To Work with Files Using Streams in Node.js


      The author selected Girls Who Code to receive a donation as part of the Write for Donations program.

      Introduction

      The concept of streams in computing usually describes the delivery of data in a steady, continuous flow. You can use streams for reading from or writing to a source continuously, thus eliminating the need to fit all the data in memory at once.

      Using streams provides two major advantages. First, streams use memory efficiently, because you do not have to load all the data into memory before you begin processing it. Second, streams are time-efficient: you can start processing data almost immediately instead of waiting for the entire payload. These advantages make streams well suited to large data transfers in I/O operations. Since files, which are collections of bytes, are a common data source in Node.js, streams provide an efficient way to work with them.

      Node.js provides a streaming API in the stream module, a core Node.js module, for working with streams. All Node.js streams are instances of the EventEmitter class (for more on this, see Using Event Emitters in Node.js), so they emit events that you can listen for at different points during data transmission. The stream module's interface lets you listen for those events to read and write data, manage the transmission life cycle, and handle transmission errors.

      There are four different kinds of streams in Node.js. They are:

      • Readable streams: streams you can read data from.
      • Writable streams: streams you can write data to.
      • Duplex streams: streams you can read from and write to (usually simultaneously).
      • Transform streams: duplex streams whose output (the readable side) is computed from their input (the writable side).
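      To make these kinds concrete, here is a minimal sketch that is not part of the command-line program you will build. It uses the stream module's Readable.from(), Transform, and Writable classes; the uppercasing transform and the in-memory destination are illustrative choices, and Duplex appears only in an instanceof check because Transform inherits from it.

```javascript
const { EventEmitter } = require('events');
const { Readable, Writable, Duplex, Transform } = require('stream');

// Readable: a source you read from. Readable.from() wraps an iterable.
const source = Readable.from(['hello ', 'streams']);

// Transform: a Duplex whose output is computed from its input.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    // Hand the transformed chunk to the readable (output) side.
    callback(null, chunk.toString().toUpperCase());
  },
});

// Writable: a destination you write to. This one collects chunks in memory.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  },
});

// Every stream is an EventEmitter, and a Transform is a kind of Duplex.
console.log(source instanceof EventEmitter); // true
console.log(upperCase instanceof Duplex);    // true

sink.on('finish', () => {
  console.log(received.join('')); // HELLO STREAMS
});

source.pipe(upperCase).pipe(sink);
```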

      The file system module (fs) is a native Node.js module for manipulating files and navigating the local file system in general. It provides several methods for doing this, two of which implement the streaming API. You can use these two methods to create readable and writable file streams.

      In this article, you will read from and write to a file using the fs.createReadStream and fs.createWriteStream functions. You will also use the output of one stream as the input of another and implement a custom transform stream. By performing these actions, you will learn to use streams to work with files in Node.js. To demonstrate these concepts, you will write a command-line program with commands that replicate the cat functionality found in Linux-based systems, write input from a terminal to a file, copy files, and transform the content of a file.

      Prerequisites

      To complete this tutorial, you will need:

      Step 1 — Setting up a File Handling Command-Line Program

      In this step, you will write a command-line program with basic commands. This command-line program will demonstrate the concepts you’ll learn later in the tutorial, where you’ll use these commands with the functions you’ll create to work with files.

      To begin, create a folder to contain all your files for this program. In your terminal, create a folder named node-file-streams:

      • mkdir node-file-streams

      Using the cd command, change your working directory to the new folder:

      • cd node-file-streams

      Next, create and open a file called mycliprogram in your favorite text editor. This tutorial uses GNU nano, a terminal text editor. To use nano to create and open your file, type the following command:

      • nano mycliprogram

      In your text editor, add the following code to specify the shebang, store the array of command-line arguments from the Node.js process, and store the list of commands the application should have.

      node-file-streams/mycliprogram

      #!/usr/bin/env node
      
      const args = process.argv;
      const commands = ['read', 'write', 'copy', 'reverse'];
      

      The first line contains a shebang, which is a path to the program interpreter. Adding this line tells the program loader to parse this program using Node.js.

      When you run a Node.js script from the command line, Node.js passes the command-line arguments to the process. You can access them through the argv property of the Node.js process object. process.argv is an array whose first element is the path to the Node.js executable, whose second element is the path to the script, and whose remaining elements are any additional arguments you typed. In the second line, you assign that array to a variable called args.
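      As a quick illustration, the hypothetical describeArgs() helper below labels each position in an argv-style array, assuming you invoke the program as ./mycliprogram <command> <path_to_file>. It is only a sketch of the layout; mycliprogram itself reads args[2] and args[3] directly.

```javascript
// Hypothetical helper that names each position of an argv-style array.
function describeArgs(argv) {
  const [nodePath, scriptPath, command, filePath] = argv;
  return { nodePath, scriptPath, command, filePath };
}

// For `./mycliprogram read test.txt`, process.argv holds the absolute path
// to the node executable, the absolute path to the script, then 'read' and
// 'test.txt'. The two paths depend on your system.
console.log(describeArgs(process.argv));
```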

      Next, create a getHelpText function to display a manual of how to use the program. Add the code below to your mycliprogram file:

      node-file-streams/mycliprogram

      ...
      const getHelpText = function() {
          const helpText = `
          simplecli is a simple cli program to demonstrate how to handle files using streams.
          usage:
              mycliprogram <command> <path_to_file>
      
              <command> can be:
              read: Print a file's contents to the terminal
              write: Write a message from the terminal to a file
              copy: Create a copy of a file in the current directory
              reverse: Reverse the content of a file and save its output to another file.
      
              <path_to_file> is the path to the file you want to work with.
          `;
          console.log(helpText);
      }
      

      The getHelpText function prints out the multi-line string you created as the help text for the program. The help text shows the command-line arguments or parameters that the program expects.

      Next, you’ll add the control logic to check the length of args and provide the appropriate response:

      node-file-streams/mycliprogram

      ...
      let command = '';
      
      if(args.length < 3) {
          getHelpText();
          return;
      }
      else if(args.length > 4) {
          console.log('More arguments provided than expected');
          getHelpText();
          return;
      }
      else {
          command = args[2]
          if(!args[3]) {
              console.log('This tool requires at least one path to a file');
              getHelpText();
              return;
          }
      }
      

      In the code snippet above, you created an empty string, command, to store the command received from the terminal. The first if block checks whether the args array has fewer than 3 elements. If it does, no arguments were passed beyond the paths to the Node.js executable and the script, so the program prints the help text to the terminal and terminates.

      The else if block checks to see if the length of the args array is greater than 4. If it is, then the program has received more arguments than it needs. The program will print a message to this effect along with the help text and terminate.

      Finally, in the else block, you store the third element of the args array (index 2) in the command variable. The code also checks whether a fourth element (index 3) exists. If it does not, the program prints a message indicating that you need a file path to continue, prints the help text, and terminates.
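      If you want to test these checks without running the program, one option is to move them into a pure function. The parseCommandLine() function below is a hypothetical sketch that mirrors the if/else logic above but returns an error description instead of printing the help text and returning. The 'missing command' label is an assumption, since the original simply prints the help text in that case.

```javascript
// Hypothetical pure version of the argument checks in mycliprogram.
function parseCommandLine(argv) {
  // Only the node path and script path were given
  if (argv.length < 3) {
    return { error: 'missing command' };
  }
  // More than <command> <path_to_file> was given
  if (argv.length > 4) {
    return { error: 'More arguments provided than expected' };
  }
  const command = argv[2];
  // A command was given, but no file path
  if (!argv[3]) {
    return { error: 'This tool requires at least one path to a file' };
  }
  return { command, filePath: argv[3] };
}

console.log(parseCommandLine(['node', 'mycliprogram']));
// { error: 'missing command' }
console.log(parseCommandLine(['node', 'mycliprogram', 'read', 'test.txt']));
// { command: 'read', filePath: 'test.txt' }
```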

      Save the file. Then run the application:

      • ./mycliprogram

      You might get a permission denied error similar to the output below:

      Output

      -bash: ./mycliprogram: Permission denied

      To fix this error, give the file execution permissions with the following command:

      • chmod +x mycliprogram

      Run the program again. The output will look similar to this:

      Output

      simplecli is a simple cli program to demonstrate how to handle files using streams.
      usage:
          mycliprogram <command> <path_to_file>
      
          <command> can be:
          read: Print a file's contents to the terminal
          write: Write a message from the terminal to a file
          copy: Create a copy of a file in the current directory
          reverse: Reverse the content of a file and save its output to another file.
      
          <path_to_file> is the path to the file you want to work with.

      Finally, you are going to partially implement the commands in the commands array you created earlier. Open the mycliprogram file and add the code below:

      node-file-streams/mycliprogram

      ...
      switch(commands.indexOf(command)) {
          case 0:
              console.log('command is read');
              break;
          case 1:
              console.log('command is write');
              break;
          case 2:
              console.log('command is copy');
              break;
          case 3:
              console.log('command is reverse');
              break;
          default:
              console.log('You entered a wrong command. See help text below for supported functions');
              getHelpText();
              return;
      }
      

      Any time you enter a command found in the switch statement, the program runs the appropriate case block for the command. For this partial implementation, you print the name of the command to the terminal. If the string is not in the list of commands you created above, the program will print out a message to that effect with the help text. Then the program will terminate.
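      The switch statement matches on the number that commands.indexOf(command) returns: 0 for read, 1 for write, and so on, or -1 for an unrecognized string, which falls through to default. The sketch below shows that behavior and, as an alternative design you could consider, a hypothetical handlers object that maps each command name straight to a function. The handlers here only return the same placeholder messages as the partial implementation.

```javascript
const commands = ['read', 'write', 'copy', 'reverse'];

// indexOf() returns the position the switch statement matches on, or -1
// for an unknown command, which falls through to the default case.
console.log(commands.indexOf('copy'));   // 2
console.log(commands.indexOf('delete')); // -1

// Alternative: map each command name directly to a handler.
// These placeholder handlers mirror the partial implementation.
const handlers = {
  read: () => 'command is read',
  write: () => 'command is write',
  copy: () => 'command is copy',
  reverse: () => 'command is reverse',
};

function dispatch(command) {
  // hasOwnProperty avoids matching inherited keys such as "toString"
  if (!Object.prototype.hasOwnProperty.call(handlers, command)) {
    return 'You entered a wrong command. See help text below for supported functions';
  }
  return handlers[command]();
}

console.log(dispatch('read')); // command is read
```

      A lookup object keeps each command's name next to its implementation, so adding a command does not require keeping a list and a switch in sync; for four commands, the tutorial's switch works just as well.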

      Save the file, then re-run the program with the read command and any file name:

      • ./mycliprogram read test.txt

      The output will look similar to this:

      Output

      command is read

      You have now successfully created a command-line program. In the following section, you will replicate the cat functionality as the read command in the application using createReadStream().

      Step 2 — Reading a File with createReadStream()

      The read command in the command-line application will read a file from the file system and print it out to the terminal similar to the cat command in a Linux-based terminal. In this section, you will implement that functionality using createReadStream() from the fs module.

      The createReadStream function creates a readable stream. Because readable streams inherit from the EventEmitter class, this stream emits events that you can listen for. One of these is the data event: every time the stream reads a piece of data, or chunk, it emits a data event carrying that chunk. If you attach a callback function to the data event, the stream calls it with each chunk, and you can process the chunk inside that function. In this case, you want to display each chunk in the terminal.

      To begin, add a text file to your working directory for easy access. In this section and some subsequent ones, you will be using a file called lorem-ipsum.txt. It is a text file containing ~1200 lines of lorem ipsum text generated using the Lorem Ipsum Generator, and it is hosted on GitHub. In your terminal, enter the following command to download the file to your working directory:

      • wget https://raw.githubusercontent.com/do-community/node-file-streams/999e66a11cd04bc59843a9c129da759c1c515faf/lorem-ipsum.txt

      To replicate the cat functionality in your command-line application, you’ll need to import the fs module because it contains the createReadStream function you need. To do this, open the mycliprogram file and add this line immediately after the shebang:

      node-file-streams/mycliprogram

      #!/usr/bin/env node
      
      const fs = require('fs');
      

      Next, you will create a function below the switch statement called read() with a single parameter: the file path for the file you want to read. This function will create a readable stream from that file and listen for the data event on that stream.

      node-file-streams/mycliprogram

      ...
      function read(filePath) {
          const readableStream = fs.createReadStream(filePath);
      
          readableStream.on('error', function (error) {
              console.log(`error: ${error.message}`);
          })
      
          readableStream.on('data', (chunk) => {
              console.log(chunk);
          })
      }
      

      The code also checks for errors by listening for the error event. When an error occurs, an error message will print to the terminal.

      Finally, replace the console.log() call in the first case block (case 0) with a call to the read() function, as shown in the code block below:

      node-file-streams/mycliprogram

      ...
      switch (commands.indexOf(command)) {
          case 0:
              read(args[3]);
              break;
          ...
      }
      

      Save the file to persist the new changes and run the program:

      • ./mycliprogram read lorem-ipsum.txt

      The output will look similar to this:

      Output

      <Buffer 0a 0a 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63 6f 6e 73 65 63 74 65 74 75 72 20 61 64 69 70 69 73 63 69 ... > ... <Buffer 76 69 74 61 65 20 61 6e 74 65 20 66 61 63 69 6c 69 73 69 73 20 6d 61 78 69 6d 75 73 20 75 74 20 69 64 20 73 61 70 69 65 6e 2e 20 50 65 6c 6c 65 6e 74 ... >

      Based on the output above, you can see that the data was read in chunks or pieces, and these pieces of data are of the Buffer type. For the sake of brevity, the terminal output above shows only two chunks, and the ellipsis indicates that there are several buffers in between the chunks shown here. The larger the file, the greater the number of buffers or chunks.
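      You can observe this chunking directly by lowering the chunk size. The sketch below, which assumes you can write a scratch file to the operating system's temporary directory, creates a 100-byte file and reads it with the highWaterMark option set to 32 bytes (fs.createReadStream() uses 64 KiB by default). The file name chunk-demo.txt is arbitrary.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a 100-byte sample file to the system temp directory.
const samplePath = path.join(os.tmpdir(), 'chunk-demo.txt');
fs.writeFileSync(samplePath, 'x'.repeat(100));

// highWaterMark caps the size of each chunk, in bytes.
const readable = fs.createReadStream(samplePath, { highWaterMark: 32 });
const chunkSizes = [];

readable.on('data', (chunk) => {
  // No encoding is set, so each chunk is a Buffer
  chunkSizes.push(chunk.length);
});

readable.on('end', () => {
  console.log(chunkSizes); // [ 32, 32, 32, 4 ]
  fs.unlinkSync(samplePath);
});
```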

      To return the data in a human-readable format, set the encoding of the stream by passing the name of the encoding as the second argument to createReadStream(). Add the following highlighted code to set the encoding to utf8:

      node-file-streams/mycliprogram

      
      ...
      const readableStream = fs.createReadStream(filePath, 'utf8')
      ...
      

      Re-running the program will display the contents of the file in the terminal. The program prints the lorem ipsum text from the lorem-ipsum.txt file line by line as it appears in the file.

      Output

      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean est tortor, eleifend et enim vitae, mattis condimentum elit. In dictum ex turpis, ac rutrum libero tempus sed... ... ...Quisque nisi diam, viverra vel aliquam nec, aliquet ut nisi. Nullam convallis dictum nisi quis hendrerit. Maecenas venenatis lorem id faucibus venenatis. Suspendisse sodales, tortor ut condimentum fringilla, turpis erat venenatis justo, lobortis egestas massa massa sed magna. Phasellus in enim vel ante viverra ultricies.

      The output above shows a small fraction of the content of the file printed to the terminal. When you compare the terminal output with the lorem-ipsum.txt file, you will see that the content is the same and takes the same formatting as the file, just like with the cat command.
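      One caveat: console.log() appends a newline after every chunk it prints. Chunk boundaries fall every 64 KiB by default, not at line endings, so on a large file the output can contain extra line breaks that cat would not produce. A minimal sketch of a variant that avoids this writes each chunk unchanged with process.stdout.write(); the function name readTo and its optional out parameter (added so you can direct output elsewhere) are hypothetical.

```javascript
const fs = require('fs');

// Hypothetical variant of read(): copies each chunk to `out` unchanged.
// console.log(chunk) would add a newline after every chunk; write() does not.
function readTo(filePath, out = process.stdout) {
  const readableStream = fs.createReadStream(filePath, 'utf8');

  readableStream.on('error', (error) => {
    console.log(`error: ${error.message}`);
  });

  readableStream.on('data', (chunk) => {
    out.write(chunk);
  });

  // Returned so callers can listen for 'end' if they need to
  return readableStream;
}

// Usage: readTo('lorem-ipsum.txt');
```

      In Step 4 you will meet pipe(), which does the same job and also handles backpressure, so readableStream.pipe(process.stdout) is an even shorter form.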

      In this section, you implemented the cat functionality in your command-line program to read the content of a file and print it to the terminal using the createReadStream function. In the next step, you will create a file based on the input from the terminal using createWriteStream().

      Step 3 — Writing to a File with createWriteStream()

      In this section, you will write input from the terminal to a file using createWriteStream(). The createWriteStream function returns a writable file stream that you can write data to. Like the readable stream in the previous step, this writable stream emits a set of events, such as error, finish, and pipe. It also provides a write() method for writing data to the stream in chunks. For a file stream, the chunk you pass to write() must be a string, a Buffer, or a Uint8Array (only streams in object mode accept other JavaScript values), and you can specify an encoding when the chunk is a string.
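      The sketch below shows these pieces in isolation, outside mycliprogram: it writes a string, a Buffer, and a Uint8Array to a file stream, passes a final chunk to end(), and reads the file back once the finish event fires. The output path in the system temporary directory is an arbitrary choice for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const outPath = path.join(os.tmpdir(), 'write-demo.txt');
const writableStream = fs.createWriteStream(outPath);

writableStream.on('error', (error) => {
  console.log(`error: ${error.message}`);
});

// 'finish' fires after end() is called and all written data is flushed.
writableStream.on('finish', () => {
  console.log(fs.readFileSync(outPath, 'utf8'));
});

writableStream.write('first line\n');               // a string
writableStream.write(Buffer.from('second line\n')); // a Buffer
// a Uint8Array holding the bytes of "third\n"
writableStream.write(new Uint8Array([0x74, 0x68, 0x69, 0x72, 0x64, 0x0a]));
writableStream.end('done\n'); // end() accepts one last chunk
```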

      To write input from a terminal to a file, you will create a function called write in your command-line program. In this function, you will create a prompt that receives input from the terminal (until the user terminates it) and writes the data to a file.

      First, you will need to import the readline module at the top of the mycliprogram file. The readline module is a native Node.js module that you can use to receive data from a readable stream, such as standard input (stdin) from your terminal, one line at a time. Open your mycliprogram file and add the highlighted line:

      node-file-streams/mycliprogram

      #!/usr/bin/env node
      
      const fs = require('fs');
      const readline = require('readline');
      

      Then, add the following code below the read() function.

      node-file-streams/mycliprogram

      ...
      function write(filePath) {
          const writableStream = fs.createWriteStream(filePath);
      
          writableStream.on('error',  (error) => {
              console.log(`An error occurred while writing to the file. Error: ${error.message}`);
          });
      }
      

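      Here, you create a writable stream for the file at filePath. This file path will be the command-line argument that follows the write command. If the file does not exist, createWriteStream() creates it, and if it does exist, its contents are replaced. You also listen for the error event in case anything goes wrong (for example, if the directory in filePath does not exist or you do not have permission to write there).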

      Next, you will use the readline module you imported earlier to prompt for messages from the terminal and write them to the specified filePath. Update the write function as shown below to create a readline interface, display a prompt, and listen for the line event:

      node-file-streams/mycliprogram

      ...
      function write(filePath) {
          const writableStream = fs.createWriteStream(filePath);
      
          writableStream.on('error',  (error) => {
              console.log(`An error occurred while writing to the file. Error: ${error.message}`);
          });
      
          const rl = readline.createInterface({
              input: process.stdin,
              output: process.stdout,
              prompt: 'Enter a sentence: '
          });
      
          rl.prompt();
      
          rl.on('line', (line) => {
              switch (line.trim()) {
                  case 'exit':
                      rl.close();
                      break;
                  default:
                      // readline strips the line ending, so add a newline back
                      const sentence = line + '\n';
                      writableStream.write(sentence);
                      rl.prompt();
                      break;
              }
          }).on('close', () => {
              writableStream.end();
              writableStream.on('finish', () => {
                  console.log(`All your sentences have been written to ${filePath}`);
              })
              setTimeout(() => {
                  process.exit(0);
              }, 100);
          });
      }
      

      You created a readline interface (rl) that allows the program to read the standard input (stdin) from your terminal on a line-by-line basis and write a specified prompt string to standard output (stdout). You also called the prompt() function to write the configured prompt message to a new line and to allow the user to provide additional input.

      Then you chained two event listeners together on the rl interface. The first one listens for the line event, which is emitted each time the input stream receives an end-of-line input. This input could be a line feed character (\n), a carriage return character (\r), or both characters together (\r\n), and it usually occurs when you press the ENTER or return key on your computer. Therefore, any time you press either of these keys while typing in the terminal, the line event is emitted. The callback function receives the line of input as a string, without the end-of-line characters.

      You trimmed the line and checked to see if it is the word exit. If not, the program will add a new line character to line and write the sentence to the filePath using the .write() function. Then you called the prompt function to prompt the user to enter another line of text. If the line is exit, the program calls the close function on the rl interface. The close function closes the rl instance and releases the standard input (stdin) and output (stdout) streams.

      This brings us to the second event you listened for on the rl instance: the close event, which is emitted when you call rl.close(). After writing data to a stream, you have to call the end function on the stream to signal that no more data will be written to it. Doing this ensures that any buffered data is completely flushed to your output file. Therefore, when you type the word exit, you close the rl instance and end your writable stream by calling the end function.

      To provide feedback to the user that the program has successfully written all the text from the terminal to the specified filePath, you listened for the finish event on writableStream. In the callback function, you logged a message to the terminal to inform the user when writing is complete. Finally, you exited the process after 100ms to provide enough time for the finish event to provide feedback.
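      The 100ms delay works for short inputs, but it is a guess: nothing guarantees that the file has been flushed by then. A hedged alternative, assuming Node.js 15 or later for the stream/promises module, is to wait for the write to complete instead of using a timer. The hypothetical writeLines() function below iterates over the readline interface with for await, stops at exit, and hands the lines to pipeline(), which resolves only after the file stream has finished. To stay short it omits the Enter a sentence prompt, and it takes the input stream as a parameter so it can be exercised without a terminal.

```javascript
const fs = require('fs');
const readline = require('readline');
const { pipeline } = require('stream/promises'); // assumes Node.js 15 or later

// Hypothetical variant of write(): reads lines from any readable `input`
// (process.stdin in the tutorial) and resolves once every line has been
// flushed to filePath, instead of exiting after a fixed delay.
async function writeLines(input, filePath) {
  const rl = readline.createInterface({ input });

  // Yield each line with its newline restored, stopping at "exit".
  // Leaving a for await loop over rl early closes the interface.
  async function* linesUntilExit() {
    for await (const line of rl) {
      if (line.trim() === 'exit') return;
      yield line + '\n';
    }
  }

  // pipeline() writes every yielded string, ends the file stream,
  // and rejects if either side fails.
  await pipeline(linesUntilExit(), fs.createWriteStream(filePath));
  console.log(`All your sentences have been written to ${filePath}`);
}

// Usage with the terminal: writeLines(process.stdin, 'output.txt');
```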

      Finally, to call this function in your mycliprogram, replace the console.log statement in the case 1 block in the switch statement with the new write function, as shown here:

      node-file-streams/mycliprogram

      ...
      switch (commands.indexOf(command)) {
          ...
      
          case 1:
              write(args[3]);
              break;
      
          ...
      }
      

      Save the file containing the new changes. Then run the command-line application in your terminal with the write command.

      • ./mycliprogram write output.txt

      At the Enter a sentence prompt, add any input you’d like. After a couple of entries, type exit.

      The output will look similar to this (with your input displaying instead of the highlighted lines):

      Output

      Enter a sentence: Twinkle, twinkle, little star
      Enter a sentence: How I wonder what you are
      Enter a sentence: Up above the hills so high
      Enter a sentence: Like a diamond in the sky
      Enter a sentence: exit
      All your sentences have been written to output.txt

      Check output.txt to see the file content using the read command you created earlier.

      • ./mycliprogram read output.txt

      The terminal output should contain all the text you have typed into the command except exit. Based on the input above, the output.txt file has the following content:

      Output

      Twinkle, twinkle, little star
      How I wonder what you are
      Up above the hills so high
      Like a diamond in the sky

      In this step, you wrote to a file using streams. Next, you will implement the function that copies files in your command-line program.

      Step 4 — Copying Files Using pipe()

      In this step, you will use the pipe function to create a copy of a file using streams. Although there are other ways to copy files using streams, using pipe is preferred because you don’t need to manage the data flow.

      For example, one way to copy files using streams is to create a readable stream for the file, listen for its data event, and write each chunk to a writable stream for the file copy. The snippet below shows an example:

      example.js

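      const fs = require('fs');
      const readableStream = fs.createReadStream('lorem-ipsum.txt', 'utf8');
      const writableStream = fs.createWriteStream('lorem-ipsum-copy.txt');
      
      // Write each chunk to the copy as soon as it is read
      readableStream.on('data', (chunk) => {
          writableStream.write(chunk);
      });
      
      // End the copy only after the whole source file has been read
      readableStream.on('end', () => {
          writableStream.end();
      });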
      

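      The disadvantage of this method is that you need to manage the events on both the readable and writable streams. This version also ignores the return value of write(), so it does not handle backpressure: if the destination is slower than the source, the unwritten data accumulates in memory.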

      The preferred method for copying files using streams is to use pipe. A plumbing pipe carries water from a source, such as a water tank, to a destination, such as a faucet or tap. Similarly, you use pipe to carry data from a readable stream (the source) to a writable stream (the destination). (If you are familiar with Linux-based shells such as bash, the pipe operator | directs the output of one command into the input of another in the same way.)

      Piping in Node.js reads data from a source and writes it to a destination without requiring you to manage the data flow yourself: pipe() writes each chunk from the readable stream to the writable stream, pauses the source when the destination's buffer is full, and ends the destination when the source ends. For this reason, it is the preferred approach for implementing a copy command in your command-line application that uses streams.
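      pipe() does have one gap: it does not forward errors. If the source file is missing, the readable stream emits error, but the writable stream is not closed automatically. The stream module's pipeline() function, which this tutorial does not otherwise use, covers that case as well. The sketch below shows one way to write a copy with it; copyWithPipeline and its done callback are hypothetical names.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical copy helper: like pipe(), pipeline() moves the data and
// handles backpressure; unlike pipe(), it passes an error from any stream
// to the final callback and destroys all the streams when one fails.
function copyWithPipeline(sourcePath, destinationPath, done = () => {}) {
  pipeline(
    fs.createReadStream(sourcePath),
    fs.createWriteStream(destinationPath),
    (error) => {
      if (error) {
        console.log(`Copy failed: ${error.message}`);
      } else {
        console.log(`Copied ${sourcePath} to ${destinationPath}.`);
      }
      done(error);
    }
  );
}

// Usage: copyWithPipeline('lorem-ipsum.txt', 'lorem-ipsum-copy.txt');
```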

      In the mycliprogram file, you will add a new function invoked when a user runs the program with the copy command-line argument. The copy method will use pipe() to copy from an input file to the destination copy of the file. Create the copy function after the write function as shown below:

      node-file-streams/mycliprogram

      ...
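      function copy(filePath) {
          const inputStream = fs.createReadStream(filePath);
          const fileCopyPath = filePath.split('.')[0] + '-copy.' + filePath.split('.')[1];
          const outputStream = fs.createWriteStream(fileCopyPath);
      
          // pipe() does not forward errors, so listen for them on each stream
          inputStream.on('error', (error) => {
              console.log(`error: ${error.message}`);
          });
          outputStream.on('error', (error) => {
              console.log(`error: ${error.message}`);
          });
      
          inputStream.pipe(outputStream);
      
          outputStream.on('finish', () => {
              console.log(`You have successfully created a ${filePath} copy. The new file name is ${fileCopyPath}.`);
          });
      }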
      

      In the copy function, you created an input, or readable, stream using fs.createReadStream(). You also generated a name for the destination copy of the file and created an output, or writable, stream for it using fs.createWriteStream(). Then you piped the data from the inputStream to the outputStream using .pipe(). Finally, you listened for the finish event and printed a message when the copy completed.

      Recall that to close a writable stream, you have to call the end() function on the stream. When you pipe streams, pipe() calls end() on the writable stream (outputStream) automatically once the readable stream (inputStream) emits the end event. After end() is called and all remaining data has been flushed, the writable stream emits the finish event, which you listen for to know that the file copy is complete.
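      A caveat about the copy's name: splitting filePath on '.' only works for simple names like lorem-ipsum.txt. For ./lorem-ipsum.txt it produces -copy./lorem-ipsum, and for a file with no extension it appends .undefined. One more robust option is the path module's path.parse(), sketched below as a hypothetical siblingPath() helper that inserts a suffix before the last extension and keeps the directory part intact.

```javascript
const path = require('path');

// Hypothetical helper: insert `suffix` before the file's last extension.
function siblingPath(filePath, suffix) {
  const { dir, name, ext } = path.parse(filePath);
  return path.join(dir, `${name}${suffix}${ext}`);
}

console.log(siblingPath('lorem-ipsum.txt', '-copy'));   // lorem-ipsum-copy.txt
console.log(siblingPath('./lorem-ipsum.txt', '-copy')); // lorem-ipsum-copy.txt
console.log(siblingPath('notes', '-copy'));             // notes-copy
console.log(siblingPath('archive.tar.gz', '-copy'));    // archive.tar-copy.gz
```

      Only the last extension is treated as the extension, which is why archive.tar.gz becomes archive.tar-copy.gz. The same helper would also work for the -reversed file name in the next step.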

      To see this function in action, open the mycliprogram file and update the case 2 block of the switch statement as shown below:

      node-file-streams/mycliprogram

      ...
      switch (commands.indexOf(command)) {
          ...
      
          case 2:
              copy(args[3]);
              break;
      
          ...
      }
      

      Calling the copy function in the case 2 block of the switch statement ensures that when you run the mycliprogram program with the copy command and a file path, the copy function is executed.

      Run mycliprogram:

      • ./mycliprogram copy lorem-ipsum.txt

      The output will look similar to this:

      Output

      You have successfully created a lorem-ipsum.txt copy. The new file name is lorem-ipsum-copy.txt.

      Within the node-file-streams folder, you will see a newly added file with the name lorem-ipsum-copy.txt.

      You have successfully added a copy function to your command-line program using pipe. In the next step, you will use streams to modify the content of a file.

      Step 5 — Reversing the Content of a File using Transform()

      In the previous three steps of this tutorial, you worked with streams using the fs module. In this section, you will modify file streams using the Transform() class from the native stream module, which provides a transform stream. You can use a transform stream to read data, manipulate it, and provide new data as output, so the output is a ‘transformation’ of the input data. Node.js modules that use transform streams include the crypto module for cryptography and the zlib module, whose gzip streams compress and decompress files.

      You are going to implement a custom transform stream using the Transform() abstract class. The transform stream you create will reverse the contents of a file character by character, which demonstrates how to use transform streams to modify the content of a file.

      In the mycliprogram file, you will add a reverse function that the program will call when a user passes the reverse command-line argument.

      First, you need to import the Transform() class at the top of the file below the other imports. Add the highlighted line as shown below:

      node-file-streams/mycliprogram

      #!/usr/bin/env node
      ...
      const stream = require('stream');
      const Transform = stream.Transform || require('readable-stream').Transform;
      

      The Transform abstract class is missing from Node.js versions earlier than v0.10. Therefore, the code block above falls back to the readable-stream polyfill so that this program can also work with those earlier versions. If the Node.js version is v0.10 or later, the program uses the built-in class; if not, it uses the polyfill.

      Note: If you are using a Node.js version earlier than v0.10, you will have to run npm init -y to create a package.json file and then install the polyfill into your working directory with npm install readable-stream so that the polyfill can be applied.

      Next, you will create the reverse function right under your copy function. In that function, you will create a readable stream using the filePath parameter, generate a name for the reversed file, and create a writable stream using that name. Then you will create reverseStream, an instance of the Transform() class. When you construct it, you pass in an options object containing a single function, transform, which the stream calls for each chunk of data it receives.

      Beneath the copy function, add the code block below to add the reverse function.

      node-file-streams/mycliprogram

      ...
      function reverse(filePath) {
          const readStream = fs.createReadStream(filePath);
          const reversedDataFilePath = filePath.split('.')[0] + '-reversed.'+ filePath.split('.')[1];
          const writeStream = fs.createWriteStream(reversedDataFilePath);
      
          const reverseStream = new Transform({
              transform (data, encoding, callback) {
                  const reversedData = data.toString().split("").reverse().join("");
                  this.push(reversedData);
                  callback();
              }
          });
      
          readStream.pipe(reverseStream).pipe(writeStream).on('finish', () => {
              console.log(`Finished reversing the contents of ${filePath} and saving the output to ${reversedDataFilePath}.`);
          });
      }
      

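      The transform function receives three parameters: the chunk of data, its encoding, and a callback function. Within this function, you converted the chunk to a string, split the string into an array of characters, reversed the array, and joined the characters back together. You then pushed the reversed string to the readable side of the stream and called callback() to signal that the chunk has been processed. Note that this reversal happens one chunk at a time: a file that fits in a single chunk (64 KiB by default for fs.createReadStream()) is reversed completely, but for a larger file, each chunk is reversed on its own while the chunks stay in their original order.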

      Next, you connected the readStream to the reverseStream and then to the writeStream using two pipe() calls. Last, you listened for the finish event to alert the user when the file contents have been completely reversed.

      Notice that the code above uses a different syntax for listening for the finish event. Instead of attaching the listener to writeStream on a separate line, you chained the on() call to the second pipe() call. This works because pipe() returns its destination stream, so here on('finish') is called on writeStream itself.
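      Because reverseStream reverses each chunk independently, a file larger than one chunk comes out only partly reversed. If you need a true whole-file reversal, one approach is sketched below: a hypothetical createWholeReverseStream() that collects every chunk in transform() and reverses the combined text once in flush(), which a Transform stream calls after the last chunk. The trade-off is memory: the entire file is held at once, which gives up the main advantage of streaming, so this suits small files such as countries.csv.

```javascript
const { Transform } = require('stream');

// Hypothetical variant of reverseStream that reverses the entire input.
function createWholeReverseStream() {
  const chunks = [];
  return new Transform({
    transform(data, encoding, callback) {
      chunks.push(data); // data is a Buffer by default; just collect it
      callback();
    },
    flush(callback) {
      // Decode once so multi-byte characters split across chunks stay intact,
      // and use Array.from() so surrogate pairs (such as emoji) are not split.
      const text = Buffer.concat(chunks).toString('utf8');
      callback(null, Array.from(text).reverse().join(''));
    },
  });
}

// Usage inside reverse():
// readStream.pipe(createWholeReverseStream()).pipe(writeStream)
```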

      To wrap things up, replace the console.log statement in the case 3 block of the switch statement with reverse().

      node-file-streams/mycliprogram

      ...
      switch (commands.indexOf(command)) {
          ...
      
          case 3:
              reverse(args[3]);
              break;
      
          ...
      }
      

      To test this function, you will use another file containing the names of countries in alphabetical order (countries.csv). You can download it to your working directory by running the command below.

      • wget https://raw.githubusercontent.com/do-community/node-file-streams/999e66a11cd04bc59843a9c129da759c1c515faf/countries.csv

      You can then run mycliprogram.

      • ./mycliprogram reverse countries.csv

      The output will look similar to this:

      Output

      Finished reversing the contents of countries.csv and saving the output to countries-reversed.csv.

      Compare the contents of countries-reversed.csv with countries.csv to see the transformation. Each name is now written backward, and the order of the names has also been reversed (“Afghanistan” is written as “natsinahgfA” and appears last, and “Zimbabwe” is written as “ewbabmiZ” and appears first).

      You have successfully created a custom transform stream. You have also created a command-line program with functions that use streams for file handling.

      Conclusion

      Streams are used in native Node.js modules and in many packages installed with npm or yarn that perform input/output operations, because they provide an efficient way to handle data. In this article, you used various stream-based functions to work with files in Node.js. You built a command-line program with read, write, copy, and reverse commands, and you implemented each command in a function of the same name. To implement these functions, you used createReadStream and createWriteStream from the fs module, the pipe() method of readable streams, the createInterface function from the readline module, and the abstract Transform() class. Finally, you pieced these functions together in a small command-line program.

      As a next step, you could extend the command-line program you created to include other file system functionality you might want to use locally. A good example would be a personal tool that converts data from a .tsv source to .csv, or an attempt to replicate the wget command you used in this article to download files from GitHub.

      The command-line program you have written handles command-line arguments itself and uses a simple prompt to get user input. You can learn more about building more robust and maintainable command-line applications by following How To Handle Command-line Arguments in Node.js Scripts and How To Create Interactive Command-line Prompts with Inquirer.js.

      Additionally, Node.js provides extensive documentation on the various Node.js stream module classes, methods, and events you might need for your use case.


