
      How To Sandbox Processes With Systemd On Ubuntu 20.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

Sandboxing is a computer security technique that focuses on isolating a program or process from the parts of a system that it does not need to interact with during normal operation. When a new program is started, it has all of the abilities of the user that it runs as. These abilities are often far more than the program needs to perform its function. This can lead to security issues when a bad actor manipulates the program into using some of its unused abilities to do something the program would not normally do.

      The purpose of sandboxing is to identify exactly what abilities and resources a program needs, and then block off everything else.

systemd, the system management suite of tools, is used on almost all major Linux distributions to start, stop, and manage programs and processes. It has many sandboxing options that restrict how the processes it starts access the host system, making them more secure.

      The aim of this tutorial is not to create the strictest sandbox environment possible, but rather to use the recommended and easily enabled settings to make your system more secure.

In this tutorial you will run through a practical demonstration of systemd’s sandboxing techniques on Ubuntu 20.04, using an efficient workflow to implement and test them. Any process that runs on a Linux system that uses systemd can be made more secure with these techniques.

      Prerequisites

      You will need the following to begin this guide:

      Step 1 — Installing lighttpd

      In this tutorial, we will sandbox the lighttpd web server. lighttpd was not chosen because it is any less secure than other software, but because it is a small program with a single function that is easily sandboxed. This makes it an excellent choice for a learning application.

Let’s update the system to start:

• sudo apt update

Check the packages that will be upgraded on your system before typing y:

• sudo apt upgrade

      Then install lighttpd:

      • sudo apt install lighttpd

      This installation process will automatically install and enable a systemd service file for lighttpd. This will make lighttpd start on a system reboot.

      Now that we have lighttpd installed and running on our system we’ll get familiar with the systemd tools we will use when we start sandboxing.

      Step 2 — Preparing Your System

      In this step, you will get familiar with the systemd commands that you will use and prepare your system to enable you to efficiently sandbox a process.

      systemd is an umbrella name for a suite of tools that each have different names. The two that you will use are systemctl and journalctl. systemctl manages processes and their service files, while journalctl interacts with the system log.

      systemd uses service files to define how a process will be managed. systemd loads these files from several locations in the file system. The following command will show you the location of the active service file and display any overrides that are in use:

      • sudo systemctl cat process.service

      You need to replace process with the process that you are working on. Here lighttpd is used:

      • sudo systemctl cat lighttpd.service

      This is the output from the previous command:

      Output

# /lib/systemd/system/lighttpd.service
[Unit]
Description=Lighttpd Daemon
After=network-online.target

[Service]
Type=simple
PIDFile=/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -tt -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -USR1 $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target

      This output shows that the service file is located at /lib/systemd/system/lighttpd.service and that there are no override options in use. Override options add to or modify the base service file. You will use overrides to sandbox lighttpd with a dedicated override file.

Override files are located at /etc/systemd/system/process.service.d/override.conf. systemd has a dedicated edit command that will create an override file at the correct location and run systemctl daemon-reload after you save and exit the editor. The systemctl daemon-reload command instructs systemd to use any new configuration you wrote.
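If you ever create or edit an override file by hand instead of using systemctl edit, remember to run the reload yourself. For lighttpd, following the path convention above, that would look like this:

• sudo mkdir -p /etc/systemd/system/lighttpd.service.d
• sudo nano /etc/systemd/system/lighttpd.service.d/override.conf
• sudo systemctl daemon-reload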

      The systemd edit command has the following form:

      • sudo systemctl edit process.service

When you run this command, systemd will usually choose your default CLI editor, but this is not always the case, and you may find yourself in vi or even ed. You can configure which editor systemd will use by setting the SYSTEMD_EDITOR shell variable.

Set this shell variable by adding a line to your ~/.bashrc file. Open this file with a text editor:

• nano ~/.bashrc

      And add the following line:

      ~/.bashrc

      export SYSTEMD_EDITOR=editor
      

      Change editor to your preferred CLI editor. Here is the line set to use the nano editor:

      ~/.bashrc

      export SYSTEMD_EDITOR=nano
      

Confirm that this is set after you log out and log in again with the echo command:

• echo $SYSTEMD_EDITOR

      This command will print the name of the editor you set.

The SYSTEMD_EDITOR shell variable is only set in your user’s shell, not in the root shell that sudo opens. To pass this variable to the root shell, invoke systemctl edit with sudo -E:

      • sudo -E systemctl edit process.service

      The final recommendation will make debugging your sandboxing easier by showing you any errors that your changes have caused. These errors will be recorded by the system log, which is accessed with the journalctl command.

      During your sandboxing, you will make many changes that break the process you are trying to sandbox. For that reason, it is a good idea to open a second terminal and dedicate it to following the system log. This will save time re-opening the system log.

      Follow the system log in the second terminal by running:

      • sudo journalctl -f -u process.service
      • -f: Follow or tail the system log so new lines are displayed immediately.
      • -u process.service: Only show the log lines for the process you are sandboxing.

      The following is what you need to run to print only lighttpd’s errors:

      • sudo journalctl -f -u lighttpd.service

      Next you’ll begin editing the override.conf file and start sandboxing lighttpd.

      Step 3 — Enforcing a User and Group

      In this step, you will set the non-root user that lighttpd will run as.

In its default configuration, lighttpd starts running as the root user and then changes to the www-data user and group. This is a problem because while lighttpd is running as root it can do anything that root can do, which is everything.

systemd provides the ability to start and run the process as a non-root user, thereby avoiding this problem.

      Return to your first terminal session and begin editing the override file by running:

      • sudo -E systemctl edit lighttpd.service

      Now, add the following lines:

      lighttpd override file

      [Service]
      User=www-data
      Group=www-data
      
      • [Service]: Tells systemd that the following options should be applied to the [Service] section.
      • User=www-data: Defines the user to start the process as.
      • Group=www-data: Defines the group to start the process as.

      Next, save and exit the editor and restart lighttpd with the following command:

      • sudo systemctl restart lighttpd.service

lighttpd will not be able to start, because it had been relying on root’s authority to write its PID file to a location owned by root, and the www-data user cannot write to directories owned by root. The problem is indicated in the system log that will appear in your second terminal session:

      journalctl error message

      Aug 29 11:37:35 systemd lighttpd[7097]: 2020-08-29 11:37:35: (server.c.1233) opening pid-file failed: /run/lighttpd.pid Permission denied

Resolving this issue follows the general process of sandboxing, which is:

      1. Implement a sandbox restriction.
      2. Restart the process and check for errors.
      3. Fix any errors.

      Next you’ll resolve the PID file issue while still enforcing the user and group restrictions you set in this section.

      Step 4 — Managing the PID File

      A PID file is a file that contains the PID or Process Identification Number of a running process. Long-running programs like lighttpd use them to manage their own processes. The problem that you encountered in the previous section was that lighttpd was unable to write its PID file to /run/lighttpd.pid, because /run/ is owned by root.

      systemd has the RuntimeDirectory option for this problem, which you will use to give lighttpd a location that it can write its PID file to.

      The RuntimeDirectory option allows you to specify a directory under /run/ that will be created with the user and group you set in Step 3 when systemd starts lighttpd. lighttpd will be able to write its PID into this directory without needing root’s authority.

      First, open and edit the override file with the same command that you used in Step 3:

      • sudo -E systemctl edit lighttpd.service

      Next, add the following line under the two lines that you already added to the override file:

      lighttpd override file

      RuntimeDirectory=lighttpd
      

      Save and exit the editor.

You do not give the full path to the directory with the RuntimeDirectory option, only the name of the directory under /run/. In this case, the directory that systemd will create is /run/lighttpd/.

      You now need to configure lighttpd to write its PID file into the new directory /run/lighttpd/ instead of /run/.

      Open lighttpd’s configuration file with a text editor:

      • sudo nano /etc/lighttpd/lighttpd.conf

      Change the following line:

      /etc/lighttpd/lighttpd.conf

      server.pid-file             = "/run/lighttpd.pid"
      

      To:

      /etc/lighttpd/lighttpd.conf

      server.pid-file             = "/run/lighttpd/lighttpd.pid"
      

      Save and exit the editor.

      Now, restart lighttpd:

      • sudo systemctl restart lighttpd.service

It won’t start, because it now needs an ability that only root has. Next you’ll resolve this new issue.

      Step 5 — Borrowing root’s Capabilities

      The following line in the system log explains the issue that stopped lighttpd starting:

      journalctl error message

      Aug 29 12:07:22 systemd lighttpd[7220]: 2020-08-29 12:07:22: (network.c.311) can't bind to socket: 0.0.0.0:80 Permission denied

Only root can open network ports below 1024. lighttpd is trying to open the HTTP port 80, but it is being denied because the www-data user cannot do that.

The issue is resolved by giving the lighttpd process just one small piece of root’s power: the ability to open ports below 1024.

root’s power is divided into distinct abilities called “capabilities”. The root user has every capability and can therefore do anything. Breaking root’s power up into capabilities means that they can be granted individually to non-root processes, letting a process do something that would otherwise require full root access.

      The systemd option to give a process one or more of root’s capabilities is the AmbientCapabilities option.

      Open the override file:

      • sudo -E systemctl edit lighttpd.service

      Then add the following line under the lines you already added:

      lighttpd override file

      AmbientCapabilities=CAP_NET_BIND_SERVICE
      

      The CAP_NET_BIND_SERVICE capability allows a process to open ports under 1024.

      Save and exit the file.

Then restart lighttpd:

• sudo systemctl restart lighttpd.service

lighttpd will now be able to start.
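At this point you can optionally confirm everything configured so far: that the service is active, that the PID file landed in the new runtime directory, and that port 80 answers. These checks are not part of the original steps, and the last one assumes curl is installed:

• sudo systemctl status lighttpd.service
• cat /run/lighttpd/lighttpd.pid
• curl -I http://localhost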

      You now have a working lighttpd web server that you have made more secure than its default configuration. There are more sandboxing options provided by systemd that you can use to make your target process even more secure. We will explore some of these in the following sections.
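If you want a quick measure of how exposed the service still is, systemd includes an analysis command that assigns each unit an exposure score. Run it now and again after the following sections to watch the score improve:

• systemd-analyze security lighttpd.service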

      In the next step, you will restrict what lighttpd can access in the file system.

      Step 6 — Locking Down the Filesystem

      The lighttpd process runs as the www-data user and so can access any file on the system that www-data has permission to read and write to. In the case of www-data that isn’t very much, but still more than lighttpd needs.

      The first and easiest sandbox setting is the ProtectHome option. This option stops the process from reading or writing to anything under /home/. lighttpd does not need access to anything under /home/ so implementing this will protect all of your private files without affecting lighttpd.

      Open the override file:

      • sudo -E systemctl edit lighttpd.service

      Then add the following line at the bottom of the file:

      lighttpd override file

      ProtectHome=true
      

      Save and exit the editor then restart lighttpd to check that it is working as you expect with the following command:

      • sudo systemctl restart lighttpd.service

      You have protected /home/, but that still leaves the rest of the file system. This is taken care of with the ProtectSystem option, which stops a process from writing to parts of the file system.

The ProtectSystem option has three settings that offer increasing levels of protection. They are as follows:

• true: Mounts the /usr/ directory and the boot loader directories (/boot/ and /efi/) read-only for the process.
• full: Additionally mounts the /etc/ directory read-only.
• strict: Mounts the entire file system hierarchy read-only, apart from the API file systems /dev/, /proc/, and /sys/.

      A higher level of protection is more secure so set the ProtectSystem option to strict by adding the following line to the override file:

      lighttpd override file

      ProtectSystem=strict
      

      Save and exit the editor and restart lighttpd with the following command:

      • sudo systemctl restart lighttpd.service

      lighttpd will not be able to start because it needs to write its log files to /var/log/lighttpd/ and the strict setting forbids that. The following line in the system log shows the problem:

      journalctl error message

      Aug 29 12:44:41 systemd lighttpd[7417]: 2020-08-29 12:44:41: (server.c.752) opening errorlog '/var/log/lighttpd/error.log' failed: Read-only file system

      This issue was anticipated by systemd with the LogsDirectory option. It takes the name of a directory under /var/log/ that the process is permitted to write its logs into.

      Open the override file again in your first terminal session:

      • sudo -E systemctl edit lighttpd.service

      The lighttpd log directory is /var/log/lighttpd/ so add the following line to the bottom of the override file:

      lighttpd override file

      LogsDirectory=lighttpd
      

      Save and exit the editor and restart lighttpd:

      • sudo systemctl restart lighttpd.service

      lighttpd will now be able to start and run.

      Note: If you are sandboxing a process other than lighttpd and want to allow your process to write access to a specific directory outside of /var/log/ use the ReadWritePaths option.
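For example, a hypothetical service that needed to write to /srv/uploads/ could be granted just that path by adding a line like the following to its override file (the path here is purely illustrative):

override file (example)

ReadWritePaths=/srv/uploads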

      In the next step, you will limit how the lighttpd process can interact with the rest of the system by restricting the system calls it is allowed to make.

      Step 7 — Restricting System Calls

A system call is how a program requests something from the kernel. The number of system calls is quite large and includes actions like reading, writing, and deleting files, hardware-related tasks like mounting a file system, spawning processes, rebooting, and many more.

systemd has defined groups of the system calls that typical processes like lighttpd use, excluding the calls they do not need. The blocked system calls are things like mounting a file system and rebooting the system, which lighttpd never needs to do.
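You can list exactly which system calls a group such as @system-service contains before applying it:

• systemd-analyze syscall-filter @system-service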

      First, open the override file:

      • sudo -E systemctl edit lighttpd.service

      Add the following line to the bottom of the file to use the SystemCallFilter option to set the @system-service group:

      lighttpd override file

      SystemCallFilter=@system-service
      

      Save and exit the editor and restart lighttpd:

      • sudo systemctl restart lighttpd.service

      In the next section, you will apply the remaining recommended sandboxing options.

      Step 8 — Implementing Further Options

The systemd documentation recommends enabling the following options for long-running, networked processes like lighttpd. These settings are all optional, but each one makes the sandboxed process more secure, so use them where you can.

      You should enable these options one at a time and restart your process after each one. If you add them all at once debugging a problem will be much harder.

Each of the following recommended options is accompanied by a brief description of what it does. Add these lines to your override file under the lines you have already added:

      lighttpd override file

      NoNewPrivileges=true
      

      This option stops the sandboxed process and any of its children from obtaining new privileges.

      lighttpd override file

      ProtectKernelTunables=true
      

      This option stops the process from changing any kernel variables.

      lighttpd override file

      ProtectKernelModules=true
      

      This option stops the process from loading or unloading kernel modules.

      lighttpd override file

      ProtectKernelLogs=true
      

      This option stops the process from reading and writing directly to the kernel log. It must use the system log application to record any log messages.

      lighttpd override file

      ProtectControlGroups=true
      

      This option stops the process from modifying the system control groups.

      lighttpd override file

      MemoryDenyWriteExecute=true
      

This option stops the process from creating memory regions that are both writable and executable, which blocks a common technique for running injected code.

      lighttpd override file

      RestrictSUIDSGID=true
      

      This option stops the process from setting the set-user-ID (SUID) or set-group-ID (SGID) on files or directories. This ability can be abused to elevate privileges.

      lighttpd override file

      KeyringMode=private
      

      This option stops the process from accessing the kernel keyring of other processes that are running as the same user.

      lighttpd override file

      ProtectClock=true
      

      This option stops the process from changing the hardware and software system clocks.

      lighttpd override file

      RestrictRealtime=true
      

      This option stops the process from enabling real-time scheduling that can be abused to overload the CPU.

      lighttpd override file

      PrivateDevices=true
      

      This option stops the process from accessing physical devices attached to the system such as storage devices or USB devices.

      lighttpd override file

      PrivateTmp=true
      

This option forces the process to use private /tmp/ and /var/tmp/ directories. This stops the process from being able to read other programs’ temporary files that are stored in those shared system directories.

      lighttpd override file

      ProtectHostname=true
      

      This option stops the process from changing the system’s hostname.
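With every option from this tutorial applied, the complete override file, assembled from all of the snippets above, should look similar to this:

lighttpd override file

[Service]
User=www-data
Group=www-data
RuntimeDirectory=lighttpd
AmbientCapabilities=CAP_NET_BIND_SERVICE
ProtectHome=true
ProtectSystem=strict
LogsDirectory=lighttpd
SystemCallFilter=@system-service
NoNewPrivileges=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectControlGroups=true
MemoryDenyWriteExecute=true
RestrictSUIDSGID=true
KeyringMode=private
ProtectClock=true
RestrictRealtime=true
PrivateDevices=true
PrivateTmp=true
ProtectHostname=true

Remember to restart lighttpd after each addition so that any breakage points directly at the option that caused it.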

      The process that you have sandboxed is now much more secure than it was in its default configuration. You can now take these techniques and use them for any other processes you need to secure on your Linux system.

      Conclusion

In this article, you made the lighttpd program more secure by using the systemd sandboxing options. You can use these techniques with any process that systemd manages, allowing you to continue improving the security of your system.

The entire list of sandboxing and other security options can be found in systemd’s online documentation. Also, check out further security topics on the DigitalOcean Community.




      How To Launch Child Processes in Node.js


      The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      When a user executes a single Node.js program, it runs as a single operating system (OS) process that represents the instance of the program running. Within that process, Node.js executes programs on a single thread. As mentioned earlier in this series with the How To Write Asynchronous Code in Node.js tutorial, because only one thread can run on one process, operations that take a long time to execute in JavaScript can block the Node.js thread and delay the execution of other code. A key strategy to work around this problem is to launch a child process, or a process created by another process, when faced with long-running tasks. When a new process is launched, the operating system can employ multiprocessing techniques to ensure that the main Node.js process and the additional child process run concurrently, or at the same time.

      Node.js includes the child_process module, which has functions to create new processes. Aside from dealing with long-running tasks, this module can also interface with the OS and run shell commands. System administrators can use Node.js to run shell commands to structure and maintain their operations as a Node.js module instead of shell scripts.

      In this tutorial, you will create child processes while executing a series of sample Node.js applications. You’ll create processes with the child_process module by retrieving the results of a child process via a buffer or string with the exec() function, and then from a data stream with the spawn() function. You’ll finish by using fork() to create a child process of another Node.js program that you can communicate with as it’s running. To illustrate these concepts, you will write a program to list the contents of a directory, a program to find files, and a web server with multiple endpoints.

      Prerequisites

      Step 1 — Creating a Child Process with exec()

      Developers commonly create child processes to execute commands on their operating system when they need to manipulate the output of their Node.js programs with a shell, such as using shell piping or redirection. The exec() function in Node.js creates a new shell process and executes a command in that shell. The output of the command is kept in a buffer in memory, which you can accept via a callback function passed into exec().

Let’s begin creating our first child processes in Node.js. First, we need to set up our coding environment to store the scripts we’ll create throughout this tutorial. In the terminal, create a folder called child-processes:

• mkdir child-processes

Enter that folder in the terminal with the cd command:

• cd child-processes

Create a new file called listFiles.js and open the file in a text editor. In this tutorial we will use nano, a terminal text editor:

• nano listFiles.js

We’ll be writing a Node.js module that uses the exec() function to run the ls command. The ls command lists the files and folders in a directory. This program takes the output from the ls command and displays it to the user.

      In the text editor, add the following code:

      ~/child-processes/listFiles.js

      const { exec } = require('child_process');
      
      exec('ls -lh', (error, stdout, stderr) => {
        if (error) {
          console.error(`error: ${error.message}`);
          return;
        }
      
        if (stderr) {
          console.error(`stderr: ${stderr}`);
          return;
        }
      
  console.log(`stdout:\n${stdout}`);
      });
      

      We first import the exec() command from the child_process module using JavaScript destructuring. Once imported, we use the exec() function. The first argument is the command we would like to run. In this case, it’s ls -lh, which lists all the files and folders in the current directory in long format, with a total file size in human-readable units at the top of the output.

      The second argument is a callback function with three parameters: error, stdout, and stderr. If the command failed to run, error will capture the reason why it failed. This can happen if the shell cannot find the command you’re trying to execute. If the command is executed successfully, any data it writes to the standard output stream is captured in stdout, and any data it writes to the standard error stream is captured in stderr.

      Note: It’s important to keep the difference between error and stderr in mind. If the command itself fails to run, error will capture the error. If the command runs but returns output to the error stream, stderr will capture it. The most resilient Node.js programs will handle all possible outputs for a child process.

      In our callback function, we first check if we received an error. If we did, we display the error’s message (a property of the Error object) with console.error() and end the function with return. We then check if the command printed an error message and return if so. If the command successfully executes, we log its output to the console with console.log().

      Let’s run this file to see it in action. First, save and exit nano by pressing CTRL+X.

Back in your terminal, run your application with the node command:

• node listFiles.js

      Your terminal will display the following output:

      Output

stdout:
total 4.0K
-rw-rw-r-- 1 sammy sammy 280 Jul 27 16:35 listFiles.js

      This lists the contents of the child-processes directory in long format, along with the size of the contents at the top. Your results will have your own user and group in place of sammy. This shows that the listFiles.js program successfully ran the shell command ls -lh.
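Before moving on, it’s worth seeing the difference between error and stderr from the earlier note in action. You could try a quick experiment like this hypothetical scratch file (not one of the tutorial’s scripts):

(hypothetical) ~/child-processes/errorVsStderr.js

const { exec } = require('child_process');

// The shell cannot find this command, so it exits with a non-zero
// code and the callback's `error` parameter is set.
exec('nosuchcommand', (error, stdout, stderr) => {
  console.log(`error set: ${Boolean(error)}`);
});

// `ls` itself runs, but it writes its complaint to the error stream,
// so `stderr` captures the message (and `error` is also set, because
// ls exits with a non-zero code).
exec('ls /no/such/path', (error, stdout, stderr) => {
  console.log(`stderr: ${stderr.trim()}`);
});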

      Now let’s look at another way to execute concurrent processes. Node.js’s child_process module can also run executable files with the execFile() function. The key difference between the execFile() and exec() functions is that the first argument of execFile() is now a path to an executable file instead of a command. The output of the executable file is stored in a buffer like exec(), which we access via a callback function with error, stdout, and stderr parameters.

Note: Scripts in Windows such as .bat and .cmd files cannot be run with execFile() because the function does not create a shell when running the file. On Unix, Linux, and macOS, executable scripts do not always need a shell to run. However, a Windows machine needs a shell to execute scripts. To execute script files on Windows, use exec(), since it creates a new shell. Alternatively, you can use spawn(), which you’ll use later in this tutorial.

      However, note that you can execute .exe files in Windows successfully using execFile(). This limitation only applies to script files that require a shell to execute.

      Let’s begin by adding an executable script for execFile() to run. We’ll write a bash script that downloads the Node.js logo from the Node.js website and Base64 encodes it to convert its data to a string of ASCII characters.

      Create a new shell script file called processNodejsImage.sh:

      • nano processNodejsImage.sh

Now write a script to download the image and Base64-encode it:

      ~/child-processes/processNodejsImage.sh

      #!/bin/bash
      curl -s https://nodejs.org/static/images/logos/nodejs-new-pantone-black.svg > nodejs-logo.svg
      base64 nodejs-logo.svg
      

      The first statement is a shebang statement. It’s used in Unix, Linux, and macOS when we want to specify a shell to execute our script. The second statement is a curl command. The cURL utility, whose command is curl, is a command-line tool that can transfer data to and from a server. We use cURL to download the Node.js logo from the website, and then we use redirection to save the downloaded data to a new file nodejs-logo.svg. The last statement uses the base64 utility to encode the nodejs-logo.svg file we downloaded with cURL. The script then outputs the encoded string to the console.

      Save and exit before continuing.

      In order for our Node program to run the bash script, we have to make it executable. To do this, run the following:

      • chmod u+x processNodejsImage.sh

      This will give your current user the permission to execute the file.

With our script in place, we can write a new Node.js module to execute it. This script will use execFile() to run the script in a child process, catching any errors and displaying all output to the console.

In your terminal, make a new JavaScript file called getNodejsImage.js:

• nano getNodejsImage.js

      Type the following code in the text editor:

      ~/child-processes/getNodejsImage.js

      const { execFile } = require('child_process');
      
      execFile(__dirname + '/processNodejsImage.sh', (error, stdout, stderr) => {
        if (error) {
          console.error(`error: ${error.message}`);
          return;
        }
      
        if (stderr) {
          console.error(`stderr: ${stderr}`);
          return;
        }
      
  console.log(`stdout:\n${stdout}`);
      });
      

We use JavaScript destructuring to import the execFile() function from the child_process module. We then use that function, passing the file path as the first argument. __dirname contains the directory path of the module in which it is written. Node.js provides the __dirname variable to a module when the module runs. By using __dirname, our script will always find the processNodejsImage.sh file across different operating systems, no matter where we run getNodejsImage.js. Note that for our current project setup, getNodejsImage.js and processNodejsImage.sh must be in the same folder.

      The second argument is a callback with the error, stdout, and stderr parameters. Like with our previous example that used exec(), we check for each possible output of the script file and log them to the console.
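As an aside, concatenating __dirname with a string works here, but the built-in path module assembles paths portably across platforms. An equivalent call, shown only as a variation on the tutorial’s code, would be:

const path = require('path');
const { execFile } = require('child_process');

// path.join() inserts the correct separator for the platform.
execFile(path.join(__dirname, 'processNodejsImage.sh'), (error, stdout, stderr) => {
  // handle error, stderr, and stdout exactly as before
});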

      In your text editor, save this file and exit from the editor.

In your terminal, use node to execute the module:

• node getNodejsImage.js

      Running this script will produce output like this:

      Output

      stdout: PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDQyLjQgMjcwLjkiPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9IjE4MC43IiB5MT0iODAuNyIge ...

      Note that we truncated the output in this article because of its large size.

      Before base64 encoding the image, processNodejsImage.sh first downloads it. You can also verify that you downloaded the image by inspecting the current directory.

Execute listFiles.js to find the updated list of files in our directory:

• node listFiles.js

      The script will display content similar to the following on the terminal:

      Output

stdout:
total 20K
-rw-rw-r-- 1 sammy sammy  316 Jul 27 17:56 getNodejsImage.js
-rw-rw-r-- 1 sammy sammy  280 Jul 27 16:35 listFiles.js
-rw-rw-r-- 1 sammy sammy 5.4K Jul 27 18:01 nodejs-logo.svg
-rwxrw-r-- 1 sammy sammy  129 Jul 27 17:56 processNodejsImage.sh

      We’ve now successfully executed processNodejsImage.sh as a child process in Node.js using the execFile() function.

      The exec() and execFile() functions can run commands on the operating system’s shell in a Node.js child process. Node.js also provides another method with similar functionality, spawn(). The difference is that instead of getting the output of the shell commands all at once, we get them in chunks via a stream. In the next section we’ll use the spawn() command to create a child process.

      Step 2 — Creating a Child Process with spawn()

      The spawn() function runs a command in a process. This function returns data via the stream API. Therefore, to get the output of the child process, we need to listen for stream events.

      Streams in Node.js are instances of event emitters. If you would like to learn more about listening for events and the foundations of interacting with streams, you can read our guide on Using Event Emitters in Node.js.

      It’s often a good idea to choose spawn() over exec() or execFile() when the command you want to run can output a large amount of data. With a buffer, as used by exec() and execFile(), all the processed data is stored in the computer’s memory. For large amounts of data, this can degrade system performance. With a stream, the data is processed and transferred in small chunks. Therefore, you can process a large amount of data without using too much memory at any one time.
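The buffer that exec() and execFile() fill is also capped: once a command’s output exceeds the limit, the callback receives an error. You can raise the cap with the maxBuffer option, which defaults to 1024 * 1024 bytes in current Node.js releases. A minimal sketch:

const { exec } = require('child_process');

// Allow up to 8 MiB of output before exec() fails because the
// buffer limit was exceeded. For output of unbounded size,
// spawn() and its streams are the better tool.
exec('find .', { maxBuffer: 8 * 1024 * 1024 }, (error, stdout, stderr) => {
  if (error) {
    console.error(`error: ${error.message}`);
    return;
  }
  console.log(`received ${stdout.length} characters`);
});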

      Let’s see how we can use spawn() to make a child process. We will write a new Node.js module that creates a child process to run the find command. We will use the find command to list all the files in the current directory.

Create a new file called findFiles.js:

• nano findFiles.js

      In your text editor, begin by calling the spawn() command:

      ~/child-processes/findFiles.js

      const { spawn } = require('child_process');
      
      const child = spawn('find', ['.']);
      

      We first imported the spawn() function from the child_process module. We then called the spawn() function to create a child process that executes the find command. We hold the reference to the process in the child variable, which we will use to listen to its streamed events.

      The first argument in spawn() is the command to run, in this case find. The second argument is an array that contains the arguments for the executed command. In this case, we are telling Node.js to execute the find command with the argument ., thereby making the command find all the files in the current directory. The equivalent command in the terminal is find ..

      With the exec() and execFile() functions, we wrote the arguments along with the command in one string. However, with spawn(), all arguments to commands must be entered in the array. That’s because spawn(), unlike exec() and execFile(), does not create a new shell before running a process. To have commands with their arguments in one string, you need Node.js to create a new shell as well.
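If you do want to pass a single command string to spawn(), you can ask it to start a shell with the shell option; pipes and redirection then work as they would in the terminal. A small sketch, not used elsewhere in this tutorial:

const { spawn } = require('child_process');

// shell: true hands the whole string to a shell, so the pipe works.
const child = spawn('find . | wc -l', { shell: true });

child.stdout.on('data', (data) => {
  console.log(`entries found: ${data}`);
});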

      Let’s continue our module by adding listeners for the command’s output. Add the following highlighted lines:

      ~/child-processes/findFiles.js

      const { spawn } = require('child_process');
      
      const child = spawn('find', ['.']);
      
      child.stdout.on('data', data => {
  console.log(`stdout:\n${data}`);
      });
      
      child.stderr.on('data', data => {
        console.error(`stderr: ${data}`);
      });
      

Commands can return data in either the stdout stream or the stderr stream, so you added listeners for both. You can add listeners by calling the on() method of each stream object. The data event from the streams gives us the command’s output to that stream. Whenever we get data on either stream, we log it to the console.

      We then listen to two other events: the error event if the command fails to execute or is interrupted, and the close event for when the command has finished execution, thus closing the stream.

      In the text editor, complete the Node.js module by writing the following highlighted lines:

      ~/child-processes/findFiles.js

      const { spawn } = require('child_process');
      
      const child = spawn('find', ['.']);
      
      child.stdout.on('data', (data) => {
  console.log(`stdout:\n${data}`);
      });
      
      child.stderr.on('data', (data) => {
        console.error(`stderr: ${data}`);
      });
      
      child.on('error', (error) => {
        console.error(`error: ${error.message}`);
      });
      
      child.on('close', (code) => {
        console.log(`child process exited with code ${code}`);
      });
      

      For the error and close events, you set up a listener directly on the child variable. When listening for error events, if one occurs Node.js provides an Error object. In this case, you log the error’s message property.

      When listening to the close event, Node.js provides the exit code of the command. An exit code denotes if the command ran successfully or not. When a command runs without errors, it returns the lowest possible value for an exit code: 0. When executed with an error, it returns a non-zero code.

      The module is complete. Save and exit nano with CTRL+X.

Now, run the code with the node command:

• node findFiles.js

      Once complete, you will find the following output:

      Output

stdout:
.
./findFiles.js
./listFiles.js
./nodejs-logo.svg
./processNodejsImage.sh
./getNodejsImage.js

child process exited with code 0

We find a list of all files in our current directory and the exit code of the command, which is 0 as it ran successfully. While our current directory has a small number of files, if we ran this code in our home directory, our program would list every single file in every accessible folder for our user. Because the output can be that large, the spawn() function is ideal here, as its streams do not require as much memory as one large buffer.

      So far we’ve used functions to create child processes to execute external commands in our operating system. Node.js also provides a way to create a child process that executes other Node.js programs. Let’s use the fork() function to create a child process for a Node.js module in the next section.

      Step 3 — Creating a Child Process with fork()

      Node.js provides the fork() function, a variation of spawn(), to create a child process that’s also a Node.js process. The main benefit of using fork() to create a Node.js process over spawn() or exec() is that fork() enables communication between the parent and the child process.

      With fork(), in addition to retrieving data from the child process, a parent process can send messages to the running child process. Likewise, the child process can send messages to the parent process.

      Let’s see an example where using fork() to create a new Node.js child process can improve the performance of our application. Node.js programs run on a single process. Therefore, CPU intensive tasks like iterating over large loops or parsing large JSON files stop other JavaScript code from running. For certain applications, this is not a viable option. If a web server is blocked, then it cannot process any new incoming requests until the blocking code has completed its execution.

      Let’s see this in practice by creating a web server with two endpoints. One endpoint will do a slow computation that blocks the Node.js process. The other endpoint will return a JSON object saying hello.

First, create a new file called httpServer.js, which will have the code for our HTTP server:

• nano httpServer.js

      We’ll begin by setting up the HTTP server. This involves importing the http module, creating a request listener function, creating a server object, and listening for requests on the server object. If you would like to dive deeper into creating HTTP servers in Node.js or would like a refresher, you can read our guide on How To Create a Web Server in Node.js with the HTTP Module.

      Enter the following code in your text editor to set up an HTTP server:

      ~/child-processes/httpServer.js

      const http = require('http');
      
const host = 'localhost';
      const port = 8000;
      
      const requestListener = function (req, res) {};
      
      const server = http.createServer(requestListener);
      server.listen(port, host, () => {
        console.log(`Server is running on http://${host}:${port}`);
      });
      

      This code sets up an HTTP server that will run at http://localhost:8000. It uses template literals to dynamically generate that URL.

      Next, we will write an intentionally slow function that counts in a loop 5 billion times. Before the requestListener() function, add the following code:

      ~/child-processes/httpServer.js

      ...
      const port = 8000;
      
      const slowFunction = () => {
        let counter = 0;
        while (counter < 5000000000) {
          counter++;
        }
      
        return counter;
      }
      
      const requestListener = function (req, res) {};
      ...
      

      This uses the arrow function syntax to create a while loop that counts to 5000000000.

To complete this module, we need to add code to the requestListener() function. Our function will call slowFunction() on the /total subpath, and return a small JSON message for the other. Add the following code to the module:

      ~/child-processes/httpServer.js

      ...
      const requestListener = function (req, res) {
        if (req.url === '/total') {
          let slowResult = slowFunction();
          let message = `{"totalCount":${slowResult}}`;
      
          console.log('Returning /total results');
          res.setHeader('Content-Type', 'application/json');
          res.writeHead(200);
          res.end(message);
        } else if (req.url === '/hello') {
          console.log('Returning /hello results');
          res.setHeader('Content-Type', 'application/json');
          res.writeHead(200);
          res.end(`{"message":"hello"}`);
        }
      };
      ...
      

      If the user reaches the server at the /total subpath, then we run slowFunction(). If we are hit at the /hello subpath, we return this JSON message: {"message":"hello"}.

      Save and exit the file by pressing CTRL+X.

To test, run this server module with node:

• node httpServer.js

      When our server starts, the console will display the following:

      Output

      Server is running on http://localhost:8000

      Now, to test the performance of our module, open two additional terminals. In the first terminal, use the curl command to make a request to the /total endpoint, which we expect to be slow:

      • curl http://localhost:8000/total

      In the other terminal, use curl to make a request to the /hello endpoint like this:

      • curl http://localhost:8000/hello

      The first request will return the following JSON:

      Output

      {"totalCount":5000000000}

      Whereas the second request will return this JSON:

      Output

      {"message":"hello"}

      The request to /hello completed only after the request to /total. The slowFunction() blocked all other code from executing while it was still in its loop. You can verify this by looking at the Node.js server output that was logged in your original terminal:

      Output

Returning /total results
Returning /hello results

      To process the blocking code while still accepting incoming requests, we can move the blocking code to a child process with fork(). We will move the blocking code into its own module. The Node.js server will then create a child process when someone accesses the /total endpoint and listen for results from this child process.

Refactor the server by first creating a new module called getCount.js that will contain slowFunction():

• nano getCount.js

      Now enter the code for slowFunction() once again:

      ~/child-processes/getCount.js

      const slowFunction = () => {
        let counter = 0;
        while (counter < 5000000000) {
          counter++;
        }
      
        return counter;
      }
      

      Since this module will be a child process created with fork(), we can also add code to communicate with the parent process when slowFunction() has completed processing. Add the following block of code that sends a message to the parent process with the JSON to return to the user:

      ~/child-processes/getCount.js

      const slowFunction = () => {
        let counter = 0;
        while (counter < 5000000000) {
          counter++;
        }
      
        return counter;
      }
      
process.on('message', (message) => {
  if (message === 'START') {
    console.log('Child process received START message');
    let slowResult = slowFunction();
    // Use a different name here so the `message` parameter is not shadowed
    let reply = `{"totalCount":${slowResult}}`;
    process.send(reply);
  }
});
      

Let’s break down this block of code. The messages between a parent and child process created by fork() are accessible via the Node.js global process object. We add a listener to the process object to look for message events. Once we receive a message event, we check whether it is the START message. Our server code will send START when someone accesses the /total endpoint. Upon receiving that message, we run slowFunction() and create a JSON string with the result of the function. We use process.send() to send a message to the parent process.

      Save and exit getCount.js by entering CTRL+X in nano.

      Now, let’s modify the httpServer.js file so that instead of calling slowFunction(), it creates a child process that executes getCount.js.

Re-open httpServer.js with nano:

• nano httpServer.js

      First, import the fork() function from the child_process module:

      ~/child-processes/httpServer.js

      const http = require('http');
      const { fork } = require('child_process');
      ...
      

      Next, we are going to remove the slowFunction() from this module and modify the requestListener() function to create a child process. Change the code in your file so it looks like this:

      ~/child-processes/httpServer.js

      ...
      const port = 8000;
      
      const requestListener = function (req, res) {
        if (req.url === '/total') {
          const child = fork(__dirname + '/getCount');
      
          child.on('message', (message) => {
            console.log('Returning /total results');
            res.setHeader('Content-Type', 'application/json');
            res.writeHead(200);
            res.end(message);
          });
      
          child.send('START');
        } else if (req.url === '/hello') {
          console.log('Returning /hello results');
          res.setHeader('Content-Type', 'application/json');
          res.writeHead(200);
          res.end(`{"message":"hello"}`);
        }
      };
      ...
      

      When someone goes to the /total endpoint, we now create a new child process with fork(). The argument of fork() is the path to the Node.js module. In this case, it is the getCount.js file in our current directory, which we receive from __dirname. The reference to this child process is stored in a variable child.

      We then add a listener to the child object. This listener captures any messages that the child process gives us. In this case, getCount.js will return a JSON string with the total number counted by the while loop. When we receive that message, we send the JSON to the user.

      We use the send() function of the child variable to give it a message. This program sends the message START, which begins the execution of slowFunction() in the child process.
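One detail to be aware of: each request to /total forks a fresh Node.js process, and getCount.js keeps running after it replies. In a production server you would probably end the worker once its result arrives. One way to do that, shown as an optional refinement rather than part of this tutorial’s steps, is:

child.on('message', (message) => {
  console.log('Returning /total results');
  res.setHeader('Content-Type', 'application/json');
  res.writeHead(200);
  res.end(message);
  child.kill(); // terminate the worker now that its result has been sent
});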

      Save and exit nano by entering CTRL+X.

To test the improvement that using fork() made on the HTTP server, begin by executing the httpServer.js file with node:

• node httpServer.js

      Like before, it will output the following message when it launches:

      Output

      Server is running on http://localhost:8000

      To test the server, we will need an additional two terminals as we did the first time. You can re-use them if they are still open.

      In the first terminal, use the curl command to make a request to the /total endpoint, which takes a while to compute:

      • curl http://localhost:8000/total

      In the other terminal, use curl to make a request to the /hello endpoint, which responds in a short time:

      • curl http://localhost:8000/hello

      The first request will return the following JSON:

      Output

      {"totalCount":5000000000}

      Whereas the second request will return this JSON:

      Output

      {"message":"hello"}

Unlike the first time we tried this, the second request to /hello completes immediately. You can confirm this by reviewing the logs, which will look like this:

      Output

Child process received START message
Returning /hello results
Returning /total results

      These logs show that the request for the /hello endpoint ran after the child process was created but before the child process had finished its task.

Since we moved the blocking code into a child process using fork(), the server was still able to respond to other requests and execute other JavaScript code. Because of the fork() function’s message-passing ability, we can control when a child process begins an activity, and we can return data from a child process to a parent process.

      Conclusion

      In this article, you used various functions to create a child process in Node.js. You first created child processes with exec() to run shell commands from Node.js code. You then ran an executable file with the execFile() function. You looked at the spawn() function, which can also run commands but returns data via a stream and does not start a shell like exec() and execFile(). Finally, you used the fork() function to allow for two-way communication between the parent and child process.

      To learn more about the child_process module, you can read the Node.js documentation. If you’d like to continue learning Node.js, you can return to the How To Code in Node.js series, or browse programming projects and setups on our Node topic page.


