CSAIL Research Abstract

Inoculating SSH Against Address-Harvesting Worms

Jaeyeon Jung, Will Stockwell, Hari Balakrishnan & Stuart E. Schechter

Address harvesting is the act of searching a compromised host for addresses of other hosts to attack. SSH, the tool of choice for administering and communicating with mission-critical hosts, security-critical hosts, and even some routers, leaves each user's list of previously contacted hosts open to harvest by anyone who compromises the user's account. Attackers have combined address harvesting with a variety of mechanisms for impersonating legitimate users when authenticating to SSH servers. They have succeeded in breaching systems at major academic, commercial, and government institutions. In this study, we detail the threat posed should attackers automate this mode of attack to create a self-propagating worm. We then present a countermeasure to defend against address harvesting attacks, with an implementation written for OpenSSH. We also present the first study to measure how much information is available to attackers who search users' known_hosts databases and who look for unencrypted identity key files. The data show how attacks can spread by compromising only a small fraction of users' credentials. We found that only a minority of users, 37.2% in our study, used pass phrases to protect their identity keys.

Attacks on SSH

SSH servers and user accounts are often configured to trust other hosts to act on their behalf, to authenticate users, or to safely store user credentials. All of these practices are potential targets of attack.

Exploiting trust in host authentication: If an attack comes from a compromised host that is listed in the shosts.equiv or hosts.equiv file in the target server's /etc directory, or the .shosts or .rhosts file of the targeted user, the attacker will be permitted to connect to a target user's account without presenting user credentials. If users place their public identity keys in their authorized_keys files on SSH servers and leave their secret identity key unencrypted on hosts they use as SSH clients, then they are trusting that these accounts and hosts used as clients will not be compromised. If one such client account or host is compromised, then the attacker can read the unencrypted key and use it to authenticate to the target host.

Unauthorized authentication via agent-forwarding: Authentication agents are programs employed by users to authenticate on their behalf. They free users from the need to retype the pass phrases that protect their identity-key credentials each time that they authenticate. A user can configure his agent to authenticate on his behalf when accessing services from an application run on a remote host. However, most SSH agents do not verify that the actions a remote host performs are the actions the user intended to authorize. Thus, when the user believes he is authorizing a CVS transaction he may instead be authorizing an SSH connection to a host targeted by the attacker.

Credential theft: An attacker who obtains a valid user's credentials can impersonate that user anywhere those credentials are accepted. Credentials can be stolen in several ways: theft by compromised SSH servers, theft from authentication agents, and online or offline dictionary attacks.

Insertion attacks: An attacker may be able to insert his own commands into a user session or insert his own credentials in place of a legitimate user's credentials. The former attack, in which the attacker impersonates the user for part of the SSH session, can be used to perform the latter attack, which allows the attacker to impersonate the user in future sessions. For instance, if the compromised user's home directory is located on a shared file system, an attacker can insert an identity key into the user's authorized_keys file. The SSH server depends on this file to determine which keys the user has authorized to serve as his credentials. The attacker then uses the inserted identity key to authenticate as that user to other hosts that mount the user's home directory. If the attacker can write to the system password file, he can replace any or all user passwords with those of his choosing.

Identifying Targets

Once an attacker is able to steal a user's credentials or convince the user's authentication agent to authenticate on his behalf, he will need to identify targets on which to attempt to exploit these credentials. If he has a remote exploit, a list of any hosts running SSH servers will do. Attackers could not hope for a better repository of prospective target addresses than that provided by the SSH client's known_hosts database, which is often implemented as a flat file. For each user, the known_hosts database stores addresses of the hosts to which the user has connected, each of which is mapped to the host's public key. This list is sorted in the order in which the hosts were first contacted, allowing the attacker to first focus on those hosts that are newer and less likely to have been moved or retired.
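To make concrete how much a plaintext known_hosts file exposes, the following minimal sketch reads entries in file order. It assumes the common SSH2 line layout (a comma-separated address list, a key type, then the base64 key); the function name and the sample entries are hypothetical and the keys are truncated placeholders.

```python
def parse_known_hosts(text):
    """Return (addresses, key_type) pairs in the order they appear.

    Assumes SSH2-style lines: "addr1,addr2 key-type base64-key".
    Blank lines, comments, and malformed lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        addresses = fields[0].split(",")
        entries.append((addresses, fields[1]))
    return entries


# Hypothetical sample; addresses come from documentation ranges.
SAMPLE = """\
# placeholder entries for illustration only
alpha.example.org,192.0.2.10 ssh-rsa AAAAB3Nza...
198.51.100.7 ssh-dss AAAAB3Nza...
"""

if __name__ == "__main__":
    for addresses, key_type in parse_known_hosts(SAMPLE):
        print(key_type, addresses)
```

Nothing beyond read access to the file is needed to recover every address, which is the exposure the countermeasure described later is meant to close.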

Empirical Data

To better understand how an SSH worm might spread, we have undertaken a multi-institution effort to collect data from users' known_hosts database entries and their overall SSH configuration. We made available a data collection and reporting script, written in Perl, that could be run on each host either by individual users to collect data from their own accounts or by system administrators to collect data from all user accounts. The script is available at http://nms.lcs.mit.edu/projects/ssh/.

We have collected known_hosts data from 96 hosts, 14 of which ran the script as root and submitted data from all user accounts. In total, we received 31,446 anonymized known_hosts entries from 2,007 user accounts. For each known_hosts entry, we compute a bit-wise network distance (32 minus the length of the longest common prefix of the two hosts' addresses) and plot its histogram and cumulative fraction in Figure 1. Figure 1 shows that about 60% of SSH communications appear to be between hosts within the same /16 network. Yet, it is surprising to find that about 40% of SSH communications span /16 networks, which suggests that there is a significant probability that an SSH worm can hop into another organization. In fact, these known_hosts entries lead to a total of 8,009 hosts on 88 valid /8 networks (55% of all valid /8 networks).
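The bit-wise network distance defined above can be sketched as follows for IPv4 addresses. The function name is hypothetical. The sketch relies on the fact that 32 minus the common prefix length equals the bit length of the XOR of the two addresses.

```python
import ipaddress


def network_distance(addr_a, addr_b):
    """Return 32 minus the longest common prefix length of two IPv4 addresses.

    XOR leaves a 1 bit wherever the addresses differ, so the position of
    the highest set bit is exactly the number of bits after the shared prefix.
    """
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return (a ^ b).bit_length()


def same_slash16(addr_a, addr_b):
    """Two hosts share a /16 network when their distance is at most 16."""
    return network_distance(addr_a, addr_b) <= 16


if __name__ == "__main__":
    print(network_distance("10.0.0.1", "10.0.0.2"))
    print(same_slash16("10.1.0.0", "10.2.0.0"))
```

A distance of 0 means the two addresses are identical, and a distance of 32 means they differ in the very first bit.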

Figure 1. Bit-wise network distance of hosts that have had a direct SSH communication

The data collection script that was run on these hosts also parsed SSH2 identity key files to see what fraction of these key files had the encryption flag set. We were quite surprised to see that only 37.2% of 274 key files were encrypted.
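The text does not say how the Perl script detects the encryption flag. As one plausible sketch, assuming identity keys stored in the traditional PEM format, a pass-phrase-protected key carries a "Proc-Type: 4,ENCRYPTED" header line, which a simple scan can detect. The function names and sample key bodies below are hypothetical placeholders, and other key file formats are not handled.

```python
def pem_key_is_encrypted(pem_text):
    """Return True if a traditional PEM private key declares encryption.

    Assumption: encrypted keys carry a "Proc-Type: 4,ENCRYPTED" header.
    """
    for line in pem_text.splitlines():
        line = line.strip()
        if line.startswith("Proc-Type:") and "ENCRYPTED" in line:
            return True
    return False


def encrypted_fraction(key_texts):
    """Fraction of key files (given as text) that are marked encrypted."""
    if not key_texts:
        return 0.0
    flagged = sum(1 for text in key_texts if pem_key_is_encrypted(text))
    return flagged / len(key_texts)


# Placeholder key bodies; not real key material.
ENCRYPTED_SAMPLE = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "Proc-Type: 4,ENCRYPTED\n"
    "DEK-Info: DES-EDE3-CBC,0123456789ABCDEF\n"
    "\n"
    "MIIB...\n"
    "-----END RSA PRIVATE KEY-----\n"
)
PLAIN_SAMPLE = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "MIIB...\n"
    "-----END RSA PRIVATE KEY-----\n"
)

if __name__ == "__main__":
    print(encrypted_fraction([ENCRYPTED_SAMPLE, PLAIN_SAMPLE]))
```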

Countering Address Harvesting

We modified OpenSSH 3.9 to obfuscate known_hosts database entries in the following way. Our implementation replaces host addresses, whether in the form of domain names or IP addresses, with hashed tokens of the form "<salt--hash>", where salt is a randomly generated 64-bit number converted into ASCII text using a base64 encoding. The salt is prepended to the address or hostname before it is hashed with the SHA-1 [1] algorithm. As with the salt, the hash is base64 encoded to convert its numerical value to a compact text representation. Because each token has its own random salt, an attacker cannot stage a dictionary attack in which a single hashing operation tests a candidate address against more than one token.

salt = base64_encode(gen_random_bits(64))
hash = base64_encode(sha1(concat(salt, address)))
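The two steps above can be sketched in Python. This is an illustration under stated assumptions, not the patch itself: the base64-encoded salt is what gets prepended (as the pseudocode suggests), the "--" separator is taken literally from the token form given in the text, and the function name and the optional raw_salt parameter are hypothetical.

```python
import base64
import hashlib
import os


def make_token(address, raw_salt=None):
    """Build a hashed known_hosts token of the assumed form <salt--hash>.

    raw_salt is 8 random bytes (64 bits); it is generated when omitted.
    """
    if raw_salt is None:
        raw_salt = os.urandom(8)
    salt = base64.b64encode(raw_salt).decode("ascii")
    # Assumption: the encoded salt is prepended to the address before hashing.
    digest = hashlib.sha1((salt + address).encode("utf-8")).digest()
    encoded_hash = base64.b64encode(digest).decode("ascii")
    return "<" + salt + "--" + encoded_hash + ">"


if __name__ == "__main__":
    # Two tokens for the same address differ because the salts differ.
    print(make_token("alpha.example.org"))
    print(make_token("alpha.example.org"))
```

Since every token carries a different salt, a guessed address must be rehashed once per token, which is the per-entry cost the salt is meant to impose.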

When the SSH client is called upon to initiate a new connection, it checks the destination address against the known_hosts database entry by entry. If the first character of the address stored in the known_hosts entry is a left angle bracket (<), it is assumed to be a hashed token. The destination host address is hashed using the salt extracted from the token, base64 encoded, and then compared to the hash encoded in the token. Matching encodings imply with extremely high probability that the addresses match. To maintain backwards compatibility with earlier SSH implementations, a plaintext comparison between addresses takes place when the first character of the address in the known_hosts database is not a left angle bracket.

Since entries in the known_hosts database are created and verified automatically by the SSH client, its behavior will remain unchanged from the user's perspective. We provide two new commands for manipulating the known_hosts file should the user need to do so: remove-knownhost deletes a host entry from known_hosts by name, and ssh-showkey returns the key of a host specified by name or address.

To speed the transition to hashed host addresses we provide a program, ssh-hostname-encoder, that hashes all of the addresses in an existing known_hosts file. We have also provided a Perl script, convert_known_hosts.pl, that can be run to convert all known_hosts files on a given file system into hashed host address format. The patch can be downloaded from http://nms.lcs.mit.edu/projects/ssh.
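The entry-by-entry lookup described above, including the plaintext fallback, might look like the following sketch. It is not the OpenSSH patch: the function names are hypothetical, the token layout "<salt--hash>" with a literal "--" separator is an assumption taken from the text, and entries are modeled as simple (stored address, key) pairs.

```python
import base64
import hashlib


def hash_address(salt, address):
    """Base64-encoded SHA-1 of the encoded salt prepended to the address."""
    digest = hashlib.sha1((salt + address).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def entry_matches(stored, destination):
    """Compare one stored address field against the destination address."""
    if stored.startswith("<"):
        # Hashed token: recover the salt, rehash the destination, compare.
        if not stored.endswith(">"):
            return False
        salt, separator, encoded_hash = stored[1:-1].partition("--")
        if not separator:
            return False
        return hash_address(salt, destination) == encoded_hash
    # Legacy entry: plaintext comparison for backwards compatibility.
    return stored == destination


def find_host_key(entries, destination):
    """Scan (stored_address, key) pairs in order; return the first match."""
    for stored, key in entries:
        if entry_matches(stored, destination):
            return key
    return None


if __name__ == "__main__":
    salt = "c2FsdHNhbHQ="  # hypothetical encoded salt
    token = "<" + salt + "--" + hash_address(salt, "alpha.example.org") + ">"
    entries = [(token, "key-A"), ("192.0.2.10", "key-B")]
    print(find_host_key(entries, "alpha.example.org"))
    print(find_host_key(entries, "192.0.2.10"))
```

The lookup costs one hash per hashed entry, because each token's salt must be applied to the destination address separately.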

References

[1] National Institute of Standards and Technology. Secure Hash Standard. FIPS PUB 180-1, 1995.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)