2005 CSAIL Technical Reports Browse through the current year's technical reports. Please note that the listing below is a temporary page until the new system in conjunction with MIT DSpace is launched. Please do not bookmark this page, as the links to the publications below will eventually be transferred to permanent urls within MIT DSpace.
MIT-CSAIL-TR-2005-083 Author[s]: Charles C. Kemp, Aaron Edsinger Visual Tool Tip Detection and Position Estimation for Robotic Manipulation of Unknown Human Tools [PDF] [PS] December 21, 2005 Robots that use human tools could more easily work with people, perform tasks that are important to people, and benefit from human strategies for accomplishing these tasks. For a wide variety of tools and tasks, control of the tool's endpoint is sufficient for its use. In this paper we present a straight-forward method for rapidly detecting the endpoint of an unmodeled tool and estimating its position with respect to the robot's hand. The robot rotates the tool while using optical flow to detect the most rapidly moving image points, and then finds the 3D position with respect to its hand that best explains these noisy 2D detections. The resulting 3D position estimate allows the robot to control the position of the tool endpoint and predict its visual location. We show successful results for this method using a humanoid robot with a variety of traditional tools, including a pen, a hammer, and pliers, as well as more general tools such as a bottle and the robot's own finger.'
MIT-CSAIL-TR-2005-082 Author[s]: T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, T. Poggio A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex [PDF] [PS] December 19, 2005 We describe a quantitative theory to account for the computations performed by the feedforward path of the ventral stream of visual cortex and the local circuits implementing them. We show that a model instantiating the theory is capable of performing recognition on datasets of complex images at the level of human observers in rapid categorization tasks. We also show that the theory is consistent with (and in some case has predicted) several properties of neurons in V1, V4, IT and PFC. The theory seems sufficiently comprehensive, detailed and satisfactory to represent an interesting challenge for physiologists and modelers: either disprove its basic features or propose alternative theories of equivalent scope. The theory suggests a number of open questions for visual physiology and psychophysics.
MIT-CSAIL-TR-2005-081 Author[s]: Yuri Ivanov, Thomas Serre, Jacob Bouvrie Error Weighted Classifier Combination for Multi-modal Human Identification [PDF] [PS] December 14, 2005 In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from performance of each classifier on training data that is shown to improve performance on a synthetic data set. The method is especially relevant in autonomous surveillance setting where varying time scales and missing features are a common occurrence. We show an application of this technique to the real-world surveillance database of video and audio recordings of people collected over several weeks in the office setting.
MIT-CSAIL-TR-2005-080 Author[s]: Ran Canetti, Ling Cheung, Dilsun Kaynar, Moses Liskov, Nancy Lynch, Olivier Pereira, Roberto Segala Using Probabilistic I/O Automata to Analyze an Oblivious Transfer [PDF] [PS] December 14, 2005 We demonstrate how to carry out cryptographic security analysis of distributed protocols within the Probabilistic I/O Automata framework of Lynch, Segala, and Vaandrager. This framework provides tools for arguing rigorously about the concurrency and scheduling aspects of protocols, and about protocols presented at different levels of abstraction. Consequently, it can help in making cryptographic analysis more precise and less susceptible to errors. We concentrate on a relatively simple two-party Oblivious Transfer protocol, in the presence of a semi-honest adversary (essentially, an eavesdropper). For the underlying cryptographic notion of security, we use a version of Canettiís Universally Composable security. In spite of the relative simplicity of the example, the exercise is quite nontrivial. It requires taking many fundamental issues into account, including nondeterministic behavior, scheduling, resource-bounded computation, and computational hardness assumptions for cryptographic primitives.
MIT-CSAIL-TR-2005-079 Author[s]: Leonid Taycher, Gregory Shakhnarovich, David Demirdjian, Trevor Darrell Conditional Random People: Tracking Humans with CRFs and Grid Filters [PDF] [PS] December 1, 2005 We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are learned from data. We find functions that embed both state and observation into a space where similarity corresponds to $L_1$ distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and in conjunction with a grid-filtering framework can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences.
MIT-CSAIL-TR-2005-078 Author[s]: Sameer Ajmani Automatic Software Upgrades for Distributed Systems [PDF] [PS] November 30, 2005 Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades may happen gradually, and there may be long periods of time when different nodes are running different software versions and need to communicate using incompatible protocols. We present a methodology and infrastructure that address these challenges and make it possible to upgrade distributed systems automatically while limiting service disruption. Our methodology defines how to enable nodes to interoperate across versions, how to preserve the state of a system across upgrades, and how to schedule an upgrade so as to limit service disruption. The approach is modular: defining an upgrade requires understanding only the new software and the version it replaces. The upgrade infrastructure is a generic platform for distributing and installing software while enabling nodes to interoperate across versions. The infrastructure requires no access to the system source code and is transparent: node software is unaware that different versions even exist. We have implemented a prototype of the infrastructure called Upstart that intercepts socket communication using a dynamically-linked C++ library. Experiments show that Upstart has low overhead and works well for both local-area and Internet systems.
MIT-CSAIL-TR-2005-077 Author[s]: Ozlem Uzuner Identifying Expression Fingerprints Using Linguistic Information [PDF] [PS] November 18, 2005 This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints. Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories. Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%. In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works. Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.
MIT-CSAIL-TR-2005-076 Author[s]: Gang Zeng, Sylvain Paris, Long Quan, Francois Sillion Accurate and Scalable Surface Representation and Reconstruction from Images [PDF] [PS] November 18, 2005 We introduce a new surface representation, the patchwork, to extend the problem of surface reconstruction from multiple images. A patchwork is the combination of several patches that are built one by one. This design potentially allows the reconstruction of an object of arbitrarily large dimensions while preserving a fine level of detail. We formally demonstrate that this strategy leads to a spatial complexity independent of the dimensions of the reconstructed object, and to a time complexity linear with respect to the object area. The former property ensures that we never run out of storage (memory) and the latter means that reconstructing an object can be done in a reasonable amount of time. In addition, we show that the patchwork representation handles equivalently open and closed surfaces whereas most of the existing approaches are limited to a specific scenario (open or closed surface but not both). Most of the existing optimization techniques can be cast into this framework. To illustrate the possibilities offered by this approach, we propose two applications that expose how it dramatically extends a recent accurate graph-cut technique. We first revisit the popular carving techniques. This results in a well-posed reconstruction problem that still enjoys the tractability of voxel space. We also show how we can advantageously combine several image-driven criteria to achieve a finely detailed geometry by surface propagation. The above properties of the patchwork representation and reconstruction are extensively demonstrated on real image sequences.
MIT-CSAIL-TR-2005-075 Author[s]: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni Analysis of Perceptron-Based Active Learning [PDF] [PS] November 17, 2005 We start by showing that in an active learning setting, the Perceptron algorithm needs $\Omega(\frac{1}{\epsilon^2})$ labels to learn linear separators within generalization error $\epsilon$. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error $\epsilon$ after asking for just $\tilde{O}(d \log \frac{1}{\epsilon})$ labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.
MIT-CSAIL-TR-2005-074 Author[s]: Claire Monteleoni, Tommi Jaakkola Online Learning of Non-stationary Sequences [PDF] [PS] November 17, 2005 We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization of the switching-rate parameter that governs the switching dynamics. We demonstrate the algorithm in the context of wireless networks.
MIT-CSAIL-TR-2005-073 Author[s]: Alexandr Andoni, Piotr Indyk New LSH-based Algorithm for Approximate Nearest Neighbor [PDF] [PS] November 4, 2005 We present an algorithm for c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}).
MIT-CSAIL-TR-2005-072 Author[s]: Thomas Wies, Viktor Kuncak, Patrick Lam, Andreas Podelski, Martin Rinard On Field Constraint Analysis [PDF] [PS] November 3, 2005 We introduce field constraint analysis, a new technique for verifying data structure invariants. A field constraint for a field is a formula specifying a set of objects to which the field can point. Field constraints enable the application of decidable logics to data structures which were originally beyond the scope of these logics, by verifying the backbone of the data structure and then verifying constraints on fields that cross-cut the backbone in arbitrary ways. Previously, such cross-cutting fields could only be verified when they were uniquely determined by the backbone, which significantly limited the range of analyzable data structures. Our field constraint analysis permits non-deterministic field constraints on cross-cutting fields, which allows to verify invariants of data structures such as skip lists. Non-deterministic field constraints also enable the verification of invariants between data structures, yielding an expressive generalization of static type declarations. The generality of our field constraints requires new techniques, which are orthogonal to the traditional use of structure simulation. We present one such technique and prove its soundness. We have implemented this technique as part of a symbolic shape analysis deployed in the context of the Hob system for verifying data structure consistency. Using this implementation we were able to verify data structures that were previously beyond the reach of similar techniques.
MIT-CSAIL-TR-2005-071 Author[s]: Matthew Lepinski, Silvio Micali Subcontracted Rational SFE [PDF] [PS] November 2, 2005 In their paper, "Rational Secure Computation and Ideal Mechanism Design," Izmalkov, Lepinski and Micali show that any one-shot mediated game can be simulated by the players themselves, without the help of a trusted mediator, using physical envelopes and a ballot-box. We show that communication between the players is not essential to the ILM protocol. That is, we provide a protocol for rational secure function evaluation (Rational SFE) where the players just send a set of envelopes to a referee who simply performs a sequence of publicly verifiable actions. That is, the players can "subcontract" all of the computation to an untrusted referee. In addition to providing a communication structure that more closely matches the ideal game, our protocol also enables us to better simulate mediated games in which abort is not a dominated action.
MIT-CSAIL-TR-2005-070 Author[s]: Hariharan Rahul, Mangesh Kasbekar, Ramesh Sitaraman, Arthur Berger Towards Realizing the Performance and Availability Benefits of a Global Overlay Network [PDF] [PS] November 1, 2005 Prior analyses of the benefits of routing overlays are based on platforms consisting of nodes located primarily in North America, on the academic Internet, and at the edge of the network. This paper is the first global study of the benefits of overlays on the commercial Internet in terms of round trip latencies and availability, using measurements from diverse ISPs over 1100 locations (77 countries, 630 cities and 6 continents). Our study shows that while overlays provide some improvements in North America, their benefits are especially significant for paths with Asian endpoints. Regarding practical considerations in constructing overlay routes, we show that an algorithm that randomly chooses a small number of alternate redundant paths achieves an availability of over 99.5%. We also propose and evaluate a simple predictive scheme that achieves almost optimal latency using only 2-3 paths, and that this is achievable with surprisingly persistent routing choices.
MIT-CSAIL-TR-2005-069 Author[s]: Huu Hai Nguyen, Martin Rinard Using Cyclic Memory Allocation to Eliminate Memory Leaks [PDF] [PS] October 26, 2005 We present and evaluate a new memory management technique for eliminating memory leaks in programs with dynamic memory allocation. This technique observes the execution of the program on a sequence of training inputs to find m-bounded allocation sites, which have the property that at any time during the execution of the program, the program accesses at most only the last m objects allocated at that site. The technique then transforms the program to use cyclic memory allocation at that site: it preallocates a buffer containing m objects of the type allocated at that site, with each allocation returning the next object in the buffer. At the end of the buffer the allocations wrap back around to the first object. Cyclic allocation eliminates any memory leak at the allocation site - the total amount of memory required to hold all of the objects ever allocated at the site is simply $m$ times the object size. We evaluate our technique by applying it to several widely-used open source programs. Our results show that it is able to successfully eliminate important memory leaks in these programs. A potential concern is that the estimated bounds m may be too small, causing the program to overlay live objects in memory. Our results indicate that our bounds estimation technique is quite accurate in practice, providing incorrect results for only one of the 160 m-bounded sites that it identifies. To evaluate the potential impact of overlaying live objects, we artificially reduce the bounds at $m$-bounded sites and observe the resulting behavior. The resulting overlaying of live objects often does not affect the functionality of the program at all; even when it does impair part of the functionality, the program does not fail and is still able to acceptably deliver the remaining functionality.
MIT-CSAIL-TR-2005-068 Author[s]: Matthew Drake, Hank Hoffmann, Rodric Rabbah, Saman Amarasinghe MPEG-2 in a Stream Programming Language [PDF] [PS] October 22, 2005 Image and video codecs are prevalent in multimedia applications, ranging from embedded systems, to desktop computers, to high-end servers such as HDTV editing consoles. It is not uncommon however that developers create (from scratch) and customize their codec implementations for each of the architecture targets they intend their coders and decoders to run on. This practice is time consuming and error prone, leading to code that is not malleable or portable. In this paper we describe an implementation of the MPEG-2 codec using the StreamIt programming language. StreamIt is an architecture-independent stream language that aims to improve programmer productivity, while concomitantly exposing the inherent parallelism and communication topology of the application. We describe why MPEG is a good match for the streaming programming model, and illustrate the malleability of the implementation using a simple modification to the decoder to support alternate color compression formats. StreamIt allows for modular application development, which also reduces the complexity of the debugging process since stream components can be verified independently. This in turn leads to greater programmer productivity. We implement a fully functional MPEG-2 decoder in StreamIt. The decoder was developed in eight weeks by a single student programmer who did not have any prior experience with MPEG or other video codecs. Many of the MPEG-2 components were subsequently reused to assemble a JPEG codec.
MIT-CSAIL-TR-2005-067 Author[s]: Ross Lippert, Ryan Rifkin Asymptotics of Gaussian Regularized Least-Squares [PDF] [PS] October 21, 2005 We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\\sigma \\rightarrow \\infty$ while letting the regularization parameter $\\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\\frac{1}{\\sigma^2}$ and $\\lambda$: if $\\lambda = \\sigma^{-(2k+1)}$, then, as $\\sigma \\rightarrow \\infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.
MIT-CSAIL-TR-2005-066 Author[s]: Emina Torlak, Marten van Dijk, Blaise Gassend, Daniel Jackson, Srinivas Devadas Knowledge Flow Analysis for Security Protocols [PDF] [PS] October 19, 2005 Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles communications between principals and applications of cryptographic primitives uniformly, it is readily represented in a standard logic. A generic framework in the Alloy modelling language is presented, and instantiated for two standard protocols, and a new key management scheme.
MIT-CSAIL-TR-2005-065 Author[s]: Gadi Geiger, Domenic G. Amara Towards the Prevention of Dyslexia [PDF] [PS] October 18, 2005 Previous studies have shown that dyslexic individuals who supplement windowed reading practice with intensive small-scale hand-eye coordination tasks exhibit marked improvement in their reading skills. Here we examine whether similar hand-eye coordination activities, in the form of artwork performed by children in kindergarten, first and second grades, could reduce the number of students at-risk for reading problems. Our results suggest that daily hand-eye coordination activities significantly reduce the number of students at-risk. We believe that the effectiveness of these activities derives from their ability to prepare the students perceptually for reading.
MIT-CSAIL-TR-2005-064 Author[s]: Michael Zhang, Krste Asanovic Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches [PDF] [PS] October 10, 2005 Future CMPs will have more cores and greater onchip cache capacity. The on-chip cache can either be divided into separate private L2 caches for each core, or treated as a large shared L2 cache. Private caches provide low hit latency but low capacity, while shared caches have higher hit latencies but greater capacity. Victim replication was previously introduced as a way of reducing the average hit latency of a shared cache by allowing a processor to make a replica of a primary cache victim in its local slice of the global L2 cache. Although victim replication performs well on multithreaded and single-threaded codes, it performs worse than the private scheme for multiprogrammed workloads where there is little sharing between the different programs running at the same time. In this paper, we propose victim migration, which improves on victim replication by adding an additional set of migration tags on each node which are used to implement an exclusive cache policy for replicas. When a replica has been created on a remote node, it is not also cached on the home node, but only recorded in the migration tags. This frees up space on the home node to store shared global lines or replicas for the local processor. We show that victim migration performs better than private, shared, and victim replication schemes across a range of single threaded, multithreaded, and multiprogrammed workloads, while using less area than a private cache design. Victim migration provides a reduction in average memory access latency of up to 10% over victim replication.
MIT-CSAIL-TR-2005-063 Author[s]: Sanmay Das Learning to Trade with Insider Information [PDF] [PS] October 7, 2005 This paper introduces algorithms for learning how to trade using insider (superior) information in Kyle's model of financial markets. Prior results in finance theory relied on the insider having perfect knowledge of the structure and parameters of the market. I show here that it is possible to learn the equilibrium trading strategy when its form is known even without knowledge of the parameters governing trading in the model. However, the rate of convergence to equilibrium is slow, and an approximate algorithm that does not converge to the equilibrium strategy achieves better utility when the horizon is limited. I analyze this approximate algorithm from the perspective of reinforcement learning and discuss the importance of domain knowledge in designing a successful learning algorithm.
MIT-CSAIL-TR-2005-062 Author[s]: Sameer Ajmani, Barbara Liskov, Liuba Shrira, Dorothy Curtis Automatic Software Upgrades for Distributed Systems [PDF] [PS] October 6, 2005 Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades must happen gradually, and there may be long periods of time when different nodes run different software versions and need to communicate using incompatible protocols. We present a methodology and infrastructure that make it possible to upgrade distributed systems automatically while limiting service disruption. We introduce new ways to reason about correctness in a multi-version system. We also describe a prototype implementation that supports automatic upgrades with modest overhead.
MIT-CSAIL-TR-2005-061 Author[s]: Sameer Ajmani Automatic Software Upgrades for Distributed Systems (PhD thesis) [PDF] [PS] October 6, 2005 Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades may happen gradually, and there may be long periods of time when different nodes are running different software versions and need to communicate using incompatible protocols. We present a methodology and infrastructure that address these challenges and make it possible to upgrade distributed systems automatically while limiting service disruption. Our methodology defines how to enable nodes to interoperate across versions, how to preserve the state of a system across upgrades, and how to schedule an upgrade so as to limit service disruption. The approach is modular: defining an upgrade requires understanding only the new software and the version it replaces. The upgrade infrastructure is a generic platform for distributing and installing software while enabling nodes to interoperate across versions. The infrastructure requires no access to the system source code and is transparent: node software is unaware that different versions even exist. We have implemented a prototype of the infrastructure called Upstart that intercepts socket communication using a dynamically-linked C++ library. Experiments show that Upstart has low overhead and works well for both local-area and Internet systems.
MIT-CSAIL-TR-2005-060 Author[s]: B. Gassend, C. W. O'Donnell, W. Thies, A. Lee, M. van Dijk, S. Devadas Secondary Structure Prediction of All-Helical Proteins Using Hidden Markov Support Vector Machines [PDF] [PS] October 6, 2005 Our goal is to develop a state-of-the-art predictor with an intuitive and biophysically-motivated energy model through the use of Hidden Markov Support Vector Machines (HM-SVMs), a recent innovation in the field of machine learning. We focus on the prediction of alpha helices in proteins and show that using HM-SVMs, a simple 7-state HMM with 302 parameters can achieve a Q_alpha value of 77.6% and a SOV_alpha value of 73.4%. We briefly describe how our method can be generalized to predicting beta strands and sheets.
MIT-CSAIL-TR-2005-059 Author[s]: Konstantine Arkoudas Combining Diagrammatic and Symbolic Reasoning [PDF] [PS] October 6, 2005 We introduce a domain-independent framework for heterogeneous natural deduction that combines diagrammatic and sentential reasoning. The framework is presented in the form of a family of denotational proof languages (DPLs). Diagrams are represented as possibly partial descriptions of finite system states. This allows us to deal with incomplete information, which we formalize by admitting sets as attribute values. We introduce a notion of attribute interpretations that enables us to interpret first-order signatures into such system states, and develop a formal semantic framework based on Kleene's strong three-valued logic. We extend the assumption-base semantics of DPLs to accodomodate diagrammatic reasoning by introducing general inference mechanisms for the valid extraction of information from diagrams and for the incorporation of sentential information into diagrams. A rigorous big-step operational semantics is given, on the basis of which we prove that our framework is sound. In addition, we specify detailed algorithms for implementing proof checkers for the resulting languages, and discuss associated efficiency issues.
MIT-CSAIL-TR-2005-058 Author[s]: Georgios Theocharous, Sridhar Mahadevan, Leslie Pack Kaelbling Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation [PDF] [PS] September 27, 2005 Partially observable Markov decision processes (POMDPs) are a well studied paradigm for programming autonomous robots, where the robot sequentially chooses actions to achieve long term goals efficiently. Unfortunately, for real world robots and other similar domains, the uncertain outcomes of the actions and the fact that the true world state may not be completely observable make learning of models of the world extremely difficult, and using them algorithmically infeasible. In this paper we show that learning POMDP models and planning with them can become significantly easier when we incorporate into our algorithms the notions of spatial and tempral abstraction. We demonstrate the superiority of our algorithms by comparing them with previous flat approaches for large scale robot navigation.
MIT-CSAIL-TR-2005-057 Author[s]: Chris Stauffer Automated Audio-visual Activity Analysis [PDF] [PS] September 20, 2005 Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus is often not sufficient for reliably discriminating between many types of activity. In many cases where the visual information required for a particular task is extremely subtle or non-existent, there is often audio stimulus that is extremely salient for a particular classification or anomaly detection task. Unfortunately unlike visual events, independent sounds are often very ambiguous and not sufficient to define useful events themselves. Without an effective method of learning causally-linked temporal sequences of sound events that are coupled to the visual events, these sound events are generally only useful for independent anomalous sounds detection, e.g., detecting a gunshot or breaking glass. This paper outlines a method for automatically detecting a set of audio events and visual events in a particular environment, for determining statistical anomalies, for automatically clustering these detected events into meaningful clusters, and for learning salient temporal relationships between the audio and visual events. This results in a compact description of the different types of compound audio-visual events in an environment.
MIT-CSAIL-TR-2005-056 Author[s]: Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, William T. Freeman LabelMe: A Database and Web-based Tool for Image Annotation [PDF] [PS] September 8, 2005 Research in object detection and recognition in cluttered scenes requires large image collections with ground truth labels. The labels should provide information about the object classes present in each image, as well as their shape and locations, and possibly other attributes such as pose. Such data is useful for testing, as well as for supervised learning. This project provides a web-based annotation tool that makes it easy to annotate images, and to instantly share such annotations with the community. This tool, plus an initial set of 10,000 images (3000 of which have been labeled), can be found at http://www.csail.mit.edu/$\sim$brussell/research/LabelMe/intro.html
MIT-CSAIL-TR-2005-055 Author[s]: Ran Canetti, Ling Cheung, Dilsun Kaynar, Moses Liskov, Nancy Lynch, Olivier, Roberto Segala Using Probabilistic I/O Automata to Analyze an Oblivious Transfer Protocol [PDF] [PS] August 19, 2005 We demonstrate how to carry out cryptographic security analysis of distributed protocols within the Probabilistic I/O Automata framework of Lynch, Segala, and Vaandrager. This framework provides tools for arguing rigorously about the concurrency and scheduling aspects of protocols, and about protocols presented at different levels of abstraction. Consequently, it can help in making cryptographic analysis more precise and less susceptible to errors. We concentrate on a relatively simple two-party Oblivious Transfer protocol, in the presence of a semi-honest adversary (essentially, an eavesdropper). For the underlying cryptographic notion of security, we use a version of Canetti's Universally Composable security. In spite of the relative simplicity of the example, the exercise is quite nontrivial. It requires taking many fundamental issues into account, including nondeterministic behavior, scheduling, resource-bounded computation, and computational hardness assumptions for cryptographic primitives.
MIT-CSAIL-TR-2005-054 Author[s]: Whitman Richards Collective Choice with Uncertain Domain Moldels [PDF] [PS] August 16, 2005 When groups of individuals make choices among several alternatives, the most compelling social outcome is the Condorcet winner, namely the alternative beating all others in a pair-wise contest. Obviously the Condorcet winner cannot be overturned if one sub-group proposes another alternative it happens to favor. However, in some cases, and especially with haphazard voting, there will be no clear unique winner, with the outcome consisting of a triple of pair-wise winners that each beat different subsets of the alternatives (i.e. a “top-cycle”.) We explore the sensitivity of Condorcet winners to various perturbations in the voting process that lead to top-cycles. Surprisingly, variations in the number of votes for each alternative is much less important than consistency in a voter’s view of how alternatives are related. As more and more voter’s preference orderings on alternatives depart from a shared model of the domain, then unique Condorcet outcomes become increasingly unlikely.
MIT-CSAIL-TR-2005-053 Author[s]: Sachin Katti, Dina Katabi, Katarzyna Puchala Slicing the Onion: Anonymous Routing Without PKI [PDF] [PS] August 15, 2005 Recent years have witnessed many proposals for anonymous routing in overlay peer-to-peer networks. The proposed protocols either expose the receiver and the message content, or require the overlay nodes to have public-private key pairs with the public keys known to everyone. In practice, however, key distribution and management are well-known difficult problems and have crippled any widespread deployment of anonymous routing. This paper uses a combination of information slicing and source routing to provide anonymous communication in a way similar to Onion Routing but without a public key infrastructure (PKI).
MIT-CSAIL-TR-2005-052 Author[s]: Shlomi Dolev, Limor Lahiani, Nancy Lynch, Tina Nolte Self-Stabilizing Mobile Node Location Management and Message [PDF] [PS] August 11, 2005 We present simple algorithms for achieving self-stabilizing location management and routing in mobile ad-hoc networks. While mobile clients may be susceptible to corruption and stopping failures, mobile networks are often deployed with a reliable GPS oracle, supplying frequent updates of accurate real time and location information to mobile nodes. Information from a GPS oracle provides an external, shared source of consistency for mobile nodes, allowing them to label and timestamp messages, and hence aiding in identification of, and eventual recovery from, corruption and failures. Our algorithms use a GPS oracle. Our algorithms also take advantage of the Virtual Stationary Automata programming abstraction, consisting of mobile clients, virtual timed machines called virtual stationary automata (VSAs), and a local broadcast service connecting VSAs and mobile clients. VSAs are distributed at known locations over the plane, and emulated in a self-stabilizing manner by the mobile nodes in the system. They serve as fault-tolerant building blocks that can interact with mobile clients and each other, and can simplify implementations of services in mobile networks. We implement three self-stabilizing, fault-tolerant services, each built on the prior services: (1) VSA-to-VSA geographic routing, (2) mobile client location management, and (3) mobile client end-to-end routing. We use a greedy version of the classical depth-first search algorithm to route messages between VSAs in different regions. The mobile client location management service is based on home locations: Each client identifier hashes to a set of home locations, regions whose VSAs are periodically updated with the client's location. VSAs maintain this information and answer queries for client locations. Finally, the VSA-to-VSA routing and location management services are used to implement mobile client end-to-end routing.
MIT-CSAIL-TR-2005-051 Author[s]: Arnab Bhattacharyya Implementing Probabilistically Checkable Proofs of Proximity [PDF] [PS] August 8, 2005 Abstract: In this paper, we describe a proof-of-concept implementation of the probabilistically checkable proof of proximity (PCPP) system described by Ben-Sasson and Sudan. In particular, we implement a PCPP prover and verifier for Reed-Solomon codes; the prover converts an evaluation of a polynomial on a linear set into a valid PCPP, while the verifier queries the evaluation and the PCPP to check that the evaluation is close to a Reed-Solomon codeword. We prove tight bounds on the various parameters associated with the prover and verifier and describe some interesting programmatic issues that arise during their implementation.
MIT-CSAIL-TR-2005-050 Author[s]: Bruno Marnette, Viktor Kuncak, Martin Rinard On Algorithms and Complexity for Sets with Cardinality Constraints [PDF] [PS] August 3, 2005 Typestate systems ensure many desirable properties of imperative programs, including initialization of object fields and correct use of stateful library interfaces. Abstract sets with cardinality constraints naturally generalize typestate properties: relationships between the typestates of objects can be expressed as subset and disjointness relations on sets, and elements of sets can be represented as sets of cardinality one. In addition, sets with cardinality constraints provide a natural language for specifying operations and invariants of data structures. Motivated by these program analysis applications, this paper presents new algorithms and new complexity results for constraints on sets and their cardinalities. We study several classes of constraints and demonstrate a trade-off between their expressive power and their complexity. Our first result concerns a quantifier-free fragment of Boolean Algebra with Presburger Arithmetic. We give a nondeterministic polynomial-time algorithm for reducing the satisfiability of sets with symbolic cardinalities to constraints on constant cardinalities, and give a polynomial-space algorithm for the resulting problem. The best previously existing algorithm runs in exponential space and nondeterministic exponential time. In a quest for more efficient fragments, we identify several subclasses of sets with cardinality constraints whose satisfiability is NP-hard. Finally, we identify a class of constraints that has polynomial-time satisfiability and entailment problems and can serve as a foundation for efficient program analysis. We give a system of rewriting rules for enforcing certain consistency properties of these constraints and show how to extract complete information from constraints in normal form. This result implies the soundness and completeness of our algorithms.
MIT-CSAIL-TR-2005-049 Author[s]: Mythili Vutukuru, Paul Valiant, Swastik Kopparty, Hari Balakrishnan How to Construct a Correct and Scalable iBGP Configuration [PDF] [PS] August 3, 2005 The Border Gateway Protocol (BGP), the current inter domain routing protocol in the Internet, has two modes of operation: eBGP (External BGP), used to exchange routing information between autonomous systems, and iBGP (Internal BGP), used to propagate that information within an autonomous system (AS). This paper focuses on the construction of an iBGP session configuration that guarantees two correctness properties - loop-free forwarding paths and complete visibility to all eBGP-learned best routes - while attempting to minimize the number of iBGP sessions (for scalability) and ensuring that the constructed configuration guarantees the two correctness properties even in the face of link failures and IGP path changes. Our algorithm constructs an iBGP configuration based on route reflectors, a commonly used way to control the number of iBGP sessions. The algorithm, BGPSep, uses the notion of a graph separator, a (small) set of nodes that partition a graph into connected components of roughly equal sizes, recursively applies this idea to the connected components, and produces a route reflector hierarchy and the associated iBGP sessions. We prove that BGPSep guarantees the desired correctness properties, and evaluate an implementation of the BGPSep algorithm on several real-world and simulated network topologies. Across these topologies, we find that the number of iBGP sessions with is a factor of 2.5 to 5 times smaller than with a \"full mesh\" iBGP, while guaranteeing the desired correctness properties.
MIT-CSAIL-TR-2005-048 Author[s]: Gregory Chockler, Nancy Lynch, Sayan Mitra, Joshua Tauber Proving Atomicity: An Assertional Approach [PDF] [PS] July 22, 2005 Atomicity (or linearizability) is a commonly used consistency criterion for distributed services and objects. Although atomic object implementations are abundant, proving that algorithms achieve atomicity has turned out to be a challenging problem. In this paper, we initiate the study of systematic ways of verifying distributed implementations of atomic objects, beginning with read/write objects (registers). Our general approach is to replace the existing operational reasoning about events and partial orders with assertional reasoning about invariants and simulation relations. To this end, we define an abstract state machine that captures the atomicity property and prove correctness of the object implementations by establishing a simulation mapping between the implementation and the specification automata. We demonstrate the generality of our specification by showing that it is implemented by three different read/write register constructions (the message-passing register emulation of Attiya, Bar-Noy and Dolev, its optimized version based on real time, and the shared memory register construction of Vitanyi and Awerbuch), and by a general atomic object implementation based on the Lamport's replicated state machine algorithm.
MIT-CSAIL-TR-2005-047 Author[s]: Barbara Liskov, Rodrigo Rodrigues Byzantine Clients Rendered Harmless [PDF] [PS] July 21, 2005 Byzantine quorum systems have been proposed that work properly even when up to f replicas fail arbitrarily. However, these systems are not so successful when confronted with Byzantine faulty clients. This paper presents novel protocols that provide atomic semantics despite Byzantine clients. Our protocols are the first to handle all problems caused by Byzantine clients. They prevent Byzantine clients from interfering with good clients: bad clients cannot prevent good clients from completing reads and writes, and they cannot cause good clients to see inconsistencies. In addition we also prevent bad clients that have been removed from operation from leaving behind more than a bounded number of writes that could be done on their behalf by a colluder. Our protocols are designed to work in an asynchronous system like the Internet and they are highly efficient. We require 3f +1 replicas, and either two or three phases to do writes; reads normally complete in one phase and require no more than two phases, no matter what the bad clients are doing. We also present strong correctness conditions for systems with Byzantine clients that limit what can be done on behalf of bad clients once they leave the system. Furthermore we prove that our protocols are both safe (they meet those conditions) and live.
MIT-CSAIL-TR-2005-046 Author[s]: Jerry Jun Yokono, Tomaso Poggio Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition [PDF] [PS] July 7, 2005 Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type -- based on a set of oriented Gaussian derivative filters -- are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large feature pool of local features collected from the positive training examples. We describe experiments on face images with excellent recognition rate.
MIT-CSAIL-TR-2005-045 Author[s]: Chou Hung, Gabriel Kreiman, Tomaso Poggio, James J. DiCarlo Ultra-fast Object Recognition from Few Spikes [PDF] [PS] July 6, 2005 Understanding the complex brain computations leading to object recognition requires quantitatively characterizing the information represented in inferior temporal cortex (IT), the highest stage of the primate visual stream. A read-out technique based on a trainable classifier is used to characterize the neural coding of selectivity and invariance at the population level. The activity of very small populations of independently recorded IT neurons (~100 randomly selected cells) over very short time intervals (as small as 12.5 ms) contains surprisingly accurate and robust information about both object ‘identity’ and ‘category’, which is furthermore highly invariant to object position and scale. Significantly, selectivity and invariance are present even for novel objects, indicating that these properties arise from the intrinsic circuitry and do not require object-specific learning. Within the limits of the technique, there is no detectable difference in the latency or temporal resolution of the IT information supporting so-called ‘categorization’ (a.k. basic level) and ‘identification’ (a.k. subordinate level) tasks. Furthermore, where information, in particular information about stimulus location and scale, can also be read-out from the same small population of IT neurons. These results show how it is possible to decode invariant object information rapidly, accurately and robustly from a small population in IT and provide insights into the nature of the neural code for different kinds of object-related information.
MIT-CSAIL-TR-2005-044 Author[s]: Athicha Muthitacharoen, Seth Gilbert, Robert Morris Etna: A Fault-tolerant Algorithm for Atomic Mutable DHT Data [PDF] [PS] June 15, 2005 This paper presents Etna, an algorithm for atomic reads and writes of replicated data stored in a distributed hash table. Etna correctly handles dynamically changing sets of replica hosts, and is optimized for reads, writes, and reconfiguration, in that order. Etna maintains a series of replica configurations as nodes in the system change, using new sets of replicas from the pool supplied by the distributed hash table system. It uses the Paxos protocol to ensure consensus on the members of each new configuration. For simplicity and performance, Etna serializes all reads and writes through a primary during the lifetime of each configuration. As a result, Etna completes read and write operations in only a single round from the primary. Experiments in an environment with high network delays show that Etna's read latency is determined by round-trip delay in the underlying network, while write and reconfiguration latency is determined by the transmission time required to send data to each replica. Etna's write latency is about the same as that of a non-atomic replicating DHT, and Etna's read latency is about twice that of a non-atomic DHT due to Etna assembling a quorum for every read.
MIT-CSAIL-TR-2005-043 Author[s]: Shlomi Dolev, Seth Gilbert, Elad Schiller, Alex Shvartsman, Jennifer Welch Autonomous Virtual Mobile Nodes [PDF] [PS] June 15, 2005 This paper presents a new abstraction for virtual infrastructure in mobile ad hoc networks. An Autonomous Virtual Mobile Node (AVMN) is a robust and reliable entity that is designed to cope with the inherent difficulties caused by processors arriving, leaving, and moving according to their own agendas, as well as with failures and energy limitations. There are many types of applications that may make use of the AVMN infrastructure: tracking, supporting mobile users, or searching for energy sources. The AVMN extends the focal point abstraction in [9] and the virtual mobile node abstraction in [10]. The new abstraction is that of a virtual general-purpose computing entity, an automaton that can make autonomous on-line decisions concerning its own movement. We describe a self-stabilizing implementation of this new abstraction that is resilient to the chaotic behavior of the physical processors and provides automatic recovery from any corrupted state of the system.
MIT-CSAIL-TR-2005-042 Author[s]: David Saff, Shay Artzi, Jeff H. Perkins, Michael D. Ernst Automatic Test Factoring for Java [PDF] [PS] June 8, 2005 Test factoring creates fast, focused unit tests from slow system-wide tests; each new unit test exercises only a subset of the functionality exercised by the system test. Augmenting a test suite with factored unit tests should catch errors earlier in a test run. One way to factor a test is to introduce 'mock' objects. If a test exercises a component T, which interacts with another component E (the 'environment'), the implementation of E can be replaced by a mock. The mock checks that T's calls to E are as expected, and it simulates E's behavior in response. We introduce an automatic technique for test factoring. Given a system test for T and E, and a record of T's and E's behavior when the system test is run, test factoring generates unit tests for T in which E is mocked. The factored tests can isolate bugs in T from bugs in E and, if E is slow or expensive, improve test performance or cost. We have built an implementation of automatic dynamic test factoring for the Java language. Our experimental data indicates that it can reduce the running time of a system test suite by up to an order of magnitude.
MIT-CSAIL-TR-2005-041 Author[s]: Ali Rahimi, Ben Recht, Trevor Darrell Nonlinear Latent Variable Models for Video Sequences [PDF] [PS] June 6, 2005 Many high-dimensional time-varying signals can be modeled as a sequence of noisy nonlinear observations of a low-dimensional dynamical process. Given high-dimensional observations and a distribution describing the dynamical process, we present a computationally inexpensive approximate algorithm for estimating the inverse of this mapping. Once this mapping is learned, we can invert it to construct a generative model for the signals. Our algorithm can be thought of as learning a manifold of images by taking into account the dynamics underlying the low-dimensional representation of these images. It also serves as a nonlinear system identification procedure that estimates the inverse of the observation function in nonlinear dynamic system. Our algorithm reduces to a generalized eigenvalue problem, so it does not suffer from the computational or local minimum issues traditionally associated with nonlinear system identification, allowing us to apply it to the problem of learning generative models for video sequences.
MIT-CSAIL-TR-2005-040 Author[s]: Ravi Kumar, David Liben-Nowell, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins Theoretical Analysis of Geographic Routing in Social Networks [PDF] [PS] June 3, 2005 We introduce a formal model for geographic social networks, and introduce the notion of rank-based friendship, in which the probability that a person v is a friend of a person u is inversely proportional to the number of people w who live closer to u than v does. We then prove our main theorem, showing that rank-based friendship is a sufficient explanation of the navigability of any geographic social network that adheres to it.
MIT-CSAIL-TR-2005-039 Author[s]: Sam Larsen, Rodric Rabbah, Saman Amarasinghe Exploiting Vector Parallelism in Software Pipelined Loops [PDF] [PS] June 3, 2005 An emerging trend in processor design is the incorporation of short vector instructions into the ISA. In fact, vector extensions have appeared in most general-purpose microprocessors. To utilize these instructions, traditional vectorization technology can be used to identify and exploit data parallelism. In contrast, efficient use of a processor\'s scalar resources is typically achieved through ILP techniques such as software pipelining. In order to attain the best performance, it is necessary to utilize both sets of resources. This paper presents a novel approach for exploiting vector parallelism in a software pipelined loop. At its core is a method for judiciously partitioning operations between vector and scalar resources. The proposed algorithm (i) lowers the burden on the scalar resources by offloading computation to the vector functional units, and (ii) partially (or fully) inhibits the optimizations when full vectorization will decrease performance. ! This results in better resource usage and allows for software pipelining with shorter initiation intervals. Although our techniques complement statically scheduled machines most naturally, we believe they are applicable to any architecture that tightly integrates support for ILP and data parallelism. An important aspect of the proposed methodology is its ability to manage explicit communication of operands between vector and scalar instructions. Our methodology also allows for a natural handling of misaligned vector memory operations. For architectures that provide hardware support for misaligned references, software pipelining effectively hides the latency of these potentially expensive instructions. When explicit alignment is required in software, our algorithm accounts for these extra costs and vectorizes only when it is profitable. Finally, our heuristic can take advantage of alignment information where it is available. We evaluate our methodology using several DSP and SPEC FP benchmarks. Compared to software pipelining, our approach is able to achieve an average speedup of 1.30x and 1.18x for the two benchmark sets, respectively.
MIT-CSAIL-TR-2005-038 Author[s]: Florent Segonne, Jean-Philippe Pons, Bruce Fischl, Eric Grimson A Novel Active Contour Framework. Multi-component Level Set Evolution Under Topology Control [PDF] [PS] June 1, 2005 We present a novel framework to exert a topology control over a level set evolution. Level set methods offer several advantages over parametric active contours, in particular automated topological changes. In some applications, where some a priori knowledge of the target topology is available, topological changes may not be desirable. A method, based on the concept of simple point borrowed from digital topology, was recently proposed to achieve a strict topology preservation during a level set evolution. However, topologically constrained evolutions often generate topological barriers that lead to large geometric inconsistencies. We introduce a topologically controlled level set framework that greatly alleviates this problem. Unlike existing work, our method allows connected components to merge, split or vanish under some specific conditions that ensure that no topological defects are generated. We demonstrate the strength of our method on a wide range of numerical experiments.
MIT-CSAIL-TR-2005-037 Author[s]: Christopher J. Taylor Simultaneous Localization and Tracking in Wireless Ad-hoc Sensor Networks [PDF] [PS] May 31, 2005 In this thesis we present LaSLAT, a sensor network algorithm that simultaneously localizes sensors, calibrates sensing hardware, and tracks unconstrained moving targets using only range measurements between the sensors and the target. LaSLAT is based on a Bayesian filter, which updates a probability distribution over the quantities of interest as measurements arrive. The algorithm is distributable, and requires only a constant amount of space with respect to the number of measurements incorporated. LaSLAT is easy to adapt to new types of hardware and new physical environments due to its use of intuitive probability distributions: one adaptation demonstrated in this thesis uses a mixture measurement model to detect and compensate for bad acoustic range measurements due to echoes. We also present results from a centralized Java implementation of LaSLAT on both two- and three-dimensional sensor networks in which ranges are obtained using the Cricket ranging system. LaSLAT is able to localize sensors to within several centimeters of their ground truth positions while recovering a range measurement bias for each sensor and the complete trajectory of the mobile.
MIT-CSAIL-TR-2005-036 Author[s]: Andrea Caponnetto, Lorenzo Rosasco, Ernesto De Vito, Alessandro Verri Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm [PDF] [PS] May 27, 2005 This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotic optimal performances in a minimax sense. The main goal of the present paper is showing how the effective dimension can be replaced by an empirical counterpart while conserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting.
MIT-CSAIL-TR-2005-035 Author[s]: Jia Jane Wu Comparing Visual Features for Morphing Based Recognition [PDF] [PS] May 25, 2005 This thesis presents a method of object classification using the idea of deformable shape matching. Three types of visual features, geometric blur, C1 and SIFT, are used to generate feature descriptors. These feature descriptors are then used to find point correspondences between pairs of images. Various morphable models are created by small subsets of these correspondences using thin-plate spline. Given these morphs, a simple algorithm, least median of squares (LMEDS), is used to find the best morph. A scoring metric, using both LMEDS and distance transform, is used to classify test images based on a nearest neighbor algorithm. We perform the experiments on the Caltech 101 dataset [5]. To ease computation, for each test image, a shortlist is created containing 10 of the most likely candidates. We were unable to duplicate the performance of [1] in the shortlist stage because we did not use hand-segmentation to extract objects for our training images. However, our gain from the shortlist to correspondence stage is comparable to theirs. In our experiments, we improved from 21% to 28% (gain of 33%), while [1] improved from 41% to 48% (gain of 17%). We find that using a non-shape based approach, C2 [14], the overall classification rate of 33.61% is higher than all of the shaped based methods tested in our experiments.
MIT-CSAIL-TR-2005-034 Author[s]: Thade Nahnsen, Ozlem Uzuner, Boris Katz Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection [PDF] [PS] May 19, 2005 We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.
MIT-CSAIL-TR-2005-033 Author[s]: Andrea Caponnetto, Alexander Rakhlin Some Properties of Empirical Risk Minimization Over Donsker Classes [PDF] [PS] May 17, 2005 We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.
MIT-CSAIL-TR-2005-032 Author[s]: Neha Singh A Region-based Architecture for Service-Providing Distributed Systems [PDF] [PS] May 17, 2005 A service-providing system consists of hosts that provide services such as data, content, computational and memory resources and data-based services to other entities in the system. Consumers that wish to use services describe their needs with a set of high-level objectives. In this thesis, we address the problem of locating services in a large-scale distributed system using their descriptions, rather than their addresses. We propose a network architecture that is based on the concept of dividing the service-providing hosts into Regions. A Region is a grouping of elements of the network that share a set of common characteristics and policies. Members of a region manage their interactions with other regions and their elements according to some defined rules and policies. Hosts can be divided into regions based on various properties such as their content, their commercial model or their security characteristics to name a few. The service provided by a region is an ! aggregate of the services provided by all its member hosts. The region-based architecture routes a service request through the network efficiently based on its description and on the advertisements from regions providing services. Division of hosts into a set of independent regions partitions the search space and produces a scalable structure. The architecture also does not impose any rules on the internal organization of regions making the system flexible and dynamic.
MIT-CSAIL-TR-2005-031 Author[s]: Ernesto De Vito, Andrea Caponnetto Risk Bounds for Regularized Least-squares Algorithm with Operator-valued Kernels [PDF] [PS] May 16, 2005 We show that recent results in [3] on risk bounds for regularized least-squares on reproducing kernel Hilbert spaces can be straightforwardly extended to the vector-valued regression setting. We first briefly introduce central concepts on operator-valued kernels. Then we show how risk bounds can be expressed in terms of a generalization of effective dimension.
MIT-CSAIL-TR-2005-030 Author[s]: Stephen McCamant, Greg Morrisett Efficient, Verifiable Binary Sandboxing for a CISC Architecture [PDF] [PS] May 2, 2005 Executing untrusted code while preserving security requires enforcement of memory and control-flow safety policies: untrusted code must be prevented from modifying memory or executing code except as explicitly allowed. Software-based fault isolation (SFI) or \"sandboxing\" enforces those policies by rewriting the untrusted code at the level of individual instructions. However, the original sandboxing technique of Wahbe et al. is applicable only to RISC architectures, and other previous work is either insecure, or has been not described in enough detail to give confidence in its security properties. We present a novel technique that allows sandboxing to be easily applied to a CISC architecture like the IA-32. The technique can be verified to have been applied at load time, so that neither the rewriting tool nor the compiler needs to be trusted. We describe a prototype implementation which provides a robust security guarantee, is scalable to programs of any size, and has low runtime overheads. Further, we give a machine-checked proof that any program approved by the verification algorithm is guaranteed to respect the desired safety property.
MIT-CSAIL-TR-2005-029 Author[s]: Christopher Taylor, Ali Rahimi, Jonathan Bachrach, Howard Shrobe Simultaneous Localization, Calibration, and Tracking in an ad Hoc Sensor Network [PDF] [PS] April 26, 2005 We introduce Simultaneous Localization and Tracking (SLAT), the problem of tracking a target in a sensor network while simultaneously localizing and calibrating the nodes of the network. Our proposed solution, LaSLAT, is a Bayesian filter providing on-line probabilistic estimates of sensor locations and target tracks. It does not require globally accessible beacon signals or accurate ranging between the nodes. When applied to a network of 27 sensor nodes, our algorithm can localize the nodes to within one or two centimeters.
MIT-CSAIL-TR-2005-028 Author[s]: Jacob Eisenstein, Randall Davis Gestural Cues for Sentence Segmentation [PDF] [PS] April 19, 2005 In human-human dialogues, face-to-face meetings are often preferred over phone conversations. One explanation is that non-verbal modalities such as gesture provide additional information, making communication more efficient and accurate. If so, computer processing of natural language could improve by attending to non-verbal modalities as well. We consider the problem of sentence segmentation, using hand-annotated gesture features to improve recognition. We find that gesture features correlate well with sentence boundaries, but that these features improve the overall performance of a language-only system only marginally. This finding is in line with previous research on this topic. We provide a regression analysis, revealing that for sentence boundary detection, the gestural features are largely redundant with the language model and pause features. This suggests that gestural features can still be useful when speech recognition is inaccurate.
MIT-CSAIL-TR-2005-027 Author[s]: Andrea Caponnetto, Ernesto De Vito Fast Rates for Regularized Least-squares Algorithm [PDF] [PS] April 14, 2005 We develop a theoretical analysis of generalization performances of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact, a minimax analysis is performed which shows asymptotic optimality of the above-mentioned criterion.
MIT-CSAIL-TR-2005-026 Author[s]: Jacob Beal Learning From Snapshot Examples [PDF] [PS] April 13, 2005 Examples are a powerful tool for teaching both humans and computers. In order to learn from examples, however, a student must first extract the examples from its stream of perception. Snapshot learning is a general approach to this problem, in which relevant samples of perception are used as examples. Learning from these examples can in turn improve the judgement of the snapshot mechanism, improving the quality of future examples. One way to implement snapshot learning is the Top-Cliff heuristic, which identifies relevant samples using a generalized notion of peaks. I apply snapshot learning with the Top-Cliff heuristic to solve a distributed learning problem and show that the resulting system learns rapidly and robustly, and can hallucinate useful examples in a perceptual stream from a teacherless system
MIT-CSAIL-TR-2005-025 Author[s]: Sara L. Su, Fredo Durand, Maneesh Agrawala De-Emphasis of Distracting Image Regions Using Texture Power Maps [PDF] [PS] April 12, 2005 A major obstacle in photography is the presence of distracting elements that pull attention away from the main subject and clutter the composition. In this article, we present a new image-processing technique that reduces the salience of distracting regions. It is motivated by computational models of attention that predict that texture variation influences bottom-up attention mechanisms. Our method reduces the spatial variation of texture using power maps, high-order features describing local frequency content in an image. We show how modification of power maps results in powerful image de-emphasis. We validate our results using a user search experiment and eye tracking data.
MIT-CSAIL-TR-2005-024 Author[s]: Justin Werfel, Yaneer Bar-Yam, Radhika Nagpal Construction by Robot Swarms Using Extended Stigmergy [PDF] [PS] April 8, 2005 We describe a system in which simple, identical, autonomous robots assemble two-dimensional structures out of identical building blocks. We show that, in a system divided in this way into mobile units and structural units, giving the blocks limited communication abilities enables robots to have sufficient global structural knowledge to rapidly build elaborate pre-designed structures. In this way we extend the principle of stigmergy (storing information in the environment) used by social insects, by increasing the capabilities of the blocks that represent that environmental information. As a result, arbitrary solid structures can be built using a few fixed, local behaviors, without requiring construction to be planned out in detail.
MIT-CSAIL-TR-2005-023 Author[s]: Nancy Lynch, Sayan Mitra, Tina Nolte Motion Coordination Using Virtual Nodes [PDF] [PS] April 6, 2005 We describe how a virtual node abstraction layer can be used to coordinate the motion of real mobile nodes in a region of 2-space. In particular, we consider how nodes in a mobile ad hoc network can arrange themselves along a predetermined curve in the plane, and can maintain themselves in such a configuration in the presence of changes in the underlying mobile ad hoc network, specifically, when nodes may join or leave the system or may fail. Our strategy is to allow the mobile nodes to implement a virtual layer consisting of mobile client nodes, stationary Virtual Nodes (VNs) for predetermined zones in the plane, and local broadcast communication. The VNs coordinate among themselves to distribute the client nodes between zones based on the length of the curve through those zones, while each VN directs its zone's local client nodes to move themselves to equally spaced locations on the local portion of the target curve.
MIT-CSAIL-TR-2005-022 Author[s]: Viktor Kuncak, Daniel Jackson On Relational Analysis of Algebraic Datatypes [PDF] [PS] April 5, 2005 We present a technique that enables the use of finite model finding to check the satisfiability of certain formulas whose intended models are infinite. Such formulas arise when using the language of sets and relations to reason about structured values such as algebraic datatypes. The key idea of our technique is to identify a natural syntactic class of formulas in relational logic for which reasoning about infinite structures can be reduced to reasoning about finite structures. As a result, when a formula belongs to this class, we can use existing finite model finding tools to check whether the formula holds in the desired infinite model.
MIT-CSAIL-TR-2005-021 Author[s]: Ittai Abraham, Gregory Chockler, Idit Keidar, Dahlia Malkhi Wait-free Regular Storage from Byzantine Components [PDF] [PS] April 5, 2005 We present a simple, efficient, and self-contained construction of a wait-free regular register from Byzantine storage components. Our construction utilizes a novel building block, called 1-regular register, which can be implemented from Byzantine fault-prone components with the same round complexity as a safe register, and with only a slight increase in storage space.
MIT-CSAIL-TR-2005-020 Author[s]: Kilian M. Pohl, John Fisher, W. Eric L. Grimson, William M. Wells An Expectation Maximization Approach for Integrated Registration, Segmentation, and Intensity Correction [PDF] [PS] April 1, 2005 This paper presents a statistical framework which combines the registration of an atlas with the segmentation of MR images. We use an Expectation Maximization-based algorithm to find a solution within the model, which simultaneously estimates image inhomogeneities, anatomical labelmap, and a mapping from the atlas to the image space. An example of the approach is given for a brain structure-dependent affine mapping approach. The algorithm produces high quality segmentations for brain tissues as well as their substructures. We demonstrate the approach on a set of 30 brain MR images. In addition, we show that the approach performs better than similar methods which separate the registration from the segmentation problem.
MIT-CSAIL-TR-2005-019 Author[s]: Lior Wolf, Stanley Bileschi Combining Variable Selection with Dimensionality Reduction [PDF] [PS] March 30, 2005 This paper bridges the gap between variable selection methods (e.g., Pearson coefficients, KS test) and dimensionality reduction algorithms (e.g., PCA, LDA). Variable selection algorithms encounter difficulties dealing with highly correlated data, since many features are similar in quality. Dimensionality reduction algorithms tend to combine all variables and cannot select a subset of significant variables. Our approach combines both methodologies by applying variable selection followed by dimensionality reduction. This combination makes sense only when using the same utility function in both stages, which we do. The resulting algorithm benefits from complex features as variable selection algorithms do, and at the same time enjoys the benefits of dimensionality reduction.
MIT-CSAIL-TR-2005-018 Author[s]: Luis Rademacher, Santosh Vempala, Grant Wang Matrix Approximation and Projective Clustering via Iterative Sampling [PDF] [PS] March 29, 2005 We present two new results for the problem of approximating a given real m by n matrix A by a rank-k matrix D, where k < min{m, n}, so as to minimize ||A-D||_F^2. It is known that bysampling O(k/eps) rows of the matrix, one can find a low-rank approximation with additive error eps||A||_F^2. Our first result shows that with adaptive sampling in t rounds and O(k/eps) samples in each round, the additive error drops exponentially as eps^t; the computation time is nearly linear in the number of nonzero entries. This demonstrates that multiple passes can be highly beneficial for a natural (and widely studied) algorithmic problem. Our second result is that there exists a subset of O(k^2/eps) rows such that their span contains a rank-k approximation with multiplicative (1+eps) error (i.e., the sum of squares distance has a small \"core-set\" whose span determines a good approximation). This existence theorem leads to a PTAS for the following projective clustering probl! em: Given a set of points P in R^d, and integers k,j, find a set of j subspaces F_1,...,F_j, each of dimension at most k, that minimize \\sum_{p \\in P} min_i d(p,F_i)^2.
MIT-CSAIL-TR-2005-017 Author[s]: Kristen Grauman, Trevor Darrell Pyramid Match Kernels: Discriminative Classification with Sets of Image Features [PDF] [PS] March 17, 2005 Discriminative learning is challenging when examples are sets of local image features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel similarity measure for unordered set inputs must somehow solve for correspondences -- generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space. This ``pyramid match" computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be dramatically faster than current approaches.
MIT-CSAIL-TR-2005-016 Author[s]: Leonid Taycher, John W. Fisher III, Trevor Darrell Combining Object and Feature Dynamics in Probabilistic Tracking [PDF] [PS] March 2, 2005 Objects can exhibit different dynamics at different scales, a property that is often exploited by visual tracking algorithms. A local dynamic model is typically used to extract image features that are then used as inputs to a system for tracking the entire object using a global dynamic model. Approximate local dynamics may be brittle---point trackers drift due to image noise and adaptive background models adapt to foreground objects that become stationary---but constraints from the global model can make them more robust. We propose a probabilistic framework for incorporating global dynamics knowledge into the local feature extraction processes. A global tracking algorithm can be formulated as a generative model and used to predict feature values that influence the observation process of the feature extractor. We combine such models in a multichain graphical model framework. We show the utility of our framework for improving feature tracking and thus shape and motion estimates in a batch factorization algorithm. We also propose an approximate filtering algorithm appropriate for online applications, and demonstrate its application to problems such as background subtraction, structure from motion and articulated body tracking.
MIT-CSAIL-TR-2005-015 Author[s]: Benjamin Balas, Pawan Sinha Receptive Field Structures for Recognition [PDF] [PS] March 1, 2005 Localized operators, like Gabor wavelets and difference-of-Gaussian filters, are considered to be useful tools for image representation. This is due to their ability to form a ‘sparse code’ that can serve as a basis set for high-fidelity reconstruction of natural images. However, for many visual tasks, the more appropriate criterion of representational efficacy is ‘recognition’, rather than ‘reconstruction’. It is unclear whether simple local features provide the stability necessary to subserve robust recognition of complex objects. In this paper, we search the space of two-lobed differential operators for those that constitute a good representational code under recognition/discrimination criteria. We find that a novel operator, which we call the ‘dissociated dipole’ displays useful properties in this regard. We describe simple computational experiments to assess the merits of such dipoles relative to the more traditional local operators. The results suggest that non-local operators constitute a vocabulary that is stable across a range of image transformations.
MIT-CSAIL-TR-2005-014 Author[s]: Russ Cox, William Josephson File Synchronization with Vector Time Pairs [PDF] [PS] February 28, 2005 Vector time pairs are a new method for tracking synchronization metadata. A vector time pair consists of two vector times: one tracking file modification history and one tracking file synchronization history. Because the vector times are maintained separately and used for different purposes, different algorithms and optimizations can be applied to each. As a result, vector time pairs impose no restriction on synchronization patterns, never falsely detect conflicts, require no space to store deletion notices, require network bandwidth proportional only to the number of files changed, and support partial synchronizations. No other current synchronization method has all these properties. Results from an implementation of vector time pairs in a new user-level file synchronizer called Tra confirm the benefits of vector time pairs.
MIT-CSAIL-TR-2005-013 Author[s]: Paul Attie, Rachid Guerraoui, Petr Kouznetsov, Nancy Lynch, Sergio Rajsbaum Impossibility of Boosting Distributed Service Resilience [PDF] [PS] February 25, 2005 We prove two theorems saying that no distributed system in which processes coordinate using reliable registers and f-resilient services can solve the consensus problem in the presence of f+1 undetectable process stopping failures. (A service is f-resilient if it is guaranteed to operate as long as no more than f of the processes connected to it fail.) Our first theorem assumes that the given services are atomic objects, and allows any connection pattern between processes and services. In contrast, we show that it is possible to boost the resilience of systems solving problems easier than consensus: the k-set consensus problem is solvable for 2k-1 failures using 1-resilient consensus services. The first theorem and its proof generalize to the larger class of failure-oblivious services. Our second theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services; however, it requires that each failure-aware service be connected to all processes. Thus, f+1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus if arbitrary patterns of connectivity are allowed between processes and failure-aware services: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors.
MIT-CSAIL-TR-2005-012 Author[s]: Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman Discovering Object Categories in Image Collections [PDF] [PS] February 25, 2005 Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of Fergus et al.
MIT-CSAIL-TR-2005-011 Author[s]: Reina Riemann, Keith Winstein Improving 802.11 Range with Forward Error Correction [PDF] [PS] February 24, 2005 The ISO/IEC 8802-11:1999(E) specification uses a 32-bit CRC for error detection and whole-packet retransmissions for recovery. In long-distance or high-interference links where the probability of a bit error is high, this strategy results in excessive losses, because any erroneous bit causes an entire packet to be discarded. By ignoring the CRC and adding redundancy to 802.11 payloads in software, we achieved substantially reduced loss rates on indoor and outdoor long-distance links and extended line-of-sight range outdoors by 70 percent.
MIT-CSAIL-TR-2005-010 Author[s]: Tim Abbott, Daniel Kane, Paul Valiant Complexity of Finding Nash Equilibria in 0-1 Bimatrix Games [PDF] [PS] February 8, 2005 We exhibit a polynomial reduction from the problem of finding a Nash equilibrium of a bimatrix game with rational coefficients to the problem of finding a Nash equilibrium of a bimatrix game with 0-1 coefficients.
MIT-CSAIL-TR-2005-009 Author[s]: Nick Feamster, Ramesh Johari, Hari Balakrishnan Stable Policy Routing with Provider Independence [PDF] [PS] February 8, 2005 Thousands of competing autonomous systems (ASes) must cooperate with each other to provide global Internet connectivity. These ASes encode various economic, business, and performance decisions in their routing policies. The current interdomain routing system enables ASes to express policy using rankings that determine how each router in an AS orders the different routes to a destination, and filters that determine which routes are hidden from each neighboring AS. Since the Internet is composed of many independent, competing networks, the interdomain routing system should allow providers to set their rankings independently, and to have no constraints on allowed filters. This paper studies routing protocol stability under these constraints. We first demonstrate that certain rankings that are commonly used in practice may not ensure routing stability. We then prove that, with ranking independence and unrestricted filtering, guaranteeing that the routing system will converge to a stable path assignment essentially requires ASes to rank routes based on AS-path lengths. Finally, we discuss the implications of these results for the future of interdomain routing.
MIT-CSAIL-TR-2005-008 Author[s]: Benjamin Balas Using Computational Models to Study Texture Representations in the Human Visual System [PDF] [PS] February 7, 2005 Traditionally, human texture perception has been studied using artificial textures made of random-dot patterns or abstract structured elements. At the same time, computer algorithms for the synthesis of natural textures have improved dramatically. The current study seeks to unify these two fields of research through a psychophysical assessment of a particular computational model, thus providing a sense of what image statistics are most vital for representing a range of natural textures. We employ Portilla and Simoncelli’s 2000 model of texture synthesis for this task (a parametric model of analysis and synthesis designed to mimic computations carried out by the human visual system). We find an intriguing interaction between texture type (periodic v. structured) and image statistics (autocorrelation function and filter magnitude correlations), suggesting different processing strategies may be employed for these two texture families under pre-attentive viewing.
MIT-CSAIL-TR-2005-007 Author[s]: Gerald Jay Sussman, Jack Wisdom Functional Differential Geometry [PDF] [PS] February 2, 2005 Differential geometry is deceptively simple. It is surprisingly easy to get the right answer with unclear and informal symbol manipulation. To address this problem we use computer programs to communicate a precise understanding of the computations in differential geometry. Expressing the methods of differential geometry in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.
MIT-CSAIL-TR-2005-006 Author[s]: Matt Lepinski, Sergei Izmalkov The Security Power of the Ballot Box [PDF] [PS] February 2, 2005 We show that any function F can be securely evaluated by a protocol with ballots and a ballot box. That is, N mutually suspicious players, each player possessing a secret input, can use ballots and a ballot box to jointly evaluate F on their secret inputs so that (no matter how many players may collude and deviate from their prescribed instructions, and no matter how long they compute!) each player learns exactly the output of the function with the same privacy and correctness as if all players privately handed their secret inputs to a trusted party, who privately evaluates F and privately returns the outputs to each player. Our protocol is (1) efficient, (2) enjoys perfect privacy, (3) guarantees perfect correctness, (4) is universally composable, and (5) is collusion-free even for games with secret actions.
MIT-CSAIL-TR-2005-005 Author[s]: Attila Kondacs Determining Articulator Configuration in Voiced Stop Consonants by Matching Time-domain Patterns in Pitch Periods [PDF] [PS] January 28, 2005 In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns 2. articulator configuration trajectories 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions with false positive rates 2.9%, 8.7% and 2.6% respectively.
MIT-CSAIL-TR-2005-004 Author[s]: Shlomi Dolev, Seth Gilbert, Limor Lahiani, Nancy Lynch, Tina Nolte Virtual Stationary Automata for Mobile Networks [PDF] [PS] January 21, 2005 We define a programming abstraction for mobile networks called the Virtual Stationary Automata programming layer, consisting of real mobile clients, virtual timed I/O automata called virtual stationary automata (VSAs), and a communication service connecting VSAs and client nodes. The VSAs are located at prespecified regions that tile the plane, defining a static virtual infrastructure. We present a self-stabilizing algorithm to emulate a VSA using the real mobile nodes that are currently residing in the VSA’s region. We also describe several examples of applications whose implementations benefit from the simplicity obtained through use of the VSA abstraction.
MIT-CSAIL-TR-2005-003 Author[s]: Jacob Beal, Gerald Sussman Biologically-Inspired Robust Spatial Programming [PDF] [PS] January 18, 2005 Inspired by the robustness and flexibility of biological systems, we are developing linguistic and programming tools to allow us to program spatial systems populated by vast numbers of unreliable components interconnected in unknown, irregular, and time-varying ways. We organize our computations around geometry, making the fact that our system is made up of discrete individuals implicit. Geometry allows us to specify requirements in terms of the behavior of the space occupied by the aggregate rather than the behavior of individuals, thereby decreasing complexity. So we describe the behavior of space explicitly, abstracting away the discrete nature of the components. As an example, we present the Amorphous Medium Language, which describes behavior in terms of homeostatic maintenance of constraints on nested regions of space.
MIT-CSAIL-TR-2005-002 Author[s]: Percy Liang, Nati Srebro How Much of a Hypertree can be Captured by Windmills? [PDF] [PS] January 3, 2005 Current approximation algorithms for maximum weight hypertrees find heavy windmill farms, and are based on the fact that a constant ratio (for constant width $k$) of the weight of a $k$-hypertree can be captured by a $k$-windmill farm. However, the exact worst case ratio is not known and is only bounded to be between $1/(k+1)!$ and $1/(k+1)$. We investigate this worst case ratio by searching for weighted hypertrees that minimize the ratio of their weight that can be captured with a windmill farm. To do so, we use a novel approach in which a linear program is used to find ``bad'' inputs to a dynamic program.
MIT-CSAIL-TR-2005-001 Author[s]: Percy Liang, Nati Srebro A Dynamic Data Structure for Checking Hyperacyclicity [PDF] [PS] January 3, 2005 We present a dynamic data structure that keeps track of an acyclic hypergraph (equivalently, a triangulated graph) and enables verifying that adding a candidate hyperedge (clique) will not break the acyclicity of the augmented hypergraph. This is a generalization of the use of Tarjan's Union-Find data structure for maintaining acyclicity when augmenting forests, and the amortized time per operation has a similar almost-constant dependence on the size of the hypergraph. Such a data structure is useful when augmenting acyclic hypergraphs, e.g. in order to greedily construct a high-weight acyclic hypergraph. In designing this data structure, we introduce a hierarchical decomposition of acyclic hypergraphs that aid in understanding hyper-connectivity, and introduce a novel concept of a hypercycle which is excluded from acyclic hypergraphs.
|