Our Work

We focus on solving big data problems that impact the Intelligence Community and national security.

All of our completed work is available on GitHub.

Projects

Magnolia

In Progress

Speech Isolation using Deep Learning

Speech, Audio, Deep Learning, TensorFlow, Neural Networks

At cocktail parties, it is often difficult to make out what someone is saying because several people are talking at once. Humans do a decent job of understanding anyway, in part because we have two ears that can determine the direction of a speaker. The same idea can be applied to microphones: use many microphones to resolve many speakers and isolate their speech signals. Current technologies use expensive microphone arrays, are limited in the environments they can operate in, or can isolate only a limited number of speakers. Magnolia proposes to use deep learning to break these constraints, isolating speech with commercial off-the-shelf (COTS) microphones in a variety of environmental conditions.
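
The core idea behind most deep separation models is a time-frequency mask applied to a spectrogram of the mixed signal. The sketch below illustrates that masking step with an ideal binary mask computed from two known toy signals; it stands in for the mask a trained network would predict and is not Magnolia's actual model.

```python
# A minimal time-frequency masking sketch: an "ideal" mask computed from two
# known toy signals stands in for the mask a deep network would predict.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
speaker_a = np.sin(2 * np.pi * 220 * t)   # placeholder "speech" signals
speaker_b = np.sin(2 * np.pi * 470 * t)
mixture = speaker_a + speaker_b           # what a single microphone hears

_, _, Z_mix = stft(mixture, fs=fs, nperseg=512)
_, _, Z_a = stft(speaker_a, fs=fs, nperseg=512)
_, _, Z_b = stft(speaker_b, fs=fs, nperseg=512)

# Keep each time-frequency bin where speaker A dominates the mixture.
mask_a = (np.abs(Z_a) > np.abs(Z_b)).astype(float)

# Apply the mask to the mixture spectrogram and invert back to a waveform.
_, a_estimate = istft(mask_a * Z_mix, fs=fs, nperseg=512)
print(a_estimate.shape)
```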

Pelops

In Progress

Car Recognition using Deep Learning

Python, TensorFlow, Keras, Docker

Launch GitHub

Cars are ubiquitous in urban life. They are uniquely identifiable via their license plates, but unfortunately license plates are only visible from certain angles and, even then, are hard to read at a distance. Pelops will use deep learning methods to automatically identify cars by their large-scale features: color, shape, light configuration, and so on. Pelops will also attempt to re-identify specific cars that are seen multiple times, enabling automatic pattern-of-life discovery.
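
As a rough illustration of the re-identification idea (not Pelops's actual pipeline), the sketch below embeds car images with an off-the-shelf pretrained CNN in Keras and compares the embeddings with cosine similarity; the model choice and image paths are assumptions made for the example.

```python
# Embed each car image with a pretrained CNN, then compare embeddings with
# cosine similarity; a high score suggests the same vehicle seen twice.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(batch)[0]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical image paths for two sightings of (possibly) the same car.
print(cosine(embed("car_sighting_1.jpg"), embed("car_sighting_2.jpg")))
```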

Altair

In Progress

Recommending Code to Coders

Jupyter Notebooks, Docker, Spark, Mesos, Python

Software development and data science teams typically consolidate previous projects into a common repository, but fostering source code re-use and algorithm discoverability remains a vexing challenge. Altair will apply collaborative filtering and content-based filtering recommender techniques from Lab41’s previous Hermes challenge to galleries of Jupyter notebooks used by technical teams. The main goal is to identify similarities in user activity and among source code segments so that a recommender system can predict a meaningful overlap between a user’s needs and code in the repository that the user has not yet discovered.
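
For a flavor of the content-based half, the sketch below treats each notebook's source as a document, vectorizes it with TF-IDF using scikit-learn, and ranks notebooks by similarity to code the user is currently writing; the notebook contents are hypothetical and this is not Altair's implementation.

```python
# Content-based recommendation over a (hypothetical) notebook gallery:
# vectorize notebook source with TF-IDF and rank by similarity to new code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

notebooks = {
    "etl_pipeline": "import pandas as pd\ndf = pd.read_csv('data.csv')",
    "spark_job": "from pyspark import SparkContext\nsc = SparkContext()",
    "plotting": "import matplotlib.pyplot as plt\nplt.plot(x, y)",
}

vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_][A-Za-z0-9_]+")
corpus_matrix = vectorizer.fit_transform(notebooks.values())

query = "df = pd.read_parquet('events.parquet')"   # code the user is writing now
scores = cosine_similarity(vectorizer.transform([query]), corpus_matrix)[0]

for name, score in sorted(zip(notebooks, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.2f}")
```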

Poseidon

Complete

Software Defined Network Situational Awareness

Software Defined Networking (SDN), GPUs, Python, Docker, Spark

Launch GitHub

This is a joint challenge between two IQT Labs: Lab41 and Cyber Reboot. Current software defined networking offerings lack a tangible security emphasis, much less methods to enhance operational security. Without situational awareness and context, defending a network remains a difficult proposition. This challenge will use SDN and machine learning to determine what is on the network and what it is doing, helping sponsors leverage SDN to gain situational awareness and better defend their networks.
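
The machine learning half of that question might look something like the sketch below: a classifier that guesses what a device is from summary features of its network flows. The features and labels are synthetic placeholders, not Poseidon's actual feature set.

```python
# Classify what a device is from summary features of its network flows.
# Features and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Columns: mean packet size, flows per minute, fraction of DNS traffic.
X = rng.random((300, 3))
y = rng.integers(0, 3, size=300)   # 0=workstation, 1=printer, 2=server (made up)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```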

Attalos

Complete

Multimodal Joint Vector Representations

TensorFlow, Neon, Docker, Python

Launch GitHub

Current machine learning methods focus on classifying items into one of many classes. These techniques are often trained on one type of data (e.g., images) but ignore other information in the dataset (e.g., tags, metadata, etc.). The Attalos Challenge is focused on building representations of images, text, and social networks that leverage all of this information together. Doing so will enable training classifiers that work across a variety of datasets.
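
One way to picture a joint vector space is tag retrieval: project an image embedding into the same space as word vectors and rank candidate tags by similarity. The sketch below does that with a random placeholder projection standing in for the learned one; it illustrates the idea rather than Attalos's model.

```python
# Rank candidate tags for an image by similarity in a shared vector space.
# The projection matrix W would normally be learned; a random matrix is used
# here purely as a placeholder.
import numpy as np

rng = np.random.default_rng(0)
tag_vectors = {tag: rng.standard_normal(50) for tag in ["car", "dog", "beach", "city"]}

image_features = rng.standard_normal(2048)   # e.g., the output of a CNN
W = rng.standard_normal((50, 2048))          # placeholder for a learned projection
image_embedding = W @ image_features

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(tag_vectors, key=lambda tag: -cosine(image_embedding, tag_vectors[tag]))
print(ranked)   # tags ordered by predicted relevance
```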

Gestalt

Complete

Visual Data Story Telling

Vega, Lyra, Cognitive and Perceptual Principles, Human Centered Design, Front-End Technologies

Launch GitHub

This challenge will employ various data visualization tools and user experience frameworks to construct cohesive data stories focused on communicating ripple-effect scenarios. Several event-based datasets from different disciplines will serve as the basis for data story development. Lab41 will develop an optimal front-end visualization development stack in which user experience is a driver. Our ultimate goal is to create a roadmap for how to approach visual data stories from technical considerations to user engagement.

Pythia

Complete

Natural Language Processing & Text Classification

Python, TensorFlow, Docker, Neon

Launch GitHub

Pythia discovers new and useful information in large, dynamic document stores. It constructs systems to measure and locate new information in a document as it is ingested into a corpus, and explores predictive analytics that make existing structured metadata more informative by modeling document content.
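
A minimal version of one such measurement, assuming a TF-IDF representation rather than Pythia's actual models, scores an incoming document by its maximum similarity to the existing corpus; low similarity suggests new information.

```python
# Score an incoming document by its maximum TF-IDF similarity to the corpus;
# a low maximum similarity suggests the document carries new information.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "quarterly budget report for the engineering division",
    "meeting notes on the new data center build-out",
    "engineering division headcount and hiring plan",
]

vectorizer = TfidfVectorizer()
corpus_matrix = vectorizer.fit_transform(corpus)

incoming = "incident report describing a previously unseen intrusion technique"
similarities = cosine_similarity(vectorizer.transform([incoming]), corpus_matrix)[0]

novelty = 1.0 - similarities.max()
print(f"novelty score: {novelty:.2f}")
```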

MagicHour

Complete

Scalable Security Log File Ingest and Analysis

Jupyter, Spark, Mesos, Python

Launch GitHub

This challenge will evaluate text clustering machine learning algorithms and graph modeling for scalable system log ingest and analysis. Lab41 will create a solution that can automatically identify and parse multiple log file formats, obviating the need to write a specialized parser for each new type. Our ultimate goal is to transform disparate text-based event log content into a graph model for advanced analytics and reduced storage requirements.
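
The sketch below shows a simplified, format-agnostic take on the parsing step: normalize the variable parts of each line and group lines that share a template. Real log formats are messier, and this is not MagicHour's implementation.

```python
# Normalize the variable parts of each log line (IPs, hex IDs, numbers) and
# group lines that share the resulting template.
import re
from collections import defaultdict

NORMALIZERS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def template(line):
    for pattern, token in NORMALIZERS:
        line = pattern.sub(token, line)
    return line

logs = [
    "Failed login from 10.0.0.5 port 52144",
    "Failed login from 10.0.0.9 port 41022",
    "Connection closed by 192.168.1.7 port 22",
]

groups = defaultdict(list)
for line in logs:
    groups[template(line)].append(line)

for tmpl, members in groups.items():
    print(f"{len(members):3d}  {tmpl}")
```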

D*Script

Complete

Identifying Authorship From Images of Unstructured Handwriting

TensorFlow, Neon, Torch, Caffe, Theano

Launch GitHub

This challenge will evaluate the potential of neural networks to recognize authorship across a variety of unstructured handwriting images. The data will span different document types, paper textures, pens, and degrees of writer variation. We will implement state-of-the-art techniques that have shown promise in computer vision, including visual attention (which features to focus on) and sequential modeling (the order in which writers produce pen strokes).

Hermes

Complete

Recommender System Analysis

Jupyter, Spark, Mesos, Python

Launch GitHub

Hermes will compare the results of multiple recommender systems on a variety of datasets. These datasets include common, conventional datasets traditionally used in recommender systems: movies, books, and news. However, the challenge will also explore programmatic datasets from GitHub and data from internal sources. Each of these datasets will then be run through a variety of recommender systems so we can compare and contrast a wide range of performance metrics.
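
A stripped-down version of that comparison harness, assuming toy data and two trivial baseline predictors rather than the real Spark-based models, is sketched below to show how the metrics line up.

```python
# Score two trivial rating predictors on the same held-out ratings with RMSE.
import numpy as np

# (user, item, rating) triples split into train and test sets.
train = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
test = [(2, 0, 4.0), (1, 1, 3.0)]

def global_mean_predictor(train):
    mean = float(np.mean([r for _, _, r in train]))
    return lambda user, item: mean

def item_mean_predictor(train):
    sums, counts = {}, {}
    for _, item, rating in train:
        sums[item] = sums.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    fallback = float(np.mean([r for _, _, r in train]))
    return lambda user, item: sums[item] / counts[item] if item in sums else fallback

def rmse(predict, test):
    return float(np.sqrt(np.mean([(predict(u, i) - r) ** 2 for u, i, r in test])))

for name, build in [("global mean", global_mean_predictor),
                    ("item mean", item_mean_predictor)]:
    print(f"{name}: RMSE = {rmse(build(train), test):.3f}")
```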

Sunny-Side-Up

Complete

Deep Learning Sentiment Analysis

Torch, Caffe, Theano, Pylearn2, Neon, Lua, Python, Docker, Spark, GPUs

Launch GitHub

This challenge will evaluate the feasibility of using architectures such as Convolutional and Recurrent Neural Networks to classify the positive, negative, or neutral sentiment of Twitter messages towards a specific topic. The ultimate goal is to help government sponsors better characterize opinions expressed towards topics and events of national security importance.
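
For a sense of what such a classifier involves, a compact convolutional text classifier for three-way sentiment is sketched below. The challenge itself compared several frameworks (Torch, Caffe, Theano, Neon); Keras is used here only for brevity, and the vocabulary setting is an assumption.

```python
# A compact convolutional text classifier for three-way sentiment
# (positive / negative / neutral); vocabulary size is an assumption.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # tokens kept after tokenizing the tweets (assumed)

model = models.Sequential([
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=128),
    layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(padded_token_ids, sentiment_labels, ...) would train on labeled tweets.
```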

See how we did it

Soft Boiled

Complete

Geo-Inference of Social Media Data

IPython, Spark, Mesos, Python

Launch GitHub

This challenge will employ various geospatial inference methods to determine the location of Twitter users. Lab41 will create and evaluate novel network-based and content-based approaches to inferring where users are located when they post a Tweet.
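
A bare-bones version of the network-based idea, shown below, estimates an unlocated user's position from the known coordinates of accounts they interact with, using a coordinate-wise median as a simple outlier-robust choice; it is illustrative rather than Soft Boiled's actual method.

```python
# Estimate an unlocated user's position from the known coordinates of the
# accounts they interact with, via a coordinate-wise median (outlier-robust).
import numpy as np

# (latitude, longitude) of contacts whose home locations are already known.
friend_locations = np.array([
    [38.90, -77.04],   # Washington, DC
    [38.88, -77.10],   # Arlington, VA
    [40.71, -74.01],   # New York, NY (an outlying contact)
])

estimate = np.median(friend_locations, axis=0)
print("estimated (lat, lon):", estimate)
```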

Circulo

Complete

Community Detection

Python, igraph, SNAP, sklearn

Launch GitHub

Circulo is a Python framework to evaluate community detection algorithms. The framework calculates a variety of quantitative metrics on each resulting community. This data can be used to draw conclusions about algorithm performance and efficacy.
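
A tiny version of that evaluation loop, using igraph's built-in karate-club graph and only one metric (modularity), might look like this:

```python
# Run two community detection algorithms from igraph on the same graph and
# compare one quantitative metric (modularity) for each partition.
import igraph as ig

graph = ig.Graph.Famous("Zachary")   # classic karate-club benchmark graph

candidates = {
    "multilevel (Louvain)": graph.community_multilevel(),
    "walktrap": graph.community_walktrap().as_clustering(),
}

for name, clustering in candidates.items():
    print(f"{name}: {len(clustering)} communities, modularity={clustering.modularity:.3f}")
```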

See how we did it

Skyline

Complete

Streaming Updates to Graph Databases

Python, TitanDB

Launch GitHub

Lab41 conducted a market survey to assess the feature sets of existing open source graph databases and graph analytics platforms. We wanted to determine which would be most suitable for processing streaming updates to a large collection of graphs and triggering notifications when those updates cause certain conditions to be met or cease to be met.

See how we did it

ipython-spark-docker

Complete

Big Data

Spark, Docker, IPython

Launch GitHub

This challenge explored how to deploy an Apache Spark cluster driven by IPython notebooks, with each component running in a Docker container. By using IPython as the interface, we were able to carry out a variety of data processing, machine learning, and visualization tasks with several data analysis tools and libraries.
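
A notebook cell in such a deployment might look like the sketch below: attach to the Spark master and run a distributed word count. The master URL and input path are placeholders for whatever a given deployment exposes.

```python
# Attach the notebook to the Spark master and run a distributed word count.
# The master URL and input path are placeholders for a given deployment.
from pyspark import SparkContext

sc = SparkContext(master="spark://spark-master:7077", appName="notebook-demo")

counts = (sc.textFile("hdfs:///data/sample.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

print(counts.takeOrdered(10, key=lambda kv: -kv[1]))   # ten most frequent words
sc.stop()
```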

See how we did it

Rio

Complete

Data Visualization

Gephi, Tinkerpop2

Launch GitHub

Rio enabled visualization of large-scale and streaming graphs. We employed Blueprints, an abstract specification for graphs, and Gephi, a prominent graph visualization package, to enable cross-interface interactions. By connecting the two, end users could use Gephi on Blueprints-enabled datastores such as the Titan Distributed Graph Database.

See how we did it

Dendrite

Complete

Graph Analytics

Titan Graph Database; GraphLab, JUNG Java Framework, Faunus Graph Engine, ElasticSearch, Rexster Graph Server, SpringMVC, AngularJS, Hadoop, HBase, BerkeleyDB

Launch GitHub

Dendrite illustrated how to use graph storage and analytics within a shared environment. Lab41 borrowed inspiration from distributed version control systems, such as Git, to provide a user interface for project management and collaboration around graph analytics.

See how we did it

Redwood

Complete

File Anomaly Detection

Python, sklearn

Launch GitHub

Finding files of interest in large data collections is difficult for forensic analysts given the time and resources required. Redwood identified files belonging to a known class within a larger collection by evaluating how strongly each file is associated with that class.
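
One way to sketch that scoring idea, assuming made-up numeric metadata features and scikit-learn's IsolationForest rather than Redwood's actual approach, is shown below.

```python
# Fit an anomaly detector on metadata features of files known to belong to the
# class of interest, then rank unknown files by how well they fit that class.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: file size (KB), content entropy, days since last modification.
known_class = rng.normal(loc=[200, 4.0, 30], scale=[50, 0.5, 10], size=(500, 3))
candidates = np.array([
    [210, 4.1, 25],    # looks like the known class
    [9000, 7.8, 400],  # does not
])

detector = IsolationForest(random_state=0).fit(known_class)
print(detector.score_samples(candidates))   # higher = more typical of the class
```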

Interested in participating?

Join us on any of our In Progress projects, or on our next challenge!

Work with us

Have any interesting techniques or challenges we should consider?

We’d love to hear from you.

Talk with us


Next: Process

Get more insight into how we work, and where you fit in.

Let's Go