Data Scientist Intern in McLean, VA

SphereOI is a high-performance studio with exceptional data scientists, engineers, and product designers. We value digital innovation that challenges the status quo by making products that are meaningful for government and commercial customers. There’s no checklist mentality for product development at SphereOI. Our innovation follows a North Star vision and design with Strong Centers that keeps each product authentic to what is most meaningful to the customer. By centering our effort on what is most meaningful, we deliver transformative innovation.

SphereOI is seeking a Data Scientist Intern to work in our analytics studio in McLean, VA, where our team develops and operationalizes advanced analytic solutions for government and commercial customers. In this role, you will be a contributor on an established team tasked with developing analytic models and algorithm solution pipelines using R, Python, or Java.

What You Will Be Doing

  • Perform data reduction and normalization
  • Participate in a weekly “Demo Thursday” where the team demonstrates work-in-progress
  • Develop, enhance, test and evaluate algorithms for data processing and exploitation
  • Develop and conduct experiments for performance validation

What You Need for this Position

  • Nearing completion of a degree in Math, Statistics, Data Science, Physics or related technical field
  • Hands-on experience developing statistical, mathematical, and predictive models using an analysis platform such as R or Python
  • Knowledge of SQL and database concepts is a plus
  • Knowledge of at least one programming language such as Java or C# is a plus

Sphere of Influence Expands Data Analytics Studio

Sphere of Influence – a leader in value-add data science for high-volume, high-velocity and high-variety information assets – today announced continued investment in its McLean, VA operations, where it has doubled its data science team over the past year. The company – which recently expanded operations into Denver, CO – is also growing its digital solutions team.

The expansion of the Sphere of Influence data science studio coincides with the ramping-up of the company’s latest offering – analytics that predict customer experience for software systems.

“Our team of data strategists, data scientists and software developers has been creating exciting innovations that will make a real difference for businesses in competitive markets,” said Sphere of Influence Director of Accounts, Scott Pringle.  “Sphere of Influence has taken the steps to bring new data science solutions to our customers and expanded our science team to position the company for exciting new growth opportunities in 2016.”

About Sphere of Influence, Inc.

Sphere of Influence fuses advanced data science with digital solutions to deliver transformative products.  The company specializes in advanced data analytics for high-volume, high-velocity and high-variety information assets from a wide range of sensors in precision agriculture, automotive, and Internet of Things (IoT) telematics.  The company utilizes a broad and continuously growing integrated infrastructure of proprietary data science platforms, algorithms, and machine learning systems.  For additional information, please visit Sphere of Influence’s corporate website at:




Data Science to Stop Terrorist Counterfeiters

The U.S. Government has awarded Sphere of Influence, Inc. a contract to develop new technology that helps the U.S. Government understand more about terrorist networks that create forged identity documents.

Sphere of Influence, Inc., a McLean, Virginia-based developer of advanced data analytics technologies, announced it has been awarded a contract by the U.S. Department of Defense (DoD) to build a data science platform that enables the U.S. Government to understand more about terrorist networks and the forged identity documents they produce. The contract has an estimated value of $700k for one year. Under the terms of the contract, Sphere of Influence, Inc. will deliver technologies that apply advanced data science, computer vision, and machine learning algorithms.

With this contract, the U.S. Government will learn not only more about the networks that create counterfeit identity documents, but also how they use them.

About Sphere of Influence, Inc.

Sphere of Influence, Inc. provides technologies for advanced data analytics and interactive digital solutions. The company was formed in 2000 and is headquartered in McLean, VA.


If computers can beat Jeopardy! champions, why can’t they detect the insider threat?

The world was awed two years ago when IBM’s Watson defeated Jeopardy! champions Brad Rutter and Ken Jennings. Watson’s brilliant victory reintroduced the potential of machine learning to the public. Ideas flowed, and now this technology is being applied practically in the fields of healthcare, finance and education. Emulating human learning, Watson’s success lies in its ability to formulate hypotheses using models built from training questions and texts.


Three years ago, Army Private First Class Bradley Manning leaked massive amounts of classified information to WikiLeaks and brought the significance of data breaches to public awareness. In response to this and several other highly publicized data breaches, government committees and task forces established recommendations and policies, and invested heavily in cyber technologies to prevent such an event from recurring. Surely, we thought, if anyone had the motivation and resources to get a handle on the insider threat problem, it was the government. But Edward Snowden, who caused the recent NSA breach, has made it painfully obvious how impotent the response was.


Lest we assume this is just a government problem, ample evidence shows how vulnerable commercial industry is to the insider. We are inundated with articles describing how malicious insiders have cost private enterprise billions of dollars in lost revenue, so why has no one offered a plausible solution?


The insider threat remains an unmitigated problem for most organizations, not because the technologies do not exist, but because the cyber defense industry is still attempting to discover the threat using a rules-based paradigm. Virtually all cyber defense solutions on the market today apply explicit rules, whether they are antivirus programs, firewalls with access control lists, deep packet inspectors, or protocol analyzers. This paradigm is very effective in defending against known malware and network exploits, but fails utterly when confronted with new attacks (i.e., “zero-days”) or the surreptitious insider.


In contrast, acknowledging that it was impossible to build a winning system that relied on enumerating all possible questions, IBM designed Watson to generalize and learn patterns from previous questions and use these models to hypothesize answers to novel questions. The hypothesis with the highest confidence was selected as the answer.


Like Watson, an effective technology for detecting the insider must adaptively learn historical network patterns and then use those patterns to automatically discover anomalous activity. Such anomalous traffic is symptomatic of unauthorized data collection and exfiltration.


Inspired by the WikiLeaks incident, Sphere’s R&D team has investigated machine learning algorithms that construct historical models by grouping users by their network fingerprints. As an example, without any rules or specifications, the algorithms learn that bookkeeping applications transmit a distinctive pattern that enables grouping accountants together, and that HR professionals can be grouped by the recruiting sites they visit. These behavioral models generalize normal activity and can be used as templates to detect outliers. While users commonly generate some outliers, suspicious users deviate significantly from their cohorts, such as the network administrator who accesses the HR department’s personnel records. Like Watson, the models allow the system to form hypotheses.
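
To make the idea of cohorts and outliers concrete, here is a minimal sketch in Python. It is not Sphere's algorithm: the greedy cosine-similarity clustering, the four traffic categories, the user names, and the 0.9 similarity cutoff are all assumptions chosen for illustration. Each user's "fingerprint" is taken to be the share of their traffic going to each category.

```python
import math

# Hypothetical traffic categories; each fingerprint is a share of traffic per category.
CATEGORIES = ["bookkeeping", "recruiting", "admin_tools", "hr_records"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_users(fingerprints, min_similarity=0.9):
    """Greedy grouping: a user joins the first cohort whose centroid is similar
    enough, otherwise starts a new cohort. No per-role rules are supplied."""
    cohorts = []
    for user, vec in fingerprints.items():
        for cohort in cohorts:
            if cosine(vec, cohort["centroid"]) >= min_similarity:
                cohort["members"].append(user)
                n = len(cohort["members"])
                # Update the running mean so the centroid reflects all members.
                cohort["centroid"] = [
                    (c * (n - 1) + v) / n for c, v in zip(cohort["centroid"], vec)
                ]
                break
        else:
            cohorts.append({"members": [user], "centroid": list(vec)})
    return cohorts

def is_outlier(activity, cohort, min_similarity=0.9):
    """An activity is an outlier if it does not resemble the cohort's template."""
    return cosine(activity, cohort["centroid"]) < min_similarity

# Illustrative (made-up) historical fingerprints.
history = {
    "alice": [0.90, 0.00, 0.10, 0.00],  # accountant
    "bob":   [0.85, 0.05, 0.10, 0.00],  # accountant
    "carol": [0.00, 0.60, 0.00, 0.40],  # HR
    "dave":  [0.05, 0.55, 0.00, 0.40],  # HR
    "erin":  [0.00, 0.00, 1.00, 0.00],  # network administrator
    "frank": [0.05, 0.00, 0.95, 0.00],  # network administrator
}

cohorts = group_users(history)
admin_cohort = next(c for c in cohorts if "frank" in c["members"])

routine_activity = [0.0, 0.0, 1.0, 0.0]      # only admin tools
suspicious_activity = [0.0, 0.0, 0.4, 0.6]   # mostly HR personnel records
```

In this toy data the accountants, HR staff, and administrators fall into three separate cohorts without being told their roles, and an administrator session dominated by HR-record access no longer resembles the administrator template.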


Applied to cyber security, every time an entity accesses the network, the algorithms hypothesize whether the activity conforms to its model. If it does not conform, that activity is labeled an outlier. Because these methods use a statistical confidence that dynamically balances internal thresholds on network activities (e.g., sources and destinations, direction and amount of data transferred, times, protocols, etc.), the system becomes extremely hard for a malicious insider to outsmart. The mere fact that the system does not reveal its thresholds can have a significant deterrent effect.
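
The conformance check described above can be sketched as follows. This is only an illustration under stated assumptions, not the method Sphere uses: the `EntityModel` class, the two features (`mb_out`, `hour`), the z-score test, and the cutoff of 3 standard deviations are all hypothetical. The point it shows is that the threshold is derived from each entity's own history rather than written down as a fixed rule.

```python
import statistics

class EntityModel:
    """Per-entity model learned from historical events (a hypothetical sketch)."""

    def __init__(self, history):
        # history: list of dicts mapping feature name -> numeric value
        self.features = sorted(history[0])
        self.mean = {f: statistics.fmean(h[f] for h in history) for f in self.features}
        self.std = {f: statistics.pstdev([h[f] for h in history]) for f in self.features}

    def score(self, event):
        """Largest absolute z-score across features; higher means less typical."""
        z_scores = []
        for f in self.features:
            spread = self.std[f] or 1.0  # avoid division by zero for constant features
            z_scores.append(abs(event[f] - self.mean[f]) / spread)
        return max(z_scores)

    def conforms(self, event, max_z=3.0):
        """Hypothesis test: does this event fit the entity's learned behavior?"""
        return self.score(event) <= max_z

# Made-up history: megabytes sent out and hour of day for one user.
past_events = [
    {"mb_out": 12, "hour": 9},
    {"mb_out": 15, "hour": 10},
    {"mb_out": 11, "hour": 11},
    {"mb_out": 14, "hour": 14},
    {"mb_out": 13, "hour": 16},
]
model = EntityModel(past_events)

typical_event = {"mb_out": 14, "hour": 13}
bulk_transfer_at_night = {"mb_out": 900, "hour": 3}
```

Because the effective limits depend on each entity's observed mean and spread, two users can have very different tolerances, and neither can read the limits off a published rule set.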


A paradigm shift in cyber technologies is happening now. Cyber security professionals agree that preventing data breaches by a malicious insider is a difficult task, and the past suggests that the next major breach will not be detected with existing rules-driven cyber defense solutions. Next-generation cyber security technology developers must seek inspiration from IBM’s Watson and other successful implementations of machine learning before we can hope to prevail against the insider threat.