Set The Strategy With Machine Intelligence, OSINT, and Alternative Data
By combining machine intelligence, OSINT, and alternative data sources, strategy teams can make sure they get the full picture and create better strategies in the process.
In the coming decade, distinguishing between intelligence, misinformation, and noise will be a major challenge for individuals and organizations. In the past, the physical world had a greater impact on the digital information space. Today, however, digital narratives are shaping and influencing the physical world. Narratives on both sides of the US political spectrum have fueled large-scale political movements that contradict the data and/or evidence. At the same time, Reddit users drove up GameStop’s share price, disrupting the short positions of traditional financial powers. Politics and Wall Street aside, corporations have paid little attention to information and corporate intelligence as strategic assets to be managed and exploited, which inhibits their ability to navigate complex environments.
The traditional view is that more information is better. Intelligence was once expensive and difficult to obtain. Today, however, an abundance of self-curated news streams that confirm a person’s existing biases can have consequences for any firm: they distort an organization’s view of reality, provoking responses that lack objectivity and/or context and lead to flawed decisions.
World-class Decisions Require World-class Intelligence
Over the last two years, AI’s role in misinformation has been widely acknowledged. However, adding more humans does not solve the problem: unlike machines, our judgment is noisy and inconsistent, and our brain’s capacity to process information is finite. As a result, the first item on my agenda while working at the European Parliament, European Commission, Walmart, Wells Fargo, and HSBC was to build machine intelligence, alternative data, and OSINT capabilities. The idea is that world-class decisions and strategy require world-class corporate and political intelligence.
Open-source intelligence (OSINT) is any structured or unstructured data in the public domain: news articles, social media, TikTok, Telegram, economic and central bank metrics, Glassdoor reviews, 10-Ks and earnings statements, financial markets, and Google search trends (a perfect example of a corporation open-sourcing its data). In short, anything freely available on the internet. Alternative data are harmonized and enriched proprietary data sets, often sold to hedge funds or financial firms; however, there are just as many applications within the corporate world. Machine intelligence, my preferred term over AI, refers to technologies or products that blend natural language processing, statistical machine learning, and visualization to analyze structured and unstructured data and amplify an analyst’s or data scientist’s capabilities.
By combining OSINT, alternative, and internal data with advanced machine learning, organizations can apply a technique I developed in graduate school called "domain analysis." This technique uses machine intelligence to extract patterns from millions of structured and unstructured data points thousands of times faster, and with more context, than conventional desk research or business intelligence dashboards. As a result, data science teams or analysts can produce analytical, qualitative, and quantitative insights, 360-degree analyses, and maps of a topic or domain before committing massive resources or setting strategic direction. This approach is particularly useful for executives and boards that must understand their industry, an investment opportunity, a trend, or a market comprehensively in hours, days, or weeks instead of months or years.
Domain Analysis Process
Identify terms and datasets associated with a domain, trend, or topic. This could be a global issue, such as COVID; a trend, like working from home; a regulatory issue, such as the EU’s Digital Single Market; a currency, like the Euro; a topic, such as misinformation or Generative AI; a company, such as Goldman Sachs; or a stock, such as Tesla.
Crawl OSINT and alternative datasets to retrieve all associated documents, news articles, and social media mentions focused on the domain. Also, gather relevant structured data to contrast with the qualitative trends from news and social media; I’d suggest starting with the key indicators mentioned above and Google search trends. Additionally, most media monitoring platforms, such as Brandwatch, which I often use, have news and social datasets built in, so there is no need to build a custom crawl unless there is a specific reason.
Apply multiple supervised and unsupervised machine learning algorithms, including named entity recognition and other NLP techniques, to find patterns across each domain. Key techniques here include network or topological data analysis, clustering, and correlation between topics.
Use the algorithms’ outputs to identify the key themes, entities, and topics that influence the domain.
Assess how the key insights and trends align with current goals and with the organization’s core assets, products, talent, and business structure. Then, design a strategy based on those indicators, models, and insights.
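Steps 3 and 4 can be illustrated with a minimal, standard-library-only sketch. It assumes plain TF-IDF vectors, cosine-similarity links above a hypothetical `threshold`, and connected components as a simple stand-in for the clustering and network algorithms named above. All function names and sample documents are hypothetical; real projects would add NER, topic models, and proper community detection (for example via scikit-learn or networkx) over far more documents.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase alphabetic tokens; a real pipeline would add NER and stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document."""
    tokens = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokens for term in set(toks))
    n = len(docs)
    # Terms found in every document get weight log(1) = 0, i.e. no signal.
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()} for toks in tokens]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_network(docs, threshold=0.3):
    """Nodes are documents; link two documents whose language is similar enough."""
    vecs = tfidf_vectors(docs)
    edges = {(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
             if cosine(vecs[i], vecs[j]) >= threshold}
    return vecs, edges


def find_clusters(n, edges):
    """Connected components: a crude stand-in for community detection."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        comps.append(sorted(comp))
    return comps


def top_terms(vecs, cluster, k=3):
    """Label a theme by the terms with the highest summed TF-IDF weight."""
    totals = Counter()
    for i in cluster:
        totals.update(vecs[i])
    return [t for t, _ in totals.most_common(k)]


# Hypothetical mini-corpus for a "global risks" domain.
docs = [
    "pandemic risk threatens global supply chains",
    "global pandemic risk and supply chains",
    "oil prices and opec output",
    "opec output cut lifts oil prices",
    "quantum widgets",
]
vecs, edges = build_network(docs)
clusters = find_clusters(len(docs), edges)
for cluster in clusters:
    print(cluster, top_terms(vecs, cluster))
```

The unlinked last document mirrors the "isolated node" case discussed later: it is either noise or a novel signal worth a human look.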
The technique above can be applied to most business functions, including private equity (PE), risk management, M&A, VC deal diligence, product development, procurement and supply chain management, FP&A, geopolitical and security analysis, HR, communications and marketing, and public and government affairs, in nearly any industry context.
Benefits
See all the data on a given topic or issue, structured in the context of all the information available, not just a few news articles or data points.
Personalized, biased information feedback loops are filtered out from the start, because the data come from all the sources that generate signals within the domain, not from personalized search engines. Machines can also be used to detect misinformation and suspect sources.
“Soft” trends, such as political, regulatory, or social issues, can be measured analytically rather than judged by gut feeling. These data points can then be compared with structured datasets (e.g., stock prices, sales, profits) to present a holistic picture of the domain. This is particularly valuable in politically and emotionally charged environments, where a business needs to quantify how these trends affect the bottom line.
Stress-test strategic hypotheses in minutes or hours, not days or weeks, before investing a large amount of time, assets, and resources.
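Comparing a measured "soft" trend with structured data can be sketched as a simple Pearson correlation. The series below are invented for illustration (hypothetical weekly shares of domain articles mentioning a regulatory issue versus weekly sales), and `pearson` is a hypothetical helper; Python 3.10+ also ships `statistics.correlation` for the same calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Hypothetical data: share of domain articles mentioning a regulatory issue, and weekly sales.
regulatory_share = [0.02, 0.03, 0.05, 0.08, 0.07, 0.10]
weekly_sales = [100, 98, 95, 90, 91, 86]
r = pearson(regulatory_share, weekly_sales)
print(f"r = {r:.2f}")  # strongly negative for this made-up data
```

A strong coefficient only flags co-movement worth investigating; it does not by itself show that the soft trend drives the metric.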
The Information Game
As information grows exponentially, quality signals and true insights become more difficult to find but also much more valuable. The military has used OSINT and alternative data for more than a decade, hedge funds use them for market analysis, and academics are beginning to exploit them at scale (particularly in computational social science). Yet corporations, NGOs, and governments rarely use them in any strategic capacity.
Information Growth 2010-2020
In these organizations, strategies are built on static, often financially focused, uncontextualized data; team members run Google searches to find news articles and research papers, or the organization hires external consultants. Such labor-intensive approaches are not merely inefficient and time-consuming; they are also prone to individual blind spots and biases. While Google is fantastic at crawling the web and delivering specific information on demand, this strength becomes a serious flaw when applied to organizational decision-making. A person’s bias shapes each search, as does the choice of each link they click. The search results and articles provide no context or broader narrative relative to the millions of other documents not read. In short, the researcher obtains only a tiny snapshot of the information available. Like Donald Rumsfeld’s famous remark about “unknown unknowns,” which echoes the Johari Window, this leaves an organization with an incomplete picture of the whole and blind to peripheral risks and/or opportunities.
Deeper Knowledge Can Also Lead to Greater Bias
In-depth, high-quality professional research by trained experts offers advantages over crude, generic Google searches but suffers from many of the same issues and several more. There will inevitably be bias in how an analyst selects and connects information, and such bias is even harder to detect in technical and cognitively demanding documents and information streams. Further, it is a slow process, and given the increasingly high barriers to accessing the underlying documents, there is a high probability that the insights will not yield any advantage. Deep expertise, particularly in technical domains, is always needed and is available via ad-hoc services such as GLG, AlphaSights, and Coleman Research. However, these experts still consume information in the same siloed way: through Google, research papers, and other experts. Thus, while in-depth knowledge will enhance a project or strategy, blind spots and inconsistent weighting of information persist (albeit at a more granular level). Moreover, expert knowledge is rarely memorialized or scaled beyond the boardroom or the C-suite.
Another issue: vertical or industry expertise is less relevant amid ever-shifting consumer attitudes and globally connected, data-rich environments. A case in point: I am confident that Tesla could build a Ford, but I doubt Ford could build a Tesla Model S. Amazon can make a great movie, but Paramount Pictures could never create Amazon’s machine learning platform. As modern business becomes more multi-dimensional, ambiguous, and volatile, it requires arrays of interdisciplinary, data-driven insights, not domain expertise alone. Further, if the information and insights used to create a strategy are siloed and personalized to reinforce the biases of those making it, how effective can that strategy be?
The difference in outcomes will be as drastic as that between a captain who mastered the compass and map to reach the New World and one who only sailed within sight of a familiar shoreline.
OSINT + Machines vs. MBB
While domain analysis is new to most, I’ve applied these techniques and data sources in hundreds of use cases over the past decade. In both simple and complex projects, it offered formerly impossible foresight, context, and predictions. One example is the graph below, which shows McKinsey, Bain, and BCG’s publication rate on pandemics and epidemics as a global risk. In short, before COVID, leading consulting firms that had written hundreds of articles about business risk did not address pandemics.
Nonetheless, clustering OSINT (news articles) from 2017 to 2019, before COVID-19, that contained the term “global risks,” using natural language processing and network analysis, surfaced pandemics as a major risk in less than 30 minutes of research. In short, the risk was in plain sight, but those firms, relying on traditional research methods and strategy experts, did not even write about it as a global risk. This example is not intended as a slight against the firms mentioned. Rather, it shows that if those considered the best minds in business strategy could not surface pandemics as a prominent risk (something machine intelligence and OSINT did in under 30 minutes), it is unlikely that any organization without AI and alternative data could. While the insight would not have prevented COVID, governments and businesses would have had more lead time to develop better contingency and scenario plans.
How do you read a network?
The percentages associated with each tag show the topic’s share of the network: banking (31%), oil (14%), and politics (13%) command the largest shares.
Each node (circle) in a network represents one unique document.
Nodes with similar language are connected by a link and sit closer to each other.
Nodes that lack connections use language that does not significantly match any other document. They can be irrelevant or novel, so it is important to explore them.
Nodes that bridge clusters in different portions of the network are often insightful to explore, because they share language with more than one cluster.
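These reading rules translate into a few simple measurements. This minimal sketch assumes each document node already carries a topic label (for example, from a clustering step) and that links are given as node pairs; the function names and sample network are hypothetical. "Bridging" here simply means a node linked to at least one node from another topic, a rough proxy for measures such as betweenness centrality (available in networkx as `betweenness_centrality`).

```python
from collections import Counter, defaultdict


def topic_shares(labels):
    """Percentage of documents carrying each topic tag (the '31%' style figures)."""
    counts = Counter(labels.values())
    return {topic: round(100 * c / len(labels)) for topic, c in counts.items()}


def isolated_nodes(labels, edges):
    """Documents with no links: possibly irrelevant, possibly novel."""
    linked = {node for edge in edges for node in edge}
    return sorted(node for node in labels if node not in linked)


def bridging_nodes(labels, edges):
    """Documents linked to at least one document from a different topic."""
    neighbour_topics = defaultdict(set)
    for a, b in edges:
        neighbour_topics[a].add(labels[b])
        neighbour_topics[b].add(labels[a])
    return sorted(n for n, topics in neighbour_topics.items() if topics - {labels[n]})


# Hypothetical five-document network.
labels = {0: "banking", 1: "banking", 2: "oil", 3: "oil", 4: "politics"}
edges = [(0, 1), (1, 2), (2, 3)]
print(topic_shares(labels))            # {'banking': 40, 'oil': 40, 'politics': 20}
print(isolated_nodes(labels, edges))   # [4]
print(bridging_nodes(labels, edges))   # [1, 2]
```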
Real-World Examples From The Largest Organizations in the World
At HSBC, natural language processing, OSINT, and alternative data, such as Google search trends and internal dashboards, were used to summarize millions of information points related to the pandemic. These insights fed daily and weekly briefs across multiple business functions to keep people informed. Because the machines identified core themes and key entities, surfaced COVID-19 fraud risks, and filtered potential misinformation across millions of articles, the bank’s analysts had time to assess higher-value, nuanced trends on the periphery of COVID-19.
Machine Summarized OSINT COVID-19 Event Map
Machine Summarizations of OSINT that Detect Potential COVID Misinformation and Fraud
The same technologies were also used to find niche, latent consumer signals. For example, in 2018, the team identified an increase of over 50% in birth tourism among Asians in Canada. Despite HSBC’s significant experience in the Asian market, no one was aware of this cohort. Two years later, birth tourism was up 119% (see chart below). Intelligence like this could be applied to develop marketing, service offerings, or product capabilities that cater to this client segment.
Insights Extracted and Structured from OSINT by Machine Intelligence on “Birth Tourism”
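The summarization step behind such briefs can be illustrated with a classic frequency-based extractive summarizer, a deliberately simple technique swapped in here for illustration; it is not the system HSBC used. The function names, stop-word list, and sample text are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real systems use larger, language-specific lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "with", "by"}


def content_words(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, k=2):
    """Return the k sentences whose words are most frequent across the text, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured just for their length.
        return sum(freq[w] for w in words) / (len(words) or 1)

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


articles = (
    "Vaccine fraud schemes target bank customers. "
    "Regulators warn of vaccine fraud schemes. "
    "The weather was mild."
)
for sentence in summarize(articles):
    print(sentence)
```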
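Latent signals like this can be flagged automatically by comparing mention volumes across periods. A minimal sketch, assuming yearly mention counts per term have already been extracted from OSINT; the counts, thresholds, and function names are hypothetical. The `min_base` floor is a design choice that keeps tiny counts from producing spurious percentage jumps.

```python
def growth_pct(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def emerging_signals(counts, year_a, year_b, min_growth=50.0, min_base=20):
    """Terms whose mentions grew by at least min_growth percent from year_a to year_b."""
    flagged = {}
    for term, by_year in counts.items():
        old, new = by_year.get(year_a, 0), by_year.get(year_b, 0)
        if old < min_base:
            continue  # too few mentions to trust a percentage change
        growth = growth_pct(old, new)
        if growth >= min_growth:
            flagged[term] = round(growth, 1)
    return dict(sorted(flagged.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical mention counts, not HSBC data.
mentions = {
    "birth tourism": {2017: 40, 2018: 62},
    "student visas": {2017: 300, 2018: 315},
    "rare term": {2017: 3, 2018: 9},
}
print(emerging_signals(mentions, 2017, 2018))  # {'birth tourism': 55.0}
```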
Revealing Hidden Markets and Product Intelligence
One of my favorite examples comes from my time in Dubai in 2019. Combining OSINT and network analysis with economic data revealed that 1) the UAE’s open immigration policy was central to an influx of Indian and Pakistani immigrants, which 2) was in turn driving housing demand, and 3) currency volatility in India and Pakistan (a key determinant of immigration, FDI, and the open immigration policy itself) had created a window for developers to accept Bitcoin for real estate transactions.
Dubai Bitcoin, Migration, and Real-Estate Domain Network
These insights were generated in less than 90 minutes during a morning training session and presented that afternoon. The question is, what could they be used for? With a modest investment in marketing campaigns, the bank could target Dubai’s expat and real estate communities with Google ads. The information could also be used to create new products, such as crypto-based mortgages that integrate flexible tools for cross-border, multi-currency transactions. This offering would appeal to people migrating to the Dubai region who are interested in buying real estate. Overall, the weak signals identified created an 18-month lead time to take advantage of those emerging trends. Since the insights surfaced, Dubai’s investment in Bitcoin and crypto has only continued to grow, not to mention the massive Bitcoin price increases at the time of this post’s publication. None of this intelligence would have been possible using traditional business intelligence dashboards such as Tableau or Power BI, or financial metrics alone.
Walmart Command Center Used for WMT-X Data Exploration & Insights Sessions
Moving to retail: at Walmart, we synthesized OSINT and alternative data with internal data to gain insights into multiple business verticals, ranging from corporate strategy, public affairs, regulatory issues, and HR to consumer trends, competitive analysis, investor sentiment, and products. Using the Walmart command center (above), designed specifically for OSINT and alternative data, we held “WMT-X sessions” that put an interdisciplinary group of executives, strategists, and data scientists in the same room to tackle some of Walmart’s most pressing and abstract strategic problems with the help of machine intelligence and alternative data. Often, we could address business questions on the spot and in an engaging way, because the technologies are interactive rather than static dashboards. Because machine intelligence analyzed all the data available on a topic, we could consistently weigh and contextualize millions of information points in ways no human could. Frequently, trends or issues that appeared big were, in fact, smaller and less influential than initially assumed, while latent signals turned out to be more central to a domain.
One insight revealed in 2015 was that Walmart’s stock price was closely associated with its dividend narrative, whereas narratives about strategic investments, such as wages, technology, and acquisitions, had no positive relation to the share price and appeared to weigh on WMT’s market capitalization. Knowing how these narratives relate to investor sentiment, the company could craft communication strategies around its investments that are more attuned to market concerns and trends. It’s also worth noting that a stock trader could use the same technique to build more accurate forecasts or valuations of the company. In another instance, augmented OSINT data revealed that certain products and events, such as major video game console releases or Force Friday, were more likely to draw consumers to wait in line and buy in store, wearing costumes and interacting with other fans, rather than complete a sterile online transaction.
These methods and technologies are in use today and are continuously evolving. Wherever OSINT and machine intelligence are applied, they enable faster, more targeted, and more contextual decision-making by incorporating multiple datasets rather than relying on experience and previous knowledge alone. Perhaps best of all, all of the capabilities and examples above can be built from zero in under one year.
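A trader or communications team testing whether a narrative leads the share price might compare the two series at several lags. This is a minimal sketch under stated assumptions: weekly narrative-volume and price series already aligned by date, a hypothetical `lagged_correlations` helper, and invented data in which the price follows the narrative by one week. It is not the analysis run at Walmart.

```python
import math


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlations(narrative, price, max_lag=3):
    """Correlation between narrative volume in week t and price in week t + lag."""
    return {
        lag: pearson(narrative[: len(narrative) - lag], price[lag:])
        for lag in range(max_lag + 1)
    }


# Invented weekly data: price tracks the dividend narrative one week later.
dividend_narrative = [1, 3, 2, 5, 4, 6, 5, 8]
share_price = [10, 11, 13, 12, 15, 14, 16, 15]
corr = lagged_correlations(dividend_narrative, share_price)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 2))  # 1 1.0
```

Each extra lag shortens the overlapping window, so with short histories the higher-lag coefficients rest on fewer points and deserve less weight.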
Dollars & Cents
OSINT and machine intelligence ecosystems allowed these organizations to analyze hundreds of thousands of articles and extract novel insights. To a government or large corporation, this intelligence can be worth hundreds of millions of dollars if acted upon. At HSBC, we had more than 200 use cases throughout the enterprise (with many more undocumented), and over 200 analysts, data scientists, and non-technical business managers were accessing our OSINT platforms. Besides making cutting-edge machine intelligence accessible to hundreds of people, technical and non-technical alike, it was one of the cheapest AI programs in the bank. It was also far cheaper than hiring a traditional consultancy for a couple of projects, or than building a team of data scientists and program managers plus the accompanying infrastructure, which can cost tens to hundreds of millions of dollars at a large corporation. Let’s look at the numbers more specifically. Corporations and banks have thousands of analysts and data science professionals; according to LinkedIn, HSBC, Goldman, and JP Morgan employ approximately 118,000. Most of these professionals still use the traditional research methods discussed earlier. At an average cost of $150,000 per analyst, the total spent on this intellectual capital is $17.7 billion. At a global bank, an analyst using machine intelligence systems can process a million documents a day, which would take a human 3,504 days of reading, or 9.6 years. In monetary terms, that is $1.44 million of analyst time per day (9.6 years at $150,000 a year), or $345.6m of intellectual value per analyst per year over roughly 240 working days. Having used these technologies firsthand, I am very confident that I could increase efficiency by 25% across the board within one year if a company invested aggressively in machine intelligence systems. That amounts to savings of roughly $37,500 per analyst, and more than $1 billion for any of the banks mentioned, for a fraction of the investment needed to start a program.
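The arithmetic in this paragraph can be reproduced in a few lines. The headcount, cost, and reading-time figures come from the text; the 240 working days per year is an assumption, inferred because it is what turns $1.44 million per day into $345.6 million per year, and the even split of analysts across the three banks is also an assumption. Note that 25% of $150,000 is $37,500 per analyst.

```python
ANALYSTS = 118_000          # LinkedIn headcount cited for HSBC, Goldman, and JP Morgan combined
COST_PER_ANALYST = 150_000  # assumed average annual cost per analyst (USD)
HUMAN_DAYS = 3_504          # days a human would need to read one million documents, per the text
WORKING_DAYS = 240          # assumption: working days per year implied by the $345.6m figure
EFFICIENCY_GAIN = 0.25      # the 25% efficiency improvement claimed in the text

total_cost = ANALYSTS * COST_PER_ANALYST                 # $17.7 billion
human_years = HUMAN_DAYS / 365                           # 9.6 years of reading
value_per_day = HUMAN_DAYS * COST_PER_ANALYST / 365      # $1.44 million per day of machine processing
value_per_year = value_per_day * WORKING_DAYS            # $345.6 million per analyst per year
saving_per_analyst = EFFICIENCY_GAIN * COST_PER_ANALYST  # $37,500
saving_per_bank = saving_per_analyst * ANALYSTS / 3      # assumes analysts split evenly across 3 banks

print(f"Total analyst cost:  ${total_cost / 1e9:.1f}bn")
print(f"Human reading time:  {human_years:.1f} years")
print(f"Value per year:      ${value_per_year / 1e6:.1f}m")
print(f"Saving per analyst:  ${saving_per_analyst:,.0f}")
print(f"Saving per bank:     ${saving_per_bank / 1e9:.2f}bn")
```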
Getting Started
1) Learn by doing
The best thing about OSINT and machine intelligence is that you can get started immediately and essentially for free. There is no need to wait for IT or data science teams to build a model or obtain data for you. The data are already in the public domain, so regulatory hurdles such as GDPR are far lower than with internal data, and many of the tools are accessible to both non-technical and technical professionals. With a bit of creativity and domain knowledge, you can obtain insights immediately. I begin every project with Google Search Trends to compare the search volume of key topics and competitors, key themes, and macro trends. Google search data have been shown to be predictive in areas such as politics, retail, and finance. For example, sustainable finance has been one of the strategic priorities for financial firms in recent years. By combining Google Trends data with Facebook’s Prophet, an easy-to-use, open-source time-series forecasting library, it is possible to forecast sustainable finance trends. The time-series graph below shows the forecast for sustainable finance search trends, in which machine learning predicts search interest rising from 60 in October 2019 to 89 in April 2021, an increase of roughly 48%. As the following graphs illustrate, search trends were a good indicator of the growth in green investments, mirrored by the NASDAQ OMX Green Economy Index.
Google Trends Search Volume Forecast - “Sustainable Finance”
NASDAQ OMX Green Economy (QGREEN) Index
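The forecasting workflow can be sketched without external dependencies. Prophet’s Python API fits a DataFrame with `ds` and `y` columns and then calls `make_future_dataframe` and `predict`; the sketch below swaps in a plain least-squares linear trend, a much simpler technique than Prophet (no seasonality or changepoints), applied to a hypothetical monthly Google Trends index. It also computes the percentage change between the two quoted index values, 60 and 89.

```python
def linear_trend_forecast(values, steps):
    """Fit y = a + b*t by ordinary least squares and extrapolate `steps` periods ahead."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + steps)]


def pct_change(start, end):
    return 100.0 * (end - start) / start


# Hypothetical monthly search-interest index (0-100 scale, as in Google Trends).
monthly_index = [52, 55, 54, 58, 60, 61]
print([round(v, 1) for v in linear_trend_forecast(monthly_index, 3)])
print(round(pct_change(60, 89), 1))  # 48.3
```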
When developing your key terms, start with what you know (key topics), then use the trends and metadata associated with the search terms, such as related terms or top locations of interest, to refine your query so it represents the issue space. It is vital to think about which key terms best represent the concept, so expect to spend a minute or two testing different combinations of search terms. In essence, search data allow firms to quantify themes not found in financial and market data. Combining the two yields a level of context in valuations or predictions that would be impossible with traditional techniques and datasets. Among open-source and alternative data sources, Wolfram Alpha is outstanding for data such as GDP and population growth, and for automated forecasts on many ETFs. In addition, its computational knowledge engine answers factual queries directly by computing the answer, rather than returning a list of documents or web pages that may contain it, as a search engine would. Other excellent OSINT data sources are Quandl, a pioneer of financial alternative data that I have been using for more than six years, and Google Dataset Search.
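Testing combinations of search terms is easy to systematize. A minimal sketch: it assumes the Google Trends web interface’s `+` operator, which, as we understand it, matches searches containing any of the joined terms, and the seed and related terms shown are hypothetical examples. Each generated string could then be compared in Trends a few at a time.

```python
from itertools import combinations


def candidate_queries(seed_terms, related_terms, max_extra=2):
    """Seed terms plus every combination of up to max_extra related terms, joined with ' + '."""
    queries = []
    for k in range(max_extra + 1):
        for extra in combinations(related_terms, k):
            queries.append(" + ".join([*seed_terms, *extra]))
    return queries


for query in candidate_queries(["sustainable finance"], ["green bonds", "esg investing"]):
    print(query)
```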
2) Cultivate A Culture of Intellectual Honesty and Curiosity
It is easy to score points by telling executives what they want to hear, but that will not stop firms from being disrupted by external trends or by competitors with little regard for how a corporation’s executives feel about a given strategy or product. Just ask Nokia, Ford, Yahoo, the taxi lobby, or those in the EU who didn’t take UKIP seriously despite signals that new trends were on the horizon. Frequently, a good insight, particularly from a new signal source like OSINT or alternative data, will not conform to conventional wisdom or the chosen strategy, which is precisely where the upside lies. Regrettably, firms usually prefer to ignore the insight or shoot the messenger rather than use the intelligence to create new value for their customers or exploit a market asymmetry. A culture of intellectual honesty that rewards divergent, data-driven thinking needs to be instilled.
Guidelines
Anecdotal evidence, not the intelligence and data revealed, should bear the burden of proof. Often it is the reverse, despite research showing that humans are frequently outperformed by even simple algorithms and that they are biased against algorithms that outperform them.
Leadership needs to underscore that they can tolerate ambiguity and iterations while guided by insights found along the way. No one can predict the future. Often those who are overly certain are not looking at the problem hard enough.
Decision-makers need to practice deferring judgment on the machine-derived insights that surface, not revert to legacy heuristics immediately if they do not agree with or understand the machine-created outputs. Managers and business leaders cannot expect everything to make complete sense right away. The world is complex, interconnected, and uncertain. Hence, it requires thoughtful exploration anchored by machines and data before crucial decisions are made and strategies implemented.
3) Structure the program inputs and outcomes
Once people and business teams see how quickly insights can surface, managers and business leaders are tempted to analyze everything without considering how to use it or what they will do with the information. This diverts analysts from higher-value workstreams and induces “analysis paralysis.” To avoid this pitfall, apply a structured intake process at the beginning of each project or intelligence query. The best way to do this is to develop a survey that tracks different use cases across your organization. Here is an example of one I created using Typeform.
Every person who asks for insights should be able to answer these questions:
What decision are you going to make with this information?
What assets do you have available to act on this information (e.g., ad spend, press releases, money to invest in specific technologies, HR hiring moves, product creation)?
Who/what unit is asking?
Why?
Anchors the focus of a project.
Positions intelligence as a strategic asset to be acted on, not something nice to know.
The data trail that is created can be used downstream to improve offers and management of the program and serve internal clientele better, as well as uncover internal trends.
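The intake questions can also be captured as structured records, so that incomplete requests are sent back and the resulting data trail can be analyzed. A minimal sketch with hypothetical field and function names; it treats “Why?” as an optional free-text rationale, which is an interpretation, and the example requests are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class IntakeRequest:
    decision: str        # What decision are you going to make with this information?
    assets: List[str]    # What assets do you have available to act on it?
    unit: str            # Who/what unit is asking?
    rationale: str = ""  # Why? (optional free text in this sketch)

    def missing_answers(self):
        """Names of required questions left blank."""
        gaps = []
        if not self.decision.strip():
            gaps.append("decision")
        if not self.assets:
            gaps.append("assets")
        if not self.unit.strip():
            gaps.append("unit")
        return gaps


def triage(requests):
    """Split requests into those ready for analysts and those returned for more detail."""
    ready = [r for r in requests if not r.missing_answers()]
    returned = [r for r in requests if r.missing_answers()]
    return ready, returned


def demand_by_unit(requests):
    """The data trail: which units ask for intelligence most often."""
    return Counter(r.unit for r in requests)


requests = [
    IntakeRequest("Pilot a multi-currency mortgage?", ["ad spend", "product team"], "Retail"),
    IntakeRequest("", [], "Marketing"),
    IntakeRequest("Adjust hiring plan?", ["HR hiring moves"], "Retail"),
]
ready, returned = triage(requests)
print(len(ready), [r.missing_answers() for r in returned])  # 2 [['decision', 'assets']]
print(demand_by_unit(requests).most_common(1))               # [('Retail', 2)]
```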
It is early days for most organizations, but political polarization and COVID have shown us that humans cannot reliably extract objective signals from the noise and overcome their own biases. Business is no different. The only way to address this is to manage the information deluge with machine intelligence and the more diverse palette of information signals that AI enables. Investing in this space will not only yield insights impossible with any other approach; it will also teach business professionals and organizations how to think with machines, which will be the most important skill to learn for the foreseeable future.