The concept of digital twins has attracted significant attention across many domains, particularly the built environment. However, the sheer volume of competing definitions means that terminological consensus remains out of reach. The lack of a universally accepted definition creates ambiguity in how digital twins are conceptualized and implemented, and can cause miscommunication among researchers and practitioners. We employed Natural Language Processing (NLP) techniques to systematically extract and analyze definitions of digital twins from a corpus of 15,000 full-text articles spanning diverse disciplines in the built environment, and compared these findings with insights from a survey of 52 experts. The study identifies where there is concurrence, from a practical perspective across domains, on the components that comprise a ‘Digital Twin’, and contrasts these with components lacking such agreement in order to identify deviations. We also investigate how digital twin definitions have evolved over time and across different scales, including manufacturing, building, and urban/geospatial perspectives. We extracted the main components of digital twins using Text Frequency Analysis and N-gram analysis, then identified the components that appeared in the literature and applied a Chi-square test to assess the significance of each component in different domains. Our findings indicate that definitions differ according to the field of research in which they are conceived, while sharing many similarities across domains. One significant generalizable distinction is whether a digital twin is used for High-Performance Real-Time (HPRT) or Long-Term Decision Support (LTDS) applications. We synthesized and contrasted the most representative definitions in each domain, culminating in novel, data-driven definitions tailored to each context.
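The component extraction and per-domain significance test summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the regex tokenizer, the helper names (`ngrams`, `top_ngrams`, `chi2_sf`, `component_chi2`) and the toy definitions are all hypothetical, and the test shown is a plain Pearson chi-square on a 2 × k contingency table (component mentioned / not mentioned × domain), implemented with the standard library rather than a statistics package.

```python
import math
import re
from collections import Counter

# Invented toy definitions for illustration only; NOT data from the study.
TOY_DEFINITIONS = {
    "manufacturing": [
        "A digital twin is a virtual replica of a physical asset synchronised in real-time.",
        "A virtual representation of a production system with real-time data exchange.",
        "A simulation model of a machine.",
    ],
    "building": [
        "A digital twin is a virtual replica of a building that supports facility management.",
        "A BIM-based model enriched with sensor data.",
        "A virtual model used for long-term decision support.",
    ],
    "urban": [
        "A city model integrating geospatial data for planning.",
        "A virtual replica of urban infrastructure.",
        "A 3D city model for long-term decision support.",
    ],
}


def ngrams(text, n):
    """Lower-case alphanumeric tokens joined into contiguous n-grams (hypothetical tokenizer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(definitions, n, k):
    """Term-frequency ranking: the k most frequent n-grams pooled over all definitions."""
    counts = Counter()
    for text in definitions:
        counts.update(ngrams(text, n))
    return counts.most_common(k)


def chi2_sf(x, dof):
    """Survival function P(X >= x) of a chi-square distribution with integer dof (closed form)."""
    half = x / 2.0
    if dof % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))
    total = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def component_chi2(defs_by_domain, component):
    """Pearson chi-square test of whether `component` is mentioned equally often in each domain.

    Builds one (present, absent) pair per domain, i.e. a 2 x k contingency table.
    Returns (statistic, degrees of freedom, p-value)."""
    n = len(component.split())
    table = []
    for defs in defs_by_domain.values():
        present = sum(component in ngrams(d, n) for d in defs)
        table.append((present, len(defs) - present))
    grand = sum(p + a for p, a in table)
    outcome_totals = [sum(row[j] for row in table) for j in range(2)]
    stat = 0.0
    for row in table:
        domain_total = sum(row)
        for j, observed in enumerate(row):
            expected = domain_total * outcome_totals[j] / grand
            if expected > 0:  # skip empty cells to avoid division by zero
                stat += (observed - expected) ** 2 / expected
    dof = len(table) - 1
    return stat, dof, chi2_sf(stat, dof)


if __name__ == "__main__":
    all_defs = [d for defs in TOY_DEFINITIONS.values() for d in defs]
    print(top_ngrams(all_defs, 2, 3))
    stat, dof, p = component_chi2(TOY_DEFINITIONS, "real time")
    print(f"chi2={stat:.3f}, dof={dof}, p={p:.4f}")
```

A production pipeline would typically also remove stop words and lemmatize tokens before counting, which is why an uninformative bigram such as "a virtual" can top the raw ranking here. With only three toy definitions per domain the expected cell counts fall below 5, so the chi-square approximation is shown for its mechanics only.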