
Digital Twin

2752-5783

F1000 Research Limited

London, UK

10.12688/digitaltwin.17467.2

Method Article

Articles

Digital twin data: methods and key technologies

[version 2; peer review: 4 approved]

Meng Zhang (affiliation 1; https://orcid.org/0000-0002-2253-4025): Investigation, Methodology, Visualization, Writing – Original Draft Preparation

Fei Tao (affiliation 2; note a; https://orcid.org/0000-0002-9020-0633): Conceptualization, Funding Acquisition, Methodology, Project Administration, Supervision, Writing – Review & Editing

Biqing Huang (affiliation 1): Methodology, Supervision, Writing – Review & Editing

Ang Liu (affiliation 3; https://orcid.org/0000-0001-9353-0948): Methodology, Writing – Review & Editing

Lihui Wang (affiliation 4; https://orcid.org/0000-0001-8679-8049): Methodology, Visualization, Writing – Review & Editing

Nabil Anwer (affiliation 5): Methodology, Writing – Review & Editing

A. Y. C. Nee (affiliation 6): Methodology, Writing – Review & Editing

1 Department of Automation, Tsinghua University, Beijing, 100084, China

2 School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, China

3 School of Mechanical and Manufacturing Engineering, University of New South Wales, Sydney, NSW, 2052, Australia

4 Department of Production Engineering, KTH Royal Institute of Technology, Stockholm, SE-10044, Sweden

5 Automated Production Research Laboratory, Paris-Saclay University, ENS Paris-Saclay, LURPA, 91190, Gif-sur-Yvette, France

6 Department of Mechanical Engineering, National University of Singapore, Singapore, 117576, Singapore

a ftao@buaa.edu.cn

No competing interests were disclosed.

8 2 2022

2021

28 1 2022

2022

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As a promising technology to converge the traditional industry with the digital economy, digital twin (DT) is being investigated by researchers and practitioners across many different fields. The importance of data to DT cannot be overstated. Data plays critical roles in constructing virtual models, building cyber-physical connections, and executing intelligent operations. The unique characteristics of DT put forward a set of new requirements on data. Against this background, this paper discusses the emerging requirements on DT-related data with respect to data gathering, interaction, universality, mining, fusion, iterative optimization, and on-demand usage. A new notion, namely digital twin data (DTD), is introduced. This paper explores some basic principles and methods for DTD gathering, interaction, storage, association, fusion, evolution and servitization, as well as the key enabling technologies. Based on the theoretical underpinning provided in this paper, it is expected that more DT researchers and practitioners can incorporate DTD into their DT development process.

digital twin (DT) digital twin data (DTD) principles methods key technologies

Beijing Municipal Natural Science Foundation

JQ19011

National Key Research and Development Program of China

2020YFB1708400

This work is supported by the National Key Research and Development Program of China (2020YFB1708400) and Beijing Municipal Natural Science Foundation (JQ19011). The grants were assigned to Prof. Fei Tao.

Revised Amendments from Version 1

In the revised version, more technical details are added in the section “Key enabling technologies for digital twin data” to better support the DTD applications. Then more explanations for some concepts (e.g. digital twin data, connection data, fusion data) are provided by adding related examples or definitions. DTD challenges are discussed in the last section. Figure 2, Figure 3 and Figure 4 are further refined in quality to better match the text in the corresponding sections. Besides, some well-cited references are added to supplement the work.

From data to digital twin data

With the rapid development of new information technologies (e.g., Internet of Things, cloud computing, big data, and artificial intelligence), the digital economy continues to flourish on a global scale. As an innovative way to converge the traditional industry and digital economy, digital twin (DT) is attracting continued attention from different fields ¹, such as aerospace ^{2,3}, automotive ^{4,5}, shipping ⁶, smart grid ^{7,8}, and smart city ⁹. Especially in the field of smart manufacturing, DT has been widely applied in shop-floor control and management ^{10,11}, rapid configuration of production lines ¹², product lifecycle management ¹³, intelligent logistics ¹⁴, dynamic scheduling ^{15,16}, robot operation optimization ¹⁷, product quality assurance ^{18,19}, machine tool maintenance ²⁰, and human-robot interaction ^{21,22}. Practical applications of digital twin involve a set of key enabling technologies ²³, concerning the construction and verification of virtual models, the construction and management of intelligent services, the real-time sensing and control of physical entities, cyber-physical interaction and integration, multi-modal data association and fusion, and so forth. Since data is indispensable in empowering all these technologies, it can be argued that the success of DT lies in the availability of high-quality data sources.

Early-stage data acquisition was typically conducted by manual means, which suffered from low efficiency and high cost. Since the collected data was small in quantity and poor in real-time performance, it could only reflect features of a physical entity (e.g., machine tool and process) for a limited period of time with inevitable delays. Benefiting from the emergence of new information technologies in recent years, a huge volume of data can be collected by sensors, IoT devices, mobile devices, and wearable devices in real time and processed through an integrated computing infrastructure (e.g., cloud computing, edge computing, and fog computing). As a result, it is possible to obtain a complete data record and conduct timely analysis throughout the whole lifecycle of a physical entity. At the same time, the growth of DT-related research and application puts forward new requirements on data.

(1) Requirements on comprehensive data gathering

Comprehensive data is required to enhance the accuracy, efficiency, and adaptability of DT-based services (e.g., performance prediction, process optimization, and quality assurance). Comprehensive data refers to a wide spectrum of data including, for example, data on both normal and abnormal states, both common and rare events, both certain and uncertain scenarios, and so forth. DT applications developed upon incomplete data suffer from various challenges. On the one hand, some existing work focuses mainly on data gathered from the physical world, which makes it difficult to include low-probability data (e.g., failure data and extreme environment data) and data that cannot be measured directly (e.g., deformation data, stress distribution, and airflow distribution) ²⁴. On the other hand, other work focuses primarily on data gathered from virtual models, which makes it difficult to accurately simulate data caused by abrupt disturbances and time-varying data with high uncertainties ²⁵. A successful DT solution should be powered by comprehensive data obtained from both the physical and virtual worlds.

(2) Requirements on real-time data interaction

Real-time data interaction is required to enable coordinated operations. Firstly, real-time data from the physical entity can be used to dynamically update parameters of the virtual models, whereas simulation data from the virtual models is fed back to the physical entity in order to align its behaviors with a simulation plan. Secondly, data from DT-based services should be communicated with the physical entity for timely diagnosis, maintenance, and control, whereas real-time data from the physical entity can be used to update the services and make them more adaptable to the changing physical realities. Thirdly, since service availability should be verified before execution, real-time simulation results by the virtual models should be communicated with service providers to illustrate deficiencies, whereas service data can be used to calibrate the virtual models and improve their accuracy.

(3) Requirements on data universality

Low universality of data is a main obstacle for DT applications. It is difficult to transfer DT across different application scenarios, which impose different requirements and constraints on data acquisition. The difficulty of data exchange and parsing depends on various facets of an application scenario, such as the physical entities involved (e.g., robot, machine tool, and autonomous vehicle), data interfaces, and communication protocols. Across different application scenarios (e.g., design, production, and maintenance), it is difficult to achieve smooth data integration and sharing because data formats vary. To cope with these difficulties, it is necessary to unify data transformation towards high universality of data.

(4) Requirements on knowledge mining

To build insightful virtual models that can reflect a physical entity’s internal mechanisms and rules, it is necessary to derive knowledge behind raw data through data mining. The in-depth mining of massive data (e.g., physical entity data, virtual model data, and information system data) towards new knowledge remains a challenge. On the one hand, not all data is equally useful for extracting information and knowledge; in particular, irrelevant, abnormal, and redundant data are of limited use. On the other hand, it is difficult to fully mine the in-depth knowledge hidden behind data.

(5) Requirements on data fusion

Since DT-related data comes from multiple sources (e.g., physical entity, virtual model, and service), it is subject to noise, inconsistency, and conflict. For the data collected from a physical entity, various factors such as sensor malfunction, environmental fluctuation, and human interference would affect information entropy (a higher value indicates higher data uncertainty). For the data simulated by virtual models, deviations from the physical reality due to unsatisfactory model effectiveness would reduce the data reliability. In addition, neither the collected data nor the simulated data alone is sufficient to derive global perspectives. Data fusion is therefore required, through which data obtained from diversified sources is integrated. Fusion brings several benefits, such as reducing information entropy of the sensor data (i.e., reducing data uncertainty), reducing root mean squared error (RMSE) between the simulated data and measured data (i.e., improving data accuracy), and improving relevancy between the data and a certain target indicator, such as product quality, remaining useful life of a key component, and equipment health condition (i.e., increasing the amount of information related to the target indicator). By doing so, data from different sources can be verified, corrected, and supplemented by each other, hence leading to more accurate and consistent information extraction.
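The two indicators named above, information entropy and RMSE, can be illustrated with a minimal sketch. It assumes that entropy is estimated by binning values at a fixed width (the paper does not prescribe an estimator), and the function names and sample values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy in bits of `values` after fixed-width binning.

    The binning scheme is an assumption made for illustration only.
    """
    if not values:
        raise ValueError("values must not be empty")
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    # p * log2(1 / p) summed over the occupied bins
    return sum((c / n) * math.log2(n / c) for c in bins.values())


def rmse(simulated, measured):
    """Root mean squared error between two equally long sequences."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("sequences must be non-empty and equally long")
    squared = sum((s - m) ** 2 for s, m in zip(simulated, measured))
    return math.sqrt(squared / len(measured))


# Hypothetical readings: raw sensor values versus values after fusion.
raw_readings = [20.1, 23.7, 18.2, 21.9]
fused_readings = [20.4, 20.6, 20.2, 20.8]
entropy_drop = shannon_entropy(raw_readings) - shannon_entropy(fused_readings)
```

A positive `entropy_drop` indicates that the fused readings are less scattered than the raw ones under the chosen binning; a lower `rmse` between simulated and measured series indicates better agreement.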

(6) Requirements on iterative optimization

In this paper, iterative optimization refers to a cyclic process of “data increase - data fusion - information increase”, through which new data is fused with historical data to generate new information continuously. In the cyclic process, each optimization, triggered by the injection of new data, aims at adding appropriate data into the fusion to increase the amount of valuable information carried by the fusion data, subject to possible constraints on the data to be merged, such as keeping the data redundancy below an upper bound. Iterative optimization thus aims at continuous information increase.

(7) Requirements on convenient data usage

Since DT users play different roles and undertake different responsibilities, they tend to demand different types of data. For example, field operators demand on-site operational data (e.g., assembly sequence, maintenance steps, and control order); technicians demand process data (e.g., real-time condition monitoring data, equipment performance data, and diagnosis data); senior managers demand market data (e.g., material cost, market dynamics, and product benchmarking) ²⁶. It is difficult to implement generic data operations (e.g., searching, matching, combination, invocation, and visualization) given the significant differences among users in terms of demand, professional skill, knowledge level, etc. To address this issue, it is necessary to encapsulate data provision as on-demand services.

Considering the above-mentioned requirements, the authors proposed a new notion, namely digital twin data (DTD), which constitutes an important part of the five-dimension DT, in the previous work ²⁶. Based on this, this paper aims to provide the theoretical underpinning of DTD with respect to its structure and processing. The remainder of this paper is organized as follows. Section 2, ‘Composition of digital twin data’, introduces the composition of DTD, followed by a set of DTD principles in Section 3, ‘Principles for digital twin data’. Section 4, ‘Methodology for digital twin data’, presents a structured methodology for DTD gathering, interaction, storage, association, fusion, evolution and servitization. Section 5, ‘Key enabling technologies for digital twin data’ elaborates a set of key technologies concerning seven aspects. The final section concludes this work and outlines future work.

Composition of digital twin data

DTD refers to a wide spectrum of data closely related to the DT. It focuses equally on data collected from the physical space (e.g. sensor data from equipment, materials, and workers) and data produced by the virtual space (e.g. data mimicked by simulation models, data generated by algorithms embedded in services, and data deduced based on knowledge), and tends to make the data from the two spaces corrected and supplemented by each other through data fusion, to achieve more accurate and comprehensive information for the DT-related applications. According to previous studies ²⁶, DTD can be classified into six parts, i.e., physical entity-related data, virtual model-related data, service-related data, fusion data, connection data, and domain knowledge. Figure 1 illustrates how these parts are interrelated towards a whole DTD.

Figure 1. Composition of digital twin data (DTD).

Physical entity refers to an object existing in the physical world with specific functions, behaviors, and structures ²⁶. According to ISO 23247-3 (Digital Twin manufacturing framework - Part 3: Digital representation of manufacturing elements), physical entity-related data can be grouped into two categories: the first describes the static information concerning the physical entity (e.g., identification, characteristics, schedule) and the second reflects the dynamic states (e.g., status, location, relationship). These data help to represent the physical entity in a digital way.

Virtual models reproduce and describe a physical entity in the digital space with respect to different aspects, such as geometric property, physical parameter, dynamic behavior, operation and maintenance rules, and so forth ²⁶. Virtual model-related data include model parameters and simulation data (e.g., simulation conditions, simulation process data, and simulation results). Virtual model-related data is naturally coupled with the physical entity-related data, as virtual models are essentially built upon a physical entity.

DT services can be classified into application services and functional services ²⁷. Application services are provided, based on the physical entity-related data and virtual model-related data, to directly solve problems in a certain application scenario, such as equipment prognosis, resources scheduling, and product quality assurance. Therefore, the service-related data mainly includes performance data, scheduling data, and quality data. On the other hand, functional services are provided to support normal operations of DT by realizing relevant functions such as model management, data processing, data connection, etc. Related data therefore includes model configuration data, algorithm configuration data and service encapsulation data, etc.

Domain knowledge includes common knowledge such as expert experience, predefined rules, and industry standards ²⁶. Besides, domain knowledge also involves new knowledge being generated by data mining. Domain knowledge can be used to guide model construction, service optimization, data processing, etc.

Fusion data can be considered as a result of merging the physical entity-related data, virtual model-related data, service-related data, and domain knowledge. Through data fusion, multi-source data with underlying relations are made complementary to each other, towards forming a unified description ²⁶. Compared with single-source data, fusion data combines data from multiple perspectives and thus carries more abundant information. For example, the health condition of a piece of equipment can be reflected by various sensor data such as vibrations, speeds, forces, and pressures, as well as simulated data such as deformation, stress distribution, and airflow distribution. The fusion data is the result of merging the above data, obtained by processing the data from different sources via data fusion algorithms, such as the weighting method ²⁸, neural network ²⁹, and Bayesian method ³⁰. As a result, the fusion data would be more informative for condition monitoring.

Connection data refers to data derived from and transferred among the physical entity-related data, virtual model-related data, service-related data, fusion data, and domain knowledge ²⁶. Connection data makes it possible to compare data from different parts of DTD towards high consistency. For example, the real temperature collected by sensors and the expected temperature simulated by virtual models can be selected as the connection data between the physical entity-related data and virtual model-related data. If their deviation exceeds a predefined threshold, the perturbing factors causing the inconsistency (e.g., manual interferences, environment changes) should be identified and eliminated, which helps to restore the data consistency.

Principles for digital twin data

In this section, the authors propose a set of basic principles for gathering and processing DTD. They are developed in order to fulfil the requirements raised in Section 1 (‘From data to digital twin data’), i.e., comprehensive data gathering, real-time data interaction, data universality, knowledge mining, data fusion, iterative optimization, and convenient data usage. As shown in Figure 2, the outer layer presents the data requirements in seven aspects as raised in Section 1, while the corresponding principles, which should be observed in order to better fulfil the requirements, are given one by one in the inner layer.

Figure 2. Principles for digital twin data (DTD).

(1) Complementary principle

The complementary principle of DTD corresponds to the requirement of comprehensive data gathering. It emphasizes the simultaneous gathering of data from both the physical and virtual worlds, which supplement each other and make up respective deficiencies. On one hand, the physical entity-related data can truly reflect the dynamic changes of physical reality, including uncertainty, fuzziness, and complexity, which are difficult to simulate. On the other hand, the virtual models can generate a great variety of rare event data, unmeasurable data, and multi-physics coupling data through simulation at low cost, which cannot be directly collected from the physical world.

(2) Timeliness principle

The timeliness principle emphasizes that the connection data between any two parts of DTD should be transferred and adjusted in a timely manner. It aims at ensuring the rapid twinning process, namely the process of synchronizing the virtual and physical states, thus to enable parts of the DT to act synchronously ³¹. Besides, when the connection data between two parts is discovered to be inconsistent (i.e., the discrepancy exceeds a predefined value), timely adjustment is triggered to diagnose inconsistency and restore consistency.

(3) Standardized principle

As much as possible, DTD obtained from various objects, models, conditions, and scenes should be transformed into standardized data with the unified format, interface and protocol. A template that describes the data with respect to different aspects would be built, to help identify their differences with the standard version. The standardized principle is intended for DTD to fulfil the requirement on data universality, thus to support data exchange between different application objects and data sharing against different application scenarios.

(4) Association principle

The association principle corresponds to the requirement on knowledge mining. It emphasizes extracting various association relations among different parts of DTD, such as the causal, analogous, and complementary relations among the physical entity-related data, virtual model-related data, service-related data, and domain knowledge. The extracted associations can reflect valuable information such as causality among events, correlation of performances, and dependency of data variables. Underlying knowledge can be further induced based on such information.

(5) Fusion principle

The fusion principle corresponds to the requirement on data fusion. It emphasizes realizing data fusion by fully merging data with different relations. For example, fusing analogous data from the physical and virtual worlds, which can be verified by each other, can minimize the uncertainty, randomness, and fuzziness of the physical entity-related data or improve the reliability of the virtual model-related data. Besides, by fusing complementary data from different sources, more comprehensive data can be obtained to achieve richer information.

(6) Information growth principle

The information growth principle corresponds to the requirement on iterative optimization. It means that, with the continuous accumulation of data, DTD should be optimized iteratively through continued fusion of new data with historical data. After each fusion, the increase or decrease in information quantity can be evaluated to decide whether to accept the fusion, as a way to ensure the lasting growth of information.

(7) Servitization principle

The servitization principle means encapsulating DTD-related resources (e.g., data, data models, data processing algorithms, and data visualization methods) into on-demand services, which can be treated as black boxes with respect to the corresponding input, output, control, and mechanism. When a user raises a demand for data, the demand is decomposed into sub-demands, and suitable services are searched, matched, and combined to provide the required data. This principle is intended to lower the requirements on a user’s professional skills by means of those on-demand data services.
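The search-and-match step described above can be sketched as follows. This is only an illustration under simple assumptions: each service advertises the data items it provides, a demand has already been decomposed into sub-demands, and the first matching service is chosen. The class `DataService`, the function `match_services`, and all service names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataService:
    """A black-box data service described only by what it provides."""
    name: str
    provides: frozenset


def match_services(sub_demands, registry):
    """Map each sub-demand to the first registered service that provides it.

    Raises LookupError if a sub-demand cannot be served.
    """
    plan = {}
    for demand in sub_demands:
        for service in registry:
            if demand in service.provides:
                plan[demand] = service.name
                break
        else:
            raise LookupError(f"no service provides {demand!r}")
    return plan


# Hypothetical registry and a technician's demand split into sub-demands.
registry = [
    DataService("condition_monitoring", frozenset({"vibration", "temperature"})),
    DataService("cost_report", frozenset({"material_cost"})),
]
plan = match_services(["temperature", "vibration"], registry)
```

The resulting `plan` records which service answers each sub-demand, so the user does not need to know how the underlying data is produced.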

Methodology for digital twin data

Based on the above principles, a methodology for DTD gathering, interaction, storage, association, fusion, evolution, and servitization is developed by the authors, as shown in Figure 3. According to the complementary principle, with respect to data gathering, data from both the physical and virtual worlds should be collected simultaneously to ensure data comprehensiveness. Real-time interaction is realized through timely transmission and adjustment of the connection data based on the timeliness principle. According to the standardized principle, with respect to data storage, data with different formats, interfaces, and protocols should be transformed into standard forms. According to the association principle, various relations between different DT components should be extracted to support further knowledge mining. Based on the fusion principle, fusion data is generated by merging similar data and complementary data for mutual correction and supplementation. The information growth principle is followed to interpret the progressive data evolution. Based on the servitization principle, the data-related resources are encapsulated into services for different users. The relevant methods are elaborated as follows.

Figure 3. Methodology for digital twin data (DTD).

Digital twin data gathering

DTD gathering refers to the gathering of physical entity-related data, virtual model-related data, service-related data, and domain knowledge. These four kinds of data constitute the foundation of DTD, from which the fusion data and connection data can be derived. For the physical entity-related data, the dynamic data (e.g., entity state, environment parameters, and abrupt disturbances) can be collected in real time by means of sensors, embedded systems, radio frequency identification (RFID), and so forth, while the static data (e.g., entity attribute, identification, and characteristics) can be obtained through off-line measurement and sampling inspection. The virtual model-related data is typically generated by the process of modeling, simulating, and verifying the geometric, physical, behavioral, and rule models. The data can be gathered from manuals, simulation logs, and real-time simulation outputs. In particular, since physical model-based simulation, e.g., computational fluid dynamics (CFD) simulation, tends to be time-consuming and even inaccurate, a hybrid method that combines physical models and data-driven approaches can be adopted to improve the simulation efficiency and accuracy ¹⁹. Service-related data is generated by application services and functional services. Such data can be obtained throughout service construction, operation, and maintenance. Domain knowledge is typically collected from experts, crowdsourcing, knowledge bases, historical data, etc. In addition, DTD also draws essential information from existing IT systems, such as the product lifecycle management (PLM) system, product data management (PDM) system, manufacturing execution system (MES), and supervisory control and data acquisition (SCADA) system, which store large amounts of data across the lifecycle of the physical entity ³².

Digital twin data interaction

Connection data between any two parts of DTD supports real-time interactions. First, it is necessary to select suitable data to support message transmission between each pair of parts. To ease transmission, the data is further processed by filtering algorithms, dimensionality reduction algorithms, and relevance analysis algorithms, which are intended to remove data noise and redundancy. Then the data can be transferred between the DTD parts through data transmission protocols and formats suggested by ISO 23247-4 (Digital Twin manufacturing framework - Part 4: Information exchange). During the transmission, the consistency of connection data can be evaluated by calculating the data distances. To better illustrate the interaction, take an autoclave (a key piece of equipment for composites curing) as an example. During curing, the temperature distribution of the mold within the autoclave is an important quality indicator, since it largely affects the deformation degree of the composite components ¹⁹. To monitor the component quality, the real mold temperatures are collected by thermocouples. Meanwhile, a virtual model can be built to mimic the expected mold temperature distribution by using CFD simulation. The data interaction between the physical equipment and the virtual model is used to ensure that the actual curing is executed as simulated. In this situation, a major feature reflecting the mold temperature distribution, i.e., the difference between the maximum and minimum temperatures on the mold upper surface, can be selected as the connection data, leaving out other redundant or unimportant features. Then the collected and simulated mold temperature differences can be exchanged and compared. If their deviation always stays within a predefined threshold, the expected composite quality would be achieved. Otherwise, perturbing factors causing the inconsistency need to be found, such as an uneven airflow field in the autoclave, deteriorating insulation of the autoclave wall, and abrupt changes of curing conditions. These factors should be eliminated by adjusting parameters of the physical autoclave (e.g., heating speed, air speed, pressure) or by maintaining the equipment, which aims at reducing negative effects caused by the disturbances and keeping the collected data close to the simulated data.
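The autoclave example can be sketched in a few lines. The sketch assumes that the connection data is the max-min temperature difference per time step and that consistency is judged by an absolute deviation against a fixed threshold; the function names, the temperature values, and the threshold of 2.0 are hypothetical and not taken from the paper.

```python
def temperature_spread(temperatures):
    """Connection-data feature: max minus min temperature on the mold surface."""
    return max(temperatures) - min(temperatures)


def inconsistent_steps(measured_spreads, simulated_spreads, threshold):
    """Indices of time steps where |measured - simulated| exceeds the threshold."""
    return [
        step
        for step, (m, s) in enumerate(zip(measured_spreads, simulated_spreads))
        if abs(m - s) > threshold
    ]


# Hypothetical thermocouple readings (one list per time step) and the
# spreads predicted by the virtual model for the same time steps.
thermocouples = [
    [120.0, 123.0, 121.5],
    [150.0, 158.0, 152.0],
    [180.0, 183.5, 181.0],
]
measured = [temperature_spread(step) for step in thermocouples]
simulated = [2.5, 3.0, 3.0]
flagged = inconsistent_steps(measured, simulated, threshold=2.0)
```

Any index in `flagged` marks a time step at which the physical and virtual sides disagree and the perturbing factors should be investigated.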

Digital twin data storage

The physical entity-related data, virtual model-related data, and service-related data collected from different application objects and scenarios should be transformed into a unified mode for sharing. Towards this goal, data should first be represented in a standardized fashion, which involves the description of data properties, such as content, format, sampling frequency, historical data accumulation, interface, communication protocol, etc. Next, necessary constraints should be imposed on the properties. For example, constraints on historical data accumulation can help to filter out unqualified application conditions, where DT is inapplicable due to the lack of adequate data. If certain constraints cannot be complied with, the corresponding application conditions are not qualified. For the qualified ones, on the one hand, data with various formats can be converted into a standardized format, e.g., eXtensible Markup Language (XML), for the sake of easier data sharing. On the other hand, since data from different objects would be transmitted by different interfaces and communication protocols, a middleware is needed to support unified data access to the database, shielding the heterogeneity ³³. In addition, a wide variety of modelling languages and methods are used in the literature to handle data and information modeling for products and systems, such as unified modeling language (UML) ³⁴, systems modeling language (SysML) ³⁵, ontology language ³⁶, etc. New, sound mathematical approaches based on category theory ³⁷ could provide a comprehensive foundation for modeling, interoperability, and integration.
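The two steps above, checking a constraint on the data description and converting qualified data into XML, can be sketched with Python's standard `xml.etree.ElementTree` module. The element names, the property keys, and the minimum-history value are assumptions chosen for illustration; the paper does not define a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical property keys describing a data source.
REQUIRED = ("content", "format", "sampling_frequency_hz",
            "history_samples", "interface", "protocol")


def is_qualified(description, min_history_samples):
    """Check the constraint on historical data accumulation."""
    missing = [key for key in REQUIRED if key not in description]
    if missing:
        raise KeyError(f"missing properties: {missing}")
    return description["history_samples"] >= min_history_samples


def to_xml(source_id, description, samples):
    """Serialize a description and (timestamp, value) samples as XML text."""
    root = ET.Element("dtd_record", attrib={"source": source_id})
    properties = ET.SubElement(root, "properties")
    for key in REQUIRED:
        ET.SubElement(properties, key).text = str(description[key])
    data = ET.SubElement(root, "samples")
    for timestamp, value in samples:
        ET.SubElement(data, "sample", attrib={"t": str(timestamp)}).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical description of a spindle vibration signal.
description = {
    "content": "spindle vibration", "format": "csv",
    "sampling_frequency_hz": 1000, "history_samples": 5000,
    "interface": "RS-485", "protocol": "Modbus",
}
if is_qualified(description, min_history_samples=1000):
    xml_text = to_xml("machine_tool_01", description, [(0.0, 0.12), (0.001, 0.15)])
```

A receiving system can parse `xml_text` with any XML parser regardless of the original format, which is the point of converting to a standardized representation.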

Digital twin data association

Association relations among DTD are mined to support knowledge discovery. Firstly, data obtained from the physical entity, virtual model, and service is preprocessed through data filtering, data reduction, and feature extraction in order to remove irrelevant and useless data. Secondly, temporal and spatial alignments are conducted. For example, the least squares method can be used to synchronize data in time and to transform data into the same coordinate system in space. Next, the relations (e.g., causality, similarity, and complementation) among data are mined by means of Pearson correlation analysis ³⁸, K-means ³⁹, Apriori algorithm ⁴⁰, etc. Based on this, further knowledge (e.g., behavioral patterns of the physical entity, simulation mechanisms of virtual models, and behavior-performance mappings between different parts of DT) can be deduced through knowledge reasoning based on logic rules, distributed representations, neural networks, etc. ⁴¹, and the deduced knowledge can be expressed in the form of a knowledge graph. To support the subsequent data fusion, two kinds of data associations are especially important, i.e., the similar relation and the complementary relation. The former refers to the relation between data with similar attributes, values, or changing trends, whereas the latter refers to the relation between multi-modal data from multiple sources, which can describe the same attribute or behavior from different perspectives.
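The alignment and correlation steps can be illustrated with a short sketch. Note the simplification: linear interpolation is used here for temporal alignment in place of the least squares method mentioned above, and only the Pearson coefficient is computed. The function names and sample series are hypothetical.

```python
import math


def interpolate(times, values, query_times):
    """Linearly interpolate a series at query_times.

    `times` must be strictly increasing and contain at least two points.
    """
    if len(times) < 2 or len(times) != len(values):
        raise ValueError("need at least two (time, value) pairs")
    result = []
    for q in query_times:
        if q < times[0] or q > times[-1]:
            raise ValueError(f"query time {q} is outside the sampled range")
        for i in range(len(times) - 1):
            if times[i] <= q <= times[i + 1]:
                weight = (q - times[i]) / (times[i + 1] - times[i])
                result.append(values[i] + weight * (values[i + 1] - values[i]))
                break
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical series: a sensor sampled every 2 s, a simulation every 1 s.
sensor_times, sensor_values = [0.0, 2.0, 4.0], [10.0, 20.0, 30.0]
sim_times, sim_values = [1.0, 2.0, 3.0], [14.0, 21.0, 26.0]
aligned_sensor = interpolate(sensor_times, sensor_values, sim_times)
similarity = pearson(aligned_sensor, sim_values)
```

A coefficient close to 1 would suggest a similar relation between the two series, which makes them candidates for the mutual verification described in the fusion step.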

Digital twin data fusion

Existing work on data fusion focuses primarily on merging data from the physical world (e.g., fusing sensor data with manual data), while few efforts have been devoted to fusing data from both the physical and virtual worlds ⁴²,⁴³. In contrast, DTD fusion involves a holistic merging of physical entity-related data, virtual model-related data, service-related data and domain knowledge. DTD fusion includes the following aspects. If the physical entity-related data is disturbed by environmental perturbations, sensor failures, or human interference, methods such as the weighted average method ²⁸, Dempster–Shafer theory ⁴⁴, and the Kalman filter ⁴⁵ can be used to fuse it with similar virtual model-related data and service-related data to reduce the information entropy. By doing so, data uncertainty, randomness, and fuzziness can be reduced. Likewise, if the virtual model-related data and service-related data deviate from reality, methods such as neural networks ²⁹ and Bayesian methods ³⁰ can be used to merge them with similar physical entity-related data to improve accuracy and reliability. For complementary multi-modal data from different parts of DTD, methods such as the weighted average method ²⁸ and neural networks ²⁹ can be used to increase information diversity.
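To make the weighted average idea concrete, the sketch below fuses a noisy physical measurement with a similar virtual model prediction using inverse-variance weights. This is only one possible weighting scheme, chosen here for illustration, and it assumes independent errors with known variances. For a single static quantity it coincides with a scalar Kalman filter update. All numbers are invented.

```python
def fuse(values, variances):
    """Inverse-variance weighted average and the variance of the fused estimate.

    Assumes the error sources are independent and their variances are known.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, values)) / total
    return fused, 1.0 / total


measured, measured_var = 101.0, 4.0    # noisy physical entity-related data
simulated, simulated_var = 99.0, 1.0   # similar virtual model-related data

estimate, estimate_var = fuse([measured, simulated],
                              [measured_var, simulated_var])
# The fused variance is smaller than either input variance, i.e. the
# uncertainty of the fused value is reduced.
```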

Digital twin data evolution

DTD evolution is a dynamic process through which new data is continuously added, processed, and then fused with historical data. This process can be described by a data relation network built as a complex network. When new data is incorporated into the network, new association relations are built between the new data and historical data by preprocessing the new data, aligning it with historical data, and extracting association relations from the aligned data using data mining algorithms. The new data is then fused with data that has similar or complementary relations, for data correction and supplementation. This process can be guided by an automatic fusion mechanism to reduce the dependency on manual labor. Information entropy is employed to evaluate the effectiveness of data evolution. By comparing the information entropy before and after the fusion, the gain or loss of information can be evaluated. Meanwhile, the structural change of the network after each fusion can be analyzed with respect to degree, edge betweenness, clustering coefficient, etc. In light of the evaluation results over time, certain DTD evolution rules can be mined, such as the rules for information transfer and the relations between network structure and information distribution. In turn, such rules can be used to guide subsequent data fusions with respect to the selection of data and methods, to ensure iterative optimization and information growth.
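The entropy comparison can be sketched as follows, assuming the data has already been discretized into symbols. The status labels and the two sequences are hypothetical, and Shannon entropy is used as the information entropy measure.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical discretized equipment status before and after a fusion step.
before = ["ok", "warn", "ok", "fault", "warn", "ok", "fault", "warn"]
after = ["ok", "ok", "ok", "warn", "ok", "ok", "warn", "ok"]

# A negative value indicates that the fused data is less uncertain.
delta = shannon_entropy(after) - shannon_entropy(before)
```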

Digital twin data servitization

DTD servitization aims to enable users to access data through on-demand services, which involves the following aspects ⁴⁶. First, various digital resources are encapsulated into services, which have functions (e.g. data searching, processing, and virtualization), inputs (e.g. data type, volume, and feature), outputs (e.g. processed data and visual diagram), quality (e.g. time, cost, and reliability), and states (e.g. working and idle). Given a user demand (e.g., a query for equipment status data, part remaining life data, or maintenance guidance data), relevant services can be prescribed for data extraction, processing, visualization, etc. The separate services can then be combined, in compliance with relevant constraints (e.g., time and cost), into an integrated service solution. Related methods include demand decomposition, similarity matching, multi-objective optimization, and so forth ⁴⁶. With respect to DTD visualization (i.e., a particular form of DTD service), on top of traditional visual diagrams, virtual reality (VR) ⁴⁷ and augmented reality (AR) ⁴⁸ can be exploited to visualize the mapping relations between the physical entity and virtual models. Finally, the integrated services provide the demanded results to users.
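One way to picture the service description above is as a small record type with a matching function. The sketch below is an illustrative assumption: the field names follow the function, input, output, quality, and state attributes listed in the text, while the class `DataService`, the function `match`, and the sample services are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataService:
    name: str
    function: str          # e.g. "search", "processing", "visualization"
    input_type: str
    output_type: str
    time_s: float          # quality: expected response time
    cost: float            # quality: cost per call
    reliability: float     # quality: value between 0 and 1
    state: str = "idle"    # "idle" or "working"


def match(services, function, input_type, max_time_s):
    """Return idle services that satisfy the demand, most reliable first."""
    candidates = [
        s for s in services
        if s.function == function
        and s.input_type == input_type
        and s.state == "idle"
        and s.time_s <= max_time_s
    ]
    return sorted(candidates, key=lambda s: (-s.reliability, s.cost))


services = [
    DataService("status_query", "search", "equipment_id", "status", 0.2, 1.0, 0.99),
    DataService("slow_query", "search", "equipment_id", "status", 5.0, 0.5, 0.999),
    DataService("busy_query", "search", "equipment_id", "status", 0.1, 1.0, 0.9999, "working"),
    DataService("life_plot", "visualization", "life_curve", "diagram", 1.0, 2.0, 0.95),
]

# Demand: query equipment status data within one second.
matched = match(services, "search", "equipment_id", max_time_s=1.0)
```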

Key enabling technologies for digital twin data

As illustrated in Figure 4, the key enabling technologies for DTD are explored based on the enabling technologies for DT in general ²³ as well as the enabling technologies for digital twin shop-floor (DTS) in particular ⁴⁹. Some main technologies are explained as follows.

Figure 4. Key enabling technologies for digital twin data (DTD).

AR, augmented reality; VR, virtual reality.

(1) DTD gathering deals with rare-event data, extreme environment data, multi-physics coupling data, etc. Since such data is difficult to measure directly, data generation based on simulation plays a crucial role. First, various modeling and simulation tools that can help to generate data concerning geometry, physics, behaviors, and rules are needed, such as AutoCAD, UG, ANSYS, COMSOL, Matlab, Pajek, etc. ²³. Then a set of verification, validation and accreditation (VV&A) activities can be applied to ensure model accuracy, such as raw modeling data credibility assessment, model transformation process checking (e.g. from conceptual model to formal model), and model completeness checking ⁵⁰. When the modeling data is insufficient, transfer learning can be employed to enable modeling based on small sample data, since it can use data from already known processes to supplement data of a similar process with limited data availability. High-efficiency simulation technology can be used to improve the efficiency of data generation. For example, a metamodel can reduce the computational burden of simulation, since it is a relatively simple approximation of the input-output function defined by the complex and time-consuming simulation ⁵¹.
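The metamodel idea can be illustrated with a first-order regression metamodel fitted by ordinary least squares. In this sketch, `expensive_simulation` is a hypothetical stand-in for a slow simulation run, and the design points are chosen arbitrarily. In practice a Kriging or higher-order model may be more appropriate.

```python
import math


def expensive_simulation(x):
    """Stand-in for a slow simulation run (hypothetical response)."""
    return 3.0 * x + 2.0 + 0.1 * math.sin(5.0 * x)


def fit_linear_metamodel(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b


# Run the simulation only at a few design points.
design_points = [i * 0.5 for i in range(11)]
responses = [expensive_simulation(x) for x in design_points]
a, b = fit_linear_metamodel(design_points, responses)


def metamodel(x):
    """Cheap approximation of the simulation's input-output function."""
    return a * x + b


# New data can now be generated without rerunning the simulation.
approx = metamodel(2.2)
```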

(2) With respect to data interaction, data transmission technology is required to exchange connection data among different parts of DTD. ISO 23247-4 provides some options for data transmission protocols and formats. For example, the virtual entity can access data of the physical entity through protocols such as MTConnect, OPC-UA, HTTP and CoAP, and the virtual entity can be connected with the service by using application interfaces (i.e. open API) or by employing shared memory. The suggested data formats include XML, JavaScript object notation (JSON), quality information framework (QIF) and so forth. The asset administration shell, which virtualizes the physical asset and enables access to asset-related data via application interfaces, also provides a way for data exchange ⁵². To reduce redundancy in the exchanged data, measures such as the Pearson correlation coefficient ³⁸ and mutual information ⁵³ can be employed to assess the overlap among the data. In addition, Euclidean distance, Manhattan distance, dynamic time warping ⁵⁴, etc. can be used to quantify the discrepancy between data from different parts of the DT and thus to assess their consistency.
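The three discrepancy measures named above can be compared on a small example. The sketch below uses a textbook dynamic time warping (DTW) recurrence with absolute difference as the local cost. The two series are invented so that the simulated series has the same shape as the measured one but lags by one sample.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


measured = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
simulated = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0]  # same shape, one sample late

point_wise = manhattan(measured, simulated)
straight = euclidean(measured, simulated)
warped = dtw(measured, simulated)
```

The point-wise distances penalize the time shift, whereas DTW largely absorbs it, which is why DTW can be the better consistency check when the two parts of the DT are not perfectly synchronized.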

(3) DTD storage deals with data originating from different sources with various formats. First, database management technology performs basic data functions such as adding, deleting, changing, and checking. To represent the data in a unified form for easier sharing, many studies have explored methods for data format transformation. For example, ref. 55 converted unstructured data to semi-structured data in XML format, and then to structured data by parsing the XML file; and ref. 56 transformed heterogeneous big data into a semantically-enriched format called resource description framework (RDF). Data middleware technology performs the transformations for data interfaces and communication protocols. For example, the Industrial Internet-of-Things Hub (II-hub) was designed to transform various data interfaces (e.g. RS232, RS484, CAN) and protocols (e.g. Profibus, Modbus) into the unified Wi-Fi interface and CoAP protocol, respectively ³³. In addition, security is always a concern for data storage. To address it, related activities such as encryption, proof of ownership, authentication and authorization need to be conducted ⁵⁷.

(4) With respect to data association, related technologies include spatial-temporal data alignment, data mining, knowledge reasoning, knowledge representation, etc. Spatial-temporal data alignment algorithms, including the least squares method, the maximum likelihood method, and the Kalman filter method, make DTD synchronous in time and bring it into the same coordinate system in space ⁵⁸. Data mining algorithms (e.g. K-means ³⁹, the Apriori algorithm ⁴⁰, and algorithms based on conditional mutual information ⁵⁹) can reveal clustering groups and association relations among DTD. For example, conditional mutual information is a measure for identifying the complementary relation between two variables, which means that the appearance of one variable makes the other more relevant to a certain target indicator ⁵⁹. Then knowledge reasoning technology based on logic rules, distributed representations, neural networks, etc. ⁴¹ can be used to derive new knowledge from the extracted relations and groups, or from existing knowledge. Knowledge representation technology (e.g., knowledge graph) can visually describe the knowledge, knowledge carriers, and knowledge relations.
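The complementary relation captured by conditional mutual information can be shown on a toy example with discrete variables. In the sketch below the target is the exclusive OR of two binary variables, a standard illustration chosen here and not taken from the paper. Each variable alone carries no information about the target, but becomes fully informative once the other is known.

```python
import math
from collections import Counter


def mutual_information(xs, ts):
    """I(X; T) in bits, estimated from paired discrete samples."""
    n = len(xs)
    pxt, px, pt = Counter(zip(xs, ts)), Counter(xs), Counter(ts)
    return sum((c / n) * math.log2(c * n / (px[x] * pt[t]))
               for (x, t), c in pxt.items())


def conditional_mutual_information(xs, ts, ys):
    """I(X; T | Y) in bits, estimated from discrete sample triples."""
    n = len(xs)
    pxty = Counter(zip(xs, ts, ys))
    pxy, pty, py = Counter(zip(xs, ys)), Counter(zip(ts, ys)), Counter(ys)
    return sum((c / n) * math.log2(c * py[y] / (pxy[(x, y)] * pty[(t, y)]))
               for (x, t, y), c in pxty.items())


# Toy data: the target indicator is the exclusive OR of x and y.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
target = [a ^ b for a, b in zip(x, y)]

alone = mutual_information(x, target)                       # 0 bits
given_y = conditional_mutual_information(x, target, y)      # 1 bit
```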

(5) With respect to data fusion, related technologies include anomaly detection, granularity transformation, heterogeneous data fusion, fault-tolerant technology, etc. ⁵⁸. Anomaly detection technology can be employed to remove abnormal data before fusion. The basic idea is to train a model that describes the normal pattern of the target object based on historical data, and then use it to detect anomalies in newly collected data ⁶⁰. Granularity transformation technology can be used to bring data with different granularities (e.g., raw data and extracted features, sparse data and dense data, a large collection of text and refined knowledge) ⁵⁸ to the same level. Related methods include feature extraction algorithms (e.g. Fourier transformation and wavelet analysis) for raw data processing, neural networks for building mapping relations between sparse and dense data ⁶¹, and knowledge extraction for text ⁶². Then data fusion algorithms (e.g. the weighting method ²⁸, Dempster-Shafer theory ⁴⁴, and the Kalman filter ⁴⁵) can merge the data to obtain richer information. In addition, adopting the distributed fusion framework introduced in ref. 58 can achieve higher efficiency for data fusion, since the local processors in the framework share the computation tasks.
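A very simple instance of the anomaly detection idea is to describe the normal pattern by the mean and standard deviation of historical data and to reject new values that deviate too far. The three-sigma threshold and the sample readings below are assumptions made for illustration. The surveyed techniques in the cited reference are more general.

```python
import statistics


def fit_normal_pattern(history):
    """Describe the normal pattern by the sample mean and standard deviation."""
    return statistics.mean(history), statistics.stdev(history)


def is_anomaly(value, mean, std, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * std


# Hypothetical historical readings collected under normal operation.
history = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 49.7]
mean, std = fit_normal_pattern(history)

# Newly collected data; abnormal values are removed before fusion.
incoming = [50.0, 50.4, 57.5, 49.9]
clean = [v for v in incoming if not is_anomaly(v, mean, std)]
```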

(6) Data evolution deals with the iterative process of data fusion. Complex network modeling technology can be used to build the data network, in which data variables are regarded as nodes, and data association relations (e.g. similarity and complementarity) are treated as edges. Network visualization and analysis tools are required, such as Pajek, EChart, and Gephi. Statistical characteristics of the network, such as degree, weighted degree, edge betweenness, and clustering coefficient, can be used to identify the roles of different nodes, which can help to decide which data should be fused. Once new data is added, data fusion merges the new data with the previous data for richer information. The information before and after the fusion can then be measured by calculating the information entropy, on the basis of which the efficiency of data evolution is evaluated.
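Two of the network statistics mentioned above, degree and clustering coefficient, can be computed directly from an adjacency structure. The sketch below uses a small invented data relation network. Dedicated tools such as those named in the text would normally be used for larger networks.

```python
from itertools import combinations

# Hypothetical data relation network: nodes are data variables,
# edges are mined association relations.
edges = [
    ("temperature", "vibration"),
    ("temperature", "power"),
    ("vibration", "power"),
    ("power", "wear"),
]


def build_adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


adj = build_adjacency(edges)
hub_degree = degree(adj, "power")
hub_clustering = clustering_coefficient(adj, "power")
```

In this example the variable with the highest degree is a natural candidate for fusion, since it is associated with the most other variables.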

(7) With respect to data servitization, related technologies include resource encapsulation, demand decomposition, multi-objective combinational optimization, data visualization based on VR and AR, etc. Resource encapsulation technology can transform digital resources into services by using the semantic web, manufacturing service description language, ontology modeling ⁶³, etc. Demand decomposition technology can be used to decompose a complicated user demand into specific sub-demands, which facilitates the search for a specific service. The hierarchical task network (HTN), which supports separating a goal task into more specific tasks, is applied in this field ⁶⁴. Multi-objective combinational optimization technology can be used to integrate multiple services in light of conflicting objectives (e.g., time, cost, and reliability). Commonly used optimization algorithms include the genetic algorithm, the ant colony algorithm, and the particle swarm optimization algorithm. Data visualization based on VR and AR can be used to present the relations among data, virtual models, and physical entities in a more intuitive manner.
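The service combination step can be sketched for a demand that has already been decomposed into three sub-demands. For brevity, the example simplifies the multi-objective problem to minimizing total cost under a time constraint and solves it by exhaustive enumeration, which is a deliberate substitute for the genetic, ant colony, or particle swarm algorithms named above and is only practical for small candidate sets. All service names and quality values are hypothetical.

```python
from itertools import product

# Candidate services per sub-demand: (name, time in seconds, cost).
candidates = {
    "extraction": [("db_query", 2.0, 1.0), ("cache_read", 0.5, 3.0)],
    "processing": [("fft_features", 4.0, 2.0), ("fast_filter", 1.0, 5.0)],
    "visualization": [("chart", 1.0, 1.0), ("ar_overlay", 3.0, 6.0)],
}


def compose(candidates, max_time):
    """Pick one service per sub-demand, minimizing cost within a time budget."""
    best = None
    for combo in product(*candidates.values()):
        total_time = sum(s[1] for s in combo)
        total_cost = sum(s[2] for s in combo)
        if total_time <= max_time and (best is None or total_cost < best[1]):
            best = ([s[0] for s in combo], total_cost, total_time)
    return best  # None if no combination satisfies the constraint


plan = compose(candidates, max_time=5.0)
```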

Conclusions and future work

Data is a core driver for DT. This paper focuses on the DTD that can be classified into physical entity-related data, virtual model-related data, service-related data, domain knowledge, fusion data, and connection data. The advent of DT puts forward some new requirements on data in terms of data gathering, interaction, universality, mining, fusion, iterative optimization, and on-demand usage. Triggered by these requirements, seven basic principles are proposed to support the DTD organization and processing. Guided by these principles, related methods for DTD gathering, interaction, storage, association, fusion, evolution, and servitization are discussed. Finally, the key enabling technologies are discussed.

This paper provides some theoretical underpinnings for DTD, which are imperative for the further promotion and application of DT, and is intended to help more DT researchers incorporate DTD into their DT development process. For a given DT application scenario, researchers are expected to first analyze the data requirements according to Section 1 (‘From data to digital twin data’). Then, to satisfy the requirements, the related principles for DTD proposed in Section 3 (‘Principles for digital twin data’) should be observed. Guided by these principles, researchers can employ the corresponding methods and key enabling technologies explored in Section 4 (‘Methodology for digital twin data’) and Section 5 (‘Key enabling technologies for digital twin data’) for DTD gathering and processing. These processes help to achieve more comprehensive and consistent data, richer and deeper information, and more convenient and standardized data usage, which could bring substantial benefits to DT applications. For example, DTD-based prognosis would have higher accuracy by virtue of comprehensive data support. DTD-based production control can better align the actual process with the simulated plan through real-time data interaction. DTD-based product design could be more efficient owing to the efficient mining of key information for designers.

For DTD, the main challenges ahead concern simulation data generation and cyber-physical data fusion. First, since DTD emphasizes the supplementary role of simulation data, how to build virtual models that generate simulation data accurately remains a fundamental challenge. Commonly used models include data models and physical models. However, the former rely heavily on the modeling data and usually have poor interpretability, whereas the latter, although established based on physical mechanisms and constraints, tend to be over-simplified and can thus deviate non-negligibly from practice. A hybrid modeling method that merges the data model and the physical model would be a promising way to address this challenge. Second, even though cyber-physical integration is a distinguishing feature of DT, cyber-physical data fusion remains a challenge. As data from the physical and virtual spaces may each have limitations and flaws, how to ensure that the fusion result is always superior to data from either space remains an open problem.

In the future, this work will continue to be enriched by (1) applying the generic principles and methods to guide DTD gathering and processing in practical applications; (2) refining the principles and methods based on lessons learned from practical applications, for instance, further improving related processing algorithms to better adapt to DTD, refining data generation and fusion methods, and exploring hierarchical organization of the DTD; and (3) introducing big data-related tools and platforms to support large-scale DT applications, since in this case the DTD, which is closely related to the real-time dynamic DT, also has the “4V” characteristics of big data (i.e. volume, variety, value and velocity).

Data availability

No data are associated with this article.

References

1. Tao: Make more digital twins. Nature. 2019;573(7775):490–491. PMID: 31554984. DOI: 10.1038/d41586-019-02849-1

2. Liu, Tao, Cheng: Digital twin satellite: concept, key technologies and applications. Comput Integr Manuf Syst. 2020;26(3):565–588. DOI: 10.13196/j.cims.2020.03.001

3. Mandolla, Petruzzell, Percoco: Building a digital twin for additive manufacturing through the exploitation of blockchain: A case analysis of the aircraft industry. Comput Ind. 2019;109:134–152. DOI: 10.1016/j.compind.2019.04.011

4. Rajesh, Manikandan, Ramshankar: Digital twin of an automotive brake pad for predictive maintenance. Procedia Comput Sci. 2019;165:18–24. DOI: 10.1016/j.procs.2020.01.061

5. Zheng, Chen: Digital twin for geometric feature online inspection system of car body-in-white. Int J Comput Integr Manuf. 2020;34(7–8):752–763. DOI: 10.1080/0951192X.2020.1736637

6. Coraddu, Oneto, Baldi: Data-driven ship digital twin for estimating the speed loss caused by the marine fouling. Ocean Eng. 2019;186:106063. DOI: 10.1016/j.oceaneng.2019.05.045

7. Zhou, Yan, Feng: Digital twin framework and its application to power grid online analysis. CSEE J Power Energy Syst. 2019;5(3):391–398. DOI: 10.17775/CSEEJPES.2018.01460

8. Peng, Zhao, Wang: A digital twin based estimation method for health indicators of DC-DC converters. IEEE Trans Power Electron. 2021;36(2):2105–2118. DOI: 10.1109/TPEL.2020.3009600

9. Deng, Zhang, Shen: A systematic review of a digital twin city: A new pattern of urban governance toward smart cities. J Manag Sci Eng. 2021;6(2):125–134. DOI: 10.1016/j.jmse.2021.03.003

10. Tao, Zhang: Digital twin shop-floor: a new shop-floor paradigm towards smart manufacturing. IEEE Access. 2017;5:20418–20427. DOI: 10.1109/ACCESS.2017.2756069

11. Tao, Zhang, Nee AYC: Digital twin driven smart manufacturing. Elsevier, 2019.

12. Leng, Liu: Digital twin-driven rapid reconfiguration of the automated manufacturing system via an open architecture model. Robot Comput Integr Manuf. 2020;63:101895. DOI: 10.1016/j.rcim.2019.101895

13. Tao, Cheng: Digital twin-driven product design, manufacturing and service with big data. Int J Adv Manuf Technol. 2018;94(9–12):3563–3576. DOI: 10.1007/S00170-017-0233-1

14. Zhang, Liu, Cheng: Just-in-time material distribution method for satellite assembly digital twin shop-floor. Comput Integr Manuf Syst. 2020;26(11):2897–2914.

15. Zhang, Tao, Nee AYC: Digital twin enhanced dynamic job-shop scheduling. J Manuf Syst. 2021;58(Part B):146–156. DOI: 10.1016/j.jmsy.2020.04.008

16. Fang, Peng, Lou: Digital-Twin-based job shop scheduling toward smart manufacturing. IEEE Trans Industr Inform. 2019;15(12):6425–6435. DOI: 10.1109/TII.2019.2938572

17. Cui: Digital twin-based industrial cloud robotics: framework, control approach and implementation. J Manuf Syst. 2021;58(Part B):196–209. DOI: 10.1016/j.jmsy.2020.07.013

18. Söderberg, Wärmefjord, Carlson: Toward a digital twin for real-time geometry assurance in individualized production. CIRP Ann Manuf Technol. 2017;66(1):137–140. DOI: 10.1016/j.cirp.2017.04.038

19. Zhang, Tao, Huang: A physical model and data-driven hybrid prediction method towards quality assurance for composite components. CIRP Ann Manuf Technol. 2021;70(1):115–118. DOI: 10.1016/j.cirp.2021.04.062

20. Luo: A hybrid predictive maintenance approach for CNC machine tool driven by digital twin. Robot Comput Integr Manuf. 2020;65:101974. DOI: 10.1016/j.rcim.2020.101974

21. Bilberg, Malik: Digital twin driven human-robot collaborative assembly. CIRP Ann Manuf Technol. 2019;68(1):499–502. DOI: 10.1016/j.cirp.2019.04.011

22. Malik, Brem: Digital twins for collaborative robots: A case study in human-robot interaction. Robot Comput Integr Manuf. 2021;68:102092. DOI: 10.1016/j.rcim.2020.102092

23. Tao: Enabling technologies and tools for digital twin. J Manuf Syst. 2019;58(Part B):3–21. DOI: 10.1016/j.jmsy.2019.10.001

24. Wang, Tao, Zhang: Digital twin enhanced fault prediction for the autoclave with insufficient data. J Manuf Syst. 2021;60:350–359. DOI: 10.1016/j.jmsy.2021.05.015

25. Uhlmann, Barth, Seifarth: Simulation of metal cutting with cutting fluid using the Finite-Pointset-Method. Procedia CIRP. 2021;101:98–101. DOI: 10.1016/j.procir.2021.02.013

26. Tao, Zhang, Liu: Digital twin driven prognostics and health management for complex equipment. CIRP Ann Manuf Technol. 2018;67(1):169–172. DOI: 10.1016/j.cirp.2018.04.055

27. Tao, Liu, Zhang: Five-dimension digital twin model and its ten applications. Comput Integr Manuf Syst. 2019;25(1):1–18. DOI: 10.13196/j.cims.2019.01.001

28. Wang, Guo, Tie: Weighted hybrid fusion with rank consistency. Pattern Recognit Lett. 2020;138:329–335. DOI: 10.1016/j.patrec.2020.07.037

29. Guan, Cao, Yang: Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Information Fusion. 2019;50:148–157. DOI: 10.1016/j.inffus.2018.11.017

30. Taylor, Bishop: Homogeneous functionals and Bayesian data fusion with unknown correlation. Information Fusion. 2019;45:179–189. DOI: 10.1016/j.inffus.2018.02.002

31. Jones, Snider, Nassehi: Characterising the digital twin: a systematic literature review. CIRP J Manuf Sci Technol. 2020;29(Part A):36–52. DOI: 10.1016/j.cirpj.2020.02.002

32. Boschert, Rosen: Digital twin-the simulation aspect. In Mechatronic Futures. Springer, 2016;59–74. DOI: 10.1007/978-3-319-32156-1_5

33. Tao, Cheng: IIHub: An industrial Internet-of-Things hub toward smart manufacturing based on cyber-physical system. IEEE Trans Industr Inform. 2018;14(5):2271–2280. DOI: 10.1109/TII.2017.2759178

34. Uke, Thool: UML based modeling for data aggregation in secured wireless sensor network. Procedia Comput Sci. 2016;78:706–713. DOI: 10.1016/j.procs.2016.02.120

35. Brahmi, Hammadi, Aifaoui: Interoperability of CAD models and SysML specifications for the automated checking of design requirements. Procedia CIRP. 2021;100:259–264. DOI: 10.1016/j.procir.2021.05.064

36. Manaa, Akaichi: Ontology-based modeling and querying of trajectory data. Data & Knowledge Engineering. 2017;111:58–72. DOI: 10.1016/j.datak.2017.06.005

37. Legatiuk: Mathematical modelling by help of category theory: models and relations between them. Mathematics. 2021;9(16):1946. DOI: 10.3390/math9161946

38. Jebli, Belouadha, Kabbaj: Prediction of solar energy guided by pearson correlation using machine learning. Energy. 2021;224:120109. DOI: 10.1016/j.energy.2021.120109

39. Wang, Sheng: An adaptive and opposite K-means operation based memetic algorithm for data clustering. Neurocomputing. 2021;437:131–142. DOI: 10.1016/j.neucom.2021.01.056

40. Tian, Zhang, Guo: Data dependence analysis for defects data of relay protection devices based on Apriori algorithm. IEEE Access. 2020;8:120647–120653. DOI: 10.1109/ACCESS.2020.3006345

41. Chen, Jia, Xiang: A review: knowledge reasoning over knowledge graph. Expert Syst Appl. 2020;141:112948. DOI: 10.1016/j.eswa.2019.112948

42. Segreto, Caggiano, Teti: Neuro-fuzzy system implementation in multiple sensor monitoring for Ni-Ti alloy machinability evaluation. Procedia CIRP. 2015;37:193–198. DOI: 10.1016/j.procir.2015.08.020

43. Yang, Wang: An RBF neural network approach towards precision motion system with selective sensor fusion. Neurocomputing. 2016;199:31–39. DOI: 10.1016/j.neucom.2016.01.093

44. Mourtzis, Vlachou, Doukas: Cloud-based adaptive shop-floor scheduling considering machine tool availability. ASME 2015 International Mechanical Engineering Congress and Exposition, November 13–19, 2015, Houston, Texas, USA. 2015;57588:V015T19A017. DOI: 10.1115/IMECE2015-53025

45. Zheng, Qiu, Wang: Data fusion based multi-rate Kalman filtering with unknown input for on-line estimation of dynamic displacements. Measurement. 2019;131:211–218. DOI: 10.1016/j.measurement.2018.08.057

46. Tao, Cheng: SDMSim: A manufacturing service supply-demand matching simulator under cloud environment. Robot Comput Integr Manuf. 2017;45:34–46. DOI: 10.1016/j.rcim.2016.07.001

47. Nee AYC, Ong: Virtual and augmented reality applications in manufacturing. IFAC Proceedings Volumes. 2013;46(9):15–26. DOI: 10.3182/20130619-3-RU-3018.00637

48. Ong, Yew AWW, Thanigaivel: Augmented reality-assisted robot programming system for industrial applications. Robot Comput Integr Manuf. 2020;61:101820. DOI: 10.1016/j.rcim.2019.101820

49. Tao, Zhang, Cheng: Digital twin workshop: a new paradigm for future workshop. Comput Integr Manuf Syst. 2017;23(1):1–9. DOI: 10.13196/j.cims.2017.01.001

50. Lehmann, Wang: Verification, validation, and accreditation (VV&A)—requirements, standards, and trends. In Model Engineering for Simulation. Academic Press, 2019;101–121. DOI: 10.1016/B978-0-12-813543-3.00006-8

51. Kleijnen: Regression and Kriging metamodels with their experimental designs in simulation: review. Eur J Oper Res. 2017;256(1):1–16. DOI: 10.2139/ssrn.2627131

52. Wagner, Grothoff, Epple: The role of the Industry 4.0 asset administration shell and the digital twin during the life cycle of a plant. 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation, September 12–15, 2017, Limassol, Cyprus. 2017. DOI: 10.1109/ETFA.2017.8247583

53. Cover, Thomas: Elements of information theory. Wiley, 2006.

54. Bankó, Abonyi: Correlation based dynamic time warping of multivariate time series. Expert Syst Appl. 2012;39(17):12814–12823. DOI: 10.1016/j.eswa.2012.05.012

55. Yang, Zhou: Method for unstructured data transformation based on XML technology. Comput Sci. 2017;44(S2):414–417.

56. Malik, Ahmad, Farhan: Big-data: transformation from heterogeneous data to semantically-enriched simplified data. Multimed Tools Appl. 2016;75:12727–12747. DOI: 10.1007/s11042-015-2918-5

57. Wen: Big data storage security. In Big Data Concepts, Theories, and Applications. Springer, 2016.

58. Han, Zhu, Duan: Multi-source information fusion. Tsinghua University Press, 2010.

59. Meyer, Schretter, Bontempi: Information-theoretic feature selection in microarray data using variable complementarity. IEEE J Sel Top Signal Process. 2008;2(3):261–274. DOI: 10.1109/JSTSP.2008.923858

60. Agrawal: Survey on anomaly detection using data mining techniques. Procedia Comput Sci. 2015;60:708–713. DOI: 10.1016/j.procs.2015.08.220

61. Zhang, Gao, Feng: InteractionNN: a neural network for learning hidden features in sparse prediction. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, August 10–16, 2019, Macao, China. 2019;4334–4340.

62. Yang, Kim, Hur: Knowledge extraction and visualization of digital design process. Expert Syst Appl. 2018;92:206–215. DOI: 10.1016/j.eswa.2017.09.002

63. Zhang, Liu: Research on services encapsulation and virtualization access model of machine for cloud manufacturing. J Intell Manuf. 2017;28(5):1109–1123. DOI: 10.1007/s10845-015-1064-2

64. Cheng, Tao, Zhao: Modeling of manufacturing service supply–demand matching hypernetwork in service-oriented manufacturing systems. Robot Comput Integr Manuf. 2017;45:59–72. DOI: 10.1016/j.rcim.2016.05.007

Reviewer response for version 2

DOI: 10.21956/digitaltwin.18840.r26887

Referee: Pai Zheng, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, Hong Kong (https://orcid.org/0000-0002-2329-8634)

Competing interests: No competing interests were disclosed.

Date: 21 March 2022

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recommendation: approve

The authors have fully addressed my concerns and I have no further comments to make.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Smart manufacturing system, product-service systems, engineering design

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer response for version 2

DOI: 10.21956/digitaltwin.18840.r26886

Referee: Yicha Zhang, Mechanical Engineering and Design, Génie Mécanique et Conception, Campus de Sevenans, Universite de Technologie de Belfort-Montbeliard, Sevenans, France

Competing interests: No competing interests were disclosed.

Date: 3 March 2022

Recommendation: approve

The revised version responds well to the concerns raised in the first round of review.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Partly

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are sufficient details provided to allow replication of the method development and its use by others?

Partly

Reviewer Expertise:

Digital design, planning & manufacturing, additive manufacturing, product-service-system engineering

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer response for version 2

DOI: 10.21956/digitaltwin.18840.r26888

Referee: Aydin Nassehi, Department of Mechanical Engineering, University of Bristol, Bristol, UK (https://orcid.org/0000-0003-3417-3391)

Competing interests: No competing interests were disclosed.

Date: 9 February 2022

Recommendation: approve

With the revisions the paper presents a good overview of digital twin data challenges and solutions.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes

Are sufficient details provided to allow replication of the method development and its use by others?

Reviewer Expertise:

AI in Manufacturing, Agent Based Modelling of Distributed System

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer response for version 1

DOI: 10.21956/digitaltwin.18742.r26810

Referee: Aydin Nassehi, Department of Mechanical Engineering, University of Bristol, Bristol, UK (https://orcid.org/0000-0003-3417-3391)

Competing interests: No competing interests were disclosed.

Date: 22 October 2021

Recommendation: approve-with-reservations

The paper presents an excellent overview of challenges and potential solutions to the issues related to digital twin data. The authors should consider including references to the ISO23247 standard which would complement the presented work very nicely. In particular, Part 3 of the document provides a standardised framework for digital representation of the elements in a digital twin and Part 4 provides the framework to underpin information exchange in Digital Twins.

There are a few other well-cited references which contain digital twin data issues that should be used in the paper to position the work better. These include:

Wagner et al. (2017) ¹ - Provides an overview of the Asset Administration Shell as an interoperability wrapper that enables data transfer between the physical and digital counterparts in a manner that virtualises the asset and allows various twins to communicate in a similar manner.

Jones et al. (2020) ² - Provides a contextual framework for Virtual-Physical and Physical-Virtual twinning to underpin the various purposes of data transfer and storage in Digital Twins (an article of mine).

Boschert et al. (2016) ³ - Provides a view of the Digital Twin and its data across the lifecycle of a product and its production system that provides scope for the topic of discussion in the presented paper.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes

Are sufficient details provided to allow replication of the method development and its use by others?

Reviewer Expertise:

AI in Manufacturing, Agent Based Modelling of Distributed System

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

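References

1. Wagner C, Grothoff J, Epple U, et al.: The role of the Industry 4.0 asset administration shell and the digital twin during the life cycle of a plant. 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). 2017. 10.1109/ETFA.2017.8247583

2. Jones D, Snider C, Nassehi A, et al.: Characterising the Digital Twin: A systematic literature review. CIRP Journal of Manufacturing Science and Technology. 2020; 29: 36-52. 10.1016/j.cirpj.2020.02.002

3. Boschert S, Rosen R: Digital Twin - The Simulation Aspect. 2016; 59-74. 10.1007/978-3-319-32156-1_5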

Zhang

Meng

Department of Automation, Tsinghua University, China

Competing interests: No competing interests were disclosed.

28 1 2022

Reviewer 4

The paper presents an excellent overview of challenges and potential solutions to the issues related to digital twin data.

Question 1: The authors should consider including references to the ISO23247 standard which would complement the presented work very nicely. In particular, Part 3 of the document provides a standardised framework for digital representation of the elements in a digital twin and Part 4 provides the framework to underpin information exchange in Digital Twins.

Response: Thanks for the comments. The reference to the ISO 23247 has been added in the revised version. The information classification for the manufacturing elements in Part 3 is referenced in the section “Composition of digital twin data” as follows:

Section “Composition of digital twin data”, Para. 2

According to ISO 23247-3 series (Digital Twin manufacturing framework-Part 3: Digital representation of manufacturing elements), physical entity-related data can be grouped in two categories: the first describing the static information concerning the physical entity (e.g. identification, characteristics, schedule) and the second reflecting the dynamic states (e.g. status, location, relationship). These data can help to represent the physical entity in a digital way.

The suggested data formats and protocols for data exchange in Part 4 are cited in the section “Digital twin data interaction”, and the section “Key enabling technologies for digital twin data” as follows:

Section “Digital twin data interaction” Then the data can be transferred between the DTD parts through some types of data transmission protocols and formats suggested by the ISO 23247-4 (Digital Twin manufacturing framework- Part 4: Information exchange).

Section “Key enabling technologies for digital twin data”, Para. 3 ISO 23247-4 provides some options for data transmission protocols and formats. For example, the virtual entity can access data of the physical entity by using protocols such as MTConnect, OPC-UA, HTTP and CoAP; and the virtual entity can be connected with the service by using an application interface (i.e. open API) or by employing shared memory. The suggested data formats include XML, JavaScript object notation (JSON), quality information framework (QIF) and so forth.
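As a minimal sketch of the two points quoted above, the static/dynamic grouping of physical entity-related data and JSON as one of the suggested exchange formats, the following Python fragment serializes one record. The class names, field names and sample values are hypothetical and are not taken from ISO 23247; only the grouping into static information and dynamic states follows the quoted text.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class StaticInfo:
    # Static information: identification, characteristics, schedule.
    identification: str
    characteristics: dict
    schedule: str


@dataclass
class DynamicState:
    # Dynamic states: status, location, relationship.
    status: str
    location: str
    relationship: dict = field(default_factory=dict)


@dataclass
class PhysicalEntityRecord:
    static: StaticInfo
    dynamic: DynamicState


def to_json(record: PhysicalEntityRecord) -> str:
    """Serialize a record to a JSON string with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example values for illustration only.
record = PhysicalEntityRecord(
    static=StaticInfo("autoclave-01", {"max_temperature_c": 250}, "curing batch 7"),
    dynamic=DynamicState("running", "hall A", {"holds": "mold-07"}),
)
payload = to_json(record)
```

The resulting string could then be carried over any of the transport options listed in the response; the choice of transport is independent of this sketch.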

Question 2: There are a few other well-cited references which contain digital twin data issues that should be used in the paper to position the work better. These include: Wagner et al. (2017) - Provides an overview of the Asset Administration Shell as an interoperability wrapper that enables data transfer between the physical and digital counterparts in a manner that virtualises the asset and allows various twins to communicate in a similar manner. Jones et al. (2020) - Provides a contextual framework for Virtual-Physical and Physical-Virtual twinning to underpin the various purposes of data transfer and storage in Digital Twins (an article of mine). Boschert et al. (2016) - Provides a view of the Digital Twin and its data across the lifecycle of a product and its production system that provides scope for the topic of discussion in the presented paper.

Response: Thanks for the comments. These references have been added in the sections of “Principles for digital twin data”, “Digital twin data gathering”, and “Key enabling technologies for digital twin data” as follows:

Section “Principles for digital twin data”, Para. 3 It aims at ensuring the rapid twinning process, namely the process of synchronizing the virtual and physical states, thus to enable parts of the DT to act synchronously ³¹. Section “Digital twin data gathering” In addition, the DTD would also extract essential information from the existing IT systems, such as product lifecycle management (PLM) system, product data management (PDM) system, manufacturing execution system (MES) and supervisory control and data acquisition (SCADA) system, which store large amounts of data across the lifecycle of the physical entity ³².

Section “Key enabling technologies for digital twin data”, Para. 3 In addition, the asset administration shell, which virtualizes the asset and enables the access to asset-related data via application interfaces, also provides a way for data exchange ⁵².

[31] Jones D, Snider C, Nassehi A, et al.: Characterising the digital twin: a systematic literature review. CIRP Journal of Manufacturing Science and Technology. 2020; 29, Part A: 36-52.

[32] Boschert S, Rosen R: Digital twin-the simulation aspect. In Mechatronic Futures. Springer, 2016.

[52] Wagner C, Grothoff J, Epple U, et al.: The role of the Industry 4.0 asset administration shell and the digital twin during the life cycle of a plant. 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation, September 12-15, 2017, Limassol, Cyprus, 2017, DOI: 10.1109/ETFA.2017.8247583.

10.21956/digitaltwin.18742.r26809

Reviewer response for version 1

Zhang

Yicha

1 Referee 1Mechanical Engineering and Design, Génie Mécanique et Conception, Campus de Sevenans, Universite de Technologie de Belfort-Montbeliard, Sevenans, France

Competing interests: No competing interests were disclosed.

18 10 2021

2021

recommendation

approve-with-reservations

This paper proposes a new concept, DTD, and describes the related requirements, principles and enabling technologies. The attempt to draw up a new DT data processing framework or guideline is interesting. However, from the reviewer's perspective on practical operation, some parts remain unclear and hard to follow.

1. There is no clear definition of DTD, but rather a classification of data in the context of DT. Is it possible to give a new definition that shows what distinguishes data in the context of DT? Would the definition focus more on contents, format or application?

2. Is there an exact logic for the listing of DTD requirements? Mining is before or after fusion, or both directions are possible?

3. It is hard to understand 'data interaction'. It is more similar to service as we often encountered in IOT or CPS systems.

4. What exactly is 'iterative optimization'? As you mentioned in the paper: iterative optimization refers to a cyclic process of “data increase - data fusion - information increase”, through which, new data is fused with historical data to generate new information continuously. This means to generate new data sets. It is more about data evolution? However, as you also mentioned in the later section of the paper, 'optimization of DTD'. So, how to optimize DTD? What are the objective, process and constraints?

5. The requirement of data universality seems an ideal solution in every domain. However, a DT is usually a customized system. Hence, is there any conflict between this requirement and the application? What does universality refer to: contents or format?

6. What is fusion data, data of data (well structured and associated) or 'green data' (intermediate data)?

7. Are the components of DTD organized in a hierarchical way or not?

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Partly

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are sufficient details provided to allow replication of the method development and its use by others?

Partly

Reviewer Expertise:

Digital design, planning & manufacturing, additive manufacturing, product-service-system engineering

Zhang

Meng

Department of Automation, Tsinghua University, China

Competing interests: No competing interests were disclosed.

28 1 2022

Reviewer 3

Question 1: There is no clear definition of DTD, but rather a classification of data in the context of DT. Is it possible to give a new definition that shows what distinguishes data in the context of DT? Would the definition focus more on contents, format or application?

Response: Thanks for the comments. A definition of DTD has been added in the first paragraph of the section “Composition of digital twin data” as follows:

As cyber-physical integration is a distinguishing feature of DT, cyber-physical data fusion is a major characteristic of the DTD. It could be reflected on the data contents, as the DTD involves data from both the physical and virtual spaces, and tends to merge these data to generate fusion data for more accurate and comprehensive information. DTD refers to a wide spectrum of data closely related to the DT. It focuses equally on data collected from the physical space (e.g. sensor data from equipment, materials, and workers) and data produced by the virtual space (e.g. data mimicked by simulation models, data generated by algorithms embedded in services, and data deduced based on knowledge), and tends to make the data from the two spaces corrected and supplemented by each other through data fusion, to achieve more accurate and comprehensive information for the DT-related applications.

Question 2: Is there an exact logic for the listing of DTD requirements? Mining is before or after fusion, or both directions are possible?

Response: The listing of DTD requirements follows the logic of data gathering, interaction, storage, association, fusion, evolution and usage. There is a one-to-one correspondence between the DTD requirements and the principles (Fig. 2), as well as between the principles and methods (Fig. 3). As data mining is the basis for data fusion, mining comes before fusion. Data fusion is developed based on data with different relations, such as fusing similar data for data correction and fusing complementary data for getting supplementary information. These data association relations (e.g. causality, similarity, and complementation) are in turn extracted with data mining algorithms, such as the Apriori algorithm, K-means, and Pearson correlation analysis.
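Of the mining algorithms named in this response, Pearson correlation analysis is the simplest to illustrate. The sketch below computes the coefficient for two equally long series; treating a high absolute value as a hint that two series are candidates for "similar data" fusion is an assumption made here for illustration, and the function name is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A coefficient near +1 or -1 indicates a strong linear relation between the two series, while a value near 0 indicates little linear relation; other relations named in the response, such as causality, are not captured by this measure.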

Question 3: It is hard to understand 'data interaction'. It is more similar to service as we often encountered in IOT or CPS systems

Response: Thanks for the comments. In the paper, 'data interaction' refers to the transmission and comparison of connection data from any two parts of the DT. It aims at making the data from different sources consistent with each other to enable the coordinated operations. In the section “Digital twin data interaction”, an example about the data interaction is added as follows:

To better illustrate the interaction, take an autoclave (a key equipment for composites curing) as an example. During curing, temperature distribution of the mold within the autoclave is an important quality indicator, since it largely affects the deformation degree of the composite components ¹⁹. To monitor the component quality, the real mold temperatures are collected by thermocouples. While, a virtual model can also be built to mimic the expected mold temperature distribution by using the CFD simulation. The data interaction between the physical equipment and the virtual model is used to ensure that the actual curing is executed as simulated. In this situation, a major feature for reflecting the mold temperature distribution, i.e. the difference between the maximum and minimum temperatures on the mold upper surface, can be selected as the connection data, leaving out other redundant or unimportant features. Then the collected and simulated mold temperature differences can be exchanged and compared. If their deviation is always within a predefined threshold, the expected composite quality would be achieved. Otherwise, perturbing factors causing the inconsistency need to be found, such as uneven airflow field in the autoclave, deteriorating insulation of the autoclave wall, and abrupt changes of curing conditions. These factors should be eliminated by adjusting parameters of the physical autoclave (e.g. heating speed, air speed, pressure) or by maintaining the equipment, which aims at reducing negative effects caused by the disturbances, to keep the collected data close with the simulated ones.
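The comparison step in the autoclave example can be sketched as follows. The connection data is the spread between the maximum and minimum mold surface temperatures, computed once from thermocouple readings and once from simulated values, and a time step is flagged when the two spreads deviate by more than a predefined threshold. Function names, sample values and the threshold are hypothetical.

```python
def temperature_spread(readings):
    """Difference between the maximum and minimum mold temperatures."""
    if not readings:
        raise ValueError("readings must not be empty")
    return max(readings) - min(readings)


def inconsistent_steps(measured, simulated, threshold):
    """Return indices of time steps whose spread deviation exceeds threshold.

    measured and simulated are lists with one list of temperatures per time step.
    """
    if len(measured) != len(simulated):
        raise ValueError("both series must cover the same time steps")
    flagged = []
    for step, (real, sim) in enumerate(zip(measured, simulated)):
        deviation = abs(temperature_spread(real) - temperature_spread(sim))
        if deviation > threshold:
            flagged.append(step)
    return flagged


# Hypothetical thermocouple and simulation values in degrees Celsius.
measured = [[180.0, 185.0, 190.0], [181.0, 186.0, 199.0]]
simulated = [[180.0, 184.0, 189.0], [181.0, 185.0, 190.0]]
flagged = inconsistent_steps(measured, simulated, threshold=5.0)
```

Flagged time steps would then prompt the search for perturbing factors described in the example; the sketch does not attempt that diagnosis.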

In the section “Key enabling technologies for digital twin data”, more technical details for data interaction are also added in part (2). Besides, as service is a relatively broad concept, I think “data interaction” could also be treated as a kind of service, which exchanges and compares data from different sources to ensure data consistency. Perhaps it can be encapsulated as a service and then invoked to realize the data exchange and comparison when necessary.

Question 4: What exactly is 'iterative optimization'? As you mentioned in the paper: iterative optimization refers to a cyclic process of “data increase - data fusion - information increase”, through which, new data is fused with historical data to generate new information continuously. This means to generate new data sets. It is more about data evolution? However, as you also mentioned in the later section of the paper, 'optimization of DTD'. So, how to optimize DTD? What are the objective, process and constraints?

Response: Thanks for the comments. Before the “iterative optimization”, “optimization of DTD” is first explained. “Optimization of DTD” mainly aims at making the data more informative through data fusion. In different situations, the objective, process and constraints would be different. For example, assume that there are data of several variables related to the remaining useful life (RUL) of an equipment component, such as vibrations, speeds and forces collected by sensors, as well as deformation and stress distribution simulated by models. We can fuse the data by using the neural network as the fusion algorithm to obtain more informative data. Inputs of the network are all or part of the RUL-related variables and the output is the merging result, i.e. the fusion data, which would be more relevant with the RUL than any other single variable. The relevance between the fusion data and the RUL can be measured by the mutual information (a measure to quantify the shared information between two variables). Higher relevance means that the data carry more RUL-related information. In this situation, the optimization objective would be maximizing the mutual information between the fusion data and the RUL. Since selecting different subsets of the RUL-related variables as inputs of the neural network would lead to different fusion data, the optimization process mainly needs to search out the best subset from the candidate ones to make the objective optimal. Some constraints would be imposed on the variables to be selected. For example, the relevance between each variable and the RUL would be larger than a lower limit, to eliminate the less correlated ones; and the relevance between any two selected variables would be lower than an upper limit, to reduce unnecessary redundancy. The iterative optimization refers to the repeated process of the above optimization. 
When new data from different parts such as the physical entity, virtual model and service are injected into the DTD, a new optimization begins to decide which new data should be added to the fusion to improve the objective over its previous value. It means that the newly generated fusion data would carry more information. Besides, “iterative optimization” is closely related to data evolution. The logic in the paper is as follows. “Iterative optimization” is a requirement on DTD. To meet the requirement, the “information growth principle” proposed in the later section should be observed, to ensure that the “data evolution” step in the section “Methodology for digital twin data” keeps generating fusion data and increasing the information they carry. To better illustrate the iterative optimization, some explanations are added in the “Requirements on iterative optimization” in Section 1 as follows:

In the cyclic process, each optimization, triggered by the injection of new data, aims at adding appropriate data into the fusion to increase the amount of valuable information carried by the fusion data, while being subject to possible constraints on the data to be merged, such as keeping the data redundancy lower than an upper bound.
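One way to write down the optimization described in this response is given below. The notation is introduced here for illustration and does not appear in the paper: V is the set of candidate RUL-related variables, S the selected subset, f_S the fusion algorithm applied to S (for example a neural network), Y the RUL, and I the mutual information. Using mutual information for the two constraints as well as for the objective is an assumption; the response only speaks of "relevance" there.

```latex
\[
\begin{aligned}
\max_{S \subseteq V} \quad & I\bigl(f_S(S);\, Y\bigr) \\
\text{s.t.} \quad & I(v_i;\, Y) \ge \epsilon_{\mathrm{low}}, && \forall\, v_i \in S, \\
& I(v_i;\, v_j) \le \epsilon_{\mathrm{up}}, && \forall\, v_i, v_j \in S,\ i \ne j,
\end{aligned}
\qquad
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} .
\]
% Iteration k: new data enlarge the candidate set, V_{k-1} \subseteq V_k, and
\[
S_k = \arg\max_{S \subseteq V_k} I\bigl(f_S(S);\, Y\bigr)
\quad \text{subject to the same constraints.}
\]
```

Under this reading, if the thresholds and the fusion algorithm are kept fixed, the previous subset S_{k-1} remains feasible at iteration k, so the optimal objective value cannot decrease from one iteration to the next. This matches the statement that the newly generated fusion data carry at least as much RUL-related information as before.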

Question 5: The requirement of data universality seems an ideal solution in every domain. However, a DT is usually a customized system. Hence, is there any conflict between this requirement and the application? What does universality refer to: contents or format?

Response: There need not be a conflict. As a DT system is customized, it would develop different sensors, models and services for different applications, which bring a wide variety of data with different types and contents. Although it is hard to unify the data in every aspect, the data can still be transformed into unified formats (e.g. XML) for storing and sharing, and transmitted through universal interfaces and communication protocols for easier communication. In the paper, universality mainly refers to unified data formats, interfaces and communication protocols.

Question 6: What is fusion data, data of data (well structured and associated) or 'green data' (intermediate data)?

Response: Fusion data is the result of merging data from different sources by using data fusion algorithms. Take the health monitoring of a piece of equipment as an example. From the physical space, data such as vibrations, speeds, forces and pressures can be collected by sensors, while from the virtual space, data such as deformation, stress distribution, and airflow distribution can be simulated by virtual models. Since all these data are relevant to the equipment condition, their merging result (i.e. the fusion data) can support more accurate monitoring, as it carries more comprehensive information than data from a single source. The fusion data is obtained by processing the original data from the sensors, simulations or other sources via data fusion algorithms, such as the weighting method, neural network, and Bayesian method. To clarify this, some explanations are added in the second to last paragraph of the section “Composition of digital twin data” as follows:

For example, the healthy condition of an equipment can be reflected by various sensor data such as the vibrations, speeds, forces and pressures, as well as the simulated data such as deformation, stress distribution, and airflow distribution. The fusion data is a merging result of the above data, which is obtained by processing the data from different sources via data fusion algorithms, such as the weighting method ²⁸, neural network ²⁹, and Bayesian method ³⁰. As a result, the fusion data would be more informative for the condition monitoring.
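To make the "weighting method" concrete, the sketch below fuses several estimates of the same quantity, for instance a sensor reading and a simulated value, by inverse-variance weighting. This is one common weighting scheme chosen here for illustration; it is not necessarily the method of the cited reference, and the function name is hypothetical.

```python
def fuse_weighted(estimates, variances):
    """Fuse estimates of one quantity using inverse-variance weights.

    Returns the fused estimate and its variance. Sources with a smaller
    variance (higher confidence) receive a larger weight.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one positive variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * e for w, e in zip(weights, estimates)) / total
    return fused, 1.0 / total
```

For independent sources, the fused variance is never larger than the smallest input variance, which is one simple sense in which fusion data can be more informative than data from a single source.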

Question 7: Are the components of DTD organized in a hierarchical way or not?

Response: The components of DTD can be organized in a hierarchical way. In previous work of our team [1], a hierarchical approach to data organization was preliminarily discussed in combination with edge computing, fog computing and cloud computing. According to it, the original data from the physical entity, virtual entity and service belong to the edge level. At this level, the incoming data can be preprocessed (e.g. filtering, redundancy removal, missing data filling) and then used to identify potential threat patterns (e.g. equipment faults). The edge level can respond to such threats in a timely manner, as it generally has low latency. The preprocessed data are then transferred to the fog level, where more complex computing tasks are enabled to support data feature extraction, data relation mining and fusion data generation. At this level, the fusion data can be employed to provide more comprehensive information for monitoring, prediction and optimization. The cloud level stores large amounts of historical data from the edge and fog levels and supports large-scale computation. It helps to mine more domain knowledge from the mass data to support long-term prediction and planning. In the paper, the hierarchical organization of the DTD is not specifically discussed. More technical details about the DTD organization will be explored in our future work.

[1] Qi Q., Zhao D., Liao T. W., Tao F., Modeling of cyber-physical systems and digital twin based on edge computing, fog computing and cloud computing towards smart manufacturing, ASME 2018 13th International Manufacturing Science and Engineering Conference, Texas, US, 2018, V001T05A018.
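The edge-level preprocessing mentioned in the response above (missing data filling, filtering, and a simple threat check) might look like the following sketch. The concrete choices, namely forward filling, a trailing moving average and a fixed limit, are assumptions made here for illustration and are not taken from the cited work; all function names are hypothetical.

```python
def fill_missing(values):
    """Forward-fill None entries; leading None entries stay None."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def edge_preprocess(raw, window=3):
    """Fill gaps, drop unfillable leading gaps, then smooth."""
    filled = [v for v in fill_missing(raw) if v is not None]
    return moving_average(filled, window)


def detect_threat(values, limit):
    """Return indices where the preprocessed signal exceeds a limit."""
    return [i for i, v in enumerate(values) if v > limit]
```

In the hierarchy described in the response, the output of such a step would be what is passed on to the fog level for feature extraction and fusion.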

10.21956/digitaltwin.18742.r26808

Reviewer response for version 1

Zheng

Pai

1 Referee https://orcid.org/0000-0002-2329-8634 1Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, Hong Kong

Competing interests: No competing interests were disclosed.

11 10 2021

2021

recommendation

approve-with-reservations

In this paper, the authors present an outlook on, and a set of requirements for, data applications in the digital twin domain from the structure, principle, and methodology views. Building on this, potential future research directions are summarized. The topic and the corresponding research areas are promising and worth investigating more deeply.

Here are some minor issues that may be considered for improving this paper:

In the third requirement of digital twin data, could you explain more about what “seamless” means? The requirement is too vague as stated; what kind of performance do you want to achieve?

In the composition of digital twin data section, the explanation of the connection data is provided theoretically. However, it lacks an intuitive example. Could more cases or concrete technologies be provided to aid understanding?

In the principle part, could more explanation of figure 2 be provided? What do the serial numbers and arrows in the figure mean? What do the different levels stand for in the figure?

In figure 3, the figure quality and presentation could be refined.

In key enabling technologies, some technology concepts are adopted to enhance the development of DTD. However, could the authors provide more technical/algorithm details to achieve the corresponding goal/proposed performances?

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Smart manufacturing system, product-service systems, engineering design

Zhang

Meng

Department of Automation, Tsinghua University, China

Competing interests: No competing interests were disclosed.

28 1 2022

Reviewer 2

Question 1: In the third requirement of digital twin data, could you explain more about what “seamless” means? The requirement is too vague as stated; what kind of performance do you want to achieve?

Response: Thanks for the comments. The word “seamless” is used to express that data carrying different information and coming from different sources are fused to improve the data quality as much as possible, such as improving data accuracy, reducing data uncertainty, and increasing the amount of information carried by data. For example, to improve accuracy of the simulated data, the fusion would aim at reducing the root mean squared error (RMSE) between the simulated and measured data. For reducing data uncertainty, it would minimize the information entropy of the data. To increase the amount of information carried by data, taking product quality-related information as an example, the fusion will aim at improving the relevancy between the data and the quality indicator.

To clarify it, some words are added to this requirement (the fifth requirement in the revised version) as follows:

It would bring lots of benefits, such as reducing information entropy of the sensor data (i.e. reducing data uncertainty), reducing root mean squared error (RMSE) between the simulated data and measured data (i.e. improving data accuracy), and improving relevancy between the data and a certain target indicator, such as product quality, remaining useful life of a key component, and equipment healthy condition (i.e. increasing the amount of information related to the indicator).
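Two of the measures named in this response, RMSE and information entropy, can be computed as in the sketch below. The entropy function works on discrete symbols, so continuous sensor data would first have to be binned; that discretization step is an assumption and is not shown. Function names are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def rmse(simulated, measured):
    """Root mean squared error between simulated and measured series."""
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("series must be non-empty and equally long")
    squared = sum((s - m) ** 2 for s, m in zip(simulated, measured))
    return sqrt(squared / len(simulated))


def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    if not symbols:
        raise ValueError("symbols must not be empty")
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

A lower RMSE after fusion indicates that the simulated data track the measurements more closely, and a lower entropy indicates less uncertainty in the discretized data.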

Besides, as the “seamless” may seem blurry to express the above content, this word is deleted in the revised version.

Question 2: In the composition of digital twin data section, the explanation of the connection data is provided theoretically. However, it lacks an intuitive example. Could more cases or concrete technologies be provided to aid understanding?

Response: Thanks for the comments. In the section “Composition of digital twin data”, a simple example is added in the last paragraph as follows:

For example, the real temperature collected by sensors and the expected temperature simulated by virtual models can be selected as the connection data between the physical entity-related data and virtual entity-related data. If their deviation exceeds a predefined threshold, perturbing factors (e.g. manual interferences, environment changes) causing the inconsistency would be found and eliminated, which helps to restore the data consistency.

Then a more detailed example about the connection data of an autoclave is provided in the section “Digital twin data interaction” as follows:

In addition, more related technical details are added in the part (2) of the section “Key enabling technologies for digital twin data” as follows:

(2) With respect to data interaction, data transmission technology is required to exchange connection data among different parts of DTD. ISO 23247-4 provides some options for data transmission protocols and formats. For example, the virtual entity can access data of the physical entity through protocols such as MTConnect, OPC-UA, HTTP and CoAP; and the virtual entity can be connected with the service by using application interfaces (i.e. open API) or by employing shared memory. The suggested data formats include XML, JavaScript object notation (JSON), quality information framework (QIF) and so forth. The asset administration shell, which virtualizes the asset and enables the access to asset-related data via application interfaces, also provides a way for data exchange ⁵². Then to reduce the redundancy in the exchanged data, measures such as Pearson correlation coefficient ³⁸ and mutual information ⁵³ can be employed to judge the overlapping of the data. Besides, Euclidean distance, Manhattan distance and dynamic time warping ⁵⁴, etc. can be used to evaluate the discrepancy between data from different parts of the DT to evaluate their consistency.
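The three discrepancy measures named at the end of this passage can be sketched as follows. Euclidean and Manhattan distance require series of equal length, whereas dynamic time warping (DTW) also handles series of different lengths or with shifted timing. The DTW variant shown is the textbook dynamic-programming form with absolute difference as the local cost and no window constraint, which is an assumption about the intended variant.

```python
from math import inf, sqrt


def euclidean(a, b):
    """Euclidean distance between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must be equally long")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must be equally long")
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.0 because the repeated sample can be aligned with its neighbour, while the two point-wise distances cannot be applied to these series at all.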

Question 3: In the principle part, could more explanation of figure 2 be provided? What do the serial numbers and arrows in the figure mean? What do the different levels stand for in the figure?

Response: Thanks for the comments. More explanation of Fig. 2, covering the different levels in the figure, is provided in the first paragraph of the section “Principles for digital twin data” as follows: As in Figure 2, the outer layer shows the data requirements in seven aspects as raised in Section 1. While, the corresponding principles, which should be observed for the sake of better fulfilling the requirements, are given one by one in the inner layer. As the arrows in the figure had no practical significance, they have been deleted. The serial numbers in the figure were used to distinguish the principles, since in the previous version these principles were referred to as “Principle (1), Principle (2)…” in Fig. 3. As such references have been removed in this version, the serial numbers in the figure have been deleted accordingly.

Question 4: In figure 3, the figure quality and presentation could be refined.

Response: Thanks for the comments. Fig. 3 has been refined.

Question 5: In key enabling technologies, some technology concepts are adopted to enhance the development of DTD. However, could the authors provide more technical/algorithm details to achieve the corresponding goal/proposed performances?

Response: Thanks for the comments. More technical and algorithm details have been added in the section “Key enabling technologies for digital twin data” in the revised version.

10.21956/digitaltwin.18742.r26807

Reviewer response for version 1

Sun

Huibin

1 Referee https://orcid.org/0000-0002-6767-1912 1School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an, China

Competing interests: No competing interests were disclosed.

7 10 2021

2021

recommendation

approve

Data plays critical roles in constructing virtual models, building cyber-physical connections, and executing intelligent operations in digital twins. This paper explores some basic principles and methods for digital twin data gathering, storage, interaction, association, fusion, evolution and servitization, as well as the key enabling technologies. The theoretical underpinnings for DTD proposed in this paper, are imperative for the further promotion and application of DT.

1. The barriers or challenges to digital twin data could be addressed.

2. If possible, an example or a scenario could be used to enhance the paper’s contributions.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Digital twin

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Zhang

Meng

Department of Automation, Tsinghua University, China

Competing interests: No competing interests were disclosed.

28 1 2022

Reviewer 1

Question 1: The barriers or challenges to digital twin data could be addressed.

Response: Thanks for the comments. The challenges to digital twin data are discussed in the last section as follows:

Question 2: If possible, an example or a scenario could be used to enhance the paper’s contributions.

Response: Thanks for the comments. This paper mainly provides a theoretical method for DTD gathering and processing, aiming to help more DT researchers incorporate DTD into their work. The related research is still at an early stage. Accordingly, more application examples and scenarios will be provided in our future work, such as examples of DTD applications in product design optimization, product quality control and equipment prognosis. They will further validate and refine the proposed methods.