Knowledge Graphs in Manufacturing and Production: A Systematic Literature Review
GEORG BUCHGEHER, DAVID GABAUER, JORGE MARTINEZ-GIL, AND LISA EHRLINGER
Corresponding author: Georg Buchgeher (georg.buchgeher@scch.at)
Institute for Application-Oriented Knowledge Processing (FAW), Johannes Kepler University Linz, 4040 Linz, Austria
Software Competence Center Hagenberg GmbH, 4232 Hagenberg, Austria
This work was supported in part by the Interreg Österreich-Bayern 2014-2020 Programme funded under Grant AB292, in part by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), in part by the Federal Ministry for Digital and Economic Affairs (BMDW), and in part by the Province of Upper Austria in the frame of the COMET Center, within the COMET (Competence Centers for Excellent Technologies) Programme managed by the Austrian Research Promotion Agency FFG.
ABSTRACT Knowledge graphs in manufacturing and production aim to make production lines more efficient and flexible with higher quality output. This makes knowledge graphs attractive for companies […] is needed. Therefore, we have conducted a systematic literature review as an attempt to characterize the state-of-the-art in this field, i.e., by identifying existing research and by identifying gaps and opportunities for further research. We have focused on finding the primary studies in the existing literature, which were classified and analyzed according to four criteria: bibliometric key facts, research type facets, knowledge graph characteristics, and application scenarios. Besides, an evaluation of the primary studies has also been carried out […]. As a result, we can offer a complete picture of the domain, which includes such interesting aspects as the fact that knowledge fusion is currently the main use case for knowledge graphs, that empirical research and industrial application are still missing to a large extent, that graph embeddings are not fully exploited, and that technical literature is fast-growing but still seems to be far from its peak.
INDEX TERMS Knowledge graphs, manufacturing, production, systematic literature review.
I. INTRODUCTION
The twenty-first century has been clearly marked by its rapid growth in artificial intelligence (AI) applications. Thus, companies are required to undergo an inherent transformation to leverage AI for reaching Industry 4.0 standards and for gaining a competitive advantage in the international market. While AI technologies such as neural networks, natural language processing, chatbots, autonomous driving vehicles, and digital twins have received increasing attention in the field of manufacturing and production, little light has been shed on the applications of knowledge graphs (KGs) in this domain.

In recent years, a large number of open (public) as well as closed (enterprise) KGs have been developed. While open KGs, which are often academic and open-source projects, provide access to anyone on the web, enterprise KGs are closed applications within companies that are only accessible to approved users [1]. Examples of public KG projects are DBpedia [2], Freebase [3], KBpedia [4], NELL [5], PROSPERA [6], Wikidata [7], and YAGO [8]. The most popular commercial and closed KGs are Cyc [9], Google Knowledge Graph [10], [11], Google Knowledge Vault [12], and Microsoft Satori. The blog entry by Google [10] is frequently quoted as the seminal work of KG research, since it sparked the discussion in this field in 2012. Considering early KG research from the 1980s, however, Google has revived KG technology rather than invented it. The foundation of KGs was laid by Sowa [13], who provided conceptual graph theory as an early contribution to knowledge representation in semantic networks [14]. Further seminal work on KGs has been conducted by Stokman and co-authors, who aimed at building a KG to represent medical and sociological literature [15]–[17]. Even though knowledge graphs are nowadays frequently applied in different domains, there is still no formal definition that is accepted by the entire community. In 2016, Ehrlinger and Wöß [18] proposed the following widely acknowledged definition:
Definition (Knowledge Graph): A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.

The authors [18] also discuss alternative definitions and their implications and limitations. To be in line with the state-of-the-art, our paper covers all studies if (1) […] they were accepted by a scientific peer-reviewed journal or conference, or if (2) they conform to at least one definition reviewed by Ehrlinger and Wöß. In summary, the common denominator of a KG is its structure in terms of nodes (entities) and edges (relationships). For storing graphs, the two most popular data models are RDF triples and property graphs. The majority of public KGs are stored in the form of Resource Description Framework (RDF) triples, a standard of the World Wide Web Consortium (W3C). In RDF, subjects and objects are nodes, and predicates are the edges between the nodes. Property graphs store nodes and edges natively, and nodes (and, in most implementations, also edges) can have properties in the form of key-value pairs [19].

KGs are primarily used to semantically model a specific and often complex domain [20]. This explicitly modeled domain knowledge is used to support and enhance the accuracy of downstream tasks like question answering [21], [22], information extraction [23], [24], named entity disambiguation [25], [26], semantic parsing [27], [28], and recommender systems [29], [30]. The analysis of KGs with machine learning methods, e.g., to predict missing edges or to classify nodes, has also gained increasing attention [31]. Since most machine learning models require a set of feature vectors as input, much research has been done to generate "embeddings" from KGs. A KG embedding transforms the nodes and (depending on the approach) also the edges into numeric feature vectors [32], which serve as direct input to a machine learning model. Considering the plethora of application scenarios mentioned above, several domains have already perceived the substantial benefits of KG technology. Examples of domains that focus on the use of KGs are science [33], healthcare [34]–[37], cybersecurity [38]–[41], data defects [42], education and training [43]–[45], and tourism [46].

To lay out the foundation for novel research on using KGs in the field of manufacturing and production, we conducted a systematic literature review. The aim of this review is to systematically capture the current state-of-the-art of KGs by revealing their utilization in manufacturing and production. Already available standards and consensus with respect to the structure and construction of KGs in this domain should be discovered and discussed. With an overview of the current state-of-the-art, existing solutions can be refined and extended, and novel approaches that have not yet been the subject of research can be developed. According to the OECD, more than 70% of the G7's world trade is based upon goods. Even though less than 25% of all jobs are provided by industries, a significant number of jobs in other sectors depend on the production sector. Despite the substantial size and importance of this sector, KGs have so far been neglected as one of its key AI technologies. Thus, this research contributes to the dissemination and use of KGs in industry applications by highlighting their benefits and how companies can leverage them. We aim at answering the research question: "Which role do knowledge graphs play in manufacturing and production?" The question is answered by an investigation of (1) the bibliometric key facts, (2) research type facets, (3) KG characteristics, and (4) KG application scenarios.

The remainder of this paper is structured as follows: Section II presents related work, and Section III describes the planning and realization of the systematic literature review. The results obtained from analyzing the primary studies and the answers to the research questions are provided in Section IV. In Section V, we further discuss the research questions along with open research challenges and the threats to validity. Section VI concludes our study.

II. RELATED WORK

To the best of our knowledge, there is no systematic literature review or systematic mapping study dedicated to knowledge graphs in manufacturing and production. Yet, there are surveys, reviews, and books that aim to provide an overview of the state-of-the-art of KG technologies.

Chronologically, we start with Nickel et al. [47], who provide the first survey on KGs, with a special focus on the usage of latent and graph feature models for retrieving knowledge to predict new facts/edges in the graph. The foundation, architecture, construction, and applications of enterprise knowledge graphs are outlined in detail by Pan et al. [48]. Paulheim [49] describes how to refine a knowledge graph based upon its A-box via completion and error detection, distinguishes internal and external refinement methods, and puts forward various evaluation standards that can be employed. The study of Wang et al. [50] is similar to [47], with a comprehensive summary of translational distance and semantic matching models in the field of KG embeddings and a comment on the usefulness of KGs with respect to recommender systems and question answering applications.
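Translational distance models such as TransE, one of the model families summarized by Wang et al. [50], score a triple (h, r, t) by how close the head vector, translated by the relation vector, lands to the tail vector; this is also the mechanism behind predicting missing edges from KG embeddings. The following is a minimal sketch in Python, assuming hand-picked two-dimensional vectors and hypothetical manufacturing entity names (`Machine_A`, `Plant_1`, `Plant_2`, `locatedIn`); in practice the vectors are learned from the graph, and that training step is omitted here.

```python
import math

# Hand-picked 2-D vectors for illustration only (hypothetical names and values);
# a real TransE model learns these vectors from the KG's triples.
ENTITY_VEC = {
    "Machine_A": [0.25, 0.5],
    "Plant_1": [0.5, 0.75],
    "Plant_2": [-0.5, 1.0],
}
RELATION_VEC = {"locatedIn": [0.25, 0.25]}


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    h, r, t = ENTITY_VEC[head], RELATION_VEC[relation], ENTITY_VEC[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank candidate tail entities for (head, relation, ?), most plausible first."""
    candidates = [e for e in ENTITY_VEC if e != head]
    return sorted(candidates, key=lambda tail: transe_distance(head, relation, tail))


print(rank_tails("Machine_A", "locatedIn"))  # ['Plant_1', 'Plant_2']
```

Ranking all candidate tails by distance is the usual way such embeddings are used for link prediction; the top-ranked tail is proposed as the missing edge.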
In [51], Lin et al. employ a subset of the KG embeddings presented in [47], [50] and address complex relation modeling, relational path modeling, and multi-source information learning. Contrary to all previously mentioned articles, Yan et al. [52] do not focus on a specific KG topic but give a general overview of how KGs are constructed. The work of Gesese et al. [53] is the first survey on KG embeddings that make use of literals. Furthermore, Kazemi et al. [54] outlined how representation learning approaches are expedient for dynamic graphs. The book of Kejriwal [55] provides a very general summary of domain-specific KG construction. One survey about fault domain knowledge graphs is given in [56]. Another survey has been conducted by Al-Moslmi et al. [57], outlining […] and named entity linking to enable the construction of a KG. Fensel et al. [46] provide a recent introduction to knowledge graphs with many relatable real-life examples. Heist et al. [58] give an overview of cross-domain KGs that are publicly available on the Web. Ji et al. [59] extend the study of [50] by explaining how different kinds of neural networks can be used to generate KG embeddings. Finally, a recent study by Hogan et al. [60] comprises all of the aforementioned studies' topics and provides a profound and comprehensive introduction to the field of knowledge graphs, starting from scratch and covering both deductive and inductive knowledge representation techniques.

III. RESEARCH METHOD

According to Brereton et al. [61], systematic literature reviews (SLRs) are "a means of evaluating and interpreting all available research relevant to a particular research question, or topic area, or phenomenon of interest". SLRs are secondary empirical studies used to provide a structured overview of a research field [62]. An SLR follows a well-defined methodology, which makes it less likely that the results of the literature review are biased, although it does not protect against publication bias in the primary studies [62].

In this SLR, we followed the steps outlined in [62]: a systematic literature review consists of three main phases, i.e., planning the SLR, conducting the SLR, and reporting the SLR. This section presents the planning of the study, i.e., the research questions, the data sources and search strategy, along with the classification and evaluation criteria.

A. RESEARCH QUESTIONS
The aim of this SLR is to analyze the current status of knowledge graphs in the field of manufacturing and production. Thus, existing research is investigated to identify potential gaps and opportunities for future work. The main research question guiding this study is:

Which role do knowledge graphs play in manufacturing and production?

The research question we established for this study attempts to provide specific insights into the relevant aspects of how KGs are used in production and manufacturing. This includes questions about the articles' bibliometric key facts, research type facets, specific KG characteristics, and application scenarios. We also want to examine the type of research carried out up to that time (theoretical proposal, empirical), together with the type of research forums in which these works have been published and presented. The exact research questions this SLR answers are reported in Table 1.

TABLE 1. Research questions.
Nr | Research question
RQ1 | What are the bibliometric key facts of KG publications?
RQ2 | Which research type facets do the identified publications address?
RQ3 | What are the specific knowledge graph characteristics?
RQ4 | What are application scenarios of knowledge graphs?
RQ1 provides an overview of the bibliometrics of published studies concerning knowledge graph applications in production and manufacturing to exhibit the importance and timeliness of this topic. In more detail, we analyze the publication trend, the publication venues, and the origin countries of the research institutes that have published studies in this field. RQ2 investigates the maturity of knowledge graph applications by analyzing which research methods have been used for the validation of research. The specific construction techniques of knowledge graphs are addressed in RQ3. This is of major importance for consultants and practitioners, as it reveals the structure of knowledge graphs employed in a production and manufacturing setting. Finally, RQ4 examines the application scenarios in which knowledge graphs have been used in the context of production and manufacturing, i.e., in which particular manufacturing domains knowledge graphs have been used, for which concrete use cases they are used, and which kinds of systems are developed based on them.
B. DATA SOURCES AND SEARCH STRATEGY
To build an adequate search string, we have selected two major search terms: 'Method' and 'Field'. The first major search term represents the employed methodology, namely 'knowledge graph', whereas the second major search term captures the field in which the knowledge graph should have been used. This term includes all sorts of technologies and synonyms of manufacturing and production in which the knowledge graph application should take place. Terms like 'enterprise', 'industry', 'company', 'corporate', 'manufacturer', 'manufacturing', 'organization', and 'production' should cover all synonyms for production and manufacturing, whereby we included the German word 'industrie' as well, since knowledge graphs could also be applied in terms of Industrie 4.0, a term that is also commonly used in the international academic literature. Furthermore, 'internet of things' should supply us with references concerning the 'internet of things' and the 'industrial internet of things', whereas 'physical system' is related to literature with a focus on the combination of knowledge graphs and cyber-physical systems. In addition, 'enterprise' and 'management' retrieve […], and finally, we specifically included 'product' for 'product knowledge graphs', which is a rather new but interesting field of knowledge graph applications.

TABLE 2. Search string.
Method: ("knowledge graph")
AND
Field: (("enterprise") OR ("industry") OR ("industrie") OR ("physical system") OR ("internet of things") OR ("company") OR ("corporate") OR ("organization") OR ("product") OR ("production") OR ("management") OR ("manufacturer") OR ("manufacturing"))

TABLE 3. Summary of the search strategy.
Search strategy | Scope
Academic databases | ACM Digital Library, IEEExplore, ISI Web of Science, ScienceDirect, Springer
Non-academic online publications | Google Scholar
Publication types | Journal papers, conference papers, books, industry/professional conference contributions, industry/professional workshop contributions
Search applied to | Title, keywords, abstract; full text (Google Scholar)
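The Boolean structure of Table 2 (alternatives joined by OR within a major search term, the two major search terms joined by AND) can be generated mechanically from the term lists. A minimal sketch in Python, assuming a generic quoted-phrase syntax; each digital library has its own query dialect and field codes, so the output is illustrative rather than a query that can be pasted into every database unchanged. The helper names `or_group` and `build_search_string` are hypothetical.

```python
# Term lists transcribed from Table 2.
METHOD_TERMS = ["knowledge graph"]
FIELD_TERMS = [
    "enterprise", "industry", "industrie", "physical system",
    "internet of things", "company", "corporate", "organization",
    "product", "production", "management", "manufacturer", "manufacturing",
]


def or_group(terms):
    """Quote each alternative, join the alternatives with OR, and parenthesize."""
    return "(" + " OR ".join(f'("{term}")' for term in terms) + ")"


def build_search_string(method_terms, field_terms):
    """Combine the two major search terms (Method, Field) with AND."""
    return f"{or_group(method_terms)} AND {or_group(field_terms)}"


query = build_search_string(METHOD_TERMS, FIELD_TERMS)
print(query)
```

Keeping the terms in lists rather than in a hand-written string makes it easy to add a synonym and regenerate the query consistently for every source.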
The final search string that has been used in the presented study is shown in Table 2. The search terms were constructed using the steps described in [61], in which the Boolean OR links the alternative terms within a major search term and the Boolean AND combines the major search terms.

The proposed search strategy is set out in Table 3. The scope of the search considers publications and contributions presented in both academic and professional forums. That is, we have considered academic publications (such as those published in journals, presented at academic conferences, or appearing in peer-reviewed books) in addition to publications and contributions presented in industry or professional forums such as conferences, workshops, and online publications. For academic publications, the sources of choice are the ACM Digital Library, IEEExplore, ISI Web of Science, ScienceDirect, and Springer. It was necessary to use a general search engine, which in our case is Google Scholar, to include non-academic contributions and publications. Certain criteria on the data sources were applied to overcome particular challenges, namely to avoid assessing hundreds of thousands of articles. To keep the search within reasonable bounds, we restricted the number of results retrieved from Google Scholar to 300. Moreover, this data source was applied only to search for non-academic primary studies: those papers or articles published in industry/professional conferences, workshops, online journals/magazines, or corporate blogs. Personal blogs and web pages have been excluded from the search. The search has been conducted recursively, that is, relevant studies referenced in the primary studies are also considered.

The inclusion and exclusion criteria that decide whether a paper is taken into account for the systematic literature review are shown in Table 4. Every study needs to include at least one term from each of the two major search terms. Additionally, it has to be published in an academic or professional forum. English has to be the language of the full text, and the publication date must not be later than the 26th of February 2020. If the inclusion criteria have been fulfilled and none of the exclusion criteria has been triggered, the study is considered a primary study in the SLR.
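The automatable part of the inclusion criteria (IC-1: both major search terms occur; IC-4: English; IC-5: published until 26 February 2020) can be expressed as a single predicate. A minimal sketch in Python, assuming case-insensitive substring matching as a stand-in for the phrase search the digital libraries actually perform; the venue criteria IC-2 and IC-3 and all exclusion criteria require human judgment and are not modeled. The function names are hypothetical.

```python
from datetime import date

# Term lists transcribed from Table 2.
METHOD_TERMS = ["knowledge graph"]
FIELD_TERMS = [
    "enterprise", "industry", "industrie", "physical system",
    "internet of things", "company", "corporate", "organization",
    "product", "production", "management", "manufacturer", "manufacturing",
]
CUTOFF = date(2020, 2, 26)  # IC-5: published until 26 February 2020 (inclusive)


def matches_search_string(text):
    """IC-1: at least one Method term AND at least one Field term occur.

    Simplification: case-insensitive substring matching stands in for the
    phrase search performed by the digital libraries.
    """
    low = text.lower()
    return (any(term in low for term in METHOD_TERMS)
            and any(term in low for term in FIELD_TERMS))


def satisfies_inclusion(text, language, published):
    """Check IC-1, IC-4 and IC-5; venue type (IC-2, IC-3) is not modeled."""
    return (matches_search_string(text)
            and language == "English"
            and published <= CUTOFF)


print(satisfies_inclusion("A knowledge graph for smart manufacturing",
                          "English", date(2019, 6, 1)))  # True
```

A paper on, say, generic link prediction fails IC-1 because no Field term occurs, even though it mentions knowledge graphs.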
In the first round, the title and abstract of each study are analyzed to determine whether the paper is an eligible fit for the SLR. Although corporate blog posts are considered, personal blogs and web pages are strictly excluded. If the paper is only available in the form of a PowerPoint presentation, or if the emphasis of the article is not on knowledge graphs in a production or manufacturing setting, it is excluded as well.

In the second round, we are left with all papers that have been affirmed to be relevant in the first round. In this round, the full texts of the papers are also considered. If an article only has an abstract but no full text, or if it represents a summary of a workshop, it is excluded. Non-academic and non-professional papers are eliminated as well. Further, we have dropped papers that discuss knowledge graphs but only refer to manufacturing and production as a potential usage example, rather than focusing on knowledge graphs in this domain. For deciding whether a paper describes a use case from the manufacturing domain, we have used the North American Industry Classification System (NAICS) as a reference. The NAICS is a classification system for business establishments that provides a systematic overview of the manufacturing domain. All other articles that are in line with the inclusion and exclusion criteria are considered primary studies.

TABLE 4. Summary of the selection strategy.
Inclusion criteria
IC-1: Terms fulfill the search string
IC-2: Academic journal, conference, and workshop papers
IC-3: Contributions to industry/professional conferences, workshops, and online publications
IC-4: Papers written in English
IC-5: Publication date: until 26th of February 2020
Exclusion criteria for titles and abstracts
EC-1.1: Personal blogs or web pages
EC-1.2: Papers available only in the form of PowerPoint presentations
EC-1.3: Papers which do not focus on knowledge graphs in manufacturing or production
Exclusion criteria for full text
EC-2.1: Papers available only in the form of abstracts
EC-2.2: Papers presenting a summary of a workshop
EC-2.3: Non-academic/non-professional online publications
EC-2.4: Papers which use manufacturing/production applications just as examples
EC-2.5: Papers which do not focus on knowledge graphs in manufacturing according to the NAICS

TABLE 5. Data items extracted from each study.
Nr | Item name | Description | Relevant RQ
D1 | Year | In which year was the article published? | RQ1
D2 | Venue | What is the name of the journal, conference, or workshop? | RQ1
D3 | Country | Where are the research institutes located that have published the study? | RQ1
D4 | Research field | What are the Scimago classifications of the outlet? | RQ2
D5 | Evidence level | What is the level of evidence of the proposed approach? | RQ2
D6 | KG approach | What kind of knowledge graph and which technique has been used? | RQ3
D7 | Domain | In which manufacturing domain has the knowledge graph been constructed? | RQ4
D8 | System kind | Which kind of system has been developed based on a knowledge graph? | RQ4
D9 | Use case | Which use case is supported? | RQ4
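Table 5 ties each data item to the research question it serves, and each primary study is extracted independently by two authors whose results are then compared before consensus is reached. A minimal sketch in Python of that bookkeeping, assuming each extraction is stored as a dictionary from data item to recorded value; the helper names and the example values (venue, domain) are hypothetical and not taken from the review's data.

```python
# Mapping of the data items in Table 5 to the research question each one serves.
ITEM_TO_RQ = {
    "D1": "RQ1", "D2": "RQ1", "D3": "RQ1",  # year, venue, country
    "D4": "RQ2", "D5": "RQ2",               # research field, evidence level
    "D6": "RQ3",                            # KG approach
    "D7": "RQ4", "D8": "RQ4", "D9": "RQ4",  # domain, system kind, use case
}


def items_for(rq):
    """Return the data items that feed a given research question."""
    return sorted(item for item, target in ITEM_TO_RQ.items() if target == rq)


def disagreements(extraction_a, extraction_b):
    """List data items on which two independent extractions differ.

    Each extraction maps a data item (e.g. "D7") to the value one reviewer
    recorded; an item missing from both extractions counts as agreement.
    """
    return sorted(item for item in ITEM_TO_RQ
                  if extraction_a.get(item) != extraction_b.get(item))


# Hypothetical records from two reviewers for one primary study.
first = {"D1": 2019, "D2": "IEEE Access", "D7": "automotive"}
second = {"D1": 2019, "D2": "IEEE Access", "D7": "aerospace"}
print(items_for("RQ4"))              # ['D7', 'D8', 'D9']
print(disagreements(first, second))  # ['D7']
```

Only the items returned by `disagreements` need to be discussed between the reviewers; everything else already agrees.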
C. DATA EXTRACTION AND SYNTHESIS

To answer the RQs defined in Table 1, we extract specific data from the selected primary studies. Table 5 lists the data items (D1 to D9) extracted for the analysis in this review. D1, D2, and D3 provide clues concerning the distribution of knowledge graph studies in manufacturing and production over years, venues, and countries of publication, and thus answer RQ1. D4 and D5 directly contribute to the answer to RQ2. D6 is used to answer RQ3. D7, D8, and D9 contribute to the answer to RQ4 and to the further discussion of knowledge graph approaches in manufacturing and production. To ensure that the data extraction results are unbiased, two authors performed the data extraction for each primary study independently; then one checked the data extraction results of the other, and finally they discussed the results and reached a consensus.

D. EVALUATION

A six-point Likert scale was designed to provide a quality assessment of the selected primary studies. We categorize studies into five different research type facets that are weighted according to their quality of evidence (as proposed by [63]), ranging from Opinion Papers ('1') to Evaluation Research ('5'). The final numerical value that constitutes the evaluation of each paper lies between 0 and 1. The evaluation provides insights into the degree to which different aspects of knowledge graphs are considered in existing research in the field. It was decided that the results of this assessment would help to identify the quality of the research carried out.
The questions underlying the quality assessment are shown in Table 6 and follow the six-level classification of evidence evaluation suggested by [64]. The purpose of these evaluation questions was to assess the primary studies based upon the employed methodology as well as upon how the proposal has been integrated.
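The text states only that each paper's final value lies between 0 and 1, not how it is computed. A minimal sketch of one way to obtain such a value, assuming each study receives one or more ratings on the six-point scale, read here as the integers 0 to 5 (1 = opinion paper and 5 = evaluation research, as stated above), and that the score is their mean divided by the maximum rating. Reading the lowest point as 0 and averaging over several Table 6 questions are both assumptions, not the authors' stated formula; `normalized_score` is a hypothetical name.

```python
# Assumed six-point scale 0..5; 1 = opinion paper and 5 = evaluation research
# follow the text, while the meaning of 0 and of 2-4 is not modeled here.
MAX_LEVEL = 5


def normalized_score(levels):
    """Average a study's ratings and rescale the mean to the interval [0, 1]."""
    if not levels:
        raise ValueError("at least one rating is required")
    for level in levels:
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"rating {level} is outside 0..{MAX_LEVEL}")
    return sum(levels) / (MAX_LEVEL * len(levels))


print(normalized_score([5]))     # 1.0
print(normalized_score([1]))     # 0.2
print(normalized_score([5, 3]))  # 0.8
```

Dividing by the maximum rating keeps scores comparable across studies regardless of how many questions were rated.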
IV. EMPIRICAL RESULTS
In this section, we present the detailed results of our literature analysis. The section is structured around the four research questions we have answered.
A. RESULTS OF THE SEARCH
The initial search resulted in 833 articles. A detailed breakdown by database is shown in Table 7. We have obtained