
Keyword Search

AnHai Doan, ... Zachary Ives, in Principles of Data Integration, 2012

Bibliographic Notes

Keyword search and its relationship to databases and data integration have been studied in a wide variety of ways. Yu, Lu, and Chang [587] provide a recent overview of work in the broader keyword search-over-databases field. Early systems included DISCOVER [304] and DbXplorer [18], which focused on SQL generation for keyword search but used very simple ranking schemes derived from the structure of the query tree itself. SPARK [403] proposed a scoring scheme that was motivated by techniques from information retrieval.

Algorithms for scaling up search over databases remain a key focus. BANKS [80] developed the techniques for backward graph expansion described in this chapter and adopted a very general tree-based scoring model. A refined version of BANKS developed the more general bidirectional search strategy [337]. BLINKS [298] used the root-to-path-based scoring model described in this chapter and showed that it yields better complexity bounds. The STAR [344] Steiner tree approximation algorithm exploits an existing taxonomic tree to provide improved performance beyond the methods described in this chapter, and even gives an approximation guarantee in this setting. It focuses on supporting keyword search in the YAGO project [534], also discussed in Chapter 15. The work in [51] seeks to scale up keyword search in the DISCOVER system by requiring results to be returned within a time budget, and then to point the user toward a list of other resources for more specialized information, such as query forms matching the keywords [137, 492].

Other work has explored broader sets of operations in the keyword search context. SQAK [540] incorporates aggregation operations. The Précis system [523] developed a semantics that enabled combinations of keywords using Boolean operators. Work on product search [509, 583] has explored taking keyword search terms and expanding them into selection predicates over multiple attributes of a single table.

One of the major ingredients of keyword search systems is top-k query processing algorithms. Fagin's Threshold Algorithm and numerous variants appear in [221]. There are also a wide variety of approaches to performing joins in a ranked model, including [226, 266, 310, 415, 514, 515]. Since a top-k query should only execute to the point where it has generated k answers, specialized techniques [109, 311, 383] have been developed for query optimization in this setting and for estimating how much input actually gets consumed. Techniques have also been developed for storing indices over which top-k queries may be computed while eliminating the need for random accesses [334]. The work in [52] discusses how to build industrial-strength keyword search systems.
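To make the top-k machinery concrete, the following is a minimal Python sketch of Fagin's Threshold Algorithm in the spirit of [221]: it performs sorted access over m score-ordered lists in parallel, resolves each newly seen object with random accesses, and stops once k objects score at least as well as the threshold computed from the last scores seen. The data layout, the aggregate function, and the stopping details are simplifying assumptions for illustration, not a production implementation of the cited variants.

```python
import heapq

def threshold_algorithm(sorted_lists, lookup, aggregate, k):
    """Fagin's Threshold Algorithm over m score-sorted lists.

    sorted_lists : list of m lists of (object_id, score), each sorted by descending score.
    lookup       : list of m dicts, lookup[i][obj] = score of obj in list i (random access).
    aggregate    : monotone aggregation function over a list of scores, e.g. sum.
    k            : number of top answers to return.
    """
    m = len(sorted_lists)
    seen = set()
    top = []            # min-heap of (aggregate score, object) holding the current top-k
    depth = 0
    while depth < max(len(lst) for lst in sorted_lists):
        last_scores = []
        for i in range(m):
            pos = min(depth, len(sorted_lists[i]) - 1)
            obj, score = sorted_lists[i][pos]
            last_scores.append(score)
            if obj not in seen:
                seen.add(obj)
                # random access: fetch the object's score in every list
                total = aggregate([lookup[j].get(obj, 0.0) for j in range(m)])
                heapq.heappush(top, (total, obj))
                if len(top) > k:
                    heapq.heappop(top)
        threshold = aggregate(last_scores)
        # stop once k answers are at least as good as any object not yet seen can be
        if len(top) == k and top[0][0] >= threshold:
            break
        depth += 1
    return sorted(top, reverse=True)

# Toy example: two ranked lists and sum() as the aggregation function.
lists = [[("a", 0.9), ("b", 0.8), ("c", 0.1)],
         [("b", 0.7), ("a", 0.6), ("c", 0.5)]]
maps = [dict(lst) for lst in lists]
print(threshold_algorithm(lists, maps, sum, k=2))   # [(1.5, 'b'), (1.5, 'a')]
```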

In the data integration setting, one task is to identify the domain of the query, and then separate its different components. For example, to answer the query “vietnam coffee production,” we would first need to identify that the query is about the coffee domain, then translate it into a query of the form CoffeeProduction[vietnam]. We then need to find a table that is relevant to the property CoffeeProduction (which is likely to be called something very different in the table) and has a row for Vietnam. Initial work on this topic is described in [167]. The Kite system [512] developed techniques for using synopses to discover join relationships and to iteratively generate queries during top-k computation. The Q system [538, 539] developed the feedback-based learning approach described here. Work by Velegrakis et al. [69] considered the issue of how to perform ranked keyword search when only metadata are available for querying, as in certain data integration scenarios. Several works discussed in Chapter 15 address the problem of keyword queries on the collection of HTML tables found on the Web [101, 102, 391].
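The domain identification and translation step can be pictured with a toy routing function. The sketch below scores candidate tables by how many keywords match their names, column names, or cell values, and emits a selection query that binds the keywords that matched cell values; the catalog, table names, and scoring are invented for illustration and do not reflect the actual techniques of [167], Kite, or Q.

```python
def route_keywords(keywords, tables):
    """Toy keyword-to-structured-query translation.

    tables: dict mapping table name -> list of row dicts.
    Each table is scored by the overlap between the keywords and the table's
    name, column names, and cell values; keywords that matched cell values
    become selection predicates.
    """
    best = None
    for name, rows in tables.items():
        vocab = set(name.lower().replace("_", " ").split())
        vocab |= {col.lower() for row in rows for col in row}
        schema_hits = sum(1 for kw in keywords if kw in vocab)
        bindings = {(col, val) for row in rows for col, val in row.items()
                    if str(val).lower() in keywords}
        score = schema_hits + len(bindings)
        if best is None or score > best[0]:
            best = (score, name, bindings)
    _, name, bindings = best
    where = " AND ".join(f"{c} = '{v}'" for c, v in sorted(bindings)) or "TRUE"
    return f"SELECT * FROM {name} WHERE {where}"

# Hypothetical catalog used only for this example.
tables = {
    "coffee_production": [{"country": "vietnam", "year": 2011, "tonnes": 1650000},
                          {"country": "brazil", "year": 2011, "tonnes": 3000000}],
    "tea_exports":       [{"country": "india", "tonnes": 250000}],
}
print(route_keywords(["vietnam", "coffee", "production"], tables))
# SELECT * FROM coffee_production WHERE country = 'vietnam'
```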

Naturally, there are connections between search, ranking, and probabilistic data (Chapter 13). Details on the use of probabilistic ranking schemes are provided in [312].


URL: //www.sciencedirect.com/science/article/pii/B9780124160446000168

Data Identification and Search Techniques

In E-discovery: Creating and Managing an Enterprisewide Program, 2009

Keyword and Boolean Searches

Keyword and Boolean search methods are among the most widely used and vetted methodologies available. They should be considered as part of any comprehensive search and identification undertaking when searching for potentially relevant ESI. However, as recent cases and studies have shown, there are pitfalls to using this technique, as it often fails to uncover a large portion of potentially relevant data.

Keyword searches have been variously defined as: a method of searching for documents that possess keywords specified by a user; a search using a full-text search filter, whereby a search term list is applied to a full-text index to find responsive files; and a search for documents containing one or more words that are specified by a user.13

A keyword search is exactly that. If you have a keyword such as e-discovery, the search is for that term and only that term. A straight keyword search would pick up only e-discovery and not electronic discovery, eDiscovery, or E Discovery. There are limitations with basic keyword searches, as they can fail to uncover variants of a word. Furthermore, if there is a typo or a misspelled word such as edisocvery, or an abbreviation such as eDisco, basic keyword search technology will miss these variants.

You can eliminate some of the limitations of keyword searches through the use of wildcards, which allow you to search for different forms of a particular word. The typical wildcard symbol is “*” or “!” and it enables you to search for multiple variations of a word. For example, let's assume you were dealing with a sexual harassment case and had to search for documents and e-mails related to that topic to determine whether others in the organization had knowledge of the potential harassment. You could search for words such as sex, sexual, sexuality, sexist, and sexism with the single search term sex*, since that wildcard search matches any word beginning with sex.
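As a minimal sketch of what a trailing-asterisk wildcard does, the following Python snippet expands a term such as sex* into a prefix pattern and returns the documents that contain a matching word. The tokenization and syntax are deliberately simplified and do not mirror any particular review platform.

```python
import re

def wildcard_search(term, documents):
    """Treat a trailing '*' as 'any word beginning with this prefix' and
    return the documents containing at least one matching word."""
    pattern = re.compile(r"\b" + re.escape(term.rstrip("*")) + r"\w*", re.IGNORECASE)
    return [doc for doc in documents if pattern.search(doc)]

docs = ["The sexual harassment policy was updated.",
        "Sexist remarks were reported to HR.",
        "Quarterly sales figures attached."]
print(wildcard_search("sex*", docs))   # matches the first two documents only
```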

Although wildcard searches allow you to search for variations on a word, they do have limits. If a word is misspelled or abbreviated, the wildcard search may miss it. In the preceding example, if someone used sxy or sxist, due to either a typo or an abbreviation, the wildcard phrase sex* would miss those spellings. The search could be altered to take misspellings, abbreviations, and typos into account, but after a certain point the search term becomes so broad that it results in gross over-collection and is no longer helpful in determining whether the search was, in fact, accurate.

Using keyword searches can be risky, and the recent federal case Victor Stanley, Inc. v. Creative Pipe, Inc. highlights how relying on keyword searches to identify documents, in this case privileged documents, can go wrong. The defendants claimed inadvertent production of privileged documents, blaming the failure of an extensive keyword-based privilege cull of text-searchable ESI, a manual review of non-text documents, and the added burden of too much data to review in the time allotted. The plaintiffs countered that the privilege review was faulty. In his opinion, U.S. Magistrate Judge Paul Grimm wrote that “all keyword searches are not created equal; and there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search or relying exclusively on such searches for privilege review.” He determined that the defendant waived privilege in large part due to “faulty privilege review of the text-searchable files and by failing to detect the presence of the 165 documents” in the production.14 This highlights the challenges of keyword searching alone.

Boolean searches add another dimension to keyword searches, allowing you to search for multiple keywords together, exclusive of each other, or within a certain distance of each other. The term Boolean refers to the system of logic developed by the nineteenth-century mathematician George Boole. Boolean searching of text is based on the underlying logic of true/false statements and uses standard operators to interlink search terms. The standard operators include and, or, not, within, near, and more. This method of searching allows multiple keywords or search terms to be linked together to improve the relevancy of the documents identified by this methodology.

Using Boolean search techniques, the and operator between two terms results in a search for documents containing both of them. Returning to the earlier sexual harassment example, the search sex* and harass* would turn up documents containing sexual harassment, but it would also turn up documents in which words beginning with sex and harass appear anywhere, however far apart. For example, an e-mail discussing how a sex education class gave students ample opportunity to harass the teacher would come back as relevant, even though it has little relevance to a sexual harassment case.

Other Boolean operators work in a similar fashion. The or operator finds documents that contain one term or the other, so cat or dog would turn up any document with the word cat or the word dog in it. In contrast, the not operator in cat not dog would turn up documents that mention cat and have no reference to dog. The within or near operator allows you to search for terms within a certain distance of each other, measured in either words or characters; cat w/5 dog would turn up documents that mention cat within five words of dog, where w/# is configured to find terms within a certain number of words of each other. In a similar fashion, w/s will turn up search terms in the same sentence as each other, and w/p will turn up search terms within the same paragraph. Other search features include phonic searching, which can find words that sound alike, such as Smythe and Smith, and stemming, which finds variations on endings, such as applies, applied, and applying in a search for apply.
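The sketch below illustrates, in Python, how the and/not operators and a w/n proximity test can be evaluated over a document, with a trailing * doubling as a simple wildcard. The tokenization and operator syntax are simplified assumptions and do not follow any specific review tool.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def matches(doc, term):
    """True if the document contains the term; a trailing '*' acts as a prefix wildcard."""
    words = tokenize(doc)
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def boolean_and(doc, a, b):          # both terms anywhere in the document
    return matches(doc, a) and matches(doc, b)

def boolean_not(doc, a, b):          # first term present, second term absent
    return matches(doc, a) and not matches(doc, b)

def within(doc, a, b, n):            # terms within n words of each other (w/n)
    words = tokenize(doc)
    pos_a = [i for i, w in enumerate(words) if w.startswith(a.rstrip("*"))]
    pos_b = [i for i, w in enumerate(words) if w.startswith(b.rstrip("*"))]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

doc = "A sex education class gave students ample opportunity to harass the teacher."
print(boolean_and(doc, "sex*", "harass*"))   # True: both stems occur somewhere
print(within(doc, "sex*", "harass*", 5))     # False: the stems are 8 words apart
```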

Fuzzy Search

One of the challenges with traditional keyword-based searches is the potential that typos or misspelled words will be overlooked. Fuzzy search technology is a method that has been developed to find terms that may be misspelled, and it is particularly helpful when there is a need to compensate for errors due to OCR of paper or imaged documents. Fuzzy search algorithms apply the concept of wildcard searches to individual characters in the search term. For example, fuzzy logic would enable you to find both harass and harras if these were two variations on the search terms. Additionally, most fuzzy search engines allow you to adjust the search parameters or accuracy of the search so that you can fine-tune the search to account for a certain level of typographical or OCR errors in the search.
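A minimal sketch of the underlying idea, using Levenshtein edit distance to match indexed words against a search term; real fuzzy search engines use more sophisticated scoring and the tunable accuracy settings described above. Note the false positive in the example output, which anticipates the caveat in the next paragraph.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_search(term, indexed_words, max_edits=2):
    """Return the indexed words within max_edits edits of the search term."""
    return [w for w in indexed_words if edit_distance(term, w) <= max_edits]

print(fuzzy_search("harass", ["harras", "harassment", "harness", "salary"]))
# ['harras', 'harness'] -- the typo is found, but so is an unrelated word
```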

Although fuzzy search can help uncover potentially relevant data, it does not necessarily offer increased recall or accuracy, as it is still subject to the limitations of a keyword search.

Though it is not always necessary to search all forms of potentially relevant data in every case, you must take stock of the potential sources of discoverable data for the purposes of disclosure, including voice and video data. The amended FRCP Rule 26(a) “demand[s] an exhaustive search for and identification of sources of discoverable electronically stored information, regardless of form, including email and voice content for disclosure.” Voice recordings are a growing form of critical digital evidence, from call centers in consumer product liability cases to call recordings in regulated industries. For example, in a dispute between two large banks, the defendant's “failure to retain audio recordings of its traders' telephone calls was sanctionable.” In the judge's opinion, the “appropriate sanction was adverse inference jury instruction,” and damages were in excess of $600 million. E-mail and voice communication files are more critical and complex than ever before, and legal technology consumers require scalability and analytical tools to more effectively understand and manage them.15


URL: //www.sciencedirect.com/science/article/pii/B9781597492966000080

Dropbox Analysis

Darren Quick, ... Kim-Kwang Raymond Choo, in Cloud Storage Forensics, 2014

Keyword search terms

Keyword search terms were determined from the filenames observed, and the text from within the Enron data files. These included the following:

“www.dropbox.com,” “dropbox”

“Getting Started.pdf,” “Boston City,” “Costa Rican Frog,” “Pensive Parakeet”

“How to use the Photos,” “How to use the Public”

The username and password of the Dropbox account created for this research

“filecache.dbx,” “dataset.zip,” “Enron,” “3111.txt,” “enron3111,” and “Enron Wholesale Services”


URL: //www.sciencedirect.com/science/article/pii/B9780124199705000041

Similarity of Private Keyword Search over Encrypted Document Collection

Yousef Elmehdwi, ... Ali Hurson, in Advances in Computers, 2014

5.1 Symmetric-Key PKS

Although PKS schemes (e.g., [8–12,15]) allow a user to securely search over an encrypted document collection C through keywords and selectively retrieve documents of interest, these techniques are limited to performing an exact search. That is, they cannot carry out a similarity search. Thus, such schemes cannot tolerate the typographical errors that frequently occur in real-world applications.

To handle this problem, Li et al. [4] proposed the first symmetric-key PKS scheme that tolerates both minor typos and format inconsistencies in the user's search input over an encrypted document collection in the cloud. They introduced the initial idea of the similarity keyword set (formally defined in Section 2.3). Recall that similarity should here be understood as minor typos introduced by users when entering the request through their keyboard [37]. The idea behind these similarity keyword sets is to index, before the search phase, not only the exact keywords but also the ones that differ slightly from them, up to a fixed bound d on the tolerated edit distance.

The key idea of Li et al. [4] is to enumerate all the similarity keywords that are within a predefined edit distance d of a given keyword. For indexing the resulting similarity variants, they use a secure index. This scheme transforms a single similarity keyword search operation into several exact keyword search operations [36]. Specifically, consider a data owner Alice who owns a collection of n documents D = {d1, …, dn}, where each document di ∈ D has a unique identifier idi, and a set of p distinct keywords W = {w1, …, wp}. Alice constructs an inverted-index-based secure index I (discussed in Section 4.1) and outsources it, along with the encrypted form of the collection C, to the cloud owned by Bob.

To search the collection for a given keyword w with edit distance k, where k ≤ d, an authorized user Charlie first constructs a similarity keyword set Sw,k for the search keyword w, using the same technique used by the data owner Alice to achieve similarity keyword search. Next, he computes the trapdoor set {Tw′ : w′ ∈ Sw,k} and sends it to Bob. Upon receiving the search request {Tw′}, Bob compares it with the secure index I and returns all the matching encrypted document identifiers Enc(sk, IDwi || wi), where IDwi denotes the set of identifiers of the documents that contain the keyword wi. Finally, Charlie decrypts the returned results and retrieves the relevant documents of interest.
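The following Python sketch shows the shape of this exchange, with HMAC-SHA256 standing in for the scheme's one-way trapdoor function and the similarity keyword set given explicitly; for brevity the posting lists are left unencrypted, whereas the actual scheme of [4] stores them as Enc(sk, IDw || w). It illustrates only the index-and-trapdoor flow, not the real construction.

```python
import hmac, hashlib

SK = b"shared secret between Alice and Charlie"    # hypothetical key

def trapdoor(keyword):
    """One-way trapdoor for a keyword; HMAC-SHA256 stands in for f(sk, .)."""
    return hmac.new(SK, keyword.encode(), hashlib.sha256).hexdigest()

# --- Alice (data owner): build the inverted-index-based secure index ---------
documents = {"d1": "coffee production in vietnam",
             "d2": "coffee exports and prices"}     # toy plaintexts
index = {}                                          # trapdoor -> document ids
for doc_id, text in documents.items():
    for w in set(text.split()):
        index.setdefault(trapdoor(w), []).append(doc_id)
# A real scheme would store Enc(sk, ID_w || w) here instead of plain ids.

# --- Charlie (user): search for 'coffee', tolerating a couple of typos -------
similarity_set = {"coffee", "cofee", "coffe"}       # S_{w,k}, built beforehand
request = {trapdoor(w) for w in similarity_set}

# --- Bob (cloud server): match the trapdoors against the index ---------------
results = sorted({doc for t in request for doc in index.get(t, [])})
print(results)                                      # ['d1', 'd2']
```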

Figure 3.7. Communication flow of PKS over encrypted document collection in cloud [4].

The subsequent works by Li et al. [13] and Wang et al. [30] focus on efficiency. Both proposed PKS schemes with a symbol-trie-based secure index (discussed in Section 4.2) in order to achieve high efficiency. Similar to Li et al. [4], they utilize the wildcard-based technique (formally defined in Section 3) to generate a storage-efficient similarity keyword set Sw,d for the keyword w with the desired edit distance d.
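The wildcard idea can be made concrete with a short sketch: for edit distance 1, each keyword of length ℓ expands into the keyword itself plus 2ℓ + 1 wildcard variants, where a '*' in a character position covers substitution or deletion there and a '*' in a gap covers insertion. The handling of d > 1 below is a naive recursive expansion used only for illustration and is not necessarily the exact construction of the cited schemes.

```python
def wildcard_similarity_set(word, d=1):
    """Wildcard-based similarity keyword set S_{word,d}.

    For d = 1: the word itself, the word with each character replaced by '*',
    and the word with '*' inserted in each gap. Larger d simply repeats the
    expansion on every variant produced so far.
    """
    current = {word}
    for _ in range(d):
        expanded = set(current)
        for w in current:
            for i in range(len(w)):
                expanded.add(w[:i] + "*" + w[i + 1:])   # '*' replaces character i
            for i in range(len(w) + 1):
                expanded.add(w[:i] + "*" + w[i:])       # '*' inserted in gap i
        current = expanded
    return current

s = wildcard_similarity_set("castle", d=1)
print(len(s))          # 14 variants for a 6-letter keyword (2*6 + 2)
print(sorted(s)[:4])   # ['*astle', '*castle', 'c*astle', 'c*stle']
```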

In Ref. [13], Alice constructs a symbol-trie-based secure index I in the same way as discussed in Section 4.2 and outsources the secure index I, along with the encrypted document collection C, to the cloud owned by Bob. To search for documents containing a keyword w with edit distance k, where k ≤ d, Charlie generates Sw,k from the keyword w, using the same technique used by Alice to achieve similarity keyword search, derives the trapdoor set {Tw′ : w′ ∈ Sw,k} using a one-way function f(sk, ·), where sk is a secret key, and sends it to Bob. Upon receiving the set of trapdoors as the search request, Bob divides each entry in the trapdoor set into a sequence of symbols from Δ and performs a search over I. Specifically, for each trapdoor sequence, the first symbol is matched against the children of the root of I. If there is an existing node equal to the symbol, it becomes the current node [38]. Subsequently, the second symbol is matched against the children of the current node. This process continues, and when the current node is a leaf node, Bob returns the set Enc(sk, IDwi || wi) attached to this node to Charlie.
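A minimal sketch of this trie traversal is shown below, with each trapdoor already split into symbols and the attached payload treated as an opaque string; it illustrates only the lookup shape, not the actual secure index of [13] or [30].

```python
def trie_insert(root, symbols, payload):
    """Insert a symbol sequence into a nested-dict trie and attach the payload
    (e.g. an encrypted posting list) at the leaf."""
    node = root
    for s in symbols:
        node = node.setdefault(s, {})
    node["_leaf"] = payload

def trie_search(root, symbols):
    """Walk the trie symbol by symbol; return the leaf payload, or None if the
    path does not exist."""
    node = root
    for s in symbols:
        if s not in node:
            return None
        node = node[s]
    return node.get("_leaf")

# Toy example: a trapdoor split into three symbols from a hypothetical alphabet.
trie = {}
trie_insert(trie, ["3f", "a1", "0c"], "Enc(sk, ID_w || w)")
print(trie_search(trie, ["3f", "a1", "0c"]))   # 'Enc(sk, ID_w || w)'
print(trie_search(trie, ["3f", "ff", "0c"]))   # None: no matching path
```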

However, the Wang et al. [30] scheme is slightly different from the Li et al. [13] scheme. In Ref. [30], Alice picks two random keys x and γ and computes Twi′ = f(x, wi′), represented as symbols from Δ, for each wi′ ∈ Swi,d, 1 ≤ i ≤ p. She builds a symbol-trie-based secure index I that covers all the similarity keywords of wi ∈ W and attaches Enc(skwi′, IDwi || wi) for i = 1, …, p to I, where skwi′ = g(γ, wi′) and g(key, ·) is a pseudo-random function. Then she outsources this information to Bob. To search for an input w with edit distance d, Charlie generates Sw,d from his input w and computes the trapdoors Tw′ = (f(x, w′), g(γ, w′)) for all w′ ∈ Sw,d. Next, he sends the set of trapdoors {Tw′ : w′ ∈ Sw,d} to Bob as the search request. In fact, the user's edit distance may differ from the predefined one. Upon receiving the search request, Bob divides each f(x, w′) in the trapdoor set into a sequence of symbols from Δ and performs a search for each trapdoor over the secure index I. Specifically, for each trapdoor sequence, the first symbol is matched against the children of the root of I. If there is an existing node equal to the symbol, it becomes the current node. Subsequently, the second symbol is matched against the children of the current node. This process continues, and when the current node is a leaf node, Bob uses the corresponding g(γ, w′) to decrypt the matched entry. He then returns the set IDwi, where ed(w, wi) ≤ d, to Charlie.


URL: //www.sciencedirect.com/science/article/pii/B9780128001615000037

Structured Search Solutions

Mikhail Gilula, in Structured Search for Big Data, 2016

7.1.2 Optimizing and Energizing Marketplace

Another shortcoming of the keyword search is that the search output rankings are generally unrelated to the qualities of the merchandise (i.e., specifications) or the deals offered. Since the search results tend to be voluminous, high search ranks are critical for merchants.

The keyword search puts buyers at a disadvantage because they are typically able to look through only the first few pages of output, whereas a better deal may be on the next page that they never got to see.

When buyers can find the best deals faster and more reliably, competition in the marketplace increases and markets work more efficiently. Therefore, working to the buyer’s advantage, structured search technology enables better prices, a better shopping experience, and a more efficient and energized marketplace at large.


URL: //www.sciencedirect.com/science/article/pii/B9780128046319000078

Building an Electronic Learning Community: From Design to Implementation

Anne Rose, ... Victor Nolet, in The Craft of Information Visualization, 2003

Provide alternative search strategies

Providing different alternatives (e.g., explorer, resource catalog, keyword search) for teachers to search for resources enhanced the overall usability of the system by allowing users to choose the tool that best suits their work style, their computing environment, and the task at hand. The explorer's dynamic query interface provides an overview of all the available resources and allows users to search and explore using a variety of controls, but its current implementation requires a fairly fast CPU with sufficient memory. Teachers working at home on slower machines over modem connections would more likely prefer to use the resource catalog or a simple keyword search. Using the resource catalog or keyword search is sufficient when searching for a specific resource. Some teachers may even prefer to work from a hard copy of the resource catalog that they can read at their leisure.


URL: //www.sciencedirect.com/science/article/pii/B9781558609150500214

Introduction to Structured Search

Mikhail Gilula, in Structured Search for Big Data, 2016

Abstract

This chapter compares, side by side, the features of keyword search (or information retrieval) and database search. Structured search is conceptualized as a technology for querying multiple data sources in an independent and scalable manner. It occupies the middle ground between keyword search and database search. As in the keyword search paradigm, query originators do not need to know the structure or the number of data sources being queried. As in the database paradigm, users can pose precise queries, control the output order, access data in real time, and manage data security.


URL: //www.sciencedirect.com/science/article/pii/B9780128046319000017

Starting Your Research

Thomas W. Edgar, David O. Manz, in Research Methods for Cyber Security, 2017

Research before the Research

Before setting up and running potentially expensive studies or experiments, it is prudent and necessary to first fully understand what is already known and where the research community stands on the topic. Scientific understanding is built up over time and it is important to understand what knowledge exists to produce useful and progressive results from new research.

Reviewing the literature provides insight into your cyber system of interest, as well as into what knowledge already exists and through what means it was gathered, all of which is helpful in deciding where to take your research. Ask yourself: is your topic already heavily researched or fairly untouched? Do you have a new perspective on the problem? Do you have a theory about the cyber system that is different from or contrary to current theories? At the end of this chapter, we will work through these questions and how they fit into deciding the most prudent research path. However, it is critical that you have studied enough of the previous work to be able to answer these questions with confidence. If you are unsure of an answer, then you probably need to continue with your literature search.

There are a few ways to go about a literature search. The first is to perform a keyword search. Most publishers require a list of keywords in each paper to help with this process. To start, generate a list of keywords that surround your research question. Aim for a list of 5 to 10, and be as specific as possible with keywords that scope your topic. For example, if you want to understand user cognitive capacity for passwords, you might create a list of keywords such as cognitive load, password, computer security, authentication, and human factors. A good keyword is specific enough to limit the results to the topic of interest, but general enough to ensure that the search covers a broad range of work on the topic and does not restrict the results to a small subfield of research.

Online academic literature databases and search engines have made performing keyword searches quite easy. Publisher search capabilities such as IEEE Xplore6 or the Association for Computing Machinery Digital Library7 are good options, or try search engines that query across multiple databases, such as Engineering Village 2,8 CiteSeerX,9 and CiteULike.10 Every database and search engine has a slightly different flavor and covers a different set of fields and topics, so it is good to search across multiple databases and make sure you pick the most appropriate set for your keywords and topic.

Did You Know?

Currently, a good set of computer science search engines includes Engineering Village 2,8 the ACM Digital Library,7 IEEE Xplore,6 ISI Web of Knowledge,11 ScienceDirect,12 arXiv,13 CiteSeerX,9 SpringerLink,14 and Wiley Online Library,15 but there are search engines and databases covering other fields that may also include work of interest.

Going back to our example question of user cognition for passwords, it would be apt to search across traditional computer science and engineering sources, but it would also be prudent to search something like PsycINFO,16 given the psychology aspect of the research question. A search using something such as ScienceDirect12 or Google Scholar17 is a good starting point, as these cover a broad range of journals and topics.

Other than a keyword search, another good method of finding papers to review is to look at recent papers on or around the topic of which you are already aware. From these papers you can then start an iterative, intelligent crawl of the references to look for additional papers (this is in essence snowball sampling, which we discuss in Chapter 4, Exploratory Study). Look at where the references are cited in the text and at their titles to determine the best papers to review in the next cycle. Also pay attention to the quality of each paper, as better papers will often be based on a stronger understanding of the literature. Iterating this way five to seven times should give you a fairly large corpus of papers with which to start.
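As a purely illustrative sketch, the reference crawl can be thought of as a bounded breadth-first walk over a citation graph; the references mapping below stands in for whatever metadata source you scrape or query, and the "intelligent" part, judging titles, citation context, and paper quality, remains manual.

```python
from collections import deque

def snowball(seed_papers, references, rounds=5):
    """Bounded breadth-first crawl of a citation graph: start from seed papers
    and, for a fixed number of rounds, add the references of everything found.

    references: dict mapping paper id -> list of cited paper ids.
    """
    corpus = set(seed_papers)
    frontier = deque(seed_papers)
    for _ in range(rounds):
        next_frontier = deque()
        while frontier:
            paper = frontier.popleft()
            for cited in references.get(paper, []):
                if cited not in corpus:
                    corpus.add(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return corpus

refs = {"p1": ["p2", "p3"], "p2": ["p4"], "p3": ["p4", "p5"]}
print(sorted(snowball(["p1"], refs, rounds=2)))   # ['p1', 'p2', 'p3', 'p4', 'p5']
```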

We should mention that a lot of databases require a subscription fee to read their papers. There is a strong and growing movement to provide free access to publications, so more and more publishers are providing open access to papers, and new open paper portals such as arXiv are emerging. We support this movement toward open access to research, as understanding, building on, and comparing with previous research is critical to the knowledge-building process. However, there is a business aspect to the scientific world that cannot be ignored; some of the highest-impact conferences use fee-based publishers, so it is important to weigh the potential impact of a paper. Additionally, there is a dark side to “free”/open publishing. Several publishers have started to charge sometimes exorbitant amounts to publish, in return for allowing anyone to read for free. The problem is that this essentially becomes a pay-to-publish scenario, and the integrity of peer review comes into question. In general, well-reviewed and well-regarded journals and conferences avoid these pitfalls, but be on the lookout for such practices as open publishing continues to flourish.

The benefits of conducting a literature review or “lit search” are twofold. First and foremost, the review of previous work will help inform your own approach, ideas, theories, or hypotheses, as explained previously. The second is to ensure a sufficient review of the landscape so that you can convince yourself, and any reviewer of your work, that you have diligently evaluated the work of others and are not accidentally duplicating, overlapping with, or, worst of all, fraudulently copying other work.

Once you have a corpus of papers to review, there are a few different objectives to pursue with the information acquired from the literature survey. The first and most obvious is to understand what research has been done and whether the question you have posed has been sufficiently answered. The second is to help you determine what type of research you should perform to best contribute to the community. This can include changing the research method to understand a different perspective on the question, or attempting to reproduce a result to increase confidence in a theory. The third objective of the literature survey is to prepare you for writing your results paper after the research is complete (we go into depth on how to write up good research at the end of each research method chapter).

You will largely have to learn how to organize your papers to best suit these objectives and your work process. However, one approach to getting started is to parse out the information from the papers along a few different categories: the research question they are asking, the conclusions, and the methods used to produce the result. Organizing the information from the papers in this manner lets you quickly see what questions were asked and answered, and determine whether they have tried to answer your specific research question. If you think any of them have, then you can assess the quality of the previous work. Ask whether they have fully answered the question and whether you believe they missed an assumption or had a bias. If there are multiple papers that answer your question, is there consensus? If not, then it might be worth proceeding to attempt your own research to answer the question. If yes, then is there a specific paper that makes a crucial statement worth reproducing? Reviewing how the work was performed and documented in each paper is just as crucial in determining the research path as the results of the papers.

Dig Deeper: Falsifiability

A fundamental understanding in the philosophy of science is that, as limited observers subject to the principles of the world we are trying to study, we cannot actually prove that a model of the world is true. To move from our natural inferential reasoning toward a more deductive process, a falsifiable hypothesis is key. To read more on the philosophy of science, look at The Logic of Scientific Discovery.18


URL: //www.sciencedirect.com/science/article/pii/B9780128053492000030

The Semantic Web

Pierre de Keyser, in Indexing, 2012

A timetable for the Semantic Web

In an interesting PowerPoint presentation, Nova Spivack gives an outline of the past and the future of the web [8]:

Period        Web                             Indexing and retrieval
1990–2000     Web 1.0, the WWW                keyword search
2000–2010     Web 2.0, the Social Web         tagging
2010–2020     Web 3.0, the Semantic Web       semantic search
2020–2030     Web 4.0, the Intelligent Web    reasoning

At this moment we are living through the hype of the Social Web, where everybody can put information on the web and add (meaningful or meaningless) tags to it. These techniques are being integrated into all kinds of web-based applications, e.g. library catalogues or museum sites. Meanwhile, the instruments for the Semantic Web are being developed, but it will be a long time before the Semantic Web is fully operational. Some problems need to be solved first:

More standards must be developed.

These standards must be translated into computer programs.

A lot of information has to be encoded in accordance with these standards.

But even if all this is realized, there are still some fundamental issues to deal with:

Privacy issues

Many examples of what the Semantic Web could be take it for granted that the necessary information is freely available on the web. This is not the case: a lot of information is stored in corporate databases that will never be opened to the public. In real life, a friend-of-a-friend project will constantly collide with ‘access denied’ messages.

The chaotic nature of the web

To realize the Semantic Web we need highly structured documents. They should conform to XML, RDF and other standards. It is questionable whether the majority of the documents on the web will ever be structured in this way. Since the beginning of the 1990s people have uploaded documents to the web that are not structured in a way a semantic search engine can make sense of: word processor files, PowerPoint presentations, badly designed HTML pages, etc. These documents not only stay on the web; the same kind of unstructured information is added every day on a massive scale.

The discrepancy between ontologies and Web 2.0 tagging

Web 2.0 is successful because it is fun: you can tag whatever you want in whatever way you like. Some techniques may be used to get a grip on this chaos, e.g. comparing similarities, differences and synchronicity between tags. Even if we can develop instruments to ‘normalise’ them, tags are still very different from ontologies, which are built according to a set of rigid rules.

Ontologies’ weak spot

But there is more: ontologies are fundamental for Web 3.0 because they form a web of meaning. Navigating through ontologies and from one ontology to another would allow us to refine our search until we reach our goal. The formal rules of how to build an ontology may be well defined in a standard and integrated into ontology editors, but neither one nor the other can prevent me from making a completely nonsensical ontology. Let’s not be naive: many people are doing their best to contribute in a positive and meaningful way to the world, but a lot of idiots, criminals and socially frustrated people find satisfaction in sabotaging those efforts.


URL: //www.sciencedirect.com/science/article/pii/B978184334292250012X

The first two decades of research on smart city development

Luca Mora, Mark Deakin, in Untangling Smart Cities, 2019

3.1 Introduction

The science of complexity and modern theories of urban dynamics have completely changed the way in which the functioning of cities and their evolutionary process are understood. Cities are now perceived as complex systems whose structure is shaped by a multitude of heterogeneous and apparently disconnected bottom-up activities that give rise to an internal order. This order is extremely sensitive and subject to continuous changes, which trigger an endless process of evolution [Batty, 2005, 2013; Batty and Marshall, 2009; Jacobs, 1961]. Dealing with this never-ending evolution is a challenge whose complexity has required researchers operating in many different academic disciplines to join forces and pool their knowledge [Benevolo, 1993; Secchi, 2011]. This collective understanding of cities has resulted in a unique knowledge domain which is known as urban studies [Liu, 2005], a domain that Kamalski and Kirby [2012: S3] consider to be “one of the longest established interdisciplinary fields within the modern academy.”

Computer science is one of the academic disciplines that has become part of this large interdisciplinary field of study. The interest of computer science in the urban landscape and its development dynamics began at the end of the 20th century, when the digital revolution was on the verge of transforming cities “into a constellation of computers” [Batty, 1997: 155] and large networks of electronic devices started to be embedded into the built environment [Mitchell, 1995]. Despite being only in its early stages of development, this transformative process immediately attracted the attention of academic environments, in which researchers were either concerned with the potential consequences on society or interested in better understanding the opportunities opened up by such a far-reaching change. This interest resulted in a widespread perception that is well explained by Alessandro Aurigi in a briefing paper written in 2003 to introduce a series of seminars on virtual and cyber cities: in the mid 1990s, by looking at the diffusion of ICT devices, many researchers suggested the new frontier of ICTs was to provide spatial, social, economic and environmental challenges with a solution, and “cities looked like the ideal arena where this revolution would test and show itself, changing economic development, services, and above all, community life” [Firmino, 2003].

The relationship between ICTs and the development of urban systems started being analyzed with Graham and Marvin’s book “Telecommunications and the City: Electronic Spaces, Urban Places” [Graham and Marvin, 1996]. Their research activity, along with the work published by Castells [1989, 1996, 2004] and Mitchell [1995, 1999, 2000, 2003], has allowed this new knowledge area to take shape and grow. This knowledge production process has resulted in the publication of a large body of academic literature [see Graham and Marvin, 1996, 1999, 2001, 2004; Graham, 1997, 2000, 2001, 2002, 2004]. Many of these publications can now be considered as some of the key intellectual resources exploring the complex and still poorly understood relationship linking the sustainable development of urban environments to the deployment of information and communication technology [Graham and Marvin, 1996].

Smart city development is part of this knowledge domain, and the research investigating this new concept started in 1992 with the book entitled “The Technopolis Phenomenon: Smart Cities, Fast Systems, Global Networks” [Gibson et al., 1992]. Over the years, smart cities have become the symbol of ICT-driven urban sustainability and have received growing attention from many researchers working not only in academia, but also for governmental organizations, industry, and civil society organizations. Thanks to the interest expressed by such researchers, smart city research has been growing sharply since 1992.

Evidence of this trend emerges when analyzing the data offered by Google Scholar. Following a keyword search aimed at identifying the literature produced between 1992 and 2017 in which the term smart city is used, in either the singular or plural form, Google Scholar sources 73,239 documents.1 The data collected during the search, which is shown in the bar chart of Fig. 3.1, demonstrates that the annual volume of publications dealing with smart city development has increased by approximately 650 times within 26 years, moving from 26 in 1992 to 16,700 in 2017.

Fig. 3.1. Annual production of smart city literature from 1992 to 2017.

The study reported on in this chapter aims to (1) provide an overall and detailed picture of what happened during the first two decades of research on smart city development and (2) lead to an improved understanding of this research field. This aim is met by answering the following questions:

What are the main characteristics of the literature on smart cities that is produced between 1992 and 2012?

How large is the scientific community researching this subject?

What are the productivity levels of the researchers within this community?

What organizations do members of this research community belong to?

What are the interpretations of smart city development that emerge from this literature?

What are the main factors that have influenced the production of literature during the first two decades of smart city research?

To answer these questions, bibliometrics is used to analyze both (1) the literature on smart city development published between 1992 and 2012 and (2) the community of researchers involved in this process of knowledge production. The methodology adopted to conduct this analysis is presented in Section 3.2, which is followed by a comprehensive account of the findings (see Section 3.3). The analysis will shed light on the first 21 years of research into smart cities and will start to uncover the division that such research has produced.


URL: //www.sciencedirect.com/science/article/pii/B9780128154779000037
