Search tools. Determining the purpose of an information search. Conducting a search in information resources

After studying this topic, you will learn (or review):

- what search servers are for;
- the purpose of the main parts of a search server;
- what types of information search exist on the Internet;
- the basic rules for forming a query in the Yandex search engine.

Search by URL

The fastest and most reliable way to find information on the Internet is to go directly to a known URL. Many URLs are published in printed publications and special reference books, announced on popular radio stations, and shown on TV.

♦ Fans of the Zenit football club know the address www.fc-zenit.ru by heart.
♦ Fans of the group “The King and the Jester” are well aware of the official website of this group www.korol.spb.ru.
♦ Fans of the NTV channel can easily find its website at www.ntv.ru.

To quickly access the above resources, simply launch a browser program, such as Internet Explorer, and type the familiar URL in the address bar.

Search engines

A huge number of documents is concentrated on the Internet. To make it easier to find the necessary information, special search engines have been created.

Search engines are automatic systems that poll servers connected to the global network and store information about the data available on those servers in their own database. In response to a specially formulated query, a search engine reports where the necessary data can be found.

Typically, a search engine consists of three parts: a robot, an index, and a query processing program.

♦ The robot (also called a spider or bot) is a program that visits web pages and reads their content, in whole or in part. Search engine robots differ in how they analyze the content of a web page.
♦ The search engine index is a repository of search images of the pages visited by robots. A search image of a document (including a web page) is a description of the document's content in a special information retrieval language. This description contains codes of the document's keywords, which reflect its meaning and content. Indexes differ between search engines in volume and in how the stored information is organized. The databases of leading search engines store information about tens of millions of documents, and their indexes amount to hundreds of gigabytes. Indexes are periodically updated and supplemented, so the same query sent to the same search engine may return different results at different times.
♦ The query processing program is a program that, in accordance with the user's query, looks through the index for the necessary information and returns links to the documents found. The program arranges the resulting set of links in descending order of relevance, that is, from the closest match to the query to the weakest.
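The three parts described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the pages are a hypothetical in-memory dictionary rather than real web pages, and relevance is approximated by counting how often the query words occur, which is far simpler than the criteria real search engines use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical in-memory "web": URL -> page text (stands in for pages a robot would download).
PAGES = {
    "site-a/fish": "aquarium fish care and aquarium plants",
    "site-b/music": "the group aquarium released a new album",
    "site-c/shop": "fish food and filters for every aquarium fish keeper",
}

def robot(pages):
    """Robot: visit every page and yield (url, list of lowercase words)."""
    for url, text in pages.items():
        yield url, re.findall(r"\w+", text.lower())

def build_index(crawled):
    """Index: an inverted index mapping word -> {url: number of occurrences}."""
    index = defaultdict(Counter)
    for url, words in crawled:
        for word in words:
            index[word][url] += 1
    return index

def process_query(index, query):
    """Query processor: URLs containing every query word, most relevant first."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))

    def score(url):
        # Toy relevance: total number of query-word occurrences on the page.
        return sum(index[w][url] for w in words)

    return sorted(candidates, key=lambda url: (-score(url), url))

index = build_index(robot(PAGES))
print(process_query(index, "aquarium"))
print(process_query(index, "aquarium fish"))
```

Adding a second word to the query drops the page about the music group, which mirrors the narrowing effect of multi-word queries.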

Currently, the most popular among Russian Internet users are three large index-type search engines: Yandex, Rambler, and Aport.

These systems take into account the grammatical features of the Russian language, so their search results in Russian-language resources are of higher quality than those of Western systems.

Search engines vary in their coverage of information resources:

♦ general-purpose search engines cover all areas of knowledge and are distinguished by an extensive index and a large volume of accumulated information;
♦ special-purpose search engines look only at sites on a specific topic, such as music or museums.

The main characteristics of search engines are:

♦ volume of documents in the index;
♦ frequency of information update;
♦ the information space that the search engine robot covers and the variety of types of documents about which information is collected;
♦ request processing speed;
♦ criterion for determining relevance (compliance of the found document with the search query);
♦ the ability to detail and clarify the request.

Search using the search engine's rubricator

Search directories are systematic collections of links to other Internet resources. The links are organized as a thematic rubricator: a hierarchical structure that you move through to find the information you need.

Let us take as an example the structure of the Yandex Internet search catalog. This is a general-purpose directory, since it contains links to Internet resources in almost all possible areas. The following topics are highlighted in this catalog:

♦ Business and economics;
♦ Directories and links;
♦ Society and politics;
♦ Home and family;
♦ Science and education;
♦ Entertainment and relaxation;
♦ Computers and communications;
♦ Culture and art.

Each topic includes many subsections, and these, in turn, contain headings, etc.

Suppose you are preparing an event for Victory Day and want to find the words of Bulat Okudzhava's famous military song “You hear the boots rattling” on the Internet. The search can be organized as follows: Yandex Catalog → Culture and art → Music → Author's song.

This search method is quite fast and effective. At the end you are offered only 5 links, among which there are links to sites with songs of famous bards. All that remains is to find the archive with the lyrics of B. Okudzhava’s songs on the website and select the desired text from it.

Another example. Suppose you are going to buy a mobile phone and want to compare the characteristics of devices from different companies. The search could follow these catalog headings: Yandex Catalog → Computers and communications → Mobile communications → Mobile phones.

Having received a limited number of links, you can quickly view them and choose a phone by comparing the manufacturers and the characteristics of the different models.
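Moving through a rubricator amounts to walking down a tree of headings. The sketch below assumes a hypothetical fragment of a catalog stored as nested dictionaries; the heading names come from the two examples above, while the link names are invented placeholders.

```python
# Hypothetical fragment of a catalog: nested dicts are headings, lists are links.
CATALOG = {
    "Culture and art": {
        "Music": {
            "Author's song": ["bards.example/okudzhava", "bards.example/archive"],
        },
    },
    "Computers and communications": {
        "Mobile communications": {
            "Mobile phones": ["phones.example/compare"],
        },
    },
}

def browse(catalog, path):
    """Follow a list of headings from the top of the catalog downwards."""
    node = catalog
    for heading in path:
        if not isinstance(node, dict) or heading not in node:
            raise KeyError(f"no heading {heading!r} at this level")
        node = node[heading]
    return node

links = browse(CATALOG, ["Culture and art", "Music", "Author's song"])
print(links)
```

Stopping part-way down the path returns the subsections available at that level, which is what the user sees on each intermediate catalog page.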

Search by keywords

Most search engines can search by keyword. This is one of the most common types of search. To search using keywords, enter the word or words you are looking for in a special box and click the Search button. The search engine will find and display the documents in its database that contain these words. There may be many such documents, but in this case many does not necessarily mean good.

Let's conduct several experiments with any of the search engines. Let's assume that we decide to start an aquarium and we are interested in any information on this topic.

At first glance, the simplest thing is to search for the word “aquarium”. Let's check this, for example, in the Yandex search engine. The search result will be more than 460,000 pages on 3,500 sites - a huge number of links. Moreover, if you look more closely, among them there will be sites mentioning B. Grebenshchikov’s group “Aquarium”, shopping centers and informal associations with the same name, and much more that has nothing to do with aquarium fish.

It is not difficult to guess that such a search cannot satisfy even the most unassuming user. Too much time will have to be spent on selecting among all the proposed documents those that relate to the subject we need, and even more so on getting acquainted with their contents.

We can immediately conclude that searching by one word is, as a rule, impractical, because using one word it is very difficult to determine the topic that a document, web page or site is dedicated to. The exception is rare words and terms that are almost never used outside their thematic area.

Let's try to clarify the search conditions and enter the phrase “aquarium fish”. The search result will be a little more than 20,000 pages and about 650 sites. As you can see, the number of links has decreased by more than 20 times. This result suits us more, but still among the proposed links there may be, for example, Russian souvenir sets of match labels with images of fish, and collections of screensavers for the computer desktop, and catalogs of aquarium fish with photographs, and aquarium accessories stores.

It is obvious that we should continue to move towards clarifying the search conditions.

To make searching more productive, every search engine has a special query language with its own syntax. These languages are similar in many ways. It is quite difficult to study them all, but any search engine has a help system that will allow you to master the language you need.

Here are ten simple rules for forming a query in the Yandex search engine.

1. Keywords in the query should be written in lowercase (small) letters. This will ensure that all keywords are searched, not just those that start with a capital letter.

2. When searching, all forms of the word are taken into account according to the rules of the Russian language, regardless of the form of the word in the query. For example, if the word “know” was specified in the query, then the words “we know”, “you know”, etc. will also satisfy the search condition.

3. To search for a stable phrase, put the words in quotation marks, for example, “china”.

4. To search by an exact word form, put an exclamation mark before the word. For example, to search for the word “September” in the genitive case, you would write “!September”.

5. To search within a single sentence, words in the query are separated by a space or an & sign: “adventure novel” or “adventure&novel”. Several words typed in a query, separated by spaces, mean that they all must be included in one sentence of the document being searched.

6. If you want only those documents that contain each word specified in the query to be selected, put a plus sign “+” in front of each of them. If, on the contrary, you want to exclude a word from the search result, put a minus sign “-” in front of it. The signs “+” and “-” must be separated by a space from the previous word and written together with the next word. For example, the query “Volga -car” will find documents that contain the word “Volga” and do not contain the word “car”.

7. When searching for synonyms or words with similar meanings, you can put a vertical bar “|” between the words. For example, for the query “child | baby | baby”, documents with any of these words will be found.

8. Instead of one word in a query, you can substitute an entire expression. To do this, it must be put in brackets, for example, “(child | baby | children | baby) + (care | education).”

9. The “~” (tilde) sign allows you to find documents with a sentence containing the first word but not the second. For example, the query “books ~ store” will find all documents containing the word “books” with no word “store” next to it (within the same sentence).

10. If an operator is written once (for example, & or ~), the search is performed within a sentence. A doubled operator (&&, ~~) specifies a search within the whole document. For example, the query “cancer ~~ astrology” will find documents with the word “cancer” that are not related to astrology.
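The difference between sentence scope and document scope in rules 5, 9 and 10 can be illustrated with a small sketch. It is not Yandex's implementation: the function names are hypothetical, sentences are split naively on punctuation, and word forms are not taken into account (rule 2 is ignored).

```python
import re

def sentences(text):
    """Split a document into sentences, each represented as a set of lowercase words."""
    parts = re.split(r"[.!?]+", text.lower())
    return [set(re.findall(r"\w+", p)) for p in parts if p.strip()]

def all_words(text):
    """All distinct words of the document."""
    return set().union(*sentences(text))

def both_in_sentence(text, a, b):
    """Single '&': both words occur in the same sentence."""
    return any(a in s and b in s for s in sentences(text))

def both_in_document(text, a, b):
    """Doubled '&': both words occur anywhere in the document."""
    words = all_words(text)
    return a in words and b in words

def first_without_second_in_sentence(text, a, b):
    """Single '~': some sentence contains the first word but not the second."""
    return any(a in s and b not in s for s in sentences(text))

def first_without_second_in_document(text, a, b):
    """Doubled '~': the document contains the first word and never the second."""
    words = all_words(text)
    return a in words and b not in words

doc = "We sell books. The store opens at nine."
print(both_in_sentence(doc, "books", "store"))                   # different sentences
print(both_in_document(doc, "books", "store"))                   # same document
print(first_without_second_in_sentence(doc, "books", "store"))
print(first_without_second_in_document(doc, "books", "store"))
```

The same document satisfies the sentence-level exclusion but fails the document-level one, which is why doubling an operator can change the result set noticeably.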

With a set of the most common terms in the desired area at hand, you can use advanced search. Fig. 3.3 shows the advanced search window in the Yandex search engine. In this mode, the capabilities of the query language are presented as a form. A similar service, including dictionary filters, is offered by almost all search engines.

Fig. 3.3. An example of an advanced search in the Yandex system

Provided that the desirable and required words are chosen correctly and undesirable terms are excluded, such a search can give good results.

Let's return to the example with aquarium fish. After reading several documents offered by the search engine, it becomes clear that searching for information on the Internet should not begin with choosing aquarium fish. An aquarium is a complex biological system, the creation and maintenance of which requires special knowledge, time and serious investment.

Based on the information received, a person searching on the Internet can radically change the strategy for further search by deciding to study special literature related to the issue under study.

To search for literature or full-text documents, the following query is possible:

After processing the request by the search engine, the following result was obtained: pages - 195, sites - at least 43.

As can be seen from the search statistics, the result was very successful. Already the first links lead to the required documents:

Placing an Aquarium > Tips for the Beginner Aquarist > Articles > Aquascope.ru
http://aquascope.ru/modules/wfsection/article.php?page=l&articleid=49 (32KB) - strict match.
ADVICE FOR BEGINNING AQUARISTS. How to choose and install an aquarium, how...
http://www.aquariums.ru/sovna.htm (2KB) 07/23/2002 - non-strict match.

Now you can summarize the search results, draw certain conclusions and make a decision on possible actions:

♦ Stop further search, since for various reasons you are unable to maintain an aquarium.
♦ Read the suggested articles and start setting up an aquarium.
♦ Look for materials about hamsters or budgies.

Professional search

Researchers and specialists will have to take a more thoughtful approach to organizing the search. When searching for information on the Internet professionally, the following requirements must be met:

♦ high search speed;
♦ reliability of the information received;
♦ complete coverage of resources when searching.

Speed. The speed of a search depends mainly on two factors: competent search planning (selection of search services and tools) and skills in working with an already selected resource (the ability to quickly understand its structure and navigation methods). Search indexes are not enough to ensure search speed. In addition to them, there are a number of search resources on the Internet, the use of which ensures a professional search.

Reliability. The reliability of information obtained from the Internet is a pressing issue, since anyone can post any information there without any check of whether it corresponds to reality. This, in turn, leads to a large number of unreliable sources, such as the abstracts and term papers that have flooded the Internet.

There are special search services that allow you to assess the reliability of an information source on the Internet.

Completeness. A necessary condition for successful full-scale collection of information is knowledge of the main types of resources existing today and the use of various search services. No search engine can cover all Internet resources.

As a rule, to achieve a positive result, the user must resort to the services of several search engines. You can do this yourself, moving from system to system, or you can entrust this work to one of the metasearch engines (meta- is the first component of compound words denoting systems that describe and study other systems).

Fig. 3.4. Metasearch engine windows

Metasearch engines do not have their own search databases and use the resources of many other search engines when searching. Due to this, the probability of finding the necessary information is very high. Work in metasearch systems follows the same rules as work in search engines, because metasearch engines are a kind of add-on to search engines and use their index databases. The appearance of metasearch engines resembles that of well-known search engines. Fig. 3.4 shows the windows of the metasearch engines myweb.ru and metabot.ru.
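The idea of a metasearch engine can be sketched as follows. The two engines are hypothetical stubs that return fixed lists instead of querying real services, and the merging rule (summing reciprocal ranks) is an assumption chosen for illustration, not a description of how myweb.ru or metabot.ru combine results.

```python
def metasearch(query, engines, limit=10):
    """Send one query to several engines and merge their ranked link lists."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            # A link gains more the closer it is to the top of an engine's list.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return merged[:limit]

# Two hypothetical engines, stubbed with fixed results instead of network calls.
def engine_one(query):
    return ["a.example", "b.example", "c.example"]

def engine_two(query):
    return ["b.example", "d.example"]

print(metasearch("aquarium fish", [engine_one, engine_two]))
```

Note that the metasearch function keeps no index of its own: everything it returns comes from the underlying engines, and a link found by several of them rises in the merged list.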

Experience shows that in most cases, better results are achieved by using several independent search indexes than by using a single metasearch engine.

Test questions and assignments

1. What is the purpose of a browser program?

2. What browser programs do you know?

3. Where can a web searcher find URLs?

4. What is the technology for searching using the search engine's rubricator?

5. What is the technology for searching by keywords?

6. What requirements must be met when searching for information on the Internet professionally?

7. When should “+” or “-” signs be specified in the search criteria?

8. What search criteria in Yandex are specified by the following phrase:

(nanny | teacher | governess) ++ (care | education | supervision).

9. What does doubling the sign (∼∼ or ++) mean when forming a complex query?

10. What is search relevance?

11. What is the purpose of metasearch engines?

Search tools

Search tools are special software whose main purpose is to give Internet users the most efficient and highest-quality information search. Search tools are hosted on special web servers, each of which performs a specific function:

1. Analyzing web pages and recording the results of the analysis at one or another level of the search server database.

2. Search for information based on the user's request.

3. Providing a convenient interface for searching information and viewing the search result by the user.

The working techniques used when working with certain search tools are almost the same. Before we discuss them, let's consider the following concepts:

1. The search tool interface is presented in the form of a page with hyperlinks, a query line (search line) and query activation tools.

2. A search engine index is an information base containing the result of an analysis of web pages, compiled according to certain rules.

3. A query is a keyword or phrase that the user enters into the search bar. To form various queries, special characters ("", ~), and mathematical symbols (*, +, ?) are used.

The information search scheme is simple. The user types a key phrase and activates the search, thereby receiving a selection of documents based on the formulated (specified) request. This list of documents is ranked according to certain criteria so that at the top of the list are those documents that most closely match the user's request. Each of the search tools uses different criteria for ranking documents, both when analyzing search results and when creating an index (populating an index database of web pages).

Thus, if you enter the same query in the search bar of each search tool, you may get different search results. What matters greatly to the user is which documents appear among the first twenty to thirty results and how well they match the user's expectations.

Most search tools offer two search methods: simple search and advanced search, the latter with or without a special query form. Let's consider both types of search using the example of English-language search engines.

For example, AltaVista is useful for arbitrary queries such as “Something about online degrees in information technology,” while Yahoo's search tool lets you get world news, exchange rate information, or weather forecasts.

Mastering query refinement criteria and advanced search techniques allows you to increase search efficiency and quickly find the necessary information. First of all, you can increase the efficiency of your search by using logical operators (operations) Or, And, Near, Not, mathematical and special symbols in your queries. Using operators and/or symbols, the user associates keywords in the required sequence to obtain the most appropriate search result for the query. Request forms are shown in Table 1.

Table 1

A simple query returns many links to documents, because the list includes documents containing any one of the words entered in the query, or a simple phrase (see Table 1). The AND operator lets you specify that all keywords must appear in the document. However, the number of documents may still be large, and reviewing them will take quite some time. Therefore, in some cases it is much more convenient to use the context operator NEAR, which indicates that the words must be located close to each other in the document. Using NEAR significantly reduces the number of documents found. The "*" character in the query string means that the word is searched for by a mask. For example, writing “gov*” in the query string returns a list of documents containing words that start with “gov”, such as government, governor, etc.
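The AND, NEAR and mask conditions described above can be expressed as small predicates over the words of a document. This is a sketch rather than any engine's actual matcher: the function names are hypothetical, and the default proximity window of 10 words is an assumption made for illustration.

```python
import re
from fnmatch import fnmatchcase

def words_of(text):
    """Lowercase words of the document, in order."""
    return re.findall(r"\w+", text.lower())

def match_and(text, *terms):
    """AND: every term occurs somewhere in the document."""
    present = set(words_of(text))
    return all(t in present for t in terms)

def match_near(text, a, b, window=10):
    """NEAR: a and b occur within `window` words of each other."""
    words = words_of(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def match_mask(text, mask):
    """Mask search: distinct words matching a pattern such as 'gov*'."""
    return sorted({w for w in words_of(text) if fnmatchcase(w, mask)})

doc = "The governor met the government to discuss online degrees"
print(match_mask(doc, "gov*"))
print(match_and(doc, "online", "degrees"))
print(match_near(doc, "governor", "degrees", window=3))
```

NEAR is stricter than AND: the last call fails because the two words are seven positions apart, even though both are present in the document.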

The most developed search service for Russian-language information is provided by the Yandex search server. In Yandex, you can simply write a phrase in Russian that describes what you want to find, and the system will analyze and process your query and then try to find everything related to the given topic. Using special operators, you can build a query string that tells the search engine which requirements the information you are interested in must meet.

The equally popular search engine Rambler keeps statistics on visits to the links in its own database. It supports the same logical operators AND, OR, NOT, the metasymbol * (similar to the * character in AltaVista, which widens the query), and the coefficient symbols + and -, which increase or decrease the significance of the words entered in the query.

Let's look at the most popular technologies for searching information on the Internet.

1. Introduction

Every year the volume of the Internet increases significantly, so the likelihood of finding the necessary information increases sharply. The Internet connects millions of computers, many different networks, and the number of users increases by 15-80% annually. And yet, increasingly, when accessing the Internet, the main problem is not the lack of the information you are looking for, but the ability to find it. As a rule, an ordinary person, due to various circumstances, cannot or does not want to spend more than 15-20 minutes searching for the answer he needs. Therefore, it is especially important to correctly and competently learn a seemingly simple thing - where and how to look in order to get the DESIRED answers.

To find the information you need, you need to find its address. For this purpose, there are specialized search servers (index robots (search engines), thematic Internet directories, meta-search systems, people search services, etc.). This master class reveals the basic technologies for searching information on the Internet, provides general features of search tools, and examines the structures of search queries for the most popular Russian-language and English-language search engines.

2. Search technologies

World Wide Web (WWW) technology is a special technology for preparing and publishing documents on the Internet. The WWW includes web pages as well as digital libraries, catalogs, and even virtual museums. With such an abundance of information, the question arises: “How can one navigate such a huge information space?”
Search tools come to the rescue in solving this problem.

2.1 Search tools

Search tools are hosted on special web servers, each of which performs a specific function:

1. Analyzing web pages and recording the results of the analysis at one or another level of the search server database.
2. Searching for information based on the user's query.
3. Providing a convenient interface for searching for information and viewing the search results.


Let's look at the most popular technologies for searching information on the Internet.

2.2 Search engines

Web search engines are servers with a huge database of URLs. They automatically access the WWW pages at all these addresses, examine their contents, and extract keywords from the pages into their database (that is, they index the pages).

Moreover, search engine robots follow the links found on pages and index the linked pages as well. Since almost every WWW page links to many other pages, a search engine working this way can, in theory, eventually crawl every site on the Internet.
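Following links from page to page is a graph traversal. The sketch below uses a hypothetical in-memory link graph in place of real HTTP requests and walks it breadth-first; a real robot would download and index each page at the marked step.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (stands in for real HTTP fetches).
LINKS = {
    "start": ["news", "shop"],
    "news": ["start", "archive"],
    "shop": ["archive"],
    "archive": [],
    "island": ["start"],   # no page links here, so the robot never finds it
}

def crawl(start, links):
    """Breadth-first traversal: visit each reachable page exactly once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)              # a real robot would index the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(crawl("start", LINKS))
```

The page named "island" shows why full coverage is only theoretical: a page that no visited page links to is never reached by link-following alone.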

This type of search tools is the most famous and popular among all Internet users. Everyone has heard the names of well-known web search engines (search engines) - Yandex, Rambler, Aport.

To use this type of search tool, open it and type the keyword you are interested in into the search bar. You will then receive the links stored in the search engine's database that most closely match your query. To make your search more effective, keep the following points in mind:

♦ Decide on the topic of the query. What exactly do you ultimately want to find?
♦ Pay attention to language, grammar, morphology, and the use of non-letter symbols. It is also important to formulate and enter keywords correctly. Each search engine has its own form of query construction: the principle is the same, but the symbols or operators used may differ. The required query forms also vary depending on the complexity of the search engine's software and the services it provides. One way or another, each search engine has a “Help” section, where all syntax rules, as well as recommendations and search tips, are clearly explained.
♦ Use the capabilities of different search engines. If you don't find it on Yandex, try Google. Use advanced search services.
♦ To exclude documents containing certain terms, put a "-" sign before each such word. For example, if you need information about the works of Shakespeare other than "Hamlet", enter the query "Shakespeare -Hamlet". To make sure that certain words are present in the search results, use the "+" symbol. So, to find links specifically about selling cars, use the query "sale +car". To increase the efficiency and accuracy of your search, combine these symbols.
♦ Each link in the list of search results is accompanied by several lines from the found document in which your keywords appear. Before clicking on the link, evaluate how relevant this snippet is to the topic of the query. After following a link to a specific site, look carefully at the main page. As a rule, the first page is enough to understand whether you have come to the right place. If so, continue searching for the necessary information within the selected site (in its sections); if not, return to the search results and try the next link.
♦ Remember that search engines do not produce their own information (except for explanations about themselves). A search engine is only an intermediary between the owner of the information (the site) and you. Databases are constantly updated and new addresses are added to them, but they still lag behind the information that actually exists in the world, simply because search engines do not work at the speed of light.
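The effect of the "+" and "-" signs can be sketched as a simple filter. This is an illustration under assumptions, not the parser of any particular engine: the function names are hypothetical, words without a sign are treated as required too, and matching is done on whole lowercase words.

```python
def parse_query(query):
    """Split a query into required (+word), excluded (-word) and plain words."""
    required, excluded, plain = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            plain.append(token)
    return required, excluded, plain

def matches(text, query):
    """True if the text has all required and plain words and no excluded ones."""
    words = set(text.lower().split())
    required, excluded, plain = parse_query(query)
    return (all(w in words for w in required + plain)
            and not any(w in words for w in excluded))

print(parse_query("sale +car"))
print(matches("shakespeare wrote sonnets", "shakespeare -hamlet"))
print(matches("shakespeare wrote hamlet", "shakespeare -hamlet"))
```

Because the sign must be attached to the following word, "shakespeare -hamlet" is parsed as one plain word and one exclusion rather than as a single hyphenated term.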

The most famous web search engines include Google, Yahoo, AltaVista, Excite, HotBot, and Lycos. Among the Russian-language ones, Yandex, Rambler, and Aport stand out.

Search engines are the largest and most valuable, but they are far from the only sources of information on the Internet, because in addition to them, there are other ways to search the Internet.

2.3 Directories

A catalog of Internet resources is a constantly updated and expanding hierarchical catalog containing many categories and individual web servers with brief descriptions of their contents. Searching a catalog involves “moving down the steps,” that is, moving from more general categories to more specific ones. One advantage of thematic directories is that the explanations for the links are written by the creators of the directory and accurately reflect the content of each resource, so you can judge more precisely how well a server's content matches the purpose of your search.

An example of a thematic Russian-language catalog is the resource http://www.ulitka.ru/.

On the main page of this site there is a thematic rubricator, with the help of which the user arrives at a section with links to the products of interest.

In addition, some thematic directories allow you to search by keywords. The user enters the desired keyword into the search bar and receives a list of links with descriptions of the sites that most closely match the query. It is worth noting that this search is performed not on the contents of WWW servers but on their brief descriptions stored in the directory.

In our example, the catalog also has the ability to sort sites by number of visits, alphabetically, and by date of entry.
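The two points above, searching descriptions rather than site content and sorting the results in several ways, can be sketched like this. The entries, field names, and values are hypothetical and invented for illustration; they do not come from any real directory.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Entry:
    title: str
    description: str   # short text written by the directory's editors
    visits: int
    added: date

# Hypothetical directory entries, invented for illustration.
ENTRIES = [
    Entry("Zoo shop", "aquarium fish and accessories", 120, date(2003, 5, 1)),
    Entry("Bard songs", "lyrics of author's songs", 300, date(2002, 1, 15)),
    Entry("Aqua club", "forum for aquarium keepers", 80, date(2004, 3, 9)),
]

def search(entries, keyword):
    """Match the keyword against descriptions only, not against site content."""
    keyword = keyword.lower()
    return [e for e in entries if keyword in e.description.lower()]

found = search(ENTRIES, "aquarium")
by_visits = sorted(found, key=lambda e: e.visits, reverse=True)
by_title = sorted(found, key=lambda e: e.title.lower())
by_date = sorted(found, key=lambda e: e.added)
print([e.title for e in by_visits])
```

A site whose pages mention aquariums but whose description does not would be missed by this kind of search, which is the trade-off of searching a directory instead of a full-text index.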

Other examples of Russian-language directories:
[email protected]
Weblist
Vsego.ru
Among the English-language catalogs we can highlight:
http://www.DMOS.org
http://www.yahoo.com/
http://www.looksmart.com

2.4 Link collections

Link collections are links sorted by topic. They are quite different from each other in content, so in order to find a selection that best suits your interests, you need to go through them yourself in order to form your own opinion.

As an example, let's take the selection of links "Internet Treasures" of JSC "Relcom". The user clicks on any section of interest in the table of contents:

♦ Motorists
♦ Astronomy and astrology
♦ Your home
♦ Your pets
♦ Children are the flowers of life
♦ Leisure
♦ Cities on the Internet
♦ Health and medicine
♦ News agencies and services
♦ Local History Museum, etc.

and is taken to a list of links on that topic, for example:

♦ Automotive electronics
♦ Antique Automoto Museum
♦ Collegium for the Legal Protection of Car Owners
♦ Sportdrive
The advantage of this type of search tools is their focus; usually the selection includes rare Internet resources selected by a specific webmaster or owner of an Internet page.

2.5 Address databases

Address databases are special search servers that usually use classifications by type of activity, by products and services provided, and by geography. Sometimes they are supplemented with alphabetical search. Database records store information about sites that provide information about an email address, organization, and postal address for a fee.

The largest English-language address database is http://www.lookup.com/.

Once in its subdirectories, the user finds links to sites that offer the information of interest.

We do not know of widely accessible and official databases of addresses in the Russian Federation.

2.6 Searching Gopher archives

Gopher is an interconnected system of servers (Gopher space) distributed over the Internet.

The Gopher space contains a rich literary library, but the materials are not available for viewing remotely: the user can only view a hierarchically organized table of contents and select a file by name. Using a special program (Veronica), such a search can be done automatically, using queries based on keywords.

Until 1995, Gopher was the most dynamic Internet technology: the number of Gopher servers grew faster than the number of servers of any other type on the Internet. Gopher servers were never actively developed in the EUnet/Relcom network, and today almost no one remembers them.

2.7 FTP Search System

An FTP file finder is a special type of Internet search tool that allows you to find files available on “anonymous” FTP servers. The FTP protocol is designed for transferring files over a network, and in this sense it is functionally a kind of analogue of Gopher.

The main search criterion is the file name, which can be specified in different ways (exact match, substring, regular expression, etc.). This type of search cannot compete in capabilities with search engines, since the contents of files are not taken into account during the search and files can be given arbitrary names. However, if you need to find a well-known program or a standard description, the file containing it will most likely have an appropriate name, and you can find it using one of the FTP Search servers:

FileSearch searches for files on FTP servers based on the names of the files and directories themselves. If you are looking for a program or something similar, you will most likely find its description on WWW servers, while FTP servers let you download it to your computer.
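The three file-name matching modes mentioned above (exact match, substring, regular expression) can be illustrated with a minimal Python sketch. The file list and the `find_files` helper are hypothetical and only stand in for the index of an FTP search server; a real service queries its own database.

```python
import re

# Hypothetical index of file paths collected from anonymous FTP servers.
FILES = [
    "/pub/archivers/pkzip204.exe",
    "/pub/docs/rfc959.txt",
    "/pub/docs/rfc1738.txt",
    "/pub/games/readme.txt",
]


def find_files(paths, pattern, mode="substring"):
    """Return the paths whose file name matches `pattern` in the given mode."""
    matches = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # only the file name is compared
        if mode == "exact":
            ok = name == pattern
        elif mode == "substring":
            ok = pattern in name
        elif mode == "regex":
            ok = re.search(pattern, name) is not None
        else:
            raise ValueError(f"unknown mode: {mode}")
        if ok:
            matches.append(path)
    return matches


print(find_files(FILES, "rfc959.txt", mode="exact"))
print(find_files(FILES, "rfc", mode="substring"))
print(find_files(FILES, r"^rfc\d{4}\.txt$", mode="regex"))
```

Note that the contents of the files never take part in the comparison, which is exactly the limitation described above.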

2.8 Search system in Usenet News conferences

USENET NEWS is a teleconferencing system for the Internet community. In the West, this service is usually called news. A close analogue of teleconferences are the so-called “echoes” in the FIDO network.

From the point of view of a newsgroup subscriber, USENET is a bulletin board with sections where you can find articles on everything from politics to gardening. This bulletin board is accessible via computer, similar to email. Without leaving your computer, you can read or post articles in a particular conference, find useful advice or engage in discussions. Naturally, articles take up space on computers, so they are not stored forever but are periodically deleted to make room for new ones. Worldwide, the best service for searching for information in Usenet conferences is the Google Groups server (Google Inc.).

Google Groups is a free online community and discussion group service that offers the Internet's largest archive of Usenet messages (over a billion messages). For more information about the service's terms of use, visit http://groups.google.com/intl/en/googlegroups/tour/index.html

Among Russian-language services, the USENET World System server and Relcom teleconferences stand out. Just as in other search services, the user types a query string, and the server generates a list of conferences containing the keywords. Next, you need to subscribe to the selected conferences in your news program. There is also a similar Russian server, FidoNet Online: Fido conferences on WWW.

2.9 Meta search engines

To quickly search the databases of several search engines at once, it is better to turn to meta-search systems.

Meta search engines are search engines that send your request to a large number of different search engines, then process the results, remove duplicate resource addresses, and present a wider selection of what is available on the Internet.
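The merging step can be sketched in Python as follows. This is only an illustration under simple assumptions: the result lists are hypothetical, and the `normalize` helper treats two addresses as duplicates when they share the same host (case-insensitive) and the same path without a trailing slash, ignoring the query string. A real meta search engine applies its own, more elaborate rules.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparable form: lowercase host, no trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def merge_results(result_lists):
    """Merge several result lists, keeping the first occurrence of each address."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical answers returned by two different search engines.
engine_a = ["http://www.lib.ru/", "http://www.ntv.ru/news"]
engine_b = ["http://WWW.LIB.RU", "http://www.yandex.ru/"]

print(merge_results([engine_a, engine_b]))
```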

The most popular meta search engine in the world is Search.com.

Search.com, the unified search engine of CNET, Inc., combines almost two dozen search engines that are linked to from all over the Internet.

Using this type of search tool, the user can search for information in a variety of search engines; however, the drawback of these systems is their instability.

2.10 People search systems

People search systems are special servers that allow you to search for people on the Internet: the user specifies a person's full name and gets that person's email address and URL. However, it should be noted that people search engines primarily obtain information about email addresses from public sources such as Usenet forums. Among the most famous people search systems are:

Search for email addresses

In special search fields for contact information (First Name, Last Name, City, Phone Number), you can find the information you are interested in.

People search systems are really large servers; their databases contain about 6,000,000 addresses.

3. Conclusion

We have reviewed the main technologies for searching for information on the Internet, outlined the search tools that currently exist on the Internet, and described the structure of search queries for the most popular Russian-language and English-language search engines. Summing up, we would like to note that there is no single optimal scheme for searching for information on the Internet. Depending on the specifics of the information you need, you can use the appropriate search tools and services, and the quality of the search results depends on how well those services are chosen.

IRS (information retrieval system) is a system that provides search and selection of necessary data in a special database with descriptions of information sources (index) based on information retrieval language and corresponding search rules.

Relevance is the correspondence of search results to the formulated query.

Pertinence (in information retrieval) is the correspondence of the information received to the information needs of the user.

Pertinence is measured by the degree of correspondence between user expectations and search results (compare with relevance). It is defined as the ratio of the amount of information useful to the user to the total amount of information found by the search engine.
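The ratio described above (useful information to all information found) can be computed with a few lines of Python. This is a simplified sketch that counts whole documents; the document identifiers and the `useful_share` function are hypothetical, and deciding which documents are actually useful remains the user's own judgment.

```python
def useful_share(retrieved, useful):
    """Share of the retrieved documents that the user considers useful."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in useful)
    return hits / len(retrieved)


# Hypothetical search session: four documents found, two of them useful.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
useful = {"doc2", "doc4", "doc9"}

print(useful_share(retrieved, useful))
```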

Achieving a high degree of pertinence is the main field of competition for modern search engines. To satisfy users' information needs as fully as possible, the theories and methods of semantic networks, content analysis, and in-depth text analysis (text mining) are used.

To find the necessary information on the Internet, use a resource address (Uniform Resource Locator, URL), which contains the name of the protocol used to access the required information, the server address, and the name of the file on this server (Fig. 2).

Fig. 2. Example of a resource address
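The three parts of a resource address can also be extracted programmatically. The sketch below uses `urlsplit` from Python's standard `urllib.parse` module; the `describe_url` helper and its dictionary keys are illustrative names, not part of any standard.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the protocol, the server address and the file path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "path": parts.path,
    }


print(describe_url("http://www.lib.ru/PXESY/FILATOW"))
```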

A search engine is a software and hardware complex with a web interface that provides the ability to search for information on the Internet. The term usually refers to the website that hosts the system's interface. The software part of a search system is the search engine proper: a set of programs that provides the functionality of the search system and is usually a trade secret of the company that develops it.

Searching for information on the Internet is carried out using special programs that process requests: information retrieval systems (IRS). There are several models on which the work of search engines is based, but historically two models have gained the greatest popularity: search directories and search indexes.

Search directories are organized on the same principle as the subject catalogs of large libraries. They are usually hierarchical hypertext menus with items and sub-items that define the topics of the sites whose addresses are contained in the directory, with the topic narrowing gradually from level to level. Search directories are created manually. Highly qualified editors personally review the WWW information space, select what they consider to be of public interest, and enter it into the directory.

The main problem of search directories is the extremely low coverage rate of WWW resources. In order to significantly increase the coverage rate of Web resources, the human factor must be eliminated from the process of filling the search engine database - the work must be automated.

Automatic cataloging of Web resources and satisfaction of customer requests is performed by search indexes. The work of a search index can be divided into three stages:

collection of the primary database. To scan the WWW information space, special agent programs are used - worms, whose task is to search for unknown resources and register them in the database;

database indexing - primary processing for the purpose of search optimization. At the indexing stage, specialized documents are created - the actual search indexes;

refining the resulting list. At this stage, a list of links is created that will be passed to the user as a result. Refining the resulting list involves filtering and ranking the search results.

Filtering means filtering out links that are inappropriate to provide to the user (for example, checking for duplicates). Ranking consists of creating a special order for presenting the resulting list (by the number of keywords, related words, etc.).
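The indexing and refining stages can be sketched in Python with a toy inverted index. Everything here is an assumption made for illustration: the pages stand in for what an agent program has already collected, filtering keeps only pages that contain every query word, and ranking simply counts keyword occurrences. Real search indexes use far more elaborate schemes.

```python
from collections import defaultdict

# Hypothetical pages already collected during the first stage.
PAGES = {
    "http://example.org/a": "search engines index web pages",
    "http://example.org/b": "web pages about gardening",
    "http://example.org/c": "search search engines compared",
}


def build_index(pages):
    """Indexing stage: map each word to the pages containing it, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Refining stage: filter pages by the query words, then rank them."""
    words = query.lower().split()
    if not words:
        return []
    # Filtering: keep only pages that contain every query word.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    # Ranking: more keyword occurrences first, ties broken by address.
    scored = [(sum(index[w][url] for w in words), url) for url in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


index = build_index(PAGES)
print(search(index, "search engines"))
```

Because candidates are kept in a set, each address appears in the result at most once, which is the simplest form of the duplicate check mentioned above.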

The main task of any information system is to search for information relevant to the user's information needs. It is very important not to lose anything as a result of the search, that is, to find all the documents related to the request and not find anything superfluous. Therefore, a qualitative characteristic of the search procedure is introduced: relevance.

Relevance is the correspondence of search results to the formulated query.

1 Search tools

Web search engines are the best-known and most popular type of search tool among Internet users. Everyone has heard the names of well-known web search engines: Yandex, Rambler, Aport.

The way web search engines work is as follows:

Analysis of web pages and recording of analysis results at one or another level of the search server database.

Searching for information based on user request.

Providing a convenient interface for the user to search for information and view search results.

The working techniques used when working with certain search tools are almost the same. When describing them, the following concepts are used:

The search tool interface is presented in the form of a page with hyperlinks, a query line (search line) and query activation tools.

A search engine index is an information base containing the result of an analysis of web pages, compiled according to certain rules.

A query is a keyword or phrase that a user enters into the search bar. To form various queries, special characters ("", ~), and mathematical symbols (*, +, ?) are used.

In Russia, the largest and most popular search indexes are:

"Yandex" (www.yandex.ru)

Rambler (www.rambler.ru)

Google (www.google.ru)

"Aport2000" (www.aport.ru)

2 Search mechanisms

The generalized search technology consists of the following stages:

The user formulates a request

The system searches for documents (or their search images)

The user receives the result (information about documents)

The user refines or reformulates the request

A new search is organized, and so on

Typically, search engines support two modes: simple search mode and advanced search mode. Let's consider the generalized possibilities.

Forming a request in simple search mode. You can simply enter one or more words separated by spaces; a search for a word with all possible endings is expressed by the symbol * at the end of the word. Many systems allow you to search for an exact phrase; to do this, you need to enclose it in quotation marks. You can also require that certain words be included in or excluded from the results.
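How such a simple query might be interpreted can be shown with a short Python sketch. It supports only the two features named above, a trailing * for any ending and double quotes for an exact phrase, and requires every term to be present. The function names are hypothetical, and actual search engines differ in the details of their syntax.

```python
import re


def parse_query(query):
    """Split a query into terms; text in double quotes stays together."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', query.lower())
    return [phrase or word for phrase, word in pairs]


def term_matches(term, words):
    """Check one term against the list of words of a document."""
    if term.endswith("*"):
        prefix = term[:-1]  # "pag*" matches "page", "pages", ...
        return any(w.startswith(prefix) for w in words)
    wanted = term.split()  # one word, or the consecutive words of a phrase
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


def matches(query, text):
    """Return True if the text satisfies every term of the query."""
    words = re.findall(r"\w+", text.lower())
    return all(term_matches(term, words) for term in parse_query(query))


text = "Search engines index web pages"
print(matches('"search engines" pag*', text))
print(matches('"engines search"', text))
```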

The main problem of searching using a primitively composed query (in the form of listing keywords) is that the search engine will find all pages on which the specified words appear in any part of the document. Typically, the number of pages found will be too large.

To improve the quality of search in simple search mode, it is permissible to use logical operators and operators that allow you to limit the search area, as well as select a specific category of documents from the presented list.

Many search engines include special operators in their query language that allow you to search in certain areas of a document, for example, in its title, or search for a document by a known part of its address.

The advanced (detailed) query mode is implemented differently in different systems, but most often it is a form in which the operators and key elements mentioned above are applied by simply checking the appropriate boxes or selecting parameters from a list.

Below, as an example, is information from the help section of the Yandex search engine: the advanced search window, the query language, and search in found results.

Search in found. If Yandex found a lot of documents as a result of your request, but on a broader topic than you want, you can narrow this list by refining your query. Another option is to enable the "in found" checkbox in the search form and set additional keywords; the next search will then be conducted only on the documents that were selected in the previous search.

Query language quick reference

Example	Meaning
"Come to us for morning pickle"	Words in a row, in the exact form
"The * ambassador has arrived"	A word missing in the quote
half a slice & corn	Words within one sentence
equip && get	Words within one document
capercaillie | partridge | someone	Search for any of the words
you can't << blame	Non-ranking "and": the expression after the operator does not affect the position of the document in the search results
I must /2 execute	Distance within two words in any direction (that is, one word can occur between the given words)
something I ~~ understand	Exclusion of the word "understand" from the search
with my /+2 intelligence	Distance within two words in direct order
tea ~ laptem	Search for a sentence where the word "tea" occurs without the word "laptem" (bast shoe)
cabbage soup /(-1 +2) slurping	Distance from one word in reverse order to two words in forward order
I figure out what! what	Words in exact form with specified case
it turns out && (+ on | !me)	Parentheses form groups in complex queries
Policy	Dictionary form of the word
title:(in country)	Search by document titles
url:ptici.narod.ru/ptici/kuropatka.htm	Search by URL
certainly inurl:vojne	Search based on URL fragment
	Search by host
	Search by host in reverse entry
site:http://www.lib.ru/PXESY/FILATOW	Search across all subdomains and pages of a given site
	Search by one file type
	Search limited by language
	Domain-limited search
	Search with date restrictions
state business && /3 you catch the thread	Distance 3 sentences in any direction

An interesting option is to search for documents on the web that link to a page with a URL you specify. This way, you can find pages on the web that have links to your Web site. Some systems will allow you to limit your search within a specified domain.

Additional special operators include:

Operators for searching documents with a specific graphic file;

Operators limiting the date of the pages being searched;

Proximity operators between words;

Word form accounting operators;

Operators for sorting results (by relevance, newest first, oldest first).

It should be noted that, unfortunately, today there is no standard for the number and syntax of operators supported by various search engines. Efforts are underway to develop a standard for operator syntax, so it is hoped that search engine developers will take care of the user experience. At this stage of development of search tools, a user accessing a particular search engine must first of all become familiar with its rules for composing queries. As a rule, the home page has a Help link that takes you to the reference information.

Different search engines describe different numbers of information sources on the Internet. Therefore, you cannot limit your search to just one search engine.

Let's consider how search engines present search results.

Most often, the number of documents found exceeds several dozen, and in some cases it can reach hundreds of thousands. Therefore, results are presented as a list of 5, 10, or 15 documents per page, with the ability to move to the next portion at the bottom of the page. The title and URL (address) of each found document are always shown; sometimes the system also indicates the degree of relevance of the document as a percentage.
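Splitting a long result list into pages is straightforward to sketch in Python. The `paginate` function and the default of 10 documents per page are assumptions made for illustration only.

```python
def paginate(results, page, per_page=10):
    """Return the results for a 1-based page number and a next-page flag."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    chunk = results[start:start + per_page]
    has_next = start + per_page < len(results)
    return chunk, has_next


# Hypothetical list of 23 found documents.
results = [f"doc{i}" for i in range(1, 24)]

first_page, more = paginate(results, 1)
print(first_page, more)
print(paginate(results, 3))
```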

The description of a document most often contains the first few sentences or excerpts from the text of the document with keywords highlighted. As a rule, the date of update (verification) of the document is indicated, its size in kilobytes; some systems determine the language of the document and its encoding (for Russian-language documents).

What can you do with the results obtained? If the title and description of a document meet your requirements, you can immediately go to its original source using the link. It is more convenient to do this in a new window so that you can continue analyzing the search results. Many search engines allow you to search within the documents found, and you can refine your query by introducing additional terms.

If the intelligence of the system is high, you may be offered a search for similar documents. To use it, you select a document you particularly like and submit it to the system as a model.

However, automating similarity determination is a very non-trivial task, and often this function does not work as expected. Some search engines allow you to re-sort the results. To save you time, you can save your search results as a file on your local drive for later offline study.