Astronomical Surveys and Big Data

Recent all-sky and large-area astronomical surveys and their catalogued data over the whole range of the electromagnetic spectrum are reviewed, from gamma-ray to radio: Fermi-GLAST and INTEGRAL in gamma-ray; ROSAT, XMM and Chandra in X-ray; GALEX in UV; SDSS and several POSS I and II based catalogues (APM, MAPS, USNO, GSC) in the optical range; 2MASS in NIR; WISE and AKARI IRC in MIR; IRAS and AKARI FIS in FIR; NVSS and FIRST in radio; and many others. The most important surveys giving optical images (DSS I and II, SDSS, etc.), proper motions (Tycho, USNO, Gaia), variability (GCVS, NSVS, ASAS, Catalina, Pan-STARRS) and spectroscopic data (FBS, SBS, Case, HQS, HES, SDSS, CALIFA, GAMA) are reviewed as well. An overall understanding of the coverage along the whole wavelength range and comparisons between various surveys are given: galaxy redshift surveys, QSO/AGN, radio, Galactic structure, and Dark Energy surveys. Astronomy has entered the Big Data era. Astrophysical Virtual Observatories and Computational Astrophysics play an important role in the use and analysis of Big Data for new discoveries.


INTRODUCTION
Astronomical surveys are the main source for the discovery of astronomical objects and the accumulation of observational data for further analysis, interpretation, and scientific results. Over the last 15 years, large multiwavelength (MW) surveys by both ground-based and space telescopes, and their catalogues, have accumulated vast amounts of data over the whole range of the electromagnetic spectrum, from γ-ray to radio. Present astronomical databases and archives contain billions of objects, both Galactic and extragalactic, and the vast amount of data on them allows new studies and discoveries. The Big Data era has come. Astrophysical Virtual Observatories (VO) use available databases and current observing material as a collection of interoperating data archives and software tools to form a research environment in which complex research programs can be conducted. Most modern databases now give VO access to the stored information. This makes possible not only open access but also fast analysis and management of these data. VO is a prototype of Grid technologies that allows distributed data computation, analysis and imaging. Particularly important are data reduction and analysis systems: spectral analysis, spectral energy distribution (SED) building and fitting, modelling, variability studies, cross-correlations, etc. Numerical or Computational Astrophysics (part of Computer Science, also called Laboratory Astrophysics) has become an inseparable part of astronomy, and most modern research is done by means of it. Very often tens of thousands of sources hide a few very interesting ones that have to be discovered by comparing various physical characteristics. Cross-correlations reveal new objects and new samples. The large amount of data requires new approaches to data reduction, management and analysis. Powerful computer technologies are required, including clusters and grids.
Large-volume astronomical servers have been established to host Big Data. Giving high importance to their maintenance, the International Council of Scientific Unions (ICSU) has recently created the World Data System (WDS) to unify data coming from different science fields, enabling further exchange and new science projects.
In our review paper "Large Astronomical Surveys, Catalogs and Databases" (Mickaelian 2012) we summarized the astronomical data accumulated during recent decades and presented astronomical surveys, catalogues, archives, databases and VOs. During the 3 years since, many new ground-based and space surveys have provided new data and many new releases have appeared. Therefore, here we give an update of this information, with special emphasis on astronomical surveys and their details: MW photometric surveys from gamma-ray to radio, as well as proper motion, variability and spectroscopic surveys, including objective prism low-dispersion surveys and digital ones.

MULTIWAVELENGTH SURVEYS AND CATALOGUES
Multiwavelength studies have significantly changed our views on cosmic bodies and phenomena, giving an overall understanding and the possibility to combine and/or compare data coming from various wavelength ranges. MW astronomy appeared during the last few decades, and recent MW surveys (including those obtained with space telescopes) have led to catalogues containing billions of objects along the whole electromagnetic spectrum. When combining MW data, one can learn much more thanks to the variety of information related to the same object or area, such as the Milky Way (Fig. 1).
Here we list the most important recent surveys (those having homogeneous data for a large number of sources over a large area) and the resulting catalogues providing photometric data along the whole wavelength range:
• γ-ray. Fermi-GLAST 3FGL catalogue: gamma-ray positions and 10 MeV-100 GeV photon counts for 3,033 sources (Acero et al. 2015); INTEGRAL: IBIS/ISGRI soft gamma-ray survey catalog of 1,126 sources (Bird et al. 2010); older CGRO EGRET: gamma-ray positions and 20 keV-30 GeV photon counts for 1300 sources, including only 271 identified ones (Hartman et al. 1999); Swift (survey in deep fields, 9387 sources), BeppoSAX (1082 gamma-ray bursts) and some other data are much more accurate, however no all-sky or large-area catalogue is available from these missions/telescopes; typically gamma-ray sources are difficult to identify due to inaccurate positions (errors of several arcmin);
• X-ray. Röntgensatellit (ROSAT) BSC: X-ray positions, 0.07-2.4 keV photon counts and two hardness ratios for 18,806 sources (Voges et al. 1999); ROSAT FSC: X-ray positions, 0.07-2.4 keV photon counts and two hardness ratios for 105,924 sources (Voges et al. 2000); ROSAT sources are difficult to identify due to inaccurate positions (∼1′ errors); INTEGRAL: hard X-ray all-sky survey catalog of 403 sources; older EXOSAT Medium Energy (1-8 keV) Slew Survey Catalog (EXMS): 1210 sources, including 992 identified ones; ASCA: 1190 sources, and other data are available; the much more accurate recent X-ray Multi-Mirror mission (XMM-Newton): 372,728 sources from various surveys, and Chandra: 380,000 sources from various surveys (no all-sky or large-area coverage);
In Table 1 we give a comparative list of multiwavelength all-sky and large-area surveys.

OPTICAL IMAGES, PROPER MOTIONS, AND VARIABILITY
Besides the main MW catalogues giving photometric data, there have been a number of astronomical surveys aimed at covering large areas and obtaining optical images (atlases) or measuring proper motions or variability. The most important among these surveys and catalogues are:
• Optical Images. Digitized Sky Survey (DSS) I: all-sky digitized images in two bands, blue and red, from POSS-I (Palomar Observatory Sky Survey) / SERC-J, 1.67″ sampling (McGlynn et al. 1994); DSS II: all-sky digitized images in three bands, blue, red, and IR, from POSS-II/AAO-SES, 1″ sampling (Lasker et al. 1996); as mentioned above, most objects are catalogued in USNO B1.0 and GSC 2.3.2; SDSS DR12: digital images for 14,555 deg² in five bands, u, g, r, i, z, 0.1″ resolution (Alam et al. 2015), this is the most accurate large survey;   : having as its major goal the discovery and characterization of Earth-approaching objects, both asteroids and comets, that might pose a danger to our planet, Pan-STARRS discovers many variable stars, as it performs repeated observations.
Summarizing, the entire sky has been covered many times with optical imaging, providing data for proper motion and variability measurements. More than 200,000 variable objects have been discovered.

SPECTROSCOPIC SURVEYS AND CATALOGUES
Spectroscopic surveys provide important data that can be used for detailed studies of objects. SDSS gives both photometric and spectral data over a large area. The most important among these surveys and catalogues are:  (Mickaelian et al. 2007; Hagen et al. 1999; Wisotzki et al. 2000, respectively); thus, objective prism images are available for most of the extragalactic sky, 17,000 deg² in the North and 9,000 deg² in the South;
• Medium Dispersion Spectroscopy. 2-degree Field and 6-degree Field (2dF/6dF): medium dispersion 3700-7900 Å spectra for 346,061 galaxies and 49,425 stellar objects, including 23,660 QSOs (galaxy redshift and QSO surveys) (Colless et al. 2001; Croom et al. 2004);
• Digital Spectroscopy. SDSS: 3800-9200 Å, R=1800-2200 spectra for 4,355,200 selected objects (mainly galaxy redshift and QSO surveys), including 2,401,952 galaxies, 477,161 QSOs, and 851,968 stars (Alam et al. 2015); Calar Alto Legacy Integral Field Area (CALIFA) Survey: maps 600 galaxies with imaging spectroscopy (IFS) and produces more than 1 million spectra; Galaxy and Mass Assembly (GAMA) spectroscopic survey: covers 300,000 galaxies and also provides millions of spectra.
At present some 5 million objects (compared to 300,000 twenty years ago) have spectroscopy, giving an understanding of their nature and the possibility of detailed investigation. The number of QSOs doubles every 5 years (Véron-Cetty & Véron 2010; Pâris et al. 2014). Thousands of blazars have been identified (Massaro et al. 2015).
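As a rough consistency check on these figures, the growth from 300,000 to some 5 million objects with spectroscopy over 20 years can be converted into an implied doubling time. The sketch below assumes steady exponential growth over the whole interval, which is our simplifying assumption and not a statement from the cited works; the function name is illustrative.

```python
import math

def doubling_time(n_start, n_end, span_years):
    """Doubling time in years, assuming exponential growth from n_start to n_end."""
    return span_years * math.log(2) / math.log(n_end / n_start)

# Figures quoted in the text: ~300,000 objects 20 years ago, ~5 million now.
t_double = doubling_time(300_000, 5_000_000, 20)
print(f"implied doubling time: {t_double:.1f} yr")
```

Under that assumption the implied doubling time comes out close to 5 years, comparable to the doubling time quoted for QSOs.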
A comparative table (Table 2) summarizes the main parameters of the low-dispersion objective prism surveys and SDSS.

BYURAKAN SURVEYS AND THEIR DIGITIZED VERSIONS
The First Byurakan Survey (FBS) was carried out by Markarian, Lipovetski and Stepanian in 1965-1980 with the BAO 102/132/213 cm (40″/52″/84″) Schmidt telescope with a 1.5° prism (Markarian et al. 1989). In total, 2050 Kodak IIAF, IIaF, IIF, and 103aF photographic plates were taken in 1133 fields (4°×4° each, the plate size being 16 cm × 16 cm). FBS covers 17,000 deg² of the whole Northern sky and part of the Southern sky (δ > -15°) at high galactic latitudes (|b| > 15°). In some regions, it even goes down to δ = -19° and |b| = 10°. The limiting magnitude varies between plates in the range 16.5-19.5 mag in V; for the majority it is 17.5-18 mag. The scale is 96.8″/mm and the dispersion is 1800 Å/mm near Hγ and 2500 Å/mm near Hβ (the mean spectral resolution being about 50 Å). The low-dispersion spectra cover the range 3400-6900 Å, and there is a sensitivity gap near 5300 Å, dividing the spectra into red and blue parts. It is possible to compare the red and blue parts of the spectrum (easily separating red and blue objects), follow the spectral energy distribution (SED), and notice some emission and absorption lines (such as broad Balmer lines, molecular bands, He, N1+N2 lines, broad emission lines of QSOs and Seyferts, etc.), thus gaining some understanding of the nature of the objects. The FBS is made up of zones (strips), each covering 4° in declination and all right ascensions except the Galactic plane regions. In all there are 28 zones, which are named by their central declination (e.g. zone +27° covers +25° < δ < +29°, zone +63° covers +61° < δ < +65°, etc.). The zones and the neighboring plates in right ascension overlap by about 0.1°, as the exact size of a plate is 4.1°×4.1°, thus making the whole area complete. Each FBS plate contains low-dispersion spectra of some 15,000-20,000 objects, and there are some 20,000,000 objects in the whole survey.
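The zone naming can be illustrated with a short sketch that maps a declination to the central declination of the 4° strip containing it. It extrapolates the pattern from the two examples given above (zone edges at +25°, +29°, +61°, +65°), which is an assumption; it ignores the 0.1° overlap and does not check whether a given zone was actually observed. The function and constant names are hypothetical.

```python
import math

ZONE_WIDTH_DEG = 4.0   # each zone spans 4 degrees in declination (from the text)
ZONE_EDGE_DEG = 1.0    # assumed: zone edges fall at dec = 1 + 4k, e.g. +25, +29, +61, +65

def fbs_zone_centre(dec_deg):
    """Central declination (degrees) of the 4-degree strip containing dec_deg."""
    k = math.floor((dec_deg - ZONE_EDGE_DEG) / ZONE_WIDTH_DEG)
    return ZONE_EDGE_DEG + ZONE_WIDTH_DEG * k + ZONE_WIDTH_DEG / 2

for dec in (26.5, 61.0, -15.0):
    print(f"dec = {dec:+.1f} deg -> zone {fbs_zone_centre(dec):+.0f}")
```

A declination exactly on a zone edge is assigned to the strip above it here; with the real plates, the 0.1° overlap means such an object may appear in both neighbouring zones.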
In 1988, Mickaelian continued the survey to low galactic latitudes to check whether it is possible to work with the low-dispersion spectra in these crowded regions. Two regions of the Milky Way in the zones +39° and +43° were covered: 28 Kodak IIaF, IIIaF and 103aF plates were obtained in 11 and 8 fields, covering 171 deg² and 117 deg² respectively. These plates are especially useful for the discovery of white dwarfs and other galactic objects.
The main features of FBS are:
• First systematic objective-prism survey in the history of astronomy,
• The largest objective-prism survey of the Northern sky (17,000 deg²),
• New method of search for active galaxies,
• Discovery of 1515 UVX galaxies: some 300 AGN and some 1000 HII galaxies,
• Classification of Seyferts into Sy1 and Sy2 types,
• Definition of Starburst (SB) galaxies,
• FBS Blue Stellar Objects (BSOs) and Late-type Stars,
• Optical identification of IRAS galaxies (BIG and BIS objects); discovery of new AGN and ULIRGs.
The Digitized First Byurakan Survey (DFBS) is the digitized version of the famous Markarian Survey. It is the largest spectroscopic database in the world, providing low-dispersion spectra for 20,000,000 objects. DFBS is a joint project of the Byurakan Astrophysical Observatory (BAO), Cornell University (USA) and Università di Roma "La Sapienza" (Italy). The whole Northern sky and part of the Southern sky at high galactic latitudes were observed in FBS, altogether more than 17,000 deg². It is especially valuable for extragalactic research. On the basis of FBS, 1500 UV-excess galaxies (Markarian galaxies), 1100 blue stellar objects and 1050 late-type stars have been discovered, and 1600 infrared (IRAS) sources have been optically identified. The DFBS was created in 2002-2005 by digitizing and reducing some 2000 FBS plates. A high-accuracy (1″ rms) astrometric solution has been made for each plate. Dedicated software allows quick access to any field by a given position and extraction of the needed spectra, their calibration, classification and study. The DFBS is free for the astronomical community. It occupies 360 GB of space (85 DVDs). The DFBS catalogue and database are maintained in Armenia and will also be available at CDS, Cornell and Rome. DFBS is the largest Armenian astronomical database and one of the largest in the world. Some projects based on the DFBS have already been put forward, including the search for new QSOs and other AGN, continuation of the FBS Second Part, identification of IR and X-ray sources, etc. The DFBS is the basis for the Armenian Virtual Observatory (ArVO), which will unify all available astronomical data in Armenia, including the whole Byurakan archive and data from Byurakan telescopes. ArVO is part of the International Virtual Observatory Alliance (IVOA). SBS has also been partially digitized and DSBS is being created.
Figure 2 shows the DFBS extraction and analysis software bSpec, and Figure 3 the DFBS web interface in "Get Spectra" mode, where one can select spectra from the list and retrieve the corresponding data individually or in batch mode, including 1D FITS spectra.
Fig. 2. DFBS extraction and analysis software bSpec. It finds objects from input catalogue data and performs mass extraction and storage of all data with the necessary measurements.
Fig. 3. DFBS web interface in mode "Get Spectra". Spectra may be explored and/or extracted in single or batch modes (by the given list).

ASTRONOMICAL SURVEYS AND BIG DATA
Astronomical surveys give so much information that huge catalogues, dedicated archives and databases are being built to store, maintain and use these Big Data. At present astronomers deal with the following numbers in various wavelength ranges (Table 3), and these numbers increase exponentially. It is estimated that there are some 400 billion stars in the Milky Way galaxy and some 100 billion galaxies in the Universe, so we are very far from cataloguing all these objects. Even after the Gaia space mission we will have much more accurate astrometric and photometric data for the stars, but not much greater completeness of detections. Figure 4 shows the distribution of the numbers of discovered astronomical objects by wavelength range; optical studies, together with NIR/MIR (due to 2MASS and WISE), are clearly at the forefront of astronomy, so that even the logarithmic scale does not show the small bars corresponding to wavelength ranges other than UV, optical, and NIR/MIR. However, MW astronomy was born in recent decades and is making huge steps toward an overall understanding of the Universe with its various manifestations from γ-ray to radio, and in the near future most objects (e.g. in our Galaxy, or all galaxies in the Local Universe) will have their counterparts at all wavelengths.
However, establishing correspondence between sources revealed at different wavelengths is a tricky task. Accurate cross-correlations between various MW catalogues are needed to establish genuine counterparts for each object/source. Quick cross-matching is being done for almost all catalogues; however, many objects/sources turn out to have false associations, as in crowded regions there is large contamination by neighboring objects. Very often such associations have to be examined individually. Still, a number of cross-correlation software packages are in use and are being improved.
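A minimal sketch of such a positional cross-match is given below. For each source it counts the candidates within a search radius and keeps the nearest one, so that associations with more than one candidate can be flagged for individual inspection. The positions are toy values, not real catalogue entries, and the 15″ radius and all names are illustrative assumptions; the brute-force loop is only suitable for small lists, whereas real catalogues need spatial indexing.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(sources, candidates, radius_arcsec):
    """Per source: (index of nearest candidate in radius or None, number of candidates in radius)."""
    radius_deg = radius_arcsec / 3600.0
    results = []
    for s_ra, s_dec in sources:
        in_radius = []
        for i, (c_ra, c_dec) in enumerate(candidates):
            sep = angular_separation_deg(s_ra, s_dec, c_ra, c_dec)
            if sep <= radius_deg:
                in_radius.append((sep, i))
        best = min(in_radius)[1] if in_radius else None
        results.append((best, len(in_radius)))
    return results

# Hypothetical toy positions (RA, Dec in degrees), for illustration only.
xray_sources = [(150.0000, 2.0000), (210.0000, -5.0000), (30.0000, 45.0000)]
optical_objects = [
    (150.0010, 2.0005),   # about 4 arcsec from source 0
    (150.0030, 1.9990),   # about 11 arcsec from source 0
    (210.0005, -5.0002),  # about 2 arcsec from source 1
]

matches = cross_match(xray_sources, optical_objects, radius_arcsec=15.0)
for idx, (best, n) in enumerate(matches):
    status = "no match" if n == 0 else "unique" if n == 1 else "ambiguous"
    print(idx, best, n, status)
```

In this toy case the first source has two candidates inside the radius and is reported as ambiguous, which is the situation that arises in crowded regions or when positional errors are large.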
We give in Figure 5 a comparison of all-sky and large-area astronomical surveys by their wavelength and sky coverage. It shows that astronomers have covered the whole sky in almost all wavelength ranges. However, Figure 6, which compares all-sky/large-area and deep surveys by their sensitivity (limiting magnitude) and number of catalogued objects, shows that not all wavelengths have been surveyed to sufficient depth; the numbers are rather different (only a few non-optical surveys are seen here: 2MASS, DENIS, WISE, GALEX, and IRAS, the others having much smaller numbers).
Fig. 4. Distribution of the numbers of discovered astronomical objects by wavelength range. Numbers in optical, NIR/MIR and UV are so big that even the logarithmic scale does not show the small bars corresponding to γ-ray, X-ray, FIR, sub-mm/mm and radio.
Large astronomical surveys have become one of the most important directions of investigations in our science and they provide the main bulk of information that has been transformed into Big Data and approached astronomy and computer