Social Media Segregated by Age

Understanding gender segregation through Call Data Records: An Estonian case study

  • Rahul Goel,
  • Rajesh Sharma,
  • Anto Aasa

PLOS

  • Published: March 25, 2021
  • https://doi.org/10.1371/journal.pone.0248212

Abstract

Understanding segregation plays a significant role in determining the development pathways of a country, as it can help governmental and other concerned agencies to prepare better-targeted policies for the needed groups. However, inferring segregation through alternative data, independently of governmental surveys, remains limited due to the non-availability of representative datasets. In this work, we utilize Call Data Records (CDR) provided by one of Estonia's major telecom operators to research the complexities of social interaction and human behavior in order to explain gender segregation. We analyze the CDR with two objectives. First, we study gender segregation by exploring the social network interactions of the CDR. We find that the males are tightly linked, which allows information to spread faster among males compared to females. Second, we perform the micro-analysis using various users' characteristics such as age, language, and location. Our findings show that the prime working-age population (i.e., (24,54] years) is more segregated than others. We also find that the Estonian-speaking population (both males and females) is more likely to interact with other Estonian-speaking individuals of the same gender. Further, to ensure the quality of this dataset, we compare the CDR data features with publicly available Estonian census datasets. We find that the CDR dataset is indeed a good representative of the Estonian population, which indicates that the findings of this study reasonably reflect the reality of gender segregation in the Estonian landscape.

1 Introduction

Segregation has long been assumed to play a critical part in many developing countries' socio-economic structure and overall stability [1]. According to [2], the consequences of segregation are not limited to developing countries, but the detrimental impact of segregation is more severe in countries with poor political and legal structures. As a result, a great deal of emphasis is required on policies that facilitate integration and interaction in diverse societies.

In the past, the research on segregation has been constrained by a lack of available data. As a result, much of the previous research work relies on conventional government census data [3]. However, census data can capture the precise pattern of the physical settlement but rarely record trends of social interaction, which are necessary to develop a thorough understanding of the essence of social interaction.

In this paper, we utilize the Call Data Records (CDR) to understand segregation in Estonian society. Over the last decade, CDR has been analysed from a number of perspectives, such as social network analysis [4], sociocultural aspects of a city [5], identifying the human mobility patterns [6], understanding calling patterns using call duration [7], impact of various events on calling activities [8] and population distribution [9], to name a few. Also, CDR is often integrated with other data such as traffic data [10], financial data [11], and GIS data [12] for a deeper understanding of human behavior. Previous studies have also used CDR data to explain ethnic discrimination in society using call duration [7, 13], and social group discrimination at the workplace [14]. This work investigates gender segregation within society by analyzing the users' characteristics and their interaction through social network analysis.

In this work, we analyze anonymized CDR data provided by one of the leading mobile operators in Estonia to cover the following research directions:

  • Macro-analysis: In this analysis, we explore the social network interactions to understand gender segregation using CDR. We further investigate and compare various properties of females-only and males-only networks separately. Additionally, to identify the relevant users in the network, we use the PageRank centrality algorithm. The network is also explored to identify gender representation in various counties in Estonia (Section 4.1).
  • Micro-analysis: During this analysis, we investigate users' characteristics such as age, language, and location to understand gender segregation in detail. We also analyze the users' interactions based on gender, age-groups, language, and locations (Section 4.2).

The findings of our social network analysis suggest that the males-only network is relatively dense and firmly connected, while the females-only network is spread out. Also, our analysis using users' characteristics, such as gender, age, language, and location, shows that the prime working-age population (i.e., (24,54] years) is more segregated than other age groups. We also find that the Estonian-speaking population (both males and females) tends to interact more with other Estonian-speaking individuals, specifically those of the same gender.

We also demonstrate that CDR data can be used to deduce relationships between the users' characteristics that are quite similar to actual relationships in society, by comparing the CDR data with publicly accessible census datasets. Thus, CDR data can be used to understand segregation and can help government agencies to make targeted policies for needed groups.

The rest of the paper is organized as follows. Next, we discuss related works. We then describe the dataset in Section 3. Section 4 presents the results of our descriptive analysis of the dataset, and we conclude with a discussion of future directions in Section 5.

2 Related work

In this section, we discuss works related to Call Data Records (CDR) and segregation, at the intersection of which this work lies.

Over the last decade, CDR has attracted a lot of research. In [5] and [15], the authors showed that CDR data can provide valuable information regarding the social structure of societies when analyzed using social network analysis. The strong and weak ties between individuals in social networks are identified by the authors in [16]. In another work, population density is calculated using CDR [9]. A fair amount of research has also been done using CDR data for identifying mobility patterns. For example, the authors of [6, 17] demonstrate that the human path is predictable and reproducible. In [18], the authors proposed a human mobility model and validated it using a real dataset from the New York and Los Angeles metropolitan areas. In [19], the authors analyzed human behaviour to find the movement pattern across various age-groups.

A set of works has also focused on understanding the varying types of segregation in society using CDR. The four types of factors that appear to contribute to segregation are discrimination, disadvantage, preferences, and social networks [20, 21]. For instance, the authors in [13] studied the temporal variation of ethnic segregation in the city of Tallinn, the capital of Estonia. Their findings revealed that segregation is significantly lower on workdays and during the summer holidays. In a different work, segregation is decomposed into two types: i) social segregation, observed in interactions among people, and ii) spatial segregation, determined by the physical locations of people [22]. Furthermore, a framework is proposed to model and measure fine-grained patterns of segregation from large-scale digital data.

In another line of work, the authors studied immigrants' segregation in Estonian society [23] using census data and passive mobile positioning data (CDR). Their results showed that the activity space of Russian-speakers of all age-groups is smaller and less diverse than that of Estonians, and also revealed that there is higher ethnic segregation in younger age-groups. Discrimination and prejudice by the dominant group restrict the activities of the members of minority groups [24]. Even though discrimination is illegal in most countries, and the societal tolerance of minorities has increased, discrimination is still present in everyday life [25]. In another work [26], the authors showed that segregation, isolation, and homophily can be measured by deriving population estimates from CDR. Their findings revealed that the development of refugees' communication patterns and mobility traces can provide insights into their social integration.

Studies of segregation in workplaces have shown a concentration of minority groups in certain employment niches and workplaces [14]. It has been suggested that segregation in places of residence and segregation in places of work are connected. However, workplace-based segregation is lower than residence-based segregation [27, 28]. Outside of the place of residence and the place of work, ethnic differences have also been studied mainly through single measures of leisure activities, such as going to church [29], casinos [30], or national parks [31, 32].

This work is different from [7, 13], as the focus of those works was mostly on call duration to understand human behavior. In this work, by contrast, we analyze CDR data to understand gender segregation using social network analysis and characteristic analysis based on gender, language, age, and location.

3 Dataset

The analysis utilizes the anonymized call data records (CDR) provided by a leading mobile operator in Estonia. The dataset includes timestamp information to the level of seconds for each call activity, and the passive mobile positioning of the cell phone tower. The call records span six days, that is, from May 8, 2017, to May 13, 2017. The data consists of 12,317,970 unique call records from 1,175,191 unique users, which is 89.32% of the population of Estonia [33].

For each call activity, the following data is available: randomly generated user pseudonymous ID, timestamp (with an accuracy of one second), and location of the network cell. The pseudonymous ID guarantees the user's privacy, as it cannot be connected to a specific individual or telephone number. Additionally, for research purposes, the gender of the user, year of birth, and preferred language of communication are provided. The options for the preferred language of communication are Estonian, Russian, or English, as chosen by the user when signing the contract with the service provider. Please note that not all users have additional (gender, language and location) information in the dataset. For example, gender information is available for 130,988 users, of which 61,933 are males and 69,055 are females. Table 1 summarises various statistics about this dataset.

Table 1. Statistics about the dataset.

Users' gender counts are listed individually under various features to provide a comprehensive insight into the dataset. For example, under the feature Languages, users are categorized into three languages, that is, Estonian, Russian, and English. The number of males and females under each category is also listed.

https://doi.org/10.1371/journal.pone.0248212.t001

Based on the official age-group categorization proposed by Europe-Bureau and Statistics Estonia [34, 35], we categorise users' age into the following 5 categories:

  1. 0-14 years: Children.
  2. 15-24 years: Early working age.
  3. 25-54 years: Prime working age.
  4. 55-64 years: Mature working age.
  5. 65+: Elderly.

In Table 1, the Age-Groups row provides the distribution of the users in the dataset according to gender and age-group. E.g., for the age-group (24,54], #Male/Female with value 30,238/33,864 means that for age-group (24,54], 30,238 users are males and 33,864 users are females.
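The age categorization above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the `reference_year` default of 2017 (the year of the call records), and the derivation of age from year of birth are assumptions made for the example.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four age groups; 65+ has no upper bound.
UPPER_BOUNDS = [14, 24, 54, 64]
LABELS = [
    "Children",
    "Early working age",
    "Prime working age",
    "Mature working age",
    "Elderly",
]


def age_group(age: int) -> str:
    """Map an age in whole years to one of the five categories."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first bound >= age, so an age
    # equal to a bound (e.g. 54) stays in the lower group, matching (24,54].
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


def age_from_birth_year(birth_year: int, reference_year: int = 2017) -> int:
    """Approximate age from year of birth (assumed reference year: 2017)."""
    return reference_year - birth_year
```

Treating each upper bound as inclusive keeps the code consistent with the half-open interval notation such as (24,54] used in the text.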

The age distribution based on call density for overall users (both females and males), females-only users and males-only users is shown in Fig 1. Please note that we use the word "user(s)" in the case of CDR data individuals, but "population" while referring to the actual population of Estonia. The median ages for overall, females-only and males-only users are 52, 52, and 51 years respectively. It is to be noted that in 2017, the median age of the population in Estonia was reported as 41.6 years (https://www.statista.com). Since the use of mobile phones is prevalent after a certain age, the difference between the median age of the actual population and that of the CDR users is apparent. Please note that, because there are only 9 users in total in the age-group (0,14], we have excluded this age-group from further analysis.

Fig 1. Call density with probabilities based on age.

The x-axis represents the user's age and the y-axis represents the call density for Overall, Female and Male users in the CDR data. Using the cumulative density function for the distribution, we map the tail probability directly into colors. For example, the 25th, 50th and 75th quantiles for overall users are 44, 52 and 61 respectively. Similarly, these quantiles for female users are 44, 52, 62; and for male users are 44, 51, 60.

https://doi.org/10.1371/journal.pone.0248212.g001

4 Descriptive analysis

In this section, we first present the findings of our macro-analysis to understand gender segregation, performed by exploring the social network interactions of the CDR (Section 4.1). Next, to explore the segregation in detail, we discuss the results of our micro-analysis considering various users' characteristics such as age, language, location, etc. (Section 4.2).

4.1 Macro-analysis

We create a directed network that represents the call connections among users, where an edge (u, v) is formed if a user u has called user v. Fig 2(a) shows the CDR network, where each node is color-coded based on gender. The red nodes represent male users, and the green nodes represent female users. Links between users are also color-coded. Links that originate from males are colored red (i.e., calls from male to male, and from male to female), and similarly links that originate from females are colored green (i.e., calls from female to female, and from female to male).
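The network construction described above can be sketched as follows. This is only an illustrative sketch using the standard library: the record layout (a pair of pseudonymous caller and callee IDs), the gender codes "M" and "F", the grey color for users without gender information, and the dropping of self-calls are all assumptions of the example, not details given in the text.

```python
from collections import defaultdict

# Edge color follows the gender of the caller: red for males, green for females.
COLOR_BY_GENDER = {"M": "red", "F": "green"}


def build_call_network(records):
    """Build a directed network {caller: set(callees)} from (caller, callee) pairs."""
    adjacency = defaultdict(set)
    for caller, callee in records:
        if caller == callee:
            continue  # assumption: self-calls carry no interaction information
        adjacency[caller].add(callee)
        # Keep callees that never place a call as nodes with no outgoing edges.
        adjacency.setdefault(callee, set())
    return dict(adjacency)


def color_edges(adjacency, gender_of):
    """Assign each edge (u, v) a color based on the gender of the caller u."""
    colors = {}
    for caller, callees in adjacency.items():
        # Assumption: callers with unknown gender are drawn in grey.
        color = COLOR_BY_GENDER.get(gender_of.get(caller), "grey")
        for callee in callees:
            colors[(caller, callee)] = color
    return colors
```

Repeated calls between the same pair collapse into a single edge here; a weighted variant would store call counts instead of a set.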

Fig 2. Users network formed using CDR data.

A representative sample of the original network obtained using snowball sampling. In Fig (a), users are color-coded based on gender. Color coding is as follows: red nodes are male users and green nodes are female users. The nodes with a higher PageRank value are shown in a relatively bigger size than others. Fig (b) shows the males-only network (with modularity = 0.803). Each colour-coded group represents a community. Similarly, Fig (c) shows the females-only network (with modularity = 0.913). Here also, each colour-coded group represents a community.

https://doi.org/10.1371/journal.pone.0248212.g002

Furthermore, we use the well-known PageRank algorithm [36] to identify the centrality of nodes in the network. PageRank reflects the importance of a node in terms of its influence in the network. For example, an individual with a higher PageRank could have a bigger social influence in propagating a piece of information in the network. In Fig 2(a), the size of the node reflects the PageRank of the node.
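For readers unfamiliar with PageRank, the following is a minimal power-iteration sketch in plain Python. It is not the implementation used in this study: the damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank from users who place no calls are conventional choices assumed here. The input is assumed to be a dictionary mapping every user (including those who only receive calls) to the set of users they called.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Return {node: score} for a directed graph given as {node: set(successors)}."""
    nodes = list(adjacency)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for u in nodes:
            successors = adjacency[u]
            if successors:
                share = damping * rank[u] / len(successors)
                for v in successors:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_k(rank, k=100):
    """Return the k nodes with the highest scores, highest first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

In this sketch a user who is called by many well-connected users receives a high score, which is the sense in which node size is interpreted in Fig 2(a).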

Table 2 provides the statistics of the network. The low values of the average clustering coefficient and edge density indicate that the network is sparse. The values of these metrics further indicate that the network is spread out and that information would possibly take longer to propagate throughout the network. From the sizes of the strongly and weakly connected components and the number of components, we can conclude that there are a large number of small communities. The value of reciprocity indicates that only 24.6% of individuals have mutual interests with each other.
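Two of the metrics mentioned above, edge density and reciprocity, are simple enough to sketch directly. The definitions below are the common ones for directed graphs (density as the fraction of possible ordered pairs that are edges, reciprocity as the fraction of edges whose reverse edge also exists); we assume, without confirmation from the text, that Table 2 follows the same definitions. The input format, a dictionary from each user to the set of users they called, is also an assumption of the example.

```python
def edge_density(adjacency):
    """Fraction of the n * (n - 1) possible directed edges that are present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(callees) for callees in adjacency.values())
    return m / (n * (n - 1))


def reciprocity(adjacency):
    """Fraction of directed edges (u, v) for which (v, u) is also an edge."""
    edges = {(u, v) for u, callees in adjacency.items() for v in callees}
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)
```

Under this definition, a reciprocity of 0.246 would mean that roughly one in four call links is returned by the callee.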

Based on colour-coding, we can easily notice clusters of males and females in the CDR network (Fig 2(a)). To study the segregation further, we examine the males-only (see Fig 2(b)) and the females-only (see Fig 2(c)) networks separately. To create the males-only network, we drop all the caller and callee IDs that belong to females. Similarly, we drop all the caller and callee IDs that belong to males to create the females-only network. For better comparison of these networks, we report the properties of each of them in Table 3. Although this creation of males-only and females-only networks is synthetic, it can still provide some significant information about segregation in these networks.
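The filtering step described above amounts to taking the subgraph induced by the users of one gender. A minimal sketch follows; the function name, the gender codes "M" and "F", and the choice to also drop users whose gender is unknown are assumptions made for illustration.

```python
def single_gender_network(adjacency, gender_of, gender):
    """Return the subgraph induced by users of the given gender.

    adjacency: {caller: set(callees)}; gender_of: {user: "M" or "F"}.
    Users without gender information are dropped as well (an assumption).
    """
    keep = {u for u in adjacency if gender_of.get(u) == gender}
    # Keep only edges whose caller and callee are both in the retained set.
    return {u: {v for v in adjacency[u] if v in keep} for u in keep}
```

For example, `single_gender_network(net, gender_of, "M")` would give a males-only network, on which the same metrics as for the full network can be computed.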

In Fig 2(b) and 2(c), users are grouped into communities based on modularity values (0.803 for the males-only and 0.913 for the females-only network). The higher value of modularity indicates that the females-only network has more clusters and that these clusters are densely connected within themselves, as also supported by the higher average clustering coefficient of the females-only network. This suggests that females bond in smaller groups, but these groups are tightly connected compared to their male counterparts. The higher value of edge density for the males-only network suggests that males, in general, have more connections compared to females. In addition, the smaller diameter and average path length values of the males-only network indicate that the network is compact compared to the females-only network, which is more spread out. These males-only and females-only network metrics point out that the transmission of information is faster in the males-only network than in the females-only network.

With the aim of understanding the gender predominance in different counties of Estonia, we further looked at the top-100 influential users in the CDR network by using the PageRank centrality. In Estonia, there are 15 counties, with Harju county, which includes the capital Tallinn, being the most populous and Hiiu county being the least populous. The gender distribution of the top identified nodes is shown in Table 4 (column 2). The actual gender population distribution among counties is shown in Table 4 (column 3). The difference is calculated by subtracting the actual population ratio from the PageRank centrality ratio. We can conclude from the difference (see Table 4, Column 4) that there are more female influencers among the top-100 nodes in six counties (Lääne-Viru, Ida-Viru, Rapla, Valga, Hiiu, and Saare), taking into account the population of these counties. Similarly, there are more male influencers in a total of eleven counties such as Viljandi, Lääne, Võru, etc. (see also Fig 3) [37].
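The difference described above can be illustrated with a short sketch. The text does not spell out how the "ratio" is defined, so the code below assumes it is the male share of a group (males divided by males plus females); this choice is consistent with the stated reading that a negative difference means higher female representation, but it remains an assumption. The function names and the sample counts are hypothetical.

```python
def male_share(males, females):
    """Share of males in a group (assumed definition of the 'ratio')."""
    total = males + females
    if total == 0:
        raise ValueError("group is empty")
    return males / total


def representation_difference(top_males, top_females, pop_males, pop_females):
    """PageRank male share minus population male share for one county.

    Negative: females are over-represented among the top-ranked users.
    Positive: males are over-represented among the top-ranked users.
    """
    return male_share(top_males, top_females) - male_share(pop_males, pop_females)


# Hypothetical county: 2 males and 8 females among its top-ranked users,
# while the county population is 45% male and 55% female.
example = representation_difference(2, 8, 45, 55)
```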

Fig 3. The difference between PageRank gender representation ratio and actual population gender distribution ratio among various counties.

A difference less than zero (in green) indicates that females are a major source of information in that region and that their number is higher compared to their population. Similarly, a difference greater than zero (in red) implies that, in that region, a higher number of males are the primary source of information compared to their population.

https://doi.org/10.1371/journal.pone.0248212.g003

Table 4. Comparison of population gender distribution and PageRank centrality gender distribution (among top 100).

The difference is calculated by subtracting the population ratio from the PageRank ratio. A negative value of the difference indicates that the female representation is higher in that county. Similarly, a positive difference value indicates that male representation is higher in that county.

https://doi.org/10.1371/journal.pone.0248212.t004

4.2 Micro-analysis

This section focuses on understanding gender interaction in the CDR data by exploring various users' characteristics, including gender, age, language, and location. We also calculated gender segregation using the Coleman homophily index (HI) by considering the mentioned characteristics. In Section 4.2.1, we describe the HI in detail. The gender segregation based on the mentioned characteristics is explained in Sections 4.2.2 to 4.2.5. We then extracted the relationships between the mentioned characteristics and compared them with census datasets from Statistics Estonia (https://www.stat.ee/en) in Section 4.2.6.

4.2.1 Homophily index for measuring the segregation.

Homophily is the tendency of individuals to connect and bond with other similar individuals [38]. In the past, homophily has been studied in great detail in various works [39–42]. These studies found that similarity is associated with the connection among individuals and can be categorized based on age [43], gender [44], class [45], ethnicity [46], etc. In this work, we use the Coleman homophily index (HI) [47] to measure gender segregation in Estonia. We use HI as it efficiently compares the homophily of groups with different sizes by normalizing the excess homophily of groups by its maximal value [47].

Calculating HI value: Let us consider a network with static attribute groups A and B (of relative sizes $N_A$ and $N_B$, with $N_A + N_B = 1$) distributed among nodes uniformly at random and independently of the network structure, such that there is a fraction $P_{AB} = P_{BA}$ of edges between groups, and fractions $P_{AA}$, $P_{BB}$ within each group ($P_{AA} + P_{AB} + P_{BB} = 1$). In the case of two attribute groups, the probability that a random edge from a node in group A leads to a node in group A is defined as:

$$T_{AA} = \frac{2P_{AA}}{2P_{AA} + P_{AB}} \tag{1}$$

Similarly, we can write the equation for $T_{BB}$. The HI values for group A ($HI_A$) and group B ($HI_B$) can be calculated using

$$HI_A = \frac{T_{AA} - N_A}{1 - N_A} \tag{2}$$

$$HI_B = \frac{T_{BB} - N_B}{1 - N_B} \tag{3}$$

The range for both $HI_A$ and $HI_B$ is from -1 to 1, where -1 for $HI_A$ means that group A individuals only connect with group B individuals (only between-group connections), whereas 1 for $HI_A$ means that group A individuals only connect with group A individuals (only within-group connections). The same is true for the group B homophily index $HI_B$.
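To make the calculation concrete, the sketch below computes the index for two groups from raw counts. It assumes the commonly used form of the index, in which $T_{AA} = 2P_{AA} / (2P_{AA} + P_{AB})$ and $HI_A = (T_{AA} - N_A) / (1 - N_A)$, and symmetrically for group B; whether this matches the exact expressions used in this study is an assumption. Edges are treated as undirected, so calls in both directions between the groups are pooled into one between-group count. The function and parameter names are hypothetical.

```python
def coleman_hi(edges_aa, edges_ab, edges_bb, size_a, size_b):
    """Return (HI_A, HI_B) from edge counts and group sizes.

    edges_aa, edges_bb: number of edges within group A and within group B.
    edges_ab: number of edges between the groups (direction ignored).
    size_a, size_b: number of nodes in each group.
    """
    total_edges = edges_aa + edges_ab + edges_bb
    total_nodes = size_a + size_b
    if size_a <= 0 or size_b <= 0:
        raise ValueError("both groups must be non-empty")
    if 2 * edges_aa + edges_ab == 0 or 2 * edges_bb + edges_ab == 0:
        raise ValueError("each group needs at least one edge")

    # Edge fractions P_AA, P_AB, P_BB (they sum to 1).
    p_aa = edges_aa / total_edges
    p_ab = edges_ab / total_edges
    p_bb = edges_bb / total_edges
    # Relative group sizes N_A, N_B (they sum to 1).
    n_a = size_a / total_nodes
    n_b = size_b / total_nodes

    # Probability that an edge starting in a group stays in that group.
    t_aa = 2 * p_aa / (2 * p_aa + p_ab)
    t_bb = 2 * p_bb / (2 * p_bb + p_ab)

    # Excess homophily over random mixing, normalized by its maximum.
    hi_a = (t_aa - n_a) / (1 - n_a)
    hi_b = (t_bb - n_b) / (1 - n_b)
    return hi_a, hi_b
```

With two equally sized groups, purely within-group edges give an index of 1, purely between-group edges give -1, and random mixing gives 0, which matches the interpretation of the extremes given above.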

4.2.2 Gender segregation based on age-groups.

In this section, we study gender segregation among users considering four different age-groups, that is, early working age ([14-24] years), prime working age ((24-54] years), mature working age ((54-64] years), and elderly (65+ years). Please note that we use the age-group names and age-group ranges interchangeably in the rest of the section. Fig 4 compares the calls made by various age-groups based on gender. We find that the difference between the median of the females' calls and the median of the males' calls is highest for the age-group [14, 24]. The medians of the calls for females and males of age-group [14, 24] are 28 and 25, respectively. This indicates that females of age-group [14, 24] call more often than males. Next, we study gender segregation among various age-groups using HI values.

Fig 4. Median of the calls for various age-groups based on gender.

The medians of the calls for the female age-groups (14,24], (24,54], (54,64] and (64,100] are 28, 27, 19 and 12 respectively. Similarly, the medians of the calls for the male age-groups are 25, 27, 19 and 11 respectively.

https://doi.org/10.1371/journal.pone.0248212.g004

Based on HI values (see Fig 5), we can infer that males and females in all the age-groups (except (24,54]) tend to call the opposite gender of the same age-group more. In contrast, calls of both females and males of age-group (24,54] are more likely to remain within the same gender and age-group. Also, males of age-group (14,24] are strongly inclined towards females of the same age-group, with an HI value of -0.68. We can also conclude that females of age-group (54,64] and males of age-group (24,54] are well connected with both genders, with low HI values of -0.028 and 0.04, respectively. Males and females of age-group (64,100] exhibit similar connectivity behavior with the opposite gender within the same age-group, with HI values of -0.11 and -0.1, respectively.

Fig 5. Coleman's homophily index (HI) for various age-groups.

HI values for the female age-groups (14,24], (24,54], (54,64] and (64,100] are -0.09, 0.09, -0.03 and -0.1 respectively. Similarly, HI values for the male age-groups are -0.68, 0.04, -0.08 and -0.11 respectively.

https://doi.org/10.1371/journal.pone.0248212.g005

The comparison of female and male calling patterns based on age-groups also highlights that males of early working age, mature working age, and elderly call females of the same age-group more. Still, at the same time, they maintain strong connectivity with males of other age-groups as well. On the other hand, most calls of prime working age females remain within the same age-group of females, making their connectivity with other age-groups relatively weak. Based on this calling behavior between age-groups, we can conclude that the inclination of both females and males towards the same gender comes from the prime working age. To explore further, we next examine language-based gender segregation.

4.2.3 Gender segregation based on language.

As mentioned earlier, our dataset includes three languages spoken by the population of Estonia, namely Estonian, Russian and English. In this section, we begin our analysis by comparing the median of the calls for both females and males based on language (see Fig 6). Our findings highlight that Estonian-speaking females call more compared to Estonian-speaking males. On the other hand, Russian-speaking males call more compared to Russian-speaking females. We also observe that the medians of the calls made by Russian-speaking individuals (both males and females) and English-speaking males are higher than those of the Estonian-speaking population (both males and females). Based on call activity among different languages, we can infer that Russian-speaking individuals call comparatively more than Estonian-speaking individuals.

Fig 6. Median of the calls for various language-speaking populations based on gender.

The medians of the calls for females for the languages English, Estonian and Russian are 26, 24 and 25 respectively. Similarly, the medians of the calls for males for these languages are 21.5, 22 and 27 respectively.

https://doi.org/10.1371/journal.pone.0248212.g006

Next, we measure gender segregation based on language using HI values. The HI for the Russian-speaking population shows that Russian-speaking females are more inclined towards Russian-speaking females (see Fig 7). On the other hand, Russian-speaking males are slightly inclined towards Russian-speaking females. The HI for the Estonian-speaking population shows that both males and females are inclined towards the same gender and language. For the English-speaking population, the HI indicates that both males and females tend to talk more with females. Therefore, based on language, we can conclude that females are inclined towards females of the same language. On the other hand, Russian and English-speaking males are inclined towards females of the same language, but Estonian males are inclined towards Estonian-speaking males. This shows that the Estonian-speaking population and Russian-speaking females are more segregated compared to others. To explore further, we also investigate gender segregation based on counties.

Fig 7. Coleman's homophily index (HI) for various languages.

HI values for males for the languages English, Estonian and Russian are -0.27, 0.14 and -0.04 respectively. Similarly, HI values for females for the same languages are 0.02, 0.1 and 0.14 respectively.

https://doi.org/10.1371/journal.pone.0248212.g007

4.2.4 Gender segregation in Estonian counties.

There are 15 counties in Estonia, with Harju county, which includes the capital Tallinn, being the most populous and Hiiu county being the least populous. Here, we start our analysis by comparing the median of the calls made in various counties based on gender, as shown in Fig 8. We find that, in total, 8 counties (i.e., Hiiu, Ida-Viru, Lääne-Viru, Pärnu, Saare, Tartu, Viljandi, and Võru) have a difference in calls by gender. We also find that even though the distribution of the gender population in the Harju and Ida-Viru counties is the same (see Table 4), Harju county has the smallest gap between the male and female medians of the calls, and Ida-Viru county has the biggest gap. In reality, there is around 81% Russian-speaking and 18% Estonian-speaking population in Ida-Viru. On the other hand, Harju has approximately 40% Russian-speaking and 60% Estonian-speaking population. This indicates that the Russian-speaking females tend to call more compared to the Estonian-speaking population. We investigate this in more detail in the next section.

Fig 8. Call density for counties in Estonia based on gender.

The medians of the calls for males in various counties (starting from bottom (Harju) to top (Võru)) are 7, 2, 4, 2, 2, 2, 5, 2, 2, 2, 3, 3, 2, 2 and 3 respectively. Similarly, the medians of the calls for females in various counties are 7, 4, 7, 2, 2, 2, 3, 3, 2, 2, 5, 4, 2, 3 and 4 respectively.

https://doi.org/10.1371/journal.pone.0248212.g008

Additionally, to understand gender segregation in the Estonian counties, we calculated the HI values for both males and females separately in each county. We find that in all counties, both males and females are inclined towards the same gender (see Fig 9). In some counties (Hiiu, Lääne, Põlva, etc.), males are more inclined towards other males. Similarly, females are more inclined towards other females in Lääne, Hiiu, Jõgeva, etc. We further find that the counties with the largest differences between HI values for males and females are Järva (0.12), Hiiu (0.11), Ida-Viru (0.11), and Põlva (0.07). Also, the counties with the smallest differences between HI values for males and females are Tartu (0.0002), Pärnu (0.002), and Harju (0.01). In the next section, we study two Estonian counties, Harju and Ida-Viru, to further explore gender segregation.

Fig 9. Coleman's homophily index (HI) for various counties.

HI values for males in various counties (starting from bottom (Harju) to top (Võru)) are -0.27, 0.14 and -0.04 respectively. Similarly, HI values for females are 0.02, 0.1 and 0.14 respectively.

https://doi.org/10.1371/journal.pone.0248212.g009

4.2.5 Case study of prime working age individuals in Harju & Ida-Viru.

In this section, we aim to compare gender segregation based on languages, specifically in Harju and Ida-Viru counties. We focus on these counties for the following reasons. First, although the actual population of Harju is six times greater than that of Ida-Viru, the percentage of males and females in Harju and Ida-Viru is the same, that is, 45% males and 55% females. Second, in Harju the majority of the population is Estonian-speaking, and in Ida-Viru the majority of the population is Russian-speaking. In particular, Harju has 40% Russian-speaking and 60% Estonian-speaking population, whereas Ida-Viru has roughly 81% Russian-speaking and 18% Estonian-speaking population.

Additionally, we focus on the prime working age population (i.e., age-group (24,54]) only. This is because the prime working age population is overall more inclined towards the same gender and, at the same time, covers more than 56% of the users in our dataset. Furthermore, we filter our data for the Estonian and Russian-speaking population only, as it covers more than 99% of individuals in our dataset. After applying the age-group, language, and location filters, we still cover more than 13.5% of our dataset.
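The three filters described above can be sketched as follows. This is an illustration only: the `User` record, its field names, and the sample users are hypothetical and do not reflect the actual CDR schema. The one detail taken directly from the text is that the age interval (24,54] excludes 24 and includes 54.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical record layout, not the real CDR schema.
    age: int
    language: str
    county: str
    gender: str


LANGUAGES = {"Estonian", "Russian"}
COUNTIES = {"Harju", "Ida-Viru"}


def in_prime_working_age(age: int) -> bool:
    # (24,54] is open on the left and closed on the right.
    return 24 < age <= 54


def apply_filters(users):
    """Keep users that pass the age-group, language, and location filters."""
    return [
        u for u in users
        if in_prime_working_age(u.age)
        and u.language in LANGUAGES
        and u.county in COUNTIES
    ]


def coverage(users):
    """Share of the input users that survive all three filters."""
    return len(apply_filters(users)) / len(users) if users else 0.0


# Hypothetical sample users.
sample = [
    User(24, "Estonian", "Harju", "F"),    # dropped: age 24 is outside (24,54]
    User(54, "Russian", "Ida-Viru", "M"),  # kept: age 54 is inside (24,54]
    User(30, "English", "Harju", "F"),     # dropped: language filter
    User(40, "Estonian", "Tartu", "M"),    # dropped: location filter
    User(35, "Estonian", "Harju", "F"),    # kept
]
print(len(apply_filters(sample)), coverage(sample))
```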

Table 5 covers the possible cases on the filtered dataset (Row # 1 to 6) based on (i) language and (ii) location. We also report three additional cases (Table 5, Row # 7 to 9) for comparison with the previous cases on the filtered dataset. Based on our analysis, we can conclude that

  1. The prime working age Estonian-speaking population (both females and males) is more inclined towards individuals of the same age-group, language and gender (see Table 5, Row # 3, 4 and 8).
  2. Prime working age Russian-speaking males and females are both more inclined towards prime working age Russian-speaking females (Table 5, Row # 5, 6 and 9).
  3. In Harju, Russian-speaking prime working age females are more segregated than Estonian-speaking prime working age females (Table 5, Row # 1, 3 and 5).
  4. Similarly, in Ida-Viru, Estonian-speaking and Russian-speaking prime working age females are equally segregated (Table 5, Row # 2, 4 and 6). On the other hand, Estonian-speaking prime working age males are more segregated than Russian-speaking prime working age males.

4.2.6 Validating CDR using Estonian census data.

Next, we compare the distribution of gender, age-groups, language, and locations from the CDR data with publicly available Estonian census datasets. The reason for doing this is to understand the representation of different user groups in the CDR dataset relative to the actual population. We could not find a single dataset that covers all the user features present in the CDR dataset (i.e., gender, age, language and location). Instead, we found two separate publicly available datasets that cover three features each. The first dataset (http://pub.stat.ee) includes the gender, age and location of the Estonian population, while gender, language and location are present in the second dataset (http://andmebaas.stat.ee). Both datasets are publicly available on the website of Statistics Estonia. Thus, we decided to use these two datasets to assess the representation of the Estonian population in the CDR dataset. On comparing, we observe that the CDR covers different percentages of the actual age-group and language populations. For example, the CDR data covers 18.1% of the actual female prime working age population (see Table 6, Row 2). Similar comparison findings are also reported based on language (see Table 7). Thus, we argue that the CDR can provide an opportunity to extract meaningful relationships among users' features.

From the CDR data, we find that mobile phones are most commonly used by prime working age users (i.e., (24,54]) and mature working age users (i.e., (54,64]). These two categories cover 56.37% and 22.94% of our dataset, respectively (see Fig 10). The mobile usage percentages for early working age and elderly users are 0.16% and 20.53%, respectively. On the other hand, the actual population percentages for the prime working, mature working, early working, and elderly age-groups are 49.63%, 14.82%, 15%, and 20.55%, respectively. From this, we can infer that the representation of prime working age users in the CDR dataset is significant considering their actual population in Estonia. Thus, we can argue that the findings for prime working age users can be considered accurate with reasonable confidence. The same is true for mature working age and elderly users. In contrast, the early working age-group is barely represented in the CDR compared with the actual population, making it difficult to make any concrete statements about this age-group. This distribution is further compared across counties for both the actual and the CDR data. We find that the distributions for the early working age-group again lack similarity with the actual population, as we observed earlier when comparing the age-group distribution of the CDR with the actual data.
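The comparison in this paragraph can be reproduced as a simple ratio of the CDR share to the census share for each age-group. The sketch below uses only the percentages quoted above; the function names and the 0.5 cut-off used to flag an under-represented group are illustrative assumptions, not part of the original analysis.

```python
# Shares (in %) quoted in the text.
CDR_SHARE = {
    "early working": 0.16,
    "prime working": 56.37,
    "mature working": 22.94,
    "elderly": 20.53,
}
CENSUS_SHARE = {
    "early working": 15.0,
    "prime working": 49.63,
    "mature working": 14.82,
    "elderly": 20.55,
}


def representation_ratio(cdr, census):
    """CDR share divided by census share; 1.0 means proportional coverage."""
    return {group: cdr[group] / census[group] for group in census}


def under_represented(ratios, threshold=0.5):
    """Groups whose ratio falls below an (assumed) cut-off."""
    return [group for group, ratio in ratios.items() if ratio < threshold]


ratios = representation_ratio(CDR_SHARE, CENSUS_SHARE)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
print("under-represented:", under_represented(ratios))
```

A ratio above 1 means the group makes up a larger share of the CDR users than of the census population, and a ratio close to 0 means the group is almost absent from the CDR.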

Fig 10. Comparison of gender, age-group, and county relation using (a) CDR records, and (b) census data from Statistics Estonia.

In both figures (a) and (b), leftmost bars represent gender, center bars represent age-group, and rightmost bars show the counties of Estonia.

https://doi.org/10.1371/journal.pone.0248212.g010

Similarly, we also compare the language information present in the CDR dataset for gender and counties (see Fig 11(a)) with the census datasets (see Fig 11(b)). The representation of Estonian, Russian and English-speaking users in the CDR is 87.09%, 12.7%, and 0.21%, whereas in the actual population data it is 69.38%, 30.6% and 0.02%. We can conclude that the representation of the Estonian-speaking population in the CDR data is roughly 1.25 times higher than in the actual population. On the other hand, the CDR representation of the Russian-speaking population is 2.41 times smaller than their share of the actual population in Estonia. Since the Estonian and Russian-speaking populations together cover more than 99% both in reality and in the CDR data, the results of this study using CDR data can be considered useful for understanding gender segregation based on the Estonian and Russian languages in Estonia.
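As a worked check, the two factors and the combined coverage quoted above follow directly from the reported shares (all values in percent):

```latex
\[
\frac{87.09}{69.38} \approx 1.255,
\qquad
\frac{30.6}{12.7} \approx 2.41
\]
\[
\underbrace{87.09 + 12.7}_{\text{CDR}} = 99.79,
\qquad
\underbrace{69.38 + 30.6}_{\text{census}} = 99.98
\]
```

Both sums exceed 99, which is the basis for restricting the language analysis to Estonian and Russian speakers.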

Fig 11. Comparison of gender, language and county relation using (a) CDR records, and (b) census data from Statistics Estonia.

In both figures (a) and (b), leftmost bars represent gender, middle bars represent language, and rightmost bars show the counties of Estonia.

https://doi.org/10.1371/journal.pone.0248212.g011

We note that the CDR dataset and the actual population data differ in several respects: (1) The share of prime working age users is higher in the CDR than in the actual population of Estonia, which is also visible in the distribution of these age-groups by county. (2) The nature of the data itself reduces the representation of early working age and old age users, which we confirm by comparing the CDR and actual data. (3) The CDR data has a higher representation of Estonian-speaking users and a lower representation of Russian-speaking users than the actual population, which can be seen as another limitation of the CDR dataset. (4) Finally, we can deduce the distribution of gender, age-group, language, and county from the CDR data, and it resembles the reality of Estonian society subject to the above-mentioned limitations (see Fig 12).

Fig 12. Comparison of gender, age-group, language, and county relation using CDR data.

Here, leftmost bars represent gender, second leftmost bars represent age-group, second rightmost bars show language, and rightmost bars represent the counties of Estonia.

https://doi.org/10.1371/journal.pone.0248212.g012

5 Conclusion

Understanding gender segregation and integration plays a critical role in determining a nation's social and economic development [48]. In a quest to understand gender segregation in Estonia, we analyze a large set of call data records provided by one of the biggest mobile operators in Estonia. We analyze the data along the following two broad dimensions:

  1. Firstly, we perform the macro-analysis for understanding gender segregation using social network analysis. The results of the network analysis indicate that most females and males call within the same gender, and that the female network is relatively more spread out than the male network, which is dense and strongly connected. In particular, to understand gender segregation in terms of identifying the top social connectors, we perform the analysis at the county level as well, and we observe that in all counties, both males and females are inclined towards the same gender.
  2. Secondly, we study the impact of various user features to explore the segregation in detail (micro-analysis). In particular, we analyze the impact of age, language, and location on gender segregation. We find that the prime working age population (i.e., (24,54] years) is more segregated than other age-groups. Furthermore, we observe that the Estonian-speaking population (both males and females) tends to communicate more with Estonian-speaking individuals of the same gender.

We compare the relationships among features of the CDR dataset with the Estonian census data and observe that substantial relationships between users' characteristics such as gender, age, language, and location can be deduced from the CDR dataset. These relationships can be used by government agencies to design targeted policies for the segregated groups that need them. We further note that the major limitation of the CDR dataset comes from the fact that mobile phone use is widespread only after a certain age; therefore, CDR data can be considered a valuable guide for understanding the communication patterns of adults.

We plan to extend this work in several directions. To grasp economic inequality, we intend to analyze the data at a more detailed location level, such as municipalities. We would like to investigate a larger dataset that spans a longer period of time. We would also like to identify potential factors that are responsible for gender segregation in society, and to combine the mobile CDR data with other datasets, such as financial data, to understand socioeconomic segregation in Estonia.

References

  1. Boris P, Yifu Lin J. Annual World Bank Conference on Development Economics 2008, Regional: Higher Education and Development. The World Bank; 2008.
  2. Alesina A, Devleeschauwer A, Easterly W, Kurlat S, Wacziarg R. Fractionalization. Journal of Economic growth. 2003;8(2):155–194.
  3. Siltanen J. Locating gender: Occupational segregation, wages and domestic responsibilities. Routledge; 2020.
  4. Wasserman S, Faust K. Social network analysis: Methods and applications. vol. 8. Cambridge university press; 1994.
  5. B Ponieman N, Sarraute C, Minnoni M, Travizano M, Rodriguez Zivic P, Salles A. Mobility and sociocultural events in mobile phone data records. AI Communications. 2015;29:77–86.
  6. Song C, Qu Z, Blumm N, Barabási AL. Limits of Predictability in Human Mobility. Science. 2010;327(5968):1018–1021.
  7. Ghosh A, Monsivais D, Bhattacharya K, Dunbar RI, Kaski K. Quantifying gender preferences in human social interactions using a large cellphone dataset. EPJ Data Science. 2019;8(1):9.
  8. Hiir H, Sharma R, Aasa A, Saluveer E. Impact of Natural and Social Events on Mobile Call Data Records–An Estonian Case Study. In: International Conference on Complex Networks and Their Applications. Springer; 2019. p. 415–426.
  9. Deville P, Linard C, Martin S, Gilbert M, Stevens FR, Gaughan AE, et al. Dynamic population mapping using mobile phone data. 2014;111(45):15888–15893.
  10. Østbø Sørensen A, Bjelland J, Bull-Berg H, Landmark AD, Akhtar MM, Olsson NOE. Use of mobile phone data for analysis of number of train travellers. Journal of Rail Transport Planning & Management. 2018;8(2):123–144.
  11. Leo Y, Karsai M, Sarraute C, Fleury E. Correlations of Consumption Patterns in Social-economic Networks. In: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ASONAM'16; 2016. p. 493–500.
  12. Williams NE, Thomas TA, Dunbar M, Eagle N, Dobra A. Measures of Human Mobility Using Mobile Phone Records Enhanced with GIS Data. CoRR. 2014;abs/1408.5420.
  13. Silm S, Ahas R. The temporal variation of ethnic segregation in a city: Evidence from a mobile phone use dataset. Social Science Research. 2014;47:30–43.
  14. Blumen O, Zamir I. Two social environments in a working day: occupation and spatial segregation in metropolitan Tel Aviv. Environment and Planning A. 2001;33(10):1765–1784.
  15. Eagle N, Pentland AS, Lazer D. Mobile Phone Data for Inferring Social Network Structure. In: Liu H, Salerno JJ, Young MJ, editors. Social Computing, Behavioral Modeling, and Prediction. Boston, MA: Springer US; 2008. p. 79–88.
  16. Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, et al. Structure and Tie Strengths in Mobile Communication Networks. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:7332–6. pmid:17456605
  17. Gonzalez MC, Hidalgo CA, Barabasi AL. Understanding individual human mobility patterns. Nature. 2008;453(7196):779–782.
  18. Isaacman S, Becker R, Cáceres R, Martonosi M, Rowland J, Varshavsky A, et al. Human Mobility Modeling at Metropolitan Scales. In: Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services. MobiSys'12. New York, NY, USA: ACM; 2012. p. 239–252.
  19. Ahas R, Mark Ü. Location based services–new challenges for planning and public administration? Futures. 2005;37(6):547–561.
  20. Allen JP, Turner E. Black–white and Hispanic–white segregation in US counties. The Professional Geographer. 2012;64(4):503–520.
  21. Johnston R, Poulsen M, Forrest J. The comparative study of ethnic residential segregation in the USA, 1980–2000. Tijdschrift voor economische en sociale geografie. 2004;95(5):550–569.
  22. Blumenstock J, Fratamico L. Social and spatial ethnic segregation: a framework for analyzing segregation with large-scale spatial network data. In: Proceedings of the 4th Annual Symposium on Computing for Development; 2013. p. 1–10.
  23. Silm S, Ahas R, Mooses V. Are younger age groups less segregated? Measuring ethnic segregation in activity spaces using mobile phone data. Journal of Ethnic and Migration Studies. 2018;44(11):1797–1817.
  24. Yinger J. Closed doors, opportunities lost: The continuing costs of housing discrimination. Russell Sage Foundation; 1995.
  25. Meyer SG. As long as they don't move next door: Segregation and racial conflict in American neighborhoods. Rowman & Littlefield; 2000.
  26. Boy J, Pastor-Escuredo D, Macguire D, Jimenez RM, Luengo-Oroz M. Towards an understanding of refugee segregation, isolation, homophily and ultimately integration in Turkey using call detail records. In: Guide to Mobile Data Analytics in Refugee Scenarios. Springer; 2019. p. 141–164.
  27. Åslund O, Skans ON. Will I see you at work? Ethnic workplace segregation in Sweden, 1985–2002. ILR Review. 2010;63(3):471–493.
  28. Ellis M, Wright R, Parks V. Work together, live apart? Geographies of racial and ethnic segregation at home and at work. Annals of the Association of American Geographers. 2004;94(3):620–637.
  29. Dougherty KD. How monochromatic is church membership? Racial-ethnic diversity in religious community. Sociology of religion. 2003;64(1):65–85.
  30. Chhabra D. Ethnicity and marginality effects on casino gambling behavior. Journal of Vacation Marketing. 2007;13(3):221–238.
  31. Floyd D, et al. Race, ethnicity and use of the National Park System. 1999.
  32. Toomet O, Silm S, Saluveer E, Ahas R, Tammaru T. Where do ethno-linguistic groups meet? How copresence during free-time is related to copresence at home and at work. PloS one. 2015;10(5). pmid:25996504
  33. Estonia S. Statistical database. Population data. 2012;2012.
  34. GEMEINSCHÄF EE, ST ED, EUROPEE-BUREA EC. eurostat.
  35. Estonia S. Mean annual population by sex and age group. Available from: https://www.stat.ee/esms-metadata?code=30205.
  36. Page L, Brin S, Motwani R, Winograd T. The pagerank citation ranking: Bringing order to the web. Stanford InfoLab; 1999.
  37. Board EL. Administrative and settlement units; 2020. Available from: https://geoportaal.maaamet.ee/eng/Spatial-Data/Administrative-and-Settlement-Division-p312.html.
  38. Ferguson N. The False Prophecy of Hyperconnection: How to Survive the Networked Age. Foreign Aff. 2017;96:68.
  39. McPherson M, Smith-Lovin L, Cook JM. Birds of a feather: Homophily in social networks. Annual review of sociology. 2001;27(1):415–444.
  40. Kandel DB. Homophily, selection, and socialization in adolescent friendships. American journal of Sociology. 1978;84(2):427–436.
  41. Gillespie BJ, Frederick D, Harari L, Grov C. Homophily, close friendship, and life satisfaction among gay, lesbian, heterosexual, and bisexual men and women. PloS one. 2015;10(6):e0128900.
  42. Asikainen A, Iñiguez G, Ureña-Carrión J, Kaski K, Kivelä M. Cumulative effects of triadic closure and homophily in social networks. Science Advances. 2020;6(19):eaax7310.
  43. Fu F, Nowak MA, Christakis NA, Fowler JH. The evolution of homophily. Scientific reports. 2012;2:845.
  44. Shoham DA, Tong L, Lamberson PJ, Auchincloss AH, Zhang J, Dugas L, et al. An actor-based model of social network influence on adolescent body size, screen time, and playing sports. PloS one. 2012;7(6):e39795. pmid:22768124
  45. Smirnov I, Thurner S. Formation of homophily in academic performance: Students change their friends rather than performance. PloS one. 2017;12(8):e0183473.
  46. Sahasranaman A, Jensen HJ. Ethnicity and wealth: The dynamics of dual segregation. PloS one. 2018;13(10):e0204307.
  47. Coleman J. Relational analysis: The study of social organizations with survey methods. Human organization. 1958;17(4):28–36.
  48. Leo Y, Fleury E, Alvarez-Hamelin JI, Sarraute C, Karsai M. Socioeconomic correlations and stratification in social-communication networks. Journal of The Royal Society Interface. 2016;13(125):20160598.
