A synthetic data generator for online social network graphs

9/9/2023

Two of the difficulties for data analysts of online social networks are (1) the public availability of data and (2) respecting the privacy of the users. One possible solution to both of these problems is to use synthetically generated data. However, this presents a series of challenges related to generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, correlations and so on. In the following work, we present and validate an approach for populating a graph topology with synthetic data which approximates an online social network. The empirical tests confirm that our approach generates a dataset which is both diverse and a good fit to the target requirements, with realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different “similarity/diversity” levels.

In recent years, online social networks have become a part of everyday life for millions of individuals. Also, data analysts have found a fertile field for analyzing user behavior at individual and collective levels, for academic and commercial reasons. On the other hand, there are many risks for user privacy, as information a user may wish to remain private becomes evident upon analysis. However, when data is anonymized to make it safe for publication in the public domain, information is inevitably lost with respect to the original version, a significant aspect of social networks being the local neighborhood of a user and its associated data. Current anonymization techniques are good at identifying risks and minimizing them, but not so good at maintaining the local contextual data which relate users in a social network. Thus, improving this aspect will have a high impact on the data utility of anonymized social networks. Also, there is a lack of systems which help a data analyst anonymize this type of data structure and perform empirical experiments in a controlled manner on different datasets. Hence, in the present work we address these issues by designing and implementing a sophisticated synthetic data generator together with an anonymization processor which has strict privacy guarantees and takes the local neighborhood into account when anonymizing. All this is done for a complex dataset which can be fitted to a real dataset in terms of data profiles and distributions. In the empirical section we perform experiments to demonstrate the scalability of the method and the reduction in information loss with respect to approaches which do not consider the local neighborhood context when anonymizing.

In this paper, we propose a new scale-free social network (SN) evolution model that is based on homophily combined with preferential attachment. Our model enables SN researchers to generate synthetic SN data for the evaluation of multifaceted SN models that depend on users’ attributes and similarities. Homophily is one of the key factors for interactive relationship formation in SNs. The synthetic graph generated by our model is scale-invariant and has symmetric relationships. The model is dynamic and robust to changes in input parameters, such as the number of nodes and the nodes’ attributes, conserving its structural properties.
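To make the idea of homophily combined with preferential attachment more concrete, here is a minimal Python sketch of one way such a growth process could work. It is not the model from the paper, whose exact rules are not given in the text above. The function name `grow_homophily_pa_graph`, the single scalar attribute per node, the similarity measure (1 minus the absolute attribute difference) and the choice to multiply degree by similarity are all assumptions made for illustration.

```python
import random


def grow_homophily_pa_graph(n, m, seed=0):
    """Grow an undirected graph of n nodes; each new node links to m existing nodes.

    Assumptions (not taken from the source): every node has one scalar
    attribute in [0, 1], similarity = 1 - |a_i - a_j|, and the attachment
    weight of an existing node is degree * similarity.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    attrs = [rng.random() for _ in range(n)]
    adj = {i: set() for i in range(n)}

    # Fully connected seed of m + 1 nodes, so every existing degree is > 0.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)

    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            pool = [c for c in range(new) if c not in targets]
            # Preferential attachment (degree) scaled by homophily (similarity).
            # The small epsilon keeps every weight strictly positive.
            weights = [
                len(adj[c]) * (1.0 - abs(attrs[new] - attrs[c]) + 1e-9)
                for c in pool
            ]
            targets.add(rng.choices(pool, weights=weights, k=1)[0])
        # Edges are stored in both directions, so relationships are symmetric.
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj, attrs


if __name__ == "__main__":
    adj, attrs = grow_homophily_pa_graph(n=1000, m=3, seed=42)
    degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
    print("largest degrees:", degrees[:5])
```

In this sketch, well-connected nodes attract more links, as in plain preferential attachment, but a new node favors existing nodes whose attribute is close to its own. Changing `n` or the attribute values changes the graph while the growth rule stays the same.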
