Clustering the Neighbourhoods of French cities

1. Introduction

The Covid-19 crisis has been shaking the world for more than six months. Faced with the health emergency, many governments, notably in Europe, have put public health ahead of economic activity by confining their populations and closing their borders. One consequence of these policies is a radical change in the daily lives of citizens around the world, especially in cities.

2. Business Problem

The aim of this project is to identify what characterizes French cities in terms of their neighbourhoods and venues.
This breakdown gives us a partial view of France's choices of economic specialization.
It could help to identify the strengths, but also the weaknesses, of the French economy when crises such as Covid-19 hit the world. Are French cities highly dependent on the tourism sector? What about the commercial sector in the upcoming festive period? Our conclusions are aimed at interested citizens, but also at stakeholders (city halls, government, etc.) who need to identify the sectors to protect in their cities.

3. Data Description

As presented in section 4 (Methodology), we first retrieve as much data as possible on French cities. We need the postal codes, the names of the cities and the names of their neighbourhoods (if they have any).

3.1 Rank Cities in France

In order to perform the analysis in section 5, we restrict our clustering results to the top 10 French cities.

To do that, we scrape our data from https://en.wikipedia.org/wiki/List_of_communes_in_France_with_over_20,000_inhabitants

This Wikipedia page lists the large communes of France and ranks them by number of inhabitants. The table has the following columns:

  1. Commune : Name of Commune
  2. Department : Name of Department
  3. Region : Name of Region
  4. Population, 2013 : Population at year 2013
  5. Population, 2017 : Population at year 2017
  6. Rank : Rank based on the Population at year 2017

3.2 Get French cities and their neighbourhoods

We use JSON data available at https://www.data.gouv.fr/fr/datasets/r/34d4364c-22eb-4ac0-b179-7a1845ac033a

  1. codePostal : Postal codes for France
  2. codeCommune : Code for Commune in France
  3. nomCommune : Name of the boroughs (for big cities), equivalent to Commune in France
  4. libelleAcheminement : Name of city

3.3 Foursquare API Data

To meet the need identified above, we need the venues of French cities, which the Foursquare API provides. The API takes GPS coordinates as input, so the cities and neighbourhoods have to be geocoded first.

For each neighbourhood queried, we keep the following information:

  1. Neighbourhood : Name of the Neighbourhood
  2. Neighbourhood Latitude : Latitude of the Neighbourhood
  3. Neighbourhood Longitude : Longitude of the Neighbourhood
  4. Venue : Name of the Venue
  5. Venue Latitude : Latitude of Venue
  6. Venue Longitude : Longitude of Venue
  7. Venue Category : Category of Venue

4. Methodology

As a first step, we collect as much data as possible on French cities, so that the clustering covers the largest possible number of cities.
The clustering is based on the venue categories returned by the Foursquare API. The result is a complete, ready-to-use clustering for further analysis, for example a report for the French government.
For this project, we filter the results to the top 10 cities in France and present our conclusions.

Along the way, we build maps of the French cities and then of the clusters.

4.0 Load packages

4.1 Importing Data

4.1.1 Top cities from Wikipedia

A response code of 200 means that the connection to the page succeeded.

We only need the first table, so we drop the others.

4.1.2 Collect French cities data

To collect data for cities, we download the JSON file containing all the postal codes of France from https://www.data.gouv.fr/fr/datasets/r/34d4364c-22eb-4ac0-b179-7a1845ac033a

Using Pandas we load the table after reading the JSON file:
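The loading step could look roughly like the sketch below. It is only an illustration: the inline `sample_json` string stands in for the downloaded file, its three records are illustrative values rather than an extract of the real dataset, and only the field names come from section 3.2.

```python
import json

import pandas as pd

# Stand-in for the downloaded file. The field names follow section 3.2;
# the three records are illustrative values, not an extract of the real file.
sample_json = """
[
  {"codePostal": "75009", "codeCommune": "75109",
   "nomCommune": "Paris 09", "libelleAcheminement": "PARIS"},
  {"codePostal": "69003", "codeCommune": "69383",
   "nomCommune": "Lyon 03", "libelleAcheminement": "LYON"},
  {"codePostal": "01400", "codeCommune": "01001",
   "nomCommune": "L'Abergement-Clémenciat",
   "libelleAcheminement": "L ABERGEMENT CLEMENCIAT"}
]
"""

# Parse with the json module first so that the codes stay strings
# and keep their leading zeros (for example "01400").
records = json.loads(sample_json)
cities = pd.DataFrame(records)

print(cities.shape)
print(cities.head())
```

Going through `json.loads` rather than letting pandas infer types is a deliberate choice here: postal codes are identifiers, not numbers, and should not be converted to integers.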

4.2 Data Processing

We perform an inner join of the two tables in order to attach the population ranking to the cities data.

We sort by rank.

We keep only the columns of interest.
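The three steps above (join, sort, select) can be sketched with pandas as follows. The two small tables are toy data, and matching on the upper-cased city name through a helper column called `city_key` is an assumption made for this illustration; the original join key is not stated.

```python
import pandas as pd

# Toy stand-ins for the Wikipedia ranking and the postal-code table
# (values are illustrative).
ranking = pd.DataFrame({
    "Commune": ["Paris", "Marseille", "Lyon"],
    "Rank": [1, 2, 3],
})
cities = pd.DataFrame({
    "nomCommune": ["Lyon 03", "Paris 09", "Paris 10", "Smallville"],
    "libelleAcheminement": ["LYON", "PARIS", "PARIS", "SMALLVILLE"],
    "codePostal": ["69003", "75009", "75010", "99999"],
})

# Assumption: the tables are matched on the upper-cased city name.
ranking["city_key"] = ranking["Commune"].str.upper()
cities["city_key"] = cities["libelleAcheminement"].str.upper()

combined_data = (
    ranking.merge(cities, on="city_key", how="inner")  # keep matches only
    .sort_values(["Rank", "nomCommune"])               # best-ranked city first
    .reset_index(drop=True)
    [["Rank", "Commune", "nomCommune", "codePostal"]]  # columns of interest
)
print(combined_data)
```

With an inner join, rows without a counterpart disappear from both sides: the made-up "Smallville" has no rank, and "Marseille" has no neighbourhood in this toy table, so neither appears in the result.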

4.3 Feature Engineering

In order to use the Foursquare API, we need the coordinates of the cities and neighbourhoods. We use the geopy library to geocode them.

Let's run a test with Paris 9E Arrondissement.

It works!

Let's apply the geocoding to our full dataset.

We drop the rows of combined_data that contain missing values (NaN).
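A minimal sketch of this geocode-then-drop step is given below. To keep it runnable offline, the geocoder is passed in as a plain function: `add_coordinates`, `fake_lookup` and the approximate coordinates are all hypothetical, and the commented lines show one way a geopy geocoder (here Nominatim, which is an assumption, since the geocoder actually used is not stated) could be plugged in.

```python
from typing import Callable, Optional, Tuple

import pandas as pd

Coordinates = Optional[Tuple[float, float]]


def add_coordinates(df: pd.DataFrame, column: str,
                    lookup: Callable[[str], Coordinates]) -> pd.DataFrame:
    """Return a copy of df with Latitude/Longitude; failed lookups are dropped."""
    coords = [lookup(address) for address in df[column]]
    out = df.copy()
    out["Latitude"] = [c[0] if c is not None else float("nan") for c in coords]
    out["Longitude"] = [c[1] if c is not None else float("nan") for c in coords]
    # Equivalent of the "drop NaN" step: rows that could not be geocoded go away.
    return out.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)


# Offline stand-in for a real geocoder, with approximate coordinates.
FAKE_INDEX = {
    "Paris 9E Arrondissement, France": (48.877, 2.337),
    "Lyon 3E Arrondissement, France": (45.753, 4.869),
}


def fake_lookup(address: str) -> Coordinates:
    return FAKE_INDEX.get(address)


# With geopy, the lookup could be wired roughly like this (not run here):
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="french-cities-demo")  # hypothetical name
#   def geopy_lookup(address):
#       location = geolocator.geocode(address)
#       return (location.latitude, location.longitude) if location else None

combined_data = pd.DataFrame({"Neighbourhood": [
    "Paris 9E Arrondissement, France",
    "Lyon 3E Arrondissement, France",
    "Unknown Place, France",
]})
geocoded = add_coordinates(combined_data, "Neighbourhood", fake_lookup)
print(geocoded)
```

Public geocoding services are usually rate-limited, so a real run over the full dataset would likely need pauses between requests.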

4.5 Visualizing the Neighbourhoods of French cities

4.6 Get French Venues with Foursquare API

We set our Foursquare API credentials and removed them here for privacy reasons.
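As an illustration of how the venues could be collected, the sketch below builds the query parameters and flattens a response into the seven fields listed in section 3.3. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its nested `response -> groups -> items -> venue` layout; the helper names, the radius, the limit, the version date and the sample payload are all assumptions, and no network call is made.

```python
from typing import Any, Dict, List

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat: float, lng: float, client_id: str, client_secret: str,
                   radius: int = 500, limit: int = 100,
                   version: str = "20180605") -> Dict[str, Any]:
    """Query parameters for one neighbourhood (radius in metres)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def parse_venues(neighbourhood: str, lat: float, lng: float,
                 payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one API response into rows with the fields of section 3.3."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append({
                "Neighbourhood": neighbourhood,
                "Neighbourhood Latitude": lat,
                "Neighbourhood Longitude": lng,
                "Venue": venue.get("name"),
                "Venue Latitude": venue.get("location", {}).get("lat"),
                "Venue Longitude": venue.get("location", {}).get("lng"),
                "Venue Category": categories[0].get("name"),
            })
    return rows


# Hand-written payload that mimics the assumed response layout.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Chez Exemple",
               "location": {"lat": 48.878, "lng": 2.338},
               "categories": [{"name": "French Restaurant"}]}},
    {"venue": {"name": "Sans Categorie",
               "location": {"lat": 48.876, "lng": 2.336},
               "categories": []}},
]}]}}

params = explore_params(48.877, 2.337, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
rows = parse_venues("Paris 9E Arrondissement", 48.877, 2.337, sample_payload)
print(params["ll"], len(rows))
```

A live call would pass `params` to an HTTP client (for example `requests.get(EXPLORE_URL, params=params).json()`) once per neighbourhood, and the rows from all neighbourhoods would then be concatenated into a single dataframe.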

We obtain 9728 records and 5 columns. Let's check a sample of the data.

There are 323 distinct venue categories.

4.7 One-hot encoding the venue categories

We add the neighbourhood column back to the encoded dataframe.
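The encoding step can be sketched with `pandas.get_dummies` on a toy venues table. The data is made up, and the final averaging per neighbourhood is an assumption about how the encoded rows are aggregated before clustering.

```python
import pandas as pd

# Toy venues table with the two columns that matter for the encoding.
venues_in_france = pd.DataFrame({
    "Neighbourhood": ["Paris 09"] * 4 + ["Lyon 03"] * 2,
    "Venue Category": ["French Restaurant", "French Restaurant", "Hotel", "Bar",
                       "Bakery", "French Restaurant"],
})

# One column per category, one row per venue, 1 where the venue has that category.
onehot = pd.get_dummies(venues_in_france["Venue Category"], dtype=int)

# Put the neighbourhood back as the first column.
onehot.insert(0, "Neighbourhood", venues_in_france["Neighbourhood"])

# Assumption: averaging per neighbourhood gives the share of each category.
france_grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(france_grouped)
```

In this toy example, half of the venues in "Paris 09" are French restaurants, so its `French Restaurant` value is 0.5.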

4.8 Top Venues in the Neighbourhoods

Let's write a function that returns the most common venue categories of a neighbourhood.

There are far too many venue categories, so we keep the top 10 for each neighbourhood when clustering.
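A possible shape for such a function is sketched below. The name `most_common_venues`, the column labels and the toy frequencies are hypothetical, and the example keeps the top 3 rather than the top 10 only because the toy table has four categories.

```python
import pandas as pd


def most_common_venues(row: pd.Series, num_top: int) -> list:
    """Return the names of the num_top categories with the highest frequency."""
    ordered = row.sort_values(ascending=False, kind="mergesort")
    return ordered.index[:num_top].tolist()


# Toy category frequencies per neighbourhood (illustrative values).
france_grouped = pd.DataFrame({
    "Neighbourhood": ["Lyon 03", "Paris 09"],
    "Bakery": [0.5, 0.0],
    "Bar": [0.1, 0.2],
    "French Restaurant": [0.4, 0.5],
    "Hotel": [0.0, 0.3],
})

num_top = 3
frequencies = france_grouped.set_index("Neighbourhood")
columns = [f"Most Common Venue {i}" for i in range(1, num_top + 1)]

top_venues = pd.DataFrame(
    [most_common_venues(row, num_top) for _, row in frequencies.iterrows()],
    index=frequencies.index,
    columns=columns,
).reset_index()
print(top_venues)
```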

Let's build the model to cluster our neighbourhoods.

4.9 Model Building - KMeans

We check the labels produced by the model.

Let's add the cluster label column to the table of the top 10 most common venue categories.
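The fitting and labelling steps might look like the following sketch with scikit-learn. The four neighbourhoods and their frequencies are toy data, and `n_clusters=2`, `n_init=10` and `random_state=0` are assumptions chosen for the example; the number of clusters used in the project is not stated here.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency table: A and B look "touristic", C and D look "residential".
france_grouped = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "French Restaurant": [0.6, 0.5, 0.1, 0.0],
    "Hotel": [0.3, 0.4, 0.0, 0.1],
    "Park": [0.1, 0.1, 0.5, 0.4],
    "Bakery": [0.0, 0.0, 0.4, 0.5],
})

# K-means works on the numeric columns only.
features = france_grouped.drop(columns="Neighbourhood")

# Assumption: 2 clusters for this toy data; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach one label per neighbourhood as the first column.
labelled = france_grouped.copy()
labelled.insert(0, "Cluster Labels", kmeans.labels_)
print(labelled[["Cluster Labels", "Neighbourhood"]])
```

The label values themselves are arbitrary identifiers: what matters is which neighbourhoods share a label, not whether a group is called 0 or 1.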

venues_in_france.groupby('Venue Category').max()

We join france_grouped with combined_data on the neighbourhood to add the latitude and longitude of each neighbourhood, in preparation for plotting.

4.10 Map Clustering

4.11 Examining our Clusters

We select the top ten cities for cluster 1.

5. Results and Discussion

Cluster 1: in general, the most common venue is the French restaurant.
The top 10 French cities bet on their comparative advantage in food.
Cluster 1 also highlights the tourist dimension of the big cities in France: hotels, bars, cafés, plazas and some museums.

Alongside this tourist dimension, the top cities of France have an important commercial dimension, with many shops such as clothing stores and food stores.

Cluster 4 is more about daily life in the top cities, with parks, pedestrian plazas, metro stations, bakeries, etc.

In addition, we notice considerable multicultural diversity in the neighbourhoods of these cities, with Indian, Italian, Greek and other restaurants.
Food remains very important in these cities, and we can hypothesize that the same holds for the daily life of the French.
The modes of transport differ from city to city: tram for Lyon, metro for Paris.

6. Conclusion

The aim of this project was to establish comprehensive comparisons between cities in France using the K-means technique. In doing so, we can study what makes these cities attractive and what makes each of them specific.

Using a complete database, we have assigned a cluster to the neighbourhoods of all cities in France. This gives us, for future work, a complete basis for comparing French cities. For the purposes of this project, we have restricted our analysis to the top 10 cities in France by cross-referencing Wikipedia data.

We first observe that the neighbourhoods of the top cities in France are similar: restaurants, bakeries, bars, museums, etc. This shows the specialization of France in the tourism sector. Each city then has its own cultural specificity. Some cities show considerable diversity, which is linked to their history of immigration.

In this period of Covid-19, this study gives reason to worry about the economy of these cities, which relies heavily on shops and tourist venues, and to raise the alarm with the competent authorities.

How can the economic contribution of these flows be replaced when borders are closed and shops are forbidden to open?
Are the cities and their neighbourhoods going to change as a result of this crisis? Will we obtain the same clustering in two years? This is an interesting question that remains open at the end of this project.

7. References

  1. GitHub

  2. Foursquare API

  3. ArcGIS API