The Covid-19 crisis has been shaking the world for more than six months now. Faced with the health emergency, many governments, notably in Europe, have chosen health over economic life by confining their populations and closing their borders. One consequence of these policies is a radical change in the daily lives of citizens around the world, especially in cities.
The aim of this project is to characterize French cities through their decomposition into neighbourhoods and venues.
Thanks to this decomposition, we will be able to understand, in part, the choices of economic specialization in France.
This could help identify the strengths, but also the weaknesses, of the French economy when crises like the one we are experiencing with Covid-19 hit the world.
Are French cities highly dependent on the tourism sector? What about the commercial sector in the festive season that is approaching?
Our conclusions are therefore aimed at interested citizens, but also at stakeholders (city halls, the government, etc.) who want to identify the sectors to protect in their cities.
As presented in section 4 (Methodology), we first retrieve as much data as possible on French cities. We need the postal codes, the names of the cities and their different neighbourhoods (if they have any).
In order to perform the analysis in section 5, we will restrict our clustering to the top 10 French cities.
To do that, we scrape our data from https://en.wikipedia.org/wiki/List_of_communes_in_France_with_over_20,000_inhabitants
This Wikipedia page lists the large communes in France and ranks them by number of inhabitants.
We also use the JSON data available at https://www.data.gouv.fr/fr/datasets/r/34d4364c-22eb-4ac0-b179-7a1845ac033a
To meet the need identified above, we need the different venues of the cities in France. The Foursquare API will give us this information; it requires GPS coordinates (geocoding) to work.
For each venue, the API returns information such as its name, its category and its location.
In a first step, we collect as much data as possible on cities in France, so that the clustering covers the largest possible number of cities.
The clustering is performed on the venue categories that the Foursquare API provides.
This yields a complete, ready-to-use clustering for further analysis; one could imagine, for example, a report for the French government.
For this project, we filter our results to the top 10 cities in France and present our conclusions.
Along the way, we build maps of the cities in France and then of the different clusters.
import pandas as pd
import requests
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
import numpy as np
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
url = "https://en.wikipedia.org/wiki/List_of_communes_in_France_with_over_20,000_inhabitants"
wiki_url = requests.get(url)
wiki_url
<Response [200]>
A 200 response means we were able to connect to the page.
wiki_data = pd.read_html(wiki_url.text)
wiki_data
[Output truncated: a list of DataFrames. The first is the main table (communes ranked 1–272 by population, from Paris down to Vandœuvre-lès-Nancy, 272 rows × 6 columns); the second continues the ranking (ranks 273–469, 197 rows × 6 columns); the third lists overseas communes (Nouméa, Dumbéa, ...); the remaining tables are Wikipedia navigation boxes ("vteFrance topics", "vteLists of cities in Europe", "vteList of towns in Europe") that we do not need.]
len(wiki_data), type(wiki_data)
(11, list)
We need the first table alone, so we drop the other tables
wiki_data = wiki_data[0]
wiki_data
Commune | Department | Region | Population, 2013 | Population, 2017 | Rank | |
---|---|---|---|---|---|---|
0 | Paris | Paris | Île-de-France | 2420069 | 2187526 | 1 |
1 | Marseille | Bouches-du-Rhône | Provence-Alpes-Côte d'Azur | 855393 | 863310 | 2 |
2 | Lyon | Lyon Metropolis | Auvergne-Rhône-Alpes | 500715 | 516092 | 3 |
3 | Toulouse | Haute-Garonne | Occitanie | 458298 | 479553 | 4 |
4 | Nice | Alpes-Maritimes | Provence-Alpes-Côte d'Azur | 342295 | 340017 | 5 |
... | ... | ... | ... | ... | ... | ... |
267 | Charenton-le-Pont | Val-de-Marne | Île-de-France | 30408 | 30374 | 268 |
268 | Pierrefitte-sur-Seine | Seine-Saint-Denis | Île-de-France | 28459 | 30306 | 269 |
269 | Chatou | Yvelines | Île-de-France | 30809 | 30253 | 270 |
270 | Rillieux-la-Pape | Lyon Metropolis | Auvergne-Rhône-Alpes | 30645 | 30012 | 271 |
271 | Vandœuvre-lès-Nancy | Meurthe-et-Moselle | Grand Est | 29836 | 30002 | 272 |
272 rows × 6 columns
wiki_data['Commune'] = wiki_data['Commune'].str.upper()
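Note that `str.upper()` keeps accented characters, whereas the postal file's `libelleAcheminement` column is accent-free (e.g. `Aix-Villemaur-Pâlis` becomes `AIX-VILLEMAUR-PALIS`). If we want the later join to catch accented commune names too, a small normalization helper could be applied first; a minimal sketch (the `strip_accents` helper is ours, not part of the original notebook):

```python
import unicodedata

def strip_accents(text):
    # Decompose characters (â -> a + combining circumflex), then drop the accents
    return ''.join(
        c for c in unicodedata.normalize('NFKD', text)
        if not unicodedata.combining(c)
    )

# Normalise a commune name the way libelleAcheminement does
print(strip_accents('Aix-Villemaur-Pâlis').upper())  # AIX-VILLEMAUR-PALIS
```

One caveat: ligatures such as `œ` (as in Vandœuvre-lès-Nancy) are not split into `oe` by this decomposition and would still need a manual replacement.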
To collect data for the cities, we download the JSON file containing all the postal codes of France from https://www.data.gouv.fr/fr/datasets/r/34d4364c-22eb-4ac0-b179-7a1845ac033a
Using pandas, we load the JSON file into a DataFrame:
#set the file location as URL or filepath of the json file
f_data_url="https://www.data.gouv.fr/fr/datasets/r/34d4364c-22eb-4ac0-b179-7a1845ac033a"
#load the json data from the file to a pandas dataframe
france_raw = pd.read_json(f_data_url)
france_raw.head()
codePostal | codeCommune | nomCommune | libelleAcheminement | |
---|---|---|---|---|
0 | 10200 | 10002 | Ailleville | AILLEVILLE |
1 | 10160 | 10003 | Aix-Villemaur-Pâlis | AIX-VILLEMAUR-PALIS |
2 | 10190 | 10003 | Aix-Villemaur-Pâlis | AIX-VILLEMAUR-PALIS |
3 | 10700 | 10004 | Allibaudières | ALLIBAUDIERES |
4 | 10140 | 10005 | Amance | AMANCE |
france_raw[france_raw['nomCommune'].str.contains('Lyon')]
codePostal | codeCommune | nomCommune | libelleAcheminement | |
---|---|---|---|---|
6400 | 27480 | 27048 | Beauficel-en-Lyons | BEAUFICEL EN LYONS |
6685 | 27480 | 27377 | Lyons-la-Forêt | LYONS LA FORET |
12472 | 42140 | 42059 | Chazelles-sur-Lyon | CHAZELLES SUR LYON |
24158 | 69110 | 69202 | Sainte-Foy-lès-Lyon | SAINTE FOY LES LYON |
24244 | 69001 | 69381 | Lyon 1er Arrondissement | LYON |
24245 | 69002 | 69382 | Lyon 2e Arrondissement | LYON |
24246 | 69003 | 69383 | Lyon 3e Arrondissement | LYON |
24247 | 69004 | 69384 | Lyon 4e Arrondissement | LYON |
24248 | 69005 | 69385 | Lyon 5e Arrondissement | LYON |
24249 | 69006 | 69386 | Lyon 6e Arrondissement | LYON |
24250 | 69007 | 69387 | Lyon 7e Arrondissement | LYON |
24251 | 69008 | 69388 | Lyon 8e Arrondissement | LYON |
24252 | 69009 | 69389 | Lyon 9e Arrondissement | LYON |
26373 | 76220 | 76067 | Beauvoir-en-Lyons | BEAUVOIR-EN-LYONS |
33410 | 3110 | 03080 | Cognat-Lyonne | COGNAT-LYONNE |
wiki_data.head()
Commune | Department | Region | Population, 2013 | Population, 2017 | Rank | |
---|---|---|---|---|---|---|
0 | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
1 | MARSEILLE | Bouches-du-Rhône | Provence-Alpes-Côte d'Azur | 855393 | 863310 | 2 |
2 | LYON | Lyon Metropolis | Auvergne-Rhône-Alpes | 500715 | 516092 | 3 |
3 | TOULOUSE | Haute-Garonne | Occitanie | 458298 | 479553 | 4 |
4 | NICE | Alpes-Maritimes | Provence-Alpes-Côte d'Azur | 342295 | 340017 | 5 |
france_raw.head()
codePostal | codeCommune | nomCommune | libelleAcheminement | |
---|---|---|---|---|
0 | 10200 | 10002 | Ailleville | AILLEVILLE |
1 | 10160 | 10003 | Aix-Villemaur-Pâlis | AIX-VILLEMAUR-PALIS |
2 | 10190 | 10003 | Aix-Villemaur-Pâlis | AIX-VILLEMAUR-PALIS |
3 | 10700 | 10004 | Allibaudières | ALLIBAUDIERES |
4 | 10140 | 10005 | Amance | AMANCE |
We perform an inner join of the two tables in order to attach the Wikipedia ranking data to the postal-code records.
combined_data = france_raw.join(wiki_data.set_index('Commune'), on='libelleAcheminement', how='inner')
combined_data.head()
codePostal | codeCommune | nomCommune | libelleAcheminement | Department | Region | Population, 2013 | Population, 2017 | Rank | |
---|---|---|---|---|---|---|---|---|---|
376 | 10000 | 10387 | Troyes | TROYES | Aube | Grand Est | 59671 | 61652 | 90 |
499 | 11000 | 11069 | Carcassonne | CARCASSONNE | Aude | Occitanie | 46724 | 46031 | 148 |
690 | 11100 | 11262 | Narbonne | NARBONNE | Aude | Occitanie | 52082 | 54700 | 109 |
858 | 11150 | 11434 | Villepinte | VILLEPINTE | Seine-Saint-Denis | Île-de-France | 35329 | 36830 | 207 |
31738 | 93420 | 93078 | Villepinte | VILLEPINTE | Seine-Saint-Denis | Île-de-France | 35329 | 36830 | 207 |
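Since the inner join silently drops every commune whose `libelleAcheminement` does not match an uppercased `Commune` name, it can be worth checking the match rate explicitly. A hedged sketch using `merge` with `indicator=True` (the tiny DataFrames below are toy stand-ins for `france_raw` and `wiki_data`, only the column names are taken from the notebook):

```python
import pandas as pd

# Toy stand-ins for france_raw and wiki_data to illustrate the check
france_raw = pd.DataFrame({
    'codePostal': ['75009', '31000', '06000'],
    'libelleAcheminement': ['PARIS', 'TOULOUSE', 'NICE'],
})
wiki_data = pd.DataFrame({
    'Commune': ['PARIS', 'NICE', 'LYON'],
    'Rank': [1, 5, 3],
})

merged = france_raw.merge(
    wiki_data, left_on='libelleAcheminement', right_on='Commune',
    how='left', indicator=True,
)
# Rows whose commune never appeared in the Wikipedia table
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['libelleAcheminement'].tolist())  # ['TOULOUSE']
```

On the real data, inspecting the unmatched names would reveal, among other things, the accent mismatches mentioned earlier.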
Sort by rank
combined_data.sort_values("Rank")
codePostal | codeCommune | nomCommune | libelleAcheminement | Department | Region | Population, 2013 | Population, 2017 | Rank | |
---|---|---|---|---|---|---|---|---|---|
26300 | 75009 | 75109 | Paris 9e Arrondissement | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
26301 | 75010 | 75110 | Paris 10e Arrondissement | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
26299 | 75008 | 75108 | Paris 8e Arrondissement | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
26296 | 75005 | 75105 | Paris 5e Arrondissement | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
26297 | 75006 | 75106 | Paris 6e Arrondissement | PARIS | Paris | Île-de-France | 2420069 | 2187526 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
31681 | 92240 | 92046 | Malakoff | MALAKOFF | Hauts-de-Seine | Île-de-France | 30304 | 30720 | 266 |
7126 | 28410 | 28185 | Goussainville | GOUSSAINVILLE | Val-d'Oise | Île-de-France | 31212 | 30637 | 267 |
31870 | 95190 | 95280 | Goussainville | GOUSSAINVILLE | Val-d'Oise | Île-de-France | 31212 | 30637 | 267 |
27599 | 78400 | 78146 | Chatou | CHATOU | Yvelines | Île-de-France | 30809 | 30253 | 270 |
27600 | 78110 | 78146 | Chatou | CHATOU | Yvelines | Île-de-France | 30809 | 30253 | 270 |
344 rows × 9 columns
Keep only the columns we are interested in
combined_data = combined_data[['codePostal','nomCommune','libelleAcheminement','Population, 2017','Rank']]
In order to use the Foursquare API, we need the geocoding of the cities and neighbourhoods. We will use the geopy library for this.
Let's run a test with Paris 9e Arrondissement:
address = 'Paris 9e Arrondissement'
geolocator = Nominatim(user_agent="test")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Paris 9e are {}, {}.'.format(latitude, longitude))
The coordinates of Paris 9e are 48.876019, 2.339962.
It works!
Let's apply our geocoding to our full dataset
# 1 - add a delay between geocoding calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# 2 - create the location column
combined_data['location'] = combined_data['nomCommune'].apply(geocode)
# 3 - extract the (latitude, longitude, altitude) tuple from the location column
combined_data['point'] = combined_data['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split the point column into latitude, longitude and altitude columns
combined_data[['latitude', 'longitude', 'altitude']] = pd.DataFrame(combined_data['point'].tolist(), index=combined_data.index)
RateLimiter caught an error, retrying (0/2 tries). Called with (*('Marseille 3e Arrondissement',), **{}).
[Traceback truncated: a socket timeout while connecting to nominatim.openstreetmap.org (connect timeout=1) propagated through urllib3 and requests up to geopy.exc.GeocoderUnavailable; the RateLimiter then retried the call.]
combined_data=combined_data.sort_values("Rank")
combined_data
codePostal | nomCommune | libelleAcheminement | Population, 2017 | Rank | location | point | latitude | longitude | altitude | |
---|---|---|---|---|---|---|---|---|---|---|
26300 | 75009 | Paris 9e Arrondissement | PARIS | 2187526 | 1 | (Paris 9e Arrondissement, Paris, Île-de-France... | (48.876019, 2.339962, 0.0) | 48.876019 | 2.339962 | 0.0 |
26301 | 75010 | Paris 10e Arrondissement | PARIS | 2187526 | 1 | (Paris 10e Arrondissement, Paris, Île-de-Franc... | (48.876106, 2.35991, 0.0) | 48.876106 | 2.359910 | 0.0 |
26299 | 75008 | Paris 8e Arrondissement | PARIS | 2187526 | 1 | (Paris 8e Arrondissement, Paris, Île-de-France... | (48.8774799, 2.31765, 0.0) | 48.877480 | 2.317650 | 0.0 |
26296 | 75005 | Paris 5e Arrondissement | PARIS | 2187526 | 1 | (Paris 5e Arrondissement, Paris, Île-de-France... | (48.8460591, 2.3445228, 0.0) | 48.846059 | 2.344523 | 0.0 |
26297 | 75006 | Paris 6e Arrondissement | PARIS | 2187526 | 1 | (Paris 6e Arrondissement, Paris, Île-de-France... | (48.8504333, 2.3329507, 0.0) | 48.850433 | 2.332951 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
31681 | 92240 | Malakoff | MALAKOFF | 30720 | 266 | (Malakoff, Antony, Hauts-de-Seine, Île-de-Fran... | (48.8211559, 2.3019814, 0.0) | 48.821156 | 2.301981 | 0.0 |
7126 | 28410 | Goussainville | GOUSSAINVILLE | 30637 | 267 | (Goussainville, Sarcelles, Val-d'Oise, Île-de-... | (49.0323168, 2.4733628, 0.0) | 49.032317 | 2.473363 | 0.0 |
31870 | 95190 | Goussainville | GOUSSAINVILLE | 30637 | 267 | (Goussainville, Sarcelles, Val-d'Oise, Île-de-... | (49.0323168, 2.4733628, 0.0) | 49.032317 | 2.473363 | 0.0 |
27599 | 78400 | Chatou | CHATOU | 30253 | 270 | (Chatou, Saint-Germain-en-Laye, Yvelines, Île-... | (48.8897044, 2.1573695, 0.0) | 48.889704 | 2.157370 | 0.0 |
27600 | 78110 | Chatou | CHATOU | 30253 | 270 | (Chatou, Saint-Germain-en-Laye, Yvelines, Île-... | (48.8897044, 2.1573695, 0.0) | 48.889704 | 2.157370 | 0.0 |
344 rows × 10 columns
Drop the rows of combined_data with missing values
combined_data=combined_data.dropna()
# Creating the map of France
map_france = folium.Map(location=[48.866667, 2.333333], zoom_start=6)
# adding markers to map
for latitude, longitude, nomCommune, Rank in zip(combined_data['latitude'], combined_data['longitude'], combined_data['nomCommune'], combined_data['Rank']):
label = '{}, {},{},{}'.format(latitude, longitude, nomCommune, Rank)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[latitude, longitude],
radius=5,
popup=label,
color='red',
fill=True
).add_to(map_france)
map_france
We set our Foursquare API credentials (CLIENT_ID, CLIENT_SECRET, VERSION) here and removed them from the notebook for privacy.
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'Venue Category']
    return nearby_venues
venues_in_france = getNearbyVenues(combined_data['nomCommune'], combined_data['latitude'], combined_data['longitude'])
Paris 9e Arrondissement Paris 10e Arrondissement Paris 8e Arrondissement Paris 5e Arrondissement Paris 6e Arrondissement ...
[Output truncated: the function prints the name of each queried location, from the Paris arrondissements down to Malakoff, Goussainville and Chatou.]
venues_in_france.shape
(9728, 5)
So we have 9728 records and 5 columns. Let's check a sample of the data:
venues_in_france.groupby('Neighbourhood').head()
Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Category | |
---|---|---|---|---|---|
0 | Paris 9e Arrondissement | 48.876019 | 2.339962 | Caillebotte | French Restaurant |
1 | Paris 9e Arrondissement | 48.876019 | 2.339962 | Le Bouclier de Bacchus | Wine Bar |
2 | Paris 9e Arrondissement | 48.876019 | 2.339962 | So Nat | Vegetarian / Vegan Restaurant |
3 | Paris 9e Arrondissement | 48.876019 | 2.339962 | Farine & O | Bakery |
4 | Paris 9e Arrondissement | 48.876019 | 2.339962 | Juste | Seafood Restaurant |
... | ... | ... | ... | ... | ... |
9690 | Chatou | 48.889704 | 2.157370 | Île des Impressionnistes | Island |
9691 | Chatou | 48.889704 | 2.157370 | Au Bureau | Pub |
9692 | Chatou | 48.889704 | 2.157370 | Les rives de la Courtille | French Restaurant |
9693 | Chatou | 48.889704 | 2.157370 | Monoprix | Supermarket |
9694 | Chatou | 48.889704 | 2.157370 | Planet Sushi | Japanese Restaurant |
942 rows × 5 columns
venues_in_france.groupby('Venue Category').max()
Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | |
---|---|---|---|---|
Venue Category | ||||
ATM | Saint-Louis | 38.626804 | -90.199410 | U.S. Bank ATM |
Afghan Restaurant | Paris 17e Arrondissement | 48.884224 | 2.379703 | Buzkashi |
African Restaurant | Schiltigheim | 48.889343 | 7.748449 | Waly Fay |
Airport Terminal | Tremblay-en-France | 48.980204 | 2.558956 | Ladies Room |
Alsatian Restaurant | Strasbourg | 48.584614 | 7.750713 | Wistub de la Petite Venise |
... | ... | ... | ... | ... |
Wine Bar | Vannes | 50.636565 | 7.750713 | Ze Bar |
Wine Shop | Reims | 49.257789 | 7.013442 | Veuve Clicquot |
Wings Joint | Suresnes | 48.871099 | 2.228400 | JFC |
Women's Store | Toulon | 48.867684 | 5.930492 | Mango |
Yoga Studio | Paris 9e Arrondissement | 48.876019 | 2.339962 | Bikram Yoga |
323 rows × 4 columns
We have 323 venue categories.
france_venue_cat = pd.get_dummies(venues_in_france[['Venue Category']], prefix="", prefix_sep="")
france_venue_cat
ATM | Afghan Restaurant | African Restaurant | Airport Terminal | Alsatian Restaurant | American Restaurant | Antique Shop | Aquarium | Argentinian Restaurant | Art Gallery | ... | Venezuelan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Water Park | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9723 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9724 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
9725 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9726 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9727 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9728 rows × 323 columns
Adding the neighbourhood back to the one-hot-encoded dataframe
france_venue_cat['Neighbourhood'] = venues_in_france['Neighbourhood']
# moving neighborhood column to the first column
fixed_columns = [france_venue_cat.columns[-1]] + list(france_venue_cat.columns[:-1])
france_venue_cat = france_venue_cat[fixed_columns]
# Grouping and calculating the mean
france_grouped = france_venue_cat.groupby('Neighbourhood').mean().reset_index()
france_grouped.head()
Neighbourhood | ATM | Afghan Restaurant | African Restaurant | Airport Terminal | Alsatian Restaurant | American Restaurant | Antique Shop | Aquarium | Argentinian Restaurant | ... | Venezuelan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Water Park | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Agen | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | Ajaccio | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | Albi | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | Alfortville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | Amiens | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 324 columns
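The one-hot / group-mean pattern above can be illustrated on a toy dataset (the neighbourhood and category names here are invented for the example):

```python
import pandas as pd

# Toy venue list: each row is one venue found near a neighbourhood
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Hotel', 'Hotel'],
})

# One-hot encode the categories, as with pd.get_dummies on the real data
onehot = pd.get_dummies(toy['Venue Category']).astype(int)
onehot['Neighbourhood'] = toy['Neighbourhood']

# Grouping by neighbourhood and taking the mean turns the 0/1 columns
# into per-neighbourhood category frequencies
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
```

Neighbourhood A has three venues, two of them bars, so its `Bar` frequency is 2/3; B's single venue is a hotel, so its `Hotel` frequency is 1.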
Let's write a function that returns the most common venue categories for a given row
def return_most_common_venues(row, num_top_venues):
    # skip the neighbourhood name, then sort the categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
There are far too many venue categories to read at once, so we summarize each neighbourhood by its 10 most common categories before clustering
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = france_grouped['Neighbourhood']

for ind in np.arange(france_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(france_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Agen | Bar | Cosmetics Shop | Diner | Department Store | Mobile Phone Shop | Plaza | Bookstore | Multiplex | Hotel | French Restaurant |
1 | Ajaccio | French Restaurant | Restaurant | Chinese Restaurant | Steakhouse | Supermarket | Bistro | Grocery Store | Harbor / Marina | Boat or Ferry | Plaza |
2 | Albi | Restaurant | French Restaurant | Historic Site | Tea Room | Multiplex | Pub | Farmers Market | Garden | Bar | Performing Arts Venue |
3 | Alfortville | Supermarket | Bus Stop | Convenience Store | Music Venue | Pool | Bakery | Plaza | Flea Market | Park | Outdoor Sculpture |
4 | Amiens | Bar | Hotel | Plaza | Restaurant | Italian Restaurant | Supermarket | Fast Food Restaurant | Clothing Store | Japanese Restaurant | Department Store |
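A quick sanity check of `return_most_common_venues` on a hand-made frequency row (the names and frequencies are invented for the example):

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the neighbourhood name, then sort the categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# One row of a grouped-frequency frame: name first, then category shares
row = pd.Series({'Neighbourhood': 'Toy', 'Bar': 0.5, 'Hotel': 0.3, 'Park': 0.2})
top2 = return_most_common_venues(row, 2)
```

The function returns the category names ordered by descending frequency, here `Bar` then `Hotel`.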
Let's build the k-means model to cluster our neighbourhoods
# set number of clusters
k_num_clusters = 5
france_grouped_clustering = france_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(france_grouped_clustering)
kmeans
KMeans(n_clusters=5, random_state=0)
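Here k = 5 is fixed by hand; a common cross-check is the elbow method, where inertia (within-cluster sum of squares) is plotted against k and the point where the curve flattens suggests a reasonable choice. A minimal sketch on synthetic data (random numbers stand in for the venue-frequency matrix):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((50, 10))  # stand-in for france_grouped_clustering

# Inertia shrinks as k grows; look for the "elbow" where it flattens
inertias = [KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_
            for k in range(1, 8)]
```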
Checking the labels assigned by the model
kmeans.labels_[0:100]
array([1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 1, 0, 2, 1, 1, 2, 1, 1, 1, 4, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2])
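To see how the neighbourhoods spread across the five clusters, `np.bincount` can tally the labels; a toy stand-in for `kmeans.labels_`:

```python
import numpy as np

labels = np.array([1, 1, 2, 0, 2, 1])  # toy stand-in for kmeans.labels_
counts = np.bincount(labels)           # occurrences of each label 0..max
```

Here `counts` is `[1, 3, 2]`: one point in cluster 0, three in cluster 1, two in cluster 2.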
Let's add the cluster label column to the dataframe of top 10 common venue categories
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
We join combined_data with neighborhoods_venues_sorted on the commune name to attach the cluster label and top venues to each commune's latitude and longitude, ready for plotting
france_merged = combined_data.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='nomCommune')
france_merged.head()
codePostal | nomCommune | libelleAcheminement | Population, 2017 | Rank | location | point | latitude | longitude | altitude | ... | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
26300 | 75009 | Paris 9e Arrondissement | PARIS | 2187526 | 1 | (Paris 9e Arrondissement, Paris, Île-de-France... | (48.876019, 2.339962, 0.0) | 48.876019 | 2.339962 | 0.0 | ... | French Restaurant | Hotel | Bar | Burger Joint | Restaurant | Wine Bar | Tea Room | Vegetarian / Vegan Restaurant | Bakery | Bistro |
26301 | 75010 | Paris 10e Arrondissement | PARIS | 2187526 | 1 | (Paris 10e Arrondissement, Paris, Île-de-Franc... | (48.876106, 2.35991, 0.0) | 48.876106 | 2.359910 | 0.0 | ... | French Restaurant | Hotel | Coffee Shop | Bar | Café | Bistro | Restaurant | Pizza Place | Indian Restaurant | Breakfast Spot |
26299 | 75008 | Paris 8e Arrondissement | PARIS | 2187526 | 1 | (Paris 8e Arrondissement, Paris, Île-de-France... | (48.8774799, 2.31765, 0.0) | 48.877480 | 2.317650 | 0.0 | ... | Hotel | French Restaurant | Bistro | Pub | Pizza Place | Restaurant | Sandwich Place | Thai Restaurant | Sushi Restaurant | Bakery |
26296 | 75005 | Paris 5e Arrondissement | PARIS | 2187526 | 1 | (Paris 5e Arrondissement, Paris, Île-de-France... | (48.8460591, 2.3445228, 0.0) | 48.846059 | 2.344523 | 0.0 | ... | French Restaurant | Hotel | Bar | Italian Restaurant | Indie Movie Theater | Pub | Café | Bakery | Ice Cream Shop | Plaza |
26297 | 75006 | Paris 6e Arrondissement | PARIS | 2187526 | 1 | (Paris 6e Arrondissement, Paris, Île-de-France... | (48.8504333, 2.3329507, 0.0) | 48.850433 | 2.332951 | 0.0 | ... | French Restaurant | Italian Restaurant | Plaza | Café | Wine Bar | Chocolate Shop | Bistro | Ice Cream Shop | Seafood Restaurant | Fountain |
5 rows × 21 columns
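The `join(..., on=...)` pattern above matches a column of the left frame against the index of the right frame, keeping all left rows (a left join) and leaving NaN where no match exists. A toy version with invented values:

```python
import pandas as pd

# Left table: one row per postcode, keyed by commune name
left = pd.DataFrame({'codePostal': ['75009', '33000'],
                     'nomCommune': ['Paris 9e Arrondissement', 'Bordeaux']})

# Right table: one row per neighbourhood with its cluster label
right = pd.DataFrame({'Neighbourhood': ['Bordeaux'],
                      'Cluster Labels': [1]})

# left['nomCommune'] is matched against the right frame's index
merged = left.join(right.set_index('Neighbourhood'), on='nomCommune')
```

Bordeaux picks up its cluster label, while the Paris row (no match) gets NaN, which is why the real notebook drops NaN labels before mapping.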
# drop communes without a cluster label (no venues were found for them)
france_merged_nonan = france_merged.dropna(subset=['Cluster Labels'])
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters: one hex color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(france_merged_nonan['latitude'], france_merged_nonan['longitude'], france_merged_nonan['nomCommune'], france_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) + 1) + ': ' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)]
    ).add_to(map_clusters)
map_clusters
Cluster1 = france_merged[france_merged['Cluster Labels'] == 1]
Selecting the top 10 cities within Cluster 1
Cluster1[Cluster1["Rank"]<= 10]
codePostal | nomCommune | libelleAcheminement | Population, 2017 | Rank | location | point | latitude | longitude | altitude | ... | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
26300 | 75009 | Paris 9e Arrondissement | PARIS | 2187526 | 1 | (Paris 9e Arrondissement, Paris, Île-de-France... | (48.876019, 2.339962, 0.0) | 48.876019 | 2.339962 | 0.0 | ... | French Restaurant | Hotel | Bar | Burger Joint | Restaurant | Wine Bar | Tea Room | Vegetarian / Vegan Restaurant | Bakery | Bistro |
26301 | 75010 | Paris 10e Arrondissement | PARIS | 2187526 | 1 | (Paris 10e Arrondissement, Paris, Île-de-Franc... | (48.876106, 2.35991, 0.0) | 48.876106 | 2.359910 | 0.0 | ... | French Restaurant | Hotel | Coffee Shop | Bar | Café | Bistro | Restaurant | Pizza Place | Indian Restaurant | Breakfast Spot |
26299 | 75008 | Paris 8e Arrondissement | PARIS | 2187526 | 1 | (Paris 8e Arrondissement, Paris, Île-de-France... | (48.8774799, 2.31765, 0.0) | 48.877480 | 2.317650 | 0.0 | ... | Hotel | French Restaurant | Bistro | Pub | Pizza Place | Restaurant | Sandwich Place | Thai Restaurant | Sushi Restaurant | Bakery |
26296 | 75005 | Paris 5e Arrondissement | PARIS | 2187526 | 1 | (Paris 5e Arrondissement, Paris, Île-de-France... | (48.8460591, 2.3445228, 0.0) | 48.846059 | 2.344523 | 0.0 | ... | French Restaurant | Hotel | Bar | Italian Restaurant | Indie Movie Theater | Pub | Café | Bakery | Ice Cream Shop | Plaza |
26297 | 75006 | Paris 6e Arrondissement | PARIS | 2187526 | 1 | (Paris 6e Arrondissement, Paris, Île-de-France... | (48.8504333, 2.3329507, 0.0) | 48.850433 | 2.332951 | 0.0 | ... | French Restaurant | Italian Restaurant | Plaza | Café | Wine Bar | Chocolate Shop | Bistro | Ice Cream Shop | Seafood Restaurant | Fountain |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9089 | 33300 | Bordeaux | BORDEAUX | 254436 | 9 | (Bordeaux, Gironde, Nouvelle-Aquitaine, France... | (44.841225, -0.5800364, 0.0) | 44.841225 | -0.580036 | 0.0 | ... | Plaza | French Restaurant | Coffee Shop | Hotel | Pedestrian Plaza | Shopping Mall | Multiplex | Electronics Store | Bistro | Bakery |
9088 | 33000 | Bordeaux | BORDEAUX | 254436 | 9 | (Bordeaux, Gironde, Nouvelle-Aquitaine, France... | (44.841225, -0.5800364, 0.0) | 44.841225 | -0.580036 | 0.0 | ... | Plaza | French Restaurant | Coffee Shop | Hotel | Pedestrian Plaza | Shopping Mall | Multiplex | Electronics Store | Bistro | Bakery |
19062 | 59777 | Lille | LILLE | 232787 | 10 | (Lille, Nord, Hauts-de-France, France métropol... | (50.6365654, 3.0635282, 0.0) | 50.636565 | 3.063528 | 0.0 | ... | French Restaurant | Bar | Plaza | Bakery | Burger Joint | Cocktail Bar | Café | Japanese Restaurant | Coffee Shop | Hotel |
19060 | 59000 | Lille | LILLE | 232787 | 10 | (Lille, Nord, Hauts-de-France, France métropol... | (50.6365654, 3.0635282, 0.0) | 50.636565 | 3.063528 | 0.0 | ... | French Restaurant | Bar | Plaza | Bakery | Burger Joint | Cocktail Bar | Café | Japanese Restaurant | Coffee Shop | Hotel |
19059 | 59800 | Lille | LILLE | 232787 | 10 | (Lille, Nord, Hauts-de-France, France métropol... | (50.6365654, 3.0635282, 0.0) | 50.636565 | 3.063528 | 0.0 | ... | French Restaurant | Bar | Plaza | Bakery | Burger Joint | Cocktail Bar | Café | Japanese Restaurant | Coffee Shop | Hotel |
64 rows × 21 columns
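This two-step boolean filtering (select one cluster, then keep only the top ranks) can be seen on a toy frame with invented values:

```python
import pandas as pd

df = pd.DataFrame({'nomCommune': ['Paris', 'Lille', 'Agen'],
                   'Rank': [1, 10, 42],
                   'Cluster Labels': [1, 1, 2]})

# Same pattern as above: filter by cluster, then by population rank
cluster1 = df[df['Cluster Labels'] == 1]
top10 = cluster1[cluster1['Rank'] <= 10]
```

Agen is excluded by the cluster mask and any city ranked below 10 by the rank mask, leaving Paris and Lille.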
Cluster4 = france_merged[france_merged['Cluster Labels'] == 4]
Cluster4[Cluster4["Rank"] <= 10]
codePostal | nomCommune | libelleAcheminement | Population, 2017 | Rank | location | point | latitude | longitude | altitude | ... | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
---|
0 rows × 21 columns
Cluster 1 analysis: in general, the most common venue is the French restaurant. The top 10 French cities bet on their comparative advantage in food.
This cluster also emphasizes the touristic dimension of France's big cities: hotels, bars, cafés, plazas, then some museums.
To this touristic dimension we can add an important commercial dimension for the top cities of France: there are many shopping venues, such as clothing shops and food stores.
Cluster 4 is more about daily life in the top cities, with parks, pedestrian plazas, metro stations, bakeries and so on.
In addition, we can notice a strongly multicultural character in the neighbourhoods of these cities, with Indian, Italian, Greek and other restaurants.
Food remains very important in these cities, which lets us form a hypothesis about the daily life of the French.
Modes of transport differ from city to city: tram in Lyon, metro in Paris.
The aim of this project was to establish a comprehensive comparison between cities in France using the k-means technique. In doing so, we can study the attractiveness of these cities, which is also their specificity.
Using a complete database, we assigned a cluster to every city in France. This gives us, for the future, access to a complete comparison of France. For the sake of this task, we reduced our analysis to the top 10 cities of France by cross-referencing the Wikipedia data.
We first observe that the neighbourhoods of the top French cities are similar: restaurants, bakeries, bars, museums, and so on. This shows the specialization of France in the tourism sector. Each city then has its own cultural specificity; some cities show an important diversity, linked with their immigrant communities.
In this period of Covid-19, this study gives us reason to worry about the economy of these cities, which relies heavily on trade and tourist venues, and to raise the alarm with the competent authorities.
How can the economic contributions of these flows be replaced when borders are closed and shops are forbidden to open?
Will the cities and their neighbourhoods change as a result of this crisis? Will we get the same clustering in two years? This interesting question remains open at the end of this project.