Commit 8159c65

DOC fix typo in output shape of fetch_lfw_pairs (and minor additions)
1 parent 5ff34b2 commit 8159c65

1 file changed (+31 -21 lines changed)

sklearn/datasets/lfw.py

Lines changed: 31 additions & 21 deletions
@@ -30,7 +30,7 @@
 import numpy as np

 try:
-    import urllib.request as urllib #for backwards compatibility
+    import urllib.request as urllib # for backwards compatibility
 except ImportError:
     import urllib

@@ -231,33 +231,36 @@ def fetch_lfw_people(data_home=None, funneled=True, resize=0.5,
     picture of a face, find the name of the person given a training set
     (gallery).

+    The original images are 250 x 250 pixels, but the default slice and resize
+    arguments reduce them to 62 x 74.
+
     Parameters
     ----------
-    data_home: optional, default: None
+    data_home : optional, default: None
         Specify another download and cache folder for the datasets. By default
         all scikit learn data is stored in '~/scikit_learn_data' subfolders.

-    funneled: boolean, optional, default: True
+    funneled : boolean, optional, default: True
         Download and use the funneled variant of the dataset.

-    resize: float, optional, default 0.5
+    resize : float, optional, default 0.5
         Ratio used to resize the each face picture.

-    min_faces_per_person: int, optional, default None
+    min_faces_per_person : int, optional, default None
         The extracted dataset will only retain pictures of people that have at
         least `min_faces_per_person` different pictures.

-    color: boolean, optional, default False
+    color : boolean, optional, default False
         Keep the 3 RGB channels instead of averaging them to a single
         gray level channel. If color is True the shape of the data has
         one more dimension than than the shape with color = False.

-    slice_: optional
+    slice_ : optional
         Provide a custom 2D slice (height, width) to extract the
         'interesting' part of the jpeg files and avoid use statistical
         correlation from the background

-    download_if_missing: optional, True by default
+    download_if_missing : optional, True by default
         If False, raise a IOError if the data is not locally available
         instead of trying to download the data from the source site.

@@ -267,11 +270,13 @@ def fetch_lfw_people(data_home=None, funneled=True, resize=0.5,

     dataset.data : numpy array of shape (13233, 2914)
         Each row corresponds to a ravelled face image of original size 62 x 47
-        pixels.
+        pixels. Changing the ``slice_`` or resize parameters will change the shape
+        of the output.

     dataset.images : numpy array of shape (13233, 62, 47)
         Each row is a face image corresponding to one of the 5749 people in
-        the dataset.
+        the dataset. Changing the ``slice_`` or resize parameters will change the shape
+        of the output.

     dataset.target : numpy array of shape (13233,)
         Labels associated to each face image. Those labels range from 0-5748
@@ -389,36 +394,39 @@ def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5,

     .. _`README.txt`: http://vis-www.cs.umass.edu/lfw/README.txt

+    The original images are 250 x 250 pixels, but the default slice and resize
+    arguments reduce them to 62 x 74.
+
     Parameters
     ----------
-    subset: optional, default: 'train'
+    subset : optional, default: 'train'
         Select the dataset to load: 'train' for the development training
         set, 'test' for the development test set, and '10_folds' for the
         official evaluation set that is meant to be used with a 10-folds
         cross validation.

-    data_home: optional, default: None
+    data_home : optional, default: None
         Specify another download and cache folder for the datasets. By
         default all scikit learn data is stored in '~/scikit_learn_data'
         subfolders.

-    funneled: boolean, optional, default: True
+    funneled : boolean, optional, default: True
         Download and use the funneled variant of the dataset.

-    resize: float, optional, default 0.5
+    resize : float, optional, default 0.5
         Ratio used to resize the each face picture.

-    color: boolean, optional, default False
+    color : boolean, optional, default False
         Keep the 3 RGB channels instead of averaging them to a single
         gray level channel. If color is True the shape of the data has
         one more dimension than than the shape with color = False.

-    slice_: optional
+    slice_ : optional
         Provide a custom 2D slice (height, width) to extract the
         'interesting' part of the jpeg files and avoid use statistical
         correlation from the background

-    download_if_missing: optional, True by default
+    download_if_missing : optional, True by default
         If False, raise a IOError if the data is not locally available
         instead of trying to download the data from the source site.

@@ -427,12 +435,14 @@ def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5,
     The data is returned as a Bunch object with the following attributes:

     data : numpy array of shape (2200, 5828)
-        Each row corresponds to 2 ravel'd face images of original size 62 x 67
-        pixels.
+        Each row corresponds to 2 ravel'd face images of original size 62 x 47
+        pixels. Changing the ``slice_`` or resize parameters will change the shape
+        of the output.

-    pairs : numpy array of shape (2200, 2, 62, 67)
+    pairs : numpy array of shape (2200, 2, 62, 47)
         Each row has 2 face images corresponding to same or different person
-        from the dataset containing 5749 people.
+        from the dataset containing 5749 people. Changing the ``slice_`` or resize
+        parameters will change the shape of the output.

     target : numpy array of shape (13233,)
         Labels associated to each pair of images. The two label values being
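
The shapes in the docstring follow from the slice and resize arithmetic, which the sketch below checks without downloading the dataset. The default `slice_` value used here, `(slice(70, 195), slice(78, 172))`, is an assumption, since the full function signature is not visible in this diff. The helper `output_shape` is hypothetical and only mirrors the "crop, then scale" idea described in the docstring. The count of 2200 pairs is taken from the documented `data` shape.

```python
import numpy as np

# Assumed defaults; neither value is fully visible in the diff above.
default_slice = (slice(70, 195), slice(78, 172))  # (height, width) crop
resize = 0.5


def output_shape(slice_, resize):
    """Hypothetical helper: image size after cropping and then resizing."""
    h_slice, w_slice = slice_
    h = int((h_slice.stop - h_slice.start) * resize)
    w = int((w_slice.stop - w_slice.start) * resize)
    return h, w


h, w = output_shape(default_slice, resize)

# Placeholder array standing in for `pairs`: 2200 pairs of h x w images.
pairs = np.zeros((2200, 2, h, w), dtype=np.uint8)

# `data` is described as the ravelled version of each pair.
data = pairs.reshape(len(pairs), -1)

print(h, w)        # image height and width
print(data.shape)  # one row per pair, 2 * h * w columns
```

Under these assumptions the crop is 125 x 94 pixels and halving it gives 62 x 47, so each pair flattens to 2 * 62 * 47 = 5828 values and a single image to 62 * 47 = 2914. This agrees with the `(2200, 2, 62, 47)` and `(2200, 5828)` shapes in the corrected docstring.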
