This module contains the basic functions used by the other modules of the fastai library (it is split from torch_core, which contains the functions that require pytorch). Its documentation can safely be skipped on a first read, unless you want to know what a given function does.
from fastai.gen_doc.nbdoc import *
from fastai.core import *
default_cpus = min(16, num_cpus())
show_doc(has_arg)
Examples with two other fastai.core functions, checking whether each accepts an argument with the given name:
has_arg(download_url,'url')
True
has_arg(index_row,'x')
False
has_arg(index_row,'a')
True
show_doc(ifnone)
param,alt_param = None,5
ifnone(param,alt_param)
5
param,alt_param = None,[1,2,3]
ifnone(param,alt_param)
[1, 2, 3]
show_doc(is1d)
two_d_array = np.arange(12).reshape(6,2)
print( two_d_array )
print( is1d(two_d_array) )
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
False
is1d(two_d_array.flatten())
True
show_doc(is_listy)
is_listy[source]
is_listy(x:Any) → bool
Check if x is a Collection; a tuple or list qualifies.
some_data = [1,2,3]
is_listy(some_data)
True
some_data = (1,2,3)
is_listy(some_data)
True
some_data = 1024
print( is_listy(some_data) )
False
print( is_listy( [some_data] ) )
True
some_data = dict([('a',1),('b',2),('c',3)])
print( some_data )
print( some_data.keys() )
{'a': 1, 'b': 2, 'c': 3}
dict_keys(['a', 'b', 'c'])
print( is_listy(some_data) )
print( is_listy(some_data.keys()) )
False
False
print( is_listy(list(some_data.keys())) )
True
show_doc(is_tuple)
is_tuple[source]
is_tuple(x:Any) → bool
Check if x is a tuple.
print( is_tuple( [1,2,3] ) )
False
print( is_tuple( (1,2,3) ) )
True
show_doc(arange_of)
arange_of([5,6,7])
array([0, 1, 2])
type(arange_of([5,6,7]))
numpy.ndarray
show_doc(array)
array[source]
array(a,dtype:type=None,**kwargs) → ndarray
Same as np.array but also handles generators. kwargs are passed to np.array with dtype.
array([1,2,3])
array([1, 2, 3])
Note that the generator is not reset after items are consumed from it, so the array call below has 5 fewer entries than it would if we had started from the beginning of the generator.
def data_gen():
i = 100.01
while i<200:
yield i
i += 1.
ex_data_gen = data_gen()
for _ in range(5):
print(next(ex_data_gen))
100.01
101.01
102.01
103.01
104.01
array(ex_data_gen)
array([105.01, 106.01, 107.01, 108.01, ..., 196.01, 197.01, 198.01, 199.01])
ex_data_gen_int = data_gen()
array(ex_data_gen_int,dtype=int) #Cast output to int array
array([100, 101, 102, 103, ..., 196, 197, 198, 199])
show_doc(arrays_split)
arrays_split[source]
arrays_split(mask:ndarray,*arrs:NPArrayableList) → SplitArrayList
Given arrs is [a,b,...] and a mask index, return [(a[mask],a[~mask]), (b[mask],b[~mask]), ...].
data_a = np.arange(15)
data_b = np.arange(15)[::-1]
mask_a = (data_a > 10)
print(data_a)
print(data_b)
print(mask_a)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[14 13 12 11 10  9  8  7  6  5  4  3  2  1  0]
[False False False False False False False False False False False  True  True  True  True]
arrays_split(mask_a,data_a)
[(array([11, 12, 13, 14]),), (array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),)]
np.vstack([data_a,data_b]).transpose().shape
(15, 2)
arrays_split(mask_a,np.vstack([data_a,data_b]).transpose()) #must match on dimension 0
[(array([[11, 3],
[12, 2],
[13, 1],
[14, 0]]),), (array([[ 0, 14],
[ 1, 13],
[ 2, 12],
[ 3, 11],
[ 4, 10],
[ 5, 9],
[ 6, 8],
[ 7, 7],
[ 8, 6],
[ 9, 5],
[10, 4]]),)]
show_doc(chunks)
You can transform a Collection into an Iterable of n-sized chunks by calling chunks:
data = [0,1,2,3,4,5,6,7,8,9]
for chunk in chunks(data, 2):
print(chunk)
[0, 1]
[2, 3]
[4, 5]
[6, 7]
[8, 9]
for chunk in chunks(data, 3):
print(chunk)
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
show_doc(df_names_to_idx)
df_names_to_idx[source]
df_names_to_idx(names:IntsOrStrs,df:DataFrame)
Return the column indexes of names in df.
ex_df = pd.DataFrame.from_dict({"a":[1,1,1],"b":[2,2,2]})
print(ex_df)
   a  b
0  1  2
1  1  2
2  1  2
df_names_to_idx('b',ex_df)
[1]
show_doc(extract_kwargs)
extract_kwargs[source]
extract_kwargs(names:StrList,kwargs:KWArgs)
Extract the keys in names from the kwargs.
key_word_args = {"a":2,"some_list":[1,2,3],"param":'mean'}
key_word_args
{'a': 2, 'some_list': [1, 2, 3], 'param': 'mean'}
(extracted_val,remainder) = extract_kwargs(['param'],key_word_args)
print( extracted_val,remainder )
{'param': 'mean'} {'a': 2, 'some_list': [1, 2, 3]}
show_doc(idx_dict)
idx_dict(['a','b','c'])
{'a': 0, 'b': 1, 'c': 2}
show_doc(index_row)
index_row[source]
index_row(a:Union[Collection[T_co],DataFrame,Series],idxs:Collection[int]) → Any
Return the slice of a corresponding to idxs.
data = [0,1,2,3,4,5,6,7,8,9]
index_row(data,4)
4
index_row(pd.Series(data),7)
7
data_df = pd.DataFrame([data[::-1],data]).transpose()
data_df
|   | 0 | 1 |
|---|---|---|
| 0 | 9 | 0 |
| 1 | 8 | 1 |
| 2 | 7 | 2 |
| 3 | 6 | 3 |
| 4 | 5 | 4 |
| 5 | 4 | 5 |
| 6 | 3 | 6 |
| 7 | 2 | 7 |
| 8 | 1 | 8 |
| 9 | 0 | 9 |
index_row(data_df,7)
0    2
1    7
Name: 7, dtype: int64
show_doc(listify)
listify[source]
listify(p:OptListOrItem=None,q:OptListOrItem=None)
Make p listy and the same length as q.
to_match = np.arange(12)
listify('a',to_match)
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
listify('a',5)
['a', 'a', 'a', 'a', 'a']
listify(77.1,3)
[77.1, 77.1, 77.1]
listify( (1,2,3) )
[1, 2, 3]
listify((1,2,3),('a','b','c'))
[1, 2, 3]
show_doc(random_split)
random_split[source]
random_split(valid_pct:float,*arrs:NPArrayableList) → SplitArrayList
Randomly split arrs with valid_pct ratio. Good for creating a validation set.
Splitting is done here with random.uniform(), so you may not get the exact split percentage for small data sets.
data = np.arange(20).reshape(10,2)
data.tolist()
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]]
random_split(0.20,data.tolist())
[(array([[ 0, 1],
[ 2, 3],
[ 6, 7],
[ 8, 9],
[10, 11],
[16, 17],
[18, 19]]),), (array([[ 4, 5],
[12, 13],
[14, 15]]),)]
random_split(0.20,pd.DataFrame(data))
[(array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]]),), (array([], shape=(0, 2), dtype=int64),)]
show_doc(range_of)
range_of([5,4,3])
[0, 1, 2]
range_of(np.arange(10)[::-1])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
show_doc(series2cat)
series2cat[source]
series2cat(df:DataFrame,*col_names)
Categorifies the columns col_names in df.
data_df = pd.DataFrame.from_dict({"a":[1,1,1,2,2,2],"b":['f','e','f','g','g','g']})
data_df
|   | a | b |
|---|---|---|
| 0 | 1 | f |
| 1 | 1 | e |
| 2 | 1 | f |
| 3 | 2 | g |
| 4 | 2 | g |
| 5 | 2 | g |
data_df['b']
0    f
1    e
2    f
3    g
4    g
5    g
Name: b, dtype: object
series2cat(data_df,'b')
data_df['b']
0    f
1    e
2    f
3    g
4    g
5    g
Name: b, dtype: category
Categories (3, object): [e < f < g]
series2cat(data_df,'a')
data_df['a']
0    1
1    1
2    1
3    2
4    2
5    2
Name: a, dtype: category
Categories (2, int64): [1 < 2]
show_doc(split_kwargs_by_func)
split_kwargs_by_func[source]
split_kwargs_by_func(kwargs,func)
Split kwargs between those expected by func and the others.
key_word_args = {'url':'http://fast.ai','dest':'./','new_var':[1,2,3],'testvalue':42}
split_kwargs_by_func(key_word_args,download_url)
({'url': 'http://fast.ai', 'dest': './'},
{'new_var': [1, 2, 3], 'testvalue': 42})
show_doc(to_int)
to_int(3.1415)
3
data = [1.2,3.4,7.25]
to_int(data)
[1, 3, 7]
show_doc(uniqueify)
uniqueify( pd.Series(data=['a','a','b','b','f','g']) )
['a', 'b', 'f', 'g']
show_doc(download_url)
download_url[source]
download_url(url:str,dest:str,overwrite:bool=False,pbar:ProgressBar=None,show_progress=True,chunk_size=1048576,timeout=4,retries=5)
Download url to dest unless the file already exists and overwrite is False.
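A quick sketch of typical use (the URL and destination below are placeholders, not real files):
download_url('http://example.com/some_file.tgz', 'data/some_file.tgz')
download_url('http://example.com/some_file.tgz', 'data/some_file.tgz', overwrite=True)  # re-download even if dest exists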
show_doc(find_classes)
find_classes[source]
find_classes(folder:Path) → FilePathList
List of label subdirectories in imagenet-style folder.
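For example, assuming a hypothetical imagenet-style layout with one subdirectory per label:
# data/train/cat/..., data/train/dog/... (hypothetical layout)
find_classes(Path('data/train'))  # -> [PosixPath('data/train/cat'), PosixPath('data/train/dog')]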
show_doc(join_path)
join_path[source]
join_path(fname:PathOrStr,path:PathOrStr='.') → Path
Return Path(path)/Path(fname), path defaults to current dir.
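For example (output shown for a POSIX system):
join_path('file.txt', 'data')
PosixPath('data/file.txt')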
show_doc(join_paths)
join_paths[source]
join_paths(fnames:FilePathList,path:PathOrStr='.') → FilePathList
Join path to every file name in fnames.
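For example:
join_paths(['a.txt','b.txt'], 'data')
[PosixPath('data/a.txt'), PosixPath('data/b.txt')]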
show_doc(loadtxt_str)
loadtxt_str[source]
loadtxt_str(path:PathOrStr) → ndarray
Return ndarray of str of lines of text from path.
show_doc(save_texts)
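save_texts writes a collection of texts to a file, one item per line. A round-trip sketch with loadtxt_str, assuming the save_texts(fname, texts) argument order of fastai v1 (tmp.txt is a scratch file created here):
save_texts('tmp.txt', ['hello', 'world'])
loadtxt_str('tmp.txt')
array(['hello', 'world'], dtype='<U5')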
show_doc(num_cpus)
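num_cpus is used above to set default_cpus; its return value is machine-dependent:
num_cpus()  # e.g. 8 on an 8-core machine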
show_doc(parallel)
parallel[source]
parallel(func,arr:Collection[T_co],max_workers:int=None)
Call func on every element of arr in parallel using max_workers.
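A small sketch; process_item below is a hypothetical worker. Note that in fastai v1 parallel calls func with both the element and its index:
def process_item(item, index): return item * 2  # called as func(element, index)
parallel(process_item, [1,2,3,4], max_workers=2)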
show_doc(partition)
partition[source]
partition(a:Collection[T_co],sz:int) → List[Collection[T_co]]
Split iterable a into equal parts of size sz.
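For example, the last part holds the remainder when len(a) is not a multiple of sz:
partition([1,2,3,4,5], 2)
[[1, 2], [3, 4], [5]]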
show_doc(partition_by_cores)
partition_by_cores[source]
partition_by_cores(a:Collection[T_co],n_cpus:int) → List[Collection[T_co]]
Split the data in a equally among n_cpus cores.
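A sketch; the exact part sizes depend on how the size per core is rounded:
partition_by_cores(list(range(10)), 3)  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]] with fastai v1's rounding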
show_doc(ItemBase, title_level=3)
All items used in fastai should subclass this. Must have a data field that will be used when collating in mini-batches.
show_doc(ItemBase.apply_tfms)
show_doc(ItemBase.show)
The default behavior is to set the string representation of this object as the title of ax.
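A minimal sketch of a custom item (MyItem is hypothetical; only the data field is required for collation):
class MyItem(ItemBase):
    def __init__(self, data): self.data = data  # `data` is what gets collated into mini-batches
    def __str__(self): return f'MyItem({self.data})'  # used by `show` as the default ax title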
show_doc(Category, title_level=3)
show_doc(EmptyLabel, title_level=3)
show_doc(MultiCategory, title_level=3)
Create a MultiCategory with an obj that is a collection of labels. data corresponds to the one-hot encoded labels and raw is a list of the associated strings.
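A sketch for a hypothetical 4-class problem; treat the positional order data, obj, raw as an assumption:
mc = MultiCategory(np.array([1.,0.,1.,0.]), ['cat','bird'], ['cat','bird'])  # classes 0 and 2 active
mc.data  # one-hot encoded labels; mc.raw holds the label strings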
show_doc(FloatItem)
show_doc(camel2snake)
camel2snake('DeviceDataLoader')
'device_data_loader'
show_doc(even_mults)
even_mults[source]
even_mults(start:float,stop:float,n:int) → ndarray
Build log-stepped array from start to stop in n steps.
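For example, stepping a learning rate geometrically (each entry here is 10x the previous):
even_mults(1e-5, 1e-3, 3)  # -> array([1.e-05, 1.e-04, 1.e-03])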
show_doc(func_args)
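func_args returns the argument names of a function; a sketch, assuming the fastai v1 behavior of reading them from func.__code__:
func_args(ifnone)  # -> ('a', 'b')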
show_doc(noop)
noop[source]
noop(x)
Return x.
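noop is handy as a default transform or callback that leaves its input unchanged:
noop(7)
7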
show_doc(one_hot)
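A sketch, assuming one_hot(x, c) encodes the indices in x over c classes as in fastai v1:
one_hot([0, 2], 5)  # -> array([1., 0., 1., 0., 0.], dtype=float32)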
show_doc(show_some)
show_some[source]
show_some(items:Collection[T_co],n_max:int=5,sep:str=',')
Return the representation of the first n_max elements in items.
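For example (in fastai v1 a trailing '...' marks that items was truncated at n_max):
show_some([1,2,3,4,5,6,7])  # -> '1,2,3,4,5...'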
show_doc(subplots)
subplots[source]
subplots(rows:int,cols:int,imgsize:int=4,figsize:Optional[Tuple[int,int]]=None,title=None,**kwargs)
Like plt.subplots, but with a consistent axs shape; kwargs are passed to fig.suptitle along with title.
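A sketch, assuming subplots returns the axes as an array (so they keep a consistent 2-D shape even for a single row or column):
axs = subplots(2, 3, imgsize=3, title='My grid')  # 2 rows x 3 cols; title passed to fig.suptitle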
show_doc(text2html_table)
text2html_table[source]
text2html_table(items:Tokens,widths:Collection[int]) → str
Put the texts in items in an HTML table; widths are the widths of the columns in %.
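A sketch with a two-column table, giving each column half the width; the returned string can be rendered in a notebook with IPython.display.HTML:
items = [['text', 'label'], ['hello world', 'positive']]
text2html_table(items, [50, 50])  # widths in % per column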