API¶
-
class
dataswim.
Ds
(df=None, db=None, nbload_libs=True)¶ Bases:
dataswim.db.Db
,dataswim.data.Df
,dataswim.charts.Plot
,dataswim.maps.Map
,dataswim.report.Report
,dataswim.base.DsBase
Main class
-
add
(col: str, value)¶ Add a column with default values
Parameters: - col (str) – column name
- value (any) – column value
Example: ds.add("Col 4", 0)
-
aenc
(key, value)¶ Add an entry to the altair encoding dict
-
altair_encode
= {}¶
-
altair_header_
()¶ Returns html script tags for Altair
-
amap
(lat, long, zoom=13, tiles='map')¶ Sets a map
-
append
(vals: list, index=None)¶ Append a row to the main dataframe
Parameters: - vals (list) – list of the row values to add
- index (any, optional) – index key, defaults to None
Example: ds.append([0, 2, 2, 3, 4])
-
apply
(function, *cols, axis=1, **kwargs)¶ Apply a function on columns values
Parameters: - function (function) – a function to apply to the columns
- cols (name of columns) – columns names
- axis – index (0) or column (1), default is 1
- kwargs (optional) – arguments for df.apply
Example:
    def f(row):
        # add a new column with a value
        row["newcol"] = row["Col 1"] + 1
        return row

    ds.apply(f)
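For reference, a minimal sketch of the same row-wise pattern in plain pandas. The dataframe and its values are made up for illustration, and this does not show how dataswim implements apply internally:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": [1, 2, 3]})

def f(row):
    # add a new column computed from an existing one
    row["newcol"] = row["Col 1"] + 1
    return row

# axis=1 passes each row to the function (the default axis of Ds.apply)
df = df.apply(f, axis=1)
print(df)
```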
-
area_
(label=None, style=None, opts=None, options={})¶ Get an area chart
-
arrow_
(xloc, yloc, text, orientation='v', arrowstyle='->')¶ Returns an arrow for a chart. Params: the text; xloc and yloc are the coordinates used to position the arrow. Orientation is the direction of the arrow: possible values are [<, ^, >, v]. Arrow style is the graphic style of the arrow: possible values are [-, ->, -[, -|>, <->, <|-|>]
-
autoprint
= False¶
-
backup
()¶ Backup the main dataframe
-
backup_df
= None¶
-
bar_
(label=None, style=None, opts=None, options={})¶ Get a bar chart
-
bar_num_
(label=None, style=None, opts=None)¶ Get an Altair bar + number marks chart
-
bokeh_header_
()¶ Returns html script tags for Bokeh
-
chart
(x=None, y=None, chart_type=None, opts=None, style=None, label=None, options={}, **kwargs)¶ Get a chart
-
chart_
(x=None, y=None, chart_type=None, opts=None, style=None, label=None, options={}, **kwargs)¶ Get a chart
-
chart_obj
= None¶
-
chart_opts
= {'width': 880}¶
-
chart_style
= {}¶
-
chartjs_header_
()¶ Returns html script tags for Chartjs
-
circle_
(label=None, style=None, opts=None, options={})¶ Get a circle chart
-
clone_
(quiet=False)¶ Clone the DataSwim instance
Parameters: quiet (bool, optional) – print a message, defaults to False Returns: a dataswim instance Return type: Ds
-
color
(val)¶ Change the chart’s color
-
color_
(i=None)¶ Get a color from the palette
-
color_index
= 0¶
-
cols_
() → pandas.core.frame.DataFrame¶ Returns a dataframe with columns info
Returns: a pandas dataframe Return type: pd.DataFrame Example: ds.cols_()
-
concat
(*dss, **kwargs)¶ Concatenate dataswim instances and set the result as the main dataframe
Parameters: - dss (Ds) – dataswim instances to concatenate
- kwargs – keyword arguments for pd.concat
-
concat_
(*dss, **kwargs)¶ Concatenate dataswim instances and return a new Ds instance
Parameters: - dss (Ds) – dataswim instances to concatenate
- kwargs – keyword arguments for pd.concat
Return type: Ds
-
connect
(url: str)¶ Connect to the database and set it as main database
Parameters: url (str) – path to the database, uses the SQLAlchemy format Example: ds.connect("sqlite:///mydb.sqlite")
-
contains
(column, value)¶ Set the main dataframe instance to rows that contain a string value in a column
-
copycol
(origin_col: str, dest_col: str)¶ Copy a column’s values to another column
Parameters: - origin_col (str) – name of the column to copy
- dest_col (str) – name of the new column
Example: ds.copycol("col 1", "New col")
-
count
()¶ Counts the number of rows of the main dataframe
-
count_
() → int¶ Returns the number of rows of the main dataframe
Returns: number of rows Return type: int
-
count_empty
(field: str)¶ List of empty row indices
Parameters: field (str) – column to count from
-
count_nulls
(field: str)¶ Count the number of null values in a column
Parameters: field (str) – the column to count from
-
count_unique_
(field: str) → int¶ Return the number of unique values in a column
Parameters: field (str) – column to count from Returns: number of unique values Return type: int
-
count_zero
(field: str)¶ List of rows with 0 values
Parameters: field (str) – column to count from
-
cvar_
(col)¶ Returns the coefficient of variation of a column as a percentage
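As an illustration of the quantity involved, the coefficient of variation is the standard deviation divided by the mean, expressed here as a percentage. This sketch uses only the standard library; the sample values are made up, and using the sample standard deviation is an assumption, not something this page states:

```python
import statistics

values = [10.0, 12.0, 8.0, 10.0]

# coefficient of variation: standard deviation relative to the mean
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (assumption)
cvar = std / mean * 100
print(round(cvar, 2))
```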
-
datapath
= None¶
-
date
(col: str, **kwargs)¶ Convert a column to date type
Parameters: - col (str) – column name
- **kwargs (optional) – keyword arguments for
pd.to_datetime
Example: ds.date("mycol")
-
dateindex
(col: str)¶ Set a datetime index from a column
Parameters: col (str) – column name where to index the date from Example: ds.dateindex("mycol")
-
dateparser
(dformat='%d/%m/%Y')¶ Returns a date parser for pandas
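A date parser of this kind can be pictured as a function that turns a string into a datetime using the given format. The helper name make_dateparser below is hypothetical and only illustrates the default format '%d/%m/%Y' (day/month/year):

```python
from datetime import datetime

def make_dateparser(dformat="%d/%m/%Y"):
    # hypothetical helper: returns a function that parses one date string
    def parse(value):
        return datetime.strptime(value, dformat)
    return parse

parser = make_dateparser()
parsed = parser("25/12/2018")
print(parsed)
```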
-
daterange
(datecol: str, date_start: datetime.datetime, op: str, **args)¶ Set the main dataframe rows in a date range
Parameters: - datecol (str) – the column to use for range
- date_start (datetime.datetime) – the date to start from
- op (str) – + or -
-
daterange_
(datecol: str, date_start: datetime.datetime, op: str, **args) → Ds¶ Returns a DataSwim instance with rows in a date range
Parameters: - datecol (str) – the column to use for range
- date_start (datetime.datetime) – the date to start from
- op (str) – + or -
Returns: a dataswim instance
Return type: Ds
-
db
= None¶
-
debug
(*msg)¶ Prints a warning
-
defaults
()¶ Reset the chart options and style to defaults
-
density_
(label=None, style=None, opts=None)¶ Get a Seaborn density chart
-
describe_
()¶ Return a description of the data
Returns: a pandas dataframe Return type: pd.DataFrame Example: ds.describe_()
-
df
= None¶
-
diffm
(diffcol: str, name: str = 'Diff', default=nan)¶ Add a diff column to the main dataframe: calculate the diff from the column mean
Parameters: - diffcol (str) – column to diff from
- name (str, optional) – diff column name, defaults to “Diff”
- default (optional) – column default value, defaults to nan
Example: ds.diffm("Col 1", "New col")
-
diffn
(diffcol: str, name: str = 'Diff')¶ Add a diff column to the main dataframe: calculate the diff from the next value
Parameters: - diffcol (str) – column to diff from
- name (str, optional) – diff column name, defaults to “Diff”
Example: ds.diffn("Col 1", "New col")
-
diffp
(diffcol: str, name: str = 'Diff')¶ Add a diff column to the main dataframe: calculate the diff from the previous value
Parameters: - diffcol (str) – column to diff from
- name (str, optional) – diff column name, defaults to “Diff”
Example: ds.diffp("Col 1", "New col")
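The next-value and previous-value diffs of diffn and diffp can be sketched in plain pandas as follows. The column names and the sign convention (current value minus the other value) are assumptions for illustration, not taken from the dataswim source:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": [1, 4, 9, 16]})

# difference from the previous row (the first row has no previous value)
df["Diff prev"] = df["Col 1"].diff()
# difference from the next row (the last row has no next value)
df["Diff next"] = df["Col 1"] - df["Col 1"].shift(-1)
print(df)
```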
-
diffs
(col: str, serie: iterable, name: str = 'Diff')¶ Add a diff column from a serie. The serie is an iterable of the same length as the dataframe
Parameters: - col (str) – column to diff
- serie (iterable) – serie to diff from
- name (str, optional) – name of the diff col, defaults to “Diff”
Example: ds.diffs("Col 1", [1, 1, 4], "New col")
-
diffsp
(col: str, serie: iterable, name: str = 'Diff')¶ Add a diff column in percentage from a serie. The serie is an iterable of the same length as the dataframe
Parameters: - col (str) – column to diff
- serie (iterable) – serie to diff from
- name (str, optional) – name of the diff col, defaults to “Diff”
Example: ds.diffsp("Col 1", [1, 1, 4], "New col")
-
distrib_
(label=None, style=None, opts=None)¶ Get a Seaborn distribution chart
-
dlinear_
(label=None, style=None, opts=None)¶ Get a Seaborn linear + distribution chart
-
drop
(*cols)¶ Drops columns from the main dataframe
Parameters: cols (str) – names of the columns Example: ds.drop("Col 1", "Col 2")
-
drop_nan
(col: str = None, method: str = 'all', **kwargs)¶ Drop rows with NaN values from the main dataframe
Parameters: - col (str, optional) – name of the column, defaults to None. Drops in
- method (str, optional) – how param for df.dropna, defaults to “all”
- **kwargs (optional) – params for df.dropna
Example: ds.drop_nan("mycol")
-
dropr
(*rows)¶ Drops some rows from the main dataframe
Parameters: rows (list of ints) – rows names Example: ds.dropr(0, 2)
-
dsmap
= None¶
-
end
(*msg)¶ Prints an end message with elapsed time
-
engine
= 'bokeh'¶
-
err
(*args)¶ Handle an error
-
errorbar_
(label=None, style=None, opts=None, options={})¶ Get an error bar chart
-
errors_handling
= 'exceptions'¶
-
exact
(column, *values)¶ Sets the main dataframe to rows that have the exact string value in a column
-
exact_
(column, *values)¶ Returns a Dataswim instance with rows that have the exact string value in a column
-
exclude
(col: str, val)¶ Delete rows based on value
Parameters: - col (str) – column name
- val (any) – value to delete
Example: ds.exclude("Col 1", "value")
-
fdate
(*cols, precision: str = 'S', format: str = None)¶ Convert column values to formatted date string
Parameters: - *cols (str, at least one) – names of the columns
- precision (str, optional) – time precision: Y, M, D, H, Min, S, defaults to “S”
- format (str, optional) – python date format, defaults to None
Example: ds.fdate("mycol1", "mycol2", precision="D")
-
fill_nan
(val: str, *cols)¶ Fill NaN values with new values in the main dataframe
Parameters: - val (str) – new value
- *cols (str, at least one) – names of the columns
Example: ds.fill_nan("new value", "mycol1", "mycol2")
-
fill_nulls
(col: str)¶ Fill all null values with NaN values in a column. Null values are None or an empty string
Parameters: col (str) – column name Example: ds.fill_nulls("mycol")
-
first_
() → pandas.core.series.Series¶ Select the first row
Returns: the first row as a serie Return type: pd.Series
-
flat_
(col, nums=True)¶ Returns a flat representation of a column’s values
-
format_date_
(date: datetime.datetime) → str¶ Format a date string
Parameters: date (datetime.datetime) – the input date Returns: output date string Return type: str
-
get_html
(chart_obj=None, slug=None)¶ Get the html and script tag for a chart
-
getall_
(table)¶ Get all rows values for a table
-
gmean_
(col: str, index_col: bool = True) → Ds¶ Group by and mean column
Parameters: - col (str) – column to group
- index_col (bool) –
Returns: a dataswim instance
Return type: Ds Example: ds2 = ds.gmean_("Col 1")
-
gsum_
(col: str, index_col: bool = True) → Ds¶ Group by and sum column
Parameters: - col (str) – column to group
- index_col (bool) –
Returns: a dataswim instance
Return type: Ds Example: ds2 = ds.gsum_("Col 1")
-
header
= None¶
-
heatmap_
(label=None, style=None, opts=None, options={})¶ Get a heatmap chart
-
height
(val)¶ Change the chart’s height
-
hist_
(label=None, style=None, opts=None, options={})¶ Get a histogram chart
-
hline_
(label=None, style=None, opts=None, options={})¶ Get a mean line chart
-
html
(label, *msg)¶ Prints html in notebook
-
index
(col: str)¶ Set an index to the main dataframe
Parameters: col (str) – column name where to index from Example: ds.index("mycol")
-
indexcol
(col: str)¶ Add a column from the index
Parameters: col (str) – name of the new column Example: ds.indexcol("New col")
-
influx_cli
= None¶
-
influx_count_
(measurement)¶ Count the number of rows for a measurement
-
influx_init
(url, port, user, pwd, db)¶ Initialize an Influxdb database client
-
influx_query_
(q)¶ Runs an Influx db query
-
influx_to_csv
(measurement, batch_size=5000)¶ Batch export data from an Influxdb measurement to csv
-
info
(*msg)¶ Prints a message with an info prefix
-
insert
(table: str, records: dict, create_cols: bool = False, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None)¶ Insert one or many records in the database from a dictionary or a list of dictionaries
Parameters: - table (str) – the table to insert into
- records (dict) – a dictionary or list of dictionaries of the data to insert
- create_cols (bool, optional) – create the columns if they don’t exist, defaults to False
- dtypes (SchemaType, optional) – list of SqlAlchemy table types, defaults to None. The types are inferred if not provided
-
keep
(*cols)¶ Limit the dataframe to some columns
Parameters: cols (str) – names of the columns Example: ds.keep("Col 1", "Col 2")
-
keep_
(*cols) → Ds¶ Returns a dataswim instance with a dataframe limited to some columns
Parameters: cols (str) – names of the columns Returns: a dataswim instance Return type: Ds Example: ds2 = ds.keep_("Col 1", "Col 2")
-
label
= None¶
-
layout_
(chart_objs, cols=3)¶ Returns a Holoviews Layout from chart objects
-
limit
(r: int = 5)¶ Limit selection to a range in the main dataframe
Parameters: r (int, optional) – number of rows to keep, defaults to 5
-
limit_
(r: int = 5) → Ds¶ Returns a DataSwim instance with limited selection
Returns: a Ds instance Return type: Ds
-
line_
(label=None, style=None, opts=None, options={})¶ Get a line chart
-
line_num_
(label=None, style=None, opts=None)¶ Get an Altair line + number marks chart
-
line_point_
(label=None, style=None, opts=None, options={}, colors={'line': 'orange', 'point': '#30A2DA'})¶ Get a line and point chart
-
load
(table: str)¶ Set the main dataframe from a table’s data
Parameters: table (str) – table name Example: ds.load("mytable")
-
load_csv
(url, **kwargs)¶ Loads csv data in the main dataframe
Parameters: - url (str) – url of the csv file to load: can be absolute if it starts with / or relative if it starts with ./
- kwargs – keyword arguments to pass to Pandas read_csv function
Example: ds.load_csv("./myfile.csv")
-
load_django
(query: django query)¶ Load the main dataframe from a django orm query
Parameters: query (django query) – django query from a model Example: ds.load_django(Mymodel.objects.all())
-
load_django_
(query: django query) → Ds¶ Returns a DataSwim instance from a django orm query
Parameters: query (django query) – django query from a model Returns: a dataswim instance with data from a django query Return type: Ds Example: ds2 = ds.load_django_(Mymodel.objects.all())
-
load_excel
(filepath, **kwargs)¶ Set the main dataframe with the content of an Excel file
Parameters: - filepath (str) – url of the Excel file to load, can be absolute if it starts with / or relative if it starts with ./
- kwargs – keyword arguments to pass to Pandas read_excel function
Example: ds.load_excel("./myfile.xlsx")
-
load_h5
(filepath)¶ Load a Hdf5 file to the main dataframe
Parameters: filepath (str) – url of the Hdf5 file to load, can be absolute if it starts with / or relative if it starts with ./
Example: ds.load_h5("./myfile.hdf5")
-
load_json
(path, **kwargs)¶ Load data in the main dataframe from json
Parameters: - path (str) – url of the json file to load, can be absolute if it starts with / or relative if it starts with ./
- kwargs – keyword arguments to pass to Pandas read_json function
Example: ds.load_json("./myfile.json")
-
lreg
(xcol, ycol, name='Regression')¶ Add a column to the main dataframe populated with the model’s linear regression for a column
-
lreg_
(label=None, style=None, opts=None, options={})¶ Get a linear regression chart
-
map_
(lat, long, zoom=13, tiles='map')¶ Returns a map
-
marker
(lat, long, text, color=None, icon=None)¶ Set the main map with a marker to the default map
-
marker_
(lat, long, text, pmap, color=None, icon=None)¶ Returns the map with a marker to the default map
-
mbar_
(col, x=None, y=None, rsum=None, rmean=None)¶ Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum=“1D” to resample and sum the data and rmean=“1D” to resample and mean the data
-
mcluster
(lat_col: str, lon_col: str)¶ Add a markers cluster to the map
-
merge
(df: pandas.core.frame.DataFrame, on: str, how: str = 'outer', **kwargs)¶ Set the main dataframe from the current dataframe and the passed dataframe
Parameters: - df (pd.DataFrame) – the pandas dataframe to merge
- on (str) – param for pd.merge
- how (str, optional) – param for pd.merge, defaults to “outer”
- kwargs – keyword arguments for pd.merge
-
mfw_
(col, sw_lang='english', limit=100)¶ Returns a Dataswim instance with the most frequent words in a column excluding the most common stop words
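The idea of counting the most frequent words while skipping stop words can be sketched with the standard library alone. The function name most_frequent_words and the tiny STOP_WORDS set are hypothetical; which stop word list dataswim relies on for sw_lang is not stated on this page:

```python
from collections import Counter

# hypothetical, tiny stop word list; a real one would be much longer
STOP_WORDS = {"the", "a", "of", "and", "is"}

def most_frequent_words(texts, limit=100):
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # skip the stop words, count everything else
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)

rows = ["The price of the apple", "A red apple and a green apple"]
top = most_frequent_words(rows, limit=2)
print(top)
```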
-
mline_
(col, x=None, y=None, rsum=None, rmean=None)¶ Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum=“1D” to resample and sum the data and rmean=“1D” to resample and mean the data
-
mline_point_
(col, x=None, y=None, rsum=None, rmean=None)¶ Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum=“1D” to resample and sum the data and rmean=“1D” to resample and mean the data
-
mpoint_
(col, x=None, y=None, rsum=None, rmean=None)¶ Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum=“1D” to resample and sum the data and rmean=“1D” to resample and mean the data
-
msg
= None¶
-
msg_
(label, *msg)¶ Returns a message with a label
-
nan
= None¶
-
nan_empty
(col: str)¶ Fill empty values with NaN values
Parameters: col (str) – name of the column Example: ds.nan_empty("mycol")
-
ncontains
(column, value)¶ Set the main dataframe instance to rows that do not contain a string value in a column
-
ndlayout_
(dataset, kdims, cols=3)¶ Create a Holoviews NdLayout from a dictionary of chart objects
-
notebook
= False¶
-
nowrange
(col: str, timeframe: str)¶ Set the main dataframe with rows within a date range from now
Parameters: - col (str) – the column to use for range
- timeframe (str) – units are: S, H, D, W, M, Y
example:
ds.nowrange("Date", "3D")
-
nowrange_
(col: str, timeframe: str) → Ds¶ Returns a Dataswim instance with rows within a date range from now
Parameters: - col (str) – the column to use for range
- timeframe (str) – units are: S, H, D, W, M, Y
Returns: a dataswim instance
Return type: Ds
example:
ds2 = ds.nowrange_("Date", "3D")
-
ok
(*msg)¶ Prints a message with an ok prefix
-
one
()¶ Shows one row of the dataframe and the field names with count
Returns: a pandas dataframe Return type: pd.DataFrame Example: ds.one()
-
opt
(name, value)¶ Add or update one option
-
opts
(dictobj)¶ Add or update options
-
pivot
(index, **kwargs)¶ Pivots a dataframe
-
point_
(label=None, style=None, opts=None, options={})¶ Get a point chart
-
point_num_
(label=None, style=None, opts=None)¶ Get an Altair point + number marks chart
-
progress
(*msg)¶ Prints a progress message
-
quants_
(inf, sup, chart_type='point', color='green')¶ Draw a chart to visualize quantiles
-
query
(q: str) → dataset.util.ResultIter¶ Query the database
Parameters: q (str) – the query to perform Returns: a dictionary with the query results Return type: dataset.util.ResultIter
-
quiet
= False¶
-
radar_
(label=None, style=None, opts=None, options={})¶ Get a radar chart
-
raenc
(key)¶ Remove an entry from the altair encoding dict
-
raencs
()¶ Reset the altair encoding dict
-
ratio
(col: str, ratio_col: str = 'Ratio')¶ Add a column with the percentage ratio from a column
Parameters: - col (str) – column to calculate ratio from
- ratio_col (str, optional) – new ratio column name, defaults to “Ratio”
Example: ds.ratio("Col 1")
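One plausible reading of "percentage ratio" is each value expressed as a percentage of the column total. The plain pandas sketch below follows that reading; the exact formula dataswim uses is an assumption here, and the data is made up:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": [1, 1, 2]})

# each value expressed as a percentage of the column total
df["Ratio"] = df["Col 1"] * 100 / df["Col 1"].sum()
print(df)
```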
-
rcolor
()¶ Reset the color to the base color
-
relation
(table: str, origin_field: str, search_field: str, destination_field: str = None, id_field: str = 'id')¶ Add a column to the main dataframe from a relation foreign key
Parameters: - table (str) – the table to select from
- origin_field (str) – the column name in the origin table to search from, generally an id column
- search_field (str) – the column name in the foreign table
- destination_field (str, optional) – name of the column to be created with the data in the dataframe, defaults to None, will be named as the origin_field if not provided
- id_field (str, optional) – name of the primary key to use, defaults to “id”
example:
ds.relation("product", "category_id", "name")
-
relation_
(table: str, origin_field: str, search_field: str, destination_field=None, id_field='id') → pandas.core.frame.DataFrame¶ Returns a DataSwim instance with a column filled from a relation foreign key
Parameters: - table (str) – the table to select from
- origin_field (str) – the column name in the origin table to search from, generally an id column
- search_field (str) – the column name in the foreign table
- destination_field (str, optional) – name of the column to be created with the data in the dataframe, defaults to None, will be named as the origin_field if not provided
- id_field (str, optional) – name of the primary key to use, defaults to “id”
Returns: a pandas DataFrame
Return type: DataFrame
-
rename
(source_col: str, dest_col: str)¶ Renames a column in the main dataframe
Parameters: - source_col (str) – name of the column to rename
- dest_col (str) – new name of the column
Example: ds.rename("Col 1", "New col")
-
replace
(col: str, searchval: str, replaceval: str)¶ Replace a value in a column in the main dataframe
Parameters: - col (str) – column name
- searchval (str) – value to replace
- replaceval (str) – new value
Example: ds.replace("mycol", "value", "new_value")
-
report_engines
= []¶
-
report_path
= None¶
-
reports
= []¶
-
residual_
(label=None, style=None, opts=None)¶ Returns a Seaborn models residuals chart
-
restore
()¶ Restore the main dataframe
-
reverse
()¶ Reverses the main dataframe order
Example: ds.reverse()
-
rmean
(time_period: str, num_col: str = 'Number', dateindex: str = None)¶ Resample the main dataframe to a time period and add a mean column
Parameters: - time_period (str) – unit + period: periods are Y, M, D, H, Min, S
- num_col (str, optional) – name of the new column, defaults to “Number”
- dateindex (str, optional) – column name to use as date index, defaults to None
Example: ds.rmean("1Min")
-
rmean_
(time_period: str, num_col: str = 'Number', dateindex: str = None)¶ Resample the main dataframe to a time period, add a mean column and return a new Ds instance
Parameters: - time_period (str) – unit + period: periods are Y, M, D, H, Min, S
- num_col (str, optional) – name of the new column, defaults to “Number”
- dateindex (str, optional) – column name to use as date index, defaults to None
Example: ds.rmean_("1Min")
-
ropt
(name)¶ Remove one option
-
ropts
()¶ Reset the chart options
-
roundvals
(col: str, precision: int = 2)¶ Round floats in a column. Numbers are going to be converted to floats if they are not already
Parameters: - col (str) – column name
- precision – float precision, defaults to 2
- precision – int, optional
Example: ds.roundvals("mycol")
-
rstyle
(name)¶ Remove one style
-
rstyles
()¶ Reset the chart styles
-
rsum
(time_period: str, num_col: str = 'Number', dateindex: str = None)¶ Resample the main dataframe to a time period and add a sum column
Parameters: - time_period (str) – unit + period: periods are Y, M, D, H, Min, S
- num_col (str, optional) – name of the new column, defaults to “Number”
- dateindex (str, optional) – column name to use as date index, defaults to None
Example: ds.rsum("1D")
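The resampling behind rsum and rmean can be pictured with plain pandas on a datetime-indexed dataframe. This is a sketch under assumptions (made-up data, a column named "Number"), not the dataswim implementation:

```python
import pandas as pd

index = pd.to_datetime(
    ["2019-01-01 08:00", "2019-01-01 17:30", "2019-01-02 09:15"]
)
df = pd.DataFrame({"Number": [1, 2, 5]}, index=index)

# "1D" is unit + period: one-day bins; sum the values in each bin
daily_sum = df.resample("1D").sum()
# the same bins, averaged instead of summed
daily_mean = df.resample("1D").mean()
print(daily_sum)
print(daily_mean)
```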
-
rsum_
(time_period: str, num_col: str = 'Number', dateindex: str = None)¶ Resample the main dataframe to a time period, add a sum column and return a new Ds instance
Parameters: - time_period (str) – unit + period: periods are Y, M, D, H, Min, S
- num_col (str, optional) – name of the new column, defaults to “Number”
- dateindex (str, optional) – column name to use as date index, defaults to None
Example: ds.rsum_("1D")
-
rule_
(label=None, style=None, opts=None, options={})¶ Get a rule chart
-
sarea_
(col, x=None, y=None, rsum=None, rmean=None)¶ Get a stacked area chart
-
sbar_
(stack_index=None, label=None, style=None, opts=None, options={})¶ Get a stacked bar chart
-
scolor
()¶ Set a unique color from a serie
-
scommit
()¶
-
sconnect
(url: str)¶
-
seaborn_bar_
(label=None, style=None, opts=None)¶ Get a Seaborn bar chart
-
show
(rows: int = 5, dataframe: pandas.core.frame.DataFrame = None) → pandas.core.frame.DataFrame¶ Display info about the dataframe
Parameters: - rows (int, optional) – number of rows to show, defaults to 5
- dataframe (pd.DataFrame, optional) – a pandas dataframe, defaults to None
Returns: a pandas dataframe
Return type: pd.DataFrame
Example: ds.show()
-
size
(val)¶ Change the chart’s point size
-
sline_
(window_size=5, y_label='Moving average', chart_label=None)¶ Get a moving average curve chart to smooth between points
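The smoothing behind a moving average curve can be sketched in plain pandas with a rolling window. The column names and the window size of 3 are chosen for illustration; how sline_ computes and plots its curve is not shown here:

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 3, 5, 7, 9, 11]})

window_size = 3
# mean of the current value and the two before it; the first rows
# have no full window and stay NaN
df["Moving average"] = df["y"].rolling(window=window_size).mean()
print(df)
```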
-
sort
(col: str)¶ Sorts the main dataframe according to the given column
Parameters: col (str) – column name Example: ds.sort("Col 1")
-
split_
(col: str) -> list(Ds)¶ Split the main dataframe according to a column’s unique values and return a dict of dataswim instances
Returns: list of dataswim instances Return type: list(Ds) Example: dss = ds.split_("Col 1")
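Splitting on a column’s unique values can be illustrated in plain pandas with groupby. The sketch collects plain dataframes into a dict keyed by value, which is an assumption for illustration; split_ itself returns dataswim instances:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": ["a", "b", "a"], "Col 2": [1, 2, 3]})

# one sub-dataframe per unique value of the column
parts = {value: group for value, group in df.groupby("Col 1")}
print(parts["a"])
print(parts["b"])
```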
-
sq_
(query: str)¶
-
sqm_
(query: str, values)¶
-
square_
(label=None, style=None, opts=None, options={})¶ Get a square chart
-
stack
(slug, chart_obj=None, title=None)¶ Get the html for a chart and store it
-
start
(*msg)¶ Prints a start message
-
start_time
= None¶
-
static_path
= None¶
-
status
(*msg)¶ Prints a status message
-
strip
(col: str)¶ Remove leading and trailing white spaces in a column’s values
Parameters: col (str) – name of the column Example: ds.strip("mycol")
-
strip_cols
()¶ Remove leading and trailing white spaces in columns names
Example: ds.strip_cols()
-
style
(name, value)¶ Add or update one style
-
styles
(dictobj)¶ Add or update styles
-
subset
(*args)¶ Set the main dataframe to a subset based on row positions. Example: ds.subset(0, 10) and ds.subset(10) are equivalent: the subset starts at the first row if only one argument is provided
-
subset_
(*args)¶ Returns a Dataswim instance with a subset of the data based on row positions. Example: ds.subset_(0, 10) and ds.subset_(10) are equivalent: the subset starts at the first row if only one argument is provided
-
subtitle
(txt)¶ Prints a subtitle for pipelines
-
sum_
(col: str) → float¶ Returns the sum of all values in a column
Parameters: col (str) – column name Returns: sum of all the column values Return type: float Example: sum = ds.sum_("Col 1")
-
table
(name: str)¶ Display info about a table: number of rows and columns
Parameters: name (str) – name of the table Example: ds.table("mytable")
-
tables
()¶ Print the existing tables in a database
Example: ds.tables()
-
tables_
() → list¶ Return a list of the existing tables in a database
Returns: list of the table names Return type: list Example: tables = ds.tables_()
-
tail
(rows: int = 5)¶ Returns the main dataframe’s tail
Parameters: - rows (int, optional) – number of rows to print, defaults to 5
Returns: a pandas dataframe
Return type: pd.DataFrame
Example: ds.tail()
-
text_
(label=None, style=None, opts=None)¶ Get an Altair text marks chart
-
tick_
(label=None, style=None, opts=None, options={})¶ Get a tick chart
-
timestamps
(col: str, **kwargs)¶ Add a timestamps column from a date column
Parameters: - col (str) – name of the timestamps column to add
- **kwargs (optional) – keyword arguments for
pd.to_datetime
Example: ds.timestamps("mycol")
-
title
(txt)¶ Prints a title for pipelines
-
tmarker
(lat, long, text, color=None, icon=None, style=None)¶ Returns the map with a text marker to the default map
-
tmarker_
(lat, long, text, pmap, color=None, icon=None, style=None)¶ Returns the map with a text marker to the default map
-
to_csv
(filepath: str, index: bool = False, **kwargs)¶ Write the main dataframe to a csv file
Parameters: - filepath (str) – path of the file to save
- index (bool, optional) – write the row index, defaults to False
- **kwargs – keyword arguments to pass to df.to_csv
Example: ds.to_csv("myfile.csv", header=False)
-
to_db
(table: str, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None)¶ Save the main dataframe to the database
Parameters: - table (str) – the table to create
- dtypes (List[SchemaType], optional) – SqlAlchemy columns type, defaults to None, will be inferred if not provided
-
to_excel
(filepath: str, title: str)¶ Write the main dataframe to an Excel file
Parameters: - filepath (str) – path of the Excel file to write
- title (str) – Title of the stylesheet
Example: ds.to_excel("./myfile.xlsx", "My data")
-
to_file
(slug, folderpath=None, header=None, footer=None)¶ Writes the html report to a file from the report stack
-
to_files
(folderpath=None)¶ Writes the html report to one file per report
-
to_float
(col: str, **kwargs)¶ Convert column values to float
Parameters: - col (str) – name of the column
- **kwargs (optional) – keyword arguments for df.astype
Example: ds.to_float("mycol1")
-
to_hdf5
(filepath: str)¶ Write the main dataframe to Hdf5 file
Parameters: filepath (str) – path where to save the file Example: ds.to_hdf5("./myfile.hdf5")
-
to_html_
() → str¶ Convert the main dataframe to html
Returns: html data Return type: str Example: ds.to_html_()
-
to_int
(*cols, **kwargs)¶ Convert some column values to integers
Parameters: - *cols (str, at least one) – names of the columns
- **kwargs (optional) – keyword arguments for
pd.to_numeric
Example: ds.to_int("mycol1", "mycol2", errors="coerce")
-
to_javascript_
(table_name: str = 'data') → str¶ Convert the main dataframe to javascript code
Parameters: - table_name (str, optional) – javascript variable name, defaults to “data”
Returns: a javascript constant with the data
Return type: str
Example: ds.to_javascript_("myconst")
-
to_json_
() → str¶ Convert the main dataframe to json
Returns: json data Return type: str Example: ds.to_json_()
-
to_markdown_
() → str¶ Convert the main dataframe to markdown
Returns: markdown data Return type: str Example: ds.to_markdown_()
-
to_numpy_
(table_name: str = 'data') → numpy.array¶ Convert the main dataframe to a numpy array
Parameters: - table_name (str, optional) – name of the python variable, defaults to “data”
Returns: a numpy array
Return type: np.array
Example: ds.to_numpy_("myvar")
-
to_python_
(table_name: str = 'data') → list¶ Convert the main dataframe to a python list
Parameters: - table_name (str, optional) – python variable name, defaults to “data”
Returns: a python list of lists with the data
Return type: list
Example: ds.to_python_("myvar")
-
to_records_
() → dict¶ Returns a list of dictionary records from the main dataframe
Returns: a python dictionary with the data Return type: str Example: ds.to_records_()
-
to_rst_
() → str¶ Convert the main dataframe to restructured text
Returns: rst data Return type: str Example: ds.to_rst_()
-
to_type
(dtype: type, *cols, **kwargs)¶ Convert column values to a given type in the main dataframe
Parameters: - dtype (type) – a type to convert to: ex: str
- *cols (str, at least one) – names of the columns
- **kwargs (optional) – keyword arguments for df.astype
Example: ds.to_type(str, "mycol")
-
trimiquants
(col: str, inf: float)¶ Remove inferior quantiles from the dataframe
Parameters: - col (str) – column name
- inf (float) – inferior quantile
Example: ds.trimiquants("Col 1", 0.05)
-
trimquants
(col: str, inf: float, sup: float)¶ Remove superior and inferior quantiles from the dataframe
Parameters: - col (str) – column name
- inf (float) – inferior quantile
- sup (float) – superior quantile
Example: ds.trimquants("Col 1", 0.01, 0.99)
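Trimming by quantiles can be sketched in plain pandas by computing the two quantile values and keeping the rows between them. The strict comparisons and the made-up data are assumptions; the exact boundary handling in dataswim is not documented on this page:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": list(range(1, 101))})

inf, sup = 0.01, 0.99
low = df["Col 1"].quantile(inf)
high = df["Col 1"].quantile(sup)
# keep only the rows strictly between the two quantile values
trimmed = df[(df["Col 1"] > low) & (df["Col 1"] < high)]
print(len(df), len(trimmed))
```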
-
trimsquants
(col: str, sup: float)¶ Remove superior quantiles from the dataframe
Parameters: - col (str) – column name
- sup (float) – superior quantile
Example: ds.trimsquants("Col 1", 0.99)
-
types_
(col: str) → pandas.core.frame.DataFrame¶ Display types of values in a column
Parameters: col (str) – column name Returns: a pandas dataframe Return type: pd.DataFrame Example: ds.types_("Col 1")
-
unique_
(col: str) → list¶ Returns unique values in a column
Parameters: col (str) – the column to select from Returns: a list of unique values in the column Return type: list
-
update_table
(table: str, pks: List[str] = ['id'], mirror: bool = True)¶ Update records in a database table from the main dataframe
Parameters: - table (str) – table to update
- pks (List[str], optional) – if rows with matching pks exist they will be updated, otherwise a new row is inserted in the table, defaults to [“id”]
- mirror (bool, optional) – delete the rows not in the new dataset, defaults to True
-
upsert
(table: str, record: dict, create_cols: bool = False, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None, pks: List[str] = ['id'])¶ Upsert a record in a table
Parameters: - table (str) – the table to upsert into
- record (dict) – dictionary with the data to upsert
- create_cols (bool, optional) – create the columns if they don’t exist, defaults to False
- dtypes (List[SchemaType], optional) – list of SqlAlchemy column types, defaults to None
- pks (List[str], optional) – if rows with matching pks exist they will be updated, otherwise a new row is inserted in the table, defaults to [“id”]
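The update-or-insert behaviour described for pks can be illustrated with the standard library sqlite3 module. The table, the column names and the upsert helper below are hypothetical and stand apart from dataswim’s own database layer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")

def upsert(conn, record):
    # hypothetical helper: update the row whose primary key matches,
    # otherwise insert a new row
    cur = conn.execute(
        "UPDATE product SET name = :name WHERE id = :id", record
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO product (id, name) VALUES (:id, :name)", record
        )
    conn.commit()

upsert(conn, {"id": 1, "name": "apple"})  # no match on id: inserts
upsert(conn, {"id": 1, "name": "pear"})   # match on id: updates
rows = conn.execute("SELECT id, name FROM product").fetchall()
print(rows)
```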
-
version
= '0.6.0'¶
-
warning
(*msg)¶ Prints a warning
-
width
(val)¶ Change the chart’s width
-
wunique_
(col)¶ Weight unique values: returns a dataframe with a count of unique values
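Counting how often each unique value appears can be sketched in plain pandas with value_counts. The data is made up, and the exact shape of the dataframe that wunique_ returns is not shown here:

```python
import pandas as pd

df = pd.DataFrame({"Col 1": ["a", "b", "a", "a"]})

# count how many times each unique value appears, most frequent first
counts = df["Col 1"].value_counts()
print(counts)
```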
-
x
= None¶
-
y
= None¶
-
zero_nan
(*cols)¶ Converts zero values to nan values in selected columns
Parameters: *cols (str, at least one) – names of the columns Example: ds.zero_nan("mycol1", "mycol2")
-