API¶

class dataswim.Ds(df=None, db=None, nbload_libs=True)¶

Bases: dataswim.db.Db, dataswim.data.Df, dataswim.charts.Plot, dataswim.maps.Map, dataswim.report.Report, dataswim.base.DsBase

Main class

add(col: str, value)¶

Add a column with default values

Parameters:	col (str) – column name value (any) – column value
Example:	`ds.add("Col 4", 0)`

aenc(key, value)¶: Add an entry to the altair encoding dict

altair_encode = {}¶

altair_header_()¶: Returns html script tags for Altair

amap(lat, long, zoom=13, tiles='map')¶: Sets a map

append(vals: list, index=None)¶

Append a row to the main dataframe

Parameters:	vals (list) – list of the row values to add index (any, optional) – index key, defaults to None
Example:	`ds.append([0, 2, 2, 3, 4])`

apply(function, *cols, axis=1, **kwargs)¶

Apply a function on columns values

Parameters:	function (function) – a function to apply to the columns cols (name of columns) – columns names axis – index (0) or column (1), default is 1 kwargs (optional) – arguments for `df.apply`
Example:
    def f(row):
        # add a new column with a value
        row["newcol"] = row["Col 1"] + 1
        return row

    ds.apply(f)

area_(label=None, style=None, opts=None, options={})¶: Get an area chart

arrow_(xloc, yloc, text, orientation='v', arrowstyle='->')¶: Returns an arrow for a chart. Parameters: text is the arrow's text, xloc and yloc are the coordinates used to position the arrow. orientation is the direction in which the arrow is displayed: possible values are [<, ^, >, v]. arrowstyle is the graphic style of the arrow: possible values are [-, ->, -[, -|>, <->, <|-|>]

autoprint = False¶

backup()¶: Backup the main dataframe

backup_df = None¶

bar_(label=None, style=None, opts=None, options={})¶: Get a bar chart

bar_num_(label=None, style=None, opts=None)¶: Get an Altair bar + number marks chart

bokeh_header_()¶: Returns html script tags for Bokeh

chart(x=None, y=None, chart_type=None, opts=None, style=None, label=None, options={}, **kwargs)¶: Get a chart

chart_(x=None, y=None, chart_type=None, opts=None, style=None, label=None, options={}, **kwargs)¶: Get a chart

chart_obj = None¶

chart_opts = {'width': 880}¶

chart_style = {}¶

chartjs_header_()¶: Returns html script tags for Chartjs

circle_(label=None, style=None, opts=None, options={})¶: Get a circle chart

clone_(quiet=False)¶

Clone the DataSwim instance

Parameters:	quiet (bool, optional) – print a message, defaults to False
Returns:	a dataswim instance
Return type:	Ds

color(val)¶: Change the chart’s color

color_(i=None)¶: Get a color from the palette

color_index = 0¶

cols_() → pandas.core.frame.DataFrame¶

Returns a dataframe with columns info

Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.cols_()`

concat(*dss, **kwargs)¶

Concatenate dataswim instances and set the result as the main dataframe

Parameters:	dss (Ds) – dataswim instances to concatenate kwargs – keyword arguments for `pd.concat`

concat_(*dss, **kwargs)¶

Concatenate dataswim instances and return a new Ds instance

Parameters:	dss (Ds) – dataswim instances to concatenate kwargs – keyword arguments for `pd.concat`
Return type:	Ds

connect(url: str)¶

Connect to the database and set it as main database

Parameters:	url (str) – path to the database, uses the Sqlalchemy format
Example:	`ds.connect("sqlite:///mydb.sqlite")`

contains(column, value)¶: Set the main dataframe to the rows that contain a string value in a column

copycol(origin_col: str, dest_col: str)¶

Copy a column’s values to another column

Parameters:	origin_col (str) – name of the column to copy dest_col (str) – name of the new column
Example:	`ds.copycol("col 1", "New col")`

count()¶: Counts the number of rows of the main dataframe

count_() → int¶

Returns the number of rows of the main dataframe

Returns:	number of rows
Return type:	int

count_empty(field: str)¶

List of empty row indices

Parameters:	field (str) – column to count from

count_nulls(field: str)¶

Count the number of null values in a column

Parameters:	field (str) – the column to count from

count_unique_(field: str) → int¶

Return the number of unique values in a column

Parameters:	field (str) – column to count from
Returns:	number of unique values
Return type:	int

count_zero(field: str)¶

List of rows with 0 values

Parameters:	field (str) – column to count from

cvar_(col)¶: Returns the coefficient of variation of a column as a percentage
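
As a rough illustration of the quantity involved, the coefficient of variation is the standard deviation divided by the mean, expressed here as a percentage. The sketch below uses only the standard library. The function name `coefficient_of_variation_pct` is hypothetical, and the use of the sample standard deviation is an assumption, since this reference does not say which estimator `cvar_` uses.

```python
import statistics


def coefficient_of_variation_pct(values):
    """Return stdev / mean * 100 for a sequence of numbers."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator)
    return statistics.stdev(values) / mean * 100


cv = coefficient_of_variation_pct([10, 12, 8, 10])
```

A low value means the column's values are tightly grouped around the mean; a high value means they are widely spread relative to it.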

datapath = None¶

date(col: str, **kwargs)¶

Convert a column to date type

Parameters:	col (str) – column name kwargs (optional) – keyword arguments for `pd.to_datetime`
Example:	`ds.date("mycol")`

dateindex(col: str)¶

Set a datetime index from a column

Parameters:	col (str) – column name where to index the date from
Example:	`ds.dateindex("mycol")`

dateparser(dformat='%d/%m/%Y')¶: Returns a date parser for pandas
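
A date parser in this sense is a callable that turns a date string into a datetime using a fixed format. The following is a minimal standard-library sketch of that idea, not the library's implementation; `make_dateparser` is a hypothetical name.

```python
from datetime import datetime


def make_dateparser(dformat="%d/%m/%Y"):
    """Build a function that parses strings written in `dformat`."""
    def parse(text):
        return datetime.strptime(text, dformat)
    return parse


parse = make_dateparser()
parsed = parse("25/12/2020")  # day/month/year, matching the default format
```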

daterange(datecol: str, date_start: datetime.datetime, op: str, **args)¶

Set the main dataframe rows in a date range

Parameters:	datecol (str) – the column to use for range date_start (datetime.datetime) – the date to start from op (str) – + or -

daterange_(datecol: str, date_start: datetime.datetime, op: str, **args) → Ds¶

Returns a DataSwim instance with rows in a date range

Parameters:	datecol (str) – the column to use for range date_start (datetime.datetime) – the date to start from op (str) – + or -
Returns:	a dataswim instance
Return type:	Ds

db = None¶

debug(*msg)¶: Prints a warning

defaults()¶: Reset the chart options and style to defaults

density_(label=None, style=None, opts=None)¶: Get a Seaborn density chart

describe_()¶

Return a description of the data

Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.describe_()`

df = None¶

diffm(diffcol: str, name: str = 'Diff', default=nan)¶

Add a diff column to the main dataframe: calculate the diff from the column mean

Parameters:	diffcol (str) – column to diff from name (str, optional) – diff column name, defaults to “Diff” default (optional) – column default value, defaults to nan
Example:	`ds.diffm("Col 1", "New col")`

diffn(diffcol: str, name: str = 'Diff')¶

Add a diff column to the main dataframe: calculate the diff from the next value

Parameters:	diffcol (str) – column to diff from name (str, optional) – diff column name, defaults to “Diff”
Example:	`ds.diffn("Col 1", "New col")`

diffp(diffcol: str, name: str = 'Diff')¶

Add a diff column to the main dataframe: calculate the diff from the previous value

Parameters:	diffcol (str) – column to diff from name (str, optional) – diff column name, defaults to “Diff”
Example:	`ds.diffp("Col 1", "New col")`
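
The three variants `diffm`, `diffn` and `diffp` differ only in the reference value each row is compared with. The plain-Python sketch below illustrates this on a list of numbers. The sign convention (current value minus reference value) and the use of `None` where no previous or next value exists are assumptions made for the illustration, not documented behavior.

```python
values = [2.0, 4.0, 9.0]
mean = sum(values) / len(values)

# Difference from the column mean (the idea behind diffm)
diff_from_mean = [v - mean for v in values]

# Difference from the previous value (the idea behind diffp);
# the first row has no previous value
diff_from_prev = [None] + [
    values[i] - values[i - 1] for i in range(1, len(values))
]

# Difference from the next value (the idea behind diffn);
# the last row has no next value
diff_from_next = [
    values[i] - values[i + 1] for i in range(len(values) - 1)
] + [None]
```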

diffs(col: str, serie: iterable, name: str = 'Diff')¶

Add a diff column from a series. The series is an iterable of the same length as the dataframe

Parameters:	col (str) – column to diff serie (iterable) – series to diff from name (str, optional) – name of the diff col, defaults to “Diff”
Example:	`ds.diffs("Col 1", [1, 1, 4], "New col")`

diffsp(col: str, serie: iterable, name: str = 'Diff')¶

Add a diff column in percentage from a series. The series is an iterable of the same length as the dataframe

Parameters:	col (str) – column to diff serie (iterable) – series to diff from name (str, optional) – name of the diff col, defaults to “Diff”
Example:	`ds.diffsp("Col 1", [1, 1, 4], "New col")`

distrib_(label=None, style=None, opts=None)¶: Get a Seaborn distribution chart

dlinear_(label=None, style=None, opts=None)¶: Get a Seaborn linear + distribution chart

drop(*cols)¶

Drops columns from the main dataframe

Parameters:	cols (str) – names of the columns
Example:	`ds.drop("Col 1", "Col 2")`

drop_nan(col: str = None, method: str = 'all', **kwargs)¶

Drop rows with NaN values from the main dataframe

Parameters:	col (str, optional) – name of the column, defaults to None. Drops in method (str, optional) – `how` param for `df.dropna`, defaults to “all” kwargs (optional) – params for `df.dropna`
Example:	`ds.drop_nan("mycol")`

dropr(*rows)¶

Drops some rows from the main dataframe

Parameters:	rows (list of ints) – rows names
Example:	`ds.dropr(0, 2)`

dsmap = None¶

end(*msg)¶: Prints an end message with elapsed time

engine = 'bokeh'¶

err(*args)¶: Handle an error

errorbar_(label=None, style=None, opts=None, options={})¶: Get an error bar chart

errors_handling = 'exceptions'¶

exact(column, *values)¶: Sets the main dataframe to the rows that have the exact string value in a column

exact_(column, *values)¶: Returns a Dataswim instance with the rows that have the exact string value in a column

exclude(col: str, val)¶

Delete rows based on value

Parameters:	col (str) – column name val (any) – value to delete
Example:	`ds.exclude("Col 1", "value")`

fdate(*cols, precision: str = 'S', format: str = None)¶

Convert column values to formatted date strings

Parameters:	cols (str, at least one) – names of the columns precision (str, optional) – time precision: Y, M, D, H, Min, S, defaults to “S” format (str, optional) – python date format, defaults to None
Example:	`ds.fdate("mycol1", "mycol2", precision="D")`

fill_nan(val: str, *cols)¶

Fill NaN values with new values in the main dataframe

Parameters:	val (str) – new value cols (str, at least one) – names of the columns
Example:	`ds.fill_nan("new value", "mycol1", "mycol2")`

fill_nulls(col: str)¶

Fill all null values with NaN values in a column. Null values are None or an empty string

Parameters:	col (str) – column name
Example:	`ds.fill_nulls("mycol")`

first_() → pandas.core.series.Series¶

Select the first row

Returns:	the first row as a series
Return type:	pd.Series

flat_(col, nums=True)¶: Returns a flat representation of a column’s values

footer = None¶

format_date_(date: datetime.datetime) → str¶

Format a date as a string

Parameters:	date (datetime.datetime) – the input date
Returns:	output date string
Return type:	str

get_html(chart_obj=None, slug=None)¶: Get the html and script tag for a chart

getall_(table)¶: Get all rows values for a table

gmean_(col: str, index_col: bool = True) → Ds¶

Group by and mean column

Parameters:	col (str) – column to group index_col (bool) –
Returns:	a dataswim instance
Return type:	Ds
Example:	`ds2 = ds.gmean_("Col 1")`

gsum_(col: str, index_col: bool = True) → Ds¶

Group by and sum column

Parameters:	col (str) – column to group index_col (bool) –
Returns:	a dataswim instance
Return type:	Ds
Example:	`ds2 = ds.gsum_("Col 1")`
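
Conceptually, `gmean_` and `gsum_` group the rows that share a value in one column and then aggregate the remaining numeric data per group. The sketch below shows that idea with plain dictionaries. The sample rows and column names are made up for the illustration, and the code does not reflect how the library performs the operation internally.

```python
from collections import defaultdict

rows = [
    {"Col 1": "a", "Col 2": 1},
    {"Col 1": "b", "Col 2": 5},
    {"Col 1": "a", "Col 2": 3},
]

sums = defaultdict(int)
counts = defaultdict(int)
for row in rows:
    # Rows sharing the same "Col 1" value land in the same group
    sums[row["Col 1"]] += row["Col 2"]
    counts[row["Col 1"]] += 1

group_sum = dict(sums)
group_mean = {key: sums[key] / counts[key] for key in sums}
```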

header = None¶

heatmap_(label=None, style=None, opts=None, options={})¶: Get a heatmap chart

height(val)¶: Change the chart’s height

hist_(label=None, style=None, opts=None, options={})¶: Get a histogram chart

hline_(label=None, style=None, opts=None, options={})¶: Get a mean line chart

html(label, *msg)¶: Prints html in notebook

index(col: str)¶

Set an index to the main dataframe

Parameters:	col (str) – column name where to index from
Example:	`ds.index("mycol")`

indexcol(col: str)¶

Add a column from the index

Parameters:	col (str) – name of the new column
Example:	`ds.indexcol("New col")`

influx_cli = None¶

influx_count_(measurement)¶: Count the number of rows for a measurement

influx_init(url, port, user, pwd, db)¶: Initialize an Influxdb database client

influx_query_(q)¶: Runs an Influx db query

influx_to_csv(measurement, batch_size=5000)¶: Batch export data from an Influxdb measurement to csv

info(*msg)¶: Prints a message with an info prefix

insert(table: str, records: dict, create_cols: bool = False, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None)¶

Insert one or many records in the database from a dictionary or a list of dictionaries

Parameters:	table (str) – the table to insert into records (dict) – a dictionary or list of dictionaries of the data to insert create_cols (bool, optional) – create the columns if they don’t exist, defaults to False dtypes (SchemaType, optional) – list of SqlAlchemy table types, defaults to None. The types are inferred if not provided
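
To make the "one or many records" behavior concrete, here is a minimal sketch that uses the standard `sqlite3` module rather than the library's own database layer. The helper `insert_records` and the `product` table are hypothetical, and the sketch assumes the table already exists and that table and column names come from trusted code.

```python
import sqlite3


def insert_records(conn, table, records):
    """Insert a dict, or a list of dicts sharing the same keys."""
    if isinstance(records, dict):
        records = [records]  # normalize a single record to a list
    columns = list(records[0].keys())
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'
    conn.executemany(sql, [tuple(r[c] for c in columns) for r in records])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
insert_records(conn, "product", {"id": 1, "name": "apple"})
insert_records(
    conn, "product", [{"id": 2, "name": "pear"}, {"id": 3, "name": "plum"}]
)
row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```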

keep(*cols)¶

Limit the dataframe to some columns

Parameters:	cols (str) – names of the columns
Example:	`ds.keep("Col 1", "Col 2")`

keep_(*cols) → Ds¶

Returns a dataswim instance with a dataframe limited to some columns

Parameters:	cols (str) – names of the columns
Returns:	a dataswim instance
Return type:	Ds
Example:	`ds2 = ds.keep_("Col 1", "Col 2")`

label = None¶

layout_(chart_objs, cols=3)¶: Returns a HoloViews Layout from chart objects

limit(r: int = 5)¶

Limit selection to a range in the main dataframe

Parameters:	r (int, optional) – number of rows to keep, defaults to 5

limit_(r: int = 5) → Ds¶

Returns a DataSwim instance with limited selection

Returns:	a Ds instance
Return type:	Ds

line_(label=None, style=None, opts=None, options={})¶: Get a line chart

line_num_(label=None, style=None, opts=None)¶: Get an Altair line + number marks chart

line_point_(label=None, style=None, opts=None, options={}, colors={'line': 'orange', 'point': '#30A2DA'})¶: Get a line and point chart

load(table: str)¶

Set the main dataframe from a table’s data

Parameters:	table (str) – table name
Example:	`ds.load("mytable")`

load_csv(url, **kwargs)¶

Loads csv data in the main dataframe

Parameters:	url (str) – url of the csv file to load: can be absolute if it starts with `/` or relative if it starts with `./` kwargs – keyword arguments to pass to Pandas `read_csv` function
Example:	`ds.load_csv("./myfile.csv")`

load_django(query: django query)¶

Load the main dataframe from a django orm query

Parameters:	query (django query) – django query from a model
Example:	`ds.load_django(Mymodel.objects.all())`

load_django_(query: django query) → Ds¶

Returns a DataSwim instance from a django orm query

Parameters:	query (django query) – django query from a model
Returns:	a dataswim instance with data from a django query
Return type:	Ds
Example:	`ds2 = ds.load_django_(Mymodel.objects.all())`

load_excel(filepath, **kwargs)¶

Set the main dataframe with the content of an Excel file

Parameters:	filepath (str) – path of the Excel file to load, can be absolute if it starts with `/` or relative if it starts with `./` kwargs – keyword arguments to pass to Pandas `read_excel` function
Example:	`ds.load_excel("./myfile.xlsx")`

load_h5(filepath)¶

Load a Hdf5 file to the main dataframe

Parameters:	filepath (str) – path of the Hdf5 file to load, can be absolute if it starts with `/` or relative if it starts with `./`
Example:	`ds.load_h5("./myfile.hdf5")`

load_json(path, **kwargs)¶

Load data in the main dataframe from json

Parameters:	path (str) – path of the json file to load, can be absolute if it starts with `/` or relative if it starts with `./` kwargs – keyword arguments to pass to Pandas `read_json` function
Example:	`ds.load_json("./myfile.json")`

lreg(xcol, ycol, name='Regression')¶: Add a column to the main dataframe populated with the model’s linear regression for a column
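
The sketch below shows what a regression column can contain: the fitted value of a straight line for each x. It assumes an ordinary least squares fit, which this reference does not confirm for `lreg`, and the function name `linear_regression` is hypothetical.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = linear_regression(xs, ys)

# One fitted value per row, which is what a "Regression" column would hold
regression = [slope * x + intercept for x in xs]
```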

lreg_(label=None, style=None, opts=None, options={})¶: Get a linear regression chart

map_(lat, long, zoom=13, tiles='map')¶: Returns a map

marker(lat, long, text, color=None, icon=None)¶: Set the main map with a marker to the default map

marker_(lat, long, text, pmap, color=None, icon=None)¶: Returns the map with a marker to the default map

mbar_(col, x=None, y=None, rsum=None, rmean=None)¶: Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum="1D" to resample and sum the data, and rmean="1D" to resample and average the data

mcluster(lat_col: str, lon_col: str)¶: Add a markers cluster to the map

merge(df: pandas.core.frame.DataFrame, on: str, how: str = 'outer', **kwargs)¶

Set the main dataframe from the current dataframe and the passed dataframe

Parameters:	df (pd.DataFrame) – the pandas dataframe to merge on (str) – param for `pd.merge` how (str, optional) – param for `pd.merge`, defaults to “outer” kwargs – keyword arguments for `pd.merge`

mfw_(col, sw_lang='english', limit=100)¶: Returns a Dataswim instance with the most frequent words in a column, excluding the most common stop words
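
The idea of counting the most frequent words while skipping stop words can be sketched with the standard library alone. The stop word list below is a deliberately tiny, made-up sample and `most_frequent_words` is a hypothetical helper. The actual stop word source used for `sw_lang` is not described in this reference.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list for the illustration
STOP_WORDS = {"the", "a", "of", "and", "is"}


def most_frequent_words(texts, limit=100):
    """Count words across all texts, ignoring stop words."""
    words = []
    for text in texts:
        for word in text.lower().split():
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(limit)


top = most_frequent_words(["The cat and the dog", "A dog is a dog"], limit=2)
```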

mline_(col, x=None, y=None, rsum=None, rmean=None)¶: Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum="1D" to resample and sum the data, and rmean="1D" to resample and average the data

mline_point_(col, x=None, y=None, rsum=None, rmean=None)¶: Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum="1D" to resample and sum the data, and rmean="1D" to resample and average the data

mpoint_(col, x=None, y=None, rsum=None, rmean=None)¶: Splits a column into multiple series based on the column’s unique values, then visualizes these series in a chart. Parameters: column to split, x axis column, y axis column. Optional: rsum="1D" to resample and sum the data, and rmean="1D" to resample and average the data

msg = None¶

msg_(label, *msg)¶: Returns a message with a label

nan = None¶

nan_empty(col: str)¶

Fill empty values with NaN values

Parameters:	col (str) – name of the column
Example:	`ds.nan_empty("mycol")`

ncontains(column, value)¶: Set the main dataframe to the rows that do not contain a string value in a column

ndlayout_(dataset, kdims, cols=3)¶: Create a HoloViews NdLayout from a dictionary of chart objects

notebook = False¶

nowrange(col: str, timeframe: str)¶

Set the main dataframe with rows within a date range from now

Parameters:	col (str) – the column to use for range timeframe (str) – units are: S, H, D, W, M, Y

Example:	`ds.nowrange("Date", "3D")`

nowrange_(col: str, timeframe: str) → Ds¶

Returns a Dataswim instance with rows within a date range from now

Parameters:	col (str) – the column to use for range timeframe (str) – units are: S, H, D, W, M, Y
Returns:	a dataswim instance
Return type:	Ds
Example:	`ds2 = ds.nowrange_("Date", "3D")`
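
A timeframe such as "3D" combines an amount and a unit. The sketch below shows one way such a string could be turned into a time window counted back from now. It is an illustration under stated assumptions, not the library's code: `parse_timeframe` and `rows_since` are hypothetical helpers, and only the fixed-length units S, H, D and W are handled, since months and years have no fixed duration.

```python
from datetime import datetime, timedelta

# Only units with a fixed length are handled in this sketch
UNIT_SECONDS = {"S": 1, "H": 3600, "D": 86400, "W": 604800}


def parse_timeframe(timeframe):
    """Turn a string such as "3D" into a timedelta."""
    amount, unit = int(timeframe[:-1]), timeframe[-1]
    return timedelta(seconds=amount * UNIT_SECONDS[unit])


def rows_since(rows, col, timeframe, now):
    """Keep the rows whose date lies between now - timeframe and now."""
    start = now - parse_timeframe(timeframe)
    return [row for row in rows if start <= row[col] <= now]


now = datetime(2021, 1, 10, 12, 0)
rows = [
    {"Date": datetime(2021, 1, 9), "value": 1},
    {"Date": datetime(2021, 1, 1), "value": 2},
]
recent = rows_since(rows, "Date", "3D", now)
```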

ok(*msg)¶: Prints a message with an ok prefix

one()¶

Shows one row of the dataframe and the field names with count

Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.one()`

opt(name, value)¶: Add or update one option

opts(dictobj)¶: Add or update options

pivot(index, **kwargs)¶: Pivots a dataframe

point_(label=None, style=None, opts=None, options={})¶: Get a point chart

point_num_(label=None, style=None, opts=None)¶: Get an Altair point + number marks chart

progress(*msg)¶: Prints a progress message

quants_(inf, sup, chart_type='point', color='green')¶: Draw a chart to visualize quantiles

query(q: str) → dataset.util.ResultIter¶

Query the database

Parameters:	q (str) – the query to perform
Returns:	an iterable over the rows returned by the query
Return type:	dataset.util.ResultIter

quiet = False¶

radar_(label=None, style=None, opts=None, options={})¶: Get a radar chart

raenc(key)¶: Remove an entry from the altair encoding dict

raencs()¶: Reset the altair encoding dict

ratio(col: str, ratio_col: str = 'Ratio')¶

Add a column with the percentage ratio calculated from a column

Parameters:	col (str) – column to calculate ratio from ratio_col (str, optional) – new ratio column name, defaults to “Ratio”
Example:	`ds.ratio("Col 1")`
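
One plausible reading of a percentage ratio is each value's share of the column total. The short sketch below computes that on a plain list. This interpretation is an assumption, since the reference does not spell out the formula used by `ratio`.

```python
values = [1, 3, 4]
total = sum(values)

# Assumption: ratio = value / column total, expressed in percent
ratios = [100 * v / total for v in values]
```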

rcolor()¶: Reset the color to the base color

relation(table: str, origin_field: str, search_field: str, destination_field: str = None, id_field: str = 'id')¶

Add a column to the main dataframe from a relation foreign key

Parameters:

table (str) – the table to select from
origin_field (str) – the column name in the origin table to search from, generally an id column
search_field (str) – the column name in the foreign table
destination_field (str, optional) – name of the column to be created with the data in the dataframe, defaults to None, will be named as the origin_field if not provided
id_field (str, optional) – name of the primary key to use, defaults to “id”

Example:	`ds.relation("product", "category_id", "name")`
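
Resolving a relation amounts to looking up, for each row, the foreign record whose primary key matches the row's foreign key and copying one of its fields. The sketch below shows that lookup with plain dictionaries. The data and the `add_relation` helper are hypothetical, and the sketch stands in for the database query the library would perform.

```python
# Hypothetical foreign table, keyed by its primary key ("id")
categories = {1: {"name": "fruit"}, 2: {"name": "vegetable"}}

# Rows of the main data, each holding a foreign key in "category_id"
products = [
    {"product": "apple", "category_id": 1},
    {"product": "leek", "category_id": 2},
]


def add_relation(rows, foreign, origin_field, search_field,
                 destination_field=None):
    """Return copies of rows with a column filled from the foreign table."""
    # As described above, the new column takes the origin field's name
    # when no destination name is given
    destination = destination_field or origin_field
    result = []
    for row in rows:
        related = foreign.get(row[origin_field])
        new_row = dict(row)
        new_row[destination] = related[search_field] if related else None
        result.append(new_row)
    return result


resolved = add_relation(products, categories, "category_id", "name", "category")
```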

relation_(table: str, origin_field: str, search_field: str, destination_field=None, id_field='id') → pandas.core.frame.DataFrame¶

Returns a DataSwim instance with a column filled from a relation foreign key

Parameters:	table (str) – the table to select from origin_field (str) – the column name in the origin table to search from, generally an id column search_field (str) – the column name in the foreign table destination_field (str, optional) – name of the column to be created with the data in the dataframe, defaults to None, will be named as the origin_field if not provided id_field (str, optional) – name of the primary key to use, defaults to “id”
Returns:	a pandas DataFrame
Return type:	DataFrame

rename(source_col: str, dest_col: str)¶

Renames a column in the main dataframe

Parameters:	source_col (str) – name of the column to rename dest_col (str) – new name of the column
Example:	`ds.rename("Col 1", "New col")`

replace(col: str, searchval: str, replaceval: str)¶

Replace a value in a column in the main dataframe

Parameters:	col (str) – column name searchval (str) – value to replace replaceval (str) – new value
Example:	`ds.replace("mycol", "value", "new_value")`

report_engines = []¶

report_path = None¶

reports = []¶

residual_(label=None, style=None, opts=None)¶: Returns a Seaborn models residuals chart

restore()¶: Restore the main dataframe

reverse()¶

Reverses the main dataframe order

Example:	`ds.reverse()`

rmean(time_period: str, num_col: str = 'Number', dateindex: str = None)¶

Resample the main dataframe to a time period and add a mean column

Parameters:	time_period (str) – unit + period: periods are Y, M, D, H, Min, S num_col (str, optional) – name of the new column, defaults to “Number” dateindex (str, optional) – column name to use as date index, defaults to None
Example:	`ds.rmean("1Min")`

rmean_(time_period: str, num_col: str = 'Number', dateindex: str = None)¶

Resample the main dataframe to a time period, add a mean column and return a new Ds instance

Parameters:	time_period (str) – unit + period: periods are Y, M, D, H, Min, S num_col (str, optional) – name of the new column, defaults to “Number” dateindex (str, optional) – column name to use as date index, defaults to None
Example:	`ds.rmean_("1Min")`

ropt(name)¶: Remove one option

ropts()¶: Reset the chart options

roundvals(col: str, precision: int = 2)¶

Round floats in a column. Numbers are going to be converted to floats if they are not already

Parameters:	col (str) – column name precision – float precision, defaults to 2 precision – int, optional
Example:	`ds.roundvals("mycol")`

rstyle(name)¶: Remove one style

rstyles()¶: Reset the chart styles

rsum(time_period: str, num_col: str = 'Number', dateindex: str = None)¶

Resample the main dataframe to a time period and add a sum column

Parameters:	time_period (str) – unit + period: periods are Y, M, D, H, Min, S num_col (str, optional) – name of the new column, defaults to “Number” dateindex (str, optional) – column name to use as date index, defaults to None
Example:	`ds.rsum("1D")`
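
Resampling to a time period groups rows into fixed time buckets before aggregating them. The sketch below illustrates a one-day sum with the standard library only. The sample events are made up, and unlike a full resample this simplified version does not create entries for days that contain no data.

```python
from datetime import datetime

events = [
    (datetime(2021, 1, 1, 9, 0), 2),
    (datetime(2021, 1, 1, 17, 30), 3),
    (datetime(2021, 1, 2, 8, 15), 4),
]

daily = {}
for stamp, number in events:
    day = stamp.date()  # bucket key: the calendar day of the timestamp
    daily[day] = daily.get(day, 0) + number
```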

rsum_(time_period: str, num_col: str = 'Number', dateindex: str = None)¶

Resample the main dataframe to a time period, add a sum column and return a new Ds instance

Parameters:	time_period (str) – unit + period: periods are Y, M, D, H, Min, S num_col (str, optional) – name of the new column, defaults to “Number” dateindex (str, optional) – column name to use as date index, defaults to None
Example:	`ds.rsum_("1D")`

rule_(label=None, style=None, opts=None, options={})¶: Get a rule chart

sarea_(col, x=None, y=None, rsum=None, rmean=None)¶: Get a stacked area chart

sbar_(stack_index=None, label=None, style=None, opts=None, options={})¶: Get a stacked bar chart

scolor()¶: Set a unique color from a series

scommit()¶

sconnect(url: str)¶

seaborn_bar_(label=None, style=None, opts=None)¶: Get a Seaborn bar chart

show(rows: int = 5, dataframe: pandas.core.frame.DataFrame = None) → pandas.core.frame.DataFrame¶

Display info about the dataframe

Parameters:	rows (int, optional) – number of rows to show, defaults to 5 dataframe (pd.DataFrame, optional) – a pandas dataframe, defaults to None
Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.show()`

size(val)¶: Change the chart’s point size

sline_(window_size=5, y_label='Moving average', chart_label=None)¶: Get a moving average curve chart to smooth between points
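
A moving average replaces each point with the mean of a window of neighboring points, which smooths out short-term variation. The following sketch shows the computation on a plain list. The function name `moving_average` is hypothetical, and the handling of the edges (here, only full windows are kept) is an assumption.

```python
def moving_average(values, window_size=5):
    """Average each run of `window_size` consecutive values."""
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(len(values) - window_size + 1)
    ]


smoothed = moving_average([1, 2, 3, 4, 5, 6], window_size=3)
```

A larger window gives a smoother curve but reacts more slowly to changes in the data.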

sort(col: str)¶

Sorts the main dataframe according to the given column

Parameters:	col (str) – column name
Example:	`ds.sort("Col 1")`

split_(col: str) → list(Ds)¶

Split the main dataframe according to a column’s unique values and return a dict of dataswim instances

Returns:	list of dataswim instances
Return type:	list(Ds)
Example:	`dss = ds.split_("Col 1")`

sq_(query: str)¶

sqm_(query: str, values)¶

square_(label=None, style=None, opts=None, options={})¶: Get a square chart

stack(slug, chart_obj=None, title=None)¶: Get the html for a chart and store it

start(*msg)¶: Prints a start message

start_time = None¶

static_path = None¶

status(*msg)¶: Prints a status message

strip(col: str)¶

Remove leading and trailing white spaces in a column’s values

Parameters:	col (str) – name of the column
Example:	`ds.strip("mycol")`

strip_cols()¶

Remove leading and trailing white spaces in columns names

Example:	`ds.strip_cols()`

style(name, value)¶: Add or update one style

styles(dictobj)¶: Add or update styles

subset(*args)¶: Set the main dataframe to a subset selected by position. Example: ds.subset(0, 10) and ds.subset(10) are equivalent, since the selection starts at the first row when only one argument is provided

subset_(*args)¶: Returns a Dataswim instance with a subset of the main dataframe selected by position. Example: ds.subset_(0, 10) and ds.subset_(10) are equivalent, since the selection starts at the first row when only one argument is provided
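
The one-argument and two-argument forms can be pictured as list slicing. The sketch below is a plain-Python illustration with a hypothetical standalone `subset` function. Whether the end position is inclusive or exclusive in the library is not stated here, so the exclusive end used below is an assumption.

```python
def subset(rows, *args):
    """Select rows by position: subset(end) or subset(start, end)."""
    if len(args) == 1:
        start, end = 0, args[0]  # a single argument starts at the first row
    else:
        start, end = args
    # Assumption: the end position is exclusive, as in Python slicing
    return rows[start:end]


rows = list(range(20))
first_ten = subset(rows, 10)
```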

subtitle(txt)¶: Prints a subtitle for pipelines

sum_(col: str) → float¶

Returns the sum of all values in a column

Parameters:	col (str) – column name
Returns:	sum of all the column values
Return type:	float
Example:	`sum = ds.sum_("Col 1")`

table(name: str)¶

Display info about a table: number of rows and columns

Parameters:	name (str) – name of the table
Example:	`ds.table("mytable")`

tables()¶

Print the existing tables in a database

Example:	`ds.tables()`

tables_() → list¶

Return a list of the existing tables in a database

Returns:	list of the table names
Return type:	list
Example:	`tables = ds.tables_()`

tail(rows: int = 5)¶

Returns the main dataframe’s tail

Parameters:	rows (int, optional) – number of rows to print, defaults to 5
Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.tail()`

text_(label=None, style=None, opts=None)¶: Get an Altair text marks chart

tick_(label=None, style=None, opts=None, options={})¶: Get a tick chart

timestamps(col: str, **kwargs)¶

Add a timestamps column from a date column

Parameters:	col (str) – name of the timestamps column to add kwargs (optional) – keyword arguments for `pd.to_datetime`
Example:	`ds.timestamps("mycol")`

title(txt)¶: Prints a title for pipelines

tmarker(lat, long, text, color=None, icon=None, style=None)¶: Returns the map with a text marker to the default map

tmarker_(lat, long, text, pmap, color=None, icon=None, style=None)¶: Returns the map with a text marker to the default map

to_csv(filepath: str, index: bool = False, **kwargs)¶

Write the main dataframe to a csv file

Parameters:	filepath (str) – path of the file to save index (bool, optional) – write the dataframe index to the file, defaults to False kwargs – keyword arguments to pass to `df.to_csv`
Example:	`ds.to_csv("myfile.csv", header=False)`

to_db(table: str, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None)¶

Save the main dataframe to the database

Parameters:	table (str) – the table to create dtypes (List[SchemaType], optional) – SqlAlchemy columns type, defaults to None, will be inferred if not provided

to_excel(filepath: str, title: str)¶

Write the main dataframe to an Excel file

Parameters:	filepath (str) – path of the Excel file to write title (str) – Title of the stylesheet
Example:	`ds.to_excel("./myfile.xlsx", "My data")`

to_file(slug, folderpath=None, header=None, footer=None)¶: Writes the html report to a file from the report stack

to_files(folderpath=None)¶: Writes the html report to one file per report

to_float(col: str, **kwargs)¶

Convert a column’s values to float

Parameters:	col (str) – name of the column kwargs (optional) – keyword arguments for `df.astype`
Example:	`ds.to_float("mycol1")`

to_hdf5(filepath: str)¶

Write the main dataframe to Hdf5 file

Parameters:	filepath (str) – path where to save the file
Example:	`ds.to_hdf5("./myfile.hdf5")`

to_html_() → str¶

Convert the main dataframe to html

Returns:	html data
Return type:	str
Example:	`ds.to_html_()`

to_int(*cols, **kwargs)¶

Convert some column values to integers

Parameters:	cols (str, at least one) – names of the columns kwargs (optional) – keyword arguments for `pd.to_numeric`
Example:	`ds.to_int("mycol1", "mycol2", errors="coerce")`

to_javascript_(table_name: str = 'data') → str¶

Convert the main dataframe to javascript code

Parameters:	table_name (str, optional) – javascript variable name, defaults to “data”
Returns:	a javascript constant with the data
Return type:	str
Example:	`ds.to_javascript_("myconst")`

to_json_() → str¶

Convert the main dataframe to json

Returns:	json data
Return type:	str
Example:	`ds.to_json_()`

to_markdown_() → str¶

Convert the main dataframe to markdown

Returns:	markdown data
Return type:	str
Example:	`ds.to_markdown_()`

to_numpy_(table_name: str = 'data') → numpy.array¶

Convert the main dataframe to a numpy array

Parameters:	table_name (str, optional) – name of the python variable, defaults to “data”
Returns:	a numpy array
Return type:	np.array
Example:	`ds.to_numpy_("myvar")`

to_python_(table_name: str = 'data') → list¶

Convert the main dataframe to a python list

Parameters:	table_name (str, optional) – python variable name, defaults to “data”
Returns:	a python list of lists with the data
Return type:	list
Example:	`ds.to_python_("myvar")`

to_records_() → dict¶

Returns a list of dictionary records from the main dataframe

Returns:	a python dictionary with the data
Return type:	str
Example:	`ds.to_records_()`

to_rst_() → str¶

Convert the main dataframe to restructured text

Returns:	rst data
Return type:	str
Example:	`ds.to_rst_()`

to_type(dtype: type, *cols, **kwargs)¶

Convert column values to a given type in the main dataframe

Parameters:	dtype (type) – a type to convert to: ex: `str` cols (str, at least one) – names of the columns kwargs (optional) – keyword arguments for `df.astype`
Example:	`ds.to_type(str, "mycol")`

trimiquants(col: str, inf: float)¶

Remove inferior quantiles from the dataframe

Parameters:	col (str) – column name inf (float) – inferior quantile
Example:	`ds.trimiquants("Col 1", 0.05)`

trimquants(col: str, inf: float, sup: float)¶

Remove superior and inferior quantiles from the dataframe

Parameters:	col (str) – column name inf (float) – inferior quantile sup (float) – superior quantile
Example:	`ds.trimquants("Col 1", 0.01, 0.99)`
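
Trimming by quantiles means computing a lower and an upper threshold from the column and dropping the rows that fall outside them. The sketch below illustrates this on a plain list. It assumes linearly interpolated quantiles and inclusive bounds, neither of which is confirmed by this reference, and both function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (
        sorted_values[upper] - sorted_values[lower]
    ) * fraction


def trim_quantiles(values, inf, sup):
    """Drop the values below the inf quantile or above the sup quantile."""
    ordered = sorted(values)
    low, high = quantile(ordered, inf), quantile(ordered, sup)
    # Assumption: values strictly outside [low, high] are dropped
    return [v for v in values if low <= v <= high]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
trimmed = trim_quantiles(data, 0.1, 0.9)
```

With the sample data, the outlier 1000 and the smallest value are removed while the bulk of the values is kept.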

trimsquants(col: str, sup: float)¶

Remove superior quantiles from the dataframe

Parameters:	col (str) – column name sup (float) – superior quantile
Example:	`ds.trimsquants("Col 1", 0.99)`

types_(col: str) → pandas.core.frame.DataFrame¶

Display types of values in a column

Parameters:	col (str) – column name
Returns:	a pandas dataframe
Return type:	pd.DataFrame
Example:	`ds.types_("Col 1")`

unique_(col: str) → list¶

Returns unique values in a column

Parameters:	col (str) – the column to select from
Returns:	a list of unique values in the column
Return type:	list

update_table(table: str, pks: List[str] = ['id'], mirror: bool = True)¶

Update records in a database table from the main dataframe

Parameters:	table (str) – table to update pks (List[str], optional) – if rows with matching pks exist they will be updated, otherwise a new row is inserted in the table, defaults to [“id”] mirror (bool, optional) – delete the rows not in the new dataset, defaults to True

upsert(table: str, record: dict, create_cols: bool = False, dtypes: List[sqlalchemy.sql.sqltypes.SchemaType] = None, pks: List[str] = ['id'])¶

Upsert a record in a table

Parameters:

table (str) – the table to upsert into
record (dict) – dictionary with the data to upsert
create_cols (bool, optional) – create the columns if they don’t exist, defaults to False
dtypes (List[SchemaType], optional) – list of SqlAlchemy column types, defaults to None
pks (List[str], optional) – if rows with matching pks exist they will be updated, otherwise a new row is inserted in the table, defaults to [“id”]
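
The update-or-insert rule described for `pks` can be sketched with the standard `sqlite3` module. This is an illustration of the upsert idea only, not the library's implementation. The `upsert` helper and the `product` table are hypothetical, and the sketch assumes the record has at least one column that is not a primary key and that identifiers come from trusted code.

```python
import sqlite3


def upsert(conn, table, record, pks=("id",)):
    """Update the row matching `pks`, or insert it if none matches."""
    non_keys = [c for c in record if c not in pks]
    set_clause = ", ".join(f'"{c}" = ?' for c in non_keys)
    where_clause = " AND ".join(f'"{c}" = ?' for c in pks)
    params = [record[c] for c in non_keys] + [record[c] for c in pks]
    cursor = conn.execute(
        f'UPDATE "{table}" SET {set_clause} WHERE {where_clause}', params
    )
    if cursor.rowcount == 0:
        # No existing row matched the primary keys: insert a new one
        columns = ", ".join(f'"{c}"' for c in record)
        marks = ", ".join("?" for _ in record)
        conn.execute(
            f'INSERT INTO "{table}" ({columns}) VALUES ({marks})',
            list(record.values()),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
upsert(conn, "product", {"id": 1, "name": "apple"})  # no match: inserted
upsert(conn, "product", {"id": 1, "name": "pear"})   # match on id: updated
rows = conn.execute("SELECT id, name FROM product").fetchall()
```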

version = '0.6.0'¶

warning(*msg)¶: Prints a warning

width(val)¶: Change the chart’s width

wunique_(col)¶: Weight unique values: returns a dataframe with a count of unique values

x = None¶

y = None¶

zero_nan(*cols)¶

Converts zero values to nan values in selected columns

Parameters:	cols (str, at least one) – names of the columns
Example:	`ds.zero_nan("mycol1", "mycol2")`
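
The effect of converting zeros to NaN can be pictured on a plain list, as in the short sketch below. It is a standard-library illustration only and treats both integer and float zeros as zero values, which is an assumption about the library's behavior.

```python
import math

values = [0, 1.5, 0.0, 2]

# Replace every zero (int or float) with NaN, keep the other values
converted = [float("nan") if v == 0 else v for v in values]
nan_count = sum(1 for v in converted if isinstance(v, float) and math.isnan(v))
```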