API Reference¶
fyrd.queue¶
The core class in this file is the Queue() class, which does most of the queue management. In addition, get_cluster_environment() attempts to autodetect the cluster type (torque, slurm, normal) and sets the global cluster type for the whole file. Finally, the wait() function accepts a list of jobs and will block until those jobs are complete.
The Queue class relies on a few simple queue parsers defined by the torque_queue_parser and slurm_queue_parser functions. These call qstat -x or squeue and sacct to get job information, and yield a simple tuple of that data with the following members:
job_id, name, userid, partition, state, node-list, node-count, cpu-per-node, exit-code
The Queue class then converts this information into a Queue.QueueJob object and adds it to the internal jobs dictionary within the Queue class. This dictionary is the basis for all of the other functionality provided by the Queue class. It can be accessed directly, or sliced by accessing the completed, queued, and running attributes of the Queue class, which simply divide up the jobs dictionary to make finding information easy.
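The flow described above (parser tuples, a jobs dictionary keyed by job ID, and state-based slices) can be sketched roughly as follows. This is a minimal illustration using only the standard library; QueueRecord, build_jobs, and jobs_in_state are hypothetical names, not part of fyrd.

```python
from collections import namedtuple

# Hypothetical record mirroring the tuple members listed above.
QueueRecord = namedtuple(
    "QueueRecord",
    "job_id name userid partition state nodelist nodecount cpus exitcode",
)


def build_jobs(records):
    """Index parser output by job ID."""
    return {rec.job_id: rec for rec in records}


def jobs_in_state(jobs, state):
    """Return the subset of the jobs dictionary whose state matches."""
    return {jid: job for jid, job in jobs.items() if job.state == state}


records = [
    QueueRecord("101", "align", "alice", "normal", "running", ["n01"], 1, 4, None),
    QueueRecord("102", "sort", "alice", "normal", "queued", [], 0, 0, None),
    QueueRecord("103", "merge", "bob", "long", "completed", ["n02"], 1, 8, 0),
]
jobs = build_jobs(records)
running = jobs_in_state(jobs, "running")
print(sorted(running))
```

Keeping one dictionary as the single source of truth means every slice is just a filter over it, which is the design the paragraph above describes.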
fyrd.queue.Queue¶
-
class
fyrd.queue.
Queue
(user=None, partition=None, qtype=None)[source]¶ Bases:
object
A wrapper for all defined batch systems.
-
active_job_count
A count of all jobs that are either pending or running in the current queue
- Type
-
can_submit
True if total active jobs is less than max_jobs
- Type
-
job_states
A set of all current job states
- Type
-
wait_to_submit
(max_jobs=None)[source]¶ Block until fewer running/pending jobs in queue than max_jobs.
Can filter by user, queue type or partition on initialization.
-
Methods¶
-
Queue.
wait
(jobs, return_disp=False, notify=True)[source] Block until all jobs in jobs are complete.
Update time is dependent upon the queue_update parameter in your ~/.fyrd/config.txt file.
- Parameters
jobs (list) – List of either fyrd.job.Job, fyrd.queue.QueueJob, job_id
return_disp (bool, optional) – If a job disappears from the queue, return ‘disappeared’ instead of True
notify (str, True, or False, optional) – If True, both notification address and wait_time must be set in the [notify] section of the config. A notification email will be sent if the wait exceeds this time. This is the default. If a string is passed, notification is forced and the string must be the to address. False means no notification
- Returns
True on success, False or None on failure, unless return_disp is True and the job disappears, in which case ‘disappeared’ is returned
- Return type
-
Queue.
get
(jobs)[source] Get all results from a bunch of Job objects.
-
Queue.
wait_to_submit
(max_jobs=None)[source] Block until fewer running/queued jobs in queue than max_jobs.
- Parameters
max_jobs (int) – Override self.max_jobs for wait
-
Queue.
test_job_in_queue
(job_id, array_id=None)[source]¶ Check to make sure job is in self.
Tries 12 times with 1 second between each. If found returns True, else False.
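The retry behaviour described for test_job_in_queue (a fixed number of attempts with a pause between them) follows a common polling pattern. The sketch below is not fyrd's implementation; found_within and job_visible are hypothetical names, and the delay is shortened so the example runs quickly.

```python
import time


def found_within(check, tries=12, delay=1.0):
    """Call check() up to `tries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(tries):
        if check():
            return True
        if attempt < tries - 1:
            time.sleep(delay)
    return False


# Simulated queue that only lists the job on the third look.
looks = {"count": 0}


def job_visible():
    looks["count"] += 1
    return looks["count"] >= 3


result = found_within(job_visible, delay=0.01)
print(result, looks["count"])
```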
-
Queue.
get_jobs
(key)[source] Return a dict of jobs where state matches key.
-
Queue.
get_user_jobs
(users)[source] Filter jobs by user.
-
Queue.
update
()[source] Refresh the list of jobs from the server, limit queries.
fyrd.queue Jobs¶
Hold information about individual jobs: QueueJob about primary jobs, and QueueChild about individual array jobs (which are stored in the children attribute of QueueJob objects).
-
class
fyrd.queue.
QueueJob
[source]¶ A very simple class to store info about jobs in the queue.
Only used for torque and slurm queues.
Initialize.
fyrd.job¶
Job management is handled by the Job() class. This is a very large class that defines all the methods required to build and submit a job to the cluster.
It accepts keyword arguments defined in fyrd.options on initialization, which are then fleshed out using profile information from the config files defined by fyrd.conf.
The primary argument on initialization is the function or script to submit.
Examples:
Job('ls -lah | grep myfile')
Job(print, ('hi',))
Job('echo hostname', profile='tiny')
Job(huge_function, args=(1, 2), kwargs={'hi': 'there'},
    profile='long', cores=28, mem='200GB')
fyrd.job.Job¶
-
class
fyrd.
Job
(command, args=None, kwargs=None, name=None, qtype=None, profile=None, queue=None, **kwds)[source]¶ Bases:
object
Information about a single job on the cluster.
Holds information about submit time, number of cores, the job script, and more.
Below are the core attributes and methods required to use this class. Note that this is an incomplete list.
-
state
¶ - A slurm-style one word description of the state of the job, one of:
Not_Submitted
queued
running
completed
failed
- Type
-
exitcode
¶ The exitcode of the running processes (the script runner if the Job is a function).
- Type
-
submit_time
¶ A datetime object for the time of submission
- Type
datetime
-
start
¶ A datetime object for time execution started on the remote node.
- Type
datetime
-
end
¶ A datetime object for time execution ended on the remote node.
- Type
datetime
-
runtime
¶ A timedelta object containing runtime.
- Type
timedelta
-
kwds
Keyword arguments to the batch system (e.g. mem, cores, walltime). This is initialized by taking every additional keyword argument to the Job, e.g. Job('echo hi', profile='large', walltime='00:20:00', mem='2GB') will result in kwds containing {'walltime': '00:20:00', 'mem': '2GB'}. There is no need to alter this manually.
- Type
-
submit_args
¶ List of parsed submit arguments that will be passed at runtime to the submit function. Generated within the Job object, no need to set manually, use the kwds attribute instead.
- Type
-
submit
(wait_on_max_queue=True)[source]¶ Submit the job if it is ready and the queue is sufficiently open.
-
get
()[source]¶ Block until the job is done and then return the output (stdout if job is a script), by default saves all outputs to self (i.e. .out, .stdout, .stderr) and deletes all intermediate files before returning. If save argument is False, does not delete the output files by default.
Notes
Printing or reproducing the class will display detailed job information.
Both wait() and get() will update the queue every few seconds (defined by the queue_update item in the config) and add queue information to the job as they go.
If the job disappears from the queue with no information, it will be listed as ‘completed’.
All jobs have a .submission attribute, which is a Script object containing the submission script for the job and the file name, plus a ‘written’ bool that checks if the file exists.
In addition, some batch systems (e.g. SLURM) have an .exec_script attribute, which is a Script object containing the shell command to run. This difference is due to the fact that some SLURM systems execute multiple lines of the submission file at the same time.
Finally, if the job command is a function, this object will also contain a .function attribute, which contains the script to run the function.
Initialization function arguments.
- Parameters
command (function/str) – The command or function to execute.
args (tuple/dict, optional) – Optional arguments to add to command, particularly useful for functions.
kwargs (dict, optional) – Optional keyword arguments to pass to the command, only used for functions.
name (str, optional) – Optional name of the job. If not defined, guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, ie. <name>.1
qtype (str, optional) – Override the default queue type
profile (str, optional) – The name of a profile saved in the conf
queue (fyrd.queue.Queue, optional) – An already initiated Queue class to use.
kwds – All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
-
Methods¶
-
Job.
clean
(delete_outputs=None, get_outputs=True)[source]¶ Delete all scripts created by this module, if they were written.
-
Job.
submit
(wait_on_max_queue=True, additional_keywords=None, max_jobs=None)[source]¶ Submit this job.
To disable max_queue_len, set it to 0. None will allow override by the default settings in the config file, and any positive integer will be interpreted as the maximum queue length.
- Parameters
- Returns
self
- Return type
-
Job.
resubmit
(wait_on_max_queue=True, cancel_running=None)[source]¶ Attempt to auto resubmit, deletes prior files.
- Parameters
wait_on_max_queue (bool, optional) – Block until queue limit is below the maximum before submitting.
cancel_running (bool or None, optional) – If the job is currently running, cancel it before resubmitting. If None (default), will ask the user.
To disable max_queue_len, set it to 0. None will allow override by the default settings in the config file, and any positive integer will be interpreted as the maximum queue length.
- Returns
self
- Return type
-
Job.
get
(save=True, cleanup=None, delete_outfiles=None, del_no_save=None, raise_on_error=True)[source]¶ Block until job completed and return output of script/function.
By default saves all outputs to this class and deletes all intermediate files.
- Parameters
save (bool, optional) – Save all outputs to the class also (advised)
cleanup (bool, optional) – Clean all intermediate files after job completes.
delete_outfiles (bool, optional) – Clean output files after job completes.
del_no_save (bool, optional) – Delete output files even if save is False
raise_on_error (bool, optional) – If the returned output is an Exception, raise it.
- Returns
Function output if Function, else STDOUT
- Return type
-
Job.
get_output
(save=True, delete_file=None, update=True, raise_on_error=True)[source]¶ Get output of function or script.
This is the same as stdout for a script, or the function output for a function.
By default, output file is kept unless delete_file is True or self.clean_files is True.
- Parameters
save (bool, optional) – Save the output to self.out, default True. Would be a good idea to set to False if the output is huge.
delete_file (bool, optional) – Delete the output file when getting
update (bool, optional) – Update job info from queue first.
raise_on_error (bool, optional) – If the returned output is an Exception, raise it.
- Returns
output – The output of the script or function. Always a string if script.
- Return type
anything
-
Job.
get_stdout
(save=True, delete_file=None, update=True)[source]¶ Get stdout of function or script, same for both.
By default, output file is kept unless delete_file is True or self.clean_files is True.
Also sets self.start and self.end from the contents of STDOUT if possible.
- Parameters
save (bool, optional) – Save the output to self.stdout, default True. Would be a good idea to set to False if the output is huge.
delete_file (bool, optional) – Delete the stdout file when getting
update (bool, optional) – Update job info from queue first.
- Returns
The contents of STDOUT, with runtime info and trailing newline removed.
- Return type
-
Job.
get_stderr
(save=True, delete_file=None, update=True)[source]¶ Get stderr of function or script, same for both.
By default, output file is kept unless delete_file is True or self.clean_files is True.
- Parameters
- Returns
The contents of STDERR, with trailing newline removed.
- Return type
-
Job.
get_times
(update=True, stdout=None)[source]¶ Get stdout of function or script, same for both.
Sets self.start and self.end from the contents of STDOUT if possible.
fyrd.submission_scripts¶
This module defines two classes that are used to build the actual jobs for submission, including writing the files. Function is actually a child class of Script.
-
class
fyrd.submission_scripts.
Script
(file_name, script)[source]¶ Bases:
object
A script string plus a file name.
Initialize the script and file name.
-
property
exists
¶ True if file is on disk, False if not.
-
class
fyrd.submission_scripts.
Function
(file_name, function, args=None, kwargs=None, imports=None, syspaths=None, pickle_file=None, outfile=None)[source]¶ Bases:
fyrd.submission_scripts.Script
A special Script used to run a function.
Create a function wrapper.
NOTE: Function submission will fail if the parent file’s code is not wrapped in an if __name__ == '__main__' guard.
- Parameters
file_name (str) – A root name to the outfiles
function (callable) – Function handle.
args (tuple, optional) – Arguments to the function as a tuple.
kwargs (dict, optional) – Named keyword arguments to pass in the function call
imports (list, optional) – A list of imports. If not provided, defaults to all current imports, which may not work if you use complex imports. The list can include the import call, or just be a name, e.g. ['from os import path', 'sys']
syspaths (list, optional) – Paths to be included in submitted function
pickle_file (str, optional) – The file to hold the function.
outfile (str, optional) – The file to hold the output.
fyrd.batch_systems¶
All batch systems are defined here.
fyrd.batch_systems functions¶
-
fyrd.batch_systems.
get_cluster_environment
(overwrite=False)[source]¶ Detect the local cluster environment and set MODE globally.
Detect the current batch system by looking for command line utilities. Order is important here, so we hard code the batch system lookups.
Paths to files can also be set in the config file.
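Detection by looking for command line utilities in a fixed order can be sketched with shutil.which. This is only an illustration: the LOOKUP_ORDER list, the particular order shown, the 'local' fallback name, and the detect_mode function are all assumptions made for the example, not fyrd's actual code.

```python
import shutil

# Hypothetical lookup order: (mode name, command that signals that system).
LOOKUP_ORDER = [("slurm", "sbatch"), ("torque", "qsub")]


def detect_mode(which=shutil.which, fallback="local"):
    """Return the first mode whose command is on PATH, else the fallback."""
    for mode, command in LOOKUP_ORDER:
        if which(command):
            return mode
    return fallback


# Injecting a fake `which` makes the ordering visible without a real cluster.
def both_installed(cmd):
    return "/usr/bin/" + cmd


def only_qsub(cmd):
    return "/usr/bin/qsub" if cmd == "qsub" else None


def nothing_installed(cmd):
    return None


print(detect_mode(both_installed), detect_mode(only_qsub), detect_mode(nothing_installed))
```

Because the first match wins, a machine with utilities from more than one batch system is resolved purely by list order, which is why the lookups are hard coded.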
-
fyrd.batch_systems.
check_queue
(qtype=None)[source]¶ Check if both MODE and qtype are valid.
First checks the MODE global and autodetects its value, if that fails, no other tests are done, the qtype argument is ignored.
After MODE is found to be a reasonable value, the queried queue is tested for functionality. If qtype is defined, this queue is tested, else the queue in MODE is tested.
Tests are defined per batch system.
- Parameters
qtype (str) –
- Returns
batch_system_functional
- Return type
- Raises
ClusterError – If MODE or qtype is not in DEFINED_SYSTEMS
See also
get_cluster_environment()
Auto detect the batch environment
get_batch_system()
Return the batch system module
fyrd.batch_systems.options¶
All keyword arguments are defined in dictionaries in the options.py file, alongside functions to manage those dictionaries. Of particular importance is option_help(), which can display all of the keyword arguments as a string or a table. check_arguments() checks a dictionary to make sure that the arguments are allowed (i.e. defined); it is called on all keyword arguments in the package.
To see keywords, run fyrd keywords from the console or fyrd.option_help() from a python session.
The way that option handling works in general is that all hard-coded keyword arguments must contain a dictionary entry for ‘torque’ and ‘slurm’, as well as a type declaration. If the type is NoneType, then the option is assumed to be a boolean option. If it has a type, though, check_arguments() attempts to cast the type, and specific idiosyncrasies are handled in this step, e.g. memory is converted into an integer of MB. Once the arguments are sanitized, format() is called on the string held in either the ‘torque’ or the ‘slurm’ values, and the formatted string is then used as an option. If the type is a list/tuple, the ‘sjoin’ and ‘tjoin’ dictionary keys must exist, and are used to handle joining.
The following two functions are used to manage this formatting step.
option_to_string() will take an option/value pair and return an appropriate string that can be used in the current queue mode. If the option is not implemented in the current mode, a debug message is printed to the console and an empty string is returned.
options_to_string() is a wrapper around option_to_string() and can handle a whole dictionary of arguments; it explicitly handles arguments that cannot be managed using a simple string format.
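The cast-then-format scheme described above can be illustrated with a toy option table. Everything here is an assumption made for the example: the OPTIONS entries and their format strings, the 1024 MB per GB conversion, and the mem_to_mb and option_to_string helpers are sketches, not the definitions in options.py.

```python
# Hypothetical option table; the real definitions live in options.py.
OPTIONS = {
    "cores": {"type": int, "slurm": "--cpus-per-task={}", "torque": "ppn={}"},
    "mem": {"type": int, "slurm": "--mem={}", "torque": "mem={}MB"},
    "features": {"type": list, "slurm": "--constraint={}", "torque": "feature={}",
                 "sjoin": "&", "tjoin": ":"},
}


def mem_to_mb(value):
    """Convert strings such as '2GB' or '500MB' to an integer number of MB."""
    text = str(value).upper()
    if text.endswith("GB"):
        return int(float(text[:-2]) * 1024)  # assumes 1 GB = 1024 MB
    if text.endswith("MB"):
        return int(float(text[:-2]))
    return int(text)


def option_to_string(name, value, qtype):
    """Format one option for the given queue type, or '' if unsupported."""
    spec = OPTIONS.get(name)
    if spec is None or qtype not in spec:
        return ""
    if name == "mem":
        value = mem_to_mb(value)
    elif spec["type"] is list:
        joiner = spec["sjoin"] if qtype == "slurm" else spec["tjoin"]
        value = joiner.join(value)
    else:
        value = spec["type"](value)
    return spec[qtype].format(value)


print(option_to_string("mem", "2GB", "slurm"))
print(option_to_string("features", ["bigmem", "gpu"], "torque"))
```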
-
fyrd.batch_systems.options.
option_help
(mode='string', qtype=None, tablefmt='simple')[source]¶ Print a string to stdout displaying information on all options.
The possible run modes for this extension are:
string
Return a formatted string
print
Print the string to stdout
list
Return a simple list of keywords
table
Return a table of lists
merged_table
Combine all keywords into a single table
- Parameters
mode ({'string', 'print', 'list', 'table', 'merged_table'}, optional) –
qtype (str, optional) – If provided only return info on that queue type.
tablefmt (str, optional) –
A tabulate-style table format, one of:
'plain', 'simple', 'grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'latex', 'latex_booktabs'
- Returns
A formatted string
- Return type
-
fyrd.batch_systems.options.
sanitize_arguments
(kwds)[source]¶ Run check_arguments, but return unmatched keywords as is.
-
fyrd.batch_systems.options.
split_keywords
(kwargs)[source]¶ Split a dictionary of keyword arguments into two dictionaries.
The first dictionary will contain valid arguments for fyrd, the second will contain all others.
- Returns
valid_args, other_args
- Return type
-
fyrd.batch_systems.options.
check_arguments
(kwargs)[source]¶ Make sure all keywords are allowed.
Raises OptionsError on error, returns sanitized dictionary on success.
- Note: Checks in SYNONYMS if argument is not recognized, raises OptionsError
if it is not found there either.
-
fyrd.batch_systems.options.
options_to_string
(option_dict, qtype=None)[source]¶ Return a multi-line string for job submission.
This function pre-parses options and then passes them to the parse_strange_options function of each batch system, before using the option_to_string function to parse the remaining options.
- Parameters
- Returns
parsed_options (str) – A multi-line string of parsed options
runtime_options (list) – A list of parsed options to be used at submit time
fyrd.conf¶
fyrd.conf handles the config (~/.fyrd/config.txt) file and the profiles (~/.fyrd/profiles.txt) file.
Profiles are combinations of keyword arguments that can be called in any of the submission functions. Both the config and profiles are just ConfigParser objects; conf.py merely adds an abstraction layer on top of this to maintain the integrity of the files.
config¶
The config has three sections (and no defaults):
queue — sets options for handling the queue
jobs — sets options for submitting jobs
jobqueue — local option handling, will be removed in the future
For a complete reference, see the config documentation: Configuration
Options can be managed with the get_option() and set_option() functions, but it is actually easier to use the console script:
fyrd conf list
fyrd conf edit max_jobs 3000
-
fyrd.conf.
get_option
(section=None, key=None, default=None)[source]¶ Get a single key or section.
All args are optional, if they are missing, the parent section or entire config will be returned.
- Parameters
- Returns
Option value if key exists, None if no key exists.
- Return type
option_value
See also
set_option()
Set an option
get_config()
Get the entire config
-
fyrd.conf.
load_config
()[source]¶ Load config from the config file.
If any section or key from DEFAULTS is not present in the config, it is added back, enforcing a minimal configuration.
- Returns
- Return type
ConfigParser
-
fyrd.conf.
create_config
(cnf=None, def_queue=None)[source]¶ Create an initial config file.
Gets all information from the file-wide DEFAULTS constant and overwrites specific keys using the values in cnf.
This means that any records in the cnf dict that are not present in DEFAULTS will be ignored, and any records that are absent will be populated from DEFAULTS.
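The merge rule described for create_config (start from DEFAULTS, overwrite matching keys from cnf, ignore anything DEFAULTS does not define) can be sketched with plain dictionaries. The DEFAULTS contents and the merge_with_defaults name below are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical defaults; the real DEFAULTS constant is defined in conf.py.
DEFAULTS = {
    "queue": {"max_jobs": 1000, "queue_update": 2},
    "jobs": {"clean_files": True},
}


def merge_with_defaults(cnf=None):
    """Start from DEFAULTS and overwrite only keys that DEFAULTS already has."""
    merged = copy.deepcopy(DEFAULTS)
    for section, values in (cnf or {}).items():
        if section not in merged:
            continue  # unknown section: ignored
        for key, value in values.items():
            if key in merged[section]:
                merged[section][key] = value
    return merged


config = merge_with_defaults({"queue": {"max_jobs": 3000, "bogus": 1}, "extra": {"x": 1}})
print(config)
```

The deep copy keeps the module-level defaults untouched, so repeated calls always start from the same baseline.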
profiles¶
Profiles are wrapped in a Profile() class to make attribute access easy, but they are fundamentally just dictionaries of keyword arguments. They can be created with fyrd.conf.Profile(name, {keywds}) and then written to a file with the write() method.
The easiest way to interact with profiles is not with the class but with the get_profile(), set_profile(), and del_profile() functions. These make it very easy to go from a dictionary of keywords to a profile.
Profiles can then be called with the profile= keyword in any submission function or Job class.
As with the config, profile management is the easiest and most stable when using the console script:
fyrd profile list
fyrd profile add very_long walltime:120:00:00
fyrd profile edit default partition:normal cores:4 mem:10GB
fyrd profile delete small
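The idea of a profile as a thin, attribute-friendly wrapper around a dictionary can be sketched as below. ProfileSketch and parse_pairs are hypothetical stand-ins, and splitting each key:value argument on the first colon only is an assumption about how values such as 120:00:00 could be kept intact, not a description of the console script's parser.

```python
class ProfileSketch:
    """Hypothetical stand-in: a name plus a dict exposed as attributes."""

    def __init__(self, name, kwds):
        self.name = name
        self.kwds = dict(kwds)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["kwds"][key]
        except KeyError:
            raise AttributeError(key) from None


def parse_pairs(pairs):
    """Split 'key:value' strings on the first colon only, so times survive."""
    return dict(pair.split(":", 1) for pair in pairs)


very_long = ProfileSketch("very_long", parse_pairs(["walltime:120:00:00", "cores:4"]))
print(very_long.walltime, very_long.cores)
```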
fyrd.conf.Profile¶
-
class
fyrd.conf.
Profile
(name, kwds)[source]¶ Bases:
object
A job submission profile. Just a thin wrapper around a dict.
-
write : Write self to config file
Set up bare minimum attributes.
- Parameters
-
fyrd.helpers¶
The helpers are high level functions that are not required for the library but make difficult jobs easy, in support of the goal of trivially easy cluster submission.
The functions in fyrd.basic below are different in that they provide simple job submission and management, while the functions in fyrd.helpers allow the submission of many jobs.
-
fyrd.helpers.
jobify
(name=None, profile=None, qtype=None, submit=True, **kwds)[source]¶ Decorator to make any function a job.
Will make any function return a Job object that will execute the function on the cluster.
If submit is True, the job will be submitted when it is returned.
Usage:
@fyrd.jobify(name='my_job', profile='small', mem='8GB',
             time='00:10:00', imports=['from time import sleep'])
def do_something(file_path, iteration_count=24):
    for i in range(iteration_count):
        print(file_path + str(i))
        sleep(1)
    return file_path

job = do_something('my_file.txt')
out = job.get()
- Parameters
name (str, optional) – Optional name of the job. If not defined, guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, ie. <name>.1
qtype (str, optional) – Override the default queue type
profile (str, optional) – The name of a profile saved in the conf
submit (bool, optional) – Submit the Job before returning it
kwds – All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
- Returns
A Job class initialized with the decorated function.
- Return type
fyrd.job.Job
Examples
>>> import fyrd
>>> @fyrd.jobify(name='test_job', mem='1GB')
... def test(string, iterations=4):
...     """This does basically nothing!"""
...     outstring = ""
...     for i in range(iterations):
...         outstring += "Version {0}: {1}".format(i, string)
...     return outstring
>>> j = test('hi')
>>> j.get()
'Version 0: hiVersion 1: hiVersion 2: hiVersion 3: hi'
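The decorator pattern behind jobify, where calling the decorated function returns an object with a get() method instead of running immediately, can be sketched without a cluster. DeferredCall and deferred are hypothetical names; this stand-in runs the function locally on get() and does not reflect how fyrd builds or submits a Job.

```python
import functools


class DeferredCall:
    """Hypothetical stand-in for a job: stores a call and runs it on get()."""

    def __init__(self, func, args, kwargs, name):
        self.func, self.args, self.kwargs, self.name = func, args, kwargs, name

    def get(self):
        return self.func(*self.args, **self.kwargs)


def deferred(name=None):
    """Decorator factory: calling the function returns a DeferredCall."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return DeferredCall(func, args, kwargs, name or func.__name__)
        return wrapper
    return decorator


@deferred(name="test_job")
def repeat(string, iterations=4):
    return "".join("Version {0}: {1}".format(i, string) for i in range(iterations))


j = repeat("hi")
print(j.name)
print(j.get())
```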
-
fyrd.helpers.
parapply
(jobs, df, func, args=(), profile=None, applymap=False, merge_axis=0, merge_apply=False, name='parapply', imports=None, direct=True, **kwds)[source]¶ Split a dataframe, run apply in parallel, return result.
This function will split a dataframe into however many pieces are requested with the jobs argument, run apply in parallel by submitting the jobs to the cluster, and then recombine the outputs.
If the ‘clean_files’ and ‘clean_outputs’ arguments are not passed, we delete all intermediate files and output files by default.
This function will take any keyword arguments accepted by Job, which can be found by running fyrd.options.option_help(). It also accepts any of the keywords accepted by pandas.DataFrame.apply().
- Parameters
jobs (int) – Number of pieces to split the dataframe into
df (DataFrame) – Any pandas DataFrame
args (tuple) – Positional arguments to pass to the function, keyword arguments can just be passed directly.
profile (str) – A fyrd cluster profile to use
applymap (bool) – Run applymap() instead of apply()
merge_axis (int) – Which axis to merge on, 0 or 1. Default is 0.
merge_apply (bool) – Apply the function on the merged dataframe also
name (str) – A prefix name for all of the jobs
imports (list) – A list of imports in any format, e.g. [‘import numpy’, ‘scipy’, ‘from numpy import mean’]
direct (bool) – Whether to run the function directly or to return a Job. Default True.
Any keyword arguments recognized by fyrd will be used for job submission.
Additional keyword arguments will be passed to DataFrame.apply().
- Returns
A recombined DataFrame: concatenated version of original split DataFrame
- Return type
DataFrame
Example
>>> import numpy
>>> import pandas
>>> import fyrd
>>> df = pandas.DataFrame([[0, 1], [2, 6], [9, 24], [13, 76], [4, 12]])
>>> df['sum'] = fyrd.helpers.parapply(2, df, lambda x: x[0]+x[1], axis=1)
>>> df
    0   1  sum
0   0   1    1
1   2   6    8
2   9  24   33
3  13  76   89
4   4  12   16
See also
parapply_summary()
Merge results of parapply using applied function
splitrun()
Run a command in parallel on a split file
-
fyrd.helpers.
parapply_summary
(jobs, df, func, args=(), profile=None, applymap=False, name='parapply', imports=None, direct=True, **kwds)[source]¶ Run parapply for a function with summary stats.
Instead of returning the concatenated result, merge the result using the same function as was used during apply.
This works best for summary functions like .mean(), which do a linear operation on a whole dataframe or series.
- Parameters
jobs (int) – Number of pieces to split the dataframe into
df (DataFrame) – Any pandas DataFrame
args (tuple) – Positional arguments to pass to the function, keyword arguments can just be passed directly.
profile (str) – A fyrd cluster profile to use
applymap (bool) – Run applymap() instead of apply()
merge_axis (int) – Which axis to merge on, 0 or 1, default is 1 as apply transposes columns
merge_apply (bool) – Apply the function on the merged dataframe also
name (str) – A prefix name for all of the jobs
imports (list) – A list of imports in any format, e.g. [‘import numpy’, ‘scipy’, ‘from numpy import mean’]
direct (bool) – Whether to run the function directly or to return a Job. Default True.
Any keyword arguments recognized by fyrd will be used for job submission.
Additional keyword arguments will be passed to DataFrame.apply().
- Returns
A recombined DataFrame
- Return type
DataFrame
Example
>>> import numpy
>>> import pandas
>>> import fyrd
>>> df = pandas.DataFrame([[0, 1], [2, 6], [9, 24], [13, 76], [4, 12]])
>>> df = fyrd.helpers.parapply_summary(2, df, numpy.mean)
>>> df
0     6.083333
1    27.166667
dtype: float64
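Because the merge step reuses the applied function, the result for a function like mean is a mean of per-piece means, which differs from the mean over the whole frame when the pieces are unequal in size. The standard-library sketch below reproduces the 6.083333 and 27.166667 values in the example, assuming the five rows are split into pieces of three and two rows; split_even and summarize are hypothetical helpers, not fyrd functions.

```python
from statistics import mean


def split_even(rows, pieces):
    """Split rows into `pieces` chunks; earlier chunks get the extra rows."""
    size, extra = divmod(len(rows), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def summarize(rows, pieces, func):
    """Apply func per column within each chunk, then again across chunks."""
    per_chunk = [[func(col) for col in zip(*chunk)]
                 for chunk in split_even(rows, pieces)]
    return [func(col) for col in zip(*per_chunk)]


rows = [[0, 1], [2, 6], [9, 24], [13, 76], [4, 12]]
merged = summarize(rows, 2, mean)               # mean of per-piece means
overall = [mean(col) for col in zip(*rows)]     # mean over all rows at once
print(merged, overall)
```

The two results differ here, so a summary merged this way should be read as a statistic over pieces rather than over the original rows unless the pieces are equally sized.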
-
fyrd.helpers.
splitrun
(jobs, infile, inheader, command, args=None, kwargs=None, name=None, qtype=None, profile=None, outfile=None, outheader=False, merge_func=None, direct=True, **kwds)[source]¶ Split a file, run command in parallel, return result.
This function will split a file into however many pieces are requested with the jobs argument, and run command on each.
Accepts exactly the same arguments as the Job class, with the exception of the first three and last four arguments, which are:
the number of jobs
the file to work on
whether the input file has a header
an optional output file
whether the output file has a header
an optional function to use to merge the resulting list, only used if there is no outfile.
whether to run directly or to return a Job. If direct is True, this function will just run and thus block until complete, if direct is False, the function will submit as a Job and return that Job.
Note: If command is a string, .format(file={file}) will be called on it, where file is each split file. If command is a function, then there must be an argument in either args or kwargs that contains {file}. It will be replaced with the path to the file, again by the format command.
If outfile is specified, there must also be an ‘{outfile}’ line in any script or an ‘{outfile}’ argument in either args or kwargs. When this function completes, the file at outfile will contain the concatenated output files of all of the jobs.
If the ‘clean_files’ and ‘clean_outputs’ arguments are not passed, we delete all intermediate files and output files by default.
The intermediate files will be stored in the ‘scriptpath’ directory.
Any header line is kept at the top of the file.
Primary return value varies and is decided in this order:
- If outfile:
the absolute path to that file
- If merge_func:
the result of merge_func(list), where list is the list of outputs.
- Else:
a list of results
If direct is False, this function returns a fyrd.job.Job object which will return the results described above on get().
- Parameters
jobs (int) – Number of pieces to split the file into
infile (str) – The path to the file to be split.
inheader (bool) – Does the input file have a header?
command (function/str) – The command or function to execute.
args (tuple/dict) – Optional arguments to add to command, particularly useful for functions.
kwargs (dict) – Optional keyword arguments to pass to the command, only used for functions.
name (str) – Optional name of the job. If not defined, guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, ie. <name>.1
qtype (str) – Override the default queue type
profile (str) – The name of a profile saved in the conf
outfile (str) – The path to the expected output file.
outheader (bool) – Does the outfile have a header?
merge_func (function) – An optional function used to merge the output list if there is no outfile.
direct (bool) – Whether to run the function directly or to return a Job. Default True.
All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
- Returns
See description above
- Return type
Varies
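Two pieces of the splitrun description lend themselves to a small sketch: dividing a file's lines into pieces, and choosing the return value in the documented order (outfile, then merge_func, then the raw list). The helpers split_lines and choose_result are hypothetical, and repeating the header on every piece is an assumption made for the example rather than confirmed behaviour.

```python
import os


def split_lines(lines, pieces, has_header):
    """Split lines into `pieces` chunks, repeating the header on each chunk."""
    header = lines[:1] if has_header else []
    body = lines[1:] if has_header else lines
    size, extra = divmod(len(body), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(header + body[start:end])
        start = end
    return chunks


def choose_result(outputs, outfile=None, merge_func=None):
    """Pick the return value in the documented order of precedence."""
    if outfile:
        return os.path.abspath(outfile)
    if merge_func:
        return merge_func(outputs)
    return outputs


chunks = split_lines(["id,value", "a,1", "b,2", "c,3"], 2, has_header=True)
print(chunks)
print(choose_result([1, 2, 3], merge_func=sum))
```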
fyrd.basic¶
This module holds high level functions to make job submission easy, allowing the user to skip multiple steps and to avoid using the Job class directly.
submit(), make_job(), and make_job_file() all create Job objects in the background and allow users to submit jobs. All of these functions accept the exact same arguments as the Job class does, and all of them return a Job object.
submit_file() is different: it simply submits a pre-formed job file, either one that has been written by this software or by any other method. The function makes no attempt to fix arguments to allow submission on multiple clusters; it just submits the file.
clean() takes a list of job objects and runs the clean() method on all of them, while clean_dir() uses known directory and suffix information to clean out all job files from any directory.
-
fyrd.basic.
submit
()[source]¶ Submit a script to the cluster.
- Parameters
command (function/str) – The command or function to execute.
args (tuple/dict, optional) – Optional arguments to add to command, particularly useful for functions.
kwargs (dict, optional) – Optional keyword arguments to pass to the command, only used for functions.
name (str, optional) – Optional name of the job. If not defined, guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, ie. <name>.1
qtype (str, optional) – Override the default queue type
profile (str, optional) – The name of a profile saved in the conf
queue (fyrd.queue.Queue, optional) – An already initiated Queue class to use.
kwds – All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
- Returns
- Return type
Job object
-
fyrd.basic.
make_job
()[source]¶ Make a job compatible with the chosen cluster but do not submit.
- Parameters
command (function/str) – The command or function to execute.
args (tuple/dict, optional) – Optional arguments to add to command, particularly useful for functions.
kwargs (dict, optional) – Optional keyword arguments to pass to the command, only used for functions.
name (str, optional) – Optional name of the job. If not defined, guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, ie. <name>.1
qtype (str, optional) – Override the default queue type
profile (str, optional) – The name of a profile saved in the conf
queue (fyrd.queue.Queue, optional) – An already initiated Queue class to use.
kwds – All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
- Returns
- Return type
Job object
-
fyrd.basic.
make_job_file
()[source]¶ Make a job file compatible with the chosen cluster.
- Parameters
command (function/str) – The command or function to execute.
args (tuple/dict, optional) – Optional arguments to add to command, particularly useful for functions.
kwargs (dict, optional) – Optional keyword arguments to pass to the command, only used for functions.
name (str, optional) – Optional name of the job. If not defined, a name is guessed. If a job of the same name is already queued, an integer job number (not the queue number) will be added, i.e. <name>.1
qtype (str, optional) – Override the default queue type
profile (str, optional) – The name of a profile saved in the conf
queue (fyrd.queue.Queue, optional) – An already initialized Queue instance to use.
kwds – All other keywords are parsed into cluster keywords by the options system. For available keywords see fyrd.option_help()
- Returns
Path to job file
- Return type
-
fyrd.basic.
submit_file
()[source]¶ Submit an existing job file to the cluster.
This function is independent of the Job object and just submits a file using a cluster-appropriate method.
- Parameters
script_file (str) – The path to the file to submit
dependencies (str or list of strings, optional) – A job number or list of job numbers to depend on
qtype (str, optional) – The name of the queue system to use, auto-detected if not given.
submit_args (dict) – A dictionary of keyword arguments for the submission script.
- Returns
job_number
- Return type
-
fyrd.basic.
clean
()[source]¶ Delete all files in jobs list or single Job object.
- Parameters
jobs (fyrd.job.Job or list of fyrd.job.Job) – Job objects to clean
clean_outputs (bool) – Also clean outputs.
-
fyrd.basic.
clean_dir
()[source]¶ Delete all files made by this module in directory.
CAUTION: The clean() function will delete EVERY file with extensions matching these:
.<suffix>.err
.<suffix>.out
.<suffix>.out.func.pickle
.<suffix>.sbatch & .<suffix>.script for slurm mode
.<suffix>.qsub for torque mode
.<suffix>.job for local mode
_func.<suffix>.py
_func.<suffix>.py.pickle.in
_func.<suffix>.py.pickle.out
Note
This function will change in the future to use batch system defined paths.
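The suffix rules listed above can be sketched as a dry run that only reports which files would match. This is an illustration rather than fyrd's own code: the helper names `job_file_patterns` and `find_job_files` are hypothetical, and matching by glob pattern is an assumption.

```python
import glob
import os


def job_file_patterns(suffix):
    """Return glob patterns for the file types listed above."""
    return [
        '*.{0}.err'.format(suffix),
        '*.{0}.out'.format(suffix),
        '*.{0}.out.func.pickle'.format(suffix),
        '*.{0}.sbatch'.format(suffix),   # slurm mode
        '*.{0}.script'.format(suffix),   # slurm mode
        '*.{0}.qsub'.format(suffix),     # torque mode
        '*.{0}.job'.format(suffix),      # local mode
        '*_func.{0}.py'.format(suffix),
        '*_func.{0}.py.pickle.in'.format(suffix),
        '*_func.{0}.py.pickle.out'.format(suffix),
    ]


def find_job_files(directory, suffix):
    """Return the set of matching paths in directory; nothing is deleted."""
    found = set()
    for pattern in job_file_patterns(suffix):
        found.update(glob.glob(os.path.join(directory, pattern)))
    return found
```

Listing the matches before deleting anything is a cheap way to check that no unrelated file shares one of these extensions.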
- Parameters
- Returns
A set of deleted files
- Return type
fyrd.run¶
A library of useful functions used throughout the fyrd package.
These include functions to handle data, format outputs, handle file opening, run commands, check file extensions, get user input, and search and format imports.
These functions are not intended to be accessed directly and so documentation is limited.
-
class
fyrd.run.
CustomFormatter
(prog, indent_increment=2, max_help_position=24, width=None)[source]¶ Bases:
argparse.ArgumentDefaultsHelpFormatter
,argparse.RawDescriptionHelpFormatter
Custom argparse formatting.
-
fyrd.run.
cmd
(command, args=None, stdout=None, stderr=None, tries=1)[source]¶ Run command and return status, output, stderr.
- Parameters
- Returns
exit_code (int)
STDOUT (str)
STDERR (str)
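A minimal sketch of the run-and-retry behaviour described above, built on the standard subprocess module. The name `run_cmd`, the `delay` parameter, and the choice to stop retrying on a zero exit code are assumptions made for this example, not fyrd's implementation.

```python
import subprocess
import sys
import time


def run_cmd(command, tries=1, delay=1.0):
    """Run command, retrying on failure; return (exit_code, stdout, stderr)."""
    code, out, err = 1, '', ''
    for attempt in range(max(1, tries)):
        proc = subprocess.Popen(
            command,
            shell=isinstance(command, str),  # strings go through the shell
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,         # return str rather than bytes
        )
        out, err = proc.communicate()
        code = proc.returncode
        if code == 0:
            break
        if attempt + 1 < tries:
            time.sleep(delay)                # pause before the next attempt
    return code, out.strip(), err.strip()


# Usage: the current Python interpreter serves as a portable test command.
code, out, err = run_cmd([sys.executable, '-c', 'print("hi")'])
```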
-
fyrd.run.
cmd_or_file
(string)[source]¶ If string is a file, return the contents, else return the string.
-
fyrd.run.
count_lines
(infile, force_blocks=False)[source]¶ Return the line count of a file as quickly as possible.
Uses wc if available, otherwise does a rapid read.
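The wc-with-fallback strategy could look roughly like the sketch below. The function name `count_lines_sketch` and the 1 MiB block size are assumptions, and the `force_blocks` argument is not modelled.

```python
import shutil
import subprocess


def count_lines_sketch(path):
    """Count newline characters in path, using wc when it is installed."""
    wc = shutil.which('wc')
    if wc:
        out = subprocess.check_output([wc, '-l', path])
        return int(out.decode().split()[0])
    # Fallback: read fixed-size binary blocks and count newlines.
    total = 0
    with open(path, 'rb') as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b''):
            total += block.count(b'\n')
    return total
```

Both branches count newline characters, so a final line without a trailing newline is not counted in either case.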
-
fyrd.run.
export_imports
(function, kwds)[source]¶ Get imports from a function and from kwds.
Also sets globals and adds path to module to sys path.
-
fyrd.run.
export_run
(function, args, kwargs)[source]¶ Execute a function after first exporting all imports.
-
fyrd.run.
file_getter
(file_strings, variables, extra_vars=None, max_count=None)[source]¶ Get a list of files and variable values using the search string.
The file strings can contain standard Unix globs (like *) and variables written in the form {name}.
For example, a file_string of {dir}/*.txt will match every file that ends in .txt in every directory relative to the current path.
The result for a directory named test with two files named 1.txt and 2.txt is a list of:
[(('dir/1.txt'), {'dir': 'test'}), (('dir/2.txt'), {'dir': 'test'})]
This is repeated for every file_string in file_strings, and the following tests are done:
All file_strings must result in identical numbers of files
All variables must have only a single value in every file string
If there are multiple file_strings, they are added to the result x in order, but the dictionary remains the same as variables must be shared. If multiple file_strings are provided the results are combined by alphabetical order.
- Parameters
file_strings (list of str) – List of search strings, e.g. */*, */*.txt, {dir}/*.txt or {dir}/{file}.txt
variables (list of str) – List of variables to look for
extra_vars (list of str, optional) –
A list of additional variables specified in a very precise format:
new_var:orig_var:regex:sub_str or new_var:value
The orig_var must correspond to a variable in variables. new_var will be generated by running re.sub(regex, sub_str, string), where string is the result of orig_var for the given file set.
max_count (int, optional) – Max number of file_strings to parse, default is all.
- Returns
A list of files. Each list item will be a two-item tuple of (files, variables). Files will be a tuple with the same length as max_count, or file_strings if max_count is None. Variables will be a dictionary of all variables and extra_vars for this file set. e.g.:
[((file1, dir1, file2), {var1: val, var2: val})]
- Return type
- Raises
ValueError – Raised if any of the above tests are not met.
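The extra_vars format described above can be illustrated with a short sketch. The helper `apply_extra_var` is hypothetical, and it assumes that neither the regex nor the substitution string contains a colon.

```python
import re


def apply_extra_var(spec, variables):
    """Return (new_var, value) for one extra_vars specification string."""
    parts = spec.split(':')
    if len(parts) == 2:
        # new_var:value gives a constant value
        return parts[0], parts[1]
    if len(parts) == 4:
        # new_var:orig_var:regex:sub_str rewrites an existing variable
        new_var, orig_var, regex, sub_str = parts
        if orig_var not in variables:
            raise ValueError('{0} is not a known variable'.format(orig_var))
        return new_var, re.sub(regex, sub_str, variables[orig_var])
    raise ValueError('Expected new_var:orig_var:regex:sub_str or new_var:value')


# Usage: strip a 'run_' prefix from the value captured for {dir}.
result = apply_extra_var('sample:dir:^run_:', {'dir': 'run_42'})
```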
-
fyrd.run.
get_all_imports
(function, kwds, prot=False)[source]¶ Get all imports from a function and from kwds.
-
fyrd.run.
get_function_path
(function)[source]¶ Return path to module defining a function if it exists.
-
fyrd.run.
get_imports
(function, mode='string')[source]¶ Build a list of potentially useful imports from a function handle.
Gets:
All modules from globals()
All modules from the function’s globals()
All functions from the function’s globals()
Modes:
- string:
Return a list of strings formatted as unprotected import calls
- prot:
Similar to string, but with try..except blocks
- list:
Return two lists: (import name, module name) for modules and (import name, function name, module name) for functions
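To make the difference between the string and prot modes concrete, the sketch below formats one import both ways. The helper `format_import` is hypothetical, and catching ImportError in the protected form is an assumption about what the try..except block guards against.

```python
def format_import(module, alias=None, protected=False):
    """Return import source text, optionally wrapped in try/except."""
    line = 'import {0}'.format(module)
    if alias and alias != module:
        line += ' as {0}'.format(alias)
    if protected:
        # A protected import is skipped quietly if the module is missing.
        return 'try:\n    {0}\nexcept ImportError:\n    pass'.format(line)
    return line


plain = format_import('os.path', alias='osp')
guarded = format_import('not_a_real_module_xyz', protected=True)
```

Executing `guarded` does nothing on a machine without the module, while the unprotected form would raise ImportError there.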
-
fyrd.run.
get_input
(message, valid_answers=None, default=None)[source]¶ Get input from the command line and check answers.
Allows input to work with Python 2/3.
- Parameters
- Returns
response
- Return type
-
fyrd.run.
get_pbar
(iterable, name=None, unit=None, **kwargs)[source]¶ Return a tqdm progress bar iterable.
If progressbar is set to False in the config, will not be shown.
-
fyrd.run.
import_function
(function, mode='string')[source]¶ Return an import string for the function.
Also attempts to resolve the parent module. If the parent module is a file (i.e. it isn’t __main__), the import string will include a call to sys.path.append to ensure the module is importable.
If this function isn’t defined by a module, returns an empty string.
- Parameters
mode ({'string', 'list'}, optional) – string/list, return as a unified string or a list.
-
fyrd.run.
is_exc
(x)[source]¶ Check if x is the output of sys.exc_info().
- Returns
True if matched the output of sys.exc_info().
- Return type
-
fyrd.run.
normalize_imports
(imports, prot=True)[source]¶ Take a heterogeneous list of imports and normalize it.
-
fyrd.run.
open_zipped
(infile, mode='r')[source]¶ Open a regular, gzipped, or bz2 file.
If infile is a file handle or text device, it is returned without changes.
- Returns
- Return type
text mode file handle.
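One way to get this behaviour with the standard library is sketched below. The name `open_any` is hypothetical, and detecting compression from the file extension is an assumption; fyrd may detect it differently.

```python
import bz2
import gzip


def open_any(infile, mode='r'):
    """Open a plain, .gz, or .bz2 path in text mode; pass handles through."""
    if hasattr(infile, 'read') or hasattr(infile, 'write'):
        return infile  # already a file handle or text device
    name = str(infile)
    # gzip and bz2 default to binary, so request text mode explicitly.
    text_mode = mode if 't' in mode or 'b' in mode else mode + 't'
    if name.endswith('.gz'):
        return gzip.open(name, text_mode)
    if name.endswith('.bz2'):
        return bz2.open(name, text_mode)
    return open(name, mode)
```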
-
fyrd.run.
opt_split
(opt, split_on)[source]¶ Split options by chars in split_on, merge all into single list.
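A sketch of splitting on several characters and flattening the pieces into one list. The name `split_options` is hypothetical, and stripping whitespace and dropping empty pieces are assumptions made for this example.

```python
def split_options(opt, split_on):
    """Split a string, or each string in a list, on every char in split_on."""
    opts = list(opt) if isinstance(opt, (list, tuple)) else [opt]
    for char in split_on:
        # Re-split every current piece and merge the results into one list.
        opts = [piece for item in opts for piece in item.split(char)]
    return [piece.strip() for piece in opts if piece.strip()]


pieces = split_options('a,b; c', ',;')
```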
-
fyrd.run.
parse_glob
(string, get_vars=None)[source]¶ Return a list of files that match a simple regex glob.
- Parameters
- Returns
Keys are all files that match the string, values are None if get_vars is not passed. If get_vars is passed, the values are dictionaries of {‘variable’: ‘result’}. e.g. for {name}.txt and hi.txt:
{hi.txt: {name: 'hi'}}
- Return type
- Raises
ValueError – If blank or numeric variable names are used or if get_vars returns multiple different names for a file.
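The mapping from file names to variable values can be sketched by turning the search string into a regular expression. This example matches against a supplied list of names rather than the file system. `match_with_vars` is a hypothetical helper, and it does not handle a variable that appears more than once in the pattern.

```python
import re


def match_with_vars(pattern, names):
    """Map each matching name to the {variable} values it captured."""
    regex = ''
    # re.split keeps the captured variable names at the odd positions.
    for i, piece in enumerate(re.split(r'\{(\w+)\}', pattern)):
        if i % 2:
            regex += '(?P<{0}>.+?)'.format(piece)
        else:
            # Escape literal text, then turn glob wildcards into regex.
            regex += re.escape(piece).replace(r'\*', '.*').replace(r'\?', '.')
    compiled = re.compile(regex + '$')
    results = {}
    for name in names:
        found = compiled.match(name)
        if found:
            results[name] = found.groupdict()
    return results


matches = match_with_vars('{name}.txt', ['hi.txt', 'skip.csv'])
```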
-
fyrd.run.
replace_argument
(args, find_string, replace_string, error=True)[source]¶ Replace find_string with replace_string in a tuple or dict.
If dict, the values are replaced, not the keys.
Note: args can also be a list, in which case the first item is assumed to be a tuple, and the second a dictionary.
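The tuple and dict cases might be handled as in the sketch below. `replace_in_args` is a hypothetical helper that leaves non-string items untouched and does not model the error argument or the list form.

```python
def replace_in_args(args, find_string, replace_string):
    """Replace find_string in tuple items or dict values (never in keys)."""
    def swap(value):
        if isinstance(value, str):
            return value.replace(find_string, replace_string)
        return value  # non-string values pass through unchanged

    if isinstance(args, dict):
        return {key: swap(value) for key, value in args.items()}
    return tuple(swap(value) for value in args)


new_args = replace_in_args(('{file}', 2), '{file}', 'a.txt')
new_kwargs = replace_in_args({'{file}': 'in {file}'}, '{file}', 'a.txt')
```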
-
fyrd.run.
split_file
(infile, parts, outpath='', keep_header=False)[source]¶ Split a file in parts and return a list of paths.
Note
Linux specific (uses wc).
If keep_header is True, the top line is stripped off the infile prior to splitting and assumed to be the header.
-
fyrd.run.
string_getter
(string)[source]¶ Parse a string for {}, {#}, and {string}.
- Parameters
string (str) –
- Returns
ints (set) – A set of ints containing all {#} values
vrs (set) – A set of {string} values
- Raises
ValueError – If both {} and {#} are passed
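Python's own format-string parser can find these fields, as the sketch below shows. `find_fields` is a hypothetical helper. It mirrors the return values and the error condition described above, but it is not fyrd's implementation.

```python
import string


def find_fields(text):
    """Return (ints, names) for the {#} and {string} fields in text."""
    ints, names, saw_empty = set(), set(), False
    for _, field, _, _ in string.Formatter().parse(text):
        if field is None:
            continue  # literal text with no replacement field
        if field == '':
            saw_empty = True
        elif field.isdigit():
            ints.add(int(field))
        else:
            names.add(field)
    if saw_empty and ints:
        raise ValueError('Cannot mix {} and {#} in one string')
    return ints, names


ints, vrs = find_fields('{0}/{name}.{1}')
```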
-
fyrd.run.
syspath_fmt
(syspaths)[source]¶ Take a list of paths and return a series of sys.path.append strings.
fyrd.logme¶
This is a package I wrote myself and keep using because I like it. It provides syslog style leveled logging (e.g. ‘debug’->’info’->’warn’->’error’->’critical’) and it implements colors and timestamped messages.
The minimum print level can be set module-wide at runtime by changing fyrd.logme.MIN_LEVEL.
-
fyrd.logme.
log
(message, level='info', logfile=None, also_write=None, min_level=None, kind=None)[source]¶ Print a string to logfile.
Levels display as:
verbose: <timestamp> VERBOSE -->
debug: <timestamp> DEBUG -->
info: <timestamp> INFO -->
warn: <timestamp> WARNING -->
error: <timestamp> ERROR -->
critical: <timestamp> CRITICAL -->
- Parameters
message (str, optional) – The message to print.
logfile (file or logging object, optional) – Optional file to log to, defaults to STDERR. Can provide a logging object
level ({'debug', 'info', 'warn', 'error', 'normal'}, optional) – Will only print if level > MIN_LEVEL
also_write ({'stdout', 'stderr'}, optional) – Print to STDOUT or STDERR also. These only have an effect if the output is not already set to the same device.
min_level (str, deprecated) – Retained for backwards compatibility, min_level should be set using the logme.MIN_LEVEL constant.
kind (str, deprecated) – synonym for level, kept to retain backwards compatibility
Logging with timestamps and optional log files.
Print a timestamped message to a logfile, STDERR, or STDOUT.
If STDERR or STDOUT are used, colored flags are added. Colored flags are INFO, WARNING, ERROR, or CRITICAL.
It is possible to write to both logfile and STDOUT/STDERR using the also_write argument.
If level is ‘error’ or ‘critical’, error is written to STDERR unless also_write == -1
MIN_LEVEL can also be provided; logs will only print if level > MIN_LEVEL. Level order: critical>error>warn>info>debug>verbose
Usage:
import logme as lm
lm.log("Screw up!", <outfile>,
       level='debug'|'info'|'warn'|'error'|'normal',
       also_write='stderr'|'stdout')
Example:
lm.log('Hi')
Prints: 20160223 11:46:24.969 | INFO --> Hi
lm.log('Hi', level='debug')
Prints nothing
lm.MIN_LEVEL = 'debug'
lm.log('Hi', level='debug')
Prints: 20160223 11:46:24.969 | DEBUG --> Hi
Note: Uses terminal colors and STDERR, not compatible with non-unix systems
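The level filtering and message layout shown in the example above can be sketched in a few lines. Several things here are assumptions inferred from that example rather than taken from the logme source: the names `should_print`, `format_message`, and `log_sketch`, the default minimum of 'info', and the at-or-above comparison (a 'debug' message prints once MIN_LEVEL is 'debug'). Colors and logfile handling are left out.

```python
import sys
from datetime import datetime

# Level order from lowest to highest, as listed above.
LEVELS = ['verbose', 'debug', 'info', 'warn', 'error', 'critical']
MIN_LEVEL = 'info'  # assumed default


def should_print(level, min_level=None):
    """Return True if level is at or above the minimum level."""
    minimum = MIN_LEVEL if min_level is None else min_level
    return LEVELS.index(level) >= LEVELS.index(minimum)


def format_message(message, level='info', now=None):
    """Build a '<timestamp> | FLAG --> message' line."""
    stamp = (now or datetime.now()).strftime('%Y%m%d %H:%M:%S.%f')[:-3]
    flag = 'WARNING' if level == 'warn' else level.upper()
    return '{0} | {1} --> {2}'.format(stamp, flag, message)


def log_sketch(message, level='info', stream=None):
    """Print the formatted message to stream (STDERR by default)."""
    if should_print(level):
        print(format_message(message, level), file=stream or sys.stderr)
```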