The C API

You can integrate YARA into your C/C++ project by using the API provided by the libyara library. This API gives you access to every YARA feature and it’s the same API used by the command-line tools yara and yarac.

Initializing and finalizing libyara

The first thing your program must do when using libyara is initializing the library. This is done by calling the yr_initialize() function. This function allocates any resources needed by the library and initializes internal data structures. Its counterpart is yr_finalize(), which must be called when you are finished using the library.

In a multi-threaded program only the main thread must call yr_initialize() and yr_finalize(), but any additional thread using the library must call yr_finalize_thread() before exiting.

Compiling rules

Before using your rules to scan any data you need to compile them into binary form. For that purpose you’ll need a YARA compiler, which can be created with yr_compiler_create(). After being used, the compiler must be destroyed with yr_compiler_destroy().

You can use yr_compiler_add_file(), yr_compiler_add_fd(), or yr_compiler_add_string() to add one or more input sources to be compiled. Both of these functions receive an optional namespace. Rules added under the same namespace behave as if they were contained within the same source file or string, so, rule identifiers must be unique among all the sources sharing a namespace. If the namespace argument is NULL the rules are put in the default namespace.

The yr_compiler_add_file(), yr_compiler_add_fd(), and yr_compiler_add_string() functions return the number of errors found in the source code. If the rules are correct they will return 0. For more detailed error information you must set a callback function by using yr_compiler_set_callback() before calling any of the compiling functions. The callback function has the following prototype:

void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const char* message,
    void* user_data)

Changed in version 3.3.0.

Possible values for error_level are YARA_ERROR_LEVEL_ERROR and YARA_ERROR_LEVEL_WARNING. The arguments file_name and line_number contains the file name and line number where the error or warning occurs. file_name is the one passed to yr_compiler_add_file() or yr_compiler_add_fd(). It can be NULL if you passed NULL or if you’re using yr_compiler_add_string(). The user_data pointer is the same you passed to yr_compiler_set_callback().

After you successfully added some sources you can get the compiled rules using the yr_compiler_get_rules() function. You’ll get a pointer to a YR_RULES structure which can be used to scan your data as described in Scanning data. Once yr_compiler_get_rules() is invoked you can not add more sources to the compiler, but you can get multiple instances of the compiled rules by calling yr_compiler_get_rules() multiple times.

Each instance of YR_RULES must be destroyed with yr_rules_destroy().

Saving and retrieving compiled rules

Compiled rules can be saved to a file and retrieved later by using yr_rules_save() and yr_rules_load(). Rules compiled and saved in one machine can be loaded in another machine as long as they have the same endianness, no matter the operating system or if they are 32-bit or 64-bit systems. However files saved with older versions of YARA may not work with newer versions due to changes in the file layout.

You can also save and retrieve your rules to and from generic data streams by using functions yr_rules_save_stream() and yr_rules_load_stream(). These functions receive a pointer to a YR_STREAM structure, defined as:

typedef struct _YR_STREAM
{
  void* user_data;

  YR_STREAM_READ_FUNC read;
  YR_STREAM_WRITE_FUNC write;

} YR_STREAM;

You must provide your own implementation for read and write functions. The read function is used by yr_rules_load_stream() to read data from your stream and the write function is used by yr_rules_save_stream() to write data into your stream.

Your read and write functions must respond to these prototypes:

size_t read(
    void* ptr,
    size_t size,
    size_t count,
    void* user_data);

size_t write(
    const void* ptr,
    size_t size,
    size_t count,
    void* user_data);

The ptr argument is a pointer to the buffer where the read function should put the read data, or where the write function will find the data that needs to be written to the stream. In both cases size is the size of each element being read or written and count the number of elements. The total size of the data being read or written is size * count. Both functions must return the total size of the data read/written.

The user_data pointer is the same you specified in the YR_STREAM structure. You can use it to pass arbitrary data to your read and write functions.

Scanning data

Once you have an instance of YR_RULES you can use it with either yr_rules_scan_file(), yr_rules_scan_fd() or yr_rules_scan_mem(). The results from the scan are returned to your program via a callback function. The callback has the following prototype:

int callback_function(
    int message,
    void* message_data,
    void* user_data);

Possible values for message are:

CALLBACK_MSG_RULE_MATCHING
CALLBACK_MSG_RULE_NOT_MATCHING
CALLBACK_MSG_SCAN_FINISHED
CALLBACK_MSG_IMPORT_MODULE
CALLBACK_MSG_MODULE_IMPORTED

Your callback function will be called once for each rule with either a CALLBACK_MSG_RULE_MATCHING or CALLBACK_MSG_RULE_NOT_MATCHING message, depending if the rule is matching or not. In both cases a pointer to the YR_RULE structure associated with the rule is passed in the message_data argument. You just need to perform a typecast from void* to YR_RULE* to access the structure.

This callback is also called with the CALLBACK_MSG_IMPORT_MODULE message. All modules referenced by an import statement in the rules are imported once for every file being scanned. In this case message_data points to a YR_MODULE_IMPORT structure. This structure contains a module_name field pointing to a null terminated string with the name of the module being imported and two other fields module_data and module_data_size. These fields are initially set to NULL and 0, but your program can assign a pointer to some arbitrary data to module_data while setting module_data_size to the size of the data. This way you can pass additional data to those modules requiring it, like the Cuckoo module for example.

Once a module is imported the callback is called again with the CALLBACK_MSG_MODULE_IMPORTED. When this happens message_data points to a YR_OBJECT_STRUCTURE structure. This structure contains all the information provided by the module about the currently scanned file.

Lastly, the callback function is also called with the CALLBACK_MSG_SCAN_FINISHED message when the scan is finished. In this case message_data is NULL.

Your callback function must return one of the following values:

CALLBACK_CONTINUE
CALLBACK_ABORT
CALLBACK_ERROR

If it returns CALLBACK_CONTINUE YARA will continue normally, CALLBACK_ABORT will abort the scan but the result from the yr_rules_scan_XXXX function will be ERROR_SUCCESS. On the other hand CALLBACK_ERROR will abort the scanning too, but the result from yr_rules_scan_XXXX will be ERROR_CALLBACK_ERROR.

The user_data argument passed to your callback function is the same you passed yr_rules_scan_XXXX. This pointer is not touched by YARA, it’s just a way for your program to pass arbitrary data to the callback function.

All yr_rules_scan_XXXX functions receive a flags argument and a timeout argument. The only flag defined at this time is SCAN_FLAGS_FAST_MODE, so you must pass either this flag or a zero value. The timeout argument forces the function to return after the specified number of seconds approximately, with a zero meaning no timeout at all.

The SCAN_FLAGS_FAST_MODE flag makes the scanning a little faster by avoiding multiple matches of the same string when not necessary. Once the string was found in the file it’s subsequently ignored, implying that you’ll have a single match for the string, even if it appears multiple times in the scanned data. This flag has the same effect of the -f command-line option described in Running YARA from the command-line.

API reference

Data structures

YR_COMPILER

Data structure representing a YARA compiler.

YR_MATCH

Data structure representing a string match.

int64_t base

Base offset/address for the match. While scanning a file this field is usually zero, while scanning a process memory space this field is the virtual address of the memory block where the match was found.

int64_t offset

Offset of the match relative to base.

int32_t match_length

Length of the matching string

uint8_t* data

Pointer to a buffer containing a portion of the matching string.

int32_t data_length

Length of data buffer. data_length is the minimum of match_length and MAX_MATCH_DATA.

Changed in version 3.5.0.

YR_META

Data structure representing a metadata value.

const char* identifier

Meta identifier.

int32_t type

One of the following metadata types:

META_TYPE_NULL META_TYPE_INTEGER META_TYPE_STRING META_TYPE_BOOLEAN
YR_MODULE_IMPORT
const char* module_name

Name of the module being imported.

void* module_data

Pointer to additional data passed to the module. Initially set to NULL, your program is responsible for setting this pointer while handling the CALLBACK_MSG_IMPORT_MODULE message.

size_t module_data_size

Size of additional data passed to module. Your program must set the appropriate value if module_data is modified.

YR_RULE

Data structure representing a single rule.

const char* identifier

Rule identifier.

const char* tags

Pointer to a sequence of null terminated strings with tag names. An additional null character marks the end of the sequence. Example: tag1\0tag2\0tag3\0\0. To iterate over the tags you can use yr_rule_tags_foreach().

YR_META* metas

Pointer to a sequence of YR_META structures. To iterate over the structures use yr_rule_metas_foreach().

YR_STRING* strings

Pointer to a sequence of YR_STRING structures. To iterate over the structures use yr_rule_strings_foreach().

YR_NAMESPACE* ns

Pointer to a YR_NAMESPACE structure.

YR_RULES

Data structure representing a set of compiled rules.

YR_STREAM

New in version 3.4.0.

Data structure representing a stream used with functions yr_rules_load_stream() and yr_rules_save_stream().

void* user_data

A user-defined pointer.

YR_STREAM_READ_FUNC read

A pointer to the stream’s read function provided by the user.

YR_STREAM_WRITE_FUNC write

A pointer to the stream’s write function provided by the user.

YR_STRING

Data structure representing a string declared in a rule.

const char* identifier

String identifier.

YR_NAMESPACE

Data structure representing a rule namespace.

const char* name

Rule namespace.

Functions

int yr_initialize(void)

Initialize the library. Must be called by the main thread before using any other function. Return ERROR_SUCCESS on success another error code in case of error. The list of possible return codes vary according to the modules compiled into YARA.

int yr_finalize(void)

Finalize the library. Must be called by the main free to release any resource allocated by the library. Return ERROR_SUCCESS on success another error code in case of error. The list of possible return codes vary according to the modules compiled into YARA.

void yr_finalize_thread(void)

Any thread using the library, except the main thread, must call this function when it finishes using the library.

int yr_compiler_create(YR_COMPILER** compiler)

Create a YARA compiler. You must pass the address of a pointer to a YR_COMPILER, the function will set the pointer to the newly allocated compiler. Returns one of the following error codes:

void yr_compiler_destroy(YR_COMPILER* compiler)

Destroy a YARA compiler.

void yr_compiler_set_callback(YR_COMPILER* compiler, YR_COMPILER_CALLBACK_FUNC callback, void* user_data)

Changed in version 3.3.0.

Set a callback for receiving error and warning information. The user_data pointer is passed to the callback function.

int yr_compiler_add_file(YR_COMPILER* compiler, FILE* file, const char* namespace, const char* file_name)

Compile rules from a file. Rules are put into the specified namespace, if namespace is NULL they will be put into the default namespace. file_name is the name of the file for error reporting purposes and can be set to NULL. Returns the number of errors found during compilation.

int yr_compiler_add_fd(YR_COMPILER* compiler, YR_FILE_DESCRIPTOR rules_fd, const char* namespace, const char* file_name)

New in version 3.6.0.

Compile rules from a file descriptor. Rules are put into the specified namespace, if namespace is NULL they will be put into the default namespace. file_name is the name of the file for error reporting purposes and can be set to NULL. Returns the number of errors found during compilation.

int yr_compiler_add_string(YR_COMPILER* compiler, const char* string, const char* namespace_)

Compile rules from a string. Rules are put into the specified namespace, if namespace is NULL they will be put into the default namespace. Returns the number of errors found during compilation.

int yr_compiler_get_rules(YR_COMPILER* compiler, YR_RULES** rules)

Get the compiled rules from the compiler. Returns one of the following error codes:

int yr_compiler_define_integer_variable(YR_COMPILER* compiler, const char* identifier, int64_t value)

Defines an integer external variable.

int yr_compiler_define_float_variable(YR_COMPILER* compiler, const char* identifier, double value)

Defines a float external variable.

int yr_compiler_define_boolean_variable(YR_COMPILER* compiler, const char* identifier, int value)

Defines a boolean external variable.

int yr_compiler_define_string_variable(YR_COMPILER* compiler, const char* identifier, const char* value)

Defines a string external variable.

void yr_rules_destroy(YR_RULES* rules)

Destroy compiled rules.

int yr_rules_save(YR_RULES* rules, const char* filename)

Save compiled rules into the file specified by filename. Returns one of the following error codes:

int yr_rules_save_stream(YR_RULES* rules, YR_STREAM* stream)

New in version 3.4.0.

Save compiled rules into stream. Returns one of the following error codes:

int yr_rules_load(const char* filename, YR_RULES** rules)

Load compiled rules from the file specified by filename. Returns one of the following error codes:

int yr_rules_load_stream(YR_STREAM* stream, YR_RULES** rules)

New in version 3.4.0.

Load compiled rules from stream. Returns one of the following error codes:

int yr_rules_scan_mem(YR_RULES* rules, uint8_t* buffer, size_t buffer_size, int flags, YR_CALLBACK_FUNC callback, void* user_data, int timeout)

Scan a memory buffer. Returns one of the following error codes:

int yr_rules_scan_file(YR_RULES* rules, const char* filename, int flags, YR_CALLBACK_FUNC callback, void* user_data, int timeout)

Scan a file. Returns one of the following error codes:

int yr_rules_scan_fd(YR_RULES* rules, YR_FILE_DESCRIPTOR fd, int flags, YR_CALLBACK_FUNC callback, void* user_data, int timeout)

Scan a file descriptor. In POSIX systems YR_FILE_DESCRIPTOR is an int, as returned by the open() function. In Windows YR_FILE_DESCRIPTOR is a HANDLE as returned by CreateFile().

Returns one of the following error codes:

yr_rule_tags_foreach(rule, tag)

Iterate over the tags of a given rule running the block of code that follows each time with a different value for tag of type const char*. Example:

const char* tag;

/* rule is a YR_RULE object */

yr_rule_tags_foreach(rule, tag)
{
  ..do something with tag
}
yr_rule_metas_foreach(rule, meta)

Iterate over the YR_META structures associated with a given rule running the block of code that follows each time with a different value for meta. Example:

YR_META* meta;

/* rule is a YR_RULE object */

yr_rule_metas_foreach(rule, meta)
{
  ..do something with meta
}
yr_rule_strings_foreach(rule, string)

Iterate over the YR_STRING structures associated with a given rule running the block of code that follows each time with a different value for string. Example:

YR_STRING* string;

/* rule is a YR_RULE object */

yr_rule_strings_foreach(rule, string)
{
  ..do something with string
}
yr_string_matches_foreach(string, match)

Iterate over the YR_MATCH structures associated with a given string running the block of code that follows each time with a different value for match. Example:

YR_MATCH* match;

/* string is a YR_STRING object */

yr_string_matches_foreach(string, match)
{
  ..do something with match
}
yr_rules_foreach(rules, rule)

Iterate over each YR_RULE in a YR_RULES object running the block of code that follows each time with a different value for rule. Example:

YR_RULE* rule;

/* rules is a YR_RULES object */

yr_rules_foreach(rules, rule)
{
  ..do something with rule
}

Error codes

ERROR_SUCCESS

Everything went fine.

ERROR_INSUFFICIENT_MEMORY

Insufficient memory to complete the operation.

ERROR_COULD_NOT_OPEN_FILE

File could not be opened.

ERROR_COULD_NOT_MAP_FILE

File could not be mapped into memory.

ERROR_ZERO_LENGTH_FILE

File length is zero.

ERROR_INVALID_FILE

File is not a valid rules file.

ERROR_CORRUPT_FILE

Rules file is corrupt.

ERROR_UNSUPPORTED_FILE_VERSION

File was generated by a different YARA and can’t be loaded by this version.

ERROR_TOO_MANY_SCAN_THREADS

Too many threads trying to use the same YR_RULES object simultaneously. The limit is defined by MAX_THREADS in ./include/yara/limits.h

ERROR_SCAN_TIMEOUT

Scan timed out.

ERROR_CALLBACK_ERROR

Callback returned an error.

ERROR_TOO_MANY_MATCHES

Too many matches for some string in your rules. This usually happens when your rules contains very short or very common strings like 01 02 or FF FF FF FF. The limit is defined by MAX_STRING_MATCHES in ./include/yara/limits.h