UnitGen

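UnitGen is a data framework for generating fine-tuning data for code. It builds the data directly from your codebase: code completion, test generation, documentation generation, and more.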

Thanks to OpenBayes for providing computing resources.

Fine-tuned model examples:

Name	Model download (HuggingFace)	Fine-tuning notebook	Model download (OpenBayes)
DeepSeek 6.7B	unit-mesh/autodev-coder	finetune.ipynb	AutoDev Coder

Language support (provided by Chapi)

supported:
- Java
- Kotlin
in progress:
- TypeScript/JavaScript
- Rust
planned:
- Go
- Python
- C/C++
- C#
- Scala

Features:

Code context strategies: related code completion, similar code completion
Instruction builder types: inline, block, after block, documentation, test generation
Code quality filter and pipeline: code smell, test smell, estimation, and more

Architecture

Layered Architecture

[Figure: layered architecture diagram]

Workflow

[Figure: UnitGen workflow diagram]

Design Philosophy

Unique prompt. Fine-tuning, evaluation, and tooling all use the same prompt.
Code quality pipeline. Estimates code complexity and checks for bad smells, test bad smells, and other rules.
Extensible, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.

Unique Prompt

Keep the same prompt: AutoDev <-> UnitGen <-> UnitEval

AutoDev prompt

AutoDev prompt template example:

Write unit test for following code.

${context.coc}

${context.framework}

${context.related_model}

```${context.language}
${context.selection}
```
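
The placeholders in the template above are filled from a context object. A minimal sketch of that substitution follows. It assumes plain `${context.<key>}` placeholders and a regex-based replacement; `renderTemplate` is a hypothetical helper, and the real AutoDev implementation may rely on a template engine instead.

```kotlin
// Hypothetical helper: replaces each ${context.<key>} placeholder with the
// matching value from `context`. Unknown keys render as an empty string
// (an assumption of this sketch).
fun renderTemplate(template: String, context: Map<String, String>): String {
    val placeholder = Regex("""\$\{context\.(\w+)\}""")
    return placeholder.replace(template) { match ->
        context[match.groupValues[1]] ?: ""
    }
}

fun main() {
    // Built at runtime so this example does not nest code fences.
    val fence = "`".repeat(3)
    val template = listOf(
        "Write unit test for following code.",
        "",
        "\${context.framework}",
        "",
        "$fence\${context.language}",
        "\${context.selection}",
        fence,
    ).joinToString("\n")

    val rendered = renderTemplate(
        template,
        mapOf(
            "framework" to "You are working on a project that uses JUnit 5.",
            "language" to "java",
            "selection" to "public int add(int a, int b) { return a + b; }",
        ),
    )
    println(rendered)
}
```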

Unit Picker prompt

The Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

val input = "$relatedCode\n\nCode:\n```${language}\n$beforeCursor\n```"
return Instruction(
    instruction = "Complete $language code, return rest code, no explaining",
    output = output,
    input = input
)
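
The fragment above can be made self-contained roughly as follows. This is a sketch under stated assumptions: the `Instruction` class here is a stand-in with the three fields used in the fragment, and `buildCompletionInstruction` and its `cursor` parameter are hypothetical names, not part of the project's API.

```kotlin
// Hypothetical stand-in for the project's Instruction type; the field names
// follow the fragment above.
data class Instruction(
    val instruction: String,
    val input: String,
    val output: String,
)

// Splits `code` at `cursor`: the text before the cursor goes into the input,
// and the text after it becomes the expected output.
fun buildCompletionInstruction(
    language: String,
    relatedCode: String,
    code: String,
    cursor: Int,
): Instruction {
    require(cursor in 0..code.length) { "cursor must lie inside the code" }
    // Built at runtime so this example does not nest code fences.
    val fence = "`".repeat(3)
    val beforeCursor = code.substring(0, cursor)
    val output = code.substring(cursor)
    val input = "$relatedCode\n\nCode:\n$fence$language\n$beforeCursor\n$fence"
    return Instruction(
        instruction = "Complete $language code, return rest code, no explaining",
        input = input,
        output = output,
    )
}

fun main() {
    val code = "fun add(a: Int, b: Int): Int {\n    return a + b\n}"
    val cursor = code.indexOf("return")
    val item = buildCompletionInstruction("kotlin", "// no related code", code, cursor)
    println(item.instruction)
    println(item.input)
    println(item.output)
}
```

Splitting a single source string at a cursor keeps the input and the expected output consistent by construction: concatenating `beforeCursor` and `output` always gives back the original code.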

UnitGen prompt

The UnitGen prompt should keep the same structure as the AutoDev prompt. Prompt example:

Complete ${language} code, return rest code, no explaining

```${language}
${relatedCode}
```

Code:
```${language}
${beforeCursor}
```

Code quality pipeline

[Figure: code quality workflow diagram]

Extensible, customizable quality thresholds

Optional quality type:

enum class CodeQualityType {
    BadSmell,
    TestBadSmell,
    JavaController,
    JavaRepository,
    JavaService,
}

Custom threshold configuration:

data class BsThresholds(
    val bsLongParasLength: Int = 5,
    val bsIfSwitchLength: Int = 8,
    val bsLargeLength: Int = 20,
    val bsMethodLength: Int = 30,
    val bsIfLinesLength: Int = 3,
)
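
Because `BsThresholds` is a data class with default values, a single threshold can be overridden by name while the rest keep their defaults. The sketch below illustrates this. `MethodSummary` and `exceededThresholds` are hypothetical names, and the strict `>` comparison is an assumption; the project's actual checks may differ.

```kotlin
// Copy of the thresholds class shown above, so the sketch is self-contained.
data class BsThresholds(
    val bsLongParasLength: Int = 5,
    val bsIfSwitchLength: Int = 8,
    val bsLargeLength: Int = 20,
    val bsMethodLength: Int = 30,
    val bsIfLinesLength: Int = 3,
)

// Hypothetical summary of a method; not a type from UnitGen or Chapi.
data class MethodSummary(val name: String, val parameterCount: Int, val lineCount: Int)

// Returns the names of the thresholds that `method` exceeds.
// Only two thresholds are checked, to keep the sketch short.
fun exceededThresholds(
    method: MethodSummary,
    thresholds: BsThresholds = BsThresholds(),
): List<String> {
    val exceeded = mutableListOf<String>()
    if (method.parameterCount > thresholds.bsLongParasLength) exceeded += "bsLongParasLength"
    if (method.lineCount > thresholds.bsMethodLength) exceeded += "bsMethodLength"
    return exceeded
}

fun main() {
    val method = MethodSummary(name = "process", parameterCount = 4, lineCount = 25)
    // Stricter method length; every other threshold keeps its default.
    val strict = BsThresholds(bsMethodLength = 20)
    println(exceededThresholds(method))
    println(exceededThresholds(method, strict))
}
```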

Custom rules:

val apis = apiAnalyser.toContainerServices()
val ruleset = RuleSet(
    RuleType.SQL_SMELL,
    "normal",
    UnknownColumnSizeRule(),
    LimitTableNameLengthRule()
    // more rules
)

val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
// if issues is not empty, the code has a bad smell
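
The rule-set idea can be illustrated with a small self-contained sketch. Every type here (`Table`, `Issue`, `TableRule`, `runRules`) is hypothetical and exists only for illustration; none of them is ArchGuard's actual `Rule` or `RuleSet` API, and the name-length limit of 20 is an assumed value.

```kotlin
// Hypothetical types for illustration only.
// A null column size means the size is unknown.
data class Table(val name: String, val columnSizes: Map<String, Int?>)
data class Issue(val ruleName: String, val detail: String)

// A rule inspects one table and reports zero or more issues.
fun interface TableRule {
    fun check(table: Table): List<Issue>
}

// Flags table names longer than 20 characters (assumed limit).
val limitTableNameLength = TableRule { table ->
    if (table.name.length > 20) listOf(Issue("LimitTableNameLength", table.name)) else emptyList()
}

// Flags every column whose size is unknown.
val unknownColumnSize = TableRule { table ->
    table.columnSizes
        .filterValues { it == null }
        .keys
        .map { column -> Issue("UnknownColumnSize", "${table.name}.$column") }
}

// Applies every rule to every table and collects all issues.
fun runRules(tables: List<Table>, rules: List<TableRule>): List<Issue> =
    tables.flatMap { table -> rules.flatMap { rule -> rule.check(table) } }

fun main() {
    val tables = listOf(
        Table("users", mapOf("id" to 8)),
        Table("a_very_long_table_name_for_orders", mapOf("note" to null)),
    )
    val issues = runRules(tables, listOf(limitTableNameLength, unknownColumnSize))
    // A non-empty result means at least one rule was violated.
    issues.forEach(::println)
}
```

Keeping each rule as a small function over one input makes it straightforward to add a custom rule: it only has to be appended to the list passed to `runRules`.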

Thanks to

Abstract syntax tree: Chapi. Used feature: parsing multiple languages into the same data structure.
Legacy system analysis: Coca. Inspiration for: Bad Smell, Test Bad Smell.
Architecture governance tool: ArchGuard. Used features: Estimation, Rule Lint (API, SQL).
Code database: CodeDB. Used feature: code analysis pipeline.